STAT 892-004 Integrative Data Science for Plant Phenomics

Due date

Tuesday, January 30, 5pm



For this assignment, we will use the Orange dataset included in the datasets package, which records the growth of orange trees. Type ?Orange to learn more about the dataset. Answer Questions 1 through 10 below using the Orange dataset.
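As a starting point, the following minimal sketch shows one way to load the dataset and take a first look at it. It assumes a standard R installation, where the datasets package is attached by default; the choice of head and str here is only a suggestion, and other inspection functions would work equally well.

```r
# Orange ships with the datasets package, which R attaches by default.
data(Orange)

# Show the first few rows, then a compact description of each column.
head(Orange)
str(Orange)
```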

Question 1

Describe the Orange dataset and explain each column.

Question 2

How many rows and columns are there in this dataset?

Question 3

How many different orange trees and age groups are there in the dataset?

Question 4

What is the mean of the circumference column?

Question 5

What is the variance of the circumference column?

Question 6

Compute the covariance between circumferences of Tree 3 and Tree 5.

Question 7

Compute the correlation between circumferences of Tree 3 and Tree 5.
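Questions 6 and 7 compare two trees, so the circumferences of each tree must first be pulled out as separate vectors. The sketch below illustrates one possible approach using Tree 2 and Tree 4, which were picked only for illustration; the variable names circ2 and circ4 are hypothetical. It relies on every tree having been measured at the same set of ages, so the two vectors pair up element by element.

```r
# Circumferences of two example trees (not the trees asked about above).
# Tree is a factor, so compare against the level label as a string.
circ2 <- Orange$circumference[Orange$Tree == "2"]
circ4 <- Orange$circumference[Orange$Tree == "4"]

# Covariance and Pearson correlation between the paired measurements.
cov(circ2, circ4)
cor(circ2, circ4)
```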

Question 8

Which tree has the lowest correlation with Tree 1 in terms of circumferences?

Question 9

Use the subset function to find out which orange tree has the largest circumference in the youngest age group.
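For reference, the general shape of a subset call is sketched below on a different condition (the oldest age group rather than the youngest). The variable name oldest is hypothetical, and this is only one of several ways to filter a data frame in R.

```r
# Keep only the rows whose age equals the largest age in the data,
# and only the Tree and circumference columns.
oldest <- subset(Orange, age == max(age), select = c(Tree, circumference))
oldest
```

Inside subset, column names such as age can be used directly, without the Orange$ prefix.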

Question 10

The plot function generates a figure. Learn more about the plot function by typing ?plot. The following code plots the growth of orange trees. Interpret the results.

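# Number of distinct trees; used to pick one colour per tree
nTree <- length(unique(Orange$Tree))
# Scatter plot of circumference against age, coloured by tree
plot(Orange$age, Orange$circumference, main = "Growth of Orange Trees", xlab = "Age",
    ylab = "Circumference", col = rainbow(nTree)[Orange$Tree], pch = 19, cex = 2)
legend("topleft", title = "Orange Trees", legend = levels(Orange$Tree), fill = rainbow(nTree))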


Question 11

Log in to your MathWorks Account, and complete the MATLAB Onramp self-paced training course. After you complete this course, download and submit your progress report.

GM and HY

January 23, 2018