# STAT 892-004 Integrative Data Science for Plant Phenomics

# Homework assignment 1

# Due date

Tuesday, January 30, 5pm

# R

## Data

For this assignment, we are going to use the `Orange` dataset included in the `datasets` package. This dataset shows the growth of orange trees. Type `?Orange` to learn more about the dataset. Answer the 10 questions below using the `Orange` dataset.
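As a starting point, the following is a minimal sketch of one way to load the dataset and take a first look at it. It uses only base R functions (`data`, `head`, `str`); other inspection functions would work equally well.

```r
# Orange ships with the datasets package, which is attached by default
data(Orange)

# Show the first few rows
head(Orange)

# Compact overview of the column names and types
str(Orange)
```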

## Question 1

Describe the `Orange` dataset and explain each column.

## Question 2

How many rows and columns are there in this dataset?

## Question 3

How many different orange trees and age groups are there in the dataset?

## Question 4

What is the mean of the circumference column?

## Question 5

What is the variance of the circumference column?

## Question 6

Compute the covariance between circumferences of Tree 3 and Tree 5.

## Question 7

Compute the correlation between circumferences of Tree 3 and Tree 5.
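Questions 6 and 7 can be approached with the base R functions `cov` and `cor`, which each take two numeric vectors of equal length. The sketch below only illustrates the calls on two made-up vectors (`x` and `y` are hypothetical and unrelated to the `Orange` data).

```r
# Hypothetical vectors, used only to illustrate the function calls
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 1

# Sample covariance between x and y
cov_xy <- cov(x, y)

# Pearson correlation between x and y
cor_xy <- cor(x, y)
```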

## Question 8

Which tree's circumferences have the lowest correlation with those of Tree 1?

## Question 9

Use the `subset` function to find out which orange tree has the largest circumference in the youngest age group.
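For readers new to `subset`, here is a minimal sketch of its syntax on a small made-up data frame. The data frame `plants` and its columns are hypothetical and serve only as an illustration; the same pattern is one possible way to approach the question above.

```r
# Hypothetical toy data frame, used only to illustrate the subset() syntax
plants <- data.frame(
  id     = c("a", "b", "c", "d"),
  week   = c(1, 1, 2, 2),
  height = c(3.2, 4.1, 6.0, 5.5)
)

# Keep the rows where week equals its minimum value
first_week <- subset(plants, week == min(week))

# Within that subset, pick the row with the largest height
tallest <- first_week[which.max(first_week$height), ]
```

The condition inside `subset` is evaluated using the data frame's own columns, so `week` can be written without the `plants$` prefix.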

## Question 10

The `plot` function generates a figure. Learn more about the `plot` function by typing `?plot`. The following code plots the growth of orange trees. Interpret the results.

```
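# One color per tree; indexing the palette by the factor uses its level order
nTree <- length(unique(Orange$Tree))
plot(Orange$age, Orange$circumference, main = "Growth of Orange Trees",
     xlab = "Age", ylab = "Circumference",
     col = rainbow(nTree)[Orange$Tree], pch = 19, cex = 2)
# Legend labels and fills follow the same level order as the point colors
legend("topleft", title = "Orange Trees", legend = levels(Orange$Tree),
       fill = rainbow(nTree))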
```

# MATLAB

## Question 11

Log in to your MathWorks Account, and complete the MATLAB Onramp self-paced training course. After you complete this course, download and submit your progress report.