Here, we will bridge the gap between “classical” ampelography (the classification of grapevines based on phenotypic characteristics) and “digital” ampelography. Pierre Galet devised a series of morphological descriptors that could be used to differentiate between wine grape cultivars in the early 1960’s. His efforts transformed ampelography from an art form to a science. However, imaging can be used to increase the throughput and accuracy of ampelography. This dataset was generated by Chitwood et al (2014). I encourage you to check out the paper, as it is an interesting read and combines imaging, genomics, and transcriptomics. The objective of their study was to provide a morphometric analysis of leaf shape for more than 1,200 grape accessions, and provide a genetic basis for these complex traits. The authors used photographs of leaves for a panel of 1,200 grape varieties at the USDA National Clonal Germplasm Repository. The authors used several metrics to describe the leaf shape (circularity, aspect ratio) and used the principle components from generalized Procrustes analysis (GPA) and Elliptical Fourier descriptors. In this dataset you’ll find the digital descriptors, as well as those used by Galet (obtained from the USDA GRIN database).
Source of Trait | Column Name | Trait Name | Description |
---|---|---|---|
Chitwood et al (2014) | circ | Circularity | Ratio of the area to perimeter of an outline |
ar | Aspect Ratio | Ratio of the major axis to the minor axis of a fitted ellipse | |
iPC1 | Inner PC1 | See figure 3B of Chitwood et al | |
iPC2 | Inner PC2 | See figure 3B of Chitwood et al | |
iPC3 | Inner PC3 | See figure 3B of Chitwood et al | |
oPC1 | Outer PC1 | See figure 3A of Chitwood et al | |
oPC2 | Outer PC2 | See figure 3A of Chitwood et al | |
oPC3 | Outer PC3 | See figure 3A of Chitwood et al | |
symPC1 | Symmetric PC1 | See figure 2 of Chitwood et al | |
symPC2 | Symmetric PC2 | See figure 2 of Chitwood et al | |
symPC3 | Symmetric PC3 | See figure 2 of Chitwood et al | |
symPC4 | Symmetric PC4 | See figure 2 of Chitwood et al | |
symPC5 | Symmetric PC5 | See figure 2 of Chitwood et al | |
Galet (1952) | A | A | Ratio of the length of superior lateral vein to midrib |
B | B | Ratio of the length of inferior lateral vein to midrib | |
C | C | Ratio of the length of petiolar vein to midrib | |
r | r | Ratio of the length to width of the leaf | |
Spri | S’ | Angle between midrib and inferior lateral vein | |
S | S | Angle between midrib and petiolar vein | |
lowSup | SI2/L2 | low estimate of superior lobing | |
lowInf | SI1/L3 | low estimate of inferior lobing | |
highSup | SI2/L2 | high estimate of superior lobing | |
highInf | SI1/L3 | low inferior of superior lobing |
Here we’ll load the phenotype data, and take a look at the first six lines. Each row corresponds to a grapevine accession from the USDA germplasm repository. Although 1200 accesssions were phenotyped by Chitwood et al (2012), only 122 had descriptions using Galet’s method. The first 13 columns are the digital traits, and the last 10 are the classical measurements.
chitwood = read.csv("Phenotypes/Chitwood_Galet.csv", row.names = 1)
# head(chitwood)
Here, we’ll look at the correlation between digital and classical amelography measurements using Spearman’s Rank correlation.
sp.cor = cor(chitwood[1:13], chitwood[14:23], method = "spearman")
Now, we’ll plot the results as a nice heatmap.
# install.packages(gplots)
library(gplots)
heatmap.2(sp.cor, Rowv = F, Colv = F, dendrogram = "none", trace = "none", cellnote = round(sp.cor,
2), notecol = "black", notecex = 0.5, breaks = seq(-1, 1, length.out = 100),
margins = c(6, 6))
The purpose of this exercise is to explore some hyperspectral imaging data, and determine which spectral bands and/or indicies can be used to predict plant water status. Here, we will look at some hyperspectral data collected from some corn plants under control and drought conditions. The plants were imaged over a period of 22 days during the V5 (5 leaf) to V6 (6 leaf) stage at UNL’s Innovation Campus greenhouse. See the Corny News Network for a description of the growth stages of corn. All plants were watered to 80% feild capacity prior to imaging, and a subset of plants were allowed to dry-down to 50% feild capacity throughout the imaging period. On the last total plant water content was measured for X plants (credit: Sajag Adhikari).
The goal of the exercise is to find the “best” index or hyperspectral band(s) for predicting plant water status. Feel free to try out some other published indicies or some alternative analyses José R. Rodríguez-Pérez et al (2007) provides an extensive list. Below, you will find a few functions to calculate indices commonly used in hyperspectral studies. I will provide a brief example of how to calculate NDVI for each plant (\(NDVI = \frac{R_{800} - R_{640}}{R_{800} + R_{640}}\)) You can modify the code to use your own indices.
The imaging system provides a stack of 235 greyscale images for each plant. Each image in the stack is a specific wavelength. For each image, we remove the background objects leaving just the “plant pixels”. Each plant pixel will be a single chandanel and the intensity of the channel should be proportional to the reflectance. So, a higher intensity means more reflectance for a given wavelength. For each image we obtain the mean pixel intensity (sum the pixel intensities and divide by the total number of plant pixels). This is equivalent to the mean reflectance for each plant. Here, we will read in the file of the mean pixel intensities for each plant.
dat = read.csv("Phenotypes/Hyperspec_pixel_density.csv")
# Check the first 10 columns of the phenotypic data head(dat[1:10])
# Plot the refectance for all wavelengths
dat.means = apply(dat[4:ncol(dat)], 2, mean)
plot(seq(from = 585, to = 1755, by = 5), dat.means, type = "l", ylab = "Reflectance",
xlab = "Wavelength")
Here, I will calculate the normalized difference vegetation index (NDVI). I will create a new dataframe named ‘NDVI’ that has the NDVI calculations, the plant ID, the treatment, and the day of imaging. I’ll create a plot of NDVI over the six time points (1 - 22 days of imaging).
# Make the data frame
NDVI.df = data.frame(ID = dat$ID, Treatment = dat$Treatment, DayOfImaging = dat$DayOfImaging,
NDVI = NDVI(dat))
# boxplot(NDVI~Treatment, data=NDVI.df, ylab='NDVI')
# Plot results install.packages('ggplot2')
library(ggplot2)
ggplot(NDVI.df, aes(x = DayOfImaging, y = NDVI, colour = Treatment)) + geom_smooth(method = "loess") +
xlab("Day of Imaging") + ylab("NDVI")
The indices defined above only capture a small portion of the total phenotype described by the hyperspectral data. Here, we will first utilize the whole hyperspectral range to predict plant water content. This data was collected at 24 days after imaging. For this part of the excersize we will utilize partial least squares regression in the package ‘pls’. Below I will provide a brief explaination of the methodology. For those interested, the ‘pls’ manual provides an in-depth explaination of the PLSR technique, and a few example datasets PLS.
# install.packages('pls')
library(pls)
# This dataset contains the reflacance values for all the wavelengths. These
# will be used as predictors for the percent of total biomass that is water
# ('Percent.Water').
dat2 = read.csv("Phenotypes/Hyperspec_PWC.csv")
dat2$ID = NULL
dat2$Treatment = NULL
dat2$TotalWater = NULL
# Make a training set and validation set to assess the prediction accuracy
# This is similar to what we did for genomic prediction, however here we
# will only sample the data once.
traindat <- dat2[1:50, ]
testdat <- dat2[51:87, ]
# Now we'll fit the PLSR model. We're regressing the Percent water on the
# hyperspectral data (Percent.Water ~ a).
full.spec <- plsr(Percent.Water ~ ., data = traindat)
summary(full.spec)
Data: X dimension: 50 242
Y dimension: 50 1
Fit method: kernelpls
Number of components considered: 49
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
X 97.32 98.97 99.26 99.33 99.38 99.43
7 comps 8 comps 9 comps 10 comps 11 comps 12 comps
X 99.50 99.55 99.58 99.61 99.64 99.67
13 comps 14 comps 15 comps 16 comps 17 comps 18 comps
X 99.70 99.71 99.74 99.76 99.80 99.81
19 comps 20 comps 21 comps 22 comps 23 comps 24 comps
X 99.82 99.84 99.85 99.87 99.88 99.89
25 comps 26 comps 27 comps 28 comps 29 comps 30 comps
X 99.9 99.91 99.92 99.92 99.93 99.94
31 comps 32 comps 33 comps 34 comps 35 comps 36 comps
X 99.94 99.95 99.95 99.96 99.97 99.97
37 comps 38 comps 39 comps 40 comps 41 comps 42 comps
X 99.97 99.98 99.98 99.98 99.98 99.99
43 comps 44 comps 45 comps 46 comps 47 comps 48 comps
X 99.99 99.99 99.99 100 100 100
49 comps
X 100
[ reached getOption("max.print") -- omitted 1 row ]
plot(RMSEP(full.spec), legendpos = "topright")
# This line shows the how well the model fits our data
plot(full.spec, ncomp = 20, asp = 1, line = TRUE)
# Now we predict the percent water for the testing dataset using the model
# fit using the training dataset
Y.tst = predict(full.spec, ncomp = 20, newdata = testdat)
plot(Y.tst, testdat$Percent.Water, xlab = "Percent Water Predicted", ylab = "Percent Water Observed")
abline(0, 1, col = "red")
cor(Y.tst, testdat$Percent.Water)
[1] 0.2602748
sub.hyperspec <- function(startBW, endBW) {
if (sum(colnames(dat2) == paste0("bw_", startBW)) > 0) {
start.col = which(colnames(dat2) == paste0("bw_", startBW))
} else {
print("Column name for first bandwidth does not exist")
}
if (sum(colnames(dat2) == paste0("bw_", endBW)) > 0) {
end.col = which(colnames(dat2) == paste0("bw_", endBW))
} else {
print("Column name for last bandwidth does not exist")
}
tmp.df = as.data.frame(cbind(dat2$Percent.Water, dat2[, start.col:end.col]))
colnames(tmp.df)[1] = "Percent.Water"
return(tmp.df)
}
dat3 = sub.hyperspec(585, 605)
traindat2 <- dat3[1:50, ]
testdat2 <- dat3[51:87, ]
# Now we'll fit the PLSR model. We're regressing the Percent water on the
# hyperspectral data (Percent.Water ~ a).
full.spec <- plsr(Percent.Water ~ ., data = traindat2)
summary(full.spec)
Data: X dimension: 50 5
Y dimension: 50 1
Fit method: kernelpls
Number of components considered: 5
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps
X 91.53 94.02 95.49 97.94 100.00
Percent.Water 18.38 24.36 24.63 24.65 24.65
plot(RMSEP(full.spec), legendpos = "topright")
plot(full.spec, asp = 1, line = TRUE)
plot(Y.tst, testdat$Percent.Water, xlab = "Percent Water Predicted", ylab = "Percent Water Observed")
abline(0, 1, col = "red")
cor(Y.tst, testdat$Percent.Water)
[1] 0.2602748