The flowering cherry trees on the University of Washington Liberal Arts Quadrangle (UW quad) treat the public to an annual spectacle of beautiful pink flowers. Every spring, thousands of people come to the UW quad expressly to see the trees in bloom, taking pictures and admiring the display these trees have dutifully put on since the 1960s. Although the trees were planted in their current location on the quad in 1962, they are significantly older, with some estimated to be almost a century old. As trees age they, like every other organism on our planet, face declining health and increasing susceptibility to pathogens and damage. Inevitably, several of the original trees have been replaced over the years, each replacement entailing an arduous process of removing the old tree and installing the new one. These new trees are younger, and though they have the same showy attributes and wonderful annual bloom as the originals, they do not convey the same historical grandeur as trees that have stood on the quad since it was established. To properly honor these noble giants, we have immortalized them by exploring and sharing their smallest and most complex identity: their DNA.
The trees on the UW quad are all individuals of a popular ornamental cultivar called 'Somei-yoshino', or the yoshino cherry. Because yoshino is a sterile flowering cherry cultivar that produces neither fruit nor seed, it has been propagated exclusively through clonal methods including cuttings, grafting, and tissue culture. These propagation techniques bypass the sexual, or meiotic, step necessary for reproduction via seeds, which in turn avoids any transfer or mixture of genetic information. Trees produced this way therefore maintain consistent genetics, which allows breeders and sellers to deliver trees with extremely consistent and predictable traits, from growth habit to flower color. Clonal reproduction does, however, have several drawbacks. Because no genetic material is introduced from other individuals, any mutations that occur on a somatic level, that is, in non-reproductive cells, accumulate without an opportunity to be diluted, so even highly deleterious traits are maintained. This is why many clonally reproduced trees, including more recent installations on the UW quad, are grafted onto seed-grown rootstock, which has a more robust root system and more resistance to pathogens and disease. The lack of genetic transfer or meiotic events also means that these trees are extremely similar, both on an observable macroscopic level and on the level of their genetic sequence. By sequencing the genomes of trees on the UW quad, we can therefore compare them directly to other trees of the yoshino cultivar to infer which trees they are most closely related to, and perhaps shed a bit of light on their shrouded past.
Since being planted on the quad in 1962, the famous University of Washington cherry trees have been well accounted for. At that time, though, the trees were already several years old, and with limited documentation of where they came from or who procured them, there has been no shortage of speculation as to their origin. The prevailing theory has been that they were purchased from the Washington Park Arboretum in 1939 but only transplanted in 1962. However, significant research undertaken by Yuki Shiotani, an exchange student from Waseda University who attended the UW in 2016-2017, exposed this theory as most likely false. Another equally fantastical theory is that the trees on the quad were gifted to the UW by the country of Japan or by a Japanese-American organization. This theory is only half true: the UW has indeed received cherry trees as a donation from the Japan Commerce Association of Washington, D.C., but these were donated in 2014 and can now be seen bordering Rainier Vista, leading from Red Square to Drumheller Fountain. The trees we now appreciate on the quad were most likely planted at the Washington Park Arboretum in or before 1936, near what is now SR 520 and the Evergreen Point Floating Bridge. Construction of the bridge began in 1960, threatening the then nearly 30-year-old trees. To avoid their destruction, they were transplanted to their current location on the quad in 1962, where they have stood ever since.
Due to the propagation style of the yoshino cherry cultivar, it is likely that all extant individuals arose from just one or two hybridization events, or just a few seeds produced from the cross of two other species or cultivars. It is known that the original hybridization occurred in Japan, and clones have since been exported around the globe. Therefore the oldest yoshino cherry specimen must, if still alive, grow in Japan. By exploring the UW quad cherry trees' genetic heritage, we hoped to discover which trees in Japan they are most closely related to, and thereby where they originated geographically.
Campus cherry trees from the University of Washington Quad were compared, using whole-genome SNV (single nucleotide variation) analysis, to 46 trees sampled as part of a 2024 study of trees of the same cultivar spread throughout parks and campuses across the whole of Japan (Shirasawa et al., 2024). The original study included some of the oldest extant yoshino cherry trees in the world, which are located in Ueno Park, Tokyo. By adding the UW quad cherry trees to the data set produced by Shirasawa et al., we identified the trees to which the UW quad individuals are most closely related. Relatedness was quantified by the number of variations in genetic sequence that two trees shared across the entire genome. The data set was composed of 684 sites of variation, which were relatively highly conserved between UWQ1 and several of the other trees. The trees with which UWQ1 shared the most SNVs, and therefore the trees we can infer are UWQ1's closest relatives, were OKY2 (85%), a tree located on the Okayama Prefectural Multipurpose Grounds; SMN2 (87%) and SMN3 (88%), both located on the Shimane University campus; and TKY3 (86%), one of the trees located at Ueno Park in Tokyo. While OKY2 had the lowest shared SNV count of these at 85%, several variations were shared uniquely with UWQ1, which could indicate that the closest sampled relative of the UW quad cherries is actually located in Okayama Prefecture. The oldest of the trees closely related to UWQ1 is TKY3, in Ueno Park. This could indicate that all the trees in this group were propagated from TKY3; however, a more extensive historical comparison of these five trees would be required to lend credibility to this theory. It is, however, further supported by the similarity between TKY3 and the other members of the group. While SMN2 and SMN3 shared 95% of their variation, SMN3 also shares 94% with TKY3.
The high level of shared variation within the group could also point to TKY3 being the progenitor of the group, and therefore UWQ1 having descended from the tree in Ueno park, perhaps via a tree in Okayama.
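The pairwise comparison described above can be illustrated with a minimal sketch in R. It assumes a genotype matrix with one row per variant site and one column per tree, coded 0 (reference allele) or 1 (variant allele), and it treats two trees as sharing a site when their calls match. The toy data and the helper `percent_shared()` are hypothetical, for illustration only, and are not the project's data or code.

```r
# Toy genotype matrix: rows are variant sites, columns are trees.
# 0 = reference allele, 1 = variant allele, NA = no call.
# These values are invented for illustration only.
geno <- cbind(
  TreeA = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1),
  TreeB = c(1, 1, 0, 1, 0, 1, 0, 0, 1, 1),
  TreeC = c(0, 1, 1, 0, 0, 1, 0, 1, 1, NA)
)

# Hypothetical helper: percent of sites, among those called in both trees,
# at which the two trees carry the same call.
percent_shared <- function(a, b) {
  both_called <- !is.na(a) & !is.na(b)
  100 * mean(a[both_called] == b[both_called])
}

# Build the full pairwise matrix of percent shared calls.
trees <- colnames(geno)
shared <- sapply(trees, function(i) {
  sapply(trees, function(j) percent_shared(geno[, i], geno[, j]))
})
print(round(shared, 1))
```

In this toy example TreeA and TreeB differ at a single site out of ten, so they share 90% of their calls, while TreeC matches either of them at far fewer sites. Applied to real data, a matrix like `shared` is what a heatmap such as Fig. 1 displays.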
Figure 1: Heatmap showing pairwise relatedness comparisons between all tree samples, including UW quad cherry tree 1 (UWQ1, purple box) and 46 tree samples from Shirasawa et al., 2024. Values represent the percent of variations shared by each pair. Fill color represents high (red), neutral (yellow) and low (blue) percent shared variation. Purple lines are used only to emphasize UWQ1.
Figure 2: Geographical distribution of samples from Shirasawa et al., 2024 across Japanese prefectures. The 46 trees sampled are shown in their approximate locations. Based on data shown in Fig. 1, trees most closely related to UWQ1 (red, diamond), trees in the same lineage cluster (purple, triangle), and more distantly related individuals (grey, circle) are shown. Trees from Okayama (OKY2), Matsue (SMN2, SMN3) and Tokyo (TKY3) were found to be most closely related to UWQ1, sharing 85% or more of their variations. Trees in the same lineage were defined as sharing 81-84% of their variation with UWQ1, and trees considered to be from a different lineage shared fewer than 81% of their variations.
Figure 3: Representative model of the relatedness inference made in Fig. 1 and Fig. 2. Sequences are from UW quad cherry tree 1 (UWQ1), closely related trees (purple highlight, OKY2, SMN3, SMN2), trees that clustered to the same group ("Lineage"), trees that clustered to different groups ("Distant"), and the inferred ancestral sequence from Shirasawa et al., 2024 ("Ancestral"). Bases shown are either consistent with the UWQ1 sequence (purple) or not (black). This figure is representative only, and shows the minimum data needed to make this inference. Bases were selected to reflect the overall conclusions and to illustrate how they were reached.
Figure 4: Map of the University of Washington Liberal Arts Quadrangle. Trees sampled for this project (circled in green) are at the northern corner of the quad nearest the Raitt and Founders Halls. Trees have been numbered for convenience and listed numbers are consistent with representation in the data. UWQ1 is tree 1 on this map.
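The relatedness categories used in Fig. 2 (85% or more, 81-84%, and fewer than 81% of variations shared with UWQ1) can be expressed as a small classification rule. The sketch below is illustrative only: the helper `classify_relatedness()` is a hypothetical name, and the two "Example" entries are invented values, while the four named trees use the percentages reported in the text.

```r
# Hypothetical helper: assign a relatedness category from percent shared variation.
classify_relatedness <- function(pct_shared) {
  cut(
    pct_shared,
    breaks = c(-Inf, 81, 85, Inf),
    labels = c("Other Lineages", "Same Lineage", "Closest Relatives"),
    right = FALSE  # intervals are [lower, upper), so exactly 85 is "Closest Relatives"
  )
}

# Percentages for the four closest trees come from the text;
# ExampleA and ExampleB are made-up values for illustration.
pct <- c(OKY2 = 85, SMN2 = 87, SMN3 = 88, TKY3 = 86, ExampleA = 83, ExampleB = 70)
categories <- classify_relatedness(pct)
print(data.frame(tree = names(pct), pct_shared = pct, category = categories))
```

Using half-open intervals keeps the boundary cases unambiguous: a tree sharing exactly 81% falls in "Same Lineage" and one sharing exactly 85% falls in "Closest Relatives".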
Leaf tissue samples were collected from low-hanging branches of 3 trees on the UW Quad and sent for sequencing to the Harkess Lab at the HudsonAlpha Institute for Biotechnology. Adapter-trimmed DNA sequencing reads were produced by the Harkess Lab on the PacBio Revio system. In total, 3.2 million reads yielded 37.23 Gb, with a mean read length of 11,730 bases, an N50 of 12,902 bases, and a median quality of 34 (99.96% base call accuracy). Trimmed reads were converted from binary (BAM) format to FASTQ for alignment using bam2fastq. To simplify later comparisons, reads were aligned to an existing, high-quality, haplotype-resolved genome composed of 18 pseudomolecules: 8 pseudomolecules representing the 8 chromosomes of each of the two parent haplotypes, and 2 pseudomolecules serving as bins of unmapped sequences (Shirasawa et al., 2019). Reads were aligned using Minimap2, an accurate aligner optimized for long-read data sets. The resulting alignment was filtered to exclude non-primary alignments, as the haplotype resolution would have been lost had they been included, and the primary alignments were sorted and indexed using samtools. An error rate of 0.396% was calculated using samtools stats and fell within the range expected from sequencing errors compounded with real variation from the reference. Bcftools mpileup was used to extract location and allele data at the sites of somatic mutation found in a previous study (Shirasawa et al., 2024). Variants were filtered for confidence using bcftools filter. Final variant information was called from our alignment using bcftools call. Called variants and the reference data set were integrated using bcftools merge. Variation in the UW tree was compared to variation in trees in Japan, and a representative heatmap was created using R. Clustering results were confirmed using RAxML.
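The relationship between the reported median quality of 34 and the stated 99.96% base call accuracy follows from the standard Phred scale, where a quality score Q corresponds to an error probability of 10^(-Q/10). A minimal sketch of that conversion is shown below; the helper name `phred_to_accuracy()` is hypothetical and chosen only for this illustration.

```r
# Phred scale: Q = -10 * log10(p_error), so p_error = 10^(-Q / 10).
# Hypothetical helper: convert a Phred quality score to base call accuracy.
phred_to_accuracy <- function(q) {
  1 - 10^(-q / 10)
}

accuracy_q34 <- phred_to_accuracy(34)

# Express as a percentage rounded to two decimals.
print(round(100 * accuracy_q34, 2))
```

A quality of 34 therefore corresponds to roughly four expected errors per 10,000 bases.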
samtools: bam2fastq
*Filetype conversion
minimap2: -ax map-hifi
*Optimizes the algorithm for PacBio HiFi reads
samtools: sort
*Sorts the aligned reads for access
samtools: view -F 0x900
*Filters out any non-primary alignments 0x100(secondary) + 0x800(supplementary)
samtools: stats
*Creates a sheet of statistics for quality checking
bcftools mpileup: -f -R -a FORMAT/DP,FORMAT/AD
*Ensures necessary information is included
bcftools filter: -i 'QUAL>=30 && FORMAT/DP>=120 && INFO/MQ>=60 && INFO/VDB>=0.5 && INFO/DP4[1]=0'
*Includes only variants with high base quality, high allelic frequency, high mapping quality, highly random variant distribution, and low strand bias
bcftools call: --ploidy 1 -m Ov -V indels
*Ignores diploid inference, keeps only variant sites, and excludes insertion/deletion mutations which confounded results
bcftools merge: -m all
*Keeps all sites, including variants unique to one sample
vcftools: --vcf --012
*Separates variant information into matrices for processing using R
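The effect of the `-F 0x900` filter listed above can be illustrated with SAM flag arithmetic: an alignment is excluded if either the secondary (0x100) or the supplementary (0x800) bit is set in its flag. The sketch below reproduces that bit test in R on a few invented flag values; it illustrates the logic only and is not a substitute for samtools.

```r
# SAM flag bits excluded by `samtools view -F 0x900`.
SECONDARY     <- 0x100L  # 256
SUPPLEMENTARY <- 0x800L  # 2048
exclude_mask  <- bitwOr(SECONDARY, SUPPLEMENTARY)  # 0x900 = 2304

# Invented example flags for illustration (16 = read on reverse strand).
flags <- c(primary_fwd = 0L, primary_rev = 16L, secondary = 256L,
           supplementary = 2048L, supplementary_rev = 2064L)

# An alignment is kept only if none of the excluded bits are set.
keep <- setNames(bitwAnd(flags, exclude_mask) == 0L, names(flags))
print(keep)
```

Only the two primary alignments pass the test, which matches the intent of keeping a single placement per read.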
The Fig. 1 heatmap was created in R using ggplot2, with hclust used to sort the samples in the matrix according to their inferred relatedness. Emphasis lines and Fig. 3 were created in Inkscape. The Fig. 2 geographical map was created in R using ggplot2 and a custom script. Colors were chosen for emphasis, and relatedness thresholds were selected based on available data and clustering patterns.
library(ggplot2)
library(reshape2)
library(dplyr)

# genotypes.012 is assumed to be loaded already: the genotype matrix from
# vcftools --012, arranged with one column per tree (cor() compares columns)
relatedness_matrix <- cor(genotypes.012, use = "pairwise.complete.obs")

# Order samples by hierarchical clustering of the relatedness matrix
d <- dist(relatedness_matrix)
hc <- hclust(d)
ord <- hc$order
reord_rel <- relatedness_matrix[ord, ord]

# Convert to long format and keep one triangle of the symmetric matrix
melted_cormat <- melt(reord_rel, varnames = c("x", "y"), value.name = "correlation")
df_long <- melted_cormat %>%
  filter(as.numeric(x) >= as.numeric(y))

# Blue (low) to yellow (neutral) to red (high) color scale
my_colors <- colorRampPalette(c("cornflowerblue", "#F2EEB8", "firebrick3"))(100)

ggplot(df_long, aes(x = x, y = y, fill = correlation)) +
  ggtitle("Percent Shared SNVs (684 Sites)") +
  geom_tile(color = "white") +
  labs(fill = "Shared SNVs (%)") +
  geom_text(aes(label = round(correlation, 2)), size = 9) +
  scale_fill_gradientn(
    colours = my_colors,
    limits = c(0, 1),
    oob = scales::squish) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(size = 36, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 36),
    plot.title = element_text(hjust = 0.5, size = 75),
    legend.title = element_text(size = 30),
    legend.text = element_text(size = 20),
    axis.title.x = element_blank(),
    legend.key.height = unit(2.8, "cm"),
    legend.key.width = unit(1, "cm")) +
  coord_fixed()
UW Campus Tree Map (Fig. 4):
pacman::p_load(
  dplyr, tibble, ggplot2, sf, rnaturalearth, rnaturalearthdata,
  ggrepel, ggspatial, lwgeom, cowplot, googleway, BiocManager,
  rmarkdown, tidyr, rnaturalearthhires, lattice, OpenStreetMap)

# Coordinates of the numbered trees
QTreeSet <- data.frame(
  number = c(1, 2, 3, 4, 5, 6, 7),
  longitude = c(-122.30682453884923, -122.30690936640025, -122.30698429740366, -122.30708891804993, -122.30718929731864, -122.30729815934247, -122.30740702136627),
  latitude = c(47.6578693083986, 47.6578150290537, 47.657776938251665, 47.657723611082154, 47.65765504749843, 47.65757505653697, 47.65750077910585))

# Bounding box of the map: upper-left is (lat2, lon1), lower-right is (lat1, lon2)
lon1 <- -122.30907914639634; lon2 <- -122.30565766531174
lat1 <- 47.65616533419451; lat2 <- 47.658150552992154

# Download the base map tiles and reproject them to longitude/latitude
QMap <- openmap(c(lat2, lon1), c(lat1, lon2), zoom = 18, type = "esri-topo", mergeTiles = TRUE)
QMap2 <- openproj(QMap)

QTreeMap <- OpenStreetMap::autoplot.OpenStreetMap(QMap2) +
  annotate("text", label = NA, x = -122.307, y = 47.657, size = 3.0, angle = -70)
print(QTreeMap)
Map of Trees Across Japan (Fig. 2):
pacman::p_load(
dplyr, tibble, ggplot2, sf, rnaturalearth, ggrepel, ggspatial,
lwgeom, cowplot, googleway, BiocManager, rmarkdown, tidyr,
rnaturalearthhires, lattice)
treeset <- read.csv("Shirasawa_Mapping_Data.csv")
modtreeset <- read.csv("ModTreeset.csv")
world <- ne_countries(scale = "medium", returnclass = "sf")
# Remove "Sy" from the tree names in treeset
treeset <- treeset %>%
  mutate(Name = gsub("Sy", "", Name))

# Number each unique sampling location and build legend text listing its trees
create_location_labels <- function(treeset) {
  unique_numbers <- treeset[!duplicated(treeset[c("Latitude", "Longitude")]), ]
  unique_numbers$label <- seq_len(nrow(unique_numbers))
  treeset_numbers <- merge(treeset, unique_numbers[, c("Latitude", "Longitude", "label")],
                           by = c("Latitude", "Longitude"), all.x = TRUE)
  unique_numbers$label <- factor(unique_numbers$label, levels = unique_numbers$label)
  treeset_numbers$label <- factor(treeset_numbers$label, levels = unique_numbers$label)
  legend_names <- treeset_numbers %>%
    group_by(label) %>%
    summarise(names = paste(Name, collapse = ", "), .groups = 'drop') %>%
    mutate(legend_text = paste0(label, ": ", names))
  legend_labels <- setNames(legend_names$legend_text, legend_names$label)
  return(list(
    unique_locations = unique_numbers,
    labeled_data = treeset_numbers,
    legend_data = legend_names,
    labels_data = legend_labels))
}

# Fix the legend order of the relatedness categories, then sort rows so that
# the closest relatives are drawn last (on top)
modtreeset$simplerelatedness <- factor(modtreeset$simplerelatedness,
                                       levels = c("Closest Relatives",
                                                  "Same Lineage",
                                                  "Other Lineages"))
modtreeset <- modtreeset %>%
  arrange(factor(simplerelatedness, levels = c("Other Lineages", "Same Lineage", "Closest Relatives")))

location_data <- create_location_labels(treeset)
Unique_numbers <- location_data$unique_locations
Treeset_numbers <- location_data$labeled_data
legend_names <- location_data$legend_data
legend_labels <- location_data$labels_data
# Highlight locations whose tree list contains SMN2, SMN3, any TKY tree, or AOM2
location_colors <- c()
for (i in seq_len(nrow(legend_names))) {
  label_num <- as.character(legend_names$label[i])
  tree_names <- legend_names$names[i]
  if (grepl("SMN2|SMN3|TKY|AOM2", tree_names)) {
    location_colors[label_num] <- "#F05624"
  } else {
    location_colors[label_num] <- "deepskyblue4"
  }
}
# Append notes to the legend entries of the locations holding the closest relatives
annotating_legend_labels <- function(legend_labels) {
  annotated_labels <- legend_labels
  if ("13" %in% names(legend_labels)) {
    annotated_labels["13"] <- paste0(legend_labels["13"], "\n OKY2 closely related to Sample 1")
  }
  if ("14" %in% names(legend_labels)) {
    annotated_labels["14"] <- paste0(legend_labels["14"], "\n SMN2 and SMN3 closely related to Sample 1")
  }
  if ("18" %in% names(legend_labels)) {
    annotated_labels["18"] <- paste0(legend_labels["18"], "\n TKY3 closely related to Sample 1")
  }
  return(annotated_labels)
}
legend_labels_annotated <- annotating_legend_labels(legend_labels)

# Add the threshold description to each relatedness category label
mod_legend_labels <- function(data) {
  levels_present <- levels(data$simplerelatedness)
  mod_legend_labels <- levels_present
  names(mod_legend_labels) <- levels_present
  if ("Closest Relatives" %in% levels_present) {
    mod_legend_labels["Closest Relatives"] <- paste0("Closest Relatives", "\n ≥ 85% SNPs shared with Sample 1")
  }
  if ("Same Lineage" %in% levels_present) {
    mod_legend_labels["Same Lineage"] <- paste0("Same Lineage", "\n ≥ 81% SNPs shared with Sample 1")
  }
  if ("Other Lineages" %in% levels_present) {
    mod_legend_labels["Other Lineages"] <- paste0("Other Lineages", "\n < 81% SNPs shared with Sample 1 ")
  }
  return(mod_legend_labels)
}
mod_labels_annotated <- mod_legend_labels(modtreeset)
# Reference cities shown on the map
japan_cities <- data.frame(
  name = c("Tokyo", "Osaka", "Kyoto", "Nagoya", "Sapporo", "Fukuoka",
           "Hiroshima", "Sendai", "Niigata", "Kanazawa", "Matsue",
           "Takamatsu", "Kumamoto", "Kagoshima", "Naha"),
  longitude = c(139.6917, 135.5023, 135.7681, 136.9066, 141.3545, 130.4017,
                132.4596, 140.8719, 139.0238, 136.6256, 133.0505,
                134.0434, 130.7414, 130.5581, 127.6792),
  latitude = c(35.6895, 34.6937, 35.0116, 35.1815, 43.0642, 33.5904,
               34.3853, 38.2682, 37.9161, 36.5944, 35.4676,
               34.3401, 32.7503, 31.5604, 26.2124))

# Combine tree location numbers and city names into one table of text labels
combined_labels <- rbind(
  data.frame(
    x = Unique_numbers$Longitude,
    y = Unique_numbers$Latitude,
    label = Unique_numbers$label,
    type = "tree",
    stringsAsFactors = FALSE),
  data.frame(
    x = japan_cities$longitude,
    y = japan_cities$latitude,
    label = japan_cities$name,
    type = "city",
    stringsAsFactors = FALSE))
TreeMap <- ggplot(world) +
  geom_sf(fill = "papayawhip", color = "gray60", linewidth = 0.5) +
  geom_point(data = Treeset_numbers,
             aes(x = Longitude, y = Latitude, color = label),
             size = 0.01) +
  geom_point(data = japan_cities,
             aes(x = longitude, y = latitude, color = name),
             size = 0.000001,
             shape = 9) +
  geom_point(data = modtreeset,
             aes(x = Longitude, y = Latitude,
                 fill = simplerelatedness,
                 shape = simplerelatedness),
             size = 3) +
  labs(title = "Sequenced Somei-Yoshino Trees") +
  geom_text_repel(data = combined_labels,
                  aes(x = x, y = y, label = label,
                      fontface = ifelse(type == "tree", "bold", "italic"),
                      color = ifelse(type == "city", "gray30", "gray30")),
                  size = ifelse(combined_labels$type == "tree", 11 / .pt, 4),
                  box.padding = 0.5,
                  xlim = c(NA, Inf),
                  ylim = c(-Inf, Inf),
                  max.overlaps = getOption("ggrepel.max.overlaps", default = Inf)) +
  coord_sf(xlim = c(129, 142.5), ylim = c(30.5, 42), expand = FALSE) +
  scale_color_manual(
    values = location_colors,
    labels = legend_labels_annotated,
    na.value = "black",
    name = "Tree IDs") +
  guides(color = guide_legend(override.aes = list(shape = NA, fill = NA)),
         linetype = guide_legend(override.aes = list(fill = NA))) +
  scale_fill_manual(
    values = c("Closest Relatives" = "red",
               "Same Lineage" = "#785EF0",
               "Other Lineages" = "gray40"),
    breaks = c("Closest Relatives", "Same Lineage", "Other Lineages"),
    labels = mod_labels_annotated,
    name = "Relatedness to UW Quad Tree 1") +
  scale_shape_manual(
    values = c("Closest Relatives" = 23,
               "Same Lineage" = 24,
               "Other Lineages" = 21),
    breaks = c("Closest Relatives", "Same Lineage", "Other Lineages"),
    labels = mod_labels_annotated,
    name = "Relatedness to UW Quad Tree 1") +
  theme_bw() +
  theme(
    legend.position = c(0.01, 0.99),
    legend.justification = c(0.01, 0.99),
    legend.background = element_rect(fill = "white", color = NA),
    legend.key = element_rect(fill = "white", color = NA),
    legend.title = element_text(size = 12, face = "bold", family = "serif"),
    legend.text = element_text(size = 11, family = "serif"),
    legend.key.size = unit(0.4, "cm"),
    legend.key.width = unit(0.25, "cm"),
    legend.key.height = unit(0.5, "cm"),
    legend.margin = margin(t = 2, r = 3, b = 2, l = 3),
    legend.box = "vertical",
    legend.box.spacing = unit(0, "cm"),
    legend.spacing.y = unit(0, "cm"),
    panel.background = element_rect(fill = "powderblue", color = "powderblue"),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(color = "black"),
    plot.title = element_text(family = "serif", face = "bold", size = 20),
    plot.background = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank())
print(TreeMap)
Aiden Eno
Caitlin Dong
Nandini Pathak
Preston Lam
With help from:
Professor Adam Steinbrenner
Ben Sheppard
Dr. Kelsey Wood
UWQ1 TREE (ADAM, CAITLIN, AIDEN, BEN, PRESTON, NANDINI. NOT IN PHOTO: KELSEY )
Kenta Shirasawa, Tomoya Esumi, Hideki Hirakawa, Hideyuki Tanaka, Akihiro Itai, Andrea Ghelfi, Hideki Nagasaki, Sachiko Isobe, Phased genome sequence of an interspecific hybrid flowering cherry, ‘Somei-Yoshino’ (Cerasus × yedoensis), DNA Research, Volume 26, Issue 5, October 2019, Pages 379–389, https://doi.org/10.1093/dnares/dsz016
Kenta Shirasawa, Tomoya Esumi, Akihiro Itai, Katsunori Hatakeyama, Tadashi Takashina, Takuji Yakuwa, Katsuhiko Sumitomo, Takeshi Kurokura, Eigo Fukai, Keiichi Sato, Takehiko Shimada, Katsuhiro Shiratake, Munetaka Hosokawa, Yuki Monden, Makoto Kusaba, Hidetoshi Ikegami, Sachiko Isobe, Propagation path of a flowering cherry (Cerasus × yedoensis) cultivar ‘Somei-Yoshino’ traced by somatic mutations, DNA Research, Volume 31, Issue 5, October 2024, dsae025, https://doi.org/10.1093/dnares/dsae025