Coronavirus disease 2019 (COVID-19), an emerging infectious disease caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first identified in patients with severe pneumonia in Wuhan, China, in December 2019 (Chan et al., 2019). SARS-CoV-2 has since caused a pandemic and represents a significant threat to global public health. As of April 27, 2020, more than 3 million confirmed cases and more than 200,000 deaths from this rapidly spreading disease have been reported from 213 countries across the globe (WHO, 2020).
SARS-CoV-2 belongs to the family Coronaviridae and is classified in the genus Betacoronavirus (β-CoV) (Letko et al., 2020). Coronaviruses have a large (30+ kb) single-stranded, positive-sense RNA genome encoding several open reading frames. Several coronaviruses are known to infect humans, causing diseases that range from severe respiratory illness (SARS-CoV, MERS-CoV) to mild common cold symptoms (HKU1, NL63, OC43) (Corman et al., 2018).
Although scientists across the world are working collaboratively to understand the molecular mechanisms of SARS-CoV-2, there is currently no vaccine or approved treatment for COVID-19. Vaccine development is focused on the spike (S) glycoprotein on the surface of SARS-CoV-2. Like other coronaviruses, SARS-CoV-2 uses the S protein to bind to the host-cell receptor and to mediate viral entry (Letko et al., 2020). The S protein is a trimeric class I fusion protein that exists in a metastable prefusion conformation and undergoes a substantial structural rearrangement to fuse the viral membrane with the host cell membrane (Li, 2016; Bosch et al., 2003). In many coronaviruses, the S protein is cleaved into two subunits, S1 and S2, often by furin-like proteases; in SARS-CoV-2 the cleavage site lies at R685/S686. It has recently been shown that the S protein of SARS-CoV-2 binds to the human host cell receptor angiotensin-converting enzyme 2 (ACE2) (Letko et al., 2020). When the receptor-binding domain (RBD) of S1 binds to ACE2, the prefusion trimer is destabilized, resulting in shedding of the S1 subunit and transition of the S2 subunit to a stable postfusion conformation. Fusion of the viral and endosomal membranes triggers release of the viral RNA into the cytosol (Fehr and Perlman, 2015). The viral RNA has a 5′ cap structure and a 3′ poly(A) tail, which allow expression of the replicase. The replicase is encoded by ORF1ab, which spans approximately two-thirds of the genome, and is expressed as two polyproteins, pp1a and pp1ab, which together comprise up to 16 nonstructural proteins (nsps). The remaining third of the genome encodes the structural and accessory proteins, including the structural proteins S, matrix (M), and envelope (E).
The structure of the SARS-CoV-2 S protein was solved in record time and at high resolution, contributing to our understanding of this vaccine target (Wrapp et al., 2020). Although SARS-CoV and SARS-CoV-2 share the same host receptor (ACE2), one major difference between SARS-CoV S and SARS-CoV-2 S is the position of the receptor-binding domains (RBDs) in their respective receptor-inaccessible conformations (Wrapp et al., 2020). In addition, mutations in the contact residues of the SARS-CoV-2 spike protein are likely to play an important role in driving the pandemic. The S protein is therefore a target antigen that can be incorporated into advanced vaccine platforms, such as recombinant S-protein-based vaccines.
In sharp contrast to the SARS outbreak of 2002-2003, in which efficient transmission occurred in healthcare facilities, community transmission is the driving force behind the high infection and fatality rates of COVID-19. Recent epidemiological and clinical evidence demonstrates that SARS-CoV-2 is transmitted more efficiently than SARS-CoV (Guan et al., 2020). To further understand the rapid evolution and transmission pattern of SARS-CoV-2, it is important to genotype virus isolates on a population scale and to couple this genotyping with contact tracing. Multiple sequence alignment (MSA) of DNA and/or amino acid sequences is widely used to infer molecular evolutionary history. Such tools are useful for detecting mutations in SARS-CoV-2 genomes, which arise from an error-prone RNA-dependent RNA polymerase during genome replication.
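As an illustration of how an alignment can be used to detect mutations, the following minimal Python sketch compares one aligned isolate with an aligned reference and reports substitutions in ungapped reference coordinates. It is not the method used in this study: the function name `call_substitutions`, the label format (for example `C5T`), and the toy sequences are assumptions introduced here for illustration, and a real analysis would start from an alignment produced by a dedicated MSA program.

```python
def call_substitutions(ref_aln: str, query_aln: str) -> list[str]:
    """Return substitutions such as 'C5T' in ungapped reference coordinates.

    Both inputs are rows of the same alignment, so they must have equal length.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-":
            continue  # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        if q in ("-", "N"):
            continue  # deletion or ambiguous base: not reported as a substitution
        if q != r:
            calls.append(f"{r}{ref_pos}{q}")
    return calls


# Toy alignment rows, invented for illustration (not real SARS-CoV-2 sequence).
alignment = {
    "ref":       "ATG-CCGTA",
    "isolate_1": "ATG-CTGTA",
    "isolate_2": "ACGACCG-A",
}

mutations = {
    name: call_substitutions(alignment["ref"], seq)
    for name, seq in alignment.items()
    if name != "ref"
}
print(mutations)
```

Positions are counted only over non-gap reference columns, so the labels stay comparable between isolates even when some of them carry insertions. Insertions and deletions are skipped in this sketch and would need separate handling.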
To understand the molecular evolution and genotype distribution of SARS-CoV-2 in the USA, we establish a genotyping method and investigate genotype changes during the transmission of SARS-CoV-2 using phylogenetic and phylodynamic analysis of the viral sequences. Furthermore, we predict the structure of the S protein carrying the most frequent mutation and overlay it onto the published cryo-EM 3D structure of the S protein. Our results show that the genotypes of the virus are not uniformly distributed among the complete SARS-CoV-2 genomes from different states. This genotyping study identifies a few high-frequency mutations in the SARS-CoV-2 genomes. These mutations are located in the S protein, the RNA polymerase, the E protein, and the N protein. We narrow our analysis to the S gene, which is a major target for vaccine development. These findings indicate that the frequent mutations might be fixed in the SARS-CoV-2 strains circulating in the USA.
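The idea of genotyping isolates by their high-frequency mutations and comparing the genotype distribution between states can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline of this study: the records, state labels, mutation names, the 50% frequency threshold, and the helper functions `frequent_mutations` and `genotype_by_state` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (state, substitutions called against a reference genome).
isolates = [
    ("WA", {"C5T"}),
    ("WA", {"C5T", "G9A"}),
    ("WA", set()),
    ("NY", {"A7G"}),
    ("NY", {"A7G", "C5T"}),
    ("NY", {"A7G"}),
]


def frequent_mutations(records, min_fraction):
    """Return mutations carried by at least `min_fraction` of all isolates."""
    counts = Counter(m for _, muts in records for m in muts)
    n = len(records)
    # Sorted alphabetically only to give stable, reproducible genotype labels.
    return sorted(m for m, c in counts.items() if c / n >= min_fraction)


def genotype_by_state(records, markers):
    """Label each isolate by the marker mutations it carries and tally per state."""
    table = defaultdict(Counter)
    for state, muts in records:
        genotype = "+".join(m for m in markers if m in muts) or "reference"
        table[state][genotype] += 1
    return {state: dict(counts) for state, counts in table.items()}


markers = frequent_mutations(isolates, min_fraction=0.5)  # assumed threshold
distribution = genotype_by_state(isolates, markers)
print(markers)
print(distribution)
```

In this toy data set, rare mutations such as `G9A` fall below the threshold and do not define a genotype, so each isolate is labelled only by the frequent marker mutations it carries. A non-uniform distribution then shows up as different dominant labels in different states.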