At AGBT 2019, the T2T consortium announced a whole-genome de novo assembly that surpasses the continuity of GRCh38, along with the first complete, telomere-to-telomere assembly of a human X chromosome. As of January 2019, we have collected 50X coverage of ultra-long Oxford Nanopore sequencing for the CHM13hTERT cell line, including 44 Gb of sequence in reads of 100 kb or longer and a maximum read length exceeding 1 Mb.
This unprecedented coverage of ultra-long reads enabled the resolution of most repeats in the genome, including large fractions of the centromeric satellite arrays and the short arms of the acrocentric chromosomes. A de novo assembly combining this nanopore data with 70X of existing PacBio data achieved an NG50 contig size of 75 Mb (compared to 56 Mb for GRCh38), with some chromosomes broken only at the centromere.
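For readers unfamiliar with the metric, NG50 is the length of the contig at which the contigs, taken from longest to shortest, first add up to at least half of the estimated genome size (N50 uses half of the total assembly size instead). The following is a minimal sketch of that standard definition, not the tooling used for the assembly described here; the function name and the toy contig lengths are illustrative assumptions.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly, or None if it cannot be reached.

    contig_lengths: lengths of the assembled contigs (any order).
    genome_size: estimated size of the genome, in the same units.
    """
    target = genome_size / 2
    running_total = 0
    # Walk the contigs from longest to shortest, accumulating length
    # until half of the estimated genome size is covered.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    # The assembly covers less than half of the estimated genome.
    return None


# Toy example (made-up lengths, arbitrary units).
toy_contigs = [40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, genome_size=160)
```

Because the denominator is the estimated genome size rather than the assembly size, NG50 values for two assemblies of the same genome, such as the 75 Mb and 56 Mb figures above, can be compared directly.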
Using this assembly as a basis, we chose to manually finish the X chromosome. The few unresolved segmental duplications were assembled using ultra-long reads spanning the individual copies, and the ~2.8 Mb X centromere was assembled by identifying unique variants within the array and using these to anchor overlapping ultra-long reads.
These results demonstrate that it is now possible to finish entire human chromosomes without gaps, and our future work will focus on completing and validating the remainder of the genome.