17 - Status of COI records in GenBank and implications for future record re-usability
A new preprint on the status of COI records in GenBank reports that the number of records has grown over time, that the proportion of insufficiently identified records has begun to increase, and that levels of metadata annotation are uneven.
Porter, T.M. and Hajibabaei, M. 2018. Over 2.5 million COI sequences in GenBank and growing. bioRxiv, doi: https://doi.org/10.1101/353904
Supporting code and datasets are now available at https://github.com/terrimporter/COI_NCBI_2018. The repository contains scripts for retrieving data from the NCBI taxonomy and nucleotide databases, as well as from the BOLD API and data releases.
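As a rough illustration of the kind of retrieval involved, the sketch below builds a search URL for the NCBI E-utilities esearch endpoint against the nucleotide database. It is not taken from the repository: the helper name build_esearch_url and the search term are assumptions chosen for illustration, and the actual scripts may query NCBI differently.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nuccore", retmax=20):
    """Return an esearch URL for a query term (hypothetical helper)."""
    params = {
        "db": db,           # nuccore is the NCBI nucleotide database
        "term": term,       # Entrez query string
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Assumed query for illustration only; not the search used in the preprint.
url = build_esearch_url("COI[Gene Name] OR CO1[Gene Name] OR COX1[Gene Name]")
print(url)
```

The resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen. The response lists matching record IDs, which can then be passed to the efetch endpoint to download the sequences themselves.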