Changelog

February 23, 2011: dbNSFP and search_dbNSFP v0.9 released.

April 4, 2011: A bug related to the prediction scores of MutationTaster was fixed. dbNSFP v1.0 released. The chromosome search order of search_dbNSFP was changed. A readme file was added. search_dbNSFP v1.0 released.

May 30, 2011: dbNSFP and search_dbNSFP v1.1 released. Version 1.1 added the following entries: rs numbers from UniSNP (a cleaned version of dbSNP build 129), allele frequency recorded in dbSNP, allele frequency reported by 1000 Genomes Project, alternative gene names, descriptive gene name, database cross references (gene IDs of HGNC, MIM, Ensembl and HPRD). The unzipped database is 18Gb.

May 31, 2011: dbNSFP_light and search_dbNSFP_light v1.0 released. dbNSFP_light v1.0 is a light version of dbNSFP, which contains fewer annotation entries but some additional 9,285,316 NSs that are not in CCDS version 20090327. Scores of PhyloP, SIFT, Polyphen2, LRT and MutationTaster are included but missing data are not imputed. Predictions of LRT and MutationTaster are also included, as well as the omega estimated by LRT. The unzipped database is 6Gb.

October 24, 2011: dbNSFP_light v1.1 and search_dbNSFP_light v1.1 released. dbNSFP v1.2 and search_dbNSFP v1.2 released. The new versions added GERP++ neutral rates and RS scores.

October 25, 2011: dbNSFP v1.3 released. It added Uniprot ID, accession number and amino acid position based on Polyphen-2 annotation. Users can now search for amino acid changes directly by Uniprot ID or accession number.

November 3, 2011: dbNSFP_light v1.2 released. It added Uniprot ID, accession number and amino acid position based on Polyphen-2 annotation. Users can now search for amino acid changes directly by Uniprot ID or accession number.

November 10, 2011: A bug was fixed in the companion search program for dbNSFP v1.3, which caused invalid searches when using AA mutations with a Uniprot ID or accession number.

December 16, 2011: dbNSFP_light v1.3 released. It updated SIFT scores (August, 2011 version) and Polyphen-2 scores (May, 2011 version). Uniprot ID, accession number and amino acid position based on the Polyphen-2 annotations have been updated too.

April 11, 2012: dbNSFP2.0b1_variant released. This is a beta test version of the variant sub-database of dbNSFP v2.0, which is rebuilt based on Gencode release 9 / Ensembl version 64.

June 2, 2012: dbNSFP v2.0b2 released. It includes both the dbNSFP_variant and dbNSFP_gene sub-databases. Slight changes have been made to the Ensembl gene and transcript ids of dbNSFP_variant in order to be compatible with other database sources.

July 2, 2012: dbNSFP v2.0b3 released. An additional 2.2 million splicing site SNPs have been added to dbNSFP_variant. In the table those SNPs have missing values (".") in aaref and aaalt, and "-1" in aapos. There is no change to the format of the search input file.
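
As an illustration of the placeholder convention above, the following minimal Python sketch separates splicing site rows from the others. It assumes a tab-delimited table with a header row; the three-column sample and the helper name is_splice_site_row are hypothetical and are not part of dbNSFP or its search program. Only the column names aaref, aaalt and aapos and their placeholder values come from this entry.

```python
import csv
import io


def is_splice_site_row(row):
    """True when a row carries the placeholders described for splicing site SNPs."""
    return row["aaref"] == "." and row["aaalt"] == "." and row["aapos"] == "-1"


# Hypothetical three-column excerpt; the real table has many more columns.
sample = "aaref\taaalt\taapos\nA\tT\t12\n.\t.\t-1\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
splice_rows = [r for r in rows if is_splice_site_row(r)]
coding_rows = [r for r in rows if not is_splice_site_row(r)]
```

Note that aapos is compared as the string "-1", since csv.DictReader does not convert values to numbers.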

August 28, 2012: The companion Java search program search_dbNSFP20b3 is updated. Added features include support for VCF files as input and options for selecting output contents (columns).

October 27, 2012: dbNSFP v2.0b4 is released. A new functional prediction score MutationAssessor is added (I thank Mr. Yevgeniy Antipin for his recommendation). Allele frequencies from the ESP 5400 data set are replaced by those from the ESP 6500 data set.

February 25, 2013: dbNSFP v2.0 is released. A new functional prediction score FATHMM is added.

March 22, 2013: A bug which caused a lot of missing FATHMM scores has been fixed.  

May 31, 2013: The source code of the companion Java search program is now available under the RECEX SHARED SOURCE LICENSE.

October 3, 2013: dbNSFP v2.1 is released. MutationTaster and FATHMM scores have been updated. Converted scores of SIFT, LRT, MutationTaster, MutationAssessor and FATHMM have been added. Columns of SIFT and FATHMM predictions have been added. The gene database has also been updated. Database IDs are updated. GO Slim terms, pathway and protein interaction information from the ConsensusPathDB, and list of essential and non-essential genes (based on phenotypes of mouse homologs) have been added.

January 23, 2014: dbNSFP v2.2 is released. SIFT and FATHMM now have multiple scores corresponding to different Ensembl ENSP ids and amino acid positions (aapos_SIFT and aapos_FATHMM). Accordingly, our companion search program now supports SNP searches based on Ensembl ENSP ids and amino acid positions. A bug is fixed for a small proportion of MutationTaster scores.

January 26, 2014: dbNSFP v2.3 is released.  Two ensemble scores (RadialSVM and LR) and their predictions have been added.

February 12, 2014: A bug was fixed in dbNSFP v2.2 and v2.3, which caused missing delimiters in columns aapos_SIFT, SIFT_score_converted and SIFT_pred. (I thank Mr. Yevgeniy Antipin for his reminder). 

March 5, 2014: dbNSFP v2.4 is released. A whole genome functional prediction score called CADD was added, along with five more conservation scores (phyloP46way_primate, phyloP100way_vertebrate, phastCons46way_primate, phastCons46way_placental, phastCons100way_vertebrate). To facilitate comparison between scores, we added rank scores for most functional prediction scores and conservation scores; these replace the "converted" scores of the previous versions.

June 1, 2014: dbNSFP v2.5 is released. A new functional score VEST 3.0 has been added. We thank Dr. Karchin for kindly providing the score. A bug that has caused MutationTaster score errors since v2.1 for variants with a prediction of "Polymorphism_automatic" has been fixed. We thank John McGuigan and James Ireland for reporting this bug. As MutationTaster can also predict splicing change and other functional effects, in case a variant has multiple predictions based on its different models, we took the most damaging score and prediction for dbNSFP.

July 26, 2014: dbNSFP v2.6 is released. rs numbers from dbSNP 141 have been added to the variant database files. Mouse and zebra fish homolog genes and phenotypes have been added to the gene database file (I thank Alex Li for his suggestion and help). Trait_association(GWAS) was also updated. An attached database called dbscSNV is available for download. It includes all potential human SNVs within splicing consensus regions (−3 to +8 at the 5’ splice site and −12 to +2 at the 3’ splice site), i.e. scSNVs, related functional annotations and two ensemble prediction scores for predicting their potential of altering splicing. search_dbNSFP26 now supports searching dbNSFP along with dbscSNV using option "-s".

September 12, 2014: dbNSFP v2.7 is released. Chromosomes and positions of human reference hg38 have been added. search_dbNSFP27.class now supports querying dbNSFP using positions based on hg38 with the "-v hg38" option. clinvar (freeze 20140902) annotations have been added. Allele frequencies from 2303 exomes of African Americans and 3203 exomes of European Americans from the Atherosclerosis Risk in Communities (ARIC) cohort study have been added. As the columns for gene interactions in the dbNSFP_gene table contain very long strings, especially for gene UBC, which may cause problems when viewing the results in Excel, we now only report the number of interacting genes in those columns. Full information is retained in the dbNSFP_gene.complete table.

November 21, 2014: dbNSFP v2.8 is released. COSMIC (Catalogue Of Somatic Mutations In Cancer) annotations have been added. Pathway information from BioCarta and KEGG (old version) has been added to the dbNSFP2.8_gene. A bug causing inconsistency between MutationTaster scores and MutationTaster_pred, which affected v2.5 to v2.7, has been fixed. I thank Adam Novak for reporting this bug.

February 3, 2015: dbNSFP v2.9 is released. SIFT score has been updated to ensembl66 version. PROVEAN (Protein Variation Effect Analyzer) score v1.1 has been added. CADD score has been updated to 1.3 version. Allele frequency v0.3 of ~60,706 unrelated individuals from the Exome Aggregation Consortium (ExAC) has been added.

April 6, 2015: dbNSFP v3.0b1 is released. The core set of nsSNVs and ssSNVs has been rebuilt based on Gencode 22/Ensembl 79 with human reference sequence hg38. Putative genes have been included. Genes with incomplete 5' have been excluded (I thank Chris Gillies for reporting the issues for genes with incomplete 5' end). Genes on mitochondrial DNA have been included. Allele frequencies from the UK10K cohorts and genotypes of two Neanderthals have been added. Some resources have been updated, including the MutationTaster (I thank Dr. Dominik Seelow for kindly providing the scores), allele frequencies from the 1000 Genomes Project populations, ancestral alleles, dbSNP, ClinVar and InterPro. The presentation of the prediction scores has been improved by adding columns for the corresponding transcript/protein ids. PhyloP and PhastCons conservation scores based on hg19 have been replaced by the scores based on hg38. Some resources have been dropped due to various reasons, including SLR test statistic, UniSNP ids, allele frequencies from the ARIC cohorts and allele counts in COSMIC. dbNSFP_gene has also been completely rebuilt using the up-to-date resources. Residual Variation Intolerance Scores (RVIS) have been added. GO Slim terms have been replaced by full GO terms. Two branches of dbNSFP are now provided: dbNSFP3.0b1a suitable for academic use, which includes all the resources, and dbNSFP3.0b1c suitable for commercial use, which does not include VEST3 and CADD.

April 12, 2015: dbNSFP v3.0b2 is released. This update fixed the issues due to inconsistent mitochondrial reference sequences used by different resources. I thank Dr. Lishuang Shen at MEEI for helping solve the issues. For mitochondrial SNV, the pos (i.e. hg38) refers to the rCRS (GenBank: NC_012920) and hg19_pos refers to a YRI sequence (GenBank: AF347015). The ancestral allele of mitochondrial SNV now comes from the Reconstructed Sapiens Reference Sequence (RSRS, doi:10.1016/j.ajhg.2012.03.002). The affected content includes ancestral alleles, Neanderthal/Denisova genotypes and MutationTaster columns of the chrM file. The rankscores of MutationTaster have also been updated to reflect the update of its chrM scores. dbscSNV has been updated to v1.1, with hg38 positions lifted over from its hg19 positions added.

August 3, 2015: dbNSFP v3.0 is released. Three new functional prediction scores (DANN, fathmm-MKL and fitCons) and two conservation scores (phyloP20way_mammalian and phastCons20way_mammalian) have been added to dbNSFP v3.0a. All of these scores except DANN are also included in dbNSFP v3.0c. CADD scores have been updated to v1.3. I thank Dr. Xueqiu Jian and Kirill Prusov for suggestions on README files.

Columns updated: CADD_raw (dbNSFP v3.0a only), CADD_raw_rankscore (dbNSFP v3.0a only), CADD_phred (dbNSFP v3.0a only). 

New columns: DANN_score (dbNSFP v3.0a only), DANN_rankscore (dbNSFP v3.0a only), fathmm-MKL_coding_score, fathmm-MKL_coding_rankscore, fathmm-MKL_coding_pred, fathmm-MKL_coding_group, integrated_fitCons_score, integrated_fitCons_rankscore, integrated_confidence_value, GM12878_fitCons_score, GM12878_fitCons_rankscore, GM12878_confidence_value, H1-hESC_fitCons_score, H1-hESC_fitCons_rankscore, H1-hESC_confidence_value, HUVEC_fitCons_score, HUVEC_fitCons_rankscore, HUVEC_confidence_value.

August 13, 2015: In the variant files released on August 3, when there were multiple FATHMM scores/predictions for a SNV, only the (predicted) most deleterious one was presented, instead of all scores/predictions. This issue has been fixed.

November 24, 2015: dbNSFP v3.1 is released. Significant eQTLs from GTEx V6 have been added. dbSNP rs has been updated to build 144. Gene expression information (rpkm of RNAseq) of 53 tissues from GTEx V6 has been added to dbNSFP_gene. Three gene intolerance scores (RVIS based on ExAC r0.3, GDI and LoFtool) have been added to dbNSFP_gene. 

January 21, 2016: Polyphen-2 scores from the "c" branches of dbNSFP v3.x have been removed.  

March 20, 2016: dbNSFP v3.2 is released. Eigen score, Eigen PC score (doi: 10.1038/ng.3477) and GenoCanyon score (doi: 10.1038/srep10576) have been added. Allele frequencies of two commonly used subsets of ExAC data (nonTCGA and nonpsych) have been added. Mutation Assessor scores have been updated to release 3. PhyloP7way_vertebrate and PhastCons7way_vertebrate conservation scores have been updated to phyloP100way_vertebrate and PhastCons100way_vertebrate, respectively. Rankscores have been updated accordingly. Ancestral alleles have been updated based on Ensembl 84. dbSNP has been updated to build 146. Clinvar has been updated to 20160302, and review status (golden stars) was added. InterPro has been updated to v56. Gene name cross-links, IntAct, Uniprot, GWAS catalog, BioGRID, GO, ConsensusPathDB, mouse genes and zebra fish genes information for the dbNSFP_gene table have been updated.

March 30, 2016: dbNSFP v2.9.1 is released. MutationTaster scores have been updated to those based on Ensembl 69, i.e. the same version as in dbNSFP v3. Mutation Assessor has been updated to release 3.

November 30, 2016: dbNSFP v3.3 and v2.9.2 are released. M-CAP score (DOI: 10.1038/ng.3703) has been added. We thank Dr. Gill Bejerano for providing the score. Eigen and Eigen PC scores have been updated to v1.1. dbSNP has been updated to v147. clinvar has been updated to 20161101.

March 12, 2017: dbNSFP v3.4 and v2.9.3 are released. REVEL score (doi: 10.1016/j.ajhg.2016.08.016) and MutPred score (doi: 10.1093/bioinformatics/btp528) have been added. SORVA gene ranking scores (doi: 10.1101/103218) have been added to gene annotation.

August 6, 2017: dbNSFP v3.5 is released. Allele frequencies from the exomes and genomes of the Genome Aggregation Database (gnomAD) have been added. Interpro, dbSNP, clinvar, ancestral alleles, Altai Neanderthal genotypes, Denisova genotypes and GTEx eQTLs have been updated. dbNSFP_gene has been rebuilt with updated annotations. Other changes to dbNSFP_gene include: Interactions columns now show the gene list instead of the total number; GTEx gene expression annotations have been removed; LoF FDR p-value from RVIS has been added; Genome-wide haploinsufficiency score (GHIS) has been added; LoF and CNV intolerance/tolerance scores based on ExAC data have been added.

December 8, 2018: dbNSFP v4.0b1 is released for beta testing. The core set of nsSNVs and ssSNVs has been rebuilt based on Gencode 29/Ensembl 94 with human reference sequence hg38. Eight deleteriousness prediction scores (ALoFT, DEOGEN2, FATHMM-XF, MPC, MVP, PrimateAI, LINSIGHT, SIFT4G) have been added. Three conservation scores (phyloP17way_primate, phastCons17way_primate, bStatistic) have been added. Allele frequencies from the gnomAD controls subsets, eQTLs from the Geuvadis project, and genotypes of a Vindija33.19 Neanderthal have been added. Some resources have been updated, including VEST (We thank Dr. Karchin), CADD, M-CAP, ancestral alleles, dbSNP, ClinVar, GTEx and InterPro. The presentation of the prediction scores has been further improved by adding the correspondence to transcript/protein ids in a systematic way. APPRIS, GENCODE_basic, TSL and VEP_canonical have been added to facilitate the choice of appropriate transcripts. dbNSFP_gene has also been completely rebuilt using the up-to-date resources. HIPred, gene constraint scores from the gnomAD data, essential genes predictions based on CRISPR, gene-trap and gene networks have been added.

December 30, 2018: A bug causing an id mapping issue from Uniprot to Ensembl, which in turn caused increased missing rates of Polyphen2, MutationAssessor and DEOGEN2, has been found and fixed (We thank Dr. Daniele Raimondi).

February 20, 2019: Uniprot sprot_varsplic was included in the mapping from Uniprot to Ensembl. Fixed column title inconsistency between the README file and data file. (We thank Kevin Xin and Julius Jacobsen for pointing out the inconsistency.) dbMTS was added as an attached database. search_dbNSFP added support for searching dbMTS with option '-m'. 

May 3, 2019: dbNSFP v4.0 is released. HGVS c. and p. presentations from ANNOVAR, SnpEff and VEP have been added. search_dbNSFP now supports search based on HGVS c. and p. presentations. MedGen ID, OMIM ID and Orphanet ID from clinvar have been added.

December 5, 2019: A minor bug is fixed in dbNSFP v4.0. In the previous release the content of the following columns was compressed, i.e. if annotations for all transcripts were identical, only one annotation was presented: genename, cds_strand, refcodon, codonpos, codon_degeneracy, FATHMM_score, FATHMM_pred, Interpro_domain. In this release those columns are decompressed, i.e. they have the same number of annotations as the number of transcripts. A Java-based graphical user interface (GUI) search program (search_dbNSFP40a.jar or search_dbNSFP40c.jar) has been added. Users can double-click the jar file to launch the GUI (it also supports the command line; please check the search_dbNSFP readme pdf for details).
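
For users who still hold files from the earlier release, the relationship between the compressed and decompressed forms can be sketched as follows. This is only an illustration under stated assumptions: the helper name decompress is hypothetical, and the ";" separator between per-transcript annotations is an assumption that this changelog does not state (check the README of the release in use).

```python
def decompress(field, n_transcripts, sep=";"):
    """Expand a compressed column value to one annotation per transcript.

    Assumption: per-transcript annotations are joined by `sep`; the ";"
    default is not taken from this changelog.
    """
    values = field.split(sep)
    if len(values) == 1:
        # Compressed form: one annotation shared by every transcript.
        return values * n_transcripts
    if len(values) != n_transcripts:
        raise ValueError(
            f"expected 1 or {n_transcripts} annotations, got {len(values)}"
        )
    # Already decompressed: one annotation per transcript.
    return values


expanded = decompress("+", 3)
unchanged = decompress("GENE1;GENE2", 2)
```

A value that is neither a single annotation nor one per transcript raises an error rather than being padded, so malformed rows are not silently accepted.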

May 15, 2020:  A minor bug is fixed in dbNSFP v4.0. In the previous release, the column Primate_AI_pred was not 100% correct. We thank Alex Kouris for reporting this issue.

June 16, 2020: dbNSFP v4.1 is released. BayesDel (https://doi.org/10.1002/humu.23158), ClinPred (https://doi.org/10.1016/j.ajhg.2018.08.005) and LIST-S2 (https://doi.org/10.1093/nar/gkaa288) scores have been added. CADD has been updated to v1.6, CADD score based on hg19 model has been added. Clinvar, GTEx and gnomAD genomes have been updated. HPO terms have been added to the dbNSFP_gene. search_dbNSFP programs now support searching SpliceAI as an attached database.

Jan 27, 2021: The command-line-only versions of the search programs for v4.1a and v4.1c were added.

Feb 10, 2021: A bug has been fixed. In the previous release, the gnomAD_pLI, gnomAD_pRec and gnomAD_pNull scores in dbNSFP4.1_gene.gz and dbNSFP4.1_gene.complete.gz did not always correspond to the canonical transcripts of the genes.

March 12, 2021: A bug has been fixed. In the previous release, some ALoFT scores/information were missing in dbNSFP.

April 6, 2021: dbNSFP v4.2 is released. MetaRNN scores have been added. Allele frequencies of gnomAD exome have been updated to r2.1.1. Allele Frequencies of gnomAD genome have been updated to v3.1. dbSNP has been updated to 154. clinvar has been updated to 20210131.

February 18, 2022: dbNSFP v4.3 is released. REVEL scores have been updated with transcript ids, i.e., the scores are now transcript-specific. Genotypes of Chagyrskaya neandertals have been added. dbSNP has been updated to b155. clinvar has been updated to 20220122.

May 6, 2023: dbNSFP v4.4 is released. gMVP and VARITY scores have been added. Allele frequencies of ALFA (Allele Frequency Aggregator) have been added. dbSNP has been updated to b156. clinvar has been updated to 20230430. phyloP30way_mammalian has been replaced by phyloP470way_mammalian. phastCons30way_mammalian has been replaced by phastCons470way_mammalian. A bug in MutPred scores (not all SNVs causing the same AA change have scores) has been fixed. 

November 2, 2023: dbNSFP v4.5 is released. ClinVar has been updated to 20231028. ESM1b, EVE and AlphaMissense scores have been added. 

February 18, 2024: dbNSFP v4.6 is released. ClinVar has been updated to 20240215. GTEx V8 splicing QTLs (sQTLs) have been added. eQTLs from eQTLGen phase I have been added. There was a bug in v4.5 causing a large proportion of ESM1b scores to be misaligned. It has been fixed.

March 3, 2024: dbNSFP v4.7 is released. CADD has been updated to v1.7. Allele frequencies of gnomAD exomes and genomes have been updated to v4.0.0. A bug in v4.6 that caused eQTLGen eQTLs of some tissues to be missing has been fixed.