STR-validator
News
19.07.2023: STR-validator version 2.4.1 released on CRAN.
09.06.2023: Added a known issue when trying to create plots using R 4.3.0.
24.09.2022: STR-validator version 2.4.0 released on CRAN.
Introduction
The program STR-validator is written and maintained by Oskar Hansson at Oslo University Hospital (OUS), Section of Forensic Genetics. The work has received external funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 285487 (EUROFORGEN-NoE), and maintenance is performed as a part of my position at OUS.
The R package strvalidator is free and open source software developed mainly for internal validation of forensic STR DNA typing kits. However, it is equally suited for validation of other methods and instruments, or for process control. Its graphical user interface, which I refer to as STR-validator, makes it very easy to analyse data exported from e.g. GeneMapper® software, without any knowledge of R commands. It provides convenient functions to import, view, edit, and export data. After an analysis is completed, the results, generated plots, heat-maps, and data can be saved in a project for easy access. Currently, analysis modules for stutter, balance, drop-out, concordance, mixtures, precision, pull-up, result types, and analytical threshold are available. STR-validator can greatly increase the speed of validation by reducing the time and effort needed for analysis of the validation data. It allows easy exploration of the characteristics of DNA typing kits according to ENFSI and SWGDAM recommendations. Another area of use is monitoring of the contamination level, which is essential for estimating the probability of drop-in. In this way STR-validator facilitates the implementation of probabilistic interpretation of DNA results.
Tutorials, installation instructions, and other material are available for download at the bottom of the page.
Please report errors in the manual and tutorials to the package maintainer (link).
strvalidator is currently developed on a Windows 10 system, and is optimised to use the gWidgets2RGtk2 and gWidgets2tcltk packages for its graphical user interface. The former requires the user to install the GTK+ library, which may be tricky under restricted permissions. The latter uses the native R package tcltk and should work wherever there is R.
Who uses STR-validator?
Mostly forensic genetics laboratories. However, user support questions have revealed that STR-validator is used in some university forensic classes, and within other fields such as human and horse parentage testing.
The original CRAN server does not provide download statistics. However, RStudio uses its own CRAN mirror by default and makes the download logs available. These can be queried through a web API using cranlogs. The RStudio CRAN mirror is popular, but it is not the only one. The actual number of downloads over all CRAN mirrors is unknown, which means that the counts shown below are likely an underestimate. The code to generate the plot is available in this post.
RStudio CRAN mirror download statistics. The actual number of downloads over all CRAN mirrors is unknown, which means that the counts are likely an underestimate.
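As a rough illustration of how the download logs can be queried, here is a minimal sketch in R. It is not the code from the post mentioned above. It assumes that the cranlogs package is installed and that internet access is available. The helper names monthly_totals and fetch_daily_downloads, and the date range, are made up for this example.

```r
# Sketch only: assumes the 'cranlogs' package is installed and that
# internet access is available when fetch_daily_downloads() is called.

# Sum daily download counts into monthly totals.
# 'daily' must have a Date column 'date' and a numeric column 'count',
# which is the layout returned by cranlogs::cran_downloads().
monthly_totals <- function(daily) {
  daily$month <- format(daily$date, "%Y-%m")
  aggregate(count ~ month, data = daily, FUN = sum)
}

# Hypothetical wrapper: query the RStudio CRAN mirror logs for one package.
fetch_daily_downloads <- function(package, from, to) {
  cranlogs::cran_downloads(packages = package, from = from, to = to)
}

# Only contact the server in an interactive session.
if (interactive()) {
  # The date range is an arbitrary example.
  daily <- fetch_daily_downloads("strvalidator", "2023-01-01", "2023-06-30")
  monthly <- monthly_totals(daily)
  barplot(monthly$count, names.arg = monthly$month, las = 2,
          ylab = "Downloads per month")
}
```

The aggregation is kept separate from the network call so that it can be checked offline with any data frame that has date and count columns.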
World map showing country of residence for STR-validator users requesting support, workshops where STR-validator was introduced, and dedicated STR-validator workshops.
Workshops and courses involving STR-validator
2022
"Validation - Experimental Design and Analysis Using STR-validator", Washington (USA) 30 August
2018
"Efficient Validation Using STR-validator", Araraquara (Brazil) 13 September
2017
"Efficient Validation Using STR-Validator", Seattle (USA) 1 October
2016
"Statistical methods in forensic genetics", Bologna (Italy) 6-7 June
"Analysis of Internal Validation Datasets Using Open-Source Software STR-validator" by Sarah Riman, Erica L. Romsos, Lisa Borsuk, and Peter M. Vallone. Gaithersburg (USA) 9 November
2015
"Statistical methods in forensic genetics Train the trainers Workshop", Copenhagen (Denmark) 20-23 April
"Mixtures, complex DNA profiles, and interpretation with the LRmix Studio software", Krakow (Poland) 1 September
2014
"Statistical methods in forensic genetics Train the trainers Workshop", Copenhagen (Denmark) 20-23 May
Papers citing STR-validator
2021
Evaluation of tissues applicable for identification of decomposed corpses by DNA analysis: comparison of tissues from toenail parts with costal cartilages (in Korean), Police Science Research, Vol. 21, No. 3, pp. 97-114, September 2021
2019
The importance of forensic storage support: DNA quality from 11-year-old saliva on FTA cards, doi:10.1007/s00414-019-02146-6
Estimation of genotyping errors of STR markers in dogs and wolves, doi:10.1016/j.fsigss.2019.11.014
2018
Socio-technical disagreements as ethical fora: Parabon NanoLab’s forensic DNA Snapshot™ service at the intersection of discourses around robust science, technology validation, and commerce, doi:10.1057/s41292-018-0138-8
2017
Investigating the effects of different library preparation protocols on STR sequencing, doi:10.1016/j.fsigss.2017.09.155
Characterisation of artefacts and drop-in events using STR-validator, doi:10.1016/j.fsigen.2017.04.015
2016
Characterization of degradation and heterozygote balance by simulation of the forensic DNA analysis process, doi:10.1007/s00414-016-1453-x
The Development and Release of a Collection of Computational Tools and a Large-Scale Empirical Data Set for Validation: The PROVEDIt Initiative, ISHI oral abstract
2015
Systematic study on the analytical parameters relevant to achieve reliable STR profiles, as assessed in a multicentre data set, doi:10.1016/j.fsigss.2015.09.224
Genotyping and Interpretation of STR-DNA: Low-template, Mixtures and Database Matches - twenty years of research and development, doi:10.1016/j.fsigen.2015.03.014
Probabilistic Characterisation of Baseline Noise in STR Profiles, doi:10.1016/j.fsigen.2015.07.001
Exploring the Impacts of Ordinary Laboratory Alterations During Forensic DNA Processing on Peak Height Variation, Thresholds, and Probability of Dropout, doi:10.1111/1556-4029.12899
2014
Contribute to STR-validator
Contributions to the strvalidator R package or the STR-validator community are more than welcome. Do not hesitate to contact the developer to:
Contribute with improvements or new functions to the strvalidator package.
Contribute with translations of course material, manuals or tutorials.
Collaborate to implement new functions.
Translate the graphical user interface. Language support was added in version 2.3.0. A translation guide is available in the download section and here. Translations will be released with future updates of the strvalidator package. The names of translators will be listed in the main graphical user interface (STR-validator). Contact the developer for further instructions.
English (default) by Oskar Hansson
Italian translation provided by Massimiliano Stabile
Spanish translation provided by Lourdes Prieto
French translation in progress by Vanessa Duchamp
Code validation
The strvalidator package uses the 'testthat' package to test important functions. Each test compares the result of a calculation with an expected hard-coded result, often calculated using e.g. spreadsheet software. Tests are run automatically when a new version of strvalidator is built. If a test fails, the new version cannot be created. New functions must have tests written for them. If a bug is found, a test is written to check that the bug is fixed. Currently the following functions are specifically tested:
addSize
calculateAllT (indirectly tests calculateT)
calculateConcordance
calculateDropout
calculateHb
calculateHeight
calculateLb
calculateMixture
calculateOL
calculateStatistics
calculateStutter
filterProfile
heightToPeak
trim
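The testing approach described above can be sketched as follows. This is a minimal illustration and is not taken from the strvalidator test suite. The function peak_height_ratio is a hypothetical helper invented for this example, and the expected values were worked out by hand. It assumes the testthat package is installed.

```r
# Sketch only: assumes the 'testthat' package is installed.
library(testthat)

# Hypothetical helper (NOT part of strvalidator):
# ratio of the lower peak height to the higher peak height.
peak_height_ratio <- function(height1, height2) {
  pmin(height1, height2) / pmax(height1, height2)
}

test_that("peak_height_ratio matches values worked out by hand", {
  # Expected results are hard-coded, e.g. 400 / 500 = 0.8.
  expect_equal(peak_height_ratio(500, 400), 0.8)
  # The helper is vectorised: 100 / 200 = 0.5 and 300 / 300 = 1.
  expect_equal(peak_height_ratio(c(100, 300), c(200, 300)), c(0.5, 1))
})
```

If the calculation is changed in a way that alters the result, expect_equal fails and the problem is caught before release.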
Known problems
Known problems and bugs with the current version 2.4.1 are listed below. For a complete list of current and past reported bugs and issues, see GitHub (direct link).
Estimation of analytical thresholds (AT) - using any of the extended regular expression metacharacters . \ | ( ) [ { ^ $ * + ? in sample names may prevent masking of data using a reference dataset. Avoiding these characters in sample names will keep you out of trouble. For more information see this Facebook post by Alexander at the STR-validator community.
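The general mechanism behind this kind of problem can be illustrated in base R. This is a sketch under the assumption that a sample name ends up being used as a regular expression pattern. It does not reproduce the strvalidator code. The sample names and the helper has_metachar are hypothetical.

```r
# Hypothetical sample names, for illustration only.
sample_names <- c("PC1", "PC+1", "Ladder.1", "NC(2)")
reference_name <- "PC+1"

# As a regular expression, "+" means "one or more of the preceding character",
# so the pattern "PC+1" matches "PC1" but not the literal name "PC+1".
regex_match <- grepl(reference_name, sample_names)

# With fixed = TRUE the pattern is matched literally.
literal_match <- grepl(reference_name, sample_names, fixed = TRUE)

# Flag names containing any of the metacharacters listed above.
metachars <- c(".", "\\", "|", "(", ")", "[", "{", "^", "$", "*", "+", "?")
has_metachar <- function(names) {
  vapply(strsplit(names, ""),
         function(chars) any(chars %in% metachars),
         logical(1))
}

print(data.frame(name = sample_names,
                 regex = regex_match,
                 literal = literal_match,
                 flagged = has_metachar(sample_names)))
```

A check like has_metachar can be run on the sample names before analysis to find names that are worth renaming.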
Bugs may be fixed in the current development version on GitHub. If you want to try the latest development version, it can be installed by typing this into the R console:
devtools::install_github("oskarhansson/strvalidator")
You may need to install devtools to make it work:
install.packages("devtools", dependencies=TRUE)
Note that the development version is not stable and some changes may be reverted before the next stable version is released to CRAN.
User community and support
Discuss STR-validator in the Facebook group: https://www.facebook.com/groups/strvalidator/
Get news, tips, and other information at the Facebook page: https://www.facebook.com/STRvalidator
Information will occasionally be sent out by e-mail. Contact me if you wish to be added to the mailing list (my e-mail address is found on the STR-validator CRAN page).
Report bugs
Please report bugs at GitHub (direct link). Remember to provide a reproducible example if possible.
Links
The source code is hosted at GitHub: https://github.com/OskarHansson/strvalidator
Link to strvalidator on CRAN: https://cran.r-project.org/web/packages/strvalidator/index.html
For potentially better performance see: https://mran.microsoft.com/ and https://github.com/oracle/fastr
The online book Open Forensic Science in R where Chapter 2 Validation of DNA Interpretation Systems introduces strvalidator using command line functions.
YouTube channel: https://www.youtube.com/channel/UCs7TxzK21OKvWebQygxAHHA
Video Tutorials using STR-validator version 2.0.0
Estimation of analytical thresholds: https://youtu.be/8bTZbO2zrc4
Estimation of allele sizing precision: https://youtu.be/FTYX9ZRYg0g
Analysis of stutter ratios: https://youtu.be/N2qlc1YziTo
Analysis of balance: https://youtu.be/E1-KpvZ9CJY
Estimation of stochastic threshold: https://youtu.be/UMmRbX3q6wg
Video Tutorials using STR-validator version 1.2.0
This video shows how to analyse heterozygous balance, inter-locus balance, stutter ratio, and sizing precision: http://youtu.be/aUDlDI744ZI
PhD material
STR-validator was developed as part of my PhD project. Download the thesis Development of computer software to characterise and simulate molecular biology processes used in forensic DNA profiling assays (ISBN 978-82-8377-319-4) or view the short popular scientific presentation and the trial lecture Epigenetics in forensics.
Work in progress
General, flexible plotting function using plotly, which will be better suited to MPS data
Routine maintenance
Version history - main features [release date]
2.3.0 - Language support. General function for summary statistics [10.07.2020]
2.2.0 - Support for tcltk, which should work in restricted IT environments. [22.03.2019]
2.1.0 - Calculate ST for all models at once. Plot cumulative distributions of multiple groups. [25.08.2018]
2.0.0 - Migration to gWidgets2. New audit trail. Remember last used paths. [12.08.2017]
1.9.0 - Minor improvements and corrections. New function to add marker order to a dataset. [08.03.2017]
1.8.0 - Several functions rewritten for faster analysis. Calculation of profile proportion in height metrics, and many other improvements. [04.10.2016]
1.7.0 - New functions for drop-in analyses. Support for quality sensors. Numerous minor improvements. [05.07.2016]
1.6.0 - Automatic calculation of average peak height when analysing drop-out and balance. New functions for efficient profile balance, and marker ratio calculations. [19.01.2016]
1.5.2 - Calculation for AT6 corrected (use standard error of the regression instead of standard error of the intercept). [31.08.2015]
1.5.1 - (not on CRAN) Fixes some bugs in the AT analysis module. [27.06.2015]
1.5.0 - Estimate analytical thresholds. Improved import (autotrim, autoslim). [10.06.2015]
1.4.0 - Analyse pull-ups, generate EPGs, changes to kit file to handle multiple sex markers. [07.01.2015]
1.3.1 - Fixed window losing focus (hidden windows). Simpler installation. Bug fixes.
1.3.0 - Integrated project manager. New module for analysis of concordance and mixtures. Bug fixes. [13.08.2014]
1.2.0 - Added Fusion and GlobalFiler kit. New module for analysis of capillary balance. Bug fixes.
1.1.0 - Compatibility update (testthat 0.8) and multiple improvements.
1.0.0 - Major update with important bug fixes. New kit file structure and several new features.
0.3.0 - Option to save GUI state
0.2.0 - Graphical user interface (GUI)
0.1.0 - Initial release on CRAN
Publications
Hansson O, Gill P, Egeland T. STR-validator: An open source platform for validation and process control. Forensic Science International: Genetics. 2014;13:154-166.
Hansson O, Gill P. Free open source software for internal validation of forensic STR typing kits. Forensic Science International: Genetics Supplement Series. 2013;4(1):e300–e301.
Posters
"Free open source software for internal validation of forensic STR typing kits" presented at the 25th World Congress of the International Society for Forensic Genetics, 2 – 7 September 2013, Melbourne, Australia. [Poster_ISFG2013.pdf]
Downloads
Here is a selection of downloads (a complete list is available at the bottom of the web page and under the respective workshop sub pages).
Official manuals and tutorials
Language support instructions for STR-validator (written for STR-validator version 2.3.0): [language_support_in_strvalidator.pdf]
Installation instructions for STR-validator (written for STR-validator version 1.6.0): [strvalidator_installation.pdf]
The STR-validator manual (written for STR-validator version 1.3.0): [strvalidator_manual.pdf] (Italian translation provided by Massimiliano Stabile and Stella Eugenia Cirillo)
Tutorial covering the basics of the STR-validator package (written for STR-validator version 2.3.0): [strvalidator_tutorial.pdf] (Italian translation, written for STR-validator version 2.0.0, provided by Massimiliano Stabile and Stella Eugenia Cirillo)
Short instruction for how to estimate analytical thresholds (written for STR-validator version 1.7.0): [estimate_analytical_thresholds.pdf]
Exercises and presentations
Refer to the workshop pages for the most up-to-date material.
Other STR-validator material
Presentation, software download instructions, and files for exercises from the NIST workshop "Analysis of Internal Validation Datasets Using Open-Source Software STR-validator" by Sarah Riman at Forensics@NIST 2016 (Gaithersburg, MD, USA) [source: http://strbase.nist.gov/NISTpub.htm]
The online book Open Forensic Science in R where Chapter 2 Validation of DNA Interpretation Systems introduces strvalidator using command line functions.
Screenshots
STR-validator, the graphical user interface.
Kits marker range comparison.
ESX17 EPG generated from allele and peak height information.
Blocked data ranges in a positive control sample prior to estimation of AT.
Linear regression of serial dilutions to estimate AT.
Histogram of peak heights in PCR negative control samples.
Dropout modelling.
Dropout modelling.
Dropout event dotplot by locus.
Heatmap of allele and locus dropout.
Empirical cumulative distribution of peak heights for single heterozygous alleles and homozygous peaks.
Heterozygous peak balance by mean peak height plotted by locus.
Heterozygous peak balance by allele repeat difference plotted by locus.
Inter locus balance by mean peak height plotted by locus.
Contaminations in negative extraction and PCR controls.
Result type analysis of low template samples.
Size precision analysis of allelic ladders by allele and locus.
Peak height distribution of drop-in contamination.
Fragment size distribution of drop-in contamination.