CoNet is an ensemble network analysis method that combines several similarity and dissimilarity measures into a single tool.
Network analysis is commonly used in microbiome studies to identify keystone species and clusters of co-occurring or co-excluding species.
A recent paper by Shah et al. (2018) explains the default behaviour of BLAST's -max_target_seqs setting.
Non-destructive COI metabarcoding refers to a sample preparation method in which the individual or bulk community sample is not homogenized prior to DNA extraction or direct PCR.
A new preprint discussing the status of COI records in GenBank shows growth over time, the onset of an increasing proportion of insufficiently identified records, and uneven levels of metadata annotation.
How to deal with variable sequence library sizes boils down to which objective is being addressed: 1) normalization prior to alpha/beta diversity analyses, 2) normalization prior to differential abundance analysis (DAA).
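For objective 1, two common options are rarefying (randomly subsampling each sample, without replacement, down to the same number of reads) and converting counts to proportions. The Python sketch below illustrates both for a single sample stored as a dict of taxon counts. The function names are hypothetical, and the sketch does not argue for one approach over the other.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample one sample's taxon counts to `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the library size of this sample")
    # Expand the counts into one entry per read, labelled by taxon
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # seed makes the subsample reproducible
    picked = rng.sample(pool, depth)
    return dict(Counter(picked))


def relative_abundance(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}
```

Expanding every read into a list is memory-hungry for deep libraries; it is done here only to keep the subsampling step easy to read.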
Bioinformatic processing of high-throughput sequencing data takes batches of large files as input and creates large batches of output files at nearly every processing step, which can quickly consume a lot of disk space!
So we've all been using operational taxonomic units (OTUs) since the 2000s, but now everyone is talking about exact sequence variants (ESVs).
Renaming batches of poorly named files can be made easier with command-line tools, but first we need to clear up the confusion between the Linux rename utility and the Perl rename script.
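The Perl rename script applies a Perl substitution expression such as `s/ /_/g` to each filename. The sketch below is a Python analogue of that idea, not either rename tool: it builds a dry-run plan of old/new names from a regular expression and refuses any plan that would overwrite a file. `plan_renames` and `apply_renames` are hypothetical helper names.

```python
import os
import re


def plan_renames(filenames, pattern, replacement):
    """Dry run: return (old, new) pairs for every name the regex would change."""
    regex = re.compile(pattern)
    plan = []
    for name in filenames:
        new = regex.sub(replacement, name)  # replaces all matches, like s///g
        if new != name:
            plan.append((name, new))
    targets = [new for _, new in plan]
    # Refuse duplicate targets, or targets that already exist in the listing
    if len(targets) != len(set(targets)) or set(targets) & set(filenames):
        raise ValueError("rename plan would overwrite or merge files")
    return plan


def apply_renames(plan, directory="."):
    """Carry out a plan produced by plan_renames inside `directory`."""
    for old, new in plan:
        os.rename(os.path.join(directory, old), os.path.join(directory, new))
```

Reviewing the output of `plan_renames` before calling `apply_renames` mirrors the habit of doing a dry run before a batch rename.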
A comparison of two popular taxonomic assignment methods: the top BLAST hit and the RDP classifier.
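As an illustration of the top BLAST hit approach, here is a Python sketch that reads BLAST tabular output (`-outfmt 6`) and keeps, for each query, the subject with the highest bit score instead of relying on line order. It assumes the default 12-column layout, in which the bit score is the last column; the function name is hypothetical. The RDP classifier's naive Bayesian approach is not shown.

```python
def top_blast_hits(lines):
    """Map each query ID to (subject ID, bit score) of its best-scoring hit."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        bitscore = float(fields[11])  # column 12 in the default outfmt 6 layout
        # On ties, the first hit seen is kept
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```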
You don't want to work with abundance data, though, so let's quickly convert it into a presence-absence matrix.
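In R, vegan's `decostand(x, method = "pa")` performs this conversion. The underlying logic is simple: any count above zero becomes 1, and everything else becomes 0. A Python sketch, assuming a samples-by-taxa matrix stored as a list of lists of counts (the function name is hypothetical):

```python
def to_presence_absence(matrix):
    """Turn a samples-by-taxa count matrix into a 0/1 presence-absence matrix."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]
```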
When .bcl files were generated with Illumina RTA < 1.18.54, the older bcl2fastq v1.8.4 needs to be used to convert base calls to fastq files. Since Illumina does not provide a package that can be installed with apt-get on Ubuntu, I've compiled the steps I had to take to get this older software up and running.
The free-to-use community edition of MEGAN6 can parse BLAST output and apply varying stringency criteria to help separate good taxonomic assignments from not-so-good ones. It is a nice alternative to the widely used top BLAST hit approach to taxonomic assignment.
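MEGAN assigns reads with a lowest common ancestor (LCA) approach whose stringency can be tuned with parameters such as a minimum bit score and a top-percent cutoff. The Python sketch below is a heavily simplified illustration of that idea, not MEGAN's implementation: hits below `min_score` are dropped, only hits within `top_percent` of the best remaining bit score are kept, and the assignment is the deepest rank shared by all of their lineages. The default values, function names, and lineage format (a root-first list of names) are assumptions for illustration.

```python
def lca(lineages):
    """Return the longest root-first prefix shared by all lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # walk down rank by rank
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def assign_taxon(hits, min_score=50.0, top_percent=10.0):
    """hits: list of (bit score, lineage). Returns the LCA of the retained hits."""
    good = [(score, lineage) for score, lineage in hits if score >= min_score]
    if not good:
        return []  # nothing passes the minimum score: leave unassigned
    best = max(score for score, _ in good)
    cutoff = best * (1 - top_percent / 100)
    return lca([lineage for score, lineage in good if score >= cutoff])
```

Raising `top_percent` or lowering `min_score` lets more hits into the LCA, which usually pushes the assignment to a higher, more conservative rank.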
BioPerl offers modules to easily allow taxonomic information to be mined from GenBank, but what happens when taxids are updated/deleted/merged by GenBank staff?
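One way to handle this, independent of BioPerl, is to consult the NCBI taxonomy dump (taxdump), which includes `merged.dmp` (an old taxid and the taxid it was merged into) and `delnodes.dmp` (deleted taxids), both with `\t|\t`-separated fields. The Python sketch below parses lines in those formats and resolves a taxid before it is looked up. The helper names are hypothetical, and reading the files from disk is left out.

```python
def load_merged(lines):
    """Parse merged.dmp lines ('old\\t|\\tnew\\t|') into an {old: new} dict."""
    merged = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[fields[0]] = fields[1]
    return merged


def load_deleted(lines):
    """Parse delnodes.dmp lines ('taxid\\t|') into a set of deleted taxids."""
    return {line.split("|")[0].strip() for line in lines if line.strip()}


def resolve_taxid(taxid, merged, deleted):
    """Follow merges to the current taxid; return None if it was deleted."""
    seen = set()
    while taxid in merged and taxid not in seen:  # `seen` guards against loops
        seen.add(taxid)
        taxid = merged[taxid]
    return None if taxid in deleted else taxid
```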
With the increasing use of next-generation sequencing comes the problem of how to efficiently process massive fasta files for BLAST searches.
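A common strategy is to split one large fasta file into smaller chunks that can be BLASTed in parallel, for example as separate jobs on a cluster. This Python sketch, with hypothetical function names, parses fasta records (joining wrapped sequence lines) and distributes them round-robin into `n_chunks` groups; writing each group to its own file is left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)  # emit the final record


def split_records(records, n_chunks):
    """Distribute records round-robin into n_chunks lists of similar size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks
```

Round-robin assignment balances the number of records per chunk, not total sequence length; if lengths vary widely, some chunks may still take longer to search.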
Just as the strategy used for single template PCRs can be optimized to account for problems such as GC-content or the presence of PCR inhibitors, so too can mixed template PCRs be optimized to account for problems like the generation of PCR artefacts.
This perl one-liner will convert a fastq file into a fasta file.
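For readers who prefer Python, the same conversion can be sketched as follows. This is not the Perl one-liner itself; it assumes standard four-line fastq records (header, sequence, '+', quality) with unwrapped sequences. Each '@' header becomes a '>' header followed by the sequence, and the '+' and quality lines are dropped. The function name is hypothetical.

```python
def fastq_to_fasta(lines):
    """Convert 4-line fastq records into a list of fasta lines."""
    out = []
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line
            next(it)  # quality line, discarded
        except StopIteration:
            raise ValueError("truncated fastq record") from None
        out.append(">" + header[1:])
        out.append(seq)
    return out
```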
There are many resources available, but they are scattered across the internet. Here is a list of useful resources related to the VEGAN and ECODIST packages in [R].
There does not appear to be a built-in function in VEGAN for creating scree plots. Since scree plots are useful for choosing how many dimensions to use with the metaMDS function, a reference to a sample function is provided.
When the nmds function in the ECODIST package is used to evaluate more than two dimensions, the plot function shows all possible pairwise combinations of dimensions by default. In this example, only two specific dimensions are chosen for plotting.
Normally, both symbols and text are used in a plot legend. In this example, some legend entries are simply colour-coded without a corresponding symbol.