Annotation Pipeline Change Log

Pipeline v.5.1 (10/03/2022)


Reference Database and Version Information

Pipeline Code Changes

Slight changes were made in each subversion to fix bugs, add additional error checking, or make minor improvements.

## [5.1.11] - 2022-10-03

### Fixed

## [5.1.10] - 2022-09-28

### Fixed

## [5.1.9] - 2022-08-23

### Fixed

## [5.1.8] - 2022-06-16

### Changed

## [5.1.7] - 2022-06-01

### Added

## [5.1.6] - 2022-05-26

### Changed

## [5.1.5] - 2022-03-12

### Changed

## [5.1.4] - 2022-02-25

### Changed

## [5.1.3] - 2022-02-12

### Changed

## [5.1.2] - 2022-02-10

### Changed

## [5.1.1] - 2022-01-04

### Changed

## [5.1.0] - 2021-12-02

### Changed

Pipeline v.5.0

Starting from Pipeline v.5.0, isolate genomes and metagenomes are processed using the same code base. Moreover, additional functions (Cath-Funfam, SuperFamily and SMART) have been added.


Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8269246/


Highlight of IMG Annotation Pipeline v.5.0: https://img.jgi.doe.gov/docs/pipelineV5/


Reference Database and Version Information

Pipeline Code Changes

Slight changes were made in each subversion to fix bugs, add additional error checking, or make minor improvements.


## [5.0.25] - 2021-09-10

### Changed

- If Prodigal and GeneMark predict the same gene, but it gets shortened in two different places, only the shortened GeneMark gene is kept.


## [5.0.24] - 2021-08-20

### Changed

- The filtering script for the LAST results now makes an additional run over the output to check which subjects are actually needed in the MD5 lookup.


## [5.0.23] - 2021-02-02

### Added

- The GFF stats script also creates an additional output file in JSON format now.
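
As an illustration only, the kind of summary such a JSON file could hold can be sketched as follows. The actual contents of the pipeline's JSON output are not described here, so the per-feature-type counts and the function names `gff_feature_counts` and `stats_as_json` are assumptions.

```python
import json
from collections import Counter


def gff_feature_counts(gff_lines):
    """Count features per type (GFF column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # a GFF record has nine tab-separated columns
            counts[fields[2]] += 1
    return dict(counts)


def stats_as_json(gff_lines):
    """Serialize the counts as JSON (hypothetical layout, not the pipeline's real one)."""
    return json.dumps(gff_feature_counts(gff_lines), indent=2, sort_keys=True)
```
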


## [5.0.22] - 2021-01-21

### Fixed

- Fixed bug that caused the Pfam e-value in the *_pfam.gff to always get set to 13.


### Added

- The cmsearch command for the Rfam step can now also contain a -Z argument that gets read from the annotation config.


## [5.0.21] - 2020-12-11

### Changed

- Moved the /dev/null redirect out of the hmmsearch command variables, since it caused JAWS to fail.


## [5.0.20] - 2020-10-22

### Changed

- Set the default max overlap ratio back to 0.1 for cath-funfam, cog, smart and superfamily.

- New logs no longer get deleted when files from a potentially failed previous run are removed.


## [5.0.19] - 2020-07-28

### Added

- Added print out of non-IUPAC characters if any get detected in the pre-QC step.


## [5.0.18] - 2020-07-17

### Added

- Added number of genes per 1M bp check to the post-QC step.
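
The arithmetic behind such a check is a simple normalization of the gene count to one million base pairs. The sketch below is a minimal illustration. The function names and the threshold parameters are hypothetical, and the pipeline's actual limits are not stated in this log.

```python
def genes_per_megabase(gene_count: int, total_bp: int) -> float:
    """Normalize a gene count to genes per 1,000,000 bp of sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return gene_count * 1_000_000 / total_bp


def gene_density_ok(gene_count: int, total_bp: int,
                    min_density: float, max_density: float) -> bool:
    """Return True if the density lies within caller-supplied (assumed) bounds."""
    return min_density <= genes_per_megabase(gene_count, total_bp) <= max_density
```

For example, 4,500 genes on 5,000,000 bp correspond to 900 genes per 1M bp.
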


## [5.0.17] - 2020-07-09

### Changed

- Increased the default minimum contig length from 150 to 200 bp.


## [5.0.16] - 2020-07-06

### Added

- Added check to make sure that the contig sequences only contain IUPAC letters before replacing all non-ACGTN characters with Ns.
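
A minimal sketch of this two-stage check, under stated assumptions: the accepted alphabet below is the standard IUPAC nucleotide code set (including U), input is upper-cased before checking, and the function names are hypothetical. The pipeline's own alphabet and case handling may differ.

```python
import re

# Standard IUPAC nucleotide codes; whether the pipeline accepts U or gap
# symbols is an assumption of this sketch.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def non_iupac_characters(seq: str) -> set[str]:
    """Return the set of characters in seq that are not IUPAC nucleotide codes."""
    return set(seq.upper()) - IUPAC_NUCLEOTIDES


def mask_ambiguous(seq: str) -> str:
    """Reject non-IUPAC input, then replace every non-ACGTN character with N."""
    bad = non_iupac_characters(seq)
    if bad:
        raise ValueError(f"non-IUPAC characters found: {sorted(bad)}")
    return re.sub(r"[^ACGTN]", "N", seq.upper())
```

Running the validation first means that a typo or a stray protein sequence fails loudly instead of being silently converted into a run of Ns.
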


## [5.0.15] - 2020-04-03

### Changed

- The poly-N stretch length indicating a gap of unknown length can now be set via the config file. A 0 turns this feature off.


## [5.0.14] - 2020-03-17

### Added

- Commands to remove tmp and results files at the beginning of every step that uses GNU parallel (in case there was a previous run that got killed or failed due to non-pipeline related reasons).


## [5.0.13] - 2020-03-06

### Changed

- Aborting the pipeline when poly-N stretches of length 100 are encountered can now be turned on and off via the config file.


## [5.0.12] - 2020-02-28

### Added

- The pre-QC step now removes leading/trailing Ns from the contigs' ends and checks for N-stretches of exactly 100 bp, which stand for gaps of unknown length. If such a gap exists the pipeline aborts processing and produces a GAPS_OF_UNKNOWN_LENGTH.txt file that lists the contig names and start positions of these gaps of unknown length.
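
The trimming and gap detection described above can be sketched roughly as follows. This is an illustration, not the pipeline's code: the function names, the 1-based positions reported relative to the trimmed contig, and the tab-separated report layout are all assumptions.

```python
import re

GAP_LENGTH = 100  # N-run length that marks a gap of unknown length


def trim_terminal_ns(seq: str) -> str:
    """Remove leading and trailing Ns from a contig sequence."""
    return seq.strip("Nn")


def find_unknown_length_gaps(seq: str, gap_length: int = GAP_LENGTH) -> list[int]:
    """Return 1-based start positions of N-runs that are exactly gap_length long."""
    return [
        match.start() + 1
        for match in re.finditer(r"[Nn]+", seq)
        if len(match.group()) == gap_length
    ]


def gap_report_lines(contigs: dict[str, str]) -> list[str]:
    """Build 'contig<TAB>start' lines for every gap (hypothetical report layout)."""
    lines = []
    for name, seq in contigs.items():
        for start in find_unknown_length_gaps(trim_terminal_ns(seq)):
            lines.append(f"{name}\t{start}")
    return lines
```

Matching whole N-runs and then comparing their length keeps runs of 99 or 101 Ns from being reported, which a plain search for 100 consecutive Ns would not guarantee.
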


## [5.0.11] - 2020-01-27

### Changed

- The pre-QC step now creates a PRE_QC_FAILURE file in the input file directory if any of the QC rules is violated. The file lists the detailed reason.


## [5.0.10] - 2020-01-15

### Changed

- For metagenomes the tRNA prediction now also uses B and A models instead of the general one.


## [5.0.9] - 2019-11-28

### Changed

- Each hmmsearch module now executes multiple instances of hmmsearch in parallel. The number of parallel hmmsearch instances is now an entry in the yaml config file.

- The -Z argument was added to the hmmsearch command lines. The value for the -Z argument is now also an entry in the config file.
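
As a rough sketch of how config-driven parallel hmmsearch runs could be assembled: the config keys `z_value` and `parallel_instances`, the splitting of the input into chunk files, and the output naming are assumptions made for illustration. Only the hmmsearch options `-Z` and `--domtblout` and the `hmmsearch [options] <hmmfile> <seqdb>` argument order are taken from HMMER itself.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_hmmsearch_commands(config: dict, hmm_db: str, chunk_files: list[str]) -> list[list[str]]:
    """Build one hmmsearch command per input chunk, with -Z taken from the config."""
    return [
        [
            "hmmsearch",
            "-Z", str(config["z_value"]),          # hypothetical config key
            "--domtblout", f"{chunk}.domtblout",   # hypothetical output naming
            hmm_db,
            chunk,
        ]
        for chunk in chunk_files
    ]


def run_in_parallel(commands: list[list[str]], max_workers: int) -> list[int]:
    """Run the commands concurrently, discarding stdout; return their exit codes."""
    def run(cmd: list[str]) -> int:
        return subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

A caller would pass `config["parallel_instances"]` (again a hypothetical key) as `max_workers`. Fixing `-Z` keeps E-values comparable across chunks, since hmmsearch otherwise derives them from the size of each individual input file.
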


## [5.0.8] - 2019-11-27

### Changed

- The minimum contig length used by the pre-QC scripts is now set via the annotation config yaml.


## [5.0.7] - 2019-11-11

### Added

- Additional Warning line handling in tRNAscan parser.


## [5.0.6] - 2019-09-25

### Removed

- The locus_tag attribute from all GFF files.


## [5.0.5] - 2019-06-22

### Added

- The pre-QC and GFF and Fasta stats steps can now get turned on and off via the annotation config file.


## [5.0.4] - 2019-06-08

### Changed

- Prodigal: If an isolate has fewer than 20,000 bp, it is processed in meta mode.
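
The mode selection can be sketched as below. It assumes that the 20,000 bp threshold applies to the total sequence length of the dataset and that metagenomes always run in meta mode. Neither detail is spelled out in this entry, and the function names are hypothetical. The Prodigal options used (`-i`, `-o`, `-f gff`, `-p single|meta`) are standard.

```python
def prodigal_mode(total_bp: int, is_isolate: bool, threshold: int = 20_000) -> str:
    """Pick Prodigal's -p procedure: 'meta' for metagenomes and short isolates."""
    if not is_isolate or total_bp < threshold:
        return "meta"
    return "single"


def prodigal_command(fasta: str, out_gff: str, total_bp: int, is_isolate: bool) -> list[str]:
    """Assemble a Prodigal command line with the selected procedure."""
    return [
        "prodigal",
        "-i", fasta,
        "-o", out_gff,
        "-f", "gff",
        "-p", prodigal_mode(total_bp, is_isolate),
    ]
```
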


### Fixed

- Fixed some typos.


## [5.0.3] - 2019-06-02

### Added

- Tracking of which module got started last (for in-house purposes, to catch datasets that time out on a specific module).


### Fixed

- Fixed bug that in some cases caused duplicate genes, when one gene got shortened.


## [5.0.2] - 2019-05-24

### Fixed

- Fixed bug that in some cases caused SignalP or TMHMM results not to be present at the product name assignment step.


## [5.0.1] - 2019-04-04

### Added

- Added checkpointing information for every step of the pipeline, so that previously finished modules don't need to get executed again, when a job gets resubmitted.
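
One simple way to implement this kind of checkpointing is to record finished step names in a small file and skip them on resubmission. The sketch below assumes a JSON file and a hypothetical `Checkpoint` class. The pipeline's real checkpoint format is not documented in this log.

```python
import json
from pathlib import Path


class Checkpoint:
    """Track finished pipeline steps in a JSON file so reruns can skip them."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.done = set(json.loads(self.path.read_text()))
        else:
            self.done = set()

    def run(self, name, func) -> bool:
        """Run func unless step `name` already finished; return True if it ran."""
        if name in self.done:
            return False
        func()
        # Record the step only after func returned without raising.
        self.done.add(name)
        self.path.write_text(json.dumps(sorted(self.done)))
        return True
```

Because a step is recorded only after it completes, a job that is killed mid-step reruns that step on resubmission, while earlier steps are skipped.
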


## [5.0.0] - 2019-03-05

### Release

- New IMG Annotation Pipeline version that unifies isolate and metagenome processing.

Pipeline v.4

Previously IMG used slightly different pipelines to process isolate genomes and metagenomes.


Publications: