Useful links

The list below reflects only my own scientific experience and perspective. I hope it helps!

Useful omics tools:

  • DolphinNext: A distributed data processing platform for high throughput genomics [link]
  • TAM 2.0: tool for MicroRNA set analysis [link]
  • DegPack: A web package using a non-parametric and information theoretic algorithm to identify differentially expressed genes in multiclass RNA-seq samples [link]
  • RNA-seq analysis using multiple algorithms [link]
  • DABEST: Data Analysis using Bootstrap-Coupled ESTimation [link]
  • Rain Plots - Visualizing results of multiple association analyses [link]
  • Comprehensive study of the exposome and omic data using rexposome Bioconductor packages [link]
  • Block Forests: random forests for blocks of clinical and omics covariate data [link]
  • Gene Information eXtension (GIX): effortless retrieval of gene product information on any website [link]
  • UCSCXenaTools: Download and Explore Datasets from UCSC Xena Data Hubs [link]
  • Filtering procedures for untargeted LC-MS metabolomics data [link]
  • GenPipes: an open-source framework for distributed and scalable genomic analyses [link]
  • The in silico human surfaceome [link]
  • SurfaceGenie: A web-based application for prioritizing cell-type specific marker candidates [link]
  • Mass Spectral Feature List Optimizer (MS-FLO): a tool to minimize false positive peak reports in untargeted LC-MS data processing [link]
  • BiocPkgTools: Toolkit for Mining the Bioconductor Package Ecosystem [link]
  • Next-generation characterization of the Cancer Cell Line Encyclopedia [link]
    • Multi-omics data from the CCLE database.
  • EMBL-MCF LC-Orbitrap-MS/MS spectral library [link]
  • Metascape provides a biologist-oriented resource for the analysis of systems-level datasets [link]
  • CFM-ID 3.0: Significantly Improved ESI-MS/MS Prediction and Compound Identification [link]
  • MetDNA (Metabolite identification and Dysregulated Network Analysis) for the large-scale and ambiguous identification of metabolites [link]
  • MASST: A Web-based Basic Mass Spectrometry Search Tool for Molecules to Search Public Data [link]
  • CSI:FingerID: Searching molecular structure databases with tandem mass spectra [link]
  • MetaboAnalystR 2.0: From Raw Spectra to Biological Insights [link]
  • CliqueMS: A computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network [link]
  • BioPortal, the world's most comprehensive repository of biomedical ontologies [link]
  • Normalyzer: A Tool for Rapid Evaluation of Normalization Methods for Omics Data Sets [link]
  • Bioinformatics tools by the International Human Epigenome Consortium [link]
  • MultiQC - A modular tool to aggregate results from bioinformatics analyses across many samples into a single report [link]
    • An awesome toolkit to work with NGS data
  • LipidMaps for lipid annotation and more [link]
    • It is good to combine LipidMaps with LipidBlast for lipid annotation.
  • NormalizeMets - Data normalization and batch correction R package [link]
  • CluMSID: an R package for similarity-based clustering of tandem mass spectra to aid feature annotation in metabolomics [link]
  • METASPACE: A community-populated knowledge base of spatial metabolomes in health and disease [link]
    • Lots of data sets available
  • POMA: Statistical analysis tool for targeted metabolomic data [link]
    • This tool supports analyzing metabolomics data together with a covariate matrix containing age, gender, etc.
  • PiGx: Pipelines in Genomics [link]
  • Network-Based Integrative Analysis (for metabolomics data combined with related transcriptomics information) with MetaBridge [link]
  • statTarget: A streamlined tool for signal drift correction and interpretations of quantitative mass spectrometry-based omics data [link]
    • Provides a graphical user interface for quality-control-based signal drift correction (QC-RFSC), integration of data from multi-batch MS-based experiments, and comprehensive statistical analysis in metabolomics and proteomics.
  • TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data [link] [link]
  • MetProc: Separating Measurement Artifacts from True Metabolites in an Untargeted Metabolomics Experiment [link]
  • Small Molecule Pathway Database [link]
  • The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins [link]
  • FELLA: an R package to enrich metabolomics data [link]
  • LIPEA: a package for Lipid Pathway Enrichment Analysis [link]
  • CEU Mass Mediator: a tool for searching metabolites in different databases (KEGG, HMDB, LipidMaps, METLIN, MINE and an in-house library) [link]
  • MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis [link]
  • Qiita: rapid, web-enabled microbiome meta-analysis [link]
  • REVIGO [link] and WEGO [link] for GO annotation and visualization.
  • BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud [link]
  • LipidPedia - An encyclopedia of lipids providing associated biomedical information [link]
  • MitoMiner 4.0 - A database of mammalian mitochondrial localisation evidence, phenotypes and diseases [link]
  • LipiDex: An Integrated Software Package for High-Confidence Lipid Identification [link]
  • MetaboList: Annotation of Metabolites from Liquid Chromatography-Mass Spectrometry Data [link]

Good data handling and analysis packages:

  • Tidymodels - A tidy way to perform data handling, modeling and analysis [link]
  • car: Companion to Applied Regression [link]. I sometimes use this package to rename the columns of a data set.
  • forcats for handling categorical variables, especially for visualization [link]. E.g., fct_rev(fct_infreq(x)) for "ordered from top to bottom, highest count to lowest"

Visualization tools:

  • BPG: Seamless, automated and interactive visualization of scientific data [link]
  • Biorender - professional visualization for life sciences [link]
    • Note: not free for publication!
  • From data to viz (with code) [link]
  • DIVE: Turn your data into stories without writing code [link]

Good books:

  • Learning Statistics with R [link]
  • Mathematics for Machine Learning [link]
  • PH525x series - Biomedical Data Science [link]
  • Computational Genomics With R [link]
  • Modern Statistics for Modern Biology [link]
  • ComplexHeatmap Complete Reference [link]
  • Bioinformatics related courses and tutorials [link]
  • Interpretable Machine Learning [link]
  • Seeing Theory - to study probability and statistics [link]
  • Fundamentals of Data Visualization [link]
  • Mathematics for Machine Learning - only for beginners or practical users. [link]
  • Feature Engineering and Selection: A Practical Approach for Predictive Models [link]
  • Metabolic profiling series from Springer, most recent version [link]

Good papers:

  • Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction [link]
  • Shorthand notation for lipid structures derived from mass spectrometry [link]
  • Use cases, best practice and reporting standards for metabolomics in regulatory toxicology [link]
  • Identification of metabolic vulnerabilities of receptor tyrosine kinases-driven cancer [link]
  • Integration of Metabolomic and Other Omics Data in Population-Based Study Designs: An Epidemiological Perspective [link]
  • False Discovery Rate Control in Cancer Biomarker Selection Using Knockoffs [link]
  • The Consortium of Metabolomics Studies (COMETS): Metabolomics in 47 Prospective Cohort Studies [link]
  • Comprehensive Integration of Single-Cell Data [link]
  • Empowering statistical methods for cellular and molecular biologists [link]
  • Collection of Untargeted Metabolomic Data for Mammalian Urine Applying HILIC and Reversed Phase Ultra Performance Liquid Chromatography Methods Coupled to a Q Exactive Mass Spectrometer [link]
  • Lipidomics biomarker studies: Errors, limitations, and the future [link]
  • A Comparison of Random Forest Variable Selection Methods for Classification Prediction Modeling [link]
  • Identification of Double Bond Position Isomers in Unsaturated Lipids by m-CPBA Epoxidation and Mass Spectrometry Fragmentation [link]
  • The emerging role of ion mobility-mass spectrometry in lipidomics to facilitate lipid separation and identification [link]
  • Serum metabolic signatures of coronary and carotid atherosclerosis and subsequent cardiovascular disease [link]
  • A longitudinal big data approach for precision health [link]
  • A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action [link]
  • Fast Proteome Identification and Quantification from Data-Dependent Acquisition–Tandem Mass Spectrometry (DDA MS/MS) Using Free Software Tools [link]
  • Deep learning: new computational modelling techniques for genomics [link]
  • Drying Enhances Signal Intensities for Global GC–MS Metabolomics [link]
  • Reshaping Lipid Biochemistry by Pushing Barriers in Structural Lipidomics [link]
  • A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis [link]
  • SIMPLEX: a combinatorial multimolecular omics approach for systems biology [link]
  • Expanding lipidomics coverage: effective ultra performance liquid chromatography-high resolution mass spectrometer methods for detection and quantitation of cardiolipin, phosphatidylglycerol, and lysyl-phosphatidylglycerol [link]
  • Tools and resources for metabolomics research community: A 2017–2018 update [link]
  • Discovering and linking public omics data sets using the Omics Discovery Index [link]
  • Systematic Error Removal using Random Forest (SERRF) for Normalizing Large-Scale Untargeted Lipidomics Data [link]
  • Review of recent developments in GC–MS approaches to metabolomics-based research [link]
  • Deciphering complex metabolite mixtures by un- and supervised substructure discovery and semi-automated annotation from MS/MS spectra [link]
  • Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap [link]
  • Structure annotation of all mass spectra in untargeted metabolomics [link]
  • Peak annotation and verification engine (PAVE) for untargeted LC-MS metabolomics [link]
  • From RNA-seq reads to differential expression results [link]
  • Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data [link]
  • From mass to metabolite in human untargeted metabolomics: recent advances in annotation of metabolites applying liquid chromatography-mass spectrometry data [link]
  • Bad practices in evaluation methodology relevant to class-imbalanced problems [link]

Good courses:

  • Practical bioinformatics courses from Prof. Wong (NUS) [link]
  • Lipidomics tutorials from LIPIDMAPS [link]
  • Basic courses of metabolomics [link]
  • Hands-on Machine Learning with R [link]
  • Khan Academy [link]
  • DataCamp [link]
  • MIT Computational biology [link]

Ubuntu tips:

  • Install the latest R on Ubuntu [link]
    • System libraries to install first (tested 14 March 2019 on Ubuntu 18.04.2):
      • sudo apt-get install libxml2-dev
      • sudo apt-get install libcurl4-openssl-dev
      • sudo apt-get install libssl-dev
      • Finally, install the R packages you need (depending on your bioinformatics tools of interest):
        • curl, XML, RCurl, xml2, openssl, base64 (CRAN)
        • GenomeInfoDb, httr, illuminaio, GenomicRanges, rvest, Rsamtools, annotate, biomaRt, SummarizedExperiment, GenomicAlignments, geneplotter, AnnotationHub, genefilter, ExperimentHub, ShortRead, DESeq, rtracklayer, sva, sesameData, GenomicFeatures, EDASeq, sesame (Bioconductor)
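
The steps above can be collected into one small script. This is only a rough sketch, not a tested installer: the helper names (r_vector, run, install_all), the DRY_RUN switch and the CRAN mirror URL are my own assumptions for illustration, and the Bioconductor list is shortened to three packages from the list above. It assumes R is already installed and uses BiocManager to install the Bioconductor packages. By default it only prints the commands, so nothing is installed until you run it with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: install the system libraries, then the R packages.
# DRY_RUN=1 (the default, an assumption of this sketch) only prints commands.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# System libraries from the list above.
APT_PACKAGES="libxml2-dev libcurl4-openssl-dev libssl-dev"
# CRAN packages from the list above.
CRAN_PACKAGES="curl XML RCurl xml2 openssl base64"
# Shortened Bioconductor list; extend it with the packages you need.
BIOC_PACKAGES="GenomeInfoDb GenomicRanges SummarizedExperiment"
# Assumed CRAN mirror; replace with the one you prefer.
CRAN_MIRROR="https://cloud.r-project.org"

# Turn package names into an R character vector, e.g. c("a","b").
r_vector() {
  out=""
  for name in "$@"; do
    out="${out}\"${name}\","
  done
  printf 'c(%s)' "${out%,}"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_all() {
  # Word splitting of the package lists is intended here.
  run sudo apt-get install -y $APT_PACKAGES
  run Rscript -e "install.packages($(r_vector $CRAN_PACKAGES), repos=\"$CRAN_MIRROR\")"
  run Rscript -e "install.packages(\"BiocManager\", repos=\"$CRAN_MIRROR\"); BiocManager::install($(r_vector $BIOC_PACKAGES))"
}

install_all
```

Save it under any name you like and run it with sh to preview the commands, then run it again with DRY_RUN=0 once the list looks right. Some of the older Bioconductor packages may not be available for every R version, so check the output of BiocManager::install for warnings.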

Others:

  • Academic: the website builder for Hugo [link]
  • Scripts for "Current best-practices in single-cell RNA-seq: a tutorial" [link]
  • The collection of mitochondrial metabolism in cancer [link]
  • Bioinformatics resources [link]
  • Sample preparation and metabolite extraction sample protocols [link]
  • How to write a peer review (tips from PLOS) [link]
  • Beautiful.ai - beautiful presentations [link]
  • Lipidomics Standards Initiative [link]
  • BBMRI-NL omics atlas [link]
  • Awesome Multiomics tool collection github page [link]
  • dfcrm: Dose-Finding by the Continual Reassessment Method in R [link]
  • Tabula - Get tables from PDF files [link]
  • N. Ireland Cancer Registry [link]
    • Data from 1993 to 2017 (updated March 2019)
  • Single cell omics methods collection and more [link]
  • OmicsDI - database for omics data [link]
  • Cancer Clinical Proteomics Research [link]
  • Mendeley data [link]
  • Sources for validated LC-MS conditions [link]
  • LONI - available data in neuroscience [link]
  • pdfconvert: PDF to TIFF and vice versa [link]
  • Papers with code [link]
  • RNAseq analysis in R [link]
  • Common transition words and phrases [link]
  • Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data [link]
  • Writing for a Nature journal [link]
  • The Conversation - evidence-based news [link]