Useful Links for Omics Science

  • The list below simply reflects my own scientific experience and perspective. I hope it helps!

Useful omics tools/libraries:

  • BatchQC: interactive software for evaluating sample and batch effects in genomic data [link]

    • ComBat, available in the sva package [link]

  • UpSetR: an R package for the visualization of intersecting sets and their properties [link]

  • MolDiscovery: Learning Mass Spectrometry Fragmentation of Small Molecules [link] [tool link]

  • A large-scale genome–lipid association map guides lipid identification [link]

    • LipidGenie

  • MetaDBparse: Annotate Mass over Charge Values with Databases and Formula Prediction [link]

  • MRMkit: automated data processing for large-scale targeted metabolomics analysis [link]

  • METLIN MS2 molecular standards database: a broad chemical and biological resource [link]

  • Bioinformatics recipes: creating, executing and distributing reproducible data analysis workflows [link]

  • FastBMD: an online tool for rapid benchmark dose-response analysis of transcriptomics data [link]

  • BioContainers: An open-source and community-driven framework for software standardization [link]

  • Metabolite AutoPlotter - an application to process and visualise metabolite data in the web browser [link]

  • agilp - Agilent expression array processing package [link]

  • The Minimum Information about a Molecular Interaction Causal Statement (MI2CAST) [link]

  • xMSannotator: An R Package for Network-Based Annotation of High-Resolution Metabolomics Data [link]

  • UFO - a tool for unifying biomedical ontology-based semantic similarity calculation, enrichment analysis and visualization [link]

  • PCAtools: everything Principal Component Analysis [link]

  • BBCancer: an expression atlas of blood-based biomarkers in the early diagnosis of cancers [link]

  • SMfinder - suitable for untargeted and targeted metabolomics, with both label-free samples and carbon-labelled compounds [link]

  • Visualizing and interpreting cancer genomics data via the Xena platform [link]

  • Circular heatmap (tutorial) with the circlize package [link]

  • A comprehensive overview of oncogenic pathways in human cancer [paper link] [link]

  • ConsensusPathDB-human - integrates binary and complex protein-protein, genetic, metabolic, signaling, gene regulatory and drug-target interactions, as well as biochemical pathways, from 32 sources [link]

  • Clinical Knowledge Graph Integrates Proteomics Data into Clinical Decision-Making [link]

  • MetaboAnalystR 3.0: Toward an Optimized Workflow for Global Metabolomics [link]

  • Parsimonious Gene Correlation Network Analysis (PGCNA): a tool to define modular gene co-expression for refined molecular stratification in cancer [link]

  • LipidCreator workbench to probe the lipidomic landscape [link] [github]

    • FlipR - optimal collision energy selection for molecules based on MS2 data [link]

  • A deep learning architecture for metabolic pathway prediction [link]

  • MatchMixeR: a cross-platform normalization method for gene expression data integration [link]

  • GlyMDB: Glycan Microarray Database and analysis toolset [link]

  • MetaboMAPS - Pathway sharing and multi-omics data visualization in metabolic context [link]

  • ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis [link] [bioconductor link]

  • SIAMCAT R package - statistical and machine learning analyses for case-control microbiome datasets [link]

  • lipidr: a software tool for data mining and analysis of lipidomics datasets [link] [link]

  • CROP: Correlation-based reduction of feature multiplicities in untargeted metabolomics data [link] [link]

  • MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra [link]

  • Computational Methods and Data Analysis for Metabolomics (book) [link]

  • RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes [link]

  • Algorithmic Learning for Auto-deconvolution of GC-MS Data to Enable Molecular Networking within GNPS [link]

  • AltAnalyze - an extremely user-friendly, open-source toolkit for a broad range of genomics analyses [link]

  • multiSLIDE: a query-driven heatmap visualization tool for multi-omics data [link]

  • Metascape - a biologist-friendly resource for big data analysis [link]

  • LAS: A Lipid Annotation Service Capable of Explaining the Annotations It Generates [link]

  • CancerGeneNet: linking driver genes to cancer hallmarks [link]

  • AutoTuner: High fidelity, robust, and rapid parameter selection for metabolomics data processing [link]

  • Feature-based Molecular Networking in the GNPS Analysis Environment [link]

  • Nonparametric causal effects based on incremental propensity score interventions [link]

  • XCMS Parameter Optimization with IPO [link]

    • Most up-to-date XCMS tutorial [link]

    • Autonomous METLIN-guided in-source fragment detection increases annotation confidence in untargeted metabolomics [link]

    • An alternative package: AutoTuner [link], which can also be used with MZmine.

  • pClean: An Algorithm To Preprocess High-Resolution Tandem Mass Spectra for Database Searching (peptide/protein search) [link]

  • Using MetaboAnalyst 4.0 for Comprehensive and Integrative Metabolomics Data Analysis [link]

  • The metaRbolomics Toolbox in Bioconductor and beyond [link]

    • Two hundred metabolomics-specific packages (CRAN, Bioconductor and GitHub)

  • UniBind - a comprehensive map of direct transcription factor (TF) – DNA interactions in the human genome [link]

  • R-MetaboList 2: A Flexible Tool for Metabolite Annotation from High-Resolution Data-Independent Acquisition Mass Spectrometry Analysis [link]

  • PathwayMatcher: proteoform-centric network construction enables fine-granularity multiomics pathway mapping [link]

  • hypeR: An R Package for Geneset Enrichment Workflows [link] [tutorial]

  • Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing [link] [tutorials]

  • OmicsX: a web server for integrated OMICS analysis [link]

  • VIST - a Variant-Information Search Tool for precision oncology [link]

  • MIXTURE: an improved algorithm for immune tumor microenvironment estimation based on gene expression data [link]

  • Human MitoCarta2.0: 1158 mitochondrial genes [link]

  • The nPYc-Toolbox, a Python module for the pre-processing, quality-control, and analysis of metabolic profiling datasets [link]

  • GSMA: an approach to identify robust global and test Gene Signatures using Meta-Analysis [link]

  • Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data [link]

  • A table of tools for the analysis of single-cell RNA-seq data [link]

    • Very useful for keeping up with the development of this fast-moving field

  • DolphinNext: A distributed data processing platform for high throughput genomics [link]

  • TAM 2.0: tool for MicroRNA set analysis [link]

  • DegPack: A web package using a non-parametric and information theoretic algorithm to identify differentially expressed genes in multiclass RNA-seq samples [link]

    • Note: no longer updated.

  • RNA-seq analysis using multiple algorithms [link]

  • DABEST: Data Analysis using Bootstrap-Coupled ESTimation [link]

  • Rain Plots - Visualizing results of multiple association analyses [link]

  • Comprehensive study of the exposome and omic data using rexposome Bioconductor packages [link]

  • Block Forests: random forests for blocks of clinical and omics covariate data [link]

  • Gene Information eXtension (GIX): effortless retrieval of gene product information on any website [link]

  • Sources for validated LC-MS conditions [link]

  • UCSCXenaTools: Download and Explore Datasets from UCSC Xena Data Hubs [link]

  • Filtering procedures for untargeted LC-MS metabolomics data [link]

  • GenPipes: an open-source framework for distributed and scalable genomic analyses [link]

  • The in silico human surfaceome [link]

  • SurfaceGenie: A web-based application for prioritizing cell-type specific marker candidates [link]

  • Mass Spectral Feature List Optimizer (MS-FLO): a tool to minimize false positive peak reports in untargeted LC-MS data processing [link]

  • BiocPkgTools: Toolkit for Mining the Bioconductor Package Ecosystem [link]

  • Next-generation characterization of the Cancer Cell Line Encyclopedia [link]

    • Multi-omics data from the CCLE database.

  • EMBL-MCF LC-Orbitrap-MS/MS spectral library [link]

  • Metascape provides a biologist-oriented resource for the analysis of systems-level datasets [link]

  • CFM-ID 3.0: Significantly Improved ESI-MS/MS Prediction and Compound Identification [link]

  • MetDNA (Metabolite identification and Dysregulated Network Analysis) for the large-scale and ambiguous identification of metabolites [link]

  • MASST: A Web-based Basic Mass Spectrometry Search Tool for Molecules to Search Public Data [link]

  • CSI:FingerID: Searching molecular structure databases with tandem mass spectra [link]

  • MetaboAnalystR 2.0: From Raw Spectra to Biological Insights [link]

  • CliqueMS: A computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network [link]

  • BioPortal, the world's most comprehensive repository of biomedical ontologies [link]

  • Normalyzer: A Tool for Rapid Evaluation of Normalization Methods for Omics Data Sets [link]

  • Bioinformatics tools by the International Human Epigenome Consortium (IHEC) [link]

  • MultiQC - a modular tool to aggregate results from bioinformatics analyses across many samples into a single report [link]

    • An excellent toolkit for working with NGS data

  • LipidMaps for lipid annotation and more [link]

    • Combining LipidMaps with LipidBlast works well for lipid annotation.

  • NormalizeMets - Data normalization and batch correction R package [link]

  • CluMSID: an R package for similarity-based clustering of tandem mass spectra to aid feature annotation in metabolomics [link]

  • METASPACE: A community-populated knowledge base of spatial metabolomes in health and disease [link]

    • Many data sets are available.

  • POMA: Statistical analysis tool for targeted metabolomic data [link]

    • This tool supports analyzing metabolomics data together with a covariate matrix containing age, gender, etc.

  • PiGx: Pipelines in Genomics [link]

  • Network-Based Integrative Analysis (for metabolomics data combined with related transcriptomics information) with MetaBridge [link]

  • statTarget: A streamlined tool for signal drift correction and interpretations of quantitative mass spectrometry-based omics data [link]

    • Provides a graphical user interface for QC-based signal drift correction (QC-RFSC), integration of data from multi-batch MS-based experiments, and comprehensive statistical analysis in metabolomics and proteomics.

  • TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data [link] [link]

  • MetProc: Separating Measurement Artifacts from True Metabolites in an Untargeted Metabolomics Experiment [link]

  • Small Molecule Pathway Database [link]

  • The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins [link]

  • FELLA: an R package to enrich metabolomics data [link]

  • LIPEA: a package for Lipid Pathway Enrichment Analysis [link]

  • CEU Mass Mediator: a tool for searching metabolites in different databases (KEGG, HMDB, LipidMaps, METLIN, MINE and an in-house library) [link]

  • MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis [link]

  • Qiita: rapid, web-enabled microbiome meta-analysis [link]

  • REVIGO [link] and WEGO [link] for GO annotation and visualization.

  • PROGgeneV2: enhancements on the existing database [link]

  • BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud [link]

  • LipidPedia - An encyclopedia of lipids providing associated biomedical information [link]

  • MitoMiner 4.0 - A database of mammalian mitochondrial localisation evidence, phenotypes and diseases [link]

  • LipiDex: An Integrated Software Package for High-Confidence Lipid Identification [link]

  • MetaboList: Annotation of Metabolites from Liquid Chromatography-Mass Spectrometry Data [link]

  • METABOSEARCH - metabolite identification in mass-spectrometry-based metabolomic analysis [link]
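
The statTarget entry above mentions QC-based signal drift correction. The idea is that pooled QC samples injected throughout a run should give the same signal, so any trend in QC intensity across injection order estimates instrument drift, which is then divided out of every sample. statTarget's QC-RFSC fits that trend with a random forest; the minimal sketch below instead uses plain piecewise-linear interpolation between QC injections so that it needs no dependencies. The helper names, the one-feature-at-a-time layout, constant extrapolation beyond the first and last QC, and the assumption of unique injection orders are all illustrative choices, not statTarget's API.

```python
from bisect import bisect_right
from statistics import median


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x.

    xs must be sorted and unique (assumption). Outside the QC range the
    nearest QC value is held constant (an assumed extrapolation rule).
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # guarantees xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def qc_drift_correct(order, intensity, is_qc):
    """Drift-correct one feature using QC injections (hypothetical helper).

    order: injection order of every sample
    intensity: raw intensity of the feature in every sample
    is_qc: True where the sample is a pooled QC
    Each value is divided by the QC-estimated drift at its injection and
    rescaled to the median QC intensity.
    """
    qc = sorted((o, v) for o, v, q in zip(order, intensity, is_qc) if q)
    if len(qc) < 2:
        raise ValueError("need at least two QC injections")
    qc_order = [o for o, _ in qc]
    qc_value = [v for _, v in qc]
    target = median(qc_value)
    return [v / interpolate(o, qc_order, qc_value) * target
            for o, v in zip(order, intensity)]


# Usage: a feature drifting upward by 10 units per injection, with QCs at
# injections 0, 4 and 8. The sample at injection 2 truly has twice the level.
order = list(range(9))
is_qc = [o in (0, 4, 8) for o in order]
intensity = [100 + 10 * o for o in order]
intensity[2] *= 2
corrected = qc_drift_correct(order, intensity, is_qc)
print(corrected)  # → [140.0, 140.0, 280.0, 140.0, 140.0, 140.0, 140.0, 140.0, 140.0]
```

The drift trend is removed while the genuine two-fold difference at injection 2 survives. Straight lines between QCs only approximate nonlinear drift; a smoother fit such as the random forest in QC-RFSC can follow curved trends more closely, at the cost of needing more QC injections.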

Good data handling and analysis packages:

  • Tidymodels - A tidy way to perform data handling, modeling and analysis [link]

  • car: Companion to Applied Regression [link]: I sometimes use this package to rename the columns in a data set

  • forcats for handling categorical variables (factors), especially for visualization [link]. E.g., fct_rev(fct_infreq(x)) orders the levels so that a horizontal bar chart runs from the highest count at the top to the lowest at the bottom.
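
To see what the forcats idiom does to the level order, it can be mimicked outside R. fct_infreq() sorts levels from most to least frequent and fct_rev() reverses that order; because ggplot2 draws the first level of a discrete y axis at the bottom, the reversed order puts the highest count at the top. The Python sketch below only illustrates that ordering logic: the helper names and example values are hypothetical, and ties are broken by first appearance, which may differ from how forcats breaks them.

```python
from collections import Counter


def infreq_levels(values):
    """Levels from most to least frequent, mimicking forcats::fct_infreq().

    Counter.most_common() orders equal counts by first appearance.
    """
    return [level for level, _ in Counter(values).most_common()]


def rev_levels(levels):
    """Reverse the level order, mimicking forcats::fct_rev()."""
    return levels[::-1]


platforms = ["LC-MS", "GC-MS", "LC-MS", "NMR", "LC-MS", "GC-MS"]
by_freq = infreq_levels(platforms)
plot_order = rev_levels(by_freq)  # ggplot2 would draw the first level at the bottom
print(by_freq)     # ['LC-MS', 'GC-MS', 'NMR']
print(plot_order)  # ['NMR', 'GC-MS', 'LC-MS']
```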

Visualization tools:

  • d3 visualization [link]

  • BioCyc.org [link]

  • ExPanDaR - a toolbox for interactive exploratory data analysis [link]

  • Exploring a world of a thousand dimensions - PHATE enhances the visualization of high-dimensional data [link]

  • ggVennDiagram [link]

  • A beautiful PCA visualization implementation in R [link]

  • Dealing with colors in ggplot2 [link]

    • Very handy for making more attractive figures.

  • Pathway drawing tools/sites

    • Wikipathways [link]

    • PathVisio [link]

      • PathVisio 3: An Extendable Pathway Analysis Toolbox [link]

      • Multi-omics visualization [link]

    • Cell Illustrator [link]

    • MetExplore [link]

  • BPG: Seamless, automated and interactive visualization of scientific data [link]

  • BioRender - professional visualization for life sciences [link]

    • Note: not free for publication!

  • From data to viz (with code) [link]

  • DIVE: Turn your data into stories without writing code [link]

Good books/learning materials:

  • LC principles

    • Basic HPLC Theory and Definitions: Retention, Thermodynamics, Selectivity, Zone Spreading, Kinetics, and Resolution [link]

    • Basic LC Method Development and Optimization [link]

    • Hydrophilic Interaction Liquid Chromatography [link]

    • Solvents in Chromatography and Electrophoresis [link]

  • Data Analysis for the Life Sciences with R and Genomics Data Analysis [link]

  • An automatically and continuously updated collection of the best ML resources, organized by topic [link]

  • Intermediate Machine Learning [link]

    • Learning materials and exercises

  • Interactive web-based data visualization with R, plotly, and shiny [link]

  • Unix, R and python tools for genomics and data science [link]

  • Teaching resources for data science [link]

  • Hands-on Machine Learning with R [link]

  • Data Visualisation with R - 111 Examples [link]

  • Microarray related materials

    • Affymetrix complete workflow [link]

  • RNA-seq related materials

    • RNA-seq workflow: gene-level exploratory analysis and differential expression [link]

    • RNA‐seq: Basic Bioinformatics Analysis [link]

    • Informatics for RNA-seq: A web resource for analysis on the cloud [link]

    • RNA-seq resources, tools, and databases [link]

    • Mapping RNA‐seq Reads with STAR [link]

    • Analysis of single cell RNA-seq data [link]

    • An RNA-Seq Protocol for Differential Expression Analysis [link]

  • Causal Inference [link]

  • Biostatistics for Biomedical Science [link]

  • Advanced R [link]

    • Advanced R solutions [link]

  • Collection of tools for Visual Exploration, Explanation and Debugging of Predictive Models [link]

  • Learning Statistics with R [link]

  • Mathematics for Machine Learning [link]

  • Encyclopedia of Systems Biology [link]

  • PH525x series - Biomedical Data Science [link]

  • Computational Genomics With R [link]

  • Modern Statistics for Modern Biology [link]

  • ComplexHeatmap Complete Reference [link]

  • Bioinformatics related courses and tutorials [link]

  • Interpretable Machine Learning [link]

  • Explainable AI: Interpreting, Explaining and Visualizing Deep Learning [link]

  • Seeing theory - to study probability and statistics [link]

  • Fundamentals of Data Visualization [link]

  • Mathematics for Machine Learning - intended for beginners and practical users [link]

  • Feature Engineering and Selection: A Practical Approach for Predictive Models [link]

  • Metabolic profiling series from Springer (most recent version) [link]

Good papers:

  • An atlas of human metabolism [link]

  • Probabilistic fine-mapping of transcriptome-wide association studies [link]

  • Prediction of sepsis mortality using metabolite biomarkers in the blood: a meta-analysis of death-related pathways and prospective validation [link]

  • Dissemination and analysis of the quality assurance (QA) and quality control (QC) practices of LC–MS based untargeted metabolomics practitioners [link]

  • In vivo mRNA display enables large-scale proteomics by next generation sequencing [link]

  • Deep learning meets metabolomics: a methodological perspective [link]

  • Assessment of human plasma and urine sample preparation for reproducible and high-throughput UHPLC-MS clinical metabolic phenotyping [link]

  • mRNAs, proteins and the emerging principles of gene expression control [link]

  • Integrative Proteomic Characterization of Human Lung Adenocarcinoma [link]

  • Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma [link]

  • Chemical Derivatization in LC-MS Based Metabolomics Study [link]

  • Steroid metabolomics: machine learning and multidimensional diagnostics for adrenal cortical tumors, hyperplasias, and related disorders [link]

  • Harmonization of quality metrics and power calculation in multi-omic studies [link]

  • Using Reactome to build an autophagy mechanism knowledgebase [link]

  • Challenges and emergent solutions for LC‐MS/MS based untargeted metabolomics in diseases [link]

  • Methods and tools for RNA-seq-based co-expression network analysis [link]

  • Lipidomics for studying metabolism [link]

  • Validation of lipidomic analysis of human plasma and serum by supercritical fluid chromatography–mass spectrometry and hydrophilic interaction liquid chromatography–mass spectrometry [link]

  • Chemical Discovery in the Era of Metabolomics [link]

  • Executable cancer models: successes and challenges [link]

  • Celebrating Discoveries in Wnt Signaling: How One Man Gave Wings to an Entire Field [link]

  • Systems metabolomics: from metabolomic snapshots to design principles [link]

  • Implementation of liquid chromatography–high resolution mass spectrometry methods for untargeted metabolomic analyses of biological samples: A tutorial [link]

  • Comprehensive Metabolomic Search for Biomarkers to Differentiate Early Stage Hepatocellular Carcinoma from Cirrhosis [link]

  • Atoms to Phenotypes: Molecular Design Principles of Cellular Energy Metabolism [link]

  • From mass to metabolite in human untargeted metabolomics: Recent advances in annotation of metabolites applying liquid chromatography-mass spectrometry data [link]

  • Key challenges facing data-driven multicellular systems biology [link]

  • Analytical challenges of shotgun lipidomics at different resolution of measurements [link]

  • A systems approach to clinical oncology uses deep phenotyping to deliver personalized care [link]

  • Caenorhabditis elegans As a Promising Alternative Model for Environmental Chemical Mixture Effect Assessment—A Comparative Study [link]

  • Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist's point of view [link]

  • Variable selection and validation in multivariate modelling [link]

  • Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling [link]

  • Quantitative analysis of positional isomers of triacylglycerols via electrospray ionization tandem mass spectrometry of sodiated adducts [link]

  • Proteomics of Melanoma Response to Immunotherapy Reveals Mitochondrial Dependence [link]

  • Integrated Proteomics Sample Preparation and Fractionation: Method Development and Applications [link]

  • Lipidomes in health and disease: analytical strategies and considerations [link]

  • Deep Neural Networks for Classification of LC-MS Spectral Peaks [link]

  • Using deep learning to evaluate peaks in chromatographic data [link]

  • Data integration and predictive modeling methods for multi-omics datasets [link]

  • A Unified Conceptual Framework for Metabolic Phenotyping in Diagnosis and Prognosis [link]

  • Plasma metabolites associated with colorectal cancer stage: findings from an international consortium [link]

  • How close are we to complete annotation of metabolomes? [link]

  • A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis [link]

  • Subcellular metabolic pathway kinetics are revealed by correcting for artifactual post harvest metabolism [link]

  • Perspectives on Data Analysis in Metabolomics: Points of Agreement and Disagreement from the 2018 ASMS Fall Workshop [link]

  • Artificial intelligence in digital pathology — new tools for diagnosis and precision oncology [link]

  • The Missing Pieces of Artificial Intelligence in Medicine [link]

  • A large-scale analysis of targeted metabolomics data from heterogeneous biological samples provides insights into metabolite dynamics [link]

    • The supplementary file lists the dMRM conditions and CAS numbers for approximately 400 metabolites.

  • Expanding lipidomics coverage: effective ultra performance liquid chromatography-high resolution mass spectrometer methods for detection and quantitation of cardiolipin, phosphatidylglycerol, and lysyl-phosphatidylglycerol [link]

  • Proteogenomic landscape of squamous cell lung cancer [link]

  • Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection [link]

  • Sample normalization methods in quantitative metabolomics [link]

  • Comorbid chronic diseases and cancer diagnosis: disease-specific effects and underlying mechanisms [link]

  • Lipidomics needs more standardization [link]

  • Accurate mass and retention time library of serum lipids for type 1 diabetes research [link]

  • A gas chromatography–mass spectrometry-based whole-cell screening assay for target identification in distal cholesterol biosynthesis [link]

  • Statistical Workflow for Feature Selection in Human Metabolomics Data [link]

  • Immunometabolism in cancer at a glance [link]

  • Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction [link]

  • Shorthand notation for lipid structures derived from mass spectrometry [link]

  • Use cases, best practice and reporting standards for metabolomics in regulatory toxicology [link]

  • Identification of metabolic vulnerabilities of receptor tyrosine kinases-driven cancer [link]

  • Integration of Metabolomic and Other Omics Data in Population-Based Study Designs: An Epidemiological Perspective [link]

  • False Discovery Rate Control in Cancer Biomarker Selection Using Knockoffs [link]

  • The Consortium of Metabolomics Studies (COMETS): Metabolomics in 47 Prospective Cohort Studies [link]

  • Comprehensive Integration of Single-Cell Data [link]

  • Empowering statistical methods for cellular and molecular biologists [link]

  • Collection of Untargeted Metabolomic Data for Mammalian Urine Applying HILIC and Reversed Phase Ultra Performance Liquid Chromatography Methods Coupled to a Q Exactive Mass Spectrometer [link]

  • Increasing lipidomic coverage by selecting optimal mobile-phase modifiers in LC–MS of blood plasma [link]

  • Lipidomics biomarker studies: Errors, limitations, and the future [link]

  • A Comparison of Random Forest Variable Selection Methods for Classification Prediction Modeling [link]

  • Identification of Double Bond Position Isomers in Unsaturated Lipids by m-CPBA Epoxidation and Mass Spectrometry Fragmentation [link]

  • The emerging role of ion mobility-mass spectrometry in lipidomics to facilitate lipid separation and identification [link]

  • Serum metabolic signatures of coronary and carotid atherosclerosis and subsequent cardiovascular disease [link]

  • A longitudinal big data approach for precision health [link]

  • A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action [link]

  • Fast Proteome Identification and Quantification from Data-Dependent Acquisition–Tandem Mass Spectrometry (DDA MS/MS) Using Free Software Tools [link]

  • Deep learning: new computational modelling techniques for genomics [link]

  • Drying Enhances Signal Intensities for Global GC–MS Metabolomics [link]

  • Reshaping Lipid Biochemistry by Pushing Barriers in Structural Lipidomics [link]

  • SIMPLEX: a combinatorial multimolecular omics approach for systems biology [link]

  • Tools and resources for metabolomics research community: A 2017–2018 update [link]

  • Discovering and linking public omics data sets using the Omics Discovery Index [link]

  • Systematic Error Removal using Random Forest (SERRF) for Normalizing Large-Scale Untargeted Lipidomics Data [link]

  • Review of recent developments in GC–MS approaches to metabolomics-based research [link]

  • Deciphering complex metabolite mixtures by un- and supervised substructure discovery and semi-automated annotation from MS/MS spectra [link]

  • Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap [link]

  • Structure annotation of all mass spectra in untargeted metabolomics [link]

  • Peak annotation and verification engine (PAVE) for untargeted LC-MS metabolomics [link]

  • From RNA-seq reads to differential expression results [link]

  • Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data [link]

  • Bad practices in evaluation methodology relevant to class-imbalanced problems [link]

Good courses:

  • Practical bioinformatics courses from Prof. Wong (NUS) [link]

  • Lipidomics tutorials from LIPIDMAPS [link]

  • Basic courses of metabolomics [link]

  • Hands-on Machine Learning with R [link]

  • Khan Academy [link]

  • DataCamp [link]

  • MIT Computational biology [link]

Ubuntu tips:

  • Installing the latest R on Ubuntu [link]

    • Required system libraries (tested on Ubuntu 18.04.2, March 14, 2019):

      • sudo apt-get install libxml2-dev

      • sudo apt-get install libcurl4-openssl-dev

      • sudo apt-get install libssl-dev

      • Finally, install the R packages you need (depending on your bioinformatics tools of interest):

        • curl, XML, RCurl, xml2, openssl, base64 (CRAN)

        • GenomeInfoDb, httr, illuminaio, GenomicRanges, rvest, Rsamtools, annotate, biomaRt, SummarizedExperiment, GenomicAlignments, geneplotter, AnnotationHub, genefilter, ExperimentHub, ShortRead, DESeq, rtracklayer, sva, sesameData, GenomicFeatures, EDASeq, sesame (Bioconductor)

Databases/good data sets:

  • High-Resolution mRNA and Secretome Atlas of Human Enteroendocrine Cells [link]

  • MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature [link]

  • The Ivy Glioblastoma Atlas Project [link]

  • Baltimore Longitudinal Study of Aging (BLSA) [link]

  • Biofluid-based metabolomics data sets [link] [github]

  • Metabolomics of Coronary Heart Disease (CHD) in the WHI [link]

  • BBMRI metabolomics Consortium [link]

    • Dutch biobanks

  • Japanese version of The Cancer Genome Atlas (JCGA), generated from fresh-frozen tumors obtained from 5143 cancer patients [link]

  • Observational Health Data Sciences and Informatics (OHDSI) [link]

  • GENIE (Genomics Evidence Neoplasia Information Exchange) [link1] [link2] [GDC]

  • Multi-Omics Data Sharing [link]

  • Pathway Commons 2019 Update: integration, analysis and exploration of pathway data [link]

  • LinkedOmics: analyzing multi-omics data within and across 32 cancer types [link]

    • CPTAC proteomics and TCGA multi-omics data.

  • PathBank: a comprehensive pathway database for model organisms [link]

  • PhaSepDB: a database of liquid-liquid phase separation related proteins [link]

  • BioCyc Database Collection [link]

    • BioCyc is a collection of 14735 Pathway/Genome Databases (PGDBs), plus software tools for exploring them [Karp17].

  • Compendiums of cancer transcriptomes for machine learning applications [link]

  • Small Molecule Identifier Database [link] (C. elegans)

    • "SMIDs aim to make life easier for describing biogenic small molecules in metabolomic and genomic applications."

  • The Blood Exposome Database [link]

  • Organelle genomes (e.g., mitochondria) [link]

  • Catalogue of Genomic Data Initiatives [link]

  • Medical data sets available in the processed form [link]

  • BioBankRead: Data pre-processing in Python for UKBiobank clinical data [link]

  • Korean public and hospital data for estimating LDL-cholesterol [link]

  • Thyroid cancer database [link]

    • Data are provided for research upon proper request.

  • N. Ireland Cancer Registry [link]

    • Covers 1993 to 2017 (updated March 2019)

  • LONI - available neuroscience data [link]

Others:

  • Kite - a code autocompletion tool aimed at improving developer productivity [link]

  • Deep Learning Drizzle [link]

  • compgen command in Linux with Examples [link]

    • compgen is a bash built-in that lists possible completions; e.g., compgen -c lists all commands available to the current shell.

  • Framework for Data Preparation Techniques in Machine Learning [link]

  • Improving reproducibility in computational biology research [link]

  • Code Ocean - computational code and data from anywhere, with anyone. [link]

  • Wolfram Alpha - Compute expert-level answers using Wolfram’s breakthrough algorithms, knowledgebase and AI technology [link]

  • Network Biology: Introduction to STRING and Cytoscape [link]

  • Vectr - free vector graphics software, suitable for scientific illustration [link]

  • cleanEHR

    • An electronic health record (EHR) data cleaning and processing platform that works with the Critical Care Health Informatics Collaborative's data set [link]

  • A nice collection of Machine Learning resources [link]

  • Common statistical tests are linear models (or: how to teach stats) [link]

  • Technical Notes On Using Data Science & Artificial Intelligence (Python) [link]

  • Academic: the website builder for Hugo [link]

  • Scripts for "Current best-practices in single-cell RNA-seq: a tutorial" [link]

  • The collection of mitochondrial metabolism in cancer [link]

  • Bioinformatics resources [link]

  • Bioinformatics Training Resources - Coppola Lab [link]

  • Sample preparation and metabolite extraction sample protocols [link]

  • How to write a peer review (tips from PLOS) [link]

  • Beautiful.ai - a tool for creating attractive presentations [link]

  • Lipidomics Standards Initiative [link]

  • BBMRI-NL omics atlas [link]

  • Awesome Multiomics tool collection GitHub page [link]

  • dfcrm: Dose-Finding by the Continual Reassessment Method in R [link]

  • Tabula - extract tables from PDF files [link]

  • Single cell omics methods collection and more [link]

  • OmicsDI - database for omics data [link]

  • Cancer Clinical Proteomics Research [link]

  • Mendeley data [link]

  • pdfconvert: convert PDF to TIFF and vice versa [link]

  • Papers with code [link]

  • RNAseq analysis in R [link]

  • Common transition words and phrases [link]

  • Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data [link]

  • Writing for a Nature journal [link]

  • The Conversation - evidence-based news [link]
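
The "Common statistical tests are linear models" entry rests on identities such as this one: a two-sample Student's t-test (equal variances assumed) gives exactly the same t statistic as an ordinary least-squares fit of y = b0 + b1 * group, where group is a 0/1 indicator and the test is on b1. The pure-Python sketch below computes the statistic both ways; the function names and example values are made up for illustration.

```python
import math
from statistics import mean


def student_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def ols_group_t(a, b):
    """t statistic of the slope b1 in y = b0 + b1 * group, fitted by OLS."""
    y = list(a) + list(b)
    g = [0.0] * len(a) + [1.0] * len(b)  # 0/1 dummy for group membership
    n = len(y)
    # Normal equations: X'X = [[n, sum g], [sum g, sum g^2]], X'y = [sum y, sum g*y]
    sg, sgg = sum(g), sum(v * v for v in g)
    sy, sgy = sum(y), sum(gi * yi for gi, yi in zip(g, y))
    det = n * sgg - sg * sg
    inv00, inv01, inv11 = sgg / det, -sg / det, n / det  # entries of (X'X)^-1
    b0 = inv00 * sy + inv01 * sgy
    b1 = inv01 * sy + inv11 * sgy
    rss = sum((yi - (b0 + b1 * gi)) ** 2 for gi, yi in zip(g, y))
    sigma2 = rss / (n - 2)  # residual variance with 2 fitted parameters
    return b1 / math.sqrt(sigma2 * inv11)


control = [5.1, 4.9, 5.6, 5.8, 5.3]
treated = [6.2, 6.6, 5.9, 7.0, 6.4]
print(student_t(control, treated), ols_group_t(control, treated))  # equal up to float rounding
```

Here b1 works out to the difference in group means and its standard error to the pooled-variance formula, which is why the two numbers agree; adding more dummy columns in the same way gives one-way ANOVA.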