SPECTRA is the optimisation engine I designed to address the hidden influence of parameter choices in multi-omics analysis. Rather than committing to a single pipeline, SPECTRA systematically explores the analytical decision space and identifies configurations that are mathematically robust and biologically coherent.
Preliminary work has explored over 170,000 parameter combinations across temporal RNA-seq and ATAC-seq datasets, showing how strongly downstream biological conclusions depend on choices analysts rarely revisit.
The architecture code is available on my GitHub.
In a conventional pipeline, every analytical decision is fixed at one value, producing one result whose dependence on those choices is invisible. In a combinatorial pipeline, each step is expanded into a set of parameters; the resulting space of outputs is passed to an inference layer that identifies the most robust, biologically coherent configuration.
This reframes pipeline design as an optimisation problem rather than a recipe.
Published RNA-seq and ATAC-seq analyses rarely justify their parameter choices. Yet in temporal data, where the signal is the change between conditions, even small shifts in filtering, normalisation, or clustering can move cluster boundaries, alter Gene Ontology results, and change which transcription factors appear to drive a process.
The reasons are practical, not principled. Running alternatives is infeasible on a laptop and tedious on HPC, where managing thousands of jobs is error-prone. So the field defaults to a "single harness" of conventional choices, and bias accumulates silently.
SPECTRA's caching architecture and modular HPC execution directly address this barrier: the combinatorial space becomes computationally tractable, and the decisions become visible.
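As a minimal sketch of why caching makes the space tractable, assume each step's output is stored under a key derived from the step name, its parameters, and the key of its input. Configurations that share an upstream prefix then reuse the same intermediate result instead of recomputing it. The function names, the in-memory dictionary, and the toy steps below are hypothetical illustrations, not SPECTRA's actual implementation.

```python
import hashlib
import json

# Hypothetical in-memory cache; a real run would more likely persist to disk.
_cache = {}
calls = {"count": 0}  # counts how often a step is actually executed


def step_key(step_name, params, upstream_key):
    """Derive a deterministic key from the step, its parameters and its input."""
    payload = json.dumps([step_name, params, upstream_key], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name, params, upstream_key, func, data):
    """Run func(data, **params) unless an identical call is already cached."""
    key = step_key(step_name, params, upstream_key)
    if key not in _cache:
        calls["count"] += 1
        _cache[key] = func(data, **params)
    return key, _cache[key]


# Toy stand-ins for real pipeline steps (illustrative only).
def filter_counts(values, min_count):
    return [v for v in values if v >= min_count]


def scale(values, factor):
    return [v * factor for v in values]


raw = [0, 3, 12, 40]
for factor in (1, 10, 100):
    # The filtering step is identical across all three configurations,
    # so it is executed once and then served from the cache.
    fkey, filtered = run_step("filter", {"min_count": 5}, "raw", filter_counts, raw)
    _, scaled = run_step("scale", {"factor": factor}, fkey, scale, filtered)

print(calls["count"])  # 4: one filter run plus three scale runs
```

Because the key includes the upstream key, changing an early parameter invalidates everything downstream of it, while changing a late parameter leaves earlier results untouched.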
Filtering: count thresholds, peak presence, variance criteria
Normalisation: DESeq2, edgeR, TPM, library-size scaling, SCTransform
Dimensionality reduction: PCA, LSI, non-linear embeddings
Clustering: Mfuzz, k-means, spectral clustering, maSigPro
Enrichment: GO term selection, redundancy reduction, hypergeometric integration
Motif analysis: HOMER, MEME, scanning thresholds
RNA–ATAC integration: concordance scoring, peak-to-gene linkage strategies
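The decision space listed above can be enumerated as a Cartesian product of per-step options. The sketch below is illustrative only: the step names follow the list, but the dictionary keys, the subset of tools per step, and the numeric thresholds are assumptions chosen to keep the example small.

```python
from itertools import product
from math import prod

# Illustrative options only: the values and the subset chosen per step
# are assumptions, not SPECTRA's actual search space.
search_space = {
    "filter_min_count": [5, 10, 20],
    "normalisation": ["DESeq2", "edgeR", "TPM"],
    "reduction": ["PCA", "LSI"],
    "clustering": ["Mfuzz", "k-means"],
    "motif_tool": ["HOMER", "MEME"],
}


def iter_configurations(space):
    """Yield one dict per combination of options, in a fixed order."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


# The size of the space is the product of the option counts per step.
n_configs = prod(len(options) for options in search_space.values())
configs = list(iter_configurations(search_space))

print(n_configs)  # 72 = 3 * 3 * 2 * 2 * 2
print(configs[0])
```

Even this reduced grid yields 72 pipelines from five steps, which shows how quickly the space grows once every step carries several alternatives.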
Each configuration is scored on a composite objective combining mathematical and biological signals:
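One common way to build an objective of this kind is a weighted sum of min-max-normalised metrics. The sketch below assumes two hypothetical metrics, `stability` (a mathematical signal) and `marker_recovery` (a biological signal, for example how many known marker genes land in the expected cluster), with equal weights. The metric names, values, and weights are made up for illustration and are not taken from SPECTRA.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; constant lists map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(results, weights):
    """Return one weighted score per configuration.

    results: list of dicts mapping metric name -> raw value (higher is better)
    weights: dict mapping metric name -> non-negative weight
    """
    total = sum(weights.values())
    normalised = {m: min_max([r[m] for r in results]) for m in weights}
    return [
        sum(weights[m] * normalised[m][i] for m in weights) / total
        for i in range(len(results))
    ]


# Made-up metric values for three hypothetical configurations.
results = [
    {"stability": 0.60, "marker_recovery": 2},
    {"stability": 0.90, "marker_recovery": 4},
    {"stability": 0.75, "marker_recovery": 3},
]
weights = {"stability": 0.5, "marker_recovery": 0.5}

scores = composite_scores(results, weights)
best = max(range(len(results)), key=scores.__getitem__)
print(best)  # 1: the second configuration is best on both metrics
```

Normalising each metric before weighting keeps a metric with a large numeric range from dominating the score purely because of its units.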
SPECTRA is being applied to temporal RNA-seq and ATAC-seq from decidualising human endometrial stromal cells. Decidualisation has well-defined temporal phases and established biomarkers (FOXO1, PRL, IGFBP1 and HOXA10), making it an ideal system for validating combinatorial pipelines against known biology.
This work is supported by the Biomedical Research Unit in Reproductive Health, with access to the Warwick Multi-omics Facility, the PathLake digital pathology repository, and HPC infrastructure via the Bioinformatics and Digital Health RTP. Clinical relevance is anchored through Dr Jan Brosens' collaborations in reproductive medicine.
The translational stakes are real: in the UK, IVF access varies sharply by region despite national guidelines, a disparity often called the "postcode lottery". Reproducible diagnostics for endometrial receptivity directly support more equitable reproductive healthcare.
This research would not be possible without the patients at University Hospitals Coventry and Warwickshire who generously donated the samples that underpin this work. Behind every dataset is a person who chose to contribute to something larger than themselves, often at a difficult moment in their own care. I'm deeply grateful for that trust, and I carry it into every analytical decision. My hope is that this work, in time, helps build a clearer understanding of what makes the uterine environment receptive to implantation and contributes to more reproducible, more equitable reproductive healthcare for the patients who come next.
Single-cell extension: exhaustive parameter exploration over SCTransform, Seurat, LSI, Leiden clustering and trajectory inference for multi-omic time-series.
ML-assisted pipeline selection: training models on exhaustive runs to recommend high-quality pipelines for unseen datasets without re-running the search.
Gene regulatory network analysis: using optimised regulatory timelines to work out transitions between biological states.
Best-practice framework: establishing standards for integrating chromatin accessibility and transcriptional output in temporal contexts.