The analysis begins with a quality assessment of the trimmed FASTQ files using FastQC, which generates an HTML report for each sample covering metrics such as per-base sequence quality, adapter contamination, and GC content. MultiQC then consolidates these reports into a single summary, making it easier to spot potential issues (e.g., low-quality reads or biases) across the 8 sample groups. This step checks data quality before proceeding to alignment.
Trimmed reads are aligned to the mouse reference genome (GRCm39 from NCBI) using the STAR aligner. Key steps include:
Genome Indexing: Creating a custom genome index with STAR’s genomeGenerate mode using the FASTA and GTF annotation files.
Alignment: Mapping decompressed FASTQ files to the indexed genome, generating sorted BAM files.
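The two STAR steps can be sketched as command builders. This is a minimal illustration, not the exact invocation used in this analysis: the thread count, the sjdbOverhang value (100 is STAR's default), and all file and directory names are placeholder assumptions, and the helper function names are hypothetical.

```python
from typing import List


def star_index_cmd(genome_dir: str, fasta: str, gtf: str,
                   threads: int = 8, sjdb_overhang: int = 100) -> List[str]:
    """Build the STAR command that generates a genome index.

    threads and sjdb_overhang are assumed values, not taken from the analysis.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang),
    ]


def star_align_cmd(genome_dir: str, fastq_files: List[str],
                   out_prefix: str, threads: int = 8) -> List[str]:
    """Build the STAR command that aligns reads and writes a sorted BAM.

    fastq_files holds one file (single-end) or two files (paired-end);
    the files must be decompressed because no --readFilesCommand is set.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastq_files,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


# Placeholder paths, for illustration only.
print(" ".join(star_index_cmd("star_index", "genome.fa", "annotation.gtf")))
print(" ".join(star_align_cmd("star_index", ["sample1.fastq"], "sample1_")))
```

Each list can be passed to subprocess.run(cmd, check=True) on a machine where STAR is installed. Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names.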
The aligned BAM files are quantified at the gene level using featureCounts from the Rsubread package in R. This tool counts reads that overlap annotated genes (based on the GTF file), producing a raw count matrix. A metadata table links sample IDs to experimental conditions (d0_24h_0nm vs. d0_24h_100nm). Merging the per-sample counts into a unified matrix requires care to avoid duplicate column names.
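The merging step and its duplicate-column check can be illustrated with a short sketch. The analysis itself performs this in R; the Python below is only a language-neutral illustration of the logic, and the function name, the sample IDs, and the gene names are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_counts(
    per_sample: List[Tuple[str, Dict[str, int]]]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Merge per-sample gene counts into one matrix (gene -> row of counts).

    Raises ValueError if a sample ID appears more than once, since that
    would produce duplicate column names in the merged matrix.
    """
    sample_ids = [sample_id for sample_id, _ in per_sample]
    duplicates = sorted({s for s in sample_ids if sample_ids.count(s) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample IDs: {duplicates}")

    # Union of all genes; a gene absent from a sample is filled with 0.
    # (Assumption: with one shared GTF, every sample reports the same genes.)
    genes = sorted({gene for _, counts in per_sample for gene in counts})
    matrix = {
        gene: [counts.get(gene, 0) for _, counts in per_sample]
        for gene in genes
    }
    return sample_ids, matrix


columns, matrix = merge_counts([
    ("sample_A", {"GeneA": 5, "GeneB": 0}),
    ("sample_B", {"GeneA": 7, "GeneB": 3}),
])
print(columns)  # ['sample_A', 'sample_B']
print(matrix)   # {'GeneA': [5, 7], 'GeneB': [0, 3]}
```

Failing loudly on a repeated sample ID is a deliberate choice here: silently renaming or overwriting a column would break the link between the count matrix and the metadata table.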
The count matrix and metadata are imported into DESeq2 for differential expression analysis between treatment groups. Genes with low expression (total counts ≤ 10) are removed to reduce noise. DESeq2's median-of-ratios method normalizes the counts to account for differences in library size. Log2 fold changes and adjusted p-values are calculated for the contrast of d0_24h_100nm against d0_24h_0nm. The results, annotated with gene names, are saved as a CSV file.
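To make the filtering and normalization steps concrete, here is a minimal pure-Python sketch of the low-count filter and of median-of-ratios size factors. It illustrates the idea only and is not DESeq2's implementation; the function names and toy counts are hypothetical, and "total counts" is assumed to mean the sum across all samples.

```python
import math
from statistics import median
from typing import Dict, List


def filter_low_counts(counts: Dict[str, List[int]],
                      min_total: int = 10) -> Dict[str, List[int]]:
    """Drop genes whose summed count across samples is <= min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) > min_total}


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors, one per sample.

    For each gene with no zero counts, take the log ratio of each sample's
    count to the gene's geometric mean; a sample's size factor is the
    exponentiated median of its log ratios. Raises StatisticsError if
    every gene contains a zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue  # geometric mean would be zero; skip this gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
toy = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [3, 4]}
kept = filter_low_counts(toy)      # g4 is removed (total 7 <= 10)
sf = size_factors(kept)
print([round(s, 3) for s in sf])   # [0.707, 1.414]
normalized = {g: [c / s for c, s in zip(row, sf)] for g, row in kept.items()}
```

Dividing each sample's counts by its size factor puts the two toy samples on the same scale: after normalization, every retained gene has equal values in both columns.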
The results are visualized to highlight key biological insights:
Volcano Plot: Displays significantly upregulated/downregulated genes (padj < 0.05, |log2FC| > 1), with top candidates labeled.
PCA Plot: Visualizes sample clustering based on expression variation, colored by treatment group.
Gene-Specific Boxplots: Illustrate expression differences for specific genes across conditions.
All plots are saved as publication-ready PNG files.
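The significance thresholds used for the volcano plot (padj < 0.05, |log2FC| > 1) amount to a three-way classification of genes. The sketch below shows that rule in isolation. The function name and category labels are hypothetical, and treating a missing adjusted p-value as not significant is an assumption (DESeq2 can report NA for padj).

```python
import math
from typing import Optional


def classify_gene(log2fc: float, padj: Optional[float],
                  padj_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> str:
    """Label a gene 'up', 'down', or 'ns' (not significant).

    Both cutoffs are strict inequalities, matching padj < 0.05 and
    |log2FC| > 1. A missing or NaN padj is treated as not significant.
    """
    if padj is None or math.isnan(padj) or padj >= padj_cutoff:
        return "ns"
    if log2fc > lfc_cutoff:
        return "up"
    if log2fc < -lfc_cutoff:
        return "down"
    return "ns"


print(classify_gene(2.3, 0.001))          # up
print(classify_gene(-1.5, 0.01))          # down
print(classify_gene(0.4, 0.001))          # ns (fold change too small)
print(classify_gene(3.0, float("nan")))   # ns (no adjusted p-value)
```

The resulting labels can then drive point colors in the plot, with the "up" and "down" genes ranked by padj to choose which ones to label.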