Methods

Data Collection and Experimental Setup

Biological Samples

Lung cells were grown under two conditions:
Control: Cells cultured in standard media (media-only).
PM Exposure: Cells treated with urban particulate matter (PM) at 125 µg/mL for 24 hours.

This experimental design simulates environmental stress and allows us to investigate whether PM exposure changes pseudouridine (Ψ) modification. Data were collected by Matthew Burroughs and Sean Engles.

Sequencing Platforms

Nanopore Direct RNA Sequencing:
This platform sequences native RNA molecules directly by reading changes in electrical current as each RNA strand passes through a nanopore.
It provides real-time, isoform-specific information and can detect modifications from the characteristic shifts they produce in the current signal.

BID‑Seq (Bisulfite-Induced Deletion Sequencing):
This method treats RNA with sodium bisulfite, which selectively forms adducts with pseudouridine residues; the adducts cause deletions during reverse transcription, and these deletion signatures mark Ψ sites in the sequencing reads.
BID‑Seq reports modifications at the genomic coordinate level, offering high specificity but less isoform resolution than Nanopore direct RNA sequencing.

Data Processing

Nanopore Data Processing

Input:
CSV files containing transcript-level data: transcript IDs, positions, bases, read counts, and modification probabilities.
Filtering:
High-confidence Ψ sites were retained by keeping only those with a modification probability ≥ 0.8.
Mapping Transcript IDs to Genes:
Transcript IDs were mapped to gene names using the GENCODE v38 GTF annotation file.
The mapping used the Python library gffutils [4].
A gffutils database was built from the GTF file and queried for the parent gene of each transcript.
Aggregation:
For each gene, the mean modification probability was computed separately for the PM and control samples to represent the gene-level signal.
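
The filtering, mapping, and aggregation steps above can be sketched as follows. This is a minimal illustration, not the code used in the analysis: it parses GTF attributes with the standard library instead of gffutils, and the CSV column names (transcript_id, mod_prob), the toy records, and the helper names are assumptions. It also assumes that transcript IDs in the CSV match the GTF exactly and that the per-gene mean is taken over the sites that pass the 0.8 filter.

```python
import csv
import io
import re
from collections import defaultdict
from statistics import mean

MOD_PROB_THRESHOLD = 0.8  # cutoff stated in the text


def transcript_to_gene_map(gtf_lines):
    """Build {transcript_id: gene_name} from GTF 'transcript' records."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        # Column 9 holds attributes such as: gene_name "TP53";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if "transcript_id" in attrs and "gene_name" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_name"]
    return mapping


def gene_level_means(rows, tx2gene, threshold=MOD_PROB_THRESHOLD):
    """Keep sites with mod_prob >= threshold, then average per gene."""
    per_gene = defaultdict(list)
    for row in rows:
        prob = float(row["mod_prob"])
        gene = tx2gene.get(row["transcript_id"])
        if prob >= threshold and gene is not None:
            per_gene[gene].append(prob)
    return {gene: mean(probs) for gene, probs in per_gene.items()}


# Toy inputs (hypothetical IDs and values, for illustration only).
gtf = [
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GENE_A";',
    'chr1\tHAVANA\ttranscript\t100\t700\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "GENE_A";',
    'chr2\tHAVANA\ttranscript\t50\t400\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; gene_name "GENE_B";',
]
csv_text = """transcript_id,position,base,read_count,mod_prob
T1,120,U,40,0.95
T2,310,U,25,0.85
T3,77,U,18,0.60
T3,90,U,30,0.90
"""

tx2gene = transcript_to_gene_map(gtf)
rows = csv.DictReader(io.StringIO(csv_text))
gene_means = gene_level_means(rows, tx2gene)
for gene, value in sorted(gene_means.items()):
    print(gene, round(value, 3))
```

Running the same function once on the PM file and once on the control file would give the two gene-level tables that are compared later.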

BID‑Seq Data Processing

Input:
Excel files, one per condition.
PM (Expressed Data): the “exp.ratio” value was used to quantify modification.
Control (Unexpressed Data): the “unex.ratio” value was used as the modification metric.
Aggregation:
BID‑Seq data were aggregated by computing the mean ratio per gene.
For both conditions, the gene names provided in the dataset were used as the grouping key.
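
As a rough sketch of this aggregation, the snippet below picks the metric for each condition and averages it per gene. The rows are assumed to have been read from the Excel files into dictionaries already; the gene column name ("gene"), the rule for skipping empty cells, and the toy values are assumptions, and only the metric names exp.ratio and unex.ratio come from the description above.

```python
from collections import defaultdict
from statistics import mean

# Metric used for each condition, as named in the text.
METRIC_BY_CONDITION = {"PM": "exp.ratio", "Control": "unex.ratio"}


def mean_ratio_per_gene(rows, condition, gene_column="gene"):
    """Average the condition's ratio over all sites of each gene."""
    metric = METRIC_BY_CONDITION[condition]
    per_gene = defaultdict(list)
    for row in rows:
        value = row.get(metric)
        if value is None or value == "":
            continue  # assumption: sites without a ratio are skipped
        per_gene[row[gene_column]].append(float(value))
    return {gene: mean(values) for gene, values in per_gene.items()}


# Toy rows (hypothetical genes and ratios).
pm_rows = [
    {"gene": "GENE_A", "exp.ratio": 0.40},
    {"gene": "GENE_A", "exp.ratio": 0.20},
    {"gene": "GENE_B", "exp.ratio": 0.10},
]
control_rows = [
    {"gene": "GENE_A", "unex.ratio": 0.25},
    {"gene": "GENE_C", "unex.ratio": None},
]

pm_means = mean_ratio_per_gene(pm_rows, "PM")
control_means = mean_ratio_per_gene(control_rows, "Control")
print(pm_means)
print(control_means)
```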

Cross-Platform Comparison

Within-Platform Analysis:
For each sequencing method, we determined the number of genes detected in each condition and the number shared between the two conditions.
This comparison highlights genes whose modifications are condition-specific.
Merging Data Across Platforms:
The gene-level data from both methods were merged using gene names as the common key.
This allowed a direct comparison of the modification signals (mean Nanopore mod_prob versus mean BID‑Seq ratio) for the shared genes.
Statistical Analysis:
Pearson correlation coefficients were computed to quantitatively assess the relationship between Nanopore and BID‑Seq signals.
Scatter plots were generated to visualize these relationships.
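
The comparison steps in this section can be illustrated with a short standard-library sketch. The gene-level values below are invented for the example, and the function names are hypothetical; the sketch only assumes that each platform's results are available as a mapping from gene name to mean signal. It counts genes per condition, inner-joins the two platforms on gene name, and computes the Pearson coefficient r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²).

```python
from math import sqrt


def condition_overlap(pm_genes, control_genes):
    """Count genes detected in each condition and in both."""
    pm, ctrl = set(pm_genes), set(control_genes)
    return {"pm": len(pm), "control": len(ctrl), "shared": len(pm & ctrl)}


def merge_on_gene(nanopore, bidseq):
    """Inner join of two {gene: value} mappings on gene name."""
    shared = sorted(nanopore.keys() & bidseq.keys())
    return [(gene, nanopore[gene], bidseq[gene]) for gene in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy gene-level signals (hypothetical values).
nanopore_pm = {"GENE_A": 0.90, "GENE_B": 0.85, "GENE_C": 0.95, "GENE_D": 0.88}
nanopore_control = {"GENE_A": 0.91, "GENE_D": 0.84, "GENE_F": 0.93}
bidseq_pm = {"GENE_A": 0.30, "GENE_B": 0.20, "GENE_C": 0.35, "GENE_E": 0.15}

overlap = condition_overlap(nanopore_pm, nanopore_control)
merged = merge_on_gene(nanopore_pm, bidseq_pm)
r = pearson_r([row[1] for row in merged], [row[2] for row in merged])
print(overlap)
print(merged)
print(round(r, 3))
```

An inner join is used here because only genes detected by both platforms have a pair of values to correlate; genes seen by a single platform drop out of the merged table.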

Visualization

Scatter Plots:
Separate scatter plots were generated for PM and control conditions to compare the modification metrics from the two platforms visually.

Code

Results and Discussion
