The library consisted of 7,500 oligonucleotides with unique tags, representing 2,500 unique STR alleles. Each allele was represented three times, each time with a different tag, both to ensure that every allele remains represented in the library after cloning and to support error analysis. The filler region is a pre-screened sequence added where needed to keep every synthesized sequence the same length.
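The tagging and padding scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 10-base tag length, and the allele + filler + tag layout are hypothetical, and the real oligonucleotides also carry primer and restriction-site sequences that are omitted here.

```python
import random

BASES = "ACGT"

def make_unique_tags(n_tags, tag_len, seed=0):
    """Draw n_tags distinct random DNA tags of length tag_len."""
    rng = random.Random(seed)
    tags = set()
    while len(tags) < n_tags:
        tags.add("".join(rng.choice(BASES) for _ in range(tag_len)))
    return sorted(tags)

def build_oligos(alleles, filler, total_len, tag_len=10, copies=3):
    """Pair every allele with `copies` unique tags and pad to a fixed length.

    Assumed (hypothetical) layout: allele + filler prefix + tag.
    """
    tags = make_unique_tags(len(alleles) * copies, tag_len)
    oligos = []
    for i, allele in enumerate(alleles):
        pad = total_len - tag_len - len(allele)
        if pad < 0 or pad > len(filler):
            raise ValueError(f"allele {i} does not fit the fixed length")
        # Each allele receives its own block of `copies` tags.
        for tag in tags[i * copies:(i + 1) * copies]:
            oligos.append({
                "allele": i,
                "tag": tag,
                "sequence": allele + filler[:pad] + tag,
            })
    return oligos
```

With 2,500 alleles and `copies=3`, this produces 7,500 tagged sequences of identical length, matching the library size described above.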
The plasmid backbone (pMPRA1) was designed by Melnikov et al. and is available from Addgene [6]. The F1 and R1 primers were used in emulsion PCR to amplify the library and to add SfiI restriction sites to it. Because the plasmid backbone also contains two SfiI recognition sequences, only one restriction enzyme digest is required to combine the library with the plasmid backbone.
SfiI digestion and ligation produced the “mid” construct, a plasmid vector library containing our sequences of interest. The mid construct was transformed into bacteria and then extracted using an endotoxin-free midi-prep kit.
A plasmid containing the reporter gene with a minimal promoter is also available from Melnikov et al. and can be found on Addgene as pMPRAdonor2 [6]. The reporter plasmid also contains KpnI and XbaI restriction sites. The plasmid library and the reporter plasmid each underwent restriction digest and were then ligated together to give the “full” construct used for transfection.
HEK 293T cells were transfected with the full construct library and allowed to recover for a few days to resume growth and gene expression. The cells were then lysed, and their RNA was extracted and pooled into a single RNA library. The pooled RNA was converted to cDNA with a reverse transcriptase kit for sequencing.
The tags in the cDNA generated in the previous sub-project were counted using next-generation sequencing. To analyze effects on gene expression, the tags from the plasmid library used in the transfection (referred to as gDNA) were also counted.
Every primary step of this design was validated before we continued to the next step of the process. The validation at each stage followed a similar series of steps:
Gel electrophoresis of the prepared sequencing libraries to validate proper size after PCR.
Sequencing of the libraries with an iSeq 100 instrument.
Align FASTQs to a custom reference genome made up of our initial oligonucleotide pool variance region sequences.
Note: This step was only done for the initial Agilent library and mid construct.
Create replicate plots to find bias by comparing log tag counts of replicates in the DNA FASTQ sequences.
Count the alleles that drop out of our sequences.
Identify potential bias based on the length of the motif in the short tandem repeat.
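The last two validation steps (counting dropouts and checking for motif-length bias) could be combined along these lines. This is a minimal sketch, not the pipeline actually used: the function name, the input dictionaries, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def dropout_by_motif_length(designed, observed_counts, min_count=1):
    """Summarize dropout per STR motif length.

    designed:        dict mapping tag -> motif length in bp (hypothetical input)
    observed_counts: dict mapping tag -> read count from sequencing
    min_count:       a tag with fewer reads than this counts as dropped out

    Returns a dict: motif length -> (n_designed, n_missing, fraction_missing).
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    for tag, motif_len in designed.items():
        totals[motif_len] += 1
        if observed_counts.get(tag, 0) < min_count:
            missing[motif_len] += 1
    return {
        m: (totals[m], missing[m], missing[m] / totals[m])
        for m in sorted(totals)
    }
```

A markedly higher missing fraction for one motif length than for the others would point to a length-dependent bias at that stage.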
The above graphs compare allele replicate tag counts for each plasmid construct. Ideally we would see a strong positive correlation, which indicates high experimental reproducibility. A random cloud of points indicates either low reproducibility or too much noise, which can be reduced by increasing the number of reads.
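To put a number on the trend seen in a replicate plot, one option is the Pearson correlation of log-transformed tag counts. The sketch below assumes that two replicates are supplied as equal-length lists of counts in the same allele order, and that a pseudocount of 1 is added before taking the log. Both choices are illustrative assumptions rather than the analysis settings actually used.

```python
import math

def log_count_correlation(rep_a, rep_b, pseudocount=1):
    """Pearson r between log2(count + pseudocount) of two replicates."""
    if len(rep_a) != len(rep_b) or len(rep_a) < 2:
        raise ValueError("replicates must have equal length of at least 2")
    xs = [math.log2(c + pseudocount) for c in rep_a]
    ys = [math.log2(c + pseudocount) for c in rep_b]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant counts")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 corresponds to the tight diagonal expected for reproducible replicates, while a value near 0 corresponds to the random cloud of points described above.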
Most sequence dropout occurs at the cloning step, when the pool is inserted into the plasmid construct: approximately 34.7% of sequences are lost between those stages (about 65.3% retention). After that, retention stays roughly constant, with 62.7% retention found in the full construct.
Gene expression is quantified as the ratio of the tag count in the cDNA sample to the tag count in the gDNA sample. Example results expected from sequencing at sufficient depth are shown here.
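The ratio could be computed per tag roughly as shown below. This sketch makes assumptions that the text does not state: counts are normalized to counts per million within each sample, a pseudocount of 1 is added to avoid division by zero, and the result is reported on a log2 scale. The function and parameter names are hypothetical.

```python
import math

def expression_log2_ratios(cdna_counts, gdna_counts, pseudocount=1.0):
    """Per-tag log2(cDNA / gDNA) after counts-per-million normalization.

    cdna_counts, gdna_counts: dicts mapping tag -> read count.
    Tags with zero gDNA reads are skipped, since they were not detected
    in the plasmid library and their ratio is not informative.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    cdna_total = sum(cdna_counts.values())
    gdna_total = sum(gdna_counts.values())
    if cdna_total == 0 or gdna_total == 0:
        raise ValueError("both samples need at least one read")
    ratios = {}
    for tag, g in gdna_counts.items():
        if g == 0:
            continue
        c = cdna_counts.get(tag, 0)
        c_cpm = (c + pseudocount) / cdna_total * 1e6
        g_cpm = (g + pseudocount) / gdna_total * 1e6
        ratios[tag] = math.log2(c_cpm / g_cpm)
    return ratios
```

Because each allele carries three tags, the per-tag values for an allele can then be summarized (for example, by their median) to give one expression estimate per allele.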