Real-World Applications

RNA Sequencing Combined with Data Science - JGI Experiment

In this experiment, Jose and Sharon challenged us to observe and analyze data from two samples of algae exposed to normal sun and artificial light, and to find out which genes were turned on under exposure to stronger light. We combined basic Python code, the pandas library, and slicing techniques to develop code that analyzes the data.

Step 1: The algae store energy from the sun, and it is important for scientists to know which genes are expressed when the algae are exposed to high light and how the algae react to different environments and conditions.

Step 2: We needed to download the RNA-seq data for the algae. To do this, we downloaded a text file containing a matrix of data for many samples of the algae C. zofingiensis. We then read the data by importing pandas ("import pandas") and calling its "read_csv" function.
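A minimal sketch of this reading step is below. It is not the exact code from the experiment: the file is replaced by an in-memory stand-in, the gene IDs, sample names, and counts are made up, and the tab-separated layout is an assumption about the text file.

```python
import io
import pandas as pd

# Stand-in for the downloaded text file. The gene IDs, sample names, and
# counts here are invented for illustration; they are not real data.
fake_file = io.StringIO(
    "gene_id\tML_0.5h\tHL_0.5h\tML_1h\tHL_1h\n"
    "Cz01g00010\t12\t30\t11\t45\n"
    "Cz01g00020\t0\t2\t1\t3\n"
    "Cz02g00010\t250\t240\t260\t255\n"
)

# sep="\t" assumes the file is tab-separated; index_col=0 uses the first
# column (the gene IDs) as the row labels.
rna_data = pd.read_csv(fake_file, sep="\t", index_col=0)

print(rna_data.shape)   # number of genes (rows) and samples (columns)
print(rna_data.head())  # first few rows of the matrix
```

With a real file, the path to the downloaded text file would go in place of "fake_file".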

Step 3: We then sliced the data using "rna_data.iloc[0:10,0:10]". This gave us the first 10 rows and first 10 columns of the data set, which let us look at the data more closely. The gene IDs (row headers) all have the form 'Cz' followed by a number, then the letter 'g', then another number. 'Cz' indicates that the gene is from the species Chromochloris zofingiensis, the first number indicates which chromosome the gene is on, and the second number is a randomly assigned ID number.
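The slicing and the gene ID pattern described above can be sketched as follows. The small table is made up so the example runs on its own, and "parse_gene_id" is a hypothetical helper written for this illustration, not something pandas provides.

```python
import re
import pandas as pd

# Small made-up matrix standing in for the real RNA-seq data.
rna_data = pd.DataFrame(
    {"ML_0.5h": [12, 0, 250], "HL_0.5h": [30, 2, 240], "ML_1h": [11, 1, 260]},
    index=["Cz01g00010", "Cz01g00020", "Cz02g00010"],
)

# iloc selects by position. The end of each range is excluded, so 0:2
# means rows 0 and 1 and columns 0 and 1 (0:10 would give the first 10).
subset = rna_data.iloc[0:2, 0:2]

# 'Cz', then the chromosome number, then 'g', then the ID number.
GENE_ID = re.compile(r"^Cz(\d+)g(\d+)$")

def parse_gene_id(gene_id):
    """Split a gene ID into chromosome and ID number, or return None."""
    match = GENE_ID.match(gene_id)
    if match is None:
        return None
    return {"chromosome": int(match.group(1)), "gene_number": int(match.group(2))}

print(subset)
for gene in subset.index:
    print(gene, parse_gene_id(gene))
```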

The sample names (column headers) tell us whether each algae sample was grown in medium light (ML) or high light (HL), and how many hours the algae were in the light before a sample was collected (e.g., 0.5h, 1h, 12h).
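As a rough illustration, a sample name could be split into its light condition and time like this. The exact name format used here (for example "HL_0.5h", with an underscore between the parts) is an assumption, and "parse_sample_name" is a hypothetical helper; the real column headers may be laid out differently.

```python
import re

# Assumed format: light condition (ML or HL), an underscore, the number
# of hours, then 'h'. This layout is a guess made for the example.
SAMPLE_NAME = re.compile(r"^(ML|HL)_(\d+(?:\.\d+)?)h$")

def parse_sample_name(name):
    """Return (light condition, hours in the light) for one sample name."""
    match = SAMPLE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unrecognized sample name: {name}")
    light = "medium light" if match.group(1) == "ML" else "high light"
    return light, float(match.group(2))

for name in ["ML_0.5h", "HL_1h", "HL_12h"]:
    print(name, parse_sample_name(name))
```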

Conclusion:

This experiment gave me exposure to what working in the field of Environmental Genomics can look like. We used data science tools and coding techniques that Ajayi taught us to work through the beginning steps of analyzing the data. In the future, I would like to continue working on this!
