Galaxy and MEME

Using Galaxy and MEME to Process Sequencing Data

After performing in vitro selection, we can sequence our aptamer pools. We can then use software such as Galaxy and MEME to analyze the sequencing data and identify aptamer candidates. This page explains what Galaxy and MEME are and how they work.

Galaxy

Galaxy is a web-based platform for analyzing sequencing data and other biological data. It is freely available to the public and requires no computer science background. It can process data from next-generation sequencing (NGS).

Once the pools from SELEX have been sequenced, the reads are delivered in FASTQ format, and Galaxy then interprets that data for the user. Analysis in Galaxy goes through several steps: pre-processing, database searching, sorting, filtering, and analysis. The researcher then reviews the results.
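As a rough illustration of the pre-processing, sorting, and filtering steps (not Galaxy's own implementation), the Python sketch below trims each read to the region between two constant primer sequences, collapses identical sequences into counts, sorts them by abundance, and drops rare ones. The primer sequences `FWD_PRIMER` and `REV_PRIMER`, the function names, and the `min_count` threshold are all hypothetical; a real library would use its own constant regions. Ranking by abundance is an assumption about what the analysis step looks for, since sequences enriched over SELEX rounds are often treated as candidates.

```python
from collections import Counter
from typing import Optional

# Hypothetical constant regions flanking the random region of the library.
# A real SELEX library would use its own primer sequences.
FWD_PRIMER = "GGGAGA"
REV_PRIMER = "TCGACA"


def preprocess(read: str) -> Optional[str]:
    """Uppercase a read and keep only the region between the two primers.

    Returns None when either primer is missing, so the read is discarded.
    """
    read = read.strip().upper()
    start = read.find(FWD_PRIMER)
    end = read.rfind(REV_PRIMER)
    if start == -1 or end == -1 or end < start + len(FWD_PRIMER):
        return None
    return read[start + len(FWD_PRIMER):end]


def rank_candidates(reads: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Pre-process reads, count identical sequences, sort by abundance, filter."""
    trimmed = (preprocess(r) for r in reads)
    counts = Counter(s for s in trimmed if s)  # collapse identical sequences
    ranked = counts.most_common()              # sort: most abundant first
    return [(seq, n) for seq, n in ranked if n >= min_count]  # drop rare reads


reads = ["GGGAGAACGTACGTTCGACA", "gggagaACGTACGTtcgaca", "GGGAGATTTTTCGACA", "NNNN"]
print(rank_candidates(reads))
```

In this example the first two reads trim to the same sequence and are counted together, the third appears only once and is filtered out, and the last read lacks both primers and is discarded during pre-processing.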

The image on this page is a flow chart of these steps.

MEME

MEME is another program, used for motif analysis. MEME stands for Multiple EM for Motif Elicitation, where EM refers to expectation maximization, the statistical method it uses to find motifs. For each motif it finds, MEME also reports the probability that A, C, G, or T occurs at each position of the motif.
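The per-position base probabilities can be illustrated with a short Python sketch. It assumes the motif occurrences (sites) are already aligned and of equal length, which is a hypothetical input here: MEME itself discovers and aligns the occurrences with expectation maximization and also applies pseudocounts. The function simply counts how often each base appears at each position and divides by the number of sites.

```python
def probability_matrix(sites: list[str], alphabet: str = "ACGT") -> list[dict[str, float]]:
    """For each motif position, return the fraction of sites carrying each base."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and all the same length")
    n = len(sites)
    matrix = []
    for i in range(len(sites[0])):
        column = [s[i].upper() for s in sites]  # bases seen at position i
        matrix.append({base: column.count(base) / n for base in alphabet})
    return matrix


# Hypothetical aligned occurrences of a four-base motif.
sites = ["ACGT", "ACGA", "TCGT", "ACGT"]
m = probability_matrix(sites)
for position, column in enumerate(m):
    print(position, column)
```

Each row of the result sums to 1, so it can be read directly as "the probability of A, C, G, or T at this position of the motif."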

You input a set of sequences, and MEME outputs the motifs they share. The motifs do not need to be in the same position in each sequence: MEME can identify a common motif even when it occurs at different places in the respective sequences.
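The idea that a motif can sit at a different position in each sequence can be illustrated by scanning. The Python sketch below assumes a motif has already been described as a probability matrix (one dictionary of base probabilities per position; the example matrix is hypothetical) and slides it along each sequence, scoring every window with a log-odds score against a uniform background of 0.25 per base. It only locates a known motif and is not MEME's discovery algorithm; the pseudocount value is an arbitrary assumption used to avoid taking the logarithm of zero.

```python
import math
from typing import Optional


def best_site(seq: str, matrix: list[dict[str, float]],
              pseudo: float = 0.01) -> Optional[tuple[int, float]]:
    """Return (offset, log-odds score) of the window that best matches the motif.

    Returns None if the sequence is shorter than the motif.
    """
    width = len(matrix)
    best: Optional[tuple[int, float]] = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        score = 0.0
        for column, base in zip(matrix, window):
            # Pseudocount smoothing (assumes a 4-letter alphabet), so unseen bases avoid log(0).
            p = (column.get(base, 0.0) + pseudo) / (1.0 + 4 * pseudo)
            score += math.log2(p / 0.25)  # log-odds vs. uniform background
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical motif: a perfectly conserved "ACGT".
motif = [{b: (1.0 if b == c else 0.0) for b in "ACGT"} for c in "ACGT"]
seqs = ["TTACGTTT", "ACGTGGGG", "GGGGGACGT"]
for s in seqs:
    print(s, best_site(s, motif))
```

The same motif is found at offsets 2, 0, and 5 (counting from 0) in the three sequences, which is the situation MEME is designed to handle.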

Because shared motifs can reflect shared structure or function, MEME results can help point to similar structures as well as similar biological functions among a number of different sequences.

The second image on this page shows one way to interpret MEME data. It displays the frequency of each nucleotide base (A, T, G, or C) at each position in the motif.
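A frequency table like this is often summarized per position by its most frequent (consensus) base and its information content in bits, which is how many sequence-logo tools set the height of each stack of letters. The Python sketch below computes both for a single position, assuming the position is given as a dictionary of base probabilities that sum to 1. It omits the small-sample correction that some tools apply, and `summarize_position` is an illustrative name.

```python
import math


def summarize_position(column: dict[str, float]) -> tuple[str, float]:
    """Return the consensus base and information content (bits) of one position.

    Information content = log2(alphabet size) - Shannon entropy of the column,
    so a perfectly conserved DNA position scores 2 bits and a uniform one 0 bits.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    consensus = max(column, key=column.get)  # first base with the highest frequency
    return consensus, math.log2(len(column)) - entropy


print(summarize_position({"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}))
print(summarize_position({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}))
print(summarize_position({"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5}))
```

A position where one base dominates gets a tall stack in a logo, while a position where all four bases are equally likely carries no information and is nearly invisible.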

Motif

A motif is a particular sequence pattern found in aptamers; in other words, a sequence that is repeated across multiple aptamers. Aptamers grouped by motif can show significant sequence similarity in their single-stranded regions. Motifs can also vary in length and location. When two sequences share a number of common motifs, this tends to indicate some kind of relationship between them.
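The idea that shared motifs suggest a relationship can be illustrated with a deliberately simple measure: the set of short exact subsequences of length k (k-mers) that two aptamer sequences have in common. This Python sketch is a simplification chosen for illustration, not how MEME works; MEME's motifs are probabilistic and tolerate variation, whereas this comparison only finds exact matches. The value of k and the example sequences are assumptions.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a: str, b: str, k: int = 5) -> set[str]:
    """k-mers present in both sequences, wherever they occur."""
    return kmers(a, k) & kmers(b, k)


# Hypothetical aptamer sequences.
a = "GGATCCGTACGA"
b = "TTGTACGAAA"
print(sorted(shared_kmers(a, b)))
```

Both shared 5-mers come from the stretch GTACGA, which starts at offset 6 in the first sequence and offset 2 in the second (counting from 0), echoing the point that motifs can appear at different locations.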

FASTQ

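FASTQ is a text format that stores biological sequences together with a quality score for each base. These scores help the researcher judge how accurate, and therefore how useful, their sequencing data is. FASTQ is a standard way to store data from high-throughput sequencing, such as Illumina sequencing. Each FASTQ record is four lines long, and each line gives the person interpreting the data different information: the first line begins with "@" followed by the sequence identifier, the second line is the sequence itself, the third line begins with "+" (optionally followed by the identifier again), and the fourth line holds the quality scores, encoded as one character per base in the sequence.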
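The four-line layout can be read with a short Python sketch. It assumes each record occupies exactly four lines (no line wrapping) and that quality characters use the Phred+33 encoding used by Sanger and by Illumina 1.8 and later, where the score Q is the character's ASCII code minus 33 and corresponds to an error probability of 10^(-Q/10). `FastqRecord` and `read_fastq` are illustrative names, not a library API.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]  # Phred scores, one per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"quality length does not match sequence for {header!r}")
        # Phred+33: score = ASCII code - 33; error probability = 10 ** (-score / 10)
        yield FastqRecord(header[1:], sequence, [ord(c) - 33 for c in quality])


example = io.StringIO("@read1\nACGT\n+\nII#5\n")
records = list(read_fastq(example))
print(records)
```

For example, the quality string `II#5` decodes to scores of 40, 40, 2, and 20, so the third base is the least reliable.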

Nucleotide

A nucleotide is an organic molecule made of a phosphate group, a sugar, and a nitrogen-containing base. In DNA, the base is guanine (G), cytosine (C), adenine (A), or thymine (T); in RNA, uracil (U) takes the place of thymine. Nucleotides are the basic building blocks of DNA and RNA.

Relevant Sources

Analyzing HT-SELEX data with the Galaxy Project tools--A web based bioinformatics platform for biomedical research

Thiel WH, Giangrande PH. Analyzing HT-SELEX data with the Galaxy Project tools--A web based bioinformatics platform for biomedical research. Methods. 2016;97:3-10. doi:10.1016/j.ymeth.2015.10.008

This paper is about Galaxy. It describes the development and uses of the Galaxy Project tools for analyzing HT-SELEX data and explains what several of these tools do. It is a great source for learning the program, as it gives step-by-step instructions for using the tools to obtain the results the user is looking for.

Manipulation of FASTQ data with Galaxy

Blankenberg D, Gordon A, Von Kuster G, Coraor N, Taylor J, Nekrutenko A; Galaxy Team. Manipulation of FASTQ data with Galaxy. Bioinformatics. 2010;26:1783-1785.

This paper discusses using Galaxy to manipulate and analyze FASTQ data from NGS. Galaxy can give the user summary statistics for each nucleotide position in the reads, and it can filter reads by quality when the user specifies a minimum and maximum quality score. This helps with quality control.
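The per-position statistics and minimum/maximum quality filtering described in this paper can be sketched in Python as follows. This is not Galaxy's code: it assumes the reads have already been parsed into (sequence, list of Phred scores) pairs, and it keeps a read only if every base's score falls within the chosen range, which is one possible filtering rule. The function names and the thresholds in the example are hypothetical.

```python
from statistics import mean


def per_position_mean_quality(qualities: list[list[int]]) -> list[float]:
    """Mean Phred score at each read position; shorter reads stop contributing."""
    width = max(len(q) for q in qualities)
    return [mean(q[i] for q in qualities if len(q) > i) for i in range(width)]


def filter_by_quality(reads: list[tuple[str, list[int]]],
                      min_q: int, max_q: int) -> list[tuple[str, list[int]]]:
    """Keep reads whose every base score lies within [min_q, max_q]."""
    return [(seq, q) for seq, q in reads if all(min_q <= x <= max_q for x in q)]


# Hypothetical reads with already-decoded quality scores.
reads = [("ACGT", [30, 32, 35, 30]), ("ACGA", [30, 10, 35, 30]), ("ACG", [36, 30, 20])]
print(per_position_mean_quality([q for _, q in reads]))
print(filter_by_quality(reads, min_q=20, max_q=41))
```

With these thresholds the second read is removed because one of its bases has a score of 10, while the other two reads pass.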
