(Unless otherwise indicated, the definitions are taken from Wikipedia.)

Dice coefficient
Twice the size of the intersection of two sets, divided by the sum of the sizes of the two sets. For strings compared by bigrams: 2 n_t / (n_x + n_y), where n_t is the number of bigrams found in both strings, and n_x and n_y are the number of bigrams in each string.

Gene / Proteins
In cells, a gene is a portion of an organism's DNA which contains both "coding" sequences that determine what the gene does, and "non-coding" sequences that determine when the gene is active (expressed). When a gene is active, the coding and non-coding sequences are copied in a process called transcription, producing an RNA copy of the gene's information. This piece of RNA can then direct the synthesis of proteins via the genetic code. In other cases, the RNA is used directly, for example as part of the ribosome. The RNA may undergo special post-transcriptional processing
steps required to convert it into a mature, functional form. These
molecules resulting from gene expression, whether RNA or protein, are
known as gene products, and are responsible for the development and functioning of all living things.

Genomics
Genomics is the study of an organism's entire genome. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts.

Molecular biology
Molecular biology is the study of biology at a molecular level.

Functional genomics
Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions. Unlike genomics and proteomics, functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures.

Performance measures (Raychaudhuri)
Sensitivity = Recall = TP / (TP + FN). Sensitivity is the same as recall: of all the true positives that exist, the fraction that is retrieved.
Specificity = TN / (TN + FP). Specificity is the fraction of the actual negatives that are correctly identified as negatives.
Accuracy = (TP + TN) / (TP + TN + FP + FN). Accuracy is the fraction of all cases, positive and negative, that are classified correctly.
Precision = TP / (TP + FP). Precision is the fraction of the predicted positives that are actually positive.

DNA sequencing
The term DNA sequencing encompasses biochemical methods for determining the order of the nucleotide bases, adenine, guanine, cytosine, and thymine, in a DNA oligonucleotide. The sequence of DNA constitutes the heritable genetic information in nuclei, plasmids, mitochondria, and chloroplasts that forms the basis for the developmental programs of all living organisms. Determining the DNA sequence is therefore useful in basic research studying fundamental biological processes, as well as in applied fields such as diagnostic or forensic research.
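
As an illustration of the bigram overlap measure 2 n_t / (n_x + n_y) defined at the start of these notes, here is a minimal Python sketch. The function names are hypothetical, and counting repeated bigrams as a multiset (rather than as a plain set) is an assumption, since the notes do not say how duplicates should be handled.

```python
from collections import Counter


def bigrams(s):
    """Return the list of adjacent character pairs in s."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_coefficient(x, y):
    """Compute 2 * n_t / (n_x + n_y) over the character bigrams of x and y.

    Assumption: repeated bigrams are counted with multiplicity.
    """
    bx, by = Counter(bigrams(x)), Counter(bigrams(y))
    n_x, n_y = sum(bx.values()), sum(by.values())
    if n_x + n_y == 0:
        # Both strings are too short to contain a bigram.
        return 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    n_t = sum((bx & by).values())
    return 2.0 * n_t / (n_x + n_y)


# "night" and "nacht" share only the bigram "ht": 2 * 1 / (4 + 4)
print(dice_coefficient("night", "nacht"))  # 0.25
```

The comparison is case-sensitive as written; lowercasing both strings first is a common but optional choice.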
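
The four performance measures listed above (sensitivity, specificity, accuracy, precision) can be computed directly from the counts of a confusion matrix. The sketch below is only an illustration: the function name and the choice to return NaN when a denominator is zero are assumptions, not something the notes specify.

```python
def performance_measures(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and precision from raw counts."""

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio (empty denominator) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),            # same as recall
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical counts, chosen only to exercise the formulas.
measures = performance_measures(tp=40, tn=45, fp=5, fn=10)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```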

Protein sequencing
Proteins are found in every cell and are essential to every biological process. Protein structure is very complex: determining a protein's structure involves first protein sequencing (determining the amino acid sequences of its constituent peptides), and also determining what conformation it adopts and whether it is complexed with any non-peptide molecules.

Genome
In biology the genome of an organism is its whole hereditary information and is encoded in the DNA (or, for some viruses, RNA). This includes both the genes and the non-coding sequences of the DNA.

Genetic code
The genetic code is the set of rules by which information encoded in genetic material (DNA or RNA sequences) is translated into proteins (amino acid sequences) by living cells. Specifically, the code defines a mapping between tri-nucleotide sequences called codons, and amino acids; every triplet of nucleotides in a nucleic acid sequence specifies a single amino acid.

Gene expression
Gene expression is the process by which inheritable information from a gene, such as the DNA sequence, is made into a functional gene product, such as protein or RNA.
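
The codon-to-amino-acid mapping described under "Genetic code" can be pictured as a lookup table read three nucleotides at a time. The sketch below is a hedged illustration: the table is a deliberately small subset of the standard genetic code, the names CODON_TABLE and translate are hypothetical, and the handling of stop codons (which end translation instead of encoding an amino acid) goes beyond what the entry above states.

```python
# Partial codon table (RNA codons -> one-letter amino acid codes).
# Assumption: only a handful of codons from the standard genetic code are
# listed, enough for the example; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",                                       # methionine (start)
    "UUU": "F", "UUC": "F",                           # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",   # glycine
    "AAA": "K", "AAG": "K",                           # lysine
    "UGG": "W",                                       # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",               # stop codons
}


def translate(rna):
    """Translate an RNA string codon by codon until a stop codon or the end."""
    protein = []
    # Step through complete triplets only; trailing bases are ignored.
    for i in range(0, len(rna) - len(rna) % 3, 3):
        codon = rna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon!r} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGCAAAUAA"))  # MFGK
```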

Measure gene expression
This means measuring the amount of RNA produced (under certain conditions). The amount of RNA, the well-known intermediary, largely determines how much protein, the cell's workhorse, will be present under those conditions. Microarrays measure the amount of RNA for thousands of genes, under a given condition. (Martín Graña)
The expression of many genes is regulated after transcription (e.g., by microRNAs or ubiquitin ligases),
so an increase in mRNA concentration need not always increase
expression. Nevertheless, mRNA levels can be quantitatively measured by
Northern blotting, a process in which a sample of RNA is separated on an agarose gel and hybridized to a radio-labeled RNA probe that is complementary to the target sequence.

Homologous
[Anatomical structures that perform the same function in different biological species and evolved from the same structure in some ancestor species are homologous. In genetics, homology can be observed in DNA sequences that code for proteins (genes) and in noncoding DNA. For protein coding genes, one can compare translated amino-acid sequences of different genes. Sequence homology may also indicate common function.]

SWISS-PROT
Swiss-Prot is a manually curated biological database of protein sequences.

Yeast
Levadura (Spanish).

BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene.

PSI-BLAST
This program is used to find distant relatives of a protein. First, a list of all closely related proteins is created. These proteins are combined into a general "profile" sequence, which summarises significant features present in these sequences. A query against the protein database is then run using this profile, and a larger group of proteins is found. This larger group is used to construct another profile, and the process is repeated. By including related proteins in the search, PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.

Aberrantly expressed gene
If this is a gene with known behavior (i.e., expression levels, in this context), they must mean that it is very [...]

Promoter site
In biology, a promoter is a regulatory region of DNA generally located upstream (towards the 5' region of the sense strand) of a gene that allows transcription of the gene.

Metabolic pathway
In biochemistry, a metabolic pathway is a series of chemical reactions occurring within a cell. In each pathway, a principal chemical is modified by chemical reactions.

Microarray chips
Designing a microarray chip requires a sequenced genome (or, failing that, a list of gene fragments for a more modest microarray). Having the genome makes it possible to define the probes that capture the genes of interest. In your example, they probably chose those 5000-odd genes and manufactured probes to measure the expression of that group under several conditions. Each "spot" is specific to one gene, and can detect that gene in a sample of total DNA (e.g. to compare organisms in terms of presence/absence of a gene) or total RNA (as in your case, to measure the expression of the group of genes under different conditions). (MG)

Cosine metric
(x · y) / (||x|| ||y||). Computes how close two vectors are (for example, vectors of words). It is a similarity, so larger values mean closer vectors.

tf-idf
The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query.
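
The last two entries are often used together: documents are turned into tf–idf weighted vectors, and pairs of vectors are compared with the cosine metric. The sketch below shows one way to do this under stated assumptions: the function names are hypothetical, tf is taken as the term count divided by the document length, and idf as log(N / df), which is only one of several common variants.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs is a list of token lists; returns one {term: weight} dict per document.

    Assumption: tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    document_frequency = Counter()
    for doc in docs:
        document_frequency.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(x, y):
    """Return (x . y) / (||x|| ||y||) for sparse vectors stored as dicts."""
    dot = sum(weight * y.get(term, 0.0) for term, weight in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_x * norm_y)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["gene", "expression", "gene"],
    ["gene", "protein"],
    ["yeast", "protein"],
]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

With this idf variant, a term that appears in every document gets weight zero, so it does not contribute to the similarity at all.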