Consensus sequence

In molecular biology and bioinformatics, a consensus sequence is the sequence formed by taking the most common nucleotide or amino acid at each position after multiple sequences are aligned. It is a way of representing the results of a multiple sequence alignment, in which related sequences are compared with each other and similar functional sequence motifs are found. The consensus sequence shows which residues are most abundant in the alignment at each position.
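The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `consensus`, the alphabetical tie-breaking rule, and the toy alignment are all assumptions made for the example, and gaps are not handled.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent residue in each column of an alignment.

    `aligned` is a list of equal-length strings (hypothetical input format).
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Assumption: ties are broken alphabetically so the result is deterministic.
        best = min(counts, key=lambda residue: (-counts[residue], residue))
        result.append(best)
    return "".join(result)


# Made-up toy alignment, for illustration only.
alignment = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
print(consensus(alignment))  # prints TATAAT
```

A plain majority vote discards how strongly each position is conserved, which is one reason other representations are also used.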

Developing software for pattern recognition is a major topic in genetics, molecular biology, and bioinformatics. Specific sequence motifs can function as regulatory sequences controlling biosynthesis, or as signal sequences that direct a molecule to a specific site within the cell or regulate its maturation. Since the regulatory function of these sequences is important, they are thought to be conserved across long periods of evolution. In some cases, evolutionary relatedness can be estimated by the amount of conservation of these sites.

Conserved sequence motifs are called consensus sequences; they show which residues are conserved and which are variable. Consider the following example DNA sequence:

A[CT]N{A}YR

In this notation, A means that an A is always found in that position; [CT] stands for either C or T; N stands for any base; and {A} means any base except A. Y represents any pyrimidine, and R indicates any purine.
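One way to make this notation concrete is to translate it into a regular expression. The sketch below assumes DNA only, assumes that bracketed and braced groups contain plain bases, and covers just the codes used in the example; the helper name `pattern_to_regex` and the test strings are hypothetical.

```python
import re

BASES = "ACGT"
# Only the single-letter codes that appear in the example pattern.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]",  # any base
         "Y": "[CT]",    # pyrimidine
         "R": "[AG]"}    # purine


def pattern_to_regex(pattern):
    """Translate a pattern such as A[CT]N{A}YR into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":            # [CT]: any one of the listed bases
            end = pattern.index("]", i)
            parts.append("[" + pattern[i + 1:end] + "]")
            i = end + 1
        elif ch == "{":          # {A}: any base except the listed ones
            end = pattern.index("}", i)
            excluded = pattern[i + 1:end]
            allowed = "".join(b for b in BASES if b not in excluded)
            parts.append("[" + allowed + "]")
            i = end + 1
        else:                    # a single-letter code
            parts.append(CODES[ch])
            i += 1
    return re.compile("".join(parts))


rx = pattern_to_regex("A[CT]N{A}YR")
print(rx.pattern)                        # A[CT][ACGT][CGT][CT][AG]
print(bool(rx.fullmatch("ACGTCA")))      # True
print(bool(rx.fullmatch("ACGACA")))      # False: A at the {A} position
print([m.start() for m in rx.finditer("GGACGTCAGG")])  # [2]
```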

In this example, the notation [CT] gives no indication of the relative frequency of C or T at that position. An alternative way of representing a consensus sequence is a sequence logo: a graphical representation in which the size of each symbol is related to the frequency with which a given nucleotide (or amino acid) occurs at a certain position. In a sequence logo, the more conserved a residue is, the larger its symbol is drawn; the less frequent, the smaller. Sequence logos can be generated using WebLogo, or using the Gestalt Workbench, a publicly available visualization tool written by Gustavo Glusman at the Institute for Systems Biology. The limitations of consensus sequences are discussed further in the paper 'Consensus Sequence Zen'.^[1]
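The symbol sizes in a sequence logo are commonly computed from the information content of each position: for DNA, 2 bits minus the Shannon entropy of the column, with each letter drawn at its frequency times that value. The sketch below follows that common formulation but omits the small-sample correction that real tools may apply; the function name `logo_heights` and the toy sites are assumptions, and every character is assumed to be in the alphabet (no gaps).

```python
import math
from collections import Counter


def logo_heights(aligned, alphabet="ACGT"):
    """Return, per column, a dict mapping each residue to its height in bits."""
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    columns = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column)
        freqs = {b: counts[b] / total for b in alphabet}
        # Shannon entropy of the column; zero-frequency residues contribute nothing.
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = max_bits - entropy
        columns.append({b: f * info for b, f in freqs.items()})
    return columns


# Made-up sites: column 1 is fully conserved, column 2 is split between C and T.
sites = ["AC", "AT", "AC", "AT"]
for position, heights in enumerate(logo_heights(sites), start=1):
    print(position, heights)
```

In this toy input the fully conserved A is drawn at the full 2 bits, while C and T at the second position each get 0.5 bits, a distinction that the notation [CT] cannot express.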

A protein binding site, represented by a consensus sequence, may be a short sequence of nucleotides that is found several times in the genome and is thought to play the same role in each of its locations. For example, many transcription factors recognize particular patterns in the promoters of the genes they regulate. In the same way, restriction enzymes usually have palindromic consensus sequences, which generally correspond to the site where they cut the DNA. Transposons act in much the same manner when identifying target sequences for transposition. Finally, splice sites (the sequences immediately surrounding the exon-intron boundaries) can also be considered consensus sequences.

Thus, a consensus sequence is a model for a putative DNA recognition site: it is obtained by aligning all known examples of a certain recognition site and is defined as the idealized sequence that represents the predominant base at each position. The actual examples should not differ from the consensus by more than a few substitutions, but counting mismatches in this way can lead to inconsistencies.^[1]
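Counting substitutions relative to the consensus amounts to a Hamming distance. The sketch below uses a hypothetical helper, `mismatches`, on made-up sites; note that the count treats every position alike, regardless of how strongly that position is conserved.

```python
def mismatches(site, consensus):
    """Count positions at which a site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return sum(a != b for a, b in zip(site, consensus))


# Toy consensus and sites, for illustration only.
toy_consensus = "TATAAT"
toy_sites = ["TATAAT", "TATGAT", "GATACT", "CCTAAT"]
for site in toy_sites:
    print(site, mismatches(site, toy_consensus))
# TATAAT 0
# TATGAT 1
# GATACT 2
# CCTAAT 2
```

The last two sites score the same even though they differ from the consensus at different positions, so the count alone cannot say which change matters more.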

Any mutation that makes a nucleotide in the core promoter sequence look more like the consensus sequence is known as an up mutation. This kind of mutation generally makes the promoter stronger: RNA polymerase binds more tightly to the DNA to be transcribed, and transcription is up-regulated. Conversely, mutations that destroy conserved nucleotides in the consensus sequence are known as down mutations. These mutations down-regulate transcription, since RNA polymerase can no longer bind as tightly to the core promoter sequence.
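As a rough illustration of these definitions, a substitution can be labeled by whether it moves a promoter element toward or away from the consensus. The function `classify_mutation`, the "neutral" label for changes that do neither, and the toy sequences are assumptions for this sketch; it says nothing about the measured strength of a real promoter.

```python
def classify_mutation(original, mutated, consensus):
    """Label a mutation 'up', 'down', or 'neutral' relative to a consensus."""
    if not (len(original) == len(mutated) == len(consensus)):
        raise ValueError("all sequences must have equal length")

    def distance(seq):
        # Number of positions that differ from the consensus.
        return sum(a != b for a, b in zip(seq, consensus))

    before, after = distance(original), distance(mutated)
    if after < before:
        return "up"       # closer to the consensus
    if after > before:
        return "down"     # a consensus nucleotide was lost
    return "neutral"      # assumption: label for changes that do neither


toy_consensus = "TATAAT"  # toy consensus, for illustration only
print(classify_mutation("TATGAT", "TATAAT", toy_consensus))  # up
print(classify_mutation("TATAAT", "TACAAT", toy_consensus))  # down
print(classify_mutation("TATGAT", "TATCAT", toy_consensus))  # neutral
```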

References

[1] Schneider TD (2002). "Consensus Sequence Zen". Appl Bioinformatics 1 (3): 111–119. PMC 1852464. PMID 15130839. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=1852464.

Wikimedia Foundation. 2010.

Look at other dictionaries:

consensus sequence — noun A DNA sequence found with minor variations and similar function in widely divergent organisms • • • Main Entry: ↑consensus … Useful english dictionary
consensus sequence — Of a series of related DNA, RNA or protein sequences, the sequence that reflects the most common choice of base or amino acid at each position. Areas of particularly good agreement often represent conserved functional domains. The generation of… … Dictionary of molecular biology
consensus sequence — a sequence of nucleotides that is common to different genes or genomes, usually with some variations but showing substantial similarity; frequently, the prototype sequence that most others approach … Medical dictionary
consensus sequence — The part of a gene or signal sequence that is shared over a wide range of members of a gene family, both within a given species, or in comparisons between species … Glossary of Biotechnology
Kozak consensus sequence — The Kozak consensus sequence, Kozak consensus or Kozak sequence, is a sequence which occurs on eukaryotic mRNA and has the consensus (gcc)gccRccAUGG, where R is a purine (adenine or guanine) three bases upstream of the start codon (AUG), which is … Wikipedia
Interferon Consensus Sequence-binding protein — (ICSBP) or Interferon Regulatory Factor 8 (IRF 8). It is a transcription factor that plays critical roles in the regulation of lineage commitment and in myeloid cell maturation. The critical role for ICSBP is in the decision for a Common Myeloid… … Wikipedia
Sequence motif — In genetics, a sequence motif is a nucleotide or amino acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance. For proteins, a sequence motif is distinguished from a structural motif, a motif formed … Wikipedia
Sequence alignment — In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.[1]… … Wikipedia
Consensus (disambiguation) — For the Wikipedia policy, see Wikipedia:Consensus. The word consensus may mean: 1992 Consensus Consensual nonconsent Consensus (computer science) achieving coherence, or quorum, among nodes of a distributed computer system. Consensus (medical)… … Wikipedia
Sequence promoteur (promoter sequence) — A promoter sequence is a region located near a gene and essential for transcription, to which RNA polymerase binds. Promoter sequences may be located downstream of the initiation site of… … Wikipédia en Français

Academic Dictionaries and Encyclopedias
