Stockholm format

Stockholm format is a multiple sequence alignment format used by Pfam and Rfam to disseminate protein and RNA sequence alignments (Griffiths-Jones et al. 2003, PMID 12520045; Griffiths-Jones et al. 2005, PMID 15608160; Finn et al. 2008, PMID 18039703). The alignment editors [http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ Ralee] and [ftp://ftp.cgb.ki.se/pub/prog/belvu Belvu] support Stockholm format, as do the probabilistic database search tools [http://infernal.janelia.org/ Infernal] and HMMER. A simple example of an Rfam alignment (UPSK RNA) in Stockholm format is shown below:


# STOCKHOLM 1.0

AF035635.1/619-641             UGAGUUCUCGAUCUCUAAAAUCG
M24804.1/82-104                UGAGUUCUCUAUCUCUAAAAUCG
J04373.1/6212-6234             UAAGUUCUCGAUCUUUAAAAUCG
M24803.1/1-23                  UAAGUUCUCGAUCUCUAAAAUCG
#=GC SS_cons                   .AAA....<<<<aaa....>>>>
//

A minimal well-formed Stockholm file should contain the header, which states the format and version identifier (currently '# STOCKHOLM 1.0'), followed by the sequences and their corresponding unique sequence names:

<seqname> <aligned sequence>
<seqname> <aligned sequence>
<seqname> <aligned sequence>

<seqname> stands for "sequence name", typically in the form "name/start-end" or just "name". Finally, the "//" line indicates the end of the alignment. Sequence letters may include any characters except whitespace. Gaps may be indicated by "." or "-".
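The minimal structure described above (header, name/sequence lines, "//" terminator) can be read with a few lines of Python. This is only a sketch: the function name read_minimal_stockholm is hypothetical, mark-up lines are skipped rather than interpreted, and each sequence is assumed to occupy a single line.

```python
def read_minimal_stockholm(text):
    """Return {sequence name: aligned sequence} for one alignment.

    Assumption: every sequence sits on exactly one line; lines starting
    with '#' (mark-up) are skipped rather than interpreted.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "# STOCKHOLM 1.0":
        raise ValueError("missing '# STOCKHOLM 1.0' header")
    sequences = {}
    for line in lines[1:]:
        line = line.strip()
        if line == "//":                      # end of the alignment
            return sequences
        if not line or line.startswith("#"):
            continue                          # blank line or mark-up
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'name sequence', got: {line!r}")
        name, aligned = parts
        if name in sequences:
            raise ValueError(f"duplicate sequence name: {name}")
        sequences[name] = aligned
    raise ValueError("missing '//' terminator")
```

Raising an error on a repeated name follows the "unique sequence names" wording above; a reader that needs to accept alignments split over several blocks would have to concatenate instead.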

The alignment mark-up:

Mark-up lines may include any characters except whitespace. Use underscore ("_") instead of space.


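#=GF <feature> <Generic per-File annotation, free text>
#=GC <feature> <Generic per-Column annotation, exactly 1 char per column>
#=GS <seqname> <feature> <Generic per-Sequence annotation, free text>
#=GR <seqname> <feature> <Generic per-Residue annotation, exactly 1 char per residue>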
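As an illustration, the four mark-up line types can be split into their fields as sketched below. The field layout is an assumption taken from the Stockholm definition linked under External links: #=GF and #=GC carry a feature name followed by the annotation, while #=GS and #=GR carry a sequence name before the feature. The function name parse_markup is hypothetical.

```python
def parse_markup(line):
    """Split one mark-up line into (kind, seqname, feature, annotation).

    seqname is None for per-file (#=GF) and per-column (#=GC) lines.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    kind = fields[0]
    if kind in ("#=GF", "#=GC"):
        leading = 2            # kind, feature
    elif kind in ("#=GS", "#=GR"):
        leading = 3            # kind, seqname, feature
    else:
        raise ValueError(f"not a mark-up line: {line!r}")
    if len(fields) < leading:
        raise ValueError(f"too few fields: {line!r}")
    # Split only the leading fields so free-text annotation keeps its spaces.
    parts = line.split(None, leading)
    annotation = parts[leading].strip() if len(parts) > leading else ""
    seqname = fields[1] if leading == 3 else None
    feature = fields[leading - 1]
    return kind, seqname, feature, annotation
```

Splitting with a maximum count keeps the spaces that appear in free-text annotations intact, rather than breaking them into separate fields.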

Magic or recommended features:

#=GF

(See the [ftp://selab.janelia.org/pub/Pfam/userman.txt Pfam documentation], under "Description of fields".)

For embedding trees:


#=GF NH <tree in New Hampshire eXtended format>
#=GF TN <Unique identifier for the next tree>

Notes:

* A tree may be stored on multiple #=GF NH lines.
* If multiple trees are stored in the same file, each tree must be preceded by a #=GF TN line with a unique tree identifier. If only one tree is included, the #=GF TN line may be omitted.
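The two rules above (a tree may span several #=GF NH lines, and each tree in a multi-tree file follows its own #=GF TN line) can be sketched as follows. The function name collect_trees is hypothetical, and storing a single unnamed tree under the key None is a choice made for this example only.

```python
def collect_trees(lines):
    """Return {tree identifier: tree string} from #=GF TN / #=GF NH lines.

    A tree with no preceding #=GF TN line is stored under the key None.
    """
    trees = {}
    current_id = None
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) < 3 or fields[0] != "#=GF":
            continue                      # not a per-file annotation
        tag, value = fields[1], fields[2].strip()
        if tag == "TN":
            if value in trees:
                raise ValueError(f"tree identifier not unique: {value}")
            current_id = value
            trees[current_id] = ""
        elif tag == "NH":
            # Consecutive NH lines continue the same tree.
            trees[current_id] = trees.get(current_id, "") + value
    return trees
```

The joined strings are returned unparsed; reading the tree itself is left to whatever tree library is in use.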

#=GS

Rfam and Pfam use these features:

Feature                Description
---------------------  -----------
AC <accession>         ACcession number
DE <freetext>          DEscription
DR <db>; <accession>;  Database Reference
OS <organism>          OrganiSm (species)
OC <clade>             Organism Classification (clade, etc.)
LO <look>              Look (Color, etc.)

#=GR

Feature  Description            Markup letters
-------  -----------            --------------
SS       Secondary Structure    For RNA [.,;<>(){}[]AaBb...], for protein [HGIEBTSCX]
SA       Surface Accessibility  [0-9X] (0=0%-10%; ...; 9=90%-100%)
TM       TransMembrane          [Mio]
PP       Posterior Probability  [0-9*] (0=0.00-0.05; 1=0.05-0.15; *=0.95-1.00)
LI       LIgand binding         [*]
AS       Active Site            [*]
IN       INtron (in or after)   [0-2]

#=GC

The same features as for #=GR with "_cons" appended, meaning "consensus". Example: "SS_cons".

Notes:

*Do not use multiple lines with the same #=GR label. Only one unique feature assignment can be made for each sequence.

*"X" in SA and SS means "residue with unknown structure".

*The protein SS letters are taken from DSSP: H=alpha-helix, G=3/10-helix, I=pi-helix, E=extended strand, B=residue in isolated beta-bridge, T=turn, S=bend, C=coil/loop.

*The RNA SS letters are taken from WUSS (Washington University Secondary Structure) notation. Matching nested bracket characters <>, (), [], or {} indicate base pairs. The symbols '.', ',' and ';' indicate unpaired regions, and matched upper- and lower-case letters of the English alphabet indicate pseudoknot interactions.
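To make the WUSS pairing rule concrete, the sketch below recovers base-paired columns from an SS or SS_cons string by matching each closing symbol with the most recent unmatched opening symbol of the same kind. The function name wuss_pairs is hypothetical, columns are numbered from 0, and every character that is not a bracket or an ASCII letter is treated as unpaired.

```python
import string

def wuss_pairs(structure):
    """Return a sorted list of (open_column, close_column) pairs."""
    closers = {">": "<", ")": "(", "]": "[", "}": "{"}
    openers = set(closers.values())
    stacks = {}                     # opening symbol -> unmatched columns
    pairs = []
    for column, symbol in enumerate(structure):
        if symbol in openers or symbol in string.ascii_uppercase:
            stacks.setdefault(symbol, []).append(column)
        elif symbol in closers or symbol in string.ascii_lowercase:
            # A lower-case letter closes the matching upper-case letter.
            opener = closers.get(symbol, symbol.upper())
            if not stacks.get(opener):
                raise ValueError(f"unmatched {symbol!r} at column {column}")
            pairs.append((stacks[opener].pop(), column))
        # any other symbol marks an unpaired column
    unmatched = [s for s, columns in stacks.items() if columns]
    if unmatched:
        raise ValueError(f"unmatched opening symbols: {unmatched}")
    return sorted(pairs)

# Made-up structure: one helix plus a two-pair pseudoknot.
print(wuss_pairs("..<<AA..>>..aa"))   # [(2, 9), (3, 8), (4, 13), (5, 12)]
```

Keeping one stack per symbol lets the pseudoknot letters cross the bracketed helix without disturbing the bracket matching.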

Recommended placements:

* #=GF Above the alignment
* #=GC Below the alignment
* #=GS Above the alignment or just below the corresponding sequence
* #=GR Just below the corresponding sequence
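A writer that follows these placements might look like the sketch below. The function name format_stockholm and its argument layout are hypothetical; fields are separated by a single space, and no column padding is attempted.

```python
def format_stockholm(sequences, gf=(), gs=(), gr=(), gc=()):
    """Build a Stockholm file as one string.

    sequences: list of (seqname, aligned sequence)
    gf: list of (feature, text)            placed above the alignment
    gs: list of (seqname, feature, text)   placed above the alignment
    gr: list of (seqname, feature, markup) placed below its sequence
    gc: list of (feature, markup)          placed below the alignment
    """
    out = ["# STOCKHOLM 1.0"]
    out += [f"#=GF {feature} {text}" for feature, text in gf]
    out += [f"#=GS {name} {feature} {text}" for name, feature, text in gs]
    for name, aligned in sequences:
        out.append(f"{name} {aligned}")
        out += [f"#=GR {n} {feature} {markup}"
                for n, feature, markup in gr if n == name]
    out += [f"#=GC {feature} {markup}" for feature, markup in gc]
    out.append("//")
    return "\n".join(out) + "\n"
```

Here the #=GS lines are written above the alignment; placing them just below the corresponding sequence instead would be equally consistent with the list above.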

Size limits:

*No size limits on any field.

*However, a simple parser that uses fixed field sizes should work safely on Pfam and Rfam alignments with these limits:

** Line length: 10000.
** <seqname>: 255.
** <feature>: 255.
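A file can be checked against these limits before it is handed to a fixed-size parser. The sketch below assumes the two 255-character limits apply to the sequence-name and feature fields, as in the Stockholm definition linked under External links; the function and constant names are hypothetical.

```python
MAX_LINE_LENGTH = 10000
MAX_FIELD_LENGTH = 255

def check_size_limits(lines):
    """Return a list of (line number, what exceeded its limit)."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, "line"))
        fields = line.split()
        if not fields or fields[0] in ("#", "//"):
            continue                          # blank, header or terminator
        if fields[0] in ("#=GF", "#=GC"):
            checked = {"feature": fields[1:2]}
        elif fields[0] in ("#=GS", "#=GR"):
            checked = {"seqname": fields[1:2], "feature": fields[2:3]}
        else:
            checked = {"seqname": fields[0:1]}  # plain sequence line
        for label, values in checked.items():
            if any(len(value) > MAX_FIELD_LENGTH for value in values):
                problems.append((number, label))
    return problems
```

An empty result means only that a fixed-size parser should not overflow on the file, not that the file is otherwise valid.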

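References

* Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR (2003). "Rfam: an RNA family database". Nucleic Acids Res 31 (1): 439-41. PMID 12520045.
* Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A (2005). "Rfam: annotating non-coding RNAs in complete genomes". Nucleic Acids Res 33 (Database issue): D121-4. PMID 15608160.
* Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A (2008). "The Pfam protein families database". Nucleic Acids Res 36 (Database issue): D281-8. PMID 18039703.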

See also

*FASTA format
*Rfam
*Pfam

External links

* [http://sonnhammer.sbc.su.se/Stockholm.html Erik Sonnhammer's definition of Stockholm format]


Wikimedia Foundation. 2010.
