Newick format

In mathematics, Newick tree format (or Newick notation or New Hampshire tree format) is a way to represent graph-theoretical trees with edge lengths using parentheses and commas. It was adopted by James Archie, William H. E. Day, Joseph Felsenstein, Wayne Maddison, Christopher Meacham, F. James Rohlf, and David Swofford, at two meetings in 1986, the second of which was at Newick's restaurant in Dover, New Hampshire, US. The adopted format is a generalization of the format developed by Meacham in 1984 for the first tree-drawing programs in Felsenstein's PHYLIP package.[1]

Examples

The following tree (figure: NewickExample.svg), in which the root F has the children A, B, and E, and E has the children C and D, could be represented in Newick format in several ways:

(,,(,));                               no nodes are named
(A,B,(C,D));                           leaf nodes are named
(A,B,(C,D)E)F;                         all nodes are named
(:0.1,:0.2,(:0.3,:0.4):0.5);           all but root node have a distance to parent
(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;       all have a distance to parent
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);       distances and leaf names (popular)
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;     distances and all names
((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;    a tree rooted on a leaf node (rare)
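
The variants above differ only in which names and distances are written out. A minimal sketch of a serializer shows how they arise from one in-memory tree. The `Node` class and the functions `to_newick` and `write_tree` are hypothetical names introduced for illustration and are not part of any particular library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical in-memory node: an empty name or a missing length is simply omitted.
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def to_newick(node: Node) -> str:
    """Serialize one node and its descendants, without the trailing semicolon."""
    text = ""
    if node.children:
        # Children are comma-separated and wrapped in parentheses.
        text = "(" + ",".join(to_newick(child) for child in node.children) + ")"
    text += node.name
    if node.length is not None:
        text += f":{node.length}"
    return text


def write_tree(root: Node) -> str:
    """Serialize a whole tree; every Newick tree ends with a semicolon."""
    return to_newick(root) + ";"


tree = Node("F", None, [
    Node("A", 0.1),
    Node("B", 0.2),
    Node("E", 0.5, [Node("C", 0.3), Node("D", 0.4)]),
])
print(write_tree(tree))  # prints (A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
```

Leaving every name empty and every length unset yields the first variant, `(,,(,));`.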

Newick format is commonly read and written by tools such as PHYLIP, and it provides a minimal description of a phylogenetic tree.

Rooted, unrooted, and binary trees

When an unrooted tree is represented in Newick notation, an arbitrary node is chosen as its root. Whether rooted or unrooted, typically a tree's representation is rooted on an internal node and it is rare (but legal) to root a tree on a leaf node.

A rooted binary tree that is rooted on an internal node has exactly two immediate descendant nodes for each internal node. An unrooted binary tree that is rooted on an arbitrary internal node has exactly three immediate descendant nodes for the root node, and each other internal node has exactly two immediate descendant nodes. A binary tree rooted from a leaf has at most one immediate descendant node for the root node, and each internal node has exactly two immediate descendant nodes.
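
The child-count rules above can be checked mechanically. The sketch below assumes the tree topology is already available as nested Python lists, where a list is a node with children and a string is a leaf. That representation and the function names are assumptions made for this example only.

```python
def internal_nodes_below(node):
    """Yield every internal (list) node strictly below the given node."""
    for child in node:
        if isinstance(child, list):
            yield child
            yield from internal_nodes_below(child)


def classify_binary(root):
    """Classify a nested-list topology using the child-count rules described above."""
    if not isinstance(root, list):
        return "single leaf"
    # Every internal node other than the root must have exactly two children.
    if any(len(node) != 2 for node in internal_nodes_below(root)):
        return "not binary"
    by_root_degree = {
        1: "binary, rooted on a leaf",
        2: "rooted binary",
        3: "unrooted binary",
    }
    return by_root_degree.get(len(root), "not binary")


print(classify_binary(["A", "B", ["C", "D"]]))    # unrooted binary
print(classify_binary([["A", "B"], ["C", "D"]]))  # rooted binary
print(classify_binary([["B", ["C", "D"]]]))       # binary, rooted on a leaf
```

The third call corresponds to a tree of the form `((B,(C,D)))A;`, where the root has a single branch.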

Grammar

A grammar for parsing the Newick format:

The grammar nodes

   Tree: the full input in Newick format for a single tree
   Subtree: an internal node (and its descendants) or a leaf node
   Leaf: a leaf node
   Internal: an internal node (and its descendants)
   BranchSet: a set of one or more Branches
   Branch: a tree edge and its descendant subtree
   Name: the name of a node
   Length: the length of a tree edge

The grammar rules

Note that "|" separates alternatives.

   Tree --> Subtree ";" | Branch ";"
   Subtree --> Leaf | Internal
   Leaf --> Name
   Internal --> "(" BranchSet ")" Name
   BranchSet --> Branch | BranchSet "," Branch
   Branch --> Subtree Length
   Name --> empty | string
   Length --> empty | ":" number

Whitespace (spaces, tabs, carriage returns, and linefeeds) within a number is prohibited. Whitespace within a string is often prohibited. Whitespace elsewhere is ignored. Sometimes the Name string must be of a specified fixed length. The Tree --> Branch ";" production gives the root node a length of its own, so that the entire tree descends from nowhere; this can be nonsensical and is sometimes prohibited.
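
The grammar maps naturally onto a recursive-descent parser. The following is a sketch rather than a reference implementation. The names `Node`, `parse_newick`, `_branch`, and `_subtree` are hypothetical. It assumes unquoted names, and for simplicity it strips all whitespace before parsing, which is more permissive than the rule that forbids whitespace inside a number.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DELIMITERS = "(),:;"


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Tree --> Subtree ";" | Branch ";" (both covered, since Length may be empty)."""
    s = "".join(text.split())  # simplification: drop all whitespace up front
    node, pos = _branch(s, 0)
    if pos != len(s) - 1 or s[pos] != ";":
        raise ValueError(f"expected a single ';' at the end, stopped at position {pos}")
    return node


def _branch(s: str, pos: int) -> Tuple[Node, int]:
    """Branch --> Subtree Length, with Length --> empty | ":" number."""
    node, pos = _subtree(s, pos)
    if pos < len(s) and s[pos] == ":":
        start = pos = pos + 1
        while pos < len(s) and s[pos] not in DELIMITERS:
            pos += 1
        node.length = float(s[start:pos])  # raises ValueError if no number follows ':'
    return node, pos


def _subtree(s: str, pos: int) -> Tuple[Node, int]:
    """Subtree --> Leaf | Internal; both end with an optional Name."""
    children = []
    if pos < len(s) and s[pos] == "(":
        # BranchSet --> Branch | BranchSet "," Branch
        while True:
            child, pos = _branch(s, pos + 1)  # skip the '(' or ',' just seen
            children.append(child)
            if not (pos < len(s) and s[pos] == ","):
                break
        if pos >= len(s) or s[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
    start = pos
    while pos < len(s) and s[pos] not in DELIMITERS:
        pos += 1
    return Node(s[start:pos], None, children), pos


root = parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
print(root.name, [child.name for child in root.children])  # F ['A', 'B', 'E']
```

Each function corresponds to one or two grammar rules, so a stricter dialect (for example one that rejects the Tree --> Branch ";" production) could be obtained by adding a check in `parse_newick`.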

Note that when a tree having more than one leaf is rooted from one of its leaves, a representation that is rarely seen in practice, the root leaf is characterized as an Internal node by the above grammar. Generally, a root node labeled as Internal should be construed as a leaf if and only if it has exactly one Branch in its BranchSet. One can make a grammar that formalizes this distinction by replacing the above Tree production rule with

   Tree --> RootLeaf ";" | RootInternal ";" | Branch ";"
   RootLeaf --> Name | "(" Branch ")" Name
   RootInternal --> "(" BranchSet "," Branch ")" Name

The first RootLeaf production is for a tree with exactly one leaf. The second RootLeaf production is for rooting a tree from one of its two or more leaves.
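
Which production applies to a given string can be decided from the number of commas directly inside the outermost parentheses. The sketch below illustrates that idea under stated assumptions: `classify_root` is a hypothetical helper, names are unquoted, and the input is not otherwise validated against the grammar.

```python
def classify_root(newick: str) -> str:
    """Return which Tree alternative ("RootLeaf", "RootInternal", "Branch") matches."""
    s = "".join(newick.split())
    if not s.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    body = s[:-1]

    depth = 0
    top_level_commas = 0  # commas directly inside the outermost parentheses
    close = -1            # index of the ')' that closes the outermost group
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                close = i
        elif ch == "," and depth == 1:
            top_level_commas += 1
    if depth != 0:
        raise ValueError("unbalanced parentheses")

    # A length after the root's name means the Branch ";" production was used.
    if ":" in body[close + 1:]:
        return "Branch"
    # No parentheses at all, or exactly one Branch inside them: the root is a leaf.
    if close == -1 or top_level_commas == 0:
        return "RootLeaf"
    return "RootInternal"


print(classify_root("(A,B,(C,D)E)F;"))                        # RootInternal
print(classify_root("((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;"))   # RootLeaf
print(classify_root("(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;"))      # Branch
```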


Wikimedia Foundation. 2010.