Nested set model

Nested set model

The nested set model is a particular technique for representing nested sets (also known as trees or hierarchies) in relational databases. The term was apparently introduced by Joe Celko; others describe the same technique without naming it [1] or using different terms. [2]

Contents

Motivation

The technique is an answer to the problem that the standard relational algebra and relational calculus, and the SQL operations based on them, are unable to express all desirable operations on hierarchies directly. A hierarchy can be expressed in terms of a parent-child relation - Celko calls this the adjacency list model - but if it can have arbitrary depth, this does not allow the expression of operations such as comparing the contents of hierarchies of two elements, or determining whether an element is somewhere in the subhierarchy of another element. When the hierarchy is of fixed or bounded depth, the operations are possible, but expensive, due to the necessity of performing one relational join per level. This is often known as the bill of materials problem.[citation needed]

Several resolutions exist and are available in some relational database management systems:

  • support for a dedicated hierarchy data type, such as in SQL's hierarchical query facility;
  • extending the relational language with hierarchy manipulations, such as in the nested relational algebra.
  • extending the relational language with transitive closure, such as SQL's CONNECT statement; this allows a parent-child relation to be used, but execution remains expensive;
  • the queries can be expressed in a language that supports iteration and is wrapped around the relational operations, such as PL/SQL, T-SQL or a general-purpose programming language

When these solutions are not available or not feasible, another approach must be taken.

The technique

The nested set model is to number the nodes according to a tree traversal, which visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes. Querying becomes inexpensive: hierarchy membership can be tested by comparing these numbers. Updating requires renumbering and is therefore expensive. Refinements which use rational numbers instead of integers can avoid renumbering, and so are faster to update, although much more complicated [3].

Example

In a clothing store catalog, clothing may be categorized according to the hierarchy given on the left:

A hierarchy: types of clothing
The numbering assigned by tree traversal

The resulting representation

Node Left Right
Clothing 1 22
Men's 2 9
Women's 10 21
Suits 3 8
Slacks 4 5
Jackets 6 7
Dresses 11 16
Skirts 17 18
Blouses 19 20
Evening Gowns 12 13
Sun Dresses 14 15

The "Clothing" category, with the highest position in the hierarchy, encompasses all subordinating categories. It is therefore given left and right domain values of 1 and 22, the latter value being the double of the total number of nodes being represented. The next hierarchical level contains "Men's" and "Women's", both containing levels within themselves that must be accounted for. Each level's data node is assigned left and right domain values according to the number of sublevels contained within, as shown in the table data.

Performance

Queries using nested sets can be expected to be faster than queries using a stored procedure to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as MySQL[4]. However, recursive SQL queries can be expected to perform comparably for 'find immediate descendants' queries, and much faster for other depth search queries, and so are the faster option for databases which provide them, such as PostgreSQL[5], Oracle[6], and Microsoft SQL Server[7].

Drawbacks

Nested set are very slow for inserts because it requires updating lft and rgt for all records in the table after the insert. This can cause a lot of database thrash as many rows are rewritten and indexes rebuilt.

The Nested interval model does not suffer from this problem, but is more complex to implement, and is not as well known. The nested interval model stores the position of the nodes as rational numbers expressed as quotients (n/d). [1]

Variations

Using the nested set model as described above has some performance limitations during certain tree traversal operations. For example, trying to find the immediate child nodes given a parent node requires pruning the subtree to a specific level as in the following SQL code example:

SELECT Child.Node, Child.LEFT, Child.RIGHT
FROM Tree AS Parent, Tree AS Child
WHERE
        Child.LEFT BETWEEN Parent.LEFT AND Parent.RIGHT
        AND NOT EXISTS (    -- No Middle Node
                SELECT *
                FROM Tree AS Mid
                WHERE Mid.LEFT BETWEEN Parent.LEFT AND Parent.RIGHT
                        AND Child.LEFT BETWEEN Mid.LEFT AND Mid.RIGHT
                        AND Mid.Node NOT IN (Parent.Node AND Child.Node)
        )
        AND Parent.LEFT = 1  -- Given Parent Node Left Index

The query gets even more complicated when searching for children more than one level deep. To overcome this limitation and simplify Tree traversal an additional column is added to the model to maintain the depth of a node within a tree.

The resulting representation

Node Left Right Depth
Clothing 1 22 0
Men's 2 9 1
Women's 10 21 1
Suits 3 8 2
Slacks 4 5 3
Jackets 6 7 3
Dresses 11 16 2
Skirts 17 18 2
Blouses 19 20 2
Evening Gowns 12 13 3
Sun Dresses 14 15 3

In this model, finding the immediate children given a parent node can be accomplished with the following SQL code:

SELECT Child.Node, Child.LEFT, Child.RIGHT
FROM Tree AS Child, Tree AS Parent
WHERE
        Child.Depth = Parent.Depth + 1
        AND Child.LEFT > Parent.LEFT
        AND Child.RIGHT < Parent.RIGHT
        AND Parent.LEFT = 1  -- Given Parent Node Left Index

See also

References

  1. ^ Recursive Hierarchies: The Relational Taboo!, by Michael J. Kamfonas, in: The Relational Journal - October/November 1992,
  2. ^ Storing Hierarchical Data in a Database: Modified Pre-order Tree Traversal, by Gijs van Tulder, at articles.sitepoint.com
  3. ^ Hazel, Daniel. "Using rational numbers to key nested sets". arXiv:0806.3115. 
  4. ^ Quassnoi (29 September 2009), "Adjacency list vs. nested sets: MySQL", Explain Extended, http://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/, retrieved 11 December 2010 
  5. ^ Quassnoi (24 September 2009), "Adjacency list vs. nested sets: PostgreSQL", Explain Extended, http://explainextended.com/2009/09/24/adjacency-list-vs-nested-sets-postgresql/, retrieved 11 December 2010 
  6. ^ Quassnoi (28 September 2009), "Adjacency list vs. nested sets: Oracle", Explain Extended, http://explainextended.com/2009/09/28/adjacency-list-vs-nested-sets-oracle/, retrieved 11 December 2010 
  7. ^ Quassnoi (25 September 2009), "Adjacency list vs. nested sets: SQL Server", Explain Extended, http://explainextended.com/2009/09/25/adjacency-list-vs-nested-sets-sql-server/, retrieved 11 December 2010 

External links


Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Hereditarily finite set — Nested set redirects here. Nested set may also refer to the Nested set model in relational databases …   Wikipedia

  • Set theory — This article is about the branch of mathematics. For musical set theory, see Set theory (music). A Venn diagram illustrating the intersection of two sets. Set theory is the branch of mathematics that studies sets, which are collections of objects …   Wikipedia

  • Entity–attribute–value model — (EAV) is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In… …   Wikipedia

  • Nested word — In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of… …   Wikipedia

  • Nested case-control study — A nested case control study is a variation of a case cohort study in which only a subset of controls from the cohort are compared to the incident cases. In a case cohort study, all incident cases in the cohort are compared to a random subset of… …   Wikipedia

  • Statistical model — A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more random variables. The model is statistical… …   Wikipedia

  • Frameworks supporting the polyhedral model — Use of the polyhedral model within a compiler requires software to represent the objects of this framework (sets of integer valued points in regions of various spaces) and perform operations upon them (e.g., testing whether the set is empty). Two …   Wikipedia

  • Answer set programming — (ASP) is a form of declarative programming oriented towards difficult (primarily NP hard) search problems. It is based on the stable model (answer set) semantics of logic programming. In ASP, search problems are reduced to computing stable models …   Wikipedia

  • Database model — A database model is the theoretical foundation of a database and fundamentally determines in which manner data can be stored, organized, and manipulated in a database system. It thereby defines the infrastructure offered by a particular database… …   Wikipedia

  • The Big Model — is a body of role playing game theory developed primarily by Ron Edwards. It serves as a capstone and organizing principle to the amorphous body of work commonly referred to as GNS Theory.tructureThe Big Model attempts to contextualize the many… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”