XML-Retrieval

XML Retrieval, or XML Information Retrieval,[1] is the content-based retrieval of documents structured with XML (eXtensible Markup Language). As such, it is concerned with computing the relevance of XML documents.[2]

Queries

Most XML retrieval approaches rely on techniques from the information retrieval (IR) area, e.g. computing the similarity between the document and a query consisting of keywords (query terms). In XML-Retrieval, however, the query can also contain structural hints. So-called "content and structure" (CAS) queries let users specify what structure the requested content can or must have.
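The distinction between structure that content "can" have and structure it "must" have can be illustrated with a small sketch. It assumes a hypothetical `CasQuery` made of keywords plus a target element name; in strict mode only elements with that name qualify, while in vague mode other elements are still scored but down-weighted by an assumed `hint_penalty`. Real CAS query languages (such as NEXI, used at INEX) are considerably richer than this.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CasQuery:
    """Hypothetical CAS query: keywords plus the element the content should sit in."""
    terms: list
    target_tag: str
    strict: bool  # True: the structure *must* match; False: it is only a hint


def keyword_score(elem, terms):
    """Fraction of query terms that occur anywhere in the element's subtree text."""
    words = set(re.findall(r"\w+", " ".join(elem.itertext()).lower()))
    return sum(t.lower() in words for t in terms) / len(terms)


def evaluate(root, query, hint_penalty=0.5):
    """Score every element against the query and return (score, tag) pairs, best first."""
    results = []
    for elem in root.iter():
        tag_ok = elem.tag == query.target_tag
        if query.strict and not tag_ok:
            continue  # strict structural constraint: skip non-matching elements
        score = keyword_score(elem, query.terms)
        if not tag_ok:
            score *= hint_penalty  # assumed down-weighting when the hint is not met
        if score > 0:
            results.append((score, elem.tag))
    return sorted(results, key=lambda r: r[0], reverse=True)


DOC = """<article>
  <title>XML retrieval</title>
  <sec>Ranking of XML elements</sec>
  <sec>Peer-to-peer networks</sec>
</article>"""
root = ET.fromstring(DOC)
print(evaluate(root, CasQuery(["xml", "ranking"], "sec", strict=True)))
# [(1.0, 'sec')]
print(evaluate(root, CasQuery(["xml", "ranking"], "sec", strict=False)))
# [(1.0, 'sec'), (0.5, 'article'), (0.25, 'title')]
```

With the strict query only the matching `sec` element is returned; with the vague query the whole `article` and its `title` also appear, ranked below the element that satisfies the structural hint.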

Exploiting XML structure

Taking advantage of the self-describing structure of XML documents can significantly improve the search for XML documents. This includes the use of CAS queries, assigning different weights to different XML elements, and the focused retrieval of subdocuments.
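Element weighting can be sketched as a term frequency in which each occurrence is multiplied by a weight for the element that directly contains it. The weights in `TAG_WEIGHTS` are hypothetical values chosen for illustration, not taken from any particular system.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical weights: a term in a <title> counts three times as much as body text.
TAG_WEIGHTS = {"title": 3.0, "abstract": 2.0}
DEFAULT_WEIGHT = 1.0


def direct_text(elem):
    """Text owned by elem itself: its leading text plus the tails of its children."""
    return " ".join([elem.text or ""] + [child.tail or "" for child in elem])


def weighted_tf(root, term):
    """Sum of term occurrences, each multiplied by the weight of its enclosing element."""
    term = term.lower()
    total = 0.0
    for elem in root.iter():
        words = re.findall(r"\w+", direct_text(elem).lower())
        total += TAG_WEIGHTS.get(elem.tag, DEFAULT_WEIGHT) * words.count(term)
    return total


doc_a = ET.fromstring("<doc><title>XML search</title><p>about databases</p></doc>")
doc_b = ET.fromstring("<doc><title>Databases</title><p>XML and more XML</p></doc>")
print(weighted_tf(doc_a, "xml"), weighted_tf(doc_b, "xml"))  # 3.0 2.0
```

Although `doc_b` mentions "XML" twice, `doc_a` scores higher because its single occurrence sits in the title.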

Ranking

Ranking in XML-Retrieval can incorporate both content relevance and structural similarity, i.e. the resemblance between the structure given in the query and the structure of the document. Moreover, the retrieval units returned by an XML query need not be entire documents; they can be arbitrarily deeply nested XML elements, so-called dynamic documents. The aim is to find the smallest retrieval unit that is highly relevant. Relevance can be defined in terms of specificity, the extent to which a retrieval unit focuses on the topic of the request.[3]
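A minimal sketch of both ideas, under stated assumptions: each element's score is a linear mix (weight `alpha`) of a keyword-based content score and a structural similarity, here taken to be the share of query tags found on the element's root-to-element path. Among elements reaching a `threshold`, only the smallest ones are kept, i.e. an element is dropped if one of its descendants also qualifies. The scoring formula, `alpha` and `threshold` are illustrative choices, not a standard method.

```python
import re
import xml.etree.ElementTree as ET


def walk(elem, path=()):
    """Yield every element together with its root-to-element tag path."""
    path = path + (elem.tag,)
    yield elem, path
    for child in elem:
        yield from walk(child, path)


def score(elem, path, terms, query_path, alpha=0.7):
    """Assumed linear mix of content relevance and structural similarity."""
    words = set(re.findall(r"\w+", " ".join(elem.itertext()).lower()))
    content = sum(t in words for t in terms) / len(terms)
    # Structural similarity (assumption): share of query tags found on the element's path.
    structure = sum(tag in path for tag in query_path) / len(query_path)
    return alpha * content + (1 - alpha) * structure


def focused_retrieval(root, terms, query_path, threshold=0.8):
    """Return the paths of the smallest elements whose score reaches the threshold."""
    hits = {id(e): (e, p) for e, p in walk(root)
            if score(e, p, terms, query_path) >= threshold}
    smallest = []
    for elem, path in hits.values():
        # Drop an element if one of its descendants is also a hit (it is less specific).
        if not any(id(d) in hits for d in elem.iter() if d is not elem):
            smallest.append(path)
    return smallest


DOC = """<article>
  <title>Peer-to-peer systems</title>
  <sec>
    <p>XML retrieval ranks elements</p>
    <p>Network protocols</p>
  </sec>
</article>"""
print(focused_retrieval(ET.fromstring(DOC), ["xml", "retrieval"], ("article", "sec")))
# [('article', 'sec', 'p')]
```

Here `article`, `sec` and the first `p` all pass the threshold, but only the paragraph is returned, since it is the most specific unit that covers the query.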

Existing XML search engines

Two overviews of possible approaches are available.[4][5] The INitiative for the Evaluation of XML Retrieval (INEX) was founded in 2002 and provides a platform for evaluating such algorithms.[3] Three different areas influence XML-Retrieval:[6]

Traditional XML query languages

Query languages such as the W3C standard XQuery[7] support complex queries but only look for exact matches. They therefore need to be extended to allow vague search with relevance computation. Most XML-centered approaches assume fairly exact knowledge of the documents' schemas.[8]
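The gap between exact matching and vague search can be shown with Python's `xml.etree.ElementTree`, whose limited XPath support includes the `[tag='text']` predicate. This is only an analogy for an XQuery-style exact predicate, not XQuery itself; the `vague` function is a hypothetical term-overlap score standing in for relevance computation.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<library>
  <book><title>XML Retrieval</title></book>
  <book><title>Retrieval of XML documents</title></book>
  <book><title>Databases</title></book>
</library>"""
root = ET.fromstring(DOC)

# Exact matching: a book qualifies only if its title text equals the string exactly.
print(root.findall("book[title='xml retrieval']"))  # []


def vague(root, query):
    """Hypothetical relevance score: share of query terms occurring in each title."""
    terms = set(re.findall(r"\w+", query.lower()))
    ranked = []
    for book in root.findall("book"):
        title = book.find("title").text
        words = set(re.findall(r"\w+", title.lower()))
        score = len(terms & words) / len(terms)
        if score > 0:
            ranked.append((score, title))
    return sorted(ranked, key=lambda r: r[0], reverse=True)


print(vague(root, "xml retrieval"))
# [(1.0, 'XML Retrieval'), (1.0, 'Retrieval of XML documents')]
```

The exact predicate finds nothing because of a difference in capitalization, while the vague search ranks both relevant books and ignores the unrelated one.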

Databases

Classic database systems have adopted the ability to store semi-structured data,[6] which led to the development of XML databases. These systems are often highly formal, focus on searching rather than ranking, and are used by experienced users who can formulate complex queries.

Information retrieval

Classic information retrieval models such as the vector space model provide relevance ranking but do not take document structure into account; only flat queries are supported. They also apply a static document concept, so retrieval units are usually entire documents.[8] These models can be extended to consider structural information and dynamic document retrieval. Some approaches extend the vector space model by using document subtrees (index terms plus structure) as the dimensions of the vector space.[9]
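The idea of structure-aware dimensions can be sketched with a simplified variant: instead of full subtrees, each dimension here is a (tag path, term) pair, and documents are compared with ordinary cosine similarity. This simplification is an assumption for illustration and is not the indexing scheme of the cited work.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def structured_vector(root):
    """Map each (root-to-element tag path, term) pair to its frequency.

    Simplification: only an element's leading text is indexed; tail text is ignored.
    """
    vec = Counter()

    def visit(elem, path):
        path = f"{path}/{elem.tag}"
        for term in re.findall(r"\w+", (elem.text or "").lower()):
            vec[(path, term)] += 1
        for child in elem:
            visit(child, path)

    visit(root, "")
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


q = structured_vector(ET.fromstring("<article><title>XML</title></article>"))
d1 = structured_vector(ET.fromstring("<article><title>XML search</title><body>fast</body></article>"))
d2 = structured_vector(ET.fromstring("<article><title>Search</title><body>XML</body></article>"))
print(round(cosine(q, d1), 3), cosine(q, d2))  # 0.577 0.0
```

With plain term dimensions, `d2` would actually score higher than `d1` (about 0.707 versus 0.577); once the path is part of each dimension, only `d1`, which has "XML" in its title as the query asks, matches.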

References

  1. Luk, R.W.P.; Leong, H.V.; Dillon, T.S.; Chan, Alvin T. S.; Croft, W. B.; Allan, J. (2002). "A survey in indexing and searching XML documents". Journal of the American Society for Information Science and Technology 53 (6): 415–437. doi:10.1002/asi.10056.
  2. Winter, Judith; Drobnik, Oswald (November 9, 2007). "An Architecture for XML Information Retrieval in a Peer-to-Peer Environment". ACM. ftp://ftp.tm.informatik.uni-frankfurt.de/pub/papers/ir/An%20Architecture%20for%20XML%20Information%20Retrieval%20in%20a%20Peer-to-Peer%20Environment_2007.pdf. Retrieved 2009-02-10.
  3. Malik, Saadia; Trotman, Andrew; Lalmas, Mounia; Fuhr, Norbert (2007). "Overview of INEX 2006". Proceedings of the Fifth Workshop of the INitiative for the Evaluation of XML Retrieval. http://www.cs.otago.ac.nz/homepages/andrew/2006-10.pdf. Retrieved 2009-02-10.
  4. Amer-Yahia, Sihem; Lalmas, Mounia (2006). "XML Search: Languages, INEX and Scoring". SIGMOD Record 35 (4). http://www.sigmod.org/record/issues/0612/p16-article-yahia.pdf. Retrieved 2009-02-10. [dead link]
  5. Pal, Sukomal (June 30, 2006). "XML Retrieval: A Survey". Technical Report, CVPR. http://66.102.1.104/scholar?q=cache:R6ZYFNoTRrUJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.109.5986%26rep%3Drep1%26type%3Dpdf. Retrieved 2009-02-10.
  6. Fuhr, Norbert; Gövert, N.; Kazai, Gabriella; Lalmas, Mounia (2003). "INEX: Initiative for the Evaluation of XML Retrieval". Proceedings of the First INEX Workshop, Dagstuhl, Germany, 2002. ERCIM Workshop Proceedings, France. http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fuhr_etal:02a.pdf. Retrieved 2009-02-10.
  7. Boag, Scott; Chamberlin, Don; Fernández, Mary F.; Florescu, Daniela; Robie, Jonathan; Siméon, Jérôme (23 January 2007). "XQuery 1.0: An XML Query Language". W3C Recommendation. World Wide Web Consortium. http://www.w3.org/TR/2007/REC-xquery-20070123/. Retrieved 2009-02-10.
  8. Schlieder, Torsten; Meuss, Holger (2002). "Querying and Ranking XML Documents". Journal of the American Society for Information Science and Technology 53 (6). http://google.com/search?q=cache:KHBo9BRjO7QJ:www.cis.uni-muenchen.de/people/Meuss/Pub/JASIS02.ps.gz. Retrieved 2009-02-10.
  9. Liu, Shaorong; Zou, Qinghua; Chu, Wesley W. (2004). "Configurable Indexing and Ranking for XML Information Retrieval". SIGIR'04. ACM. http://www.cobase.cs.ucla.edu/tech-docs/sliu/SIGIR04.pdf. Retrieved 2009-02-10.

Wikimedia Foundation. 2010.
