Statistical database

A statistical database is a database used for statistical analysis. It is an OLAP rather than an OLTP system, although the term predates that modern distinction, and classical statistical databases are often closer to the relational model than to the multidimensional model commonly used in OLAP systems today.

Statistical databases often incorporate support for advanced statistical analysis techniques, such as correlations, which go beyond SQL. They also pose unique security concerns, which were the focus of much research, particularly in the late 1970s and early to mid 1980s.

Security in statistical databases

In a statistical database, the goal is often to allow query access only to aggregate data, not to individual records. Securing such a database is a difficult problem, however, because a determined user can combine several aggregate queries to derive information about a single individual.
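A minimal sketch of how such a combination can work, assuming an interface that answers only SUM queries and applies no other safeguards. The table, the names, and the helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical toy table; names and salaries are invented for illustration.
RECORDS = [
    {"name": "Ana", "dept": "sales", "salary": 52000},
    {"name": "Ben", "dept": "sales", "salary": 48000},
    {"name": "Cy", "dept": "ops", "salary": 61000},
    {"name": "Dee", "dept": "ops", "salary": 57000},
]


def sum_salary(predicate):
    """Aggregate-only interface: returns a total, never an individual row."""
    return sum(r["salary"] for r in RECORDS if predicate(r))


def differencing_attack(target_name):
    """Infer one person's salary by subtracting two aggregate answers."""
    total = sum_salary(lambda r: True)
    total_without_target = sum_salary(lambda r: r["name"] != target_name)
    return total - total_without_target


if __name__ == "__main__":
    print(differencing_attack("Cy"))  # prints 61000
```

Neither query returns an individual record, yet the difference between the two answers is exactly one person's value.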

Some common approaches are:
* allowing only aggregate queries (SUM, COUNT, AVG, STDEV, etc.)
* returning only the partition that a sensitive value such as income falls into (e.g. 35k-40k), rather than the exact value
* returning imprecise counts (e.g. indicating that 130-150 records matched a query, rather than exactly 141)
* disallowing overly selective WHERE clauses
* auditing all user queries, so that users who misuse the system can be investigated
* using intelligent agents to detect inappropriate use of the system automatically
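Several of these approaches can be combined in one query layer. The following is a sketch under stated assumptions, not a reference design: the class `GuardedStats`, the exception `QueryRefused`, the helper `income_band`, and the parameters `min_set_size`, `count_step`, and `width` are all hypothetical names chosen for this example.

```python
class QueryRefused(Exception):
    """Raised when a query set is too small or too large to answer."""


class GuardedStats:
    """Aggregate-only access with size limits, coarse counts and auditing."""

    def __init__(self, records, min_set_size=5, count_step=20):
        self.records = list(records)
        self.min_set_size = min_set_size  # assumed threshold k
        self.count_step = count_step      # width of a reported count range
        self.audit_log = []               # (user, description, answered) tuples

    def _query_set(self, user, description, predicate):
        rows = [r for r in self.records if predicate(r)]
        k, n = self.min_set_size, len(self.records)
        # Refuse overly selective queries, and their complements.
        allowed = k <= len(rows) <= n - k
        # Every query is logged, whether or not it is answered.
        self.audit_log.append((user, description, allowed))
        if not allowed:
            raise QueryRefused(description)
        return rows

    def count_range(self, user, description, predicate):
        """Return a (low, high) range containing the count, not the count."""
        size = len(self._query_set(user, description, predicate))
        low = size // self.count_step * self.count_step
        return (low, low + self.count_step)

    def mean(self, user, description, field, predicate):
        """Return the mean of one field over an allowed query set."""
        rows = self._query_set(user, description, predicate)
        return sum(r[field] for r in rows) / len(rows)


def income_band(income, width=5000):
    """Report only the partition an income falls into."""
    low = income // width * width
    return (low, low + width)


if __name__ == "__main__":
    db = GuardedStats({"age": i} for i in range(300))
    print(db.count_range("alice", "age < 141", lambda r: r["age"] < 141))
    print(income_band(37250))
```

With these settings a query matching 141 records is reported as the range (140, 160), and an income of 37250 is reported as the band (35000, 40000). The audit log records refused queries as well, so repeated probing is visible to a reviewer.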

Research in this area has largely stalled. Reference 3 below showed that, in general, securing statistical databases is an impossible aim: if they are open to legitimate use, they are also open to abuse, and if they are restricted so tightly that they cannot be abused, they are useless for practical statistical purposes. To quote:

"The conclusion is that statistical databases are almost always subject to compromise. Severe restrictions on allowable query set sizes will render the database useless as a source of statistical information but will not secure the confidential records."
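To illustrate why a restriction on query set sizes alone does not protect individual records, here is a simplified tracker-style attack. It is a sketch on an invented table, not the procedure from the cited paper; the threshold `K`, the table contents, and the function names are assumptions made for this example. The attacker knows that the target is the only senior employee in the ops department, so the direct query has a set of size 1 and is refused, but two permitted queries still yield the answer.

```python
# Hypothetical table: the target is the only senior employee in ops.
RECORDS = [
    {"dept": "ops", "grade": "senior", "salary": 61000},  # target
    {"dept": "ops", "grade": "junior", "salary": 40000},
    {"dept": "ops", "grade": "junior", "salary": 42000},
    {"dept": "ops", "grade": "junior", "salary": 45000},
    {"dept": "sales", "grade": "senior", "salary": 70000},
    {"dept": "sales", "grade": "senior", "salary": 66000},
    {"dept": "sales", "grade": "junior", "salary": 39000},
    {"dept": "sales", "grade": "junior", "salary": 41000},
]
K = 2  # assumed minimum query set size; sets larger than n - K are refused too


def restricted_sum(predicate):
    """SUM over salaries, answered only for query sets of permitted size."""
    rows = [r for r in RECORDS if predicate(r)]
    if not K <= len(rows) <= len(RECORDS) - K:
        raise PermissionError("query set size outside the allowed range")
    return sum(r["salary"] for r in rows)


def is_ops(r):
    return r["dept"] == "ops"


def is_senior(r):
    return r["grade"] == "senior"


def tracker_attack():
    """Recover SUM(ops AND senior) from two queries that are both allowed."""
    sum_ops = restricted_sum(is_ops)  # query set of size 4
    # The tracker: ops AND NOT senior, a query set of size 3.
    sum_tracker = restricted_sum(lambda r: is_ops(r) and not is_senior(r))
    return sum_ops - sum_tracker


if __name__ == "__main__":
    try:
        restricted_sum(lambda r: is_ops(r) and is_senior(r))
    except PermissionError as exc:
        print("direct query refused:", exc)
    print(tracker_attack())  # prints 61000
```

Raising `K` far enough to block both permitted queries would also block most legitimate queries on a table of this size, which is the trade-off the quotation describes.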

Some further reading

Statistical and Scientific Database Management (SSDBM), http://www.informatik.uni-trier.de/~ley/db/conf/ssdbm/ : an important series of conferences in this field.

Some key papers in this field:
1. Dorothy E. Denning, Secure statistical databases with random sample queries, ACM Transactions on Database Systems (TODS), Volume 5, Issue 3 (September 1980), pp. 291-315. doi:10.1145/320613.320616
2. Wiebren de Jonge, Compromising statistical databases responding to queries about means, ACM Transactions on Database Systems, Volume 8, Issue 1 (March 1983), pp. 60-80. doi:10.1145/319830.319834
3. Dorothy E. Denning, Jan Schlörer, A fast procedure for finding a tracker in a statistical database, ACM Transactions on Database Systems, Volume 5, Issue 1 (March 1980), pp. 88-102. doi:10.1145/320128.320138


Wikimedia Foundation. 2010.
