Distributed database

A distributed database is a database in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replicating and distributing databases can improve database performance at end-user worksites.[1]

To keep distributed databases up to date, two processes are used: replication and duplication. Replication uses specialized software that looks for changes in the distributed databases. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be complex and can consume considerable time and computing resources. Duplication, on the other hand, is simpler. It identifies one database as the master and then copies that database to the other locations. Duplication is normally done at a set time after hours, so that each distributed location holds the same data. Under duplication, changes are allowed only to the master database, which ensures that local data will not be overwritten. Both processes can keep the data current in all distributed locations.[2]
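The difference between the two processes can be illustrated with a minimal sketch. It assumes each database is a plain in-memory dictionary and that changes flow in one direction only; the function names `duplicate` and `replicate` are hypothetical and do not come from any particular product.

```python
import copy

def duplicate(master, replicas):
    """Duplication: overwrite every replica with a full copy of the master."""
    for name in replicas:
        replicas[name] = copy.deepcopy(master)

def replicate(source, target):
    """Replication: find the rows that differ and apply only those changes."""
    changed = {k: v for k, v in source.items() if target.get(k) != v}
    removed = [k for k in target if k not in source]
    target.update(changed)
    for k in removed:
        del target[k]
    return len(changed) + len(removed)  # number of changes applied

master = {1: "Alice", 2: "Bob", 3: "Carol"}
replicas = {"site_a": {1: "Alice"}, "site_b": {}}
duplicate(master, replicas)          # both sites now hold a full copy

master[2] = "Robert"                 # one row updated at the master
del master[3]                        # one row deleted at the master
applied = replicate(master, replicas["site_a"])  # site_b is left stale
```

Duplication copies everything regardless of what changed, while replication has to detect the differences first, which is where its extra cost comes from.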

Besides replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business and on the sensitivity and confidentiality of the data to be stored, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.
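As a rough illustration of the synchronous and asynchronous approaches, the following sketch models a primary copy that forwards writes to its replicas. The classes `Primary` and `Replica` and the `flush` method are assumptions made for this example; real systems ship changes over a network and must handle failures.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class Primary:
    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = deque()  # changes not yet shipped (asynchronous mode)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: every replica is updated before write() returns.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: the change is queued and shipped later.
            self.pending.append((key, value))

    def flush(self):
        """Ship all queued changes to the replicas, oldest first."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value

sync_replica, async_replica = Replica(), Replica()
sync_primary = Primary([sync_replica], synchronous=True)
async_primary = Primary([async_replica], synchronous=False)

sync_primary.write("x", 1)
async_primary.write("x", 1)
stale_before_flush = "x" not in async_replica.data  # replica lags behind
async_primary.flush()
```

The synchronous variant trades write latency for replicas that are always current; the asynchronous variant returns sooner but allows a window in which replicas are stale.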

Basic architecture

A database user accesses the distributed database through:

Local applications
applications which do not require data from other sites.
Global applications
applications which do require data from other sites.

A distributed database does not share main memory or disks.
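The distinction between local and global applications can be sketched as a lookup against a catalog that records where each table is stored. The catalog contents, the site names and the function `classify` are hypothetical and serve only to illustrate the definition.

```python
# Hypothetical catalog: which site stores which table.
CATALOG = {"orders": "paris", "customers": "paris", "inventory": "tokyo"}

def classify(home_site, tables):
    """Return 'local' if every table lives at home_site, otherwise 'global'."""
    sites = {CATALOG[table] for table in tables}
    return "local" if sites == {home_site} else "global"

local_case = classify("paris", ["orders", "customers"])   # all data at paris
global_case = classify("paris", ["orders", "inventory"])  # needs data from tokyo
```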

Important considerations

Care with a distributed database must be taken to ensure the following:

  • The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, among other things.
  • Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system.
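One common way to keep a transaction's subtransactions consistent across sites is a two-phase commit protocol. The article does not name a specific protocol, so the sketch below is only an illustration of the idea: the class `Site`, the function `run_transaction` and the account balances are assumptions, and each site is assumed to receive at most one subtransaction.

```python
class Site:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.staged = None  # subtransaction waiting for the final decision

    def prepare(self, delta):
        """Phase 1: vote yes only if the subtransaction can be applied."""
        if self.balance + delta < 0:
            return False
        self.staged = delta
        return True

    def commit(self):
        """Phase 2 (success): make the staged change permanent."""
        if self.staged is not None:
            self.balance += self.staged
        self.staged = None

    def abort(self):
        """Phase 2 (failure): discard the staged change."""
        self.staged = None

def run_transaction(subtransactions):
    """subtransactions: list of (site, delta). Either all commit or none do."""
    prepared = []
    for site, delta in subtransactions:
        if not site.prepare(delta):
            for other in prepared:
                other.abort()
            return False
        prepared.append(site)
    for site in prepared:
        site.commit()
    return True

a, b = Site("a", 100), Site("b", 20)
ok = run_transaction([(a, -50), (b, +50)])      # both sites vote yes
failed = run_transaction([(b, +80), (a, -80)])  # site a votes no, b is rolled back
```

After the failed attempt neither balance has changed, which is the integrity guarantee described above.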

Advantages of distributed databases

  • Management of distributed data with different levels of transparency, such as fragmentation transparency and replication transparency.
  • Increased reliability and availability.
  • Easier expansion.
  • Reflects organizational structure: database fragments are located in the departments they relate to.
  • Local autonomy or site autonomy: a department can control the data about itself, since its staff are the ones familiar with it.
  • Protection of valuable data: if there were ever a catastrophic event such as a fire, the data would not all be in one place but distributed across multiple locations.
  • Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
  • Economics: it can cost less to create a network of smaller computers with the power of a single large computer.
  • Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).
  • Reliable transactions, due to replication of the database.
  • Hardware, operating system, network, fragmentation, DBMS, replication and location independence.
  • Continuous operation.
  • Distributed query processing.
  • Distributed transaction management.
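Fragmentation transparency, mentioned in the list above, can be illustrated with a small sketch. It assumes one logical employees table that is split horizontally by department; the fragment contents and the function `select_employees` are hypothetical.

```python
# Horizontal fragments of one logical "employees" table, keyed by department.
FRAGMENTS = {
    "sales": [{"id": 1, "name": "Ana", "dept": "sales"}],
    "research": [
        {"id": 2, "name": "Ben", "dept": "research"},
        {"id": 3, "name": "Chi", "dept": "research"},
    ],
}

def select_employees(dept=None):
    """Query the logical table; the caller never names a fragment or site."""
    if dept is not None:
        # Only the fragment that holds this department is consulted.
        return list(FRAGMENTS.get(dept, []))
    # Without a department filter the result is the union of all fragments.
    rows = [row for fragment in FRAGMENTS.values() for row in fragment]
    return sorted(rows, key=lambda row: row["id"])

everyone = select_employees()
research_only = select_employees("research")
```

The caller sees a single table, while the function decides which fragments to read. A query restricted to one department touches only the fragment stored for that department.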

A single-site failure does not bring down the whole system. All transactions follow the ACID properties: atomicity (the transaction takes place as a whole or not at all), consistency (a transaction maps one consistent database state to another), isolation (each transaction sees a consistent database), and durability (the results of a transaction must survive system failures). The merge replication method is used to consolidate the data between databases.
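Atomicity can be demonstrated on a single local database with Python's built-in `sqlite3` module. This is not a distributed setup; the table name, the account names and the `transfer` function are assumptions chosen only to show the all-or-nothing behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(amount):
    """Move amount from account a to account b as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'b'",
                (amount,),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'a'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = transfer(60)   # succeeds: a has 100
second = transfer(60)  # fails the CHECK on a, so the credit to b is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the second call the credit to account b is executed before the debit fails, yet the rollback removes it, so the database moves from one consistent state to another or does not change at all.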

Disadvantages of distributed databases

  • Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins can become prohibitively expensive when performed across multiple systems.
  • Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
  • Security: remote database fragments must be secured, and because they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
  • Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
  • Inexperience: distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.
  • Lack of standards: there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
  • Database design is more complex: besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites, and data replication.
  • Additional software is required.
  • The operating system should support a distributed environment.
  • Concurrency control: this is a major issue. It is addressed by locking and timestamping.
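The timestamping approach to concurrency control mentioned in the last item can be sketched with the basic timestamp-ordering rules. The sketch assumes every transaction carries a unique integer timestamp and that an aborted transaction would be restarted by the caller; the names `Item`, `Abort`, `read` and `write` are hypothetical.

```python
class Abort(Exception):
    """Raised when an operation arrives too late and must be rejected."""

class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp that has read this item
        self.write_ts = 0  # largest timestamp that has written this item

def read(item, ts):
    # A read is too late if a younger transaction has already written.
    if ts < item.write_ts:
        raise Abort("read rejected: item written by a younger transaction")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    # A write is too late if a younger transaction has read or written.
    if ts < item.read_ts or ts < item.write_ts:
        raise Abort("write rejected: item used by a younger transaction")
    item.value = value
    item.write_ts = ts

x = Item(10)
seen = read(x, ts=2)        # transaction 2 reads the initial value
write(x, ts=3, value=30)    # transaction 3 writes afterwards
try:
    write(x, ts=1, value=99)  # transaction 1 is older than both
    late_write_rejected = False
except Abort:
    late_write_rejected = True
```

Unlike locking, no transaction waits here. Conflicts are resolved by rejecting the operation that arrives out of timestamp order.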

References

  1. O'Brien, J. & Marakas, G.M. (2008). Management Information Systems (pp. 185-189). New York, NY: McGraw-Hill Irwin.
  2. O'Brien, J. & Marakas, G.M. (2008). Management Information Systems (pp. 185-189). New York, NY: McGraw-Hill Irwin.

Wikimedia Foundation. 2010.
