Database
Structures
|
A database is formed
by combining techniques from: |
|
Data structures and |
|
File structures discussed
previously |
|
To obtain a single
mass storage data system that can appear to have a multitude of organizations
for serving a variety of applications. |
|
Such structures eliminate
the duplication (and harmful redundancy) found in the file-oriented approach. |
|
They provide separate
data systems for each application even though these applications may require
much of the same information.
|
General
Issues
|
Loosely speaking,
any collection of data can be considered a database. |
|
More specifically,
the term is reserved to mean: |
|
A collection of data, |
|
Stored in mass storage. |
|
Can take on a variety
of appearances depending on the requirements at the time. |
|
Can serve as the
data source for a variety of applications. |
|
The requirement to
have the data in several forms was addressed previously in the discussion
of employee records. |
|
The data needed to
be shown as a: |
|
Sequential file
for payroll update |
|
Direct access file
for general employee information retrieval |
|
Indexed system was
found to provide this dual appearance |
|
Indexed files are
generally accepted to contain the rudiments of a database. |
|
It must be remembered
that a “full fledged” database contains a great deal of diversity that
greatly exceeds the simplicity provided above |
|
Databases in use
today: |
|
Contain information
encompassing the full spectrum of business activities. |
|
Provide access to
selected portions of those data in a large number of formats. |
|
Schemas and Subschemas |
|
From another point
of view, a database may be considered to result from the combination of
a variety of data collections organized into a single integrated collection. |
|
This consolidation
approach to the database concept reflects the historical development of
automated data storage and maintenance. |
|
As computing machinery
found wider and wider uses in information management, each application
tended to be implemented as a separate system with its own collection of
data. |
|
Typically, the need
to process payroll gave rise to a sequential file, and later the need for
interactive data retrieval produced an entirely different system using
a direct access file. |
|
Independent systems
represented an improvement over the corresponding manual techniques previously
used, but taken as a whole the collection of individual automated systems
still constituted a limited and inefficient use of resources when compared
to the possibilities of a combined database system. |
|
For example, different
departments were not able to share the data they all needed, so much of
the information required by an organization was duplicated in storage. |
|
When an employee
moved: |
|
Address change cards
throughout the organization had to be completed |
|
Typographical errors,
misplaced cards, and employee apathy could soon result in erroneous and
conflicting data within the various data systems. |
|
After a move, an
employee's newsletter might begin to arrive at the new address but with
the wrong name, while the payroll records could continue to reflect the
old address. |
|
In this setting,
database systems emerged as a means of consolidating the information stored
and maintained by a particular organization. |
|
With such a system,
both payroll and the mailing of newsletters could be processed from a single
integrated data system. |
|
Another advantage
of a consolidated data system is the control achieved by the organizationwhen
the information it owns is placed in one common pot. |
|
As long as each department
has complete control over its own data, those data tend to be used for
the good of the department rather than for the good of the organization. |
|
When a central database
is implemented in a large organization, the control of information is normally
consolidated in the administrative position known as the Database
Administrator (DBA). |
|
This central administrator
knows both the data available within the organization and the needs of
the various departments. |
|
It is within this
structure that decisions regarding data organization and access can be
made with the entire organization in mind. |
|
Other concerns center
around access to sensitive data. |
|
Someone working on
the organization's newsletter: |
|
might need access
to employee names and addresses |
|
should not have access
to payroll data. |
|
An employee processing
payroll should not have: |
|
access to the other
financial records of the corporation. |
|
The ability to control
access to the information in the database is often as important as the
ability to share it. |
|
To provide for this
distinction of access privileges, database systems often rely on schemas
and
subschemas. |
|
A schema is a description
of the entire database structure that is used by the database software
to maintain the database. |
|
A subschema is a
description of only that portion of the database pertinent to a
particular user's needs. |
|
In a university database: |
|
Student records contains
such items as the current address and phone number of that student in addition
to that student's academic record. |
|
Student records are
linked to the record of the individual student's faculty adviser. |
|
The faculty member’s
record contains that person's address, employment history, etc. |
|
A pointer system
is maintained that links the information about a student to the employment
history of a faculty member. |
|
To keep the university's
registrar from using this linkage to obtain privileged information about
the faculty, the registrar's access to the database must be restricted
to a subschema whose description of the faculty records does not include
employment history. |
|
Under this subschema
a user can find out which faculty member is a particular student's adviser
but cannot obtain access to additional information about that faculty member. |
|
In contrast, the
subschema for the payroll department provides the employment history of
each faculty member but does not include the linkage between students and
advisers. |
|
Thus the payroll
department can modify a faculty member's salary but cannot find the students
advised by that person. |
Big brother is
watching:
|
The size and scope
of databases have increased rapidly. Extremely large collections
of data can be assembled and interrogated with little effort over wide
geographic areas. This creates the opportunity for: |
|
Increases in misinformation
and misapplications of information. |
|
Incidents of injustices
due to inaccurate credit reports, faulty criminal records, and discrimination
resulting from unauthorized or unethical access to personal information. |
|
Courts still have
to decide the rights to collect and hold information in the first place: |
|
What kind of information
does an insurance company have a right to collect regarding its clients? |
|
Does a government
have the right to maintain accounts of an individual citizen's voting record? |
|
Does a credit card
company have the right to sell records of its customers' purchasing patterns
to marketing firms?
|
The
Layered Approach to Database Implementation
|
The average individual
using database systems are not trained in computer science and should not
be required to consider the details of computer technology and techniques. |
|
One of the goals
of a database system must be to allow the end
user to concentrate on the problems of the
application at hand. |
|
The systems must
provide for “ad hoc.” questions. |
|
Data must be presented
in a usable form. |
|
To accomplish this
goal in an organized manner, a database system is constructed from layers
of abstraction |
|
The image of the
data given to the end user is produced by the application
software that communicates with the user in
an interactive manner and in the application's terminology. |
|
The design of the
application software is what gives it its personality. |
|
For example, it may
communicate with the end user via: |
|
Question-and-answer
or |
|
Fill-in-the-blanks
scenario. |
|
Whatever the scenario,
the purpose is to learn what information is required and later, having
obtained the requested information, presents it to the user in a meaningful
format. |
|
The actual manipulation
of the database is accomplished by another software package called the
database
management system (DBMS). |
|
One reason for the
division of duties is to: |
|
Simplify the design
process. |
|
This is particularly
true in the context of a distributed database (a database spread over several
machines in a network). |
|
Without the database
management system, the application program must contain routines for keeping
up with the actual location of the various portions of the database. |
|
With a well-designed
database management system, the application software can be written as
though the database were stored on a single machine. |
|
A second reason for
the separation of the application software from the database management
system is that such an organization provides a means for controlling
access to the database. |
|
The system enforces
the restrictions imposed by the various subschemas by dictating that all
access to the database be performed by a central database management system. |
|
The database management
system can use the entire database schema for its internal needs but require
that each user remain within the bounds described by that user's subschema. |
|
A third reason for
separating the user interface and actual data manipulation into two packages
is to achieve data independence. |
|
The ability to change
the organization of the database without changing the application software. |
|
The personnel department
might need to add an additional field to each employee's record for the
health insurance program. |
|
If the application
software dealt directly with the database, all application programs would
have to be modified. |
|
Change instigated
by the personnel department would cause changes to the payroll program
as well as to the program for printing mailing labels for the company's
newsletter. |
|
To implement a change
required by a single user: |
|
Only the schema used
by the central system and the subschemas of those users involved in the
change must be changed. |
|
All other subschemas
remain the same, so the corresponding application software executes as
though no changes were made. |
|
A fourth reason for
separating the application software from the database management system
is that it allows the application software
to be written in terms of a simplified, conceptual view ofthe database
rather than the actual, complex structure involving disk tracks, pointers,
and overflow areas. |
|
Database management
system contains routines that can be used as abstract tools by the application
software to convert commands in terms of a conceptual view of the database,
called the database model, into terms of the actual database storage. |
End User |
Data seen in
terms of the
application |
Applicaion
Software |
Data seen in
terms of a
Database model |
Database
Management
System |
Data seen in
its actual
organization |
Actual Database |
Application
Software
Host
Language
|
Application software
is often written in general-purpose programming languages. |
|
These languages provide
the basic ingredients for algorithm expression but lack
the operations that make manipulation of the
database convenient. |
|
The routines provided
by the database management system in effect extend
the capabilities of the language being used
in a manner that supports the conceptual image of the database model. |
|
This concept of the
general-purpose language being the original system to which the capabilities
of the database management system are added results in the original language
being referred to as the host language. |
|
Many commercial database
management system packages today are actually combinations of the traditional
database system and a host language. This tends to disguise the two
as one, although the distinction still exists within. |
|
To develop application
software for a database installation, a programmer must understand the
abstract tools provided by the database management system being used. |
|
The relational
database model is the model provided by most
of today's database management systems. |
|
It allows the application
software to be written as though the data in the database were stored in
tables with rows and columns. |
The
Relational Model
|
In this section we
introduce the relational database model, which is the most popular model
today. |
|
Its popularity stems
from the simplicity of its structure. |
|
It portrays data
as being stored in tables called relations. |
|
As an example, the
representation of the previously introduced employee information system
might be represented in the relational model by the following table or
relation: |
EmplId
Name
Address
SSNum
25X15 |
Joe E. Baker |
33 Nowhere St. |
111223333 |
34Y70 |
Cheryl H. Clark |
563 Downtown Ave. |
999009999 |
23Y34 |
G. Jerry Smith |
1 555 Circle Dr |
111005555
|
etc.
|
|
|
|
|
A row
in a relation is called a tuple. |
|
Tuples consist of
the information about a particular employee. |
|
A column
in a relation is referred to as an attribute: |
|
Each entry in a column
describes some characteristic, or attribute, of the entity represented
by the corresponding tuple. |
Relational
Design
|
The design of a database
in terms of the relational model centers on the design of the relations
making up the database. |
|
Although this may
appear to be a simple task, many subtleties are waiting to trap the unwary
designer. |
|
Suppose that, in
addition to the information contained in the above relation, we want to
include information about the jobs held by the employees. |
|
Associated with each
employee, we may want to include: |
|
a job history, consisting
of such attributes as job title, |
|
a job identification
code (unique to each job), |
|
the skill code associated
with each job, |
|
the department in
which the job exists, |
|
the period during
which the employee held the job in terms of a starting date and termination
date. |
Extending the relation
above to include these attributes as additional columns in the table, as
shown below reveals several problems.
Emplid Name
Address
SSN
Jobid JobTitle SkillCode
Dept StartDate
TermDate
25X I 5 Joe E.
33 Nowhere 111223333
F5 Floor
FM3 Sales
9-1-95 9-30-96
Baker
St.
manager
25XI 5 Joe E.
33 Nowhere 111223333
D7 Dept.
D2
Sales 10-1-96
Baker
St.
head
34Y70 Cheryl H. 563 Downtown
999009999 F5
Floor FM3
Sales 10-1-95
Clark Ave.
manager
23Y34 G. Jerry
1555 Circle 111005555
S25X Secretary T5
Personnel 3-1-93
4-30-95
Smith Dr
23Y34 G. Jerry
1555 Circle 111005555
S25Z Secretary T6
Accounting 5-1-95
Smith Dr
|
|
The relation contains
one tuple for each employee assignment to a job. |
|
I.e. the information
contained in the original relation (each employee's name, address, etc.)
must be repeated. |
|
If a particular job
has been held by numerous employees, the department associated with that
job must be identified in each tuple representing an assignment of the
job. |
|
A more serious problem
with the extended relation surfaces when we consider deleting information
from the database. |
|
Suppose, Joe E. Baker
is the only employee to hold the job identified as D7. |
|
If he were to leave
the company and be deleted from the database represented below we would
lose the information about job D7. |
|
The only tuple containing
the fact that job D7 requires a skill level of D2 is the tuple relating
to Joe Baker. |
|
You might argue that
the ability to erase only a portion of a tuple could solve the problem,
but this would in turn introduce other complications. |
|
Should the information
relating to job F5 also be retained in a partial tuple, or does this information
reside elsewhere in the relation? |
|
The temptation to
use partial tuples is a strong indication that
the design of the relation is not compatible with the application. |
|
The source of these
problems is that we have combined more than one concept into a single relation. |
|
As it is proposed,
the extended relation contains information dealing directly with: |
|
Employees |
|
Name, I.D. number,
address, Social Security number |
|
Company job information |
|
Job identification,
job title, department, skill code |
|
Employees - jobs
relationship information |
|
Start date, termination
date |
|
Having made this
observation, we find that our problems can be solved by redesigning
the system using three relations-one for each of the preceding topics. |
|
Using the new system,
we: |
|
Keep the original
relation as it was originally designed |
|
Insert the additional
information in two new relations: |
|
JOB and |
|
ASSIGNMENT. |
|
A database consisting
of these three relations contains the pertinent information about: |
|
Employees through
the EMPLOYEE relation, |
|
Available jobs through
the JOB relation, |
|
Job history through
the ASSIGNMENT
relation. |
|
Additional information
is implicitly available by combining the information from different relations. |
|
The departments in
which a given employee has worked may be found by first finding all the
jobs that employee has held using the ASSIGNMENT relation. |
|
Additional information
is implicitly available by combining the information from different relations. |
|
The departments in
which a given employee has worked may be found by first finding all the
jobs that employee has held using the ASSIGNMENT relation and then finding
the departments associated with those jobs by means of the JOB relation. |
|
In some cases a relation
can be decomposed into smaller relations without losing information (called
a non-loss decomposition), and at other times information is lost. |
|
The classification
of such characteristics has been, and still is, a concern in computer science. |
|
Such questions concerning
the properties of relations have resulted in a hierarchy of relation classes
called first normal form, second normal form,
third normal form, and so on, with the relations
in each class being more conducive to use in a database than those in the
preceding class. |
|
At times we need
to select certain tuples from a relation. |
|
To retrieve the information
about an employee: |
|
We must select the
tuple with the appropriate identification attribute value from the EMPLOYEE
relation, |
|
To obtain a list
of the job titles in a certain department: |
|
We must select the
tuples from the JOB relation having that department as their department
attribute. |
|
The result of this
selection is another relation (another table) consisting of the tuples
selected from the parent relation. |
|
The outcome of selecting
information about a particular employee results in a relation containing
only one tuple from the EMPLOYEE relation. |
|
The outcome of selecting
the tuples associated with a certain department probably results in several
tuples from the JOB relation. |
Relational
Operations
|
One operation we
may want to perform on a relation is to: |
|
SELECT
tuples possessing certain characteristics and |
|
Place these selected
tuples in a new relation. |
|
To express this operation,
we adopt the syntax |
|
NEW <- SELECT
from EMPLOYEE where EmplId = "34Y70" |
The semantics
of this statement creates a new relation called NEW containing those tuple(s)
from EMPLOYEE whose EmplId is 34Y70.
|
In contrast to the
SELECT operation, which extracts rows from a relation, the PROJECT
operation extracts columns. |
|
The operation is
expressed as: |
|
MAIL <- PROJECT
Name, Address from EMPLOYEE |
|
The result
is the creation of another new relation (called MAIL) that contains the
two column of values from the corresponding columns of relation EMPLOYEE. |
|
JOIN is
a third operation. It JOINs two relations (or more) to produce a new relation
whose attributes consist of the attributes from the original relations. |
|
The names of these
attributes are the same as those in the original relations except that
each
is prefixed by the name of the relation of its origin. |
|
This naming convention
ensures that the attributes in the new relation have unique names. |
|
The tuples (rows)
of the new relation are produced by concatenating tuples from the two original
relations. |
|
Which tuples are
actually joined to form tuples in the new relation is determined by the
condition under which the JOIN is constructed. |
|
One such condition
is that designated attributes have the same value. |
Example:
|
A tuple from relation
A is concatenated with a tuple from relation B if the attributes w and
x in the two tuples are equal. |
|
The concatenation
of the tuple (r, 2) from relation A with the tuple (2, m, q) from relation
B appears in the result because the value of attribute W in the first equals
the value of attribute X in the second. |
Relation A
Relation B
v
w
x y z
r
2
5 g p
t
4
4 d e
p
6
2 m q
4 t
f
C
<- JOIN A and B where A.W = B.X
Relation C
A.V
A.W B.X
B.Y B.Z
r
2
2 m
q
t
4 4
d e
t
4
4 t
f
SQL
|
SQL (Structured Query
Language) is used extensively in the data
processing community for manipulating relational model databases. It is
popular because: |
|
It has been standardized
by ANSI. |
|
It was originally
developed and marketed by IBM. |
|
A query involving
a sequence of SELECT, PROJECT, and JOIN operations can be expressed as
a single SQL statement. |
|
SQL statements do
not specify a particular sequence of operations. |
|
Although a query
stated in SQL is expressed in an imperative-sounding form, the reality
is that it is essentially a declarative statement. |
|
The significance
of this is that: |
|
SQL relieves the
database user from developing a sequence of steps needed to obtain the
information desired. |
|
He or she needs merely
to describe that information. |
|
A query in which
employee identification numbers along with their corresponding departments
could be: |
|
developed in a three-step
process |
|
or could be stated
in a single SQL statement: |
select EmplId, Dept
from ASSIGNMENT, JOB
where ASSIGNMENT.Jobid = JOB.Jobid
and ASSIGNMENT.TermDate = '*'
|
As indicated by this
example, each SQL query statement can contain three clauses-a select
clause,
a from clause,
and a where clause. |
|
Roughly speaking,
such a statement is a request for the result of: |
|
Forming the JOIN
of all the relations listed in the from
clause, |
|
SELECTing
those tuples that satisfy the conditions in the where
clause, |
|
PROJECTing
those tuples listed in the select
clause. |
|
Note that SQL encompasses
statements for: |
|
Defining the structure
of relations, |
|
Creating relations,
and |
|
Modifying the contents
of relations |
|
as well as performing
queries. |
insert
into EMPLOYEE
values
('42ZI2', 'Lloyd A. Burt', '333 Endless Ave.', '444661111')
delete from
EMPLOYEE
where Name
= 'G. Jerry Smith'
update EMPLOYEE
set Address
'1812 Napoleon Ave.'
where Name
'Joe E. Baker'
Object-Oriented
Databases
|
One of the newer
areas in database research involves applying the object-oriented paradigm
to the construction of a database, resulting in an object-oriented database.
The motivation behind this movement is at least fourfold. |
|
First, data
independence can be achieved by means of encapsulation. |
|
Second, the concepts
of classes and inheritance
appear ready-made for describing schemas and
subschemas of databases. |
|
Third, the image
of a database consisting of intelligent data
objects that can answer questions themselves
rather than being interrogated by a supervisory program is inviting. |
|
Fourth, early research
indicates that the object-oriented approach may
overcome some of the restrictions inherent in other database models.' |
|
Let us consider an
object-oriented implementation of the employee database from the previous
section. |
|
We have three classes
(types of objects): |
|
EMPLOYEE, |
|
An object from the
EMPLOYEE class contains subobjects for such entries as EmplId, Name, Address,
and SSNum; |
|
JOB, |
|
An object from the
class JOB contains subobjects for the items Jobid, JobTitle, SkillCode,
and Dept; |
|
ASSIGNMENT. |
|
Each object from
the class ASSIGNMENT contains subobjects for StartDate and TermDate. |
|
Each of the subobjects
consists of a storage structure encapsulated with a collection of methods
describing how that subobject responds to messages regarding its contents. |
|
In turn, each object
from the classes EMPLOYEE, JOB, and ASSIGNMENT contains methods describing
how that object responds to requests for information and commands to update
data. |
|
Each object from
the class EMPLOYEE might have: |
|
A method for reporting
that employee's job history and perhaps |
|
A method for changing
that employee's job assignment. |
|
Each object from
the JOB class might have: |
|
A method for reporting
those employees who have held that particular job. |
|
Note that these methods
accomplish their tasks by sending messages to the appropriate subobjects
and other objects in the system. |
|
Proponents of object-oriented
database technology argue that: |
|
The image presented
by |
|
a database
consisting of objects representing employees and jobs |
|
more closely represents
the user's environment than |
|
a database consisting
of relations containing tuples and attributes. |
|
For example: |
|
To add a new employee
to an object-oriented database: |
|
The user actually
adds a new employee object |
|
Rather than inserting
a tuple into a relation. |
|
Some form of linkage
between different objects must be maintained for the methods within objects
to perform their tasks in an efficient manner: |
|
How, for example,
does an object from the EMPLOYEE class know which objects from the ASSIGNMENT
class represent assignments pertaining to its employee? |
|
One approach toward
providing this linkage is to extend the composition of objects beyond that
found in traditional object-oriented environments. |
|
Traditional objects
are composed of data structures and methods. |
|
For the purposes
of constructing an object-oriented database, this composition can be extended
to include a third kind of component,
a list of other objects. |
|
Objects in such a
system therefore consist of data structures, methods, and lists of objects. |
|
If such an approach
is applied to our employee database, an object from the class EMPLOYEE
can encompass and maintain a list of those objects from the class ASSIGNMENT
that relate to the pertinent employee. |
|
Methods within the
object then can use this list when responding to inquiries about the object's
job history. |
|
The result is a database
consisting of objects that maintain records about the existence of other
objects and are therefore able to identify the appropriate objects and
communicate with them when responding to requests for information. |
The
Commit/Rollback Protocol
|
A single transaction,
such as the transfer of funds from one bank account to another, the cancellation
of an airline reservation, and the registration of a student in a university
course, may involve multiple steps at the database level. |
|
For example, a transfer
of funds between bank accounts requires that the balance in one account
be decremented and the balance in the other be incremented. |
|
Between such steps
the information in the database may be inconsistent. |
|
Indeed, funds are
missing during the brief period after the first account has been decremented
but before the other has been incremented. |
|
Likewise, when reassigning
a passenger's seat on a flight there may be an instant when the passenger
has no seat or an instant when the passenger list appears to be one passenger
greater than it actually is. |
|
In the case of large
databases that are subject to heavy transaction loads, it is highly likely
that a random snapshot will find the database in the middle of some transaction. |
|
A request for the
execution of a transaction or an equipment malfunction will therefore likely
occur at a time when the database is in an inconsistent state. |
|
Let us first consider
the problem of a malfunction. |
|
The goal of the database
management system is to assure that such a problem will not freeze the
database in an inconsistent state. |
|
This is often accomplished
by maintaining a log containing
a record of each transaction's activities in a nonvolatile storage system,
such as a disk. |
|
Before a transaction
is allowed to alter the database the alteration to be performed is first
recorded in the log. |
|
In fact, some
databases use a deferred update protocol,
in which all of a transaction's proposed alterations must be recorded in
the log before any action is taken on the database itself. |
|
The point at which
all the steps in a transaction have been recorded in the log is called
the commit point. |
|
It is at this point
that the database management system has the information it needs to reconstruct
the transaction on its own if that should become necessary. |
|
At this point the
database management system becomes committed to the transaction in the
sense that it accepts the responsibility of guaranteeing that the transaction's
activities will be reflected in the database. |
|
In the case of an
equipment malfunction the database management system can use the information
in its log to reconstruct the transactions that have been completed (committed)
since the last backup was made. |
|
If problems should
arise before a transaction has reached its commit point, the database management
system may find itself with a partially executed transaction that cannot
be completed. |
|
In this case the
log can be used to roll back (undo)
the activities actually performed by the transaction. In the case
of a malfunction, for instance, the database management system could recover
by rolling back those transactions that were incomplete
(noncommitted) at the time of the malfunction. |
|
Rollbacks of transactions
are not restricted, however, to the process of recovering from equipment
malfunctions. They are often a part of a database management system's normal
operation. |
|
For example, a transaction
may be terminated before it has completed all its steps due to an attempt
to access privileged information, or it may be involved in a deadlock in
which competing transactions find themselves waiting for data being used
by the other. |
|
In these cases the
database management system can use the log to roll back a transaction and
thus avoid an erroneous database due to incomplete transactions. |
|
To emphasize the
delicate nature of database management system design, we should note that
there are subtle problems lurking within the rollback process. |
|
The rolling back
of one transaction may affect database entries that have been used by other
transactions. |
|
For example, the
transaction
being rolled back may have updated an account balance, and another transaction
may have already based its activities on this updated value. |
|
This may mean that
these additional transactions must also be
rolled back, which may adversely affect still
other transactions. |
|
The result is the
problem known as cascading rollback. |
Locking
|
We now consider the
problem of a transaction being executed while the database is in a state
of flux from another transaction, a situation that can lead to inadvertent
interaction between the transactions and produce erroneous results. |
|
For instance, the
problem known as the incorrect summary problem
can
arise if one transaction is in the middle of transferring funds from one
account to another when another transaction tries to compute the total
deposits in the bank. |
|
This could result
in a total that is either too large or too small depending on the order
in which the transfer steps are performed. |
|
Another possibility
is known as the lost update problem,
which is exemplified by two transactions, each of which makes a deduction
from the same account. |
|
If one transaction
reads the account's current balance at the point when the other has just
read the balance but has not yet calculated the new balance, then both
transactions will base their deductions on the same initial balance. |
|
In turn, the effect
of one of the deductions will not be reflected in the database. |
|
To solve such problems,
a database management system could force transactions to execute in their
entirety on a one-at-a-time basis by holding each new transaction in a
queue until those preceding it have completed. |
|
But, a transaction
often spends a lot of time waiting for disk operations to be performed. |
|
By interweaving the
execution of transactions, the time during which one transaction is waiting
can be used by another transaction to process data it has already retrieved. |
|
Most large database
management systems therefore contain a scheduler
to
coordinate time-sharing among transactions in much the same way that a
time-sharing operating system coordinates interweaving of processes. |
|
To guard against
such anomalies as the incorrect summary problem and the lost update problem,
such schedulers incorporate a locking protocol
in
which the items within a database that are currently being used by some
transaction are marked as such. |
|
These marks are called
locks; marked items are said to be locked. |
|
Two types of locks
are common-shared locks
and exclusive locks. |
|
They correspond to
the two types of access a transaction may require to a data item-shared
access and exclusive access. |
|
If a transaction
is not going to alter the data item, then it requires shared access, meaning
that other transactions are also allowed to view the data item. |
|
However, if the transaction
is going to alter the item, it must have exclusive access, meaning that
it must be the only transaction with access to the item. |
|
In a locking protocol,
each time a transaction requests access to a data item it must also tell
the database management system the type of access it requires. |
|
If a transaction
requests shared access to a data item that is either unlocked or locked
with a shared lock, that access is granted and the item is marked with
a shared lock. |
|
If, however, the
requested item is already marked with an exclusive lock, the additional
access is denied. |
|
If a transaction
requests exclusive access to a data item, that request is granted only
if the item has no lock associated with it. |
|
In this manner, a
transaction that is going to alter a data item protects that item from
other transactions by obtaining exclusive access. |
|
At the same time,
several transactions can share access to a data item if none of them are
going to change it. |
|
Of course, once a
transaction is finished with an item, it notifies the database management
system and the associated lock is removed. |
|
Various algorithms
are used to handle the case when a transaction's access request is rejected. |
|
One is that the transaction
is merely forced to wait until the requested item becomes available. |
|
This approach, however,
can lead to deadlock,
since two transactions that require exclusive access to the same two data
items could block each other's progress if each obtains exclusive access
to one of the items and then insists on waiting for the other. |
|
To avoid such deadlocks,
some database management systems give priority to older transactions. |
|
That is, if an older
transaction requires access to a data item that is locked by a younger
transaction, the younger transaction is forced to release all of its data
items, its activities are rolled back (based on the log), the older transaction
is given access to the data item, and the younger transaction is forced
to start again. |
|
If a younger transaction
is repeatedly preempted, it will grow older in the process and ultimately
become one of the older transactions. |
|
This protocol, known
as the wound-wait protocol (old
transactions wound young transactions, young transactions wait for old
ones), assures that every transaction will ultimately be allowed to complete
its task. |
The next
page (SOC5) deals with Artificial Intelligence.
|