Database Structures

A database is formed by combining techniques from:

	Data structures and
	File structures discussed previously
	To obtain a single mass storage data system that can appear to have a multitude of organizations for serving a variety of applications.

Such structures eliminate the duplication (and harmful redundancy) found in the file-oriented approach.

They provide separate data systems for each application even though these applications may require much of the same information.

General Issues

	Loosely speaking, any collection of data can be considered a database.
	More specifically, the term is reserved to mean:

	A collection of data,
	Stored in mass storage.
	Can take on a variety of appearances depending on the requirements at the time.
	Can serve as the data source for a variety of applications.

The requirement to have the data in several forms was addressed previously in the discussion of employee records.

The data needed to be shown as a:

	Sequential file for payroll update
	Direct access file for general employee information retrieval

Indexed system was found to provide this dual appearance

	Indexed files are generally accepted to contain the rudiments of a database.
	It must be remembered that a “full fledged” database contains a great deal of diversity that greatly exceeds the simplicity provided above

Databases in use today:

	Contain information encompassing the full spectrum of business activities.
	Provide access to selected portions of those data in a large number of formats.

Schemas and Subschemas

From another point of view, a database may be considered to result from the combination of a variety of data collections organized into a single integrated collection.

	This consolidation approach to the database concept reflects the historical development of automated data storage and maintenance.
	As computing machinery found wider and wider uses in information management, each application tended to be implemented as a separate system with its own collection of data.
	Typically, the need to process payroll gave rise to a sequential file, and later the need for interactive data retrieval produced an entirely different system using a direct access file.

Independent systems represented an improvement over the corresponding manual techniques previously used, but taken as a whole the collection of individual automated systems still constituted a limited and inefficient use of resources when compared to the possibilities of a combined database system.

For example, different departments were not able to share the data they all needed, so much of the information required by an organization was duplicated in storage.

When an employee moved:

	Address change cards throughout the organization had to be completed
	Typographical errors, misplaced cards, and employee apathy could soon result in erroneous and conflicting data within the various data systems.
	After a move, an employee's newsletter might begin to arrive at the new address but with the wrong name, while the payroll records could continue to reflect the old address.

In this setting, database systems emerged as a means of consolidating the information stored and maintained by a particular organization.

With such a system, both payroll and the mailing of newsletters could be processed from a single integrated data system.

Another advantage of a consolidated data system is the control achieved by the organizationwhen the information it owns is placed in one common pot.

As long as each department has complete control over its own data, those data tend to be used for the good of the department rather than for the good of the organization.

When a central database is implemented in a large organization, the control of information is normally consolidated in the administrative position known as the Database Administrator (DBA).

This central administrator knows both the data available within the organization and the needs of the various departments.

It is within this structure that decisions regarding data organization and access can be made with the entire organization in mind.

Other concerns center around access to sensitive data.

Someone working on the organization's newsletter:

	might need access to employee names and addresses
	should not have access to payroll data.

An employee processing payroll should not have:

access to the other financial records of the corporation.

The ability to control access to the information in the database is often as important as the ability to share it.

To provide for this distinction of access privileges, database systems often rely on schemas and subschemas.

	A schema is a description of the entire database structure that is used by the database software to maintain the database.
	A subschema is a description of only that portion of the database pertinent to a particular user's needs.

In a university database:

	Student records contains such items as the current address and phone number of that student in addition to that student's academic record.
	Student records are linked to the record of the individual student's faculty adviser.
	The faculty member’s record contains that person's address, employment history, etc.
	A pointer system is maintained that links the information about a student to the employment history of a faculty member.

To keep the university's registrar from using this linkage to obtain privileged information about the faculty, the registrar's access to the database must be restricted to a subschema whose description of the faculty records does not include employment history.

Under this subschema a user can find out which faculty member is a particular student's adviser but cannot obtain access to additional information about that faculty member.

In contrast, the subschema for the payroll department provides the employment history of each faculty member but does not include the linkage between students and advisers.

Thus the payroll department can modify a faculty member's salary but cannot find the students advised by that person. Big brother is watching:

The size and scope of databases have increased rapidly. Extremely large collections of data can be assembled and interrogated with little effort over wide geographic areas. This creates the opportunity for:

	Increases in misinformation and misapplications of information.
	Incidents of injustices due to inaccurate credit reports, faulty criminal records, and discrimination resulting from unauthorized or unethical access to personal information.

Courts still have to decide the rights to collect and hold information in the first place:

	What kind of information does an insurance company have a right to collect regarding its clients?
	Does a government have the right to maintain accounts of an individual citizen's voting record?
	Does a credit card company have the right to sell records of its customers' purchasing patterns to marketing firms?

The Layered Approach to Database Implementation

The average individual using database systems are not trained in computer science and should not be required to consider the details of computer technology and techniques.

One of the goals of a database system must be to allow the end user to concentrate on the problems of the application at hand.

The systems must provide for “ad hoc.” questions.

Data must be presented in a usable form.

To accomplish this goal in an organized manner, a database system is constructed from layers of abstraction The image of the data given to the end user is produced by the application software that communicates with the user in an interactive manner and in the application's terminology. The design of the application software is what gives it its personality.

For example, it may communicate with the end user via:

Question-and-answer or

Fill-in-the-blanks scenario.

Whatever the scenario, the purpose is to learn what information is required and later, having obtained the requested information, presents it to the user in a meaningful format. The actual manipulation of the database is accomplished by another software package called the database management system (DBMS). One reason for the division of duties is to:

Simplify the design process.

This is particularly true in the context of a distributed database (a database spread over several machines in a network).

Without the database management system, the application program must contain routines for keeping up with the actual location of the various portions of the database. With a well-designed database management system, the application software can be written as though the database were stored on a single machine. A second reason for the separation of the application software from the database management system is that such an organization provides a means for controlling access to the database.

The system enforces the restrictions imposed by the various subschemas by dictating that all access to the database be performed by a central database management system.

The database management system can use the entire database schema for its internal needs but require that each user remain within the bounds described by that user's subschema.

A third reason for separating the user interface and actual data manipulation into two packages is to achieve data independence.

The ability to change the organization of the database without changing the application software.

The personnel department might need to add an additional field to each employee's record for the health insurance program.

If the application software dealt directly with the database, all application programs would have to be modified.

Change instigated by the personnel department would cause changes to the payroll program as well as to the program for printing mailing labels for the company's newsletter.

To implement a change required by a single user:

Only the schema used by the central system and the subschemas of those users involved in the change must be changed.

All other subschemas remain the same, so the corresponding application software executes as though no changes were made.

A fourth reason for separating the application software from the database management system is that it allows the application software to be written in terms of a simplified, conceptual view ofthe database rather than the actual, complex structure involving disk tracks, pointers, and overflow areas.

Database management system contains routines that can be used as abstract tools by the application software to convert commands in terms of a conceptual view of the database, called the database model, into terms of the actual database storage.

End User Data seen in
terms of the
application Applicaion
Software Data seen in
terms of a
Database model Database
Management
System Data seen in
its actual
organization Actual Database

Application Software
Host Language

Application software is often written in general-purpose programming languages.

These languages provide the basic ingredients for algorithm expression but lack the operations that make manipulation of the database convenient.

The routines provided by the database management system in effect extend the capabilities of the language being used in a manner that supports the conceptual image of the database model.

This concept of the general-purpose language being the original system to which the capabilities of the database management system are added results in the original language being referred to as the host language.

Many commercial database management system packages today are actually combinations of the traditional database system and a host language. This tends to disguise the two as one, although the distinction still exists within.

To develop application software for a database installation, a programmer must understand the abstract tools provided by the database management system being used.

The relational database model is the model provided by most of today's database management systems.

It allows the application software to be written as though the data in the database were stored in tables with rows and columns.

The Relational Model

In this section we introduce the relational database model, which is the most popular model today.

	Its popularity stems from the simplicity of its structure.
	It portrays data as being stored in tables called relations.

As an example, the representation of the previously introduced employee information system might be represented in the relational model by the following table or relation: EmplId Name Address SSNum

25X15	Joe E. Baker	33 Nowhere St.	111223333
34Y70	Cheryl H. Clark	563 Downtown Ave.	999009999
23Y34	G. Jerry Smith	1 555 Circle Dr	111005555
etc.

A row in a relation is called a tuple.

Tuples consist of the information about a particular employee.

A column in a relation is referred to as an attribute:

Each entry in a column describes some characteristic, or attribute, of the entity represented by the corresponding tuple.

Relational Design

The design of a database in terms of the relational model centers on the design of the relations making up the database.

Although this may appear to be a simple task, many subtleties are waiting to trap the unwary designer.

Suppose that, in addition to the information contained in the above relation, we want to include information about the jobs held by the employees.

Associated with each employee, we may want to include:

a job history, consisting of such attributes as job title,

a job identification code (unique to each job),

the skill code associated with each job,

the department in which the job exists,

the period during which the employee held the job in terms of a starting date and termination date.

Extending the relation above to include these attributes as additional columns in the table, as shown below reveals several problems.

Emplid        Name       Address               SSN              Jobid      JobTitle     SkillCode   Dept        StartDate        TermDate
25X I 5       Joe E.      33 Nowhere        111223333       F5       Floor          FM3           Sales         9-1-95         9-30-96
                   Baker            St.                                                     manager
25XI 5        Joe E.      33 Nowhere        111223333       D7       Dept.          D2              Sales        10-1-96
                   Baker            St.                                                      head
34Y70        Cheryl H. 563 Downtown   999009999        F5       Floor          FM3           Sales         10-1-95
                                   Clark Ave.                                               manager
23Y34        G. Jerry   1555 Circle         111005555     S25X     Secretary    T5              Personnel    3-1-93        4-30-95
                   Smith       Dr
23Y34        G. Jerry   1555 Circle         111005555     S25Z      Secretary    T6              Accounting 5-1-95
                   Smith       Dr

The relation contains one tuple for each employee assignment to a job.

I.e. the information contained in the original relation (each employee's name, address, etc.) must be repeated.

If a particular job has been held by numerous employees, the department associated with that job must be identified in each tuple representing an assignment of the job.

A more serious problem with the extended relation surfaces when we consider deleting information from the database.

Suppose, Joe E. Baker is the only employee to hold the job identified as D7.

If he were to leave the company and be deleted from the database represented below we would lose the information about job D7.

The only tuple containing the fact that job D7 requires a skill level of D2 is the tuple relating to Joe Baker.

You might argue that the ability to erase only a portion of a tuple could solve the problem, but this would in turn introduce other complications.

Should the information relating to job F5 also be retained in a partial tuple, or does this information reside elsewhere in the relation?

The temptation to use partial tuples is a strong indication that the design of the relation is not compatible with the application. The source of these problems is that we have combined more than one concept into a single relation.

As it is proposed, the extended relation contains information dealing directly with:

Employees

Name, I.D. number, address, Social Security number

Company job information

Job identification, job title, department, skill code

Employees - jobs relationship information

Start date, termination date

Having made this observation, we find that our problems can be solved by redesigning the system using three relations-one for each of the preceding topics. Using the new system, we:

Keep the original relation as it was originally designed

Insert the additional information in two new relations:

JOB and

ASSIGNMENT.

A database consisting of these three relations contains the pertinent information about:

Employees through the EMPLOYEE relation,

Available jobs through the JOB relation,

Job history through the ASSIGNMENT relation.

Additional information is implicitly available by combining the information from different relations.

The departments in which a given employee has worked may be found by first finding all the jobs that employee has held using the ASSIGNMENT relation.

Additional information is implicitly available by combining the information from different relations.

The departments in which a given employee has worked may be found by first finding all the jobs that employee has held using the ASSIGNMENT relation and then finding the departments associated with those jobs by means of the JOB relation.

In some cases a relation can be decomposed into smaller relations without losing information (called a non-loss decomposition), and at other times information is lost. The classification of such characteristics has been, and still is, a concern in computer science.

Such questions concerning the properties of relations have resulted in a hierarchy of relation classes called first normal form, second normal form, third normal form, and so on, with the relations in each class being more conducive to use in a database than those in the preceding class.

At times we need to select certain tuples from a relation.

To retrieve the information about an employee:

We must select the tuple with the appropriate identification attribute value from the EMPLOYEE relation,

To obtain a list of the job titles in a certain department:

We must select the tuples from the JOB relation having that department as their department attribute.

The result of this selection is another relation (another table) consisting of the tuples selected from the parent relation.

The outcome of selecting information about a particular employee results in a relation containing only one tuple from the EMPLOYEE relation.

The outcome of selecting the tuples associated with a certain department probably results in several tuples from the JOB relation.

Relational Operations

One operation we may want to perform on a relation is to:

SELECT tuples possessing certain characteristics and

Place these selected tuples in a new relation.

To express this operation, we adopt the syntax

NEW <- SELECT from EMPLOYEE where EmplId = "34Y70"

The semantics of this statement creates a new relation called NEW containing those tuple(s) from EMPLOYEE whose EmplId is 34Y70. In contrast to the SELECT operation, which extracts rows from a relation, the PROJECT operation extracts columns.

The operation is expressed as:

MAIL <- PROJECT Name, Address from EMPLOYEE

The result is the creation of another new relation (called MAIL) that contains the two column of values from the corresponding columns of relation EMPLOYEE.

JOIN is a third operation. It JOINs two relations (or more) to produce a new relation whose attributes consist of the attributes from the original relations.

The names of these attributes are the same as those in the original relations except that each is prefixed by the name of the relation of its origin.

This naming convention ensures that the attributes in the new relation have unique names.

The tuples (rows) of the new relation are produced by concatenating tuples from the two original relations. Which tuples are actually joined to form tuples in the new relation is determined by the condition under which the JOIN is constructed. One such condition is that designated attributes have the same value.             Example:

A tuple from relation A is concatenated with a tuple from relation B if the attributes w and x in the two tuples are equal.

The concatenation of the tuple (r, 2) from relation A with the tuple (2, m, q) from relation B appears in the result because the value of attribute W in the first equals the value of attribute X in the second.

Relation A                   Relation B
v      w                          x     y        z
r       2                           5    g         p
t       4                           4    d         e
p      6                           2    m       q
                                        4     t         f

C <- JOIN A and B where A.W = B.X
Relation C

A.V        A.W        B.X      B.Y       B.Z
r              2              2            m           q
         t               4          4            d         e
     t             4             4            t              f

SQL

SQL (Structured Query Language) is used extensively in the data processing community for manipulating relational model databases. It is popular because:

It has been standardized by ANSI.

It was originally developed and marketed by IBM.

A query involving a sequence of SELECT, PROJECT, and JOIN operations can be expressed as a single SQL statement.

SQL statements do not specify a particular sequence of operations.

Although a query stated in SQL is expressed in an imperative-sounding form, the reality is that it is essentially a declarative statement.

The significance of this is that:

SQL relieves the database user from developing a sequence of steps needed to obtain the information desired.

He or she needs merely to describe that information.

A query in which employee identification numbers along with their corresponding departments could be:

developed in a three-step process

or could be stated in a single SQL statement:

                                      select EmplId, Dept
                                      from ASSIGNMENT, JOB
                                      where ASSIGNMENT.Jobid = JOB.Jobid
                                                and ASSIGNMENT.TermDate = '*'
As indicated by this example, each SQL query statement can contain three clauses-a select clause, a from clause, and a where clause. Roughly speaking, such a statement is a request for the result of:

Forming the JOIN of all the relations listed in the from clause,

SELECTing those tuples that satisfy the conditions in the where clause,

PROJECTing those tuples listed in the select clause.

Note that SQL encompasses statements for:

Defining the structure of relations,

Creating relations, and

Modifying the contents of relations

as well as performing queries.

insert into EMPLOYEE
values ('42ZI2', 'Lloyd A. Burt', '333 Endless Ave.', '444661111')

delete from EMPLOYEE
where Name = 'G. Jerry Smith'

update EMPLOYEE
set Address '1812 Napoleon Ave.'
where Name 'Joe E. Baker'

Object-Oriented Databases

One of the newer areas in database research involves applying the object-oriented paradigm to the construction of a database, resulting in an object-oriented database. The motivation behind this movement is at least fourfold.

	First, data independence can be achieved by means of encapsulation.
	Second, the concepts of classes and inheritance appear ready-made for describing schemas and subschemas of databases.
	Third, the image of a database consisting of intelligent data objects that can answer questions themselves rather than being interrogated by a supervisory program is inviting.
	Fourth, early research indicates that the object-oriented approach may overcome some of the restrictions inherent in other database models.'

Let us consider an object-oriented implementation of the employee database from the previous section.

We have three classes (types of objects):

EMPLOYEE,

An object from the EMPLOYEE class contains subobjects for such entries as EmplId, Name, Address, and SSNum;

JOB,

An object from the class JOB contains subobjects for the items Jobid, JobTitle, SkillCode, and Dept;

ASSIGNMENT.

Each object from the class ASSIGNMENT contains subobjects for StartDate and TermDate.

Each of the subobjects consists of a storage structure encapsulated with a collection of methods describing how that subobject responds to messages regarding its contents.

In turn, each object from the classes EMPLOYEE, JOB, and ASSIGNMENT contains methods describing how that object responds to requests for information and commands to update data.

Each object from the class EMPLOYEE might have:

	A method for reporting that employee's job history and perhaps
	A method for changing that employee's job assignment.

Each object from the JOB class might have:

A method for reporting those employees who have held that particular job.

Note that these methods accomplish their tasks by sending messages to the appropriate subobjects and other objects in the system.

Proponents of object-oriented database technology argue that:

The image presented by

a database consisting of objects representing employees and jobs

more closely represents the user's environment than

a database consisting of relations containing tuples and attributes.

For example:

To add a new employee to an object-oriented database:

	The user actually adds a new employee object
	Rather than inserting a tuple into a relation.

Some form of linkage between different objects must be maintained for the methods within objects to perform their tasks in an efficient manner:

How, for example, does an object from the EMPLOYEE class know which objects from the ASSIGNMENT class represent assignments pertaining to its employee?

One approach toward providing this linkage is to extend the composition of objects beyond that found in traditional object-oriented environments.

Traditional objects are composed of data structures and methods.

For the purposes of constructing an object-oriented database, this composition can be extended to include a third kind of component, a list of other objects.

Objects in such a system therefore consist of data structures, methods, and lists of objects.

If such an approach is applied to our employee database, an object from the class EMPLOYEE can encompass and maintain a list of those objects from the class ASSIGNMENT that relate to the pertinent employee.

Methods within the object then can use this list when responding to inquiries about the object's job history.

The result is a database consisting of objects that maintain records about the existence of other objects and are therefore able to identify the appropriate objects and communicate with them when responding to requests for information.

Maintaining Database Integrity

	Inexpensive database management systems for personal use are relatively simple systems.
	They tend to have a single objective:

To shield the user from the technical details of the database implementation.

The databases maintained by these systems are relatively small and generally contain information whose loss or corruption would be inconvenient rather than disastrous.

When a problem does arise, the user can usually correct the erroneous items directly or reload the database from a backup copy and manually make the modifications required to bring that copy up to date.

This process may be inconvenient, but the cost of avoiding the inconvenience tends to be greater than the inconvenience itself.

In any case, the inconvenience is restricted to only a few people and any financial loss is generally limited.

In the case of large, multiuser, commercial database systems, however, the stakes are much higher.

The cost of incorrect or lost data can be enormous and can have devastating consequences.

In these environments a major role of the database management system is to maintain the database's integrity by guarding against problems such as:

	Operations that for some reason are only partially completed or
	Different operations that may interact inadvertently to cause inaccurate information in the database.

It is this role of a database management system that we address in this section.

The Commit/Rollback Protocol

	A single transaction, such as the transfer of funds from one bank account to another, the cancellation of an airline reservation, and the registration of a student in a university course, may involve multiple steps at the database level.
	For example, a transfer of funds between bank accounts requires that the balance in one account be decremented and the balance in the other be incremented.
	Between such steps the information in the database may be inconsistent.
	Indeed, funds are missing during the brief period after the first account has been decremented but before the other has been incremented.
	Likewise, when reassigning a passenger's seat on a flight there may be an instant when the passenger has no seat or an instant when the passenger list appears to be one passenger greater than it actually is.
	In the case of large databases that are subject to heavy transaction loads, it is highly likely that a random snapshot will find the database in the middle of some transaction.
	A request for the execution of a transaction or an equipment malfunction will therefore likely occur at a time when the database is in an inconsistent state.
	Let us first consider the problem of a malfunction.

	The goal of the database management system is to assure that such a problem will not freeze the database in an inconsistent state.
	This is often accomplished by maintaining a log containing a record of each transaction's activities in a nonvolatile storage system, such as a disk.
	Before a transaction is allowed to alter the database the alteration to be performed is first recorded in the log.
	In fact, some databases use a deferred update protocol, in which all of a transaction's proposed alterations must be recorded in the log before any action is taken on the database itself.

The point at which all the steps in a transaction have been recorded in the log is called the commit point.

	It is at this point that the database management system has the information it needs to reconstruct the transaction on its own if that should become necessary.
	At this point the database management system becomes committed to the transaction in the sense that it accepts the responsibility of guaranteeing that the transaction's activities will be reflected in the database.
	In the case of an equipment malfunction the database management system can use the information in its log to reconstruct the transactions that have been completed (committed) since the last backup was made.

If problems should arise before a transaction has reached its commit point, the database management system may find itself with a partially executed transaction that cannot be completed.

In this case the log can be used to roll back (undo) the activities actually performed by the transaction. In the case of a malfunction, for instance, the database management system could recover by rolling back those transactions that were incomplete (noncommitted) at the time of the malfunction.

Rollbacks of transactions are not restricted, however, to the process of recovering from equipment malfunctions. They are often a part of a database management system's normal operation.

	For example, a transaction may be terminated before it has completed all its steps due to an attempt to access privileged information, or it may be involved in a deadlock in which competing transactions find themselves waiting for data being used by the other.
	In these cases the database management system can use the log to roll back a transaction and thus avoid an erroneous database due to incomplete transactions.

To emphasize the delicate nature of database management system design, we should note that there are subtle problems lurking within the rollback process.

The rolling back of one transaction may affect database entries that have been used by other transactions.

For example, the transaction being rolled back may have updated an account balance, and another transaction may have already based its activities on this updated value.

This may mean that these additional transactions must also be rolled back, which may adversely affect still other transactions.

The result is the problem known as cascading rollback.

Locking

We now consider the problem of a transaction being executed while the database is in a state of flux from another transaction, a situation that can lead to inadvertent interaction between the transactions and produce erroneous results.

For instance, the problem known as the incorrect summary problem can arise if one transaction is in the middle of transferring funds from one account to another when another transaction tries to compute the total deposits in the bank.

This could result in a total that is either too large or too small depending on the order in which the transfer steps are performed.

Another possibility is known as the lost update problem, which is exemplified by two transactions, each of which makes a deduction from the same account.

	If one transaction reads the account's current balance at the point when the other has just read the balance but has not yet calculated the new balance, then both transactions will base their deductions on the same initial balance.
	In turn, the effect of one of the deductions will not be reflected in the database.

To solve such problems, a database management system could force transactions to execute in their entirety on a one-at-a-time basis by holding each new transaction in a queue until those preceding it have completed.

But, a transaction often spends a lot of time waiting for disk operations to be performed.

By interweaving the execution of transactions, the time during which one transaction is waiting can be used by another transaction to process data it has already retrieved.

Most large database management systems therefore contain a scheduler to coordinate time-sharing among transactions in much the same way that a time-sharing operating system coordinates interweaving of processes.

To guard against such anomalies as the incorrect summary problem and the lost update problem, such schedulers incorporate a locking protocol in which the items within a database that are currently being used by some transaction are marked as such.

These marks are called locks; marked items are said to be locked.

Two types of locks are common-shared locks and exclusive locks.

	They correspond to the two types of access a transaction may require to a data item-shared access and exclusive access.
	If a transaction is not going to alter the data item, then it requires shared access, meaning that other transactions are also allowed to view the data item.
	However, if the transaction is going to alter the item, it must have exclusive access, meaning that it must be the only transaction with access to the item.

In a locking protocol, each time a transaction requests access to a data item it must also tell the database management system the type of access it requires.

If a transaction requests shared access to a data item that is either unlocked or locked with a shared lock, that access is granted and the item is marked with a shared lock.

If, however, the requested item is already marked with an exclusive lock, the additional access is denied.

If a transaction requests exclusive access to a data item, that request is granted only if the item has no lock associated with it.

In this manner, a transaction that is going to alter a data item protects that item from other transactions by obtaining exclusive access.

At the same time, several transactions can share access to a data item if none of them are going to change it.

Of course, once a transaction is finished with an item, it notifies the database management system and the associated lock is removed.

Various algorithms are used to handle the case when a transaction's access request is rejected.

One is that the transaction is merely forced to wait until the requested item becomes available.

This approach, however, can lead to deadlock, since two transactions that require exclusive access to the same two data items could block each other's progress if each obtains exclusive access to one of the items and then insists on waiting for the other.

To avoid such deadlocks, some database management systems give priority to older transactions.

That is, if an older transaction requires access to a data item that is locked by a younger transaction, the younger transaction is forced to release all of its data items, its activities are rolled back (based on the log), the older transaction is given access to the data item, and the younger transaction is forced to start again.

If a younger transaction is repeatedly preempted, it will grow older in the process and ultimately become one of the older transactions.

This protocol, known as the wound-wait protocol (old transactions wound young transactions, young transactions wait for old ones), assures that every transaction will ultimately be allowed to complete its task.

The next page (SOC5) deals with Artificial Intelligence.

	The systems must provide for “ad hoc.” questions.
	Data must be presented in a usable form.

	The system enforces the restrictions imposed by the various subschemas by dictating that all access to the database be performed by a central database management system.
	The database management system can use the entire database schema for its internal needs but require that each user remain within the bounds described by that user's subschema.

	The personnel department might need to add an additional field to each employee's record for the health insurance program.
	If the application software dealt directly with the database, all application programs would have to be modified.
	Change instigated by the personnel department would cause changes to the payroll program as well as to the program for printing mailing labels for the company's newsletter.

	Only the schema used by the central system and the subschemas of those users involved in the change must be changed.
	All other subschemas remain the same, so the corresponding application software executes as though no changes were made.

	These languages provide the basic ingredients for algorithm expression but lack the operations that make manipulation of the database convenient.
	The routines provided by the database management system in effect extend the capabilities of the language being used in a manner that supports the conceptual image of the database model.
	This concept of the general-purpose language being the original system to which the capabilities of the database management system are added results in the original language being referred to as the host language.

	The relational database model is the model provided by most of today's database management systems.
	It allows the application software to be written as though the data in the database were stored in tables with rows and columns.

	a job history, consisting of such attributes as job title,
	a job identification code (unique to each job),
	the skill code associated with each job,
	the department in which the job exists,
	the period during which the employee held the job in terms of a starting date and termination date.

	Suppose, Joe E. Baker is the only employee to hold the job identified as D7.
	If he were to leave the company and be deleted from the database represented below we would lose the information about job D7.
	The only tuple containing the fact that job D7 requires a skill level of D2 is the tuple relating to Joe Baker.

	Keep the original relation as it was originally designed
	Insert the additional information in two new relations:

	Employees through the EMPLOYEE relation,
	Available jobs through the JOB relation,
	Job history through the ASSIGNMENT relation.

	The outcome of selecting information about a particular employee results in a relation containing only one tuple from the EMPLOYEE relation.
	The outcome of selecting the tuples associated with a certain department probably results in several tuples from the JOB relation.

	SELECT tuples possessing certain characteristics and
	Place these selected tuples in a new relation.

	The names of these attributes are the same as those in the original relations except that each is prefixed by the name of the relation of its origin.
	This naming convention ensures that the attributes in the new relation have unique names.

	It has been standardized by ANSI.
	It was originally developed and marketed by IBM.

	SQL relieves the database user from developing a sequence of steps needed to obtain the information desired.
	He or she needs merely to describe that information.

	developed in a three-step process
	or could be stated in a single SQL statement:

	Forming the JOIN of all the relations listed in the from clause,
	SELECTing those tuples that satisfy the conditions in the where clause,
	PROJECTing those tuples listed in the select clause.

	Defining the structure of relations,
	Creating relations, and
	Modifying the contents of relations
	as well as performing queries.