Data Structures
|
Data are stored in
the computer’s hard drive and used in the computer’s main memory. |
|
On disk, as in memory,
data are represented as “1’s” and “0’s.” |
|
This format is not
commensurate with human intellect. It is often more convenient to associate
other data arrangements, such as the contents of a single record in linear
form and the contents of a file of records as being in tabular form. |
|
These abstract data
organizations are totally human inspired and do not represent the actual
organization of data at all! |
Arrays
|
Arrays should be
thought of as being limited to a single, homogeneous
data
type. |
|
High level languages
encourage the concept of the data being manipulated as if it were stored
in a linear or tabular
fashion. |
|
The programmer might
refer to the fifth element in a one-dimensional array or the third row
and sixth column of a two-dimensional array. |
|
It becomes the task
of the translator
to convert such references into the terminology of memory cells and addresses.
|
One-Dimensional
Arrays
|
Suppose that we have
an algorithm for manipulating a series of 24 hourly temperature readings,
expressed in a high level language. |
|
The space to contain
the data for each hour is a single byte. |
|
Thus the data can
be stored in a sequence of 24 memory cells with consecutive addresses. |
|
The array, Readings
might be thought of as being able to be accessed by means of the various
positions in the list. |
|
I.E. the first Reading
is in the first location, etc. |
|
The “logical” location
of each succeeding element is calculated by incrementing by one for each
element. |
|
A given position
might be accessed by its subscript or index within the array. |
|
The first might be
referenced by Readings[1], the second by Readings[2], etc. |
|
This is a “language
dependent” manner of using subscripts. |
|
The conversion from
a one-dimensional array’s “logical” organization to its “actual” arrangement
within the machine is very straightforward. |
|
The address at which
a given hourly reading is stored may be determined in a three step manner: |
|
1). The address of
the array is determined. |
|
2). The first element
is located at the address at which the array is stored. |
|
(Its “displacement”
from the starting location is “0.”) |
|
3). The address of
the element desired is determined by adding the “displacement” to the “address
of the array.” |
|
Thus, in this case,
to find the address of the 4th element in the array (4 a.m.): |
|
13 + (4 - 1) = 16 |
|
Multidimensional
Arrays |
|
Multidimensional
Arrays use additional subscripts to address elements. |
|
The following two
dimensional example requires 2 subscripts: |
|
Weekly sales records
- by salesperson (3) by day of week (5). |
|
Row major order indicates
that the first subscript refers to the row, the second to the column within
that row: |
|
Sales[3,4] is shaded
below: |
Multidimensional
Arrays
|
Conversion of subscripts
for a multidimensional array into memory addresses is not as simple as
it is for single dimensional arrays. |
|
The address of a
given element in the array is: |
|
E=A+S*((C*(I-1))+(J-1)),
where: |
|
E - The address of
the Element being sought. |
|
A - The beginning
Address of the array. |
|
C - The number of
Columns in the array. |
|
I - The current row
(subscript). |
|
J - The current column
(subscript). |
|
S - The Size of each
data element - 1 byte, 1 full word, etc. |
Lists
|
The form of an array
(number of dimensions, number of entries in each dimension, size and data
type of each element in the arrays) must be
determined prior to “compile time.” The compiler
needs to reserve a fixed amount of space for the array and for all of the
other STATIC data structures (constants, literals and variables) in the
program. |
|
This requirement
is often inconvenient. Assume a state school registration system. How many
entries should be planned for in the array? Before the first student registers,
no entries are required. The max number may vary from year to year. This
falls in the category of 1 is too many and 1,000,000 is not enough. |
|
What is required
is a DYNAMIC data
structure.
|
Pointers
|
The Basic Addressable
Unit (BAU) was an ARTHURISM (The instructor's name is J.R. Arthur) identified
as a BYTE.
|
|
If the address at
which a piece of data is stored is known, it is trivial to find it. |
|
Since addresses are
simply numeric values, they themselves may be stored as other data are
stored. |
|
Address cells are
called pointers. |
|
Program counter =
instruction pointer. |
Contiguous
Lists
|
Contiguous lists
are similar to array storage systems and do not use pointers. |
|
Central to the concept
of arrays was the storage location (address) at which the array would be
stored as well as the size of the array. |
|
If the entire list
is stored in a single block of memory cells with consecutive addresses. |
|
If the list of 10
names (each of which is 8 characters or bytes in length), it takes
80 bytes of contiguous memory to store the list. In Pascal: |
|
var |
list packed array [1..10,1..8] of char;
Which is 10 rows
of 8 characters.
|
Problems: |
|
If a name is deleted,
all of names which follow it must be moved. |
|
If a name is added,
all names which follow must be moved. |
|
If there is no additional
space, there is a problem.
|
Linked
Lists
|
Problems associated
with the contiguous list may be avoided using linked lists, however, there
is some overhead associated with this: |
|
To keep track of
where the list starts, there must be a pointer to the first name in the
list. |
|
The data structure
will consume more space, since it will be necessary to store a “next-element”
pointer with each of the names. |
|
To read the list
simply look at the record pointed to by the HeadPtr, read the name, read
the pointer contained in that cell, read the record pointed to by the pointer,
etc. to the end of the list. |
|
The value of the
pointer in the last record in the list has a special value called the NIL
pointer. This indicates that there are no more entries in the list. |
|
If it ever becomes
necessary to delete a record in the list, simply: |
|
Take the pointer
in the record to be deleted, and place it in the record that preceded it. |
|
This leaves two records
pointing at the the record that used to follow the deleted record. This
is not a problem, but the fact that there is a record that is no longer
being used is a problem. |
|
If the space is not
reclaimed, eventually, the system will run out of memory. This loss is
called memory leak. |
|
To prevent memory
leak, there must be some way of collecting and reclaiming memory no longer
needed in the data structure, this is called garbage collection. |
|
To add a record into
a linked list structure: |
|
A “chunk” of memory
must be obtained by requesting the system’s memory manager to allocate
a sufficiently large amount of free space to hold the new record and the
pointer. |
|
A temporary pointer
must be set up to point to the new memory. |
|
The contents of the
“name” portion of the record must receive the new value. |
|
The current list
must be traversed until the proper location is found to insert the record. |
|
The pointer in the
record with a lesser value must be moved to the pointer location in the
new record. |
|
The value in the
temp pointer must be moved to the lesser pointer.
|
Supporting
the Conceptual List
|
Users of the list
should not need to consider the method by which the list is implemented. |
|
There must be procedures
written for the creation, update, deletion, printing and any other effort
that must be supported by the list, whether it is maintained as a contiguous
list or as a linked list. |
|
It is the responsibility
of the programmer to ensure that the implementation is totally transparent
to the user. |
Stacks
|
The insertion and
deletion requirement made the linked list more attractive than the contiguous
list implementation. |
|
When additions and
deletions are conducted at the same end of the structure, it is often more
convenient to use a contiguous structure. |
|
A stack is a structure
similar to a pile of dishes or books, new items are placed on the top of
the stack, when it is necessary to remove an item from the stack, it is
taken from the top. This is a “last in first out” (LIFO)
structure. |
|
A stack needs a “base”
which is the address “upon” which all is added. |
|
A stack also needs
a pointer to its top. |
|
When an item is placed
on the stack, it is said to have been “pushed” on to the top of the stack. |
|
When an item is removed
from the stack, it is said to have been “popped” from the top of the stack. |
|
Care must be taken
not to try to “pop” elements from an empty stack, thus the pointer to the
base of the stack is important.
|
Stack
Applications
Flow
Control
|
When a procedure
calls a sub-procedure, it must transfer flow control to that new procedure. |
|
When the new procedure
completes its execution, it must transfer flow control back to the calling
procedure. |
|
It is obvious that
either the system, or the calling, or the called routine should ensure
that the address to which execution should be returned in the calling program
is saved (somehow). |
|
A stack is an ideal
structure for such a system.
|
Stack
Implementation
|
It is customary to
reserve a block of contiguous memory large enough to accommodate the stack
as it grows and shrinks. |
|
If too little room
is reserved, the stack will ultimately exceed its allotted storage space. |
|
The pointer to the
top of the stack is called the “stack pointer.” |
|
Care must be taken
to ensure that the stack pointer is never “less than or equal to” the base
pointer prior to a “pop.” |
|
Care must also be
taken to ensure that the stack pointer in never equal to the address of
the “top” of the stack area prior to a “push”
operation. |
Queues
Queue
Implementation
|
A queue is “first
in first out” (FIFO)
data structure. Objects are served in the same order in which they arrive. |
|
Items are added to
the tail of the structure. |
|
Items are deleted
from the head of the structure. |
|
Once again, it is
convenient to implement a queue using a contiguous block of memory. To
establish and maintain the queue, it is necessary to have: |
|
A pointer to the
head of the structure. |
|
Points to the address
from which an existing element will be used and deleted. |
|
A pointer to the
tail of the structure. |
|
Points to the address
at which a new element will be added. |
|
Note that the queue
will “crawl” throughout memory if not kept in check. |
|
One solution, as
elements are removed from the head of the queue, move the remaining elements
“forward.” |
|
Problem, this is
a lot of work. |
|
Second solution implements
the queue as a circular queue. |
|
Could be implemented
as a modulus.
|
A
Particular Queue Application
procedure
PrintSeparateListings (List)
assign CurrentPointer
the value in the head pointer of List.
while (CurrentPointer
is not NIL) do
[If (CurrentPointer points to the name of an undergraduate)
then (Print the name.)
else (Insert the name in the queue called Graduates.)
Observe
the value in the pointer cell of the List entry
Pointed to by CurrentPointer, and reassign
CurrentPointer to be that value.]
while (the
queue called Graduates is not empty) do
(Remove an entry from the queue and print it.)
Trees
Terminology
|
Trees are a hierarchical
data structure. |
|
Each position in
a tree is called a node. |
|
The single node at
the top of a tree is called the root node. |
|
The nodes at the
extreme end of the tree are called leaf nodes. |
|
A line connecting
two nodes is called an arc. |
|
A node in a higher
hierarchy of a tree is called the parent
and has has a 1:M relationship with its children
in the next lower level. |
|
Children nodes of
a single parent are called twin
or sibling nodes. |
|
The depth
of a tree is the number of nodes between the root and the terminal node
in the longest path (the number of horizontal layers in the tree).
|
Tree
Implementation
|
The implementation
will be restricted to binary trees
- trees in which each node has at most two children. |
|
Stored in memory
using a linked structure similar to linked lists. |
|
Node is normally
comprised of three elements: |
|
Data. |
|
Pointer to the left
child node. |
|
Pointer to the right
child node. |
|
Storing the tree
in memory involves: |
|
Finding the memory
to hold the nodes. |
|
Linking these nodes
according to the desired tree structure. |
DataSpace |
LeftChildPointer |
RightChildPointer |
|
The root node may
always be found by the root pointer. |
|
The path may be traced
or traversed by looking at the appropriate pointer. |
|
To find the parent
of any node, the data structure would need a fourth field, PtrToParent |
|
In contrast to the
linked tree structure, is the contiguous tree structure. |
|
Requires no pointer
fields. |
|
Root node goes in
first field. |
|
Left child in second |
|
Right child in third |
|
General formula: |
|
Left child of a node
n goes in cell 2n. |
|
Right child of a
node n goes in cell 2n + 1. |
|
Empty cells are marked
with a unique bit pattern. |
|
This structure stores
nodes of successively lower levels of the tree as successive segments. |
|
Note that the parent
of any child may easily be found by dividing the child’s location by 2
and discarding the remainder (if any). |
|
Parent of “F” 7/2
= 3. |
|
The sibling may be
found: |
|
If location is even,
location + 1. |
|
If odd, location
- 1. |
|
This storage system
makes efficient use of space for (nearly) balanced binary trees. |
|
Unbalanced binary
trees, on the other hand, have a great deal of empty space and are very
inefficient if stored as a contiguous tree. |
|
Both contiguous and
linked tree structures have their own advantages and drawbacks. |
|
The user should be
shielded from the implementation aspects of either (both), leaving the
exact implementation to the software developer.
|
A
Binary Tree Package
|
This is an example
of storing a list of names in alphabetical order in a linked binary tree
structure. |
|
The operations to
be performed on the list are: |
|
Search for the presence
of an entry. |
|
Print the list in
alphabetical order |
|
Insert a new entry. |
|
The goal is to develop
a storage system along with a collection of procedures to perform these
operations. |
|
Note, if the list
is stored as a pure linked list, it will have to be searched in order.
If it is in a binary tree format, it will have the benefits of the binary
search algorithm. |
|
This requires that
we be able to access the middle entry of successively smaller portions
of the list, “discarding” 1/2 of the list with each access. |
|
A contiguous list
can be organized in this manner. Performance: |
|
Benefits: |
|
Good response time
when attempting to locate a record. |
|
Good use of storage
space. |
|
Drawbacks: |
|
Very awkward when
attempting to make an entry in the structure when the tree is very close
to being balanced. |
|
In that the single
drawback sufficiently overwhelms benefits,
it is better to implement the list as a linked binary tree. |
|
The middle of the
list is the root node. |
|
The middle of the
first 1/2 is the left node. |
|
The middle of the
second 1/2 is the right node. |
|
This division continues
until there are only leaves remaining.
|
Customized
Data Types
|
Thus far, we have
investigated: |
|
Data types native
to the hardware: |
|
Integer |
|
Real |
|
Data types specifically
known to the operating system: |
|
Boolean |
|
Character (ASCII
- EBCDIC) |
|
Data types available
directly through the programming language: |
|
Arrays (homogeneous
structures) |
|
Records (heterogeneous
structures) |
|
Now we look at: |
|
User defined data
types. - These types more closely fit the needs of the application. |
|
Regardless of the
data type or what entity is responsible for “maintaining” it, it cannot
be created unless it is provided by the programming language. |
User-Defined
Types
|
As per the text example: |
struct
{char Name[8];
int Age;
float SkillRating;
} Employee:
defines a heterogeneous
data structure to define a variable called Employee.
We could
restate the composition of the structure each time a reference to that
structure is required, on the other hand, this is a bit bulky and hard
to read.
|
A better way would
be to describe the structure only once, by assigning it a descriptive name
and then using that name each time a reference to the structure is required.
Again, in C: |
typedef
struct
{char Name[8];
int Age;
float SkillRating;
} EmployeeType;
defines a new
data type, EmployeeType
that can be used to declare variables in the same way as a primitive type:
EmployeeType DistManager, SalesRep1, SalesRep2;
|
It must be realized
that the previous example defined a particular instance of the type. |
|
The creation of a
user-defined data type creates a template
(cookie cutter) that is used to construct instances of the type.
The creation of a data type does not actually create an occurrence of that
type.
|
Abstract
Data Types
|
A true data type
performs two important functions: |
|
Creation of the appropriate
storage space for an instance of the type. |
|
Definition of the
kinds of operations that may be performed on that instance - signed binary
math, floating point math, concatenation, etc. |
|
UDT’s perform only
the first step of the above. |
|
An abstract data
type is a more complete way of extending the types available in the programming
language. As with a UDT: |
|
It creates a template
that is distinct from an instance of the type. |
|
But it also: |
|
Encompasses the associated
operations. |
|
An example
of how Ada
implements
an abstract data type
StackOfIntegersis
shown here. It involves the definition of a program unit (package)StackPackage
that contains the description of the type and identifies two procedures
that are allowed to perform operations on it. |
|
The details of the
internal procedures are described in another program unit called the package
body. |
|
Using the package
as a template, actual stacks of integers (instances of the type) can be
declared by StackOne:
Stack of Integers; |
package
StackPackage
is
type StackOfIntegers is
record
StackOfEntries:
array(1..25) of integer;
StackPointer: integer;
end record;
procedure push(Value: in integer; Stack: in out StackOfIntegers);
procedure pop(Value: out integer; Stack: in out StackOfIntegers);
end StackPackage;
Encapsulation
|
This chapter has
repeatedly demonstrated how a software package consisting of a data structure
(or structures) and a collection of routines that manipulate that
structure, can be used to represent an otherwise abstract object such as
a stack, queue, list or tree. |
|
The importance of
assuring that all operations on the structure within the package be performed
by the routines provided, has not been stressed. |
|
To allow direct access
to the internal composition of the package will allow for side effect!
This is NOT good! |
|
An example of direct
access to the data structure would be allowing a programmer who knows the
manner in which the stack is implemented may create a “shortcut” method
to access the third element in the stack without formally popping the first
two elements. |
|
This will lead to
problems later in the product’s life cycle when maintenance programmers
change the array to a linked structure. |
|
To prevent the problem
, newer languages (especially Ada) provide techniques by which a software
package can be encapsulated. |
|
Internal structures
can be accessed only by means of approved package routines. The integrity
of these data types is protected from poorly conceived modifications. |
package StackPackage is
type StackOf
Integers is private;
procedure push(Value:
in
integer;
Stack: in out StackOfIntegers);
procedure pop(Value:
out
integer;
Stack: in out StackOfIntegers);
private
type StackOfIntegers
is
record
StackOfEntries: array(1...25) of integer;
StackPointer: integer;
end
record;
end StackPackage; |
|
Note: |
|
The details of the
stack’s structure have been moved to within the private part of the
package. |
|
Only the information
in the public part of the package is accessible outside the package. |
|
Data in the private
part is local to the package, global access is denied.
|
Object-Oriented
Programming
|
In the imperative
paradigm: |
|
A list may be implemented
as a simple data structure, i.e. a linked or contiguous list. |
|
The routines for
manipulating the list appear within the procedural part of the program. |
|
When using an abstract
data type: |
|
The routines for
manipulating the list are bundled with the list structure to form a package
(abstract data type, duh!). |
|
This shifts the task
of performing list manipulation from the user of the list to the package
representing the list. |
|
The user does not
manipulate the list. |
|
The user requests
that the package do the manipulation. |
|
Abstract data types
convert the imperative paradigm concept of passive data structures into
active
units that manipulate themselves. |
|
These active units
are called objects. |
|
The routines within
them are called methods
(or member functions
in C++) |
|
The object-oriented
approach to problem solving is to identify the objects involved and implement
them as self-contained units. |
|
Since the objects
are active as opposed to passive, the task resolves itself to the activation
of objects in the proper manner. |
|
The development of
objects is not simple. The object may need to respond in complex ways to
many different stimuli. Most languages provide features to reduce the burden
of object description. |
|
Templates or class
- |
|
Allows the description
of common properties of an entire collection of similar objects. |
|
Similar to, but more
general than an abstract data type. |
|
A class need not
involve a data structure, it could consist entirely of methods. |
|
Inheritance |
|
Allows classes to
be described in a hierarchical manner. |
|
Each class inherits
the properties of those higher in the hierarchy. |
|
Each class is allowed
to add additional features to override those of a previously defined class. |
|
Polymorphism |
|
Polymorphism is the
customized
interpretation of a message. In the text example of a graphics package: |
|
There is a variety
of objects, each representing a shape (circle, etc.). |
|
A particular image
consists of a collection of objects. |
|
Each object knows
its size, location, colors and how to respond to messages telling it to
move or draw itself on the monitor. |
|
To draw the image,
we send each object a message to draw itself. |
|
The routine used
to perform the draw operation varies depending on the shape of the object. |
|
It is here that the
interpretation of the message is customized to accommodate the particular
shape causing itself to be drawn. |
|
Polymorphism then
determines the meaning of a command in terms of the entity that is asked
to perform it. |
|
Constructors are
special routines within a class that are executed automatically each time
a new instance of the class is established. |
|
It handles issues
associated with establishing new dynamic data structures and identifying
them as being empty. |
|
Note that only those
portions of a class that are designated public can be referenced outside
an instance of the class. |
Const int MaxStack
= 25;
class StackOfIntegers
{int StackPointer;
//Data
int
StackEntries[MaxStack];
//structures
}
//private
public:
void StackOfIntegers(
)
//Constructor
{StackPointer = 0 );
//Initializes
}
//stack as empty
void
push (int *Entry)
{if (Stackpointer,
MaxStack)
StackEntries[StackPointer++] = *Entry;
}
void
pop (int *Entry)
{if (StackPointer
> 0 )
*Entry = StackEntries[StackPointer-1];
}
} |
The next page (SOC3) deals File Structures.
|