SOC2

Data Structures

	Data are stored in the computer’s hard drive and used in the computer’s main memory.
	On disk, as in memory, data are represented as “1’s” and “0’s.”
	This format is not commensurate with human intellect. It is often more convenient to associate other data arrangements, such as the contents of a single record in linear form and the contents of a file of records as being in tabular form.
	These abstract data organizations are totally human inspired and do not represent the actual organization of data at all!

Arrays

	Arrays should be thought of as being limited to a single, homogeneous data type.
	High level languages encourage the concept of the data being manipulated as if it were stored in a linear or tabular fashion.
	The programmer might refer to the fifth element in a one-dimensional array or the third row and sixth column of a two-dimensional array.
	It becomes the task of the translator to convert such references into the terminology of memory cells and addresses.

One-Dimensional Arrays

Suppose that we have an algorithm for manipulating a series of 24 hourly temperature readings, expressed in a high level language.

	The space to contain the data for each hour is a single byte.
	Thus the data can be stored in a sequence of 24 memory cells with consecutive addresses.
	The array, Readings might be thought of as being able to be accessed by means of the various positions in the list.

I.E. the first Reading is in the first location, etc.

The “logical” location of each succeeding element is calculated by incrementing by one for each element.

A given position might be accessed by its subscript or index within the array.

	The first might be referenced by Readings[1], the second by Readings[2], etc.
	This is a “language dependent” manner of using subscripts.

The conversion from a one-dimensional array’s “logical” organization to its “actual” arrangement within the machine is very straightforward.

The address at which a given hourly reading is stored may be determined in a three step manner:

	1). The address of the array is determined.
	2). The first element is located at the address at which the array is stored.

(Its “displacement” from the starting location is “0.”)

3). The address of the element desired is determined by adding the “displacement” to the “address of the array.”

Thus, in this case, to find the address of the 4th element in the array (4 a.m.):

13 + (4 - 1) = 16

Multidimensional Arrays

Multidimensional Arrays use additional subscripts to address elements.

The following two dimensional example requires 2 subscripts:

Weekly sales records - by salesperson (3) by day of week (5).

Row major order indicates that the first subscript refers to the row, the second to the column within that row:

Sales[3,4] is shaded below:

Multidimensional Arrays

Conversion of subscripts for a multidimensional array into memory addresses is not as simple as it is for single dimensional arrays.

The address of a given element in the array is:

E=A+S*((C*(I-1))+(J-1)), where:

	E - The address of the Element being sought.
	A - The beginning Address of the array.
	C - The number of Columns in the array.
	I - The current row (subscript).
	J - The current column (subscript).
	S - The Size of each data element - 1 byte, 1 full word, etc.

Lists

The form of an array (number of dimensions, number of entries in each dimension, size and data type of each element in the arrays) must be determined prior to “compile time.” The compiler needs to reserve a fixed amount of space for the array and for all of the other STATIC data structures (constants, literals and variables) in the program.

This requirement is often inconvenient. Assume a state school registration system. How many entries should be planned for in the array? Before the first student registers, no entries are required. The max number may vary from year to year. This falls in the category of 1 is too many and 1,000,000 is not enough.

What is required is a DYNAMIC data structure.

Pointers

	The Basic Addressable Unit (BAU) was an ARTHURISM (The instructor's name is J.R. Arthur) identified as a BYTE.
	If the address at which a piece of data is stored is known, it is trivial to find it.
	Since addresses are simply numeric values, they themselves may be stored as other data are stored.

Address cells are called pointers.

Program counter = instruction pointer.

Contiguous Lists

	Contiguous lists are similar to array storage systems and do not use pointers.
	Central to the concept of arrays was the storage location (address) at which the array would be stored as well as the size of the array.
	If the entire list is stored in a single block of memory cells with consecutive addresses.
	If the list of 10 names (each of which is 8 characters or bytes in length), it takes 80 bytes of contiguous memory to store the list. In Pascal:

list packed array [1..10,1..8] of char;

var

Which is 10 rows of 8 characters.

Problems:

	If a name is deleted, all of names which follow it must be moved.
	If a name is added, all names which follow must be moved.
	If there is no additional space, there is a problem.

Linked Lists

Problems associated with the contiguous list may be avoided using linked lists, however, there is some overhead associated with this:

	To keep track of where the list starts, there must be a pointer to the first name in the list.
	The data structure will consume more space, since it will be necessary to store a “next-element” pointer with each of the names.

To read the list simply look at the record pointed to by the HeadPtr, read the name, read the pointer contained in that cell, read the record pointed to by the pointer, etc. to the end of the list.

The value of the pointer in the last record in the list has a special value called the NIL pointer. This indicates that there are no more entries in the list.

If it ever becomes necessary to delete a record in the list, simply:

Take the pointer in the record to be deleted, and place it in the record that preceded it.

	This leaves two records pointing at the the record that used to follow the deleted record. This is not a problem, but the fact that there is a record that is no longer being used is a problem.
	If the space is not reclaimed, eventually, the system will run out of memory. This loss is called memory leak.
	To prevent memory leak, there must be some way of collecting and reclaiming memory no longer needed in the data structure, this is called garbage collection.

To add a record into a linked list structure:

	A “chunk” of memory must be obtained by requesting the system’s memory manager to allocate a sufficiently large amount of free space to hold the new record and the pointer.
	A temporary pointer must be set up to point to the new memory.
	The contents of the “name” portion of the record must receive the new value.
	The current list must be traversed until the proper location is found to insert the record.
	The pointer in the record with a lesser value must be moved to the pointer location in the new record.
	The value in the temp pointer must be moved to the lesser pointer.

Supporting the Conceptual List

	Users of the list should not need to consider the method by which the list is implemented.
	There must be procedures written for the creation, update, deletion, printing and any other effort that must be supported by the list, whether it is maintained as a contiguous list or as a linked list.
	It is the responsibility of the programmer to ensure that the implementation is totally transparent to the user.

Stacks

The insertion and deletion requirement made the linked list more attractive than the contiguous list implementation.

When additions and deletions are conducted at the same end of the structure, it is often more convenient to use a contiguous structure.

A stack is a structure similar to a pile of dishes or books, new items are placed on the top of the stack, when it is necessary to remove an item from the stack, it is taken from the top. This is a “last in first out” (LIFO) structure.

A stack needs a “base” which is the address “upon” which all is added.

A stack also needs a pointer to its top.

When an item is placed on the stack, it is said to have been “pushed” on to the top of the stack.

When an item is removed from the stack, it is said to have been “popped” from the top of the stack.

Care must be taken not to try to “pop” elements from an empty stack, thus the pointer to the base of the stack is important.

Stack Applications

Flow Control

When a procedure calls a sub-procedure, it must transfer flow control to that new procedure.

When the new procedure completes its execution, it must transfer flow control back to the calling procedure.

It is obvious that either the system, or the calling, or the called routine should ensure that the address to which execution should be returned in the calling program is saved (somehow).

A stack is an ideal structure for such a system.

Stack Implementation

	It is customary to reserve a block of contiguous memory large enough to accommodate the stack as it grows and shrinks.
	If too little room is reserved, the stack will ultimately exceed its allotted storage space.
	The pointer to the top of the stack is called the “stack pointer.”
	Care must be taken to ensure that the stack pointer is never “less than or equal to” the base pointer prior to a “pop.”
	Care must also be taken to ensure that the stack pointer in never equal to the address of the “top” of the stack area prior to a “push” operation.

Queues

Queue Implementation

A queue is “first in first out” (FIFO) data structure. Objects are served in the same order in which they arrive.

	Items are added to the tail of the structure.
	Items are deleted from the head of the structure.

Once again, it is convenient to implement a queue using a contiguous block of memory. To establish and maintain the queue, it is necessary to have:

A pointer to the head of the structure.

Points to the address from which an existing element will be used and deleted.

A pointer to the tail of the structure.

Points to the address at which a new element will be added.

Note that the queue will “crawl” throughout memory if not kept in check.

One solution, as elements are removed from the head of the queue, move the remaining elements “forward.”

Problem, this is a lot of work.

Second solution implements the queue as a circular queue.

Could be implemented as a modulus.

A Particular Queue Application

procedure PrintSeparateListings (List)
assign CurrentPointer the value in the head pointer of List.
while (CurrentPointer is not NIL) do
     [If (CurrentPointer points to the name of an undergraduate)
          then (Print the name.)
          else (Insert the name in the queue called Graduates.)
Observe the value in the pointer cell of the List entry
          Pointed to by CurrentPointer, and reassign
          CurrentPointer to be that value.]
while (the queue called Graduates is not empty) do
      (Remove an entry from the queue and print it.)

Trees Terminology

Trees are a hierarchical data structure.

	Each position in a tree is called a node.
	The single node at the top of a tree is called the root node.
	The nodes at the extreme end of the tree are called leaf nodes.
	A line connecting two nodes is called an arc.
	A node in a higher hierarchy of a tree is called the parent and has has a 1:M relationship with its children in the next lower level.
	Children nodes of a single parent are called twin or sibling nodes.
	The depth of a tree is the number of nodes between the root and the terminal node in the longest path (the number of horizontal layers in the tree).

Tree Implementation

The implementation will be restricted to binary trees - trees in which each node has at most two children.

	Stored in memory using a linked structure similar to linked lists.
	Node is normally comprised of three elements:

	Data.
	Pointer to the left child node.
	Pointer to the right child node.

Storing the tree in memory involves:

	Finding the memory to hold the nodes.
	Linking these nodes according to the desired tree structure.

DataSpace

LeftChildPointer

RightChildPointer

The root node may always be found by the root pointer.

The path may be traced or traversed by looking at the appropriate pointer.

To find the parent of any node, the data structure would need a fourth field, PtrToParent

In contrast to the linked tree structure, is the contiguous tree structure.

	Requires no pointer fields.
	Root node goes in first field.
	Left child in second
	Right child in third
	General formula:

	Left child of a node n goes in cell 2n.
	Right child of a node n goes in cell 2n + 1.

Empty cells are marked with a unique bit pattern.

This structure stores nodes of successively lower levels of the tree as successive segments.

Note that the parent of any child may easily be found by dividing the child’s location by 2 and discarding the remainder (if any).

Parent of “F” 7/2 = 3.

The sibling may be found:

	If location is even, location + 1.
	If odd, location - 1.

This storage system makes efficient use of space for (nearly) balanced binary trees.

Unbalanced binary trees, on the other hand, have a great deal of empty space and are very inefficient if stored as a contiguous tree.

Both contiguous and linked tree structures have their own advantages and drawbacks.

The user should be shielded from the implementation aspects of either (both), leaving the exact implementation to the software developer.

A Binary Tree Package

	This is an example of storing a list of names in alphabetical order in a linked binary tree structure.
	The operations to be performed on the list are:

	Search for the presence of an entry.
	Print the list in alphabetical order
	Insert a new entry.

The goal is to develop a storage system along with a collection of procedures to perform these operations.

Note, if the list is stored as a pure linked list, it will have to be searched in order. If it is in a binary tree format, it will have the benefits of the binary search algorithm.

	This requires that we be able to access the middle entry of successively smaller portions of the list, “discarding” 1/2 of the list with each access.
	A contiguous list can be organized in this manner. Performance:
	Benefits:

	Good response time when attempting to locate a record.
	Good use of storage space.

Drawbacks:

Very awkward when attempting to make an entry in the structure when the tree is very close to being balanced.

In that the single drawback sufficiently overwhelms benefits, it is better to implement the list as a linked binary tree.

	The middle of the list is the root node.
	The middle of the first 1/2 is the left node.
	The middle of the second 1/2 is the right node.
	This division continues until there are only leaves remaining.

Customized Data Types

Thus far, we have investigated:

Data types native to the hardware:

	Integer
	Real

Data types specifically known to the operating system:

	Boolean
	Character (ASCII - EBCDIC)

Data types available directly through the programming language:

	Arrays (homogeneous structures)
	Records (heterogeneous structures)

Now we look at:

User defined data types. - These types more closely fit the needs of the application.

Regardless of the data type or what entity is responsible for “maintaining” it, it cannot be created unless it is provided by the programming language.

User-Defined Types

As per the text example:

struct
{char Name[8];
int Age;
float SkillRating;
} Employee:

defines a heterogeneous data structure to define a variable called Employee.
We could restate the composition of the structure each time a reference to that structure is required, on the other hand, this is a bit bulky and hard to read.

A better way would be to describe the structure only once, by assigning it a descriptive name and then using that name each time a reference to the structure is required. Again, in C:

typedef struct
{char Name[8];
int Age;
float SkillRating;
} EmployeeType;

defines a new data type, EmployeeType that can be used to declare variables in the same way as a primitive type: EmployeeType DistManager, SalesRep1, SalesRep2;

It must be realized that the previous example defined a particular instance of the type.

The creation of a user-defined data type creates a template (cookie cutter) that is used to construct instances of the type. The creation of a data type does not actually create an occurrence of that type.
Abstract Data Types

A true data type performs two important functions:

	Creation of the appropriate storage space for an instance of the type.
	Definition of the kinds of operations that may be performed on that instance - signed binary math, floating point math, concatenation, etc.

UDT’s perform only the first step of the above.

An abstract data type is a more complete way of extending the types available in the programming language. As with a UDT:

It creates a template that is distinct from an instance of the type.

But it also:

Encompasses the associated operations.

An example of how Ada implements an abstract data type StackOfIntegersis shown here. It involves the definition of a program unit (package)StackPackage that contains the description of the type and identifies two procedures that are allowed to perform operations on it.

The details of the internal procedures are described in another program unit called the package body.

Using the package as a template, actual stacks of integers (instances of the type) can be declared by StackOne: Stack of Integers;

package StackPackage is
type StackOfIntegers is
record

StackOfEntries: array(1..25) of integer;
StackPointer: integer;

        end record;
    procedure push(Value: in integer; Stack: in out StackOfIntegers);
    procedure pop(Value: out integer; Stack: in out StackOfIntegers);
end StackPackage;
Encapsulation

	This chapter has repeatedly demonstrated how a software package consisting of a data structure (or structures) and a collection of routines that manipulate that structure, can be used to represent an otherwise abstract object such as a stack, queue, list or tree.
	The importance of assuring that all operations on the structure within the package be performed by the routines provided, has not been stressed.
	To allow direct access to the internal composition of the package will allow for side effect! This is NOT good!

An example of direct access to the data structure would be allowing a programmer who knows the manner in which the stack is implemented may create a “shortcut” method to access the third element in the stack without formally popping the first two elements.

This will lead to problems later in the product’s life cycle when maintenance programmers change the array to a linked structure.

To prevent the problem , newer languages (especially Ada) provide techniques by which a software package can be encapsulated.

Internal structures can be accessed only by means of approved package routines. The integrity of these data types is protected from poorly conceived modifications.

package StackPackage is
     type StackOf Integers is private;
     procedure push(Value: in integer; Stack: in out StackOfIntegers);
     procedure pop(Value: out integer; Stack: in out StackOfIntegers);
private
     type StackOfIntegers is
          record
             StackOfEntries:    array(1...25) of integer;
               StackPointer:    integer;
        end record;
end StackPackage;

Note:

	The details of the stack’s structure have been moved to within the private part of the package.
	Only the information in the public part of the package is accessible outside the package.
	Data in the private part is local to the package, global access is denied.

Object-Oriented Programming

In the imperative paradigm:

	A list may be implemented as a simple data structure, i.e. a linked or contiguous list.
	The routines for manipulating the list appear within the procedural part of the program.

When using an abstract data type:

	The routines for manipulating the list are bundled with the list structure to form a package (abstract data type, duh!).
	This shifts the task of performing list manipulation from the user of the list to the package representing the list.

	The user does not manipulate the list.
	The user requests that the package do the manipulation.

Abstract data types convert the imperative paradigm concept of passive data structures into active units that manipulate themselves.

	These active units are called objects.
	The routines within them are called methods (or member functions in C++)

The object-oriented approach to problem solving is to identify the objects involved and implement them as self-contained units.

Since the objects are active as opposed to passive, the task resolves itself to the activation of objects in the proper manner.

The development of objects is not simple. The object may need to respond in complex ways to many different stimuli. Most languages provide features to reduce the burden of object description.

Templates or class -

	Allows the description of common properties of an entire collection of similar objects.
	Similar to, but more general than an abstract data type.
	A class need not involve a data structure, it could consist entirely of methods.

Inheritance

	Allows classes to be described in a hierarchical manner.
	Each class inherits the properties of those higher in the hierarchy.
	Each class is allowed to add additional features to override those of a previously defined class.

Polymorphism

Polymorphism is the customized interpretation of a message. In the text example of a graphics package:

	There is a variety of objects, each representing a shape (circle, etc.).
	A particular image consists of a collection of objects.

Each object knows its size, location, colors and how to respond to messages telling it to move or draw itself on the monitor.

To draw the image, we send each object a message to draw itself.

The routine used to perform the draw operation varies depending on the shape of the object.

It is here that the interpretation of the message is customized to accommodate the particular shape causing itself to be drawn.

Polymorphism then determines the meaning of a command in terms of the entity that is asked to perform it.

Constructors are special routines within a class that are executed automatically each time a new instance of the class is established.

It handles issues associated with establishing new dynamic data structures and identifying them as being empty.

Note that only those portions of a class that are designated public can be referenced outside an instance of the class.

Const     int MaxStack = 25;
class       StackOfIntegers
     {int   StackPointer;                            //Data
       int   StackEntries[MaxStack];          //structures
      }                                                      //private
      public:
      void StackOfIntegers( )                   //Constructor
            {StackPointer = 0 );                   //Initializes
             }                                               //stack as empty
      void   push (int *Entry)
      {if (Stackpointer, MaxStack)
             StackEntries[StackPointer++] = *Entry;
       }
       void pop (int *Entry)
       {if (StackPointer > 0 )
             *Entry = StackEntries[StackPointer-1];
        }
}

The next page (SOC3) deals File Structures.

	The form of an array (number of dimensions, number of entries in each dimension, size and data type of each element in the arrays) must be determined prior to “compile time.” The compiler needs to reserve a fixed amount of space for the array and for all of the other STATIC data structures (constants, literals and variables) in the program.
	This requirement is often inconvenient. Assume a state school registration system. How many entries should be planned for in the array? Before the first student registers, no entries are required. The max number may vary from year to year. This falls in the category of 1 is too many and 1,000,000 is not enough.
	What is required is a DYNAMIC data structure.

	The insertion and deletion requirement made the linked list more attractive than the contiguous list implementation.
	When additions and deletions are conducted at the same end of the structure, it is often more convenient to use a contiguous structure.
	A stack is a structure similar to a pile of dishes or books, new items are placed on the top of the stack, when it is necessary to remove an item from the stack, it is taken from the top. This is a “last in first out” (LIFO) structure.
	A stack needs a “base” which is the address “upon” which all is added.
	A stack also needs a pointer to its top.
	When an item is placed on the stack, it is said to have been “pushed” on to the top of the stack.
	When an item is removed from the stack, it is said to have been “popped” from the top of the stack.
	Care must be taken not to try to “pop” elements from an empty stack, thus the pointer to the base of the stack is important.

	When a procedure calls a sub-procedure, it must transfer flow control to that new procedure.
	When the new procedure completes its execution, it must transfer flow control back to the calling procedure.
	It is obvious that either the system, or the calling, or the called routine should ensure that the address to which execution should be returned in the calling program is saved (somehow).
	A stack is an ideal structure for such a system.