Lecture Notes

for

CSCI 301: Great Ideas in Computer Science

(c) 1993, 1994, 1995 John E. Howland, All Rights Reserved

Department of Computer Science

Trinity University

715 Stadium Drive

San Antonio, Texas 78212-7200

Internet: jhowland@ariel.cs.trinity.edu

6. Data Structures ( Terminal Session on this Machine )

Data structures deal with the problem of structuring or organizing data for efficient processing by programs.

To a certain extent, data structure ideas are independent of the language or notation used to express data structures. On the other hand, what data structures are expressible in a given programming notation are determined by the features of that programming language.

We are using the Scheme programming language as a notation not only for programming in the laboratory, but as an expository notation in our lectures on computer science topics.

6.1 Scheme Lists

In Scheme we have noticed that both programs and data are expressed as lists. A list is a structure whose printed representation begins with "(" ; contains zero or more items and ends with ")" . An item is a single indivisible Scheme object such as #t or #f or a number or a symbol, etc. In addition, an item may itself be a list.

Examples of Lists:

'(1 2 3)
'()
(+ 3 4)
(* (expt 2 4) 10.5)
(cons 'a '())
(car (cons 'a (cons 'b '())))
A quoted list is interpreted as a datum while an un-quoted list is interpreted as a simple program or expression to be evaluated.

Next we consider Scheme functions which access, construct and otherwise manipulate lists.

(car ls) ==> first item in ls
It is an error to apply car to '() .

Examples:

(car '(a b c)) ==> a
(car '((a b1) 2 3)) ==> (a b1)
(cdr ls) ==> a list containing all items of ls but the first item.

It is an error to apply cdr to '().

Examples:

(cdr '(a b c)) ==> (b c)
(cdr '((a b1) 2 3)) ==> (2 3)
(null? ls) ==> #t if ls is an empty list, otherwise #f

Examples:

(null? (cdr '(a))) ==> #t
(null? '(1 2 3)) ==> #f
(list? ls) ==> #t if ls is a list, otherwise #f

Examples:

(list? '(a b)) ==> #t
(list? (cons 1 2)) ==> #f
(length ls) ==> number of items in the list ls

Example:

(length '(a b (c d))) ==> 3
(list-ref ls k) ==> k th element of ls (origin 0)

Example:

(list-ref '(a b c d) 2) ==> c
(cons a ls) ==> constructs a list whose first item is a and whose remaining items is the list ls.

Examples:

(cons 2 '(a b c)) ==> (2 a b c)
(cons '(a b) '(c)) ==> ((a b) c)
When the second argument to cons is not a list, the result is a pair which is not a list.

Example:

(cons 1 2) ==> (1 . 2)
(list? (cons 1 2)) ==> #f
(append ls1 ls2 ...) ==> constructs a list of the elements of ls1 followed by the elements of ls2, etc.

Examples:

(append '(a) '(y)) ==> (a y)
(append '(a) '(b c d)) ==> (a b c d)
(append '(a (b)) '((c))) ==> (a (b) (c))
The last argument of append may be a pair which is not a list. In this case, the result is not a proper list.

Example:

(append '(a b) '(c . d)) ==> (a b c . d)
(list expr1 expr2 ...) ==> construct a list of values of expr1 exprt ...

Examples:

(list 1 (+ 3 4) 99) ==> (1 7 99)
(list) ==> ()

(reverse ls) ==> constructs a list which contains the top level items of ls in reverse order.

Example:

(reverse '(a b (c d))) ==> ((c d) b a)
Since the list structure is pervasive in Scheme we will use lists to describe data structures.

6.2 Abstract Data Structures

The description of an abstract data structure involves the definition of a set of elements together with a collection of operations (functions) which are defined on those elements.

6.3 Stack Data Structure

We wish to describe an abstract data structure called a stack. A stack is a structure which allows data elements to be added (pushed) or removed (popped) in such a manner so that the last item added will be the first item removed. Such structures are sometimes called LIFO (last in first out). When storing items (which have indefinite shelf life) in a warehouse, it is sometimes convenient to stack items out from a wall or up from the floor in such a way so that the last item stacked will be the first item removed.

A stack is a collection of items together with the following operations.

(make-stack) ==> constructs a stack
(stack? obj) ==> #t if obj is a stack, else #f
(empty-stack? stack) ==> #t if stack empty, else #f
(push-stack item stack) ==> put item on stack
(pop-stack stack) ==> remove last item pushed on stack
(top-stack stack) ==> return last item pushed on stack
                      without removing that value from stack
(size-stack stack) ==> return number of items on stack
(print-stack stack) ==> print items on stack
Following is an implementation of stacks using a list which contains a special stack type tag as its first element.
(define stack-tag "stack")
(define the-empty-stack (cons stack-tag '()))
The stack construction verb follows.
(define make-stack
  (lambda ()
    the-empty-stack))
(define s (make-stack)) ==> <#unspecified>
The predicates for stacks and empty stacks.
(define stack?
  (lambda (obj)
    (if (and (list? obj)
             (not (null? obj))
             (eq? stack-tag (car obj)))
        #t
        #f)))

(stack? s) ==> #t

(define empty-stack?
  (lambda (obj)
    (equal? the-empty-stack obj)))
(empty-stack? s) ==> #t
s ==> ("stack")
Note that something which looks like a stack isn't really a stack.
(stack? '("stack")) ==> #f
This is because the stack? predicate uses the eq? verb to test equality and eq? is true if and only if the objects are the same.

Next we define the push-stack verb.

(define push-stack
  (lambda (item obj)
    (if (not (stack? obj))
      (error "wrong type second argument to push-stack" obj)
      (cons stack-tag
            (cons item (cdr obj))))))
(define s (push-stack '(a b) s)) ==> <#unspecified>
s ==> ("stack" (a b))
(define s (push-stack 3 s)) ==> <#unspecified>
s ==> ("stack" 3 (a b))
Here is the top-stack verb.
(define top-stack
  (lambda (obj)
    (if (not (stack? obj))
      (error "wrong type argument to top-stack" obj)
		(if (empty-stack? obj)
		(error "stack empty" obj)
         (cadr obj)))))
(top-stack s) ==> 3
s ==> ("stack" 3 (a b))
Here is the implementation for the pop-stack verb.
(define pop-stack
  (lambda (obj)
    (if (not (stack? obj))
      (error "wrong type argument to pop-stack" obj)
      (if (empty-stack? obj)
        (error "stack empty" obj)
        (cons stack-tag (cddr obj))))))
(define s (pop-stack s)) ==> <#unspecified>
(top-stack s) ==> (a b)
Accessing the top element of a stack does not change the stack.
s ==> ("stack" (a b))
Next we define the size-stack verb.
(define size-stack
  (lambda (obj)
    (if (not (stack? obj))
      (error "wrong type argument to size-stack" obj)
      (- (length obj) 1))))
(size-stack s) ==> 1
(define s (push-stack "Hi there" s)) ==> <#unspecified>
(size-stack s) ==> 2
(top-stack s) ==> "hi there"
Finally, we define the print-stack verb.
(define print-stack
  (lambda (obj)
    (if (not (stack? obj))
      (error "wrong type argument to print-stack" obj)
      (begin
        (display "TOP: ")
        (for-each
          (lambda (x)
            (write x)
            (display " "))
          (cdr obj))))
  (newline)))
(print-stack s) ==> TOP: "Hi there" (a b)
(size-stack s) ==> 2
Any scheme object may be pushed on a stack. For example, a Scheme primitive verb may be pushed on a stack.
(define s (push-stack + s)) ==> <#unspecified>
(print-stack s) ==> TOP: #[procedure +] "Hi there" (a b)
((top-stack s) 2 3) ==> 5

6.4 Queue Data Structure

Next, we consider a data structure, called a queue. A queue structure is similar to a waiting line. Additions to a queue are made at one end of the waiting line, called the rear, while removals are made at the other end of the queue, called the front.

Queue structures are called FIFO (first in first out) structures and are often used when modeling inventories of perishable or age-dated goods.

A queue is a collection of items together with the following operations.

(make-queue) ==> constructs a queue
(queue? obj) ==> #t if obj is a queue, else #f
(empty-queue? queue) ==> #t if queue empty, else #f
(enter-queue item queue) ==> put item in queue
(remove-queue queue) ==> remove item at front of queue
(front-queue queue) ==> return item at front of queue
                      without removing that value from queue
(size-queue queue) ==> return number of items in queue
(print-queue queue) ==> print items in queue

Following is an implementation of queues using lists which contain a tag identifying a list as a queue.

The queue-tag and the the-empty-queue constants may be defined as:

(define queue-tag "queue")
(define the-empty-queue (cons queue-tag '()))
The make-queue verb.
(define make-queue
  (lambda ()
    the-empty-queue))
(define q (make-queue)) ==> <#unspecified>
q ==> ("queue")
The predicate for a queue may be defined as:
(define queue?
  (lambda (obj)
    (if (and (list? obj)
             (not (null? obj))
             (eq? queue-tag (car obj)))
      #t
      #f)))
(queue? q) ==> #t
Note that something which looks like a queue isn't really a queue.
(queue? '("queue")) ==> #f

The predicate for an empty queue may be defined as:

(define empty-queue?
  (lambda (obj)
    (equal? the-empty-queue obj)))
(empty-queue? q) ==> #t
Next we define the enter-queue verb.
(define enter-queue
  (lambda (item queue)
    (if (not (queue? queue))
      (error "wrong type second argument to enter-queue" queue)
      (append queue (list item)))))
(define q (enter-queue "Hi there" q)) ==> <#unspecified>
q ==> ("queue" "Hi there")
(define q (enter-queue (/ 4 5) q)) ==> <#unspecified>
q ==> ("queue" "Hi there" 4/5)
The front-queue verb may be define as:
(define front-queue
  (lambda (queue)
    (if (not (queue? queue))
      (error "wrong type argument to front-queue" queue)
      (if (empty-queue? queue)
		   (error "queue empty" queue)
            (cadr queue)))))
(front-queue q) ==> "Hi there"
The remove-queue verb may be defined as:
(define remove-queue
  (lambda (queue)
    (if (not (queue? queue))
      (error "wrong type argument to remove-queue" queue)
      (if (empty-queue? queue?)
        (error "queue empty" queue)
        (cons queue-tag (cddr queue))))))
(define q (remove-queue q)) ==> <#unspecified>
q ==> ("queue" 4/5)
(front-queue q) ==> 4/5
Next we define the size-queue verb.
(define size-queue
  (lambda (queue)
    (if (not (queue? queue))
      (error "wrong type argument to size-queue" queue)
      (- (length queue) 1))))
(size-queue q) ==> 1
Finally, we define the print-queue verb.
(define print-queue
  (lambda (queue)
    (if (not (queue? queue))
      (error "wrong type argument to print-queue" queue)
      (begin
        (display "FRONT: ")
        (for-each
          (lambda (x)
            (write x)
            (display " "))
          (cdr queue))))
  (newline)))
(print-queue q) ==> FRONT: 4/5
(define q (enter-queue expt q)) ==> <#unspecified>
q ==> ("queue" 4/5 #[procedure expt])
(print-queue q) ==> FRONT: 4/5 #[procedure expt]

6.5 Object Programming

Next we give an "object oriented" implementation of stacks and queues. Object oriented programming is currently a very active area of computer science research and development.

The main idea of object oriented programming is that one packages together in a single "object" both data structure and the operations (verbs) which are defined on that data structure.

The main benefits of object oriented programming are:

* better abstraction of data structures

* objects may be defined so that they inherit some of their behavior from other objects

* better encapsulation of data structures which helps isolate those structures from the programs which access those structures

6.5.1 Preliminary Definitions

First, we need a few preliminary definitions. The synonyms 1st and 2nd for car and cadr improve the readability of our object programs.
(define 1st car)
(define 2nd cadr)
The last-pair verb returns the last element of a list.
(define last-pair
  (lambda (x)
    (if (pair? (cdr x))
        (last-pair (cdr x))
        x)))  
Next we define the verb append! which appends two lists, ls1 and ls2. Recall the naming convention used in Scheme which adds the symbol ! to a verb name whenever that verb changes the state of an existant object as a side effect. append! perfoms the same operation as append, that is, it builds a resulting list from ls1 and ls2 by adding the elements of ls2 at the end of ls1. However, append! changes ls1 to this resulting list as a side effect, but append does not.
(define append!
  (lambda (ls1 ls2)
    (if (pair? ls1)
        (begin
          (set-cdr! (last-pair ls1) ls2)
          ls1)
        ls2)))
For example:
(define l1 '(a b c)) ==> <#unspecified>
(define l2 '(d e)) ==> <#unspecified>
(append l1 l2) ==> (a b c d e)
l1 ==> (a b c)
(append! l1 l2) ==> (a b c d e)
l1 ==> (a b c d e)
To understand how append! works, we need to review how set-car! and set-cdr! work. Consider the following example:
(define x '(a b c)) ==> <#unspecified>
x ==> (a b c)
(define y (last-pair x)) ==> <#unspecified>
y ==> (c)
(set-car! (last-pair x) 'jill) ==> <#unspecified>
x ==> (a b jill)
Notice that set-car! changed x as a side effect. We condinue this example with the following:
(set-cdr! (last-pair x) '(another list)) ==> <#unspecified>
x ==> (a b jill another list)

Next we define an invalid-method-name-indicator constant and the verb for-effect-only.

(define invalid-method-name-indicator "unknown")
(define for-effect-only
  (lambda (item-ignored)
    "unspecified value"))
The Scheme evaluation rule says that all objects for a verb are evaluated before the verb may be applied to those objects. Hence the object, item-ignored to for-effect-only is evaluated before for-effect-only is applied. Since for-effect-only doesn't use item-ignored in its definition, its value is independent of its object. In fact, the resulting value of for-effect-only is always "unspecified value". However, if the evaluation of an object to for-effect-only has a side effect, that side effect remains after for-effect-only has been evaluated. For example:
(define l1 '(a b c)) ==> <#unspecified>
(define l2 '(d e)) ==> <#unspecified>
(for-effect-only (append l1 l2)) ==> "unspecified value"
l1 ==> (a b c)
(for-effect-only (append! l1 l2)) ==> "unspecified value"
l1 ==> (a b c d e)

6.5.2 A Trivial Object

We are going to use Scheme functions to represent objects. The following simple example illustrates how simple objects may be constructed in Scheme.
(define object-maker
  (lambda (init-value)
    (let ((object-value init-value))
      (lambda msg
        (case (car msg)
          ((value) object-value)
          ((set!) (set! object-value (cadr msg)))
          (else 'unknown-operation))))))
The object-maker verb produces a verb which takes an unspecified number of inputs. The verb returned by object-maker is contained in the body of a let, so the variables bound to values in the let (object-value is bound to init-value in this example) are packaged with the returned verb. The inputs to this verb are thought of as messages or operations which the verb can perform on its data structure. There are two messages implemented in this example. Ther first, value, simply returns the value of the data structure, i.e. object-value. The second message, set!, changes the value of the data structure. The following example shows how to use object-maker to make an object called box.
(define box (object-maker 6)) ==> <#unspecified>
box ==> <#procedure box>
(box 'value) ==> 6
(box 'set 7) ==> "undefined operation"
(box 'set! 7) ==> <#unspecified>
(box 'value) ==> 7

6.5.3 More Definitions

Next we define several helping verbs. delegate is a synonym for apply.
(define delegate
  (lambda (obj msg)
    (apply obj msg)))
base-object is the object from which all of our objects will inherit. It knows one operation or message, namely type. Any other messages result in the invalid-method-name-indicator.
(define base-object
  (lambda msg
    (case (1st msg)
      ((type) "base-object")
      (else invalid-method-name-indicator))))
Sometimes it is convenient to express ourselves in terms of sending objects a message to perform a particular operation on its data structure. The following verb accomplishes this.
(define send
  (lambda args
    (let ((object (car args)) (message (cdr args)))
      (let ((try (apply object message)))
        (if (eq? invalid-method-name-indicator try)
            (error 
              (string-append (symbol->string (car message)) ": "
                             "Bad method name sent to object of " 
                             (object 'type) " type."))
            try)))))

send is just a form of "syntactic sugar" to provide a message-sending paradigm for communicating with a stack or queue object.

As we shall see, once an object is created, we can send that object messages which cause operations (sometimes called methods) to be performed on that object's data structure.

6.5.4 Stack Objects

Following is the stack object constructor.
(define stack-maker
  (lambda ()
    (let ((stk '()))
      (lambda msg
        (case (1st msg)
          ((type) "stack")
          ((empty?) (null? stk))
          ((push!) (for-effect-only 
                     (set! stk (cons (2nd msg) stk))))
          ((top) (if (null? stk)
                     (error "top: The stack is empty.")
                     (car stk)))
          ((pop!) (for-effect-only
                    (if (null? stk)
                        (error "pop!: The stack is empty.")
                        (set! stk (cdr stk)))))
          ((size) (length stk))
          ((print) (display "TOP: ")
                   (for-each 
                     (lambda (x) 
                       (display x)
                       (display " "))
                     stk)
                   (newline))
          (else (delegate base-object msg)))))))
stack-maker returns a verb of a variable number of inputs which "remembers" an initial binding of the value '() to stk. This is the data structure portion of the stack object. The verb which is returned knows how to perform the operations of:
'type
'empty?
'push!
'top
'pop!
'size
'print
and inherits any other operations which the base-object can perform. The base object given above does nothing but process error messages, however other operations might be defined.

Conceivably, stack objects could delegate operations to other objects which in turn delegate some operations to still other objects, etc.

The following examples examine the operation of stacks as implemented with the stack-maker object constructor.

(define s (stack-maker))
(send s 'type) ==> "stack"
Alternately we could write
(s 'type) ==> "stack"
since send is syntactic sugar" which is a verb which applies its remaining inputs to its first input.
(send s 'empty?) ==> #t
(send s 'print) ==> TOP:
(send s 'push! '(ab c)) ==> "unspecified value"
(send s 'top) ==> (ab c)
(send s 'print) ==> TOP: (ab c)
(send s 'push! (* 3 4)) ==> "unspecified value"
(send s 'print) ==> TOP: 12 (ab c)
(send s 'top) ==> 12
(send s 'pop!) ==> "unspecified value"
(send s 'print) ==> TOP: (ab c)
(send s 'empty?) ==> #f
If we wish to more closely duplicate the operations of the non object oriented implementation we could make the following definitions:
(define make-stack stack-maker)
(define empty?
  (lambda (s)
    (s 'empty?)))
(define push-stack
  (lambda (item s)
    (s 'push! item)))

etc.

6.5.5 Queue Objects

In a similar fashion we examine the behavior of queue objects.
(define q (queue-maker))
(send q 'type) ==> "queue"
(send q 'empty?) ==> #t
(send q 'print) ==> FRONT:
(send q 'enter-queue! (/ 4 5)) ==> "unspecified value"
(send q 'print) ==> FRONT: 4/5
(send q 'front) ==> 4/5
(send q 'enter-queue! +) ==> "unspecified value"
(send q 'print) ==> FRONT: 4/5  #[procedure +]
(send q 'remove-queue!) ==> "unspecified value"
(send q 'print) ==> FRONT: #[procedure +]

6.6 Scheme Lists as a Primitive Data Structure

Next, we consider a method of diagramming Scheme lists. Consider the list:
(a b c)
If we separate this list into its car and cdr components we have:
a and (b c)
We can use a tree like (upside down) notation and write
  (a b c)
  /   \
 a   (b c)
Doing the same for (b c) and (c) we have the following diagram:
  (a b c)
  /   \
 a   (b c)
     /   \
    b    (c)
         / \
        c  ()

A diagram for any list may be drawn in a similar fashion.

We call such a diagram a binary tree because each location in the tree has at least 2 branches. The data of the tree, that is, a, b, c, and () are referred to as the leaves of the tree. These are leaves because there are no branches emminating from these locations. Another name for a leaf of a tree is terminal-node. The non-terminal nodes of the tree for a given list represent the cons's or cons cells of that list. For example,

(a b c) <==> (cons 'a '(b c))
(b c) <==> (cons 'b '(c))
(c) <==> (cons 'c '())
We could use what is often called a "box and pointer" notation for a Scheme list. Each box represents a cons cell in memory and the lines are locators or pointers to the memory cells which hold the tree leaves.

Each box is a representation of a cons cell in memory. The left part of the box locates the car portion of that cons cell while the right part of each box locates the cdr portion of that cons cell. In fact, most Scheme systems use a similar type of representation of lists in memory.

6.6.1 A Simple List Application

Suppose we wish to use lists to represent a database of information on people. For example, we might have a simple database like the following:
(define people
'((howland (jack (computer science prof)))
  (clinton (bill (president usa)))
  (clinton (hillary (president usa)))
  (perot (ross (wanted to be president usa)))
  (bush (george (used to be president usa)))))
A people database is simply a list of lists, each of which is of the form (name (info-list))

We can write a simple Scheme function which will look-up names in such a database and return the corresponding info-list.

(define lookup
  (lambda (name database)
    (if (null? database)
       '()
       (if (equal? name (caar database))
          (car database)
          (lookup name (cdr database))))))
Then
(lookup 'perot people) ==>
     (perot (ross (wanted to be president)))
If we analyze this lookup function so as to estimate the amount of work done while searching the database, we find that the lookup function searches sequentially through the items in the database. Hence, in the worst case, when the item we are looking for is the last item of an n item database, the lookup process requires n uses of the lookup function. If we measure the work done as the number of times lookup is used, then the work done is of order n. We write this as O(n) which is "read order n" or "big O of n".

Suppose that we could devise a method to store our data at the leaf positions of a tree which is symmetrical (unlike the tree diagrams for lists). If we had 16 items and some sort procedure to tell us which half of the tree (left or right), then the number of uses of lookup would be 4.

It is not hard to see that using a similar method to store 17 through 32 items would require 5 uses of lookup, etc. The work done here for a database of n items is ceiling of log2 n, where ceiling of j is the smallest integer which is greater than or equal to j. Hence the work done is O(log2 n) which is much smaller than n for large n. This would result in very fast searches of large databases.

6.7 Arrays

Finally, we wish to abstract another data structure for storing collections of items. We have already seen the list structure of Scheme. Recall some of the operations of its abstraction include:

(list? ls)  ;a predicate for lists
(null? ls)  ;a predicate for empty lists
(list item1 item2 ...)  ;a constructor for lists
(cons item ls)  ;a constructor for lists
(append ls1 ls2 ...)  ;a constructor for lists
(car ls)  ;access first item of a list
(cdr ls)  ;access list of all items but the first
(length ls)  ;access length of a list
(list-ref ls k)  ;access kth item of ls
Note: list-ref is O(n)

We abstract another data structure for storing collections of items, known as an array or vector. The key property of a vector is that it is possible to access the kth item of a vector in constant time. This means that the access is O(1). The easiest way to accomplish this is to store vectors in consecutive memory cells. Given a locator, l for the beginning of the vector, the kth item is located at

l + k * sizeof(item)

Following are some of the vector functions:

(make-vector k) ;create a vector capable of storing k items
(make-vector k item) ;create a vector storing k items, each initialized to item
(make-vector 3 0) ==> #(0 0 0)
(vector? obj) ;a predicate for vectors
(vector obj1 obj2 ...)  ;create a vector
(vector 'a 'b 'c) ==> #(a b c)
(define v (vector 'a 'b 'c))
v ==> #(a b c)
(vector-length vec) ;the length of a vector
(vector-length v) ==> 3
(vector-ref vec k) ;access the kth item of vec
This reference is done in constant time independent of the length of the vector, so is O(1).
(vector-ref v 1) ==> b
(vector-set! vec k obj) ;change kth item of vec to obj
(vector-set! v 1 "Jack")
v ==> #(a "Jack" c)
Some conversion functions for vectors.
(vector->list vec) ;convert a vector to a list
(vector->list v) ==> (a "Jack" c)
(list->vector ls) ;convert a list to a vector
(list->vector '(1 2 3)) ==> #(1 2 3)