for
CSCI 301: Great Ideas in Computer Science
(c) 1993, 1994, 1995 John E. Howland, All Rights Reserved
Department of Computer Science
Trinity University
Stadium Drive
San Antonio, Texas 78212-7200
Internet: jhowland@ariel.cs.trinity.edu
We start with the problem of syntax analysis. We need some method of specifying, precisely, the collection of syntactically correct sentences of a language. Inductive specification is a tool which may be applied to form strings of symbols chosen from an alphabet.
Inductive specification is a method of precisely specifying a set of values. As an example, consider the following specification of a certain subset of the natural numbers.
The set S is defined to be the smallest set of natural numbers satisfying:
a) 0 is an element of S
b) If x is an element of S, then x+2 is an
element of S.
We can also reason about the set S in a non-inductive fashion as follows. 0 is in S (property a). 2 is in S (property b). 4 is in S (property b), etc. This leads us to think of the set of non-negative multiples of 2. Call this set M. It follows that S contains the set M. Similarly, since S is the smallest set of natural numbers satisfying a) and b), S must be a subset of every set satisfying a) and b). Hence, it must be true that M = S since M is a subset of S and S is a subset of M.
We can generalize the idea of inductive specification as follows. To specify a set A by induction, we define it to be the smallest set satisfying two properties of the form:
a) Some specific values must be in A.
b) If certain values are in A, then certain
other values are also in A.
This means (non-inductively) that A is the set of all elements satisfying property a) together with all elements which are obtained by repeatedly applying b). However, this non-inductive specification is not precise enough for our purposes.
The syntax element list-of-numbers is the smallest set of objects satisfying the two properties:
1. The empty list, '() is a list-of-numbers
2. If l is a list-of-numbers and n is a
number, then the pair (cons n l) is
a
list-of-numbers.
Here, we need to be careful about the choice of symbols and the preciseness of our rules. Notice that we wrote '() rather than (). If we are writing for human readers we should use (), but if we are going to try to enter the datum () to a Scheme interpreter, we must quote the empty list. Here we are writing for human readers so () is preferrable. Similarly, since we are writing for human beings, we need a notation for pairing n and l. The notation we shall use is the standard lisp or Scheme pair notation ( n . l) which a lisp or Scheme interpreter will print as (n l). We will use the notation (n . l) rather than the lisp or Scheme list constructor (cons n l).
Using this notation we rewrite the inductive specification as
1. The empty list, () is a list-of-numbers
2. If l is a list-of-numbers and n is a
number, then the pair (n . l)
is a
list-of-numbers.
We can infer the following:
a) () is a list-of-numbers by property 1.
b) (10 . ()) ==> (10) is a
list-of-numbers by property 2 since
10 is
a number.
c) (3 . (10)) ==> (3 10) is a
list-of-numbers by property 2 since
3
is a number.
d) Nothing is a list-of-numbers unless it is
constructed in this manner.
BNF can be used to inductively define a number of sets of syntactic elements of a language at once. These sets are called syntactic categories, or sometimes nonterminals and we write the names of these sets by enclosing them in "<" ">" . For example, the list-of-numbers Scheme data type would be a nonterminal and would be written as <list-of-numbers>. Each syntactic category is defined by a finite set of rules, or productions. Each rule asserts that certain values must be in the syntactic category.
<list-of-numbers> ::= ()
<list-of-numbers> ::= (<number> . <list-of-numbers>)
The first rule says that the empty list is in <list-of-numbers> and the second rule says that if n is in <number> and l is in <list-of-numbers>, the (n . l) which is (cons n l) is in <list-of-numbers>. Each BNF rule starts with a syntactic category followed by ::= which is read as "is". The right hand side of the rule or production specifies how a member of the syntactic category is to be constructed from other syntactic categories and terminal symbols.
In the above rules the parens "(" ")" and period "." are terminal symbols and <number> and <list-of-numbers> are nonterminals. Sometimes we use the symbol "|" to shorten or simplify rules. "|" is read as or. The above two rules can be combined as:
<list-of-numbers> ::= () | (<number> . <list-of-numbers>)
Another shorthand is the Kleene star {...}* which means zero or more repetitions of what ever is between the braces. And the Kleene plus {...}+ which means one or more repetitions of what ever is between the braces. For example, the <list-of-numbers> category might be defined as:
<list-of-numbers> ::= ({<number>}*)
The shorthand items "|", Kleene star and plus are not really necessary, but are used with BNF to help shorten the descriptions.
As an example, consider the following specifications of Scheme data:
<dotted-datum> ::= ({<datum>}+ . <datum>)
<vector> ::= #({<datum>}*)
<datum> ::= <number> | <symbol> | <boolean> |
<string> | <list> | <dotted-datum> | <vector>
We are discussing how one formulates the syntax rules of a programming language. The properties of a program which can be determined by analyzing just the text of a program are said to be static properties. Similarly, the dynamic properties of a program are those which are determined by run-time inputs. Analysis of static properties is important because a translator can use this information to catch certain kinds of program errors and perhaps write a more efficient program.
<exp> ::= <var-ref>
| (lambda (<var>)
<exp>)
| (<exp> <exp>)
This is a language which has references to variables, lamba expressions with one formal parameter and procedure calls.
This language is the lambda calculus which was invented by Alonzo Church in his 1941 book "The Calculi of Lambda-Conversion. The lambda calculus is a fundamental basis for most programming languages. I strongly influenced the design of Lisp and particularly the Scheme programming language which we are using to describe computer science topics. We are using the lambda calculus language as an example more because it is a tiny language than because of its theoretical significance.
The notation Church used for his lambda calculus would look something like the following when expressed in BNF:
<exp> ::= <var-ref>
| lambda <var> .
<exp>
| <exp> <exp>
We have used a Scheme syntax with "(" ")" and the keyword lambda so that we can more easily use Scheme to model this language.
(define exp?
(lambda (exp)
(if (symbol? exp)
#t
(if (and
(pair? exp)
(= 3 (length exp))
(eq? 'lambda (car exp))
(pair? (cadr exp))
(= 1 (length (cadr exp)))
(symbol? (caadr exp))
(exp? (caddr exp)))
#t
(if (and
(pair? exp)
(= 2 (length exp))
(exp? (car exp))
(exp? (cadr exp)))
#t
#f)))))
(exp? 'x) ==> #t (exp? '(x y)) ==> #t (exp? '(x y 2)) ==> #f (exp? '(lambda (x) x)) ==> #t (exp? '(lambda (x y) x)) ==> #f
We know the the lambda calculus allows us to
a) reference variables
b) define functions of one argument
c) call functions of one argument
A variable reference is said to be bound in an expression if it refers to a formal parameter introduced in the expression.
A variable reference is said to be free in an expression if it is not a formal parameter in the expression.
((lambda (x) x) y)The reference to x is bound.
The reference to y is free.
A variable is said to occur bound in an expression if the expression contains a bound reference to the variable.
Similarly, a variable is said to occur free in an expression if the expression contains a free reference to the variable.
When evaluated at run time each variable reference must have a value. Variables bound to formal parameters are said to be lexically bound. Other wise they must be bound at the top level or supplied by the system. Such variables are said to be globally bound.
((lambda (x) x) y)x is lexically bound.
y must be globally bound or an error will result
A reference to a variable that is neither lexically bound or globally bound is an error.
The value of an expression depends only on the values of the variables which occur free in the expression.
The context surrounding the expression must supply the values of the free variables.
(lambda (y) ((lambda (x) x) y))is determined by the binding of the parameter y.
Hence,
((lambda (y) ((lambda (x) x) y)) 2) ==> 2Note that y was free in ((lambda (x) x) y) but bound in (lambda (y) ((lambda (x) x) y)).
The value of an expression is independent of bindings for variables that do not occur free in the expression.
(define x 5) (define y 4) ((lambda (x) x) y) ==> 4 x ==> 5The meaning, that is, semantics, of (lambda (x) x) is always the same. It does not depend on the variable x. It is the identity function, i.e., it returns whatever value is passed as an argument.
(lambda (f)
(lambda (x)
(f x)))
This
procedure takes a function as an argument and returns a function of one
argument which applies that argument to the function.Lambda expressions without free variables are called combinators. We have seen the identity function and the application combinator.
(lambda (f)
(lambda (g)
(lambda (x)
(f (g x)))))
is
another example.Next we give formal definitions of free and bound occurrences in an expression.
x occurs free in an expression E if and only if
a) E is a variable reference and E is x or
b) E is of the form (E1 E2) and x occurs free
in
E1 or E2 or
c) E is of the form (lambda (y) E1) where y is
different from x and x occurs free in E1.
(define free?
(lambda (var exp)
(if (and
(symbol? exp)
(eq? var exp))
#t
(if (and
(pair? exp)
(= 2 (length exp))
(exp? (car exp))
(exp? (cadr exp))
(or
(free? var (car exp))
(free? var (cadr exp))))
#t
(if (and
(pair? exp)
(= 3 (length exp))
(eq? 'lambda (car exp))
(= 1 (length (cadr exp)))
(symbol? (caadr exp))
(exp? (caddr exp))
(not (eq? var (caadr exp)))
(free? var (caddr exp)))
#t
#f)))))
(free? 'y '((lambda (x) x) y)) ==> #t (free? 'x '((lambda (x) x) y)) ==> #f (free? 'y '(lambda (y) ((lambda (x) x) y))) ==> #f (free? 'y 'x) ==> #f (free? 'y '(x y)) ==> #t (free? 'x '(x y)) ==> #tx occurs bound in an expression E if and only if
a) E is of the form (E1 E2) and x occurs bound
in
E1 or E2 or
b) E is of the form (lambda (y) E1) where x
occurs
bound in E1 or x and y are the same
variable and
y occurs free in E1
No variable occurs bound in an expression consisting of just a single variable.
(define bound?
(lambda (var exp)
(if (symbol? exp)
#f
(if (and
(pair? exp)
(= 2 (length exp))
(exp? (car exp))
(exp? (cadr exp))
(or
(bound? var (car exp))
(bound? var (cadr exp))))
#t
(if (or
(and
(pair? exp)
(= 3 (length exp))
(eq? 'lambda (car exp))
(= 1 (length (cadr exp)))
(symbol? (caadr exp))
(exp? (caddr exp))
(bound? var (caddr exp)))
(and
(pair? exp)
(= 3 (length exp))
(eq? 'lambda (car exp))
(= 1 (length (cadr exp)))
(symbol? (caadr exp))
(exp? (caddr exp))
(eq? var (caadr exp))
(free? (caadr exp) (caddr exp))))
#t
#f)))))
(bound? 'x 'x) ==> #f (bound? 'x 'y) ==> #f (bound? 'x '(x y)) ==> #f (bound? 'x '(lambda (x) x)) ==> #t (bound? 'y '((lambda (x) x) y)) ==> #f (bound? 'y '(lambda (y) ((lambda (x) x) y))) ==>#tGiven the predicates exp?, free? and bound? we could produce two functions (free exp) ==> list of all free variables in exp and (bound exp) ==> list of all bound variables in exp.
However, it is more important that we discuss the rules for evaluation in our lambda calculus language. The evaluation of an expression involves attaching a meaning to each expression in the lambda calculus.
To evaluate a lambda expression exp:
1. If exp is a variable reference, return its binding.
2. If exp is a function call, (E1 E2), evaluate its argument E2 and bind that value to the formal parameter y of the lambda expression (lambda (y) E) to which E1 evaluates, then evaluate E and return that value.
3. A lambda expression just evaluates to itself.