Lecture Notes

for

CSCI 301: Great Ideas in Computer Science

(c) 1993, 1994, 1995 John E. Howland, All Rights Reserved

Department of Computer Science

Trinity University

Stadium Drive

San Antonio, Texas 78212-7200

Internet: jhowland@ariel.cs.trinity.edu

9 Language Translation ( Terminal Session on this Machine )

This discussion of language translation deals with some of the aspects of the translation of higher level programming languages to lower level machine languages. We do not here consider the computer application area of translation of one natural language, such as Russian, to another natural language such as English. Machine translation of natural languages is a much more complex problem. We begin with some background concepts from mathematics.

9.1 Inductive Specification

The problem of language translation is broken down into two sub-problems; syntax analysis and semantic analysis. Syntax analysis deals with the problem of recognizing whether or not a sentence in the language (expression) is correctly formed according to the rules of syntax for the language. Semantic analysis deals with the meaning of correctly formed sentences.

We start with the problem of syntax analysis. We need some method of specifying, precisely, the collection of syntactically correct sentences of a language. Inductive specification is a tool which may be applied to form strings of symbols chosen from an alphabet.

Inductive specification is a method of precisely specifying a set of values. As an example, consider the following specification of a certain subset of the natural numbers.

The set S is defined to be the smallest set of natural numbers satisfying:

a) 0 is an element of S

b) If x is an element of S, then x+2 is an
element of S.

We can also reason about the set S in a non-inductive fashion as follows. 0 is in S (property a). 2 is in S (property b). 4 is in S (property b), etc. This leads us to think of the set of non-negative multiples of 2. Call this set M. It follows that S contains the set M. Similarly, since S is the smallest set of natural numbers satisfying a) and b), S must be a subset of every set satisfying a) and b). Hence, it must be true that M = S since M is a subset of S and S is a subset of M.

We can generalize the idea of inductive specification as follows. To specify a set A by induction, we define it to be the smallest set satisfying two properties of the form:

a) Some specific values must be in A.

b) If certain values are in A, then certain
other values are also in A.

This means (non-inductively) that A is the set of all elements satisfying property a) together with all elements which are obtained by repeatedly applying b). However, this non-inductive specification is not precise enough for our purposes.

9.2 Language Syntax Specification

We wish to apply inductive specification of a set to the problem of specifying the syntax of a programming language. In particular, we wish to use inductive specification to specify syntactic elements of a programming language. Each such set will be a set of strings which are correctly formed strings of symbols in the programming language. The rules which determine the set of all acceptable strings of symbols for a language is called the grammar of the language or the syntax of the language.

9.2.1 Example

Let us suppose that we wish to give the Scheme programming language grammar for a subset of Scheme known as list-of-numbers. Intuitively we are thinking of lists such as (1 2 3), (), etc., but not ( 9 2.4 a 3). Consider the following inductive specification:

The syntax element list-of-numbers is the smallest set of objects satisfying the two properties:

1. The empty list, '() is a list-of-numbers

2. If l is a list-of-numbers and n is a
number, then the pair (cons n l) is a
list-of-numbers.

Here, we need to be careful about the choice of symbols and the preciseness of our rules. Notice that we wrote '() rather than (). If we are writing for human readers we should use (), but if we are going to try to enter the datum () to a Scheme interpreter, we must quote the empty list. Here we are writing for human readers so () is preferrable. Similarly, since we are writing for human beings, we need a notation for pairing n and l. The notation we shall use is the standard lisp or Scheme pair notation ( n . l) which a lisp or Scheme interpreter will print as (n l). We will use the notation (n . l) rather than the lisp or Scheme list constructor (cons n l).

Using this notation we rewrite the inductive specification as

1. The empty list, () is a list-of-numbers


2. If l is a list-of-numbers and n is a
number, then the pair (n . l) is a
list-of-numbers.

We can infer the following:

a) () is a list-of-numbers by property 1.

b) (10 . ()) ==> (10) is a
list-of-numbers by property 2 since
10 is a number.

c) (3 . (10)) ==> (3 10) is a
list-of-numbers by property 2 since
3 is a number.

d) Nothing is a list-of-numbers unless it is
constructed in this manner.

9.3 Bacus-Naur Form and Syntactic Derivations

Bacus-Naur form is a notation which can be used to give inductive specifications for the syntactic elements of a language. The problem we face when we wish to define the rules of syntax (grammar) of a language is that we have to use notation (often involving the same alphabet as used in the language) to describe the rules for forming strings of symbols. We need to be able to distinguish symbols appearing in the syntax rule which are part of the rule itself from symbols which are part of a properly formed string of the language. Bacus and Naur developed a notational scheme, called BNF, for such syntax descriptions.

BNF can be used to inductively define a number of sets of syntactic elements of a language at once. These sets are called syntactic categories, or sometimes nonterminals and we write the names of these sets by enclosing them in "<" ">" . For example, the list-of-numbers Scheme data type would be a nonterminal and would be written as <list-of-numbers>. Each syntactic category is defined by a finite set of rules, or productions. Each rule asserts that certain values must be in the syntactic category.

9.3.1 Example

Here is the BNF definition of the list-of-numbers syntax. This description has two rules.

<list-of-numbers> ::= ()

<list-of-numbers> ::= (<number> . <list-of-numbers>)

The first rule says that the empty list is in <list-of-numbers> and the second rule says that if n is in <number> and l is in <list-of-numbers>, the (n . l) which is (cons n l) is in <list-of-numbers>. Each BNF rule starts with a syntactic category followed by ::= which is read as "is". The right hand side of the rule or production specifies how a member of the syntactic category is to be constructed from other syntactic categories and terminal symbols.

In the above rules the parens "(" ")" and period "." are terminal symbols and <number> and <list-of-numbers> are nonterminals. Sometimes we use the symbol "|" to shorten or simplify rules. "|" is read as or. The above two rules can be combined as:

<list-of-numbers> ::= () | (<number> . <list-of-numbers>)

Another shorthand is the Kleene star {...}* which means zero or more repetitions of what ever is between the braces. And the Kleene plus {...}+ which means one or more repetitions of what ever is between the braces. For example, the <list-of-numbers> category might be defined as:

<list-of-numbers> ::= ({<number>}*)

The shorthand items "|", Kleene star and plus are not really necessary, but are used with BNF to help shorten the descriptions.

As an example, consider the following specifications of Scheme data:

9.3.2 Example

<list> ::= ({<datum>}*)

<dotted-datum> ::= ({<datum>}+ . <datum>)

<vector> ::= #({<datum>}*)

<datum> ::= <number> | <symbol> | <boolean> |
<string> | <list> | <dotted-datum> | <vector>

We are discussing how one formulates the syntax rules of a programming language. The properties of a program which can be determined by analyzing just the text of a program are said to be static properties. Similarly, the dynamic properties of a program are those which are determined by run-time inputs. Analysis of static properties is important because a translator can use this information to catch certain kinds of program errors and perhaps write a more efficient program.

9.4 Lambda Calculus Language

If the variable x is a formal parameter, a reference to x is a static program property in most programming languages. We wish to explore this idea a little more fully. Suppose we consider a tiny programming language defined by the following BNF rules.

<exp> ::= <var-ref>
| (lambda (<var>) <exp>)
| (<exp> <exp>)

This is a language which has references to variables, lamba expressions with one formal parameter and procedure calls.

This language is the lambda calculus which was invented by Alonzo Church in his 1941 book "The Calculi of Lambda-Conversion. The lambda calculus is a fundamental basis for most programming languages. I strongly influenced the design of Lisp and particularly the Scheme programming language which we are using to describe computer science topics. We are using the lambda calculus language as an example more because it is a tiny language than because of its theoretical significance.

The notation Church used for his lambda calculus would look something like the following when expressed in BNF:

<exp> ::= <var-ref>
| lambda <var> . <exp>
| <exp> <exp>

We have used a Scheme syntax with "(" ")" and the keyword lambda so that we can more easily use Scheme to model this language.

9.4.1 Predicate exp?

Following is an implementation of a predicate exp? which returns true if an expression is a correctly formed string in the lambda calculus.
(define exp?
  (lambda (exp)
          (if (symbol? exp)
              #t
              (if (and
                   (pair? exp)
                   (= 3 (length exp))
                   (eq? 'lambda (car exp))
                   (pair? (cadr exp))
                   (= 1 (length (cadr exp)))
                   (symbol? (caadr exp))
                   (exp? (caddr exp)))
                  #t
                  (if (and
                       (pair? exp)
                       (= 2 (length exp))
                       (exp? (car exp))
                       (exp? (cadr exp)))
                      #t
                      #f)))))

9.4.2 Examples:

(exp? 'x) ==> #t
(exp? '(x y)) ==> #t
(exp? '(x y 2)) ==> #f
(exp? '(lambda (x) x)) ==> #t
(exp? '(lambda (x y) x)) ==> #f

9.5 Semantics

Next we give some definitions which are important when we begin to attach some semantics (meanings) to our lambda expression language.

We know the the lambda calculus allows us to

a) reference variables

b) define functions of one argument

c) call functions of one argument

A variable reference is said to be bound in an expression if it refers to a formal parameter introduced in the expression.

A variable reference is said to be free in an expression if it is not a formal parameter in the expression.

9.5.1 Example

((lambda (x) x) y)
The reference to x is bound.

The reference to y is free.

A variable is said to occur bound in an expression if the expression contains a bound reference to the variable.

Similarly, a variable is said to occur free in an expression if the expression contains a free reference to the variable.

When evaluated at run time each variable reference must have a value. Variables bound to formal parameters are said to be lexically bound. Other wise they must be bound at the top level or supplied by the system. Such variables are said to be globally bound.

9.5.2 Example

In:
((lambda (x) x) y)
x is lexically bound.

y must be globally bound or an error will result

A reference to a variable that is neither lexically bound or globally bound is an error.

The value of an expression depends only on the values of the variables which occur free in the expression.

The context surrounding the expression must supply the values of the free variables.

9.5.3 Example

The value of ((lambda (x) x) y) depends on y because y is free in this expression.

9.5.4 Example

The value of ((lambda (x) x) y) in
(lambda (y) ((lambda (x) x) y))
is determined by the binding of the parameter y.

Hence,

((lambda (y) ((lambda (x) x) y)) 2) ==> 2
Note that y was free in ((lambda (x) x) y) but bound in (lambda (y) ((lambda (x) x) y)).

The value of an expression is independent of bindings for variables that do not occur free in the expression.

9.5.5 Example

(define x 5)
(define y 4)

((lambda (x) x) y) ==> 4

x ==> 5
The meaning, that is, semantics, of (lambda (x) x) is always the same. It does not depend on the variable x. It is the identity function, i.e., it returns whatever value is passed as an argument.

9.5.6 Example

Here is another example of a useful procedure which has no free variables.
(lambda (f)
  (lambda (x)
    (f x)))
This procedure takes a function as an argument and returns a function of one argument which applies that argument to the function.

Lambda expressions without free variables are called combinators. We have seen the identity function and the application combinator.

(lambda (f)
  (lambda (g)
    (lambda (x)
      (f (g x)))))
is another example.

Next we give formal definitions of free and bound occurrences in an expression.

x occurs free in an expression E if and only if

a) E is a variable reference and E is x or

b) E is of the form (E1 E2) and x occurs free in
E1 or E2 or

c) E is of the form (lambda (y) E1) where y is
different from x and x occurs free in E1.

9.5.7 Examples

x is free in (x y)
y is free in y
y is free in (x y)
y is free in ((lambda (x) x) y)

9.5.8 Predicate free?

Next we define a Scheme predicate for free variables in the lambda calculus.
(define free?
  (lambda (var exp)
    (if (and
         (symbol? exp)
         (eq? var exp))
      #t
      (if (and
           (pair? exp)
           (= 2 (length exp))
           (exp? (car exp))
           (exp? (cadr exp))
           (or
            (free? var (car exp))
            (free? var (cadr exp))))
        #t
        (if (and
             (pair? exp)
             (= 3 (length exp))
             (eq? 'lambda (car exp))
             (= 1 (length (cadr exp)))
             (symbol? (caadr exp))
             (exp? (caddr exp))
             (not (eq? var (caadr exp)))
             (free? var (caddr exp)))
          #t
          #f)))))

9.5.9 Examples

(free? 'y '((lambda (x) x) y)) ==> #t

(free? 'x '((lambda (x) x) y)) ==> #f

(free? 'y '(lambda (y) ((lambda (x) x) y))) ==> #f

(free? 'y 'x) ==> #f

(free? 'y '(x y)) ==> #t

(free? 'x '(x y)) ==> #t
x occurs bound in an expression E if and only if

a) E is of the form (E1 E2) and x occurs bound in
E1 or E2 or

b) E is of the form (lambda (y) E1) where x occurs
bound in E1 or x and y are the same variable and
y occurs free in E1

No variable occurs bound in an expression consisting of just a single variable.

9.5.10 Predicate bound?

Following is a predicate to determine whether or not a variable is bound in an expression in the lambda calculus.
(define bound?
  (lambda (var exp)
    (if (symbol? exp)
      #f
      (if (and
           (pair? exp)
           (= 2 (length exp))
           (exp? (car exp))
           (exp? (cadr exp))
           (or
            (bound? var (car exp))
            (bound? var (cadr exp))))
        #t
        (if (or
             (and
              (pair? exp)
              (= 3 (length exp))
              (eq? 'lambda (car exp))
              (= 1 (length (cadr exp)))
              (symbol? (caadr exp))
              (exp? (caddr exp))
              (bound? var (caddr exp)))
             (and
              (pair? exp)
              (= 3 (length exp))
              (eq? 'lambda (car exp))
              (= 1 (length (cadr exp)))
              (symbol? (caadr exp))
              (exp? (caddr exp))
              (eq? var (caadr exp))
              (free? (caadr exp) (caddr exp))))
          #t
          #f)))))

9.5.11 Examples

(bound? 'x 'x) ==> #f

(bound? 'x 'y) ==> #f

(bound? 'x '(x y)) ==> #f

(bound? 'x '(lambda (x) x)) ==> #t

(bound? 'y '((lambda (x) x) y)) ==> #f

(bound? 'y '(lambda (y) ((lambda (x) x) y))) ==>#t
Given the predicates exp?, free? and bound? we could produce two functions (free exp) ==> list of all free variables in exp and (bound exp) ==> list of all bound variables in exp.

However, it is more important that we discuss the rules for evaluation in our lambda calculus language. The evaluation of an expression involves attaching a meaning to each expression in the lambda calculus.

To evaluate a lambda expression exp:

1. If exp is a variable reference, return its binding.

2. If exp is a function call, (E1 E2), evaluate its argument E2 and bind that value to the formal parameter y of the lambda expression (lambda (y) E) to which E1 evaluates, then evaluate E and return that value.

3. A lambda expression just evaluates to itself.