for
CSCI 1301: Great Ideas in Computer Science
(c) 1993 - 2007 John E. Howland, All Rights Reserved
Department of Computer Science
Trinity University
One Trinity Place
San Antonio, Texas 78212-7200
Internet: jhowland@ariel.cs.trinity.edu
We start with the problem of syntax analysis. We need some method of specifying, precisely, the collection of syntactically correct sentences of a language. Inductive specification is a tool which may be applied to form strings of symbols chosen from an alphabet.
Inductive specification is a method of precisely specifying a set of values. As an example, consider the following specification of a certain subset of the natural numbers.
The set S is defined to be the smallest set of natural numbers satisfying:
a) zero is an element of S
b) If x is an element of S, then x+2 is an
element of S.
We can also reason about the set S in a non-inductive fashion as follows. 0 is in S (property a). 2 is in S (property b). 4 is in S (property b), etc. This leads us to think of the set of non-negative multiples of 2. Call this set M. It follows that S contains the set M. Similarly, since S is the smallest set of natural numbers satisfying a) and b), S must be a subset of every set satisfying a) and b). Hence, it must be true that M = S since M is a subset of S and S is a subset of M.
We can generalize the idea of inductive specification as follows. To specify a set A by induction, we define it to be the smallest set satisfying two properties of the form:
a) Some specific values must be in A.
b) If certain values are in A, then certain
other values are also in A.
This means (non-inductively) that A is the set of all elements satisfying property a) together with all elements which are obtained by repeatedly applying b). However, this non-inductive specification is not precise enough for our purposes.
The syntax element list_of_numbers is the smallest set of objects satisfying the two properties:
1. The empty list, '' is a list-of-numbers
2. If l is a list_of_numbers and n is a
number, then the pair n , l is
a
list_of_numbers.
We can infer the following:
a) '' is a list_of_numbers by property 1.
b) 10 , '' ==> 10 is a
list_of_numbers by property 2 since
10 is a
number.
c) 3 , 10 ==> 3 10 is a
list_of_numbers by property 2 since
3 is a
number.
d) Nothing is a list-of-numbers unless it is
constructed in this manner.
BNF can be used to inductively define a number of sets of syntactic elements of a language at once. These sets are called syntactic categories, or sometimes nonterminals and we write the names of these sets by enclosing them in "<" ">" . For example, the list_of_numbers J data type would be a nonterminal and would be written as <list_of_numbers>. Each syntactic category is defined by a finite set of rules, or productions. Each rule asserts that certain values must be in the syntactic category.
<list_of_numbers> ::= ''
<list_of_numbers> ::= <number> , <list_of_numbers>
The first rule says that the empty list is in <list_of_numbers> and the second rule says that if n is in <number> and l is in <list_of_numbers>, the n , l which is n append l is in <list_of_numbers>. Each BNF rule starts with a syntactic category followed by ::= which is read as "is". The right hand side of the rule or production specifies how a member of the syntactic category is to be constructed from other syntactic categories and terminal symbols.
In the above rules the comma "," (the J word for append) and single quote "'" are terminal symbols and <number> and <list_of_numbers> are nonterminals. Sometimes we use the symbol "|" to shorten or simplify rules. "|" is read as or. The above two rules can be combined as:
<list_of_numbers> ::= '' | <number> , <list_of_numbers>
Another shorthand is the Kleene star {...}* which means zero or more repetitions of what ever is between the braces. And the Kleene plus {...}+ which means one or more repetitions of what ever is between the braces. For example, the <list_of_numbers> category might be defined as:
<list_of_numbers> ::= {<number>}*
The shorthand items "|", Kleene star and plus are not really necessary, but are used with BNF to help shorten the descriptions.
As an example, consider the following specifications of Scheme data:
<dotted-datum> ::= ({<datum>}+ . <datum>)
<vector> ::= #({<datum>}*)
<datum> ::= <number> | <symbol> | <boolean> |
<string> | <list> | <dotted-datum> | <vector>
This means that
(1 abc 2 10)is a valid Scheme datum.
We are discussing how one formulates the syntax rules of a programming language. The properties of a program which can be determined by analyzing just the text of a program are said to be static properties. Similarly, the dynamic properties of a program are those which are determined by run-time inputs. Analysis of static properties is important because a translator can use this information to catch certain kinds of program errors and perhaps write a more efficient program.
<exp> ::= <var_ref>
| <exp> <exp>
| monad def '<exp>'
This is a language which has references to variables, function calls and
monad definitions.
This language is the lambda calculus which was invented by Alonzo Church in his 1941 book "The Calculi of Lambda-Conversion. The lambda calculus is a fundamental basis for many programming languages. Ii strongly influenced the design of Lisp and particularly the Scheme programming language. We are using the lambda calculus language as an example more because it is a tiny language than because of its theoretical significance.
The notation Church used for his lambda calculus would look something like the following when expressed in BNF:
<exp> ::= <var-ref>
| <exp> <exp>
| lambda <var> . <exp>
We have used a J syntax with "monad", "define" and single quotes rather than the lambda so that we can more easily use J to model this language. The counter part to the <var> is the pronoun "y." which may appear in the quoted <exp>.
exprp =: monad define
NB. parse sentence into words
words =. word y.
NB. check for variable reference
if. 1 = tally words
do. varp open words
NB. check for verb application
elseif. 2 = tally words
do. (exprp open 0 from words) and exprp open 1 from words
NB. check for verb definition
elseif. 3 = tally words
do. ('monad' match open 0 from words) and ('def' match open 1 from words) and exprp do open 2 from words
elseif. 1
do. 0
end.
)
This
definition uses two other verbs; word and the predicate for variable
references varp. The complete discussion of how these two verbs
accomplish their task is a bit beyond our level of understanding at this point,
but that should not deter our use of the verbs in the definition
exprp.The verb word =: ;: separates a character string into a boxed list of items in the same manner that a J system would parse the character string into words as if the string were a J sentence. For example:
word 'a+b' ==> +-+-+-+ |a|+|b| +-+-+-+ word '+/%#' ==> +-+-+-+-+ |+|/|%|#| +-+-+-+-+ word 'data * items' ==> +----+-+-----+ |data|*|items| +----+-+-----+ word '-data' ==> +-+----+ |-|data| +-+----+The verb
varp =: monad def ' _1 < (4 !: 0) < , y.'is a predicate for a name which has been bound to some J item. For example:
a ==> 100 varp 'a' ==> 1 b ==> |value error varp 'b' ==> 0We are now ready to illustrate the operation of the predicate, exprp, for properly formed sentences.
exprp 'a' ==> 1 exprp 'a a' ==> 1 exprp 'b' ==> 0 b ==> |value error exprp 'a b' ==> 0 exprp 'a a a a' ==> 0 exprp 'monad def ''y.''' ==> 1 exprp 'monad def ''a''' ==> 1 exprp 'monad def ''b''' ==> 0
We want the lambda calculus to allow us to:
a) reference variables
b) apply monads to an input
c) define monads
A variable reference is said to be bound in an expression if it refers to a formal parameter introduced in the expression.
A variable reference is said to be free in an expression if it is not a formal parameter in the expression.
(monad def 'y.') aThe reference to y. is bound.
The reference to a is free.
A variable is said to occur bound in an expression if the expression contains a bound reference to the variable.
Similarly, a variable is said to occur free in an expression if the expression contains a free reference to the variable.
When evaluated at run time each variable reference must have a value. Variables bound to formal parameters are said to be lexically bound. Other wise they must be bound at the top level or supplied by the system. Such variables are said to be globally bound.
(monad def 'y.') ay. is lexically bound.
a must be globally bound or an error will result
(monad def 'y.') a ==> 100because
a ==> 100A reference to a variable that is neither lexically bound or globally bound is an error.
The value of an expression depends only on the values of the variables which occur free in the expression.
The context surrounding the expression must supply the values of the free variables.
(monad def '(monad def ''y.'') y.')is determined by the binding of the parameter y. .
Hence,
(monad def '(monad def ''y.'') y.') 2 ==> 2Note that y. was free in (monad def 'y.') y. but bound in
(monad def '(monad def ''y.'') y.')To put it another way, in the above expression, the two variables y. are distinct from each other.
The value of an expression is independent of bindings for variables that do not occur free in the expression.
y. =: 10
(monad def 'y.') y.
10
(monad def '(monad def ''y.'') y.') 2
2
y.
10
The
meaning, that is, semantics, of (monad def 'y.') is always the
same. It is the identity function, i.e., it returns whatever value is passed
as an argument.Next we discuss in more detail the idea of free and bound variable references. In J, all bindings are top level bindings unless they are bound in an explicit definition. As stated before, within the context of an explicit definition, the names x. and y. are lexically bound. In addition, any names assigned by the local binding symbol (spelled =.) are also bound variable references after each such binding. Any other variable references are said to be free. For example, consider the definition foo:
foo =: monad define
('one' ; 'two' ; 'three') =. y.
one + two * three - four
)
The
variable references one, two and three are bound in the first line of this
definition, while four is free. The value of this function will be dependent
on the value of the free variable four.