The Explorer

The Explorer
The Adventures of a Pythonista in Schemeland/7
by Michele Simionato
October 21, 2008

Summary
In this installment I pave the way to the heart of Lisp, i.e. to the famous *code is
data* concept. In order to do that, I will have to introduce two fundamental data
types first: symbols and lists.
_______________________________________________________________________________

## Symbols

Scheme and Lisp have a particular data type which is missing in most languages (with the
exception of Ruby): the symbol.

From the grammar point of view, a symbol is just a quoted identifier, i.e. a sequence of
characters corresponding to a valid identifier preceded by a quote. For instance, 'a,
'b1 e 'c_ are symbols. On the set of symbols there is an equality operator eq? which is
able to determine if two symbols are the same or not:

> (define sym 'a)
> (eq? sym 'b)
#f
> (eq? sym 'a)
#t

#f and #t are the Boolean values False and True respectively, as you may have imagined.
The equality operator is extremely efficient on symbols, since the compiler associates
to every symbol an integer number (this operation is called hashing) and stores it in an
interal registry (this operation is called interning): when the compiler checks the
identity of two symbols it actually checks the equality of two integer numbers, which is
an extremely fast operation.

You may get the number associated to a symbol with the function symbol-hash:

> (symbol-hash sym)
117416170
> (symbol-hash 'b)
134650981
> (symbol-hash 'a)
117416170

It is always possible to convert a string into a symbol and viceversa thanks to the
functions string->symbol and symbol->string, however conceptually - and also practically
- symbols in Scheme are completely different from strings.

The situation is not really different in Python. It is true that symbols do not exist as
a primitive data type, however strings corresponding to names of Python objects are
actually treated as symbols. You can infer this from the documentation about the builtin
functions hash e intern, which says: normally, the names used in Python programs are
automatically interned, and the dictionaries used to hold module, class or instance
attributes have interned keys. BTW, if you want to know exactly how string comparison
works in Python I suggest you to look at this post:

Scheme has much more valid identifiers than Python or C, where the valid characters are
restricted to a-zA-Z-0-9_ (I am ignoring the possibility of having Unicode characters in
identifiers, which is possible both in R6RS Scheme and Python 3.0). By convention,
symbols ending by ? are associated to boolean values or to boolean-valued functions,
whereas symbols ending by ! are associated to functions or macros with side effects.

The function eq?, is polymorphic and works on any kind of object, but it may surprise
you sometimes:

> (eq? "pippo" "pippo")
#f

The reason is that eq? (corrisponding to is in Python) checks if two objects are the
same object at the pointer level, but it does not check the content. Actually, Python
works the same. It is only by accident than "pippo" is "pippo" returns True on my
machine, since the CPython implementation manages differently "short" strings from
"long" strings:

>>> "a"*10 is "a"*10 # a short string
True
>>> "a"*100 is "a"*100 # a long string
False

If you want to check if two objects have the same content you should use the function
equal?, corresponding to == in Python:

> (equal? "pippo" "pippo")
#t

It you know the type of the objects you can use more efficient equality operators; for
instance for strings you can use string=? and for integer numbers =:

> (string=? "pippo" "pippo")
#t

> (= 42 42)
#t

To be entirely accurate, in addition to eq and equal, Scheme also provides a third
equality operator eqv?. eqv? looks to me like an useless complication of the language,
so I will not discuss it. If you are interested, you can read what the R6RS document
says about it.

## Lists

The original meaning of LISP was List Processing, since lists were the fundamental data
type of the language. Nowadays Lisp implements all possible data types, but still lists
have a somewhat privileged position, since lists can describe code. A Lisp/Scheme list
is a recursive data type such that:
1. the list is empty: '()
2. the list is the composition of an element and a list via the cons operation (cons
stands for constructor): (cons elem lst).

For instance, one-element lists are obtained as composition of an element with the empty
list:

> (cons 'x1 '()); one-element list
(x1)

Two-elements lists are obtained by composing an element with a one-element list:

> (cons 'x1 (cons 'x2 '())); two-element list
(x1 x2)

That generalizes to N-element lists:

> (cons 'x1 (cons 'x2 (cons 'x3 ..... (cons 'xN '()))) ...)
(x1 x2 x3 ... xN)

For simplicity, the language provides an N-ary list constructor list

> (list x1 x2 x3 ... xN)
(x1 x2 x3 ... xN)

but the expression (list x1 ... xN) is nothing else than syntactic sugar for the fully
explicit expression in terms of constructors.

There are also the so-called improper lists, i.e. the ones where the second argument of
the cons is not a list. In that case the representation of the list displayed at the
REPL contains a dot:

> (cons 'x1 'x2) ; improper list
(x1 . x2)

It is important to remember that improper lists are not lists, therefore operations like
map, filter and similar do not work on them.

As we anticipated in episode 4, the first element of a list (proper or improper) can be
extracted with the function car; the tail of the list instead can be extracted with the
function cdr. If the list is proper, its cdr is proper:

> (cdr (cons 'x1 'x2))
x2

> (cdr (cons 'x1 (cons 'x2 '())))
(x2)

At low level Scheme lists are implemented as linked list, i.e. as couples
(pointer-to-sublist, value) until you arrive at the null pointer.

Some example

To give an example of how to build Scheme lists, here I show you how you could define a
range function analogous to Python range. Here are the specs:

> (range 5); one-argument syntax
(0 1 2 3 4)
> (range 1 5); two-argument syntax
(1 2 3 4)
> (range 1 10 2); three-argument syntax
(1 3 5 7 9)
> (range 10 0 -2); negative step
(10 8 6 4 2)
> (range 5 1); meaningless range
()

Here is the implementation:

(define range
(case-lambda
((n); one-argument syntax
(range 0 n 1))
((n0 n); two-argument syntax
(range n0 n 1))
((n0 n s); three-argument syntax
(assert (and (for-all number? (list n0 n s)) (not (zero? s))))
(let ((cmp (if (positive? s) >= <=)))
(let loop ((i n0) (acc '()))
(if (cmp i n) (reverse acc)
(loop (+ i s) (cons i acc))))))))

Here case-lambda is a syntactic form that allows to define functions with different
behavior according to the number of arguments. for-all is an R6RS higher order function:
(for-all pred lst) applies the predicate pred to the elements of list lst, until a false
value is found - in that case it returns #f - otherwise it returns #t. Here the
assertion checks at runtime that all the passed arguments n0, n and s are numbers, with
s non-zero.

The first let defines a comparison function cmp which is >= if the step s is positive,
or <= if the step s is negative. The reason is clear: if s is positive at some point the
index i will get greater than n, whereas if s is negative at some point i will get
smaller than n.

The trick used in the loop is extremely common: instead of modifying a pre-existing
list, at each iteration a new list is created ex-novo by adding an element (cons i acc);
at the end the accumulator is reversed (reverse acc). The same happens for the counter i
which is never modified.

This is an example of functional loop; imperative loops based on mutation are considered
bad style in Scheme and other functional languages. Notice by contrast that in Common
Lisp imperative loops are perfectly acceptable and actually there is a very advanced
LOOP macro allowing you to do everything with imperative loops.

Actually, the range function defined here is more powerful than Python's range, since it
also works with floating point numbers:

> (range 1.3 2.5 .25)
(1.3 1.55 1.8 2.05 2.3)

As always, the only way to really get Scheme lists is to use them. I suggest you try the
following exercises:
1. implement an equivalent of Python enumerate for Scheme lists;
2. implement an equivalent of Python zip for Scheme lists.

I will show the solutions in the next episode. If you are lazy and you want an already
written list library, I suggest you to give a look at the SRFI-1 library, which is very
rich and available practically in all Scheme implementations. Many of the SRFI-1
features are built-in in the R6RS standard, but many other are still available only in
the SRFI-1 .

This weblog entry is Copyright © 2008 Michele Simionato. All rights reserved.