2013-01-20
Even though Lisp lists are remarkably powerful and flexible, they are not the only data-structure available in Common Lisp. Among the most useful other data structures are arrays and hash tables.
Vectors in Common Lisp come in two flavours - fixed size and
resizeable. The former are roughly the Lisp analogues of C/Java arrays
and the latter are more like Python lists or C++
std::vector.
Vectors can be created using the VECTOR function by
passing it an arbitrary number of elements in the vector:
CL-USER> (vector 1 2 3)
#(1 2 3)
CL-USER> (vector)
#()
These calls return a fixed-size vector. Note that the
literal syntax for vectors is #(...).
There is a more general function called MAKE-ARRAY that
can make multi- dimensional arrays(including vectors, which are just 1-D
arrays).
CL-USER> (make-array 5)
#(0 0 0 0 0)
CL-USER> (make-array '(2 2))
#2A((0 0) (0 0))
CL-USER> (make-array '(5))
#(0 0 0 0 0)
The only required argument to MAKE-ARRAY is the
dimension of the array being requested, as a list. As a convenience,
when we want a vector(i.e., only one dimension), we can skip wrapping
the size of that dimension in a list - a bare number is accepted. This
returns a resizeable vector. The arrays returned from
MAKE-ARRAY may not have their elements initialized, so they
may not be accessed before a SETF (this is not the case on
SBCL at least, but it is a good idea to initialize anyway). To
initialize all the elements of an array with a value, pass that value as
the :initial-element keyword argument to
MAKE-ARRAY.
CL-USER> (make-array '(2 2) :initial-element "hola")
#2A(("hola" "hola") ("hola" "hola"))
Resizeable vectors have a fill pointer associated with them,
which is basically the index of the next vacant position in the vector.
To create a vector of capacity 5 which is initially empty, we must
specify the :fill-pointer keyword argument to
MAKE-ARRAY as 0.
CL-USER> (make-array 5 :fill-pointer 0)
#()
CL-USER> (make-array 5 :fill-pointer 1)
#(0)
CL-USER> (make-array 5 :fill-pointer 1 :initial-element 10)
#(10)
Once we have a resizeable vector, we can insert elements in it using
the VECTOR-PUSH function, which adds an item at the fill
pointer and increments the fill pointer value and returns the index at
which the new element was inserted. The inverse to this is
VECTOR-POP, which returns the last pushed element and
decrements the fill pointer by one.
Note that when you try to push past the allocated capacity of a
vector, no push happens and VECTOR-PUSH returns
NIL to signify this. To make truly resizeable vectors set
the :adjustable keyword argument to non-nil when creating
the vector using MAKE-ARRAY. To insert stuff into a
resizeable vector, we use VECTOR-PUSH-EXTEND, which
increases the capacity of the underlying storage if a push is attempted
beyond the current capacity of the vector.
CL-USER> (defparameter *x* (make-array 2 :fill-pointer 0))
*X*
CL-USER> (vector-push 'a *x*)
0
CL-USER> (vector-push 'b *x*)
1
CL-USER> *x*
#(A B)
CL-USER> (vector-push 'c *x*)
NIL
CL-USER> *x*
#(A B)
;; Now a create a resizeable vector:
CL-USER> (defparameter *y* (make-array 2 :fill-pointer 0 :adjustable t))
*Y*
CL-USER> (vector-push-extend 'a *y*)
0
CL-USER> (vector-push-extend 'b *y*)
1
CL-USER> *y*
#(A B)
CL-USER> (vector-push-extend 'c *y*)
2
CL-USER> *y*
#(A B C)
One can create specialized vectors to hold strictly a particular type
of elements by passing a type descriptor to MAKE-ARRAY via
the :element-type keyword argument. So, one can create
resizeable and mutable strings like so:
CL-USER> (make-array 5 :fill-pointer 0 :adjustable t :element-type 'character)
""
Which implies that strings are actually implemented as vectors!
Both lists and vectors(both the general and specialized variants) are a form of sequences, which is a higher level abstraction. This calls for operations that are valid on any sequence - i.e., operations common to vectors and lists.
Two of them are LENGTH and ELT, for taking
the length of a sequence and getting the element at a particular index
in a sequence, respectively.
There are some other functions that operate on sequences:
COUNT takes an item and a sequence and returns the
number of occurrences of the item in that sequence.
FIND takes an item and a sequence and returns the item
if it is found in the sequence and NIL otherwise.
POSITION takes an item and a sequence and returns the
index of the first occurrence of the item in the sequence if found and
NIL otherwise.
REMOVE takes an item and a sequence and returns
a new sequence with all occurrences of the item removed.
SUBSTITUTE takes a new-item, item and a sequence and
returns a new sequence with all occurrences of the item replaced by
new-item.
All these functions use the generic object equality test
EQL when comparing two elements. But we can pass a custom
function that tests the equality of two elements in the
:test keyword argument. Further customization can be done
by passing in a one argument function as the :key keyword
argument which is applied on every element and the return value is used
for the comparison.
The :start and :end keyword arguments can
be given the starting and (one past) the ending indices of the
subsequence of the passed sequence to operate on. If the keyword
argument :from-end is true, the (sub)sequence is operated
on in reverse order.
In addition to these, SUBSTITUTE and REMOVE
take another keyword argument :count that specifies the
number of elements to substitute of remove in the result.
There is a class of sequence functions similar to the above, but
which, instead of taking an element and a sequence, take a one argument
predicate and a sequence. For example, REMOVE-IF-NOT takes
a predicate and a sequence and returns a sequence with all the elements
that satisfy that predicate. REMOVE-IF, on the other hand,
does the opposite - returns a sequence with all elements that do NOT
satisfy the predicate.
CL-USER> (defparameter *n* #(1 2 3 4 5))
*N*
CL-USER> (remove-if-not #'evenp *n*)
#(2 4)
CL-USER> (remove 1 *n*)
#(2 3 4 5)
CL-USER> (find 'a #((a 1) (b 2) (c 3)))
NIL
CL-USER> (find 'a #((a 1) (b 2) (c 3)) :key #'first)
(A 1)
CL-USER> (find 'a #((a 1) (b 2) (c 3) (a 4)) :key #'first)
(A 1)
CL-USER> (find 'a #((a 1) (b 2) (c 3) (a 4)) :key #'first :from-end t)
(A 4)
CL-USER> (substitute '(g 1) 'a #((a 1) (b 2) (c 3) (a 4)) :key #'first)
#((G 1) (B 2) (C 3) (G 1))
CL-USER> (substitute '(g 1) 'a #((a 1) (b 2) (c 3) (a 4)) :key #'first :count 1)
#((G 1) (B 2) (C 3) (A 4))
CL-USER> (substitute '(g 1) 'a #((a 1) (b 2) (c 3) (a 4)) :key #'first :count 1 :from-end t)
#((A 1) (B 2) (C 3) (G 1))
CL-USER> (substitute-if '(g 1) #'(lambda (x) (eql (first x) 'a)) #((a 1) (b 2) (c 3) (a )))
#((G 1) (B 2) (C 3) (G 1))
REMOVE-DUPLICATES takes a sequence and removes all the
duplicate elements, keeping the lasts of each kind in the default
invocation. It takes the same keyword arguments as REMOVE
except :count.
CL-USER> (remove-duplicates #(1 1 1 2 1 3 4 5 1))
#(2 3 4 5 1)
CL-USER> (remove-duplicates #(1 1 1 2 1 3 4 5 1) :from-end t)
#(1 2 3 4 5)
Some functions that operate on sequences as a whole are also
provided. For example, there is COPY-SEQ that returns a
copy of its sole argument and there is REVERSE that returns
a copy of its only argument with the items arranged in the reverse
order.
The CONCATENATE function creates and returns a new
sequence by concatenating any number of sequences. It must also be given
the type of the sequence we expect from it.
CL-USER> (reverse #(1 2 3))
#(3 2 1)
CL-USER> (reverse '(1 2 3))
(3 2 1)
CL-USER> (concatenate 'vector #(1 2 3) '(a b c))
#(1 2 3 A B C)
CL-USER> (concatenate 'list #(1 2 3) '(a b c))
(1 2 3 A B C)
CL-USER> (copy-seq #(1 2 3))
#(1 2 3)
REVERSE and COPY-SEQ return a sequence of
the same type as their sole argument.
Sorting and merging support is provided in the CL standard library
via the SORT, STABLE-SORT and
MERGE functions. Both SORT and
STABLE-SORT are destructive functions and will modify their
argument. Like CONCATENATE, MERGE also
requires a type specifier as the first argument which becomes the type
of the sequence returned.
SUBSEQ can be used to extract/assign-to
subsequences.
CL-USER> (subseq "lama" 1)
"ama"
CL-USER> (concatenate 'string "os" (subseq "lama" 1))
"osama"
CL-USER> (subseq "obama" 1 3)
"ba"
SUBSEQ returns a SETF able place, but it
does not extend/shrink a sequence. If the new value and the subsequence
to be replaces are of different lengths, the shorter one determines the
number of characters actually replaced.
CL-USER> (defparameter *x* (copy-seq "foobarbaz"))
*X*
CL-USER> (subseq *x* 3 6)
"bar"
CL-USER> (setf (subseq *x* 3 6) "xxx")
"xxx"
CL-USER> *x*
"fooxxxbaz"
CL-USER> (setf (subseq *x* 3 6) "abcs")
"abcs"
CL-USER> *x*
"fooabcbaz"
CL-USER> (setf (subseq *x* 3 6) "xx")
"xx"
CL-USER> *x*
"fooxxcbaz"
Common Lisp has hash-tables, that are the CL analogs of Python dicts
or the C++ std::map. A hashtable can be created using
MAKE-HASH-TABLE which also accepts a :test
keyword parameter, which can only be one of EQL (which is
the default), EQUAL, EQ or
EQUALP.
The GETHASH function can be used to get the value stored
in a hash under a key. The first argument to GETHASH is the
key and the second is the hashtable. One can use SETF with
GETHASH to set values in a hashtable. GETHASH
returns two values - the first one is the value under the given key in
the given hash table, or NIL if there is no such key. The
second return value is a boolean which indicates whether the requested
key was present in the hashtable or not. This is needed because the
first return value being NIL can mean either the key is not
present or that while the key is present, the value under the requested
key is itself NIL.
CL-USER> (defparameter *h* (make-hash-table))
*H*
CL-USER> (gethash 'foo *h*)
NIL
NIL
CL-USER> (setf (gethash 'foo *h*) "hello")
"hello"
CL-USER> (gethash 'foo *h*)
"hello"
T
CL-USER> (setf (gethash 'foo *h*) "hello")
"hello"
CL-USER> (gethash :foo *h*)
NIL
NIL
REMHASH takes the same arguments as GETHASH
and removes the key given. CLRHASH clears an entire
hashtable.
The MAPHASH function takes a two argument function and a
hashtable and calls the passed function for each key-value pair in the
hashtable. One can SETF and REMHASH the
current entry, but other than that, adding/removing arbitrary elements
to a hashtable leads to undefined behaviour.
CL-USER> (defparameter *hashtab* (make-hash-table))
*HASHTAB*
CL-USER> (setf (getf 'one *hashtab*) 1)
1
CL-USER> (setf (getf 'two *hashtab*) 2)
2
CL-USER> (maphash #'(lambda (k v)
(format t "~a is counted out loud as ~a~%" v k))
*hashtab*)
1 is counted out loud as ONE
2 is counted out loud as TWO
NIL
To conclude, Common Lisp is a very rich language, which sometimes makes it look ugly, just like C++, but I’ll take the ugliness of a practical, powerful language anyday over being circumscribed by an aesthetically pleasing toy language.