Copyright (C) 2022-2025 Andrea Monaco

Copying and distribution of this file, with or without modification, are
permitted in any medium without royalty provided the copyright notice and this
notice are preserved.  This file is offered as-is, without any warranty.




Some notes on design choices, portability and ANSI conformance of alisp.



__Floating point arithmetic__

The four floating-point types of ANSI CL are all the same type and map to native
C double.  This is allowed by ANSI and was done as a quick implementation
strategy.  Of course, it may change in the future.
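
A quick way to observe this from Lisp is to ask whether a short-float literal
also belongs to the widest float type.  This is only a sketch using standard
functions; the name FLOAT-TYPES-COLLAPSED-P is made up for illustration.  Under
the mapping described above it should return true in alisp, while
implementations with distinct float types may return false.

```lisp
;; Hypothetical helper: true when a short-float literal is also a LONG-FLOAT,
;; i.e. when the implementation does not keep the float types distinct.
(defun float-types-collapsed-p ()
  (typep 1.0s0 'long-float))

(float-types-collapsed-p)
```

Portable code should not rely on either answer.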



__Type checking__

Arguments to standard functions and macros (or the results of evaluating
arguments to macros, when applicable) are all type checked.

Values assigned to standard variables (like *PACKAGE*, *READ-BASE* and so on)
are not type checked, so assigning an object of the wrong type will likely crash
the interpreter.

Currently, these behaviors can't be changed.
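
Since assignments to standard variables are not checked, one defensive option
is to validate the value yourself before storing it.  The sketch below does this
for *READ-BASE*; the helper name SET-READ-BASE is hypothetical, and the code
assumes that CHECK-TYPE and the (INTEGER 2 36) type specifier are available.

```lisp
;; Hypothetical helper: refuse anything that is not a valid radix before
;; touching *READ-BASE*, so a wrong value never reaches the reader.
(defun set-read-base (base)
  (check-type base (integer 2 36))
  (setf *read-base* base))
```

With this, (set-read-base 16) stores 16, while (set-read-base 1) signals a
TYPE-ERROR instead of leaving the variable in a bad state.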



__Stack overflows__

As a protection against stack overflows in lisp code, al has a limit on stack
depth set by the LISP_STACK_SIZE macro in C.  If that limit is reached, the
interpreter will print an error message and abort to top level.



__Builtin functions and macros__

Some standard macros are implemented as special operators in al, i.e. they are
executed directly by the interpreter instead of being expanded; this is allowed
by ANSI, but the standard also requires a true macro implementation that is not
provided currently.  Note that SPECIAL-OPERATOR-P returns T for these special
operators.

al also provides additional non-standard special operators.  I think that this
is forbidden by the standard, understandably so that code walkers can rely on a
full knowledge of the language.  Such special operators will be removed in due
time.

You can freely redefine standard functions and macros.  If you redefine a
builtin with your lisp code though, there's no way to get the C definition back.
Be careful: you can easily make your image unusable this way.



__alisp extensions__

al has a small number of non-standard builtins; they are interned in the CL-USER
package.  There are also some non-standard extensions of standard functions.

- *AL-ARGC* (an integer) and *AL-ARGV* (a vector of strings) let you access argc
  and argv of C, so they contain the number and the values of the command-line
  arguments passed to the interpreter, respectively

- AL-GETENV takes a string and returns the value of the environment variable
  with that name, or NIL if no such variable exists

- AL-SYSTEM takes a string or NIL; in the first case it tries executing that
  command, in the second it tells whether an execution environment is actually
  available.  AL-SYSTEM uses the 'system' function from C, so the precise
  working depends on your C environment

- the AL-EXIT function takes an optional integer (defaulting to 0) and exits the
  interpreter with that return value

- the function AL-LIST-DIRECTORY takes a single pathname designator and returns
  a list with the filenames (without any leading path) contained in that
  directory

- the function AL-DIRECTORYP takes a single pathname designator, queries the
  filesystem and returns T if the path designates a directory, NIL otherwise

- the function AL-GETCWD takes no arguments and returns the current working
  directory as a slash-terminated string

- the functions AL-PRINT-NO-WARRANTY and AL-PRINT-TERMS-AND-CONDITIONS print
  legal information

- the function AL-STRING-INPUT-STREAM-STRING takes an input string stream and
  returns a string with the characters left to read

- the standard function MAKE-STRING-OUTPUT-STREAM takes an optional string
  argument; if provided, writing operations append to the string, also
  respecting the fill pointer if present

- the types AL-BACKQUOTE, AL-COMMA, AL-AT and AL-DOT represent the respective
  elements in backquote notation.  You can call the AL-NEXT function to reach
  the next element when traversing them

- the variable *AL-COMPILE-WHEN-DEFINING* is described in "Compilation"

- AL-BREAK is a condition class that represents encountering a breakpoint

- the *AL-ENABLE-BREAKPOINTS* parameter lets you enable or disable breakpoints
  globally.  It defaults to T

- when you enter the debugger due to a condition, *AL-DEBUGGING-CONDITION* is
  bound dynamically to that condition object; if you entered the debugger due to
  stepping, this variable is NIL

- the functions AL-WATCH and AL-UNWATCH control watchpoints, see "Watchpoints"

- the variable *AL-PPRINT-DEPTH* is used by the pretty printer and contains the
  current level of indentation

- when the variable *AL-PRINT-ALWAYS-TWO-COLONS* is non-nil, the printer always
  prefixes symbols with two colons when it prints the package name.  This is
  used by the compiler

- the function AL-PRINT-RESTARTS prints the available restarts and returns T

- the function AL-PRINT-BACKTRACE takes an optional argument and prints the
  current backtrace of all called functions (including macros) and their
  arguments.  If the argument is non-nil, the function is verbose, meaning it
  also prints special forms and builtin macros as frames.  The function always
  returns T

- the function AL-LIST-BACKTRACE takes an optional argument and returns a list
  with the argument lists of each call in the backtrace, in the same order as
  AL-PRINT-BACKTRACE.  The argument has the same effect as that of
  AL-PRINT-BACKTRACE

- the function AL-DUMP-BINDINGS takes no arguments and returns a list with all
  the non-global live variable bindings, starting with the most recently
  established.  Each binding is a list containing the symbol, the value, the
  type of the binding (:LEXICAL or :SPECIAL); for lexical bindings, there's also
  a fourth element of T if the binding is in scope at that moment.  This
  function is intended to be called in the debugger, but works everywhere

- the function AL-DUMP-FUNCTION-BINDINGS does the same thing as
  AL-DUMP-BINDINGS, but with local function bindings

- the function AL-DUMP-CAPTURED-ENV takes a function object and returns a list
  with all the lexical variable bindings that the function closed over.  The
  format of the list is the same as AL-DUMP-BINDINGS, without the fourth field
  since it doesn't apply

- the function AL-DUMP-METHODS takes a generic function and returns a list of
  its method objects

- the standard function FUNCTION-LAMBDA-EXPRESSION also accepts a method object.
  In that case, it returns the body of that method as a list

- each function object carries a name field for clarity and debugging purposes.
  The function AL-FUNCTION-NAME takes a function and returns its name (or NIL
  for anonymous functions), while (SETF AL-FUNCTION-NAME) lets you change the
  name

- the functions AL-FUNCTION-BODY and (SETF AL-FUNCTION-BODY) let you inspect and
  change the body of a function or method object

- each function or method object carries a set of attributes expressed as a list
  of keywords.  The functions AL-FUNCTION-ATTRIBUTES and (SETF
  AL-FUNCTION-ATTRIBUTES) let you inspect and change such list.  Currently the
  only recognised attribute is :COMPILED

- the function AL-DUMP-FIELDS takes either a structure object, a standard object
  or a standard class and dumps the slots of that object or class as a list.
  Each slot of an object is represented as a symbol with its name, if it is
  unbound, or as a name-value pair, if it is bound; for classes, only the name
  is present

- the function AL-CLASS-PRECEDENCE-LIST takes a standard class object (not a
  structure class or a condition class) and returns the class precedence list
  of that class

- the special operators AL-LOOPY-DESTRUCTURING-BIND and AL-LOOPY-SETQ are
  similar to DESTRUCTURING-BIND and SETQ respectively, but they allow the kind
  of destructuring that LOOP uses, which has more lax rules than
  DESTRUCTURING-BIND.  In particular, you can provide more or fewer elements in
  the template than values, and you can put a NIL in the template to ignore the
  corresponding subtree

- the functions AL-START-PROFILING, AL-STOP-PROFILING, AL-CLEAR-PROFILING and
  AL-REPORT-PROFILING govern the profiler, see "Profiling".
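
The lax destructuring rules mentioned for AL-LOOPY-DESTRUCTURING-BIND are the
ones that standard LOOP applies to its variable templates, so they can be
illustrated with LOOP alone.  The function names below are made up for the
example.

```lisp
;; A NIL in the template ignores the corresponding element, and a dotted
;; tail collects whatever is left of the list.
(defun head-and-tail (rows)
  (loop for (first-item nil . others) in rows
        collect (list first-item others)))

;; A template longer than the data leaves the extra variables bound to NIL.
(defun pad-triples (rows)
  (loop for (a b c) in rows
        collect (list a b c)))

(head-and-tail '((1 2 3 4)))   ; => ((1 (3 4)))
(pad-triples '((1 2)))         ; => ((1 2 NIL))
```

DESTRUCTURING-BIND would reject the second case, since the list has fewer
elements than the template requires.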



__Literal objects__

In alisp, the reader always produces fresh objects.  Therefore modifying
literals, albeit undefined by ANSI, works as expected.  You can also modify the
result of a backquote expression.

Of course, such undefined behavior should be avoided if you want best
portability.
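
If the same code must also run on other implementations, a common portable
idiom is to copy the literal before modifying it.  A minimal sketch, with a
made-up function name:

```lisp
;; Copy the literal so that the destructive update only touches fresh conses.
(defun bumped-defaults ()
  (let ((defaults (copy-list '(1 2 3))))
    (setf (first defaults) 10)
    defaults))

(bumped-defaults)   ; => (10 2 3)
```

Each call starts again from the unmodified literal, whatever the implementation
does with constant data.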



__Setting an undefined variable__

Doing SETQ or SETF on an undefined variable is undefined in ANSI, but it is
accepted by many implementations.  In al, this causes the variable to be
proclaimed special, so it is equivalent to a DEFPARAMETER.



__Arrays__

All arrays are adjustable in alisp.  As long as it's an interpreter, there's no
reason to do otherwise.
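
Code that should stay portable can still request adjustability explicitly and
use the value returned by ADJUST-ARRAY, which works whether or not the
implementation makes every array adjustable.  A small sketch, where GROW-VECTOR
is a hypothetical helper:

```lisp
;; Hypothetical helper: resize VECTOR to NEW-LENGTH, filling new cells with FILL.
(defun grow-vector (vector new-length fill)
  (adjust-array vector new-length :initial-element fill))

(let ((v (make-array 3 :adjustable t :initial-element 0)))
  (grow-vector v 5 1))   ; => #(0 0 0 1 1)
```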



__Structures__

Redefining a structure type works fine, despite being undefined in ANSI.  But if
you redefine a structure class as something other than a structure, then calling
the constructor or accessors of the previous definition will cause a crash.



__Pathnames__

I don't like the filename API of Common Lisp very much.  I think it tries to be
so abstract as to accommodate every conceivable filesystem, while at the same
time giving so much leeway to the implementers that you can assume very little
about each implementation.

The syntax is also puzzling: if you want to represent the file "/home/foo/",
then why represent it as (:ABSOLUTE "home" "foo")?  The former is a simple
and recognizable string, while the second means allocating three conses, a
symbol and two strings, which is quite inefficient.
The standard seemingly implies that the second syntax is better because it is
independent of path separator characters, but is that so?  If you port your
program to different or exotic systems, important files will likely be in
totally different places, so separator characters will be the least of your
concerns.

I'd go as far as to recommend representing filenames as plain strings, avoiding
pathname objects entirely.  (I think that all pathname functions also accept
plain strings.)
(See also the file WHY-NO-PATHNAMES).

That said, alisp represents a pathname object as a string internally; you can
access the underlying string with NAMESTRING.

If you really need to access the "components" of a path, those are extracted
according to the following syntax: in "/home/foo/bar.baz", the directory is
"/home/foo/", the name is "bar" and the type is "baz".
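
As an illustration, the name and type of that path can be read with the
standard accessors, which accept a plain string.  NAME-AND-TYPE is a made-up
helper; the directory is left out of the sketch because its representation
differs between implementations.

```lisp
;; Hypothetical helper: return the name and type components of PATH as a list.
(defun name-and-type (path)
  (list (pathname-name path) (pathname-type path)))

(name-and-type "/home/foo/bar.baz")   ; => ("bar" "baz")
```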

If you specify :WILD as a path component, that component becomes a single
asterisk.  The value :WILD-INFERIORS becomes two asterisks, but most POSIX
shells don't interpret those in a special way.  The value :UNSPECIFIC is never
allowed in any component of a pathname.

The truename of a file is just the file path, there's no resolution.

USER-HOMEDIR-PATHNAME tries reading the HOME environment variable, and nothing
else.

If you need to support a system other than POSIX, you have to change a few
things, but it's not hard.



__File operations__

PROBE-FILE does not work very well.  It tries to open the file for reading and
returns NIL if it can't, so it may return NIL even just for lack of permissions.
OPEN uses the same approach to determine if the file exists.  In the future I
will probably add an optional use of the POSIX API for better file operations.



__Streams__

No stream is deemed interactive in alisp.



__Language of implementation__

The alisp codebase should be valid C89, to the best of my knowledge.
Unfortunately, alisp as a whole is not C89, since libgmp, which is a required
dependency, seemingly is not.  I will remove the dependency on libgmp at some
point.  I don't know about libreadline, but you can still build without it.



__Loading cl.lisp__

If you don't load cl.lisp, you still have a decent and self-sufficient subset of
Common Lisp.



__Using ASDF__

alisp ships with a modified version of ASDF that you can load.  Only
ASDF:LOAD-SYSTEM has been confirmed to work.



__Number bases__

*READ-BASE* works correctly, so you can read numbers in any base from 2 to 36.
*PRINT-BASE* instead only works with bases 8, 10 and 16, due to a limitation of
libgmp.
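
The two variables can be exercised as in the sketch below, which binds them
locally so that the global reader and printer settings are left alone.  The
helper names are hypothetical, and the printing helper assumes that
WRITE-TO-STRING accepts the :BASE keyword; given the limitation above, stick to
bases 8, 10 and 16 when printing.

```lisp
;; Hypothetical helper: read STRING as a number written in BASE.
(defun parse-in-base (string base)
  (let ((*read-base* base))
    (values (read-from-string string))))

;; Hypothetical helper: print INTEGER in BASE without a radix prefix.
(defun print-in-base (integer base)
  (write-to-string integer :base base :radix nil))

(parse-in-base "1010" 2)   ; => 10
(parse-in-base "zz" 36)    ; => 1295
(print-in-base 255 16)     ; => "FF"
```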



__Character encoding__

al expects its input in UTF-8 and stores strings in the same encoding.  (Note
that UTF-8 is compatible with ASCII, but not with ISO-8859.)  If al gets input
in a different encoding, it might still work somewhat, as long as it understands
basic macro characters, except for things like counting the characters in a
string or accessing characters by index.

Input that is not well-formed UTF-8 may cause incorrect behavior in string
manipulation functions, but should not cause a crash.



__Garbage collection__

For garbage collection, alisp uses the algorithm described in "A cyclic
reference counting algorithm and its proof" by Pepels, van Eekelen, Plasmeijer
(1988).  This is a kind of enhanced reference counting that also collects loops.
The paper contains a proof of termination and correctness.  I don't know of
other implementations using this, so this is somewhat experimental.

Constants defined with DEFCONSTANT are skipped when doing traversals of the
reference graph, so defining constants has true performance benefits.

Package objects are also skipped in reference counting.



__The ROOM function__

Calling ROOM shows the number of living objects of various types.  T means all
living objects; FUNCTION also includes macros.



__Profiling__

al has a basic profiler.  You can start profiling with AL-START-PROFILING, which
introduces some overhead, and stop it with AL-STOP-PROFILING.

If you later start profiling again, al will keep adding to the previous data;
calling AL-CLEAR-PROFILING clears all data.

AL-REPORT-PROFILING returns a list with one element per tracked function or
macro.  Each element is a list containing, in order: the name; the number of
times the function or macro has been called; the total time spent in it,
including the time spent in all descendants in the call graph; and the average
time per call, that is the total time divided by the number of calls.  Times are
in the same unit used by clock () of your C library, often microseconds.

For functions, the total time doesn't include evaluation of arguments.  For
macros, the time includes both expansion and evaluation of result.

Keep in mind that, when a function calls itself directly or indirectly, the time
spent in the inner invocation is counted twice, so the last number is often more
useful than the second one.
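
Since the report is a plain list, it can be post-processed with ordinary list
functions.  The sketch below orders entries by average time per call, taking
the average to be the fourth element of each entry as described above.  The
helper name and the sample data are made up for illustration.

```lisp
;; Hypothetical helper: order report entries by average time per call,
;; slowest first.  Each entry is (name call-count total-time average).
(defun sort-by-average (report)
  (sort (copy-list report) #'> :key #'fourth))

;; Made-up sample data in the documented shape, for illustration only.
(sort-by-average '((foo 10 500 50) (bar 2 400 200) (baz 5 50 10)))
;; => ((BAR 2 400 200) (FOO 10 500 50) (BAZ 5 50 10))
```

In alisp the argument would be the result of (AL-REPORT-PROFILING).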

The profiler has a few limitations: structure constructors and accessors,
condition readers, functions obtained as macro functions and macros defined with
macro functions are not tracked.



__Compilation__

If the variable CL-USER:*AL-COMPILE-WHEN-DEFINING* is non-nil, then DEFUN and
DEFMACRO will compile the body of the function when they are evaluated.  Also,
each newly defined generic function will be marked as compiled, meaning that
each new method added to that function will get compiled.  The variable defaults
to NIL.

You may want to keep the variable disabled for debugging, since stepping through
a macro-expanded function is less clear.

If a function is compiled, FUNCTION-LAMBDA-EXPRESSION will return the body of
the function macro-expanded.



__Compiler macros__

Definitions of compiler macros are registered, but compiler macros are never
expanded.



__Error reporting__

When I started writing alisp, the condition system was not in place, so I had to
devise a simpler scheme for reporting errors: each time the interpreter
encountered an abnormal situation, it would print an appropriate error message
and abort to top level.
This was not terribly useful, since you couldn't really handle those conditions
nor enter the debugger.
Then at some point I implemented a decent subset of the condition system, so I
started replacing the old system of reporting with the one that ANSI requires.
Many types of error situations now raise proper conditions; the old system is
used in some places, but I will gradually replace it.



__Debugging__

Stepping is always available in the debugger, no matter if you enter it with
BREAK or in any other way.  The following commands are available:

- N executes the next form and then breaks again

- X only steps over the macroexpansion of the next form, if it is a
  (non-builtin) unexpanded macro; otherwise it behaves like N

- S steps inside the next form and breaks; for function forms, it will first
  step in the argument forms; for unexpanded macro forms, it will first step in
  the macro expansion process, then in the resulting form

- C continues execution at normal speed

- E continues execution until the end of current function

- BT is equivalent to (CL-USER:AL-PRINT-BACKTRACE)

- H or ? display help.

When the debugger is entered due to stepping, the result of evaluating the last
form is displayed preceded by " -> ", then a blank line, then the next form is
shown.  If the next form is a non-builtin macro, then it is followed by
"(macro)".  If you input an empty line at the debugger prompt, the last form or
debugging command is executed again.



__Watchpoints__

You can watch standard objects and hash tables for modifications.  The function
AL-WATCH takes a standard object or a hash table as argument and enables
watching on it; it returns T if the object is one of those types, NIL otherwise.
AL-UNWATCH disables watching on its argument.

When a watched object is modified in any field or a watched hash table has an
object added or removed or is cleared, the debugger is entered.



__Minor details__

In functions that take both a :TEST and :TEST-NOT argument, the former takes
precedence.
