Thursday, December 22, 2011

Lectures on the Metacircular Evaluator

I have been struggling with the notion of a "metacircular evaluator" for a while. Then I stumbled upon the MIT lectures for it.

The lectures are quite beautiful, and should be watched!

MIT's 6.001 Lecture 7A YouTube clip

MIT's 6.001 Lecture 7B YouTube clip

Bear in mind, these "clips" are roughly an hour long...but they will solve all problems involving the metacircular beast!

Monday, December 19, 2011

Learning Lua

So I have learned about LuaTeX, which basically implements TeX in Lua and thus enables Lua scripting inside a TeX document. Thus I want to learn Lua!

Hello World!

I installed the basic lua50 package on ubuntu, so let me write my first program.

-- begin hello.lua

-- This is a comment

print("Goodbye, Cruel World!")

-- end hello.lua

One runs it on the *nix command line by $ lua hello.lua

However, we can use write(...) instead of print(...), the difference is that print() automatically adds a new line.

Getting User Input

We can ask for user's information:

-- begin name.lua 

io.write("What's your name? ")
name = io.read()
print("Why, '"..name.."' is a stupid name!")

-- end name.lua

We start using the io module's functions. We need to indicate this by calling io.read() and io.write()

Unique Aspects of Lua

Lua is unique in that everything is-a table. That's how you implement classes, data structures, and so on. (Think how LISP has everything is-a list.)

Even operator overloading is handled through tables...well, "metatables".

There are many examples of advanced table usage on Lua's wiki. So next time, we'll start considering examples involving tables.

Thursday, December 15, 2011

CWEB, Part 2: Hello World returns!

Continuing from our previous post on CWEB, we will take the time honored example:
lets consider a more complicated "Hello World!" Program.

\def\title{Hello World, Reloaded}

@*A Simple Example.
This is a trivial example of a \.{CWEB} program.
It is, of course, the classic "hello, world"
program we all know and love:

@c
@<Header files needed by the program@>@;
@#
main(void)
{
   @<Print the message |"hello, world"|@>@;
}

@ Naturally, we use |printf| to do the dirty work:

@<Print the message |"Hello, World!"|@>=
printf("Hello, World!\n");

@ The prototype for |printf| is in the standard
header, \.{<stdio.h>}.

@<Header files needed by the program@>=
#include <stdio.h>

@*Index.

This is perhaps the simplest example demonstrating how to use chunk identifiers (those @<...@> things) in CWEB.

Note that \.{CWEB} typesets CWEB using typewriter font.

Also note that we don't have to define a chunk before we use it. That's what we did in this example, all the chunks were defined after they were used.

What are those @; symbols used for? Well, they're for formatting, and they don't do anything other than prettyprint the TeX output.

Monday, December 12, 2011

CWEB

It turns out there is some interesting stuff to do with CWEB.

First off, you program in chunks/modules called "sections". Each section contains either code or documentation. Each section begins with @.

The skeleton of a CWEB program might look like:

% this is a comment
\def\title{My awesome program!!!!} % this sets the title
\datethis % this sets the date
@*Introduction. % create the introduction section
This program will solve all my problems.

[Code not shown]

@*Index.

The line of code @*Introduction. will create a section with a title, it would look like "1. Introduction. ". It's TeX equivalent would be \medbreak\noindent{\bf 1.\quad Introduction.\quad}, and yes those \quad spacings are correct.

The first line of code puts a time stamp between the first section and the title.

The last line produces an index of all the variables.

The Code

How do we actually write code in CWEB?

We can use @c to start writing code, just as @ is used to write documentation.

And just as we had @*[title]., we have something analogous for code @<[title]>

However, the first time we use @<section name>, we are defining it. When we use it later on, we are calling it. Note the difference!

Comments

A simple comment is formatted by @q ... and it is not printed to the TeX file.

Write to Different Files

If one writes @(foo>, then the contents of the section (onwards?) is written to the file foo.

Macro Definitions

Instead of that ungodly #define ..., we write @d ... to do the same thing.

Some example code (from string.w):

@s string int
@s Xstring int @q -- a hint for the typesetter -- @>
@(xstring.h@>=
#ifndef XSTRING_H
#define XSTRING_H // prevent multiple inclusions
@#
class Xstring {
 @<private |Xstring| members@>@;
public:@/
 @<public |Xstring| members@>@;
};
@#
#endif

What does the @# line do? Well, it forces a line break, and that's all.

Similarly, the @/ line forces CWEAVE to put a line break in the C code.

But what do those @>= lines mean? We are initializing and declaring an identifier for a section. For us, that is xstring.h, and since we have prefaced it with @( it means we are working with the code in a separate file.

Observe that @<public |Xstring| members@>@; simply loads the code defined in section public |Xstring| members since the section ended without an equal sign, i.e. @> instead of >=.

More notes to come, but one might also be interested in Knuth's straighten.w program for computing irreps of the symmetric group.

Addendum

Here is the obligatory "Hello World!" type program:

@*Introduction.
This is a simple ``Hello World!'' program using literate
programming. I hope it works! (It does work!)

@c
#include <stdio.h>

int main(int argc, char** argv)
{
 printf("Hello world! I am literate programming?\n");
 return 1;
}

@*Index.

Saturday, October 29, 2011

Learning Assembly Language, Part 2: Little Man Computer

Last time we discussed the CARDIAC assembly language. We did everything by hand. It was also incredibly limited.

This time, we will introduce a more comprehensive assembly language which is also done completely by hand.

The architecture is known as the "Little Man Computer", it too is in decimal (base-10).

The little man computer is a model designed to assist in the understanding of the fetch/execute cycle, through the visualization of elements of the computer architecture that would normally remain hidden.

The story is the suppose we have a "little man" trapped in a room with 100 mailboxes.

We also have two mailboxes at the other end labelled INBOX and OUTBOX which are used for receiving and outputting data to the outside world.

In the center of the room, there is a work area containing a simple two function (addition and subtraction) calculator known as the Accumulator and a reset-able counter known as the Program Counter.

Architecture Specifications

So, we have 100 memory cells (the mailboxes) each can contain 3-digit numbers from -999 to +999.

The basic algorithm for the instruction cycle is given by:
  1. check the Program Counter for the mailbox number that contains a program instruction
  2. fetch the instruction from the mailbox with that number
  3. Increment the Program Counter (so that it contains the mailbox number of the next instruction)
  4. decode the instruction (includes finding the mailbox number for the data it will work on)
  5. fetch the data from the mailbox with the number found in the previous step
  6. execute the instruction
  7. store the new data in the mailbox from which the old data was retrieved
  8. Repeat the cycle or halt
Now what are the instructions?

As with the CARDIAC system, an instruction looks like OAA where the first digit specifies the opcode (the operation to be carried out) and AA contain the address which we need to work with.

There are roughly 10 operations for the little man computer, similar to the CARDIAC system.

In general, any assembly language will have a similar grouping of instructions into several families:
  1. Arithmetic: arithmetic instructions may operate on all registers or on just a special register (e.g. accumulator).
  2. Control: stuff like "copy contents of register 1 into register 2", loading to and storing from the accumulator, "jumps" like "goto", etc.
  3. Register-addressing method: these are indirectly loading values into registers, very important for theoretical computer scientists...not so important a distinction for the working coder.
  4. Input-output: usually present in all models.

Now, let us list off the instructions for the little man computer (taken from wikipedia).

Table of Little Man Instructions
Numeric code Mnemonic code Instruction Description
1xx ADD ADD Add the value stored in mailbox xx to whatever value is currently on the accumulator (calculator).
Note: the contents of the mailbox are not changed, and the actions of the accumulator (calculator) are not defined for add instructions that cause sums larger than 3 digits.
2xx SUB SUBTRACT Subtract the value stored in mailbox xx from whatever value is currently on the accumulator (calculator).

Note: the contents of the mailbox are not changed, and the actions of the accumulator are not defined for subtract instructions that cause negative results - however, a negative flag will be set so that 8xx (BRP) can be used properly.
3xx STA STORE Store the contents of the accumulator in mailbox xx (destructive).

Note: the contents of the accumulator (calculator) are not changed (non-destructive), but contents of mailbox are replaced regardless of what was in there (destructive)
5xx LDA LOAD Load the value from mailbox xx (non-destructive) and enter it in the accumulator (destructive).
6xx BRA BRANCH (unconditional) Set the program counter to the given address (value xx). That is, value xx will be the next instruction executed.
7xx BRZ BRANCH IF ZERO (conditional) If the accumulator (calculator) contains the value 000, set the program counter to the value xx. Otherwise, do nothing.
Note: since the program is stored in memory, data and program instructions all have the same address/location format.
8xx BRP BRANCH IF POSITIVE (conditional) If the accumulator (calculator) is 0 or positive, set the program counter to the value xx. Otherwise, do nothing.
Note: since the program is stored in memory, data and program instructions all have the same address/location format.
901 INP INPUT Go to the INBOX, fetch the value from the user, and put it in the accumulator (calculator)
Note: this will overwrite whatever value was in the accumulator (destructive)
902 OUT OUTPUT Copy the value from the accumulator (calculator) to the OUTBOX.
Note: the contents of the accumulator are not changed (non-destructive).
000 HLT HALT Stop working.
DAT DATA This is an assembler instruction which simply loads the value into the next available mailbox. DAT can also be used in conjunction with labels to declare variables. For example, DAT 984 will store the value 984 into a mailbox.

Mnemonics

In the Little man computer, there are these exotic things called "mnemonics". It works sort of like macros in C/C++...but not really.

For example, we could refer to data locations using mnenomics:
INP
STA FIRST 
INP
STA SECOND 
LDA FIRST 
SUB SECOND
OUT
HLT
FIRST DAT
SECOND DAT
The STA FIRST uses FIRST as-a mnenomic, and it is defined as-a DAT.

In short, a mnemonic is a short sequence of characters (e.g. FIRST or SECOND) that are an alias for an opcode (e.g. DAT).

Labels

This is the last innovative feature which differentiates Little Man from CARDIAC. We use labels to indicate portions of our program...you could think of this sort of like defining procedures.

Well, to be more precise, a label can be used to identify a particular instruction as a target for a BRANCH instruction ("where the procedure begins").

We can do more! We can also have "variables", that is associate data to a sequence of characters. We have seen this with the FIRST and SECOND example code.

An example program:
     INP
LOOP SUB ONE  // This address will be called LOOP, 
              // Subtract the value stored at ONE
     OUT
     BRZ QUIT // If we are at 0, then jump to the location QUIT
     BRA LOOP // We are not at 0, go back to the start of LOOP
QUIT HLT
ONE  DAT 1    // Put the value 1 in this mailbox,
              // and call it ONE (variable declaration)
Now what are some exercises?

Exercises

  1. Write a program that finds the first 100 prime numbers.
  2. Modify this program to take a positive number from the user n, and find the first n prime numbers.
  3. Implement a mul procedure to multiply integers together.
  4. Write up code to implement a floating point arithmetic division procedure (see Donald Knuth's Art of Computer Programming vol. 2 for details).
  5. Write a procedure to write a given number in a given base (e.g. write 720 in base 3).

Future Directions

Maybe next time I will begin discussing the MIXAL, the MIX assembly language designed by Donald Knuth.

Thursday, October 27, 2011

Studying FreeBSD, Part 0: Preparatory Materials

So, I am disappointed with Ubuntu's Unity interface, and immediate switch to Gnome 3. Out of disgust, I am switching to FreeBSD.

However, I would like to study the source code to be prepared for anything! How do this? I outline my thoughts in this post...

In the Beginning, there was Unix 6 (And it WAS Good!)


So FreeBSD traces its roots back to V6. Today, there is a modern version of V6 for the x86 machine: xv6 [mit.edu].

Some may find Lions' Commentary [lemis.com] to be useful as well.

This is a little bit dated, but fortunately the xv6 team has decent documentation [mit.edu] which explains a lot regarding operating systems.

One may be tempted to refer to Maurice J. Bach's excellent text Design and Implementation of the Unix System, but resist this urge! Bach's text discusses UNIX System V (1986 version), whereas V6 is a decade older.

FreeBSD derives from BSD-lite, which is in turn based on V6 not SysV!

On the second day, there was the SHELL (and it was GOOD!)


FreeBSD comes with the Almquist Shell, and the C-shell.

Should we worry about the C-shell, which is a preloaded shell used in FreeBSD? There are 10 reasons not to use the C-shell [grymoire.com]:
  1. The Ad Hoc Parser
  2. Multiple-line quoting difficult
  3. Quoting can be confusing and inconsistent
  4. If/while/foreach/read cannot use redirection
  5. Getting input a line at a time
  6. Aliases are line oriented
  7. Limited file I/O redirection
  8. Poor management of signals and sub-processes
  9. Fewer ways to test for missing variables
  10. Inconsistent use of variables and commands.
So...perhaps it is best to use something like ASH. This is the default shell for root, and it is a clone of the original Bourne shell.

The positive aspects to it: it's minimal, lightweight, doesn't consume a lot of memory.

The negative aspects to it: because it's so Spartan, it has no bells or whistles to speak of.

Perhaps I will eventually write up something on advanced ASH scripting one day...

Tuesday, October 25, 2011

Scheme course

I found the course page for MIT's 6.001 back in the day when they used SICP, like real programmers.

Well, here's the homepage [mit.edu] and particularly interesting are the quizzes [mit.edu].

The projects are worth looking at too...I may end up getting distracted doing one or two!

Sunday, October 23, 2011

Learning Assembly Language

Underlying Pedagogical Philosophy.
Recently in the New York Times, there was an article [nytimes.com] talking about the Waldorf School system which discourages the use of technology when teaching. I am sympathetic to this point of view, and believe (as a mathematician) that most subjects should be taught without technology.

What about computer science? Yes, to a degree, I believe so. What about learning assembly language...?

CARDIAC

The CARDboard Illustrative Aid to Computation (CARDIAC for short) was initially created at Bell labs to help understand programming in assembly language. It was also done "by hand": you are the computer!

It is done in base 10, and the computer has 100 memory cells. Each cell can hold signed numbers from ±0 to ±999.

You can download pdfs to print out and start computing here [kylem.net].

Each 3-digit number represents some instruction. It should be read as OAA where O is the digit referring to the instruction to carry out, and AA refers to the address of the memory cell involved.

Just some quick terminology, an "Accumulator" is a special memory cell (a CPU register, actually) where intermediate results of a calculation are held.

To keep track of which cell we are computing, one uses a "Program Counter". Originally it was a butterfly pin in the Bell Labs CARDIAC kit, but one could use anything.

Now, let us run quickly through the opcodes:

0 INP Input take a number from the input card and put it in a specified memory cell.
1 CLA Clear and add clear the accumulator and add the contents of a memory cell to the accumulator.
2 ADD Add add the contents of a memory cell to the accumulator.
3 TAC Test accumulator contents performs a sign test on the contents of the accumulator.
4 SFT Shift shifts the accumulator x places left, then y places right, where x is the upper address digit and y is the lower.
5 OUT Output take a number from the specified memory cell and write it on the output card.
6 STO Store copy the contents of the accumulator into a specified memory cell.
7 SUB Subtract subtract the contents of a specified memory cell from the accumulator.
8 JMP Jump jump to a specified memory cell. The current cell number is written in cell 99. This allows for one level of subroutines.
9xx HRS Halt and reset halt the machine and move the program counter to xx.

Note that cell 0 was "read only memory" (I guess one could write in pen to indicate that it won't ever change) and was 001.

Basic Algorithm

The basic algorithm to using the CARDIAC system is as follows:

  1. Begin with a clean output card, set the accumulator to 0.
  2. Set the program counter to memory cell 0.
  3. Read the opcode and the address from the cell.
  4. Look up the opcode, perform the operation using the address given.
  5. If the opcode is jump, set the current address to the address given; otherwise increment the program counter by 1 and go back to step 3.

Some Exercises

1. Consider the numbers 1, 1+2, 1+2+3, etc. These are triangular numbers. Write a program that prints out the first 100 triangular numbers.
2. How can we implement a loop in CARDIAC?
3. Implement Euclid's algorithm in CARDIAC.
4. Implement the Sieve of Eratosthenes in CARDIAC.

Final Remarks

This gives us some idea of how assembly language works, but it is also limited. Perhaps next time (if there is a next time!) we will look into the Littleman computer architecture.

Computer Science Classics

I have been trying to think of a list of "classics" in computer science. This is a far from complete list, but it is a good list nevertheless.

Wizard Book

  • Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs. Second Edition, MIT Press (1996). Eprint [mit.edu]

This was the first computer science book I read, so it has a particularly special place in my library.

The deepest concept in this book was actually the idea that mathematics and computer science aren't different fields. This was new to me, as I was learning topos theory and C++/Java simultaneously.

The Church Encoding [wikipedia.org] demonstrates the effectiveness of lambda calculus.

And for people who are crazy (e.g., me) there is H.P. Barendregt's classic text The Lambda Calculus, Its Syntax and Semantics (Studies in Logic and the Foundations of Mathematics, Volume 103, 1985).

Dragon Book

  • Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools. Second edition, Pearson Education (2006).

Again, this is a theoretical computer science book. This is also heavily mathematical (well, that's what computer scientists tell me!).

I don't have a copy, but I have browsed through portions of it.

For anyone thinking about writing a programming language, this ought to be read first. It will scare you straight! You'll never think about it again!

Minix Book

  • Andrew S. Tanenbaum and Albert S. Woodhull, Operating Systems Design and Implementation. Third edition, Addison-Wesley (2006).

This is a decent book on operating systems. I have a few minor qualms about it, but nothing major. It's wonderful when supplemented with other books, but alone I'm not too sure.

This presupposes knowledge of C, which one could refer to the C book by K&R.

The problem is: what books work well with it?

If an undergraduate came to me and said, "I know C but I want to learn about operating system internals. What do you recommend?"

I would recommend look at xv6 [mit.edu] first.

This gives you the basic idea of what an operating system is. It should be studied, along with (simultaneously) Maurice J. Bach's The design of the Unix Operating System (Prentice Hall, 1986).

Thus you have established a thorough knowledge of the algorithms and data structures underlying the Unix operating system, and you may begin to broaden your knowledge with other texts.

TAOCP

  • Donald Knuth, The Art of Computer Programming. Addison-Wesley.

There are (so far) 4 (ish) volumes published.

Knuth is a bad ass who introduced rigor to computer science. His algorithms are written in either a low-level language or a high-level language: MMIX assembly or English, respectively.

His exercises range from 5 minute puzzles to open research problems.

Quite possibly my favorite collection of books, definitely in my canon of texts.

It's hit-or-miss with most programmers, though. But one should admit: programming amounts to structured reasoning, and nothing helps more than mathematics!

Monday, October 3, 2011

Operating Systems books

Someone lent me this wonderful book:

  • William Stallings, Operating Systems: Internals and Design Principles. Seventh edition, Prentice hall, 2011.
It really is a beautiful book on operating systems for people that are interested and don't know much about hardware (or where to begin!).

I might write up some notes on the relevant information on hardware. But it is an excellent book, I'll try expanding on its virtues later.