Object Oriented Kool Aid: pseudocode

1. Overview. So I've been using databases recently, and wanted to write a series of notes on them.

1.1. Warning: We will NOT talk about:

What's going on "under the hood" in a database
Compare various engines (SQL vs. NoSQL vs. etc.)
How to code a database engine from scratch
How to optimize a given database engine (e.g., "How do I tune MySQL?")

1.2. Topics Discussed in the Series. We will review a database agnostic pseudocode for queries, data definition, data manipulation. We will introduce the basic concepts, using black boxes along the way (e.g., "The database typically has a storage engine translating the data to the machine" and specify it no further!).

1.3. Topics Discussed in this post. We will discuss the components of a database, namely: a database record, record's field, query, table, and views.

1.4. Intended Audience. If you're a programmer, have not learned databases, but need to learn how to use them, then this article is for you! If you are trying to write a database in, say, C...this isn't for you.

Remark. It dawns on me others have pointed out SQL is a DSL [andrejkoelewijn.com]; this is true, but we need to abuse language when working with full generality. (Say what? We need to pretend the terms are not "strict" and "loose", e.g., weakening the demand that all entries in the database have identical format.)

Database Components

2.Terminology. So, suppose we're programmers for a library. They want a program to keep track of their books. What do we do?

2.1. Records. Well, each book is represented by some "Record" (also called a row or, in CS theory, tuple) which is like a plain old data structure.

I think theoretical computer scientists would want me to say "A record is a well-defined, ordered, finite list of heterogeneous fields" or something similar.

At any rate, returning to the library example, our records would be the books.

We call a record's components its "Fields" (or Columns). For a book, its fields would be the author, the title, the publisher, the year published, etc.

Right, so far a Database Record is just a composite data structure, and a Field is an entry in a record.

2.1.1. Pseudocode Conventions. I try to adhere to CLRS' pseudocode conventions (according to their third edition). So foobar.spam accesses the field spam of the variable foobar, a composite data structure. (This is eerily similar to JSON in JavaScript.)

So if hamlet represents a record, then the field hamlet.author should be the string "William Shakespeare".

2.1.2. Conventions. All records have some default fields: id (which is assigned by the engine "under the hood" to keep track of records). More may be added later, for now id is all that's needed.

2.2. Queries. A database is great because we can look stuff up. For libraries, we can find all books within a given subject, or find all the books one given author wrote, or...

Each of these are a "Query" which retrieve records from the database based on specific criteria (e.g., year published, author, subject, etc.).

2.3. Tables, Views. We usually stick these records into a file (or sometimes several files). The technical term for such a file is a "Table".

One may be tempted to ask "For our library example, what are the tables?" Good question!

The answer is non-unique: i.e., we have many different acceptable answers! We could have one giant file containing all the books, we could have one file per subject, or have each file correspond to the books published in different years, or...

Sometimes it's useful to do all of these! Why? Because they are the results of queries, and it may be faster to pre-store the results rather than dynamically generate it at runtime.

When we store in a table the results of a query for future use, we call this table a "View".

2.4. Future Directions: Modus Operandi. We have just introduced the components for a database, but there are four aspects the programmer needs to know about: (1) Queries, (2) Data manipulation, (3) Transaction controls, (4) Data definition.

We will begin discussing queries next time.

So I've been typing up my own notes in html, by hand (like in the bad old days of the internet), and I've discovered some rather interesting behaviour of CSS2.

My pseudocode conventions have composite data structures resemble C's struct, so for example:

struct myDataStructure {
    // code not shown
}

Comments are done in traditional C method too. How to accomplish this using  classes? Well, we first specify a class pseudo for pre denoting our pseudocode:

pre.pseudo { } /* pre's pseudocode class */

Now we will investigate each part of pseudocode we want to implement!

Data Structures

I use Plain Old Data Structures in my pseudocode, often writing myStructName and formatting it as struct myStructName...

/* write a structure as <span class="struct">myStructName</span> */
pre.pseudo span.struct:before {
    content:"struct ";
    font-weight:bold;
}

Functions

My shorthand for the word "function" is always "fn", so this is the class I created. The pseudocode conventions used is simply function doSomething(...), and to implement this:

pre.pseudo span.fn:before {
    content:"function ";
    font-weight:bold;
}

This is my personal convention, but there is also the C-style convention which I'll discuss.

Alternative Function Pseudocode

Some people like writing their function pseudocode to resemble C's ret-type function-name (arg-list), and it is possible to do this:

/* <span class="function" ret="void">foobar</span>() produces "void foobar()" */
pre.pseudocode span.function:before {
    content:attr(ret) " ";
/*
    font-weight:bold;
*/
}

If you prefer the function's ret-type to be bold, you can uncomment the second line.

For Loops

Our trusty for(...) construct is readily implemented:

/* for-loop formatting to write 
   <span="for">i=1; i<j; i++</span>
   be formatted as
   for(i=1; i<j; i++) */
pre.pseudo span.for:before {
    content:"for (";
    font-style:normal;
}
pre.pseudo span.for:after {
    content:") ";
    font-style:normal;
}

It would be nice if I could write ... to produce for(...), but I cannot think of a way to do so.

Comments

I end up using the comment convention // my comment which works well on paper, but due to html resizing...it can be problematical. So here's the CSS implementation of the paper-way, and commented out is the traditional C way /* writing a comment like this */.

/* comments are italicized texts beginning with a double slash // */
pre.pseudo span.comment {
    font-style:italic;
}
pre.pseudo span.comment:before {
    content:"// ";
    font-style:normal;
}
/* alternatively uncomment the following to have C++ like comments */
/*
pre.pseudo span.comment:before {
    content:"/* ";
    font-style:normal;
}
pre.pseudo span.comment:after {
    content:" */";
    font-style:normal;
}
*/

Constants

What about constants? For example, I like referring to the null object as NULL, can we do it?

pre.pseudo span#NULL:before {
    content:"NULL";
}

This would require us to write . This may seem like a waste of time and energy, but when you decide lowercase "null" looks prettier...you have to change only one line of css to get it!

Flaws

None!

No, I'm just kidding, the only problems I can think of is that these  classes work only within the <pre class="pseudo"> environment.

So if you were working in the <code> environment...you'd have trouble!

Of course, my pseudocode listings provided are incomplete (conditionals and while-loops are omitted, among other things), but you get the idea how to take advantage of CSS when "typesetting" pseudocode in html.

Object Oriented Kool Aid

Sunday, October 14, 2012

Databases, Part 1: Components