Wednesday, September 19, 2007

Object Oriented C

Or "C for Java Programmers"...

Revision History
Revision 0: (19 September 2007) Published.
Revision 1: (19 September 2007) Modified the dummy header.

Introduction

A class, in the immortal words of Dr. Neat, is defined by three things: 1) constructor(s), 2) fields, and 3) methods.

Since this tutorial is written for Java programmers, I will focus on the implementation of classes in C by using structs, function pointers, etc.

One might ask "Why should I bother learning C when C++ is so much more convenient for me?" That is a valid question for a Java programmer to ask. The answer is that C, even object oriented C, is more sparing with the resources of the target machine.

So if you are doing embedded programming, you would want to opt for C rather than C++. If you are doing general application programming, e.g. writing a calculator or a shell, you might want to use C++ instead.

But if you are programming on a machine with one megabyte of RAM and a 300 MHz processor, C is your language of choice.

Objects, Objects, Objects, and Objects

The inspiration of object oriented C comes from the virtual file system implementation on SunOS 2.0 from 1985 (see the immortal technical paper that describes the virtual file system implementation).

What we do is use structs as the object, function pointers as the methods, and - because we are using structs - the fields are already taken care of.

There are two approaches one can take: follow the orthodox vnode approach and have one struct for the fields and another for the operations, or follow the lazy approach and use a single struct for both.

Consider the following object:
#include <stdbool.h>  /* for bool */

struct object {
      int id;
      void (*dumpState)(struct object *this);
      bool (*equals)(struct object *this, struct object *o);
      char* (*toString)(struct object *this);
};
The id member is the field of the object class, and the three function pointers are the methods of the class. Note how the first argument of all the methods is the this pointer.

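This is similar to Python's object oriented approach, where every method takes the instance explicitly as its first parameter (conventionally named self).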

The alternate approach would be to do the following:
struct obj_ops;  /* defined below */

struct object {
      int id;
      struct obj_ops *ops;
};

struct obj_ops {
      void (*dumpState)(struct object *this);
      bool (*equals)(struct object *this, struct object *o);
      char* (*toString)(struct object *this);
};
Note how the operations are encapsulated in one struct and the fields are in the other. This is the orthodox approach.

The problem with this approach is that calling the object's toString() method through a pointer obj takes the line obj->ops->toString(obj); as opposed to the simpler obj->toString(obj);.

The UNIX KISS

This may be impressive to some, but I'm sure someone has said "Hey, if the toString() method takes a struct object pointer anyway, why not just write toString(obj)?"

That is actually what Unix did. So if we take a fresh look at the list of operations and the structure, we would write the following in a hypothetical header:
/* object.h header file */
#ifndef OBJECT_H
#define OBJECT_H

#include <stdbool.h>  /* for bool */

struct object {
      int id;
};

void dumpState(struct object *this);
bool equals(struct object *this, struct object *o);
char* toString(struct object *this);

#endif
/* end of the header file */
Then the "methods" of the class are ordinary functions: to invoke one, call the function and pass the object it acts on (the receiver) as the first argument.

Some assembly benchmarks for C++

I'm rather curious about C++, to be honest. I've never dealt with a programming language without a common object model before (yes, I love Java and D).

So, I tried my hand at a few things. First, I wanted to check some properties of the mythical vtable. To do that, I created a toy object in two files: a source file test.cpp and a header Object.hpp. The header is:
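#include <cstdint>
#include <iostream>
using namespace std;

class Object
{
public:
      Object() { }
      ~Object() { }

      // Return the address as an integer; a plain int can be too narrow to hold a pointer.
      uintptr_t getAddress()
      {
            return reinterpret_cast<uintptr_t>(this);
      }
      void address()
      {
            cout << "Object @" << getAddress() << endl;
      }
      void stat()
      {
            address();
            // sizeof(*this) is the size of the object; sizeof(this) would be the size of a pointer.
            cout << "Size of object: " << sizeof(*this) << endl;
      }
};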
The test.cpp file stays essentially the same throughout:
#include "Object.hpp"

int main()
{
      Object* o = new Object();
      o->stat();
      delete o;  // release the object allocated with new
      return 0;
}
To get the assembly for this, I run the command:
$ g++ -S test.cpp
The resulting test.s is 376 lines long. This is the control.

Spice Things Up

The next thing I try is to make all the methods and constructors/destructors virtual. I don't know how, if at all, this would change anything. Mind you, I have not taken a formal course on C++ so I don't know what happens if I change anything.

I then found out, by means of a compiler error, that constructors cannot be virtual. I decided to leave the destructor non-virtual as well.

The resulting assembly file is 414 lines long, 38 lines (about 10%) more than the control. So adding the virtual qualifier does add code, at least as measured in lines of unoptimized assembly.

I assume that this is because of the vtable in some manner, but I do not know for certain.

I am ashamed to admit it, but I am going to brush up on the vtable with Wikipedia. I'll probably write about it next...or else I'll write about object oriented C.