Coding Conventions

Before writing Fungimol, I read "Large Scale C++ Software Design" by John Lakos (Addison-Wesley, 1996, ISBN 0-201-63362-0), and liked it. Some of the programming practices used in Fungimol come from there, and others are reasonable things I learned elsewhere:
Identifier naming conventions
Class names start with an upper case letter, such as Class. Variable and function names start with a lower case letter. Multiword classes and variables have words after the first capitalized, such as the class RecursiveSlotValue or the variable currentItem. Constants have names that are all upper case and separated by underscores, such as WHEEL_DOWN.

Member variables have names starting with m_, such as m_data. Static variables have names starting with s_.
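
As a quick illustration, here is a small made-up class that follows all of these conventions at once. The class ScrollTracker, its members, and the value given to WHEEL_DOWN are invented for this sketch and are not taken from the Fungimol source.

```cpp
// Constant: all upper case, words separated by underscores.
// The value 5 is a placeholder, not Fungimol's actual button code.
const int WHEEL_DOWN = 5;

// Class name: starts with an upper case letter, later words capitalized.
class ScrollTracker
{
   static int s_instanceCount;   // static variable: s_ prefix
   int m_lastButton;             // member variable: m_ prefix
public:
   ScrollTracker () : m_lastButton (0) { ++s_instanceCount; }
   // Function and variable names start with a lower case letter.
   void recordButton (int currentButton) { m_lastButton = currentButton; }
   bool wheelWentDown () const { return m_lastButton == WHEEL_DOWN; }
   static int instanceCount () { return s_instanceCount; }
};

int ScrollTracker::s_instanceCount = 0;
```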

Redundant Internal and External include guards
Internal include guards are quite common. At the beginning of a header file with a name like Foo.h, one typically has:
#ifndef __Foo_h__
#define __Foo_h__
and at the end one has:
#endif
(Incidentally, the common practice of repeating the guard name after the #endif deviates from the C++ standard unless the name is wrapped in a comment like /* __Foo_h__ */; Fungimol leaves the #endif bare.) These internal include guards cause redundant inclusions of Foo.h to have no effect, which is usually what you want. In a large program it is difficult and not worth the effort to avoid such redundant inclusions.

However, if you only have internal include guards, then the C preprocessor still has to read through the entirety of each redundant inclusion of the header file to find the matching #endif. If we also put include guards around the inclusion of Foo.h, like this:

#ifndef __Foo_h__
#include "Foo.h"
#endif
then the C preprocessor will not have to parse Foo.h when it is included after the first time. It is easy to automate the addition of the external include guards with GNU Emacs macros.

The preprocessor symbol used for the include guard starts with two underscores, then has the filename with "." and "/" transliterated to underscores, then ends with two underscores. We use the same convention whether the include file is included with something like #include "Foo.h" or something like #include <Foo.h>.
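
The naming rule is mechanical enough to write down as a function. The sketch below is only an illustration of the rule as stated above; the function guardName is hypothetical and is not part of Fungimol or of the Emacs macros mentioned earlier.

```cpp
#include <string>

// Builds the include guard symbol for a header, following the rule above:
// two underscores, the file name with '.' and '/' turned into underscores,
// then two more underscores.
std::string guardName (const std::string &fileName) {
   std::string guard = "__";
   for (std::string::size_type i = 0; i < fileName.size (); ++i) {
      const char c = fileName[i];
      guard += (c == '.' || c == '/') ? '_' : c;
   }
   guard += "__";
   return guard;
}
```

For instance, guardName ("Foo.h") gives __Foo_h__, which matches the guard used in the examples above.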

For system header files, I can't control the name of the internal include guard that is used (and there may not be one), so we do something like this:

#ifndef __iostream_h__
#include <iostream.h>
#define __iostream_h__
#endif
The symbol is defined after the file is included just in case the system header file happens to have the same include guard name as the external include guard I'm using.
Naming files after classes
A class named Foo is normally declared in a file Foo.h and defined in a file Foo.cpp. It is quite common to know the name of a class and to want to look at the definition or declaration, so this is good.
Minimal includes
Each .cpp file is responsible for including all of the header files it directly uses. Each .h file is responsible for including all of the header files it needs to compile. Forward class declarations are used to minimize the amount that header files include other header files. For instance, we might have a header file Foo.h with this text:
#ifndef __Foo_h__
#define __Foo_h__

#ifndef __Quaternion_h__
#include "Quaternion.h"
#endif

class String;
class Foo
{
   Quaternion m_q;
public:
   String foo (const Quaternion &q);
};
#endif
and the class Foo would be defined as follows in Foo.cpp:
#include "Foo.h"

#ifndef __Quaternion_h__
#include "Quaternion.h"
#endif

#ifndef __String_h__
#include "String.h"
#endif

String Foo::foo (const Quaternion &q) {
  m_q = q;
  return "Hi";
}
Since Foo.cpp includes Foo.h as the first line, the compiler checks that Foo.h includes everything it needs to compile. Since the size of the class String does not need to be known to lay out an instance of Foo, we don't need to include String.h from Foo.h. This way, a file that uses Foo but does not call the method Foo::foo need not include String.h, so it may not have to be recompiled if a comment is changed in String.h.
Aggressive memory allocator abuse detection: NEW instead of new
I used malloc to implement a custom memory allocator that can find leaks, corrupted memory blocks, use of freed memory, and some use of uninitialized memory. This requires that memory allocation be done with NEW instead of new. The macro NEW is defined in Util/MemoryUtil.h. The leak detection slows down memory allocation, but memory allocation is slow enough already that I had to take care not to do any in the inner loops, so I leave the leak detection turned on in the non-debug build.

If you want to allocate an instance of a template that takes more than one argument, then there's a battle between C++ and the preprocessor, which is resolved in the case of two template arguments by using the macro NEW2. For example, in the normal case one invokes NEW like this to make a pointer to a string:

NEW (String ())
If we naively replace the type String with the template class HashTable <String, FactoryList>, we get an error when attempting to compile the following:
NEW (HashTable <String, FactoryList> ())  // Error
The problem is that so far as the preprocessor is concerned, NEW is now taking two arguments, specifically HashTable <String and FactoryList> (). NEW is only defined to take one argument, so this does not work. The problem is solved by using the macro NEW2 which takes two arguments and does the right thing with them:
NEW2 (HashTable <String, FactoryList> ())
If you need NEW3, add it to Util/MemoryUtil.h.
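
To make the comma problem concrete, here is a stripped-down sketch of how such a pair of macros could be written. These definitions are stand-ins I made up for illustration: the real NEW and NEW2 live in Util/MemoryUtil.h and presumably hook into the leak-detecting allocator, which this sketch does not attempt. std::string and std::map stand in for String and HashTable.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for the macros in Util/MemoryUtil.h.
// These simply call new; they do no leak tracking.
#define NEW(x) (new x)
// The preprocessor splits the argument at the template's comma, so NEW2
// accepts the two halves and puts the comma back between them.
#define NEW2(x, y) (new x, y)

std::string *makeString () {
   return NEW (std::string ());
}

std::map<std::string, int> *makeTable () {
   // The preprocessor sees two arguments here:
   // "std::map<std::string" and "int> ()".
   return NEW2 (std::map<std::string, int> ());
}
```

A NEW3 along the same lines would take three arguments and put back two commas.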

If you have some code that puts a pointer into static storage, and you don't want the leak detector to complain about it, you'll have to register a deallocator with the leak detector. Read Util/MemoryUtil.h for instructions.

After allocation but before use, the allocated memory is filled with many copies of the easily recognizable 32-bit hexadecimal constant 0xbabeface. After being freed, the previous contents of the allocated memory are overwritten with the constant 0xdeadbeef. Some out-of-bounds array references are detected by verifying that the program does not disturb the constant 0xfeeddeed that is placed before the memory block or the constant 0xfadedebb that is placed after the memory block. If you set the radix to 16, you will occasionally see these constants with the debugger.
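
The fill and guard constants can be sketched on top of malloc roughly as follows. This is a simplified illustration, not the code in Util/MemoryUtil.h: the block layout, the function names guardedAlloc, guardsIntact and guardedFree, and the choice to measure sizes in 32-bit words are all assumptions made to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

const std::uint32_t FRESH_FILL   = 0xbabefaceu;  // allocated, not yet written
const std::uint32_t FREED_FILL   = 0xdeadbeefu;  // contents after being freed
const std::uint32_t GUARD_BEFORE = 0xfeeddeedu;  // sits just before the block
const std::uint32_t GUARD_AFTER  = 0xfadedebbu;  // sits just after the block

// Raw layout in 32-bit words (an assumption of this sketch):
//   [word count][GUARD_BEFORE][payload ...][GUARD_AFTER]
std::uint32_t *guardedAlloc (std::size_t words) {
   std::uint32_t *raw = static_cast<std::uint32_t *>
      (std::malloc ((words + 3) * sizeof (std::uint32_t)));
   if (raw == 0) return 0;
   raw[0] = static_cast<std::uint32_t> (words);
   raw[1] = GUARD_BEFORE;
   for (std::size_t i = 0; i < words; ++i) raw[2 + i] = FRESH_FILL;
   raw[2 + words] = GUARD_AFTER;
   return raw + 2;   // the caller only sees the payload
}

// True if neither guard word has been disturbed.
bool guardsIntact (const std::uint32_t *payload) {
   const std::uint32_t *raw = payload - 2;
   return raw[1] == GUARD_BEFORE && raw[2 + raw[0]] == GUARD_AFTER;
}

// Checks the guards, overwrites the payload, and releases the block.
// Returns false if an out-of-bounds write was detected.
bool guardedFree (std::uint32_t *payload) {
   const bool intact = guardsIntact (payload);
   std::uint32_t *raw = payload - 2;
   const std::size_t words = raw[0];
   for (std::size_t i = 0; i < words; ++i) payload[i] = FREED_FILL;
   std::free (raw);
   return intact;
}
```

Writing one word past the end of a block from guardedAlloc (4) lands on GUARD_AFTER, so guardedFree reports the overrun. Detecting use of freed memory would need the allocator to hold on to freed blocks for a while instead of returning them to malloc immediately, which this sketch leaves out.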

This does not help when C library code is called, since C code does not call new and delete. The mcheck package is used under Linux to indulge my paranoia in this case.

myassert.h instead of <assert.h>
The default version of assert in Linux is declared to never return, so the compiler does not allocate a stack frame for it, thus making debugging more difficult. I wrote my own assert that is a normal function that leaves behind a normal stack frame.
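
A minimal sketch of the idea, assuming nothing about the real myassert.h beyond what is said above: the failure handler is an ordinary function with no "never returns" annotation, and a macro supplies the expression text and source location. The names MY_ASSERT and myAssertFailed are invented for this example.

```cpp
#include <cstdio>
#include <cstdlib>

// An ordinary function, deliberately not declared as never returning,
// so that a call to it is compiled like any other call.
void myAssertFailed (const char *expr, const char *file, int line) {
   std::fprintf (stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
   std::abort ();
}

// MY_ASSERT is a hypothetical name; myassert.h may well spell it differently.
#define MY_ASSERT(expr) \
   ((expr) ? (void) 0 : myAssertFailed (#expr, __FILE__, __LINE__))
```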
Insulation
If the efficiency of a class doesn't matter much, then we can decrease the included files further by insulating the implementation of the class. One way to do this is to hide the implementation behind a pointer. An insulated version of the class Foo defined above would have this in Foo.h:
#ifndef __Foo_h__
#define __Foo_h__

class Quaternion;
class String;
class Foo
{
   struct FooData;
   FooData *m_data;
public:
   Foo ();
   ~Foo ();
   String foo (const Quaternion &q);
};
#endif
and this in Foo.cpp:
#include "Foo.h"

#ifndef __Quaternion_h__
#include "Quaternion.h"
#endif

#ifndef __String_h__
#include "String.h"
#endif

#ifndef __MemoryUtil_h__
#include "MemoryUtil.h"
#endif

#ifndef __myassert_h__
#include "myassert.h"
#endif

struct Foo::FooData {
   Quaternion m_q;
};

Foo::Foo ()
  : m_data (NEW (FooData ()))
{}

Foo::~Foo ()
{
   assert (m_data);
   delete m_data;
   m_data = 0;
}

String Foo::foo (const Quaternion &q) {
  m_data->m_q = q;
  return "Hi";
}
There are several things going on here:
Cooperating with etags
Some indentation conventions are designed to cooperate with etags. To help etags find class names, we declare them like this:
class SceneGraphConfiguration
  : public Configuration     // Good
{
...
};
and not like this:
class SceneGraphConfiguration : public Configuration // Bad
{
...
};
or like this:
class SceneGraphConfiguration : public Configuration { // Bad
...
};
Smart Pointers and reference counting
A commonly used template throughout the program is SP, which stands for "Smart Pointer". This has methods defined on it to make it behave like a pointer, except it maintains the reference count of the object it is pointing to, and it takes care of destructing the recently-pointed-to object if the reference count is zero after being decremented. The class that defines the reference count is called Refcount, and it is a superclass of most classes in the program.

Here's a useful idiom: if c has the type, say, SP<Configuration>, then the expression &*c has the type Configuration *. Since pointers get more implicit conversions than smart pointers, ordinary pointers are more useful for some purposes.
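
For readers who have not met this style of smart pointer, the following is a minimal sketch of an intrusive reference count and a smart pointer built on it. It is not Fungimol's SP or Refcount: the count () accessor, the details of assignment, and the example class Widget are assumptions made for illustration. The method names ref () and deref () are the ones the text uses.

```cpp
// The reference count lives in the pointed-to object itself.
class Refcount
{
   int m_count;
public:
   Refcount () : m_count (0) {}
   virtual ~Refcount () {}
   void ref () { ++m_count; }
   void deref () { --m_count; }              // never deallocates
   int count () const { return m_count; }    // accessor assumed for this sketch
};

template <class T> class SP
{
   T *m_ptr;
   void release () {
      if (m_ptr) {
         m_ptr->deref ();
         if (m_ptr->count () == 0) delete m_ptr;
         m_ptr = 0;
      }
   }
public:
   SP (T *p = 0) : m_ptr (p) { if (m_ptr) m_ptr->ref (); }
   SP (const SP &other) : m_ptr (other.m_ptr) { if (m_ptr) m_ptr->ref (); }
   ~SP () { release (); }
   SP &operator= (const SP &other) {
      T *incoming = other.m_ptr;
      if (incoming) incoming->ref ();   // take the new reference first
      release ();
      m_ptr = incoming;
      return *this;
   }
   T &operator* () const { return *m_ptr; }
   T *operator-> () const { return m_ptr; }
};

// Example reference-counted class; s_liveCount tracks live instances.
class Widget
  : public Refcount
{
public:
   static int s_liveCount;
   Widget () { ++s_liveCount; }
   ~Widget () { --s_liveCount; }
};

int Widget::s_liveCount = 0;
```

With these definitions, if w has type SP<Widget>, then &*w has type Widget *, which is the idiom described above.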

Smart Pointers and Constructors
If a constructor or a destructor of a subclass of Refcount passes this to some other function, there is the possibility that the other function will convert this to a smart pointer, which will in turn increment and then decrement the reference count and then deallocate the object at a time when the constructor or destructor intended to do more work with it. To avoid this, the constructor or destructor should call ref() before passing this to the function and call deref() afterward. This way, the reference count will not be decremented to zero. deref() does not deallocate the object if it decreases the reference count to zero.
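
Here is a sketch of that guard in a constructor. To keep the example self-contained it carries its own cut-down Refcount and SP, which are simplified assumptions and not the Fungimol classes; Node and inspect are hypothetical names.

```cpp
class Refcount
{
   int m_count;
public:
   Refcount () : m_count (0) {}
   virtual ~Refcount () {}
   void ref () { ++m_count; }
   void deref () { --m_count; }              // never deallocates
   int count () const { return m_count; }
};

template <class T> class SP
{
   T *m_ptr;
   SP &operator= (const SP &);               // assignment omitted for brevity
public:
   SP (T *p) : m_ptr (p) { if (m_ptr) m_ptr->ref (); }
   SP (const SP &other) : m_ptr (other.m_ptr) { if (m_ptr) m_ptr->ref (); }
   ~SP () {
      if (m_ptr) {
         m_ptr->deref ();
         if (m_ptr->count () == 0) delete m_ptr;
      }
   }
};

class Node;
void inspect (Node *n);   // hypothetical helper, defined below

class Node
  : public Refcount
{
public:
   static int s_destroyed;
   Node () {
      ref ();           // keep the count above zero while others see "this"
      inspect (this);
      deref ();         // back to zero, but deref () does not deallocate
   }
   ~Node () { ++s_destroyed; }
};

int Node::s_destroyed = 0;

// Wraps its argument in a temporary smart pointer, as a callee might.
void inspect (Node *n) {
   SP<Node> temporary (n);
}
```

Without the ref () and deref () pair, the temporary smart pointer in inspect would take the count from zero to one and back to zero, and would delete the Node while its constructor was still running.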
Smart Pointers and Arguments
If a function takes an argument, that argument generally can be a plain pointer instead of a smart pointer. The caller of the function passed that argument in, so the caller should have a reference to the object for the duration of the function call. I did not understand this at the time I started writing the program, so this practice is not uniformly followed, but it is a good idea and should be done in any new code.

An exception to this rule is a function that stores a reference to the object. Suppose the function looks like this:

SP<Bar> theGlobalBar = 0;
void foo (Bar *b) {
   doSomething (b);
   theGlobalBar = b;  // Error: theGlobalBar may point to deallocated memory
}
Because foo stores a reference, it might be called as foo (NEW (Bar ())). In this case, if doSomething does smart pointer manipulation on its argument b, it will deallocate the object when it is done with the smart pointer manipulation, so theGlobalBar will be left with a pointer to deallocated storage, unless the program crashes while trying to update b's reference count while assigning to theGlobalBar. This can easily be fixed by writing foo like this:
SP<Bar> theGlobalBar = 0;
void foo (Bar *b) {
   SP<Bar> myB = b;
   doSomething (myB);
   theGlobalBar = myB;  // OK
}
If foo didn't store a reference to b, then a call like foo (NEW (Bar ())) would be guaranteed to be a memory leak. Since we aggressively track down memory leaks, transforming this memory leak bug into a program crash just causes the bug to be found sooner rather than later, which is no great loss. Therefore a subroutine that does not store an argument need not take the argument as a smart pointer.

Functions that return a pointer to a reference counted object should generally return it as a smart pointer. In code like

SP<Bar> baz ();

... foo (baz ()) ... // OK
one might be concerned that the temporary smart pointer returned by baz is destroyed, and the Bar it points to deallocated, before foo is called. However, the C++ standard specifies that "The life time of a temporary object [will] extend to the end of the full expression in which it is created, [except in some other cases where it has to extend even longer]" (Annotated C++ Reference Manual, Ellis & Stroustrup, AT&T, 1990, page 423.) The relevant temporary object in this example is the smart pointer itself, so this scenario will not happen: the smart pointer is not destroyed until after foo returns. This rule from the standard does not save the similar code
SP<Bar> baz ();

Bar* x = baz ();
... foo (x) ... // Error
which is bad because the caller is using x to store a pointer to a reference-counted object without using a smart pointer. The fix is to declare x as a smart pointer:
SP<Bar> baz ();

SP<Bar> x = baz ();
... foo (x) ... // OK

Objects with reference counts must not be allocated on the stack, since they are likely to then be freed, thus corrupting the stack. For instance, if Bar is a subclass of Refcount, the following code is likely to fail:

Bar b;
doSomething (&b); // Bad: doSomething may free b
A reasonable fix would be to change b to a smart pointer:
SP<Bar> b = NEW (Bar ());
doSomething (b);
Smart Pointers and reference cycles
The weakness of this reference-counting scheme is that a cycle of smart pointers will never be freed and will show up in the list of leaked blocks when the program exits. The usual fix for this is to replace one of the smart pointers in the cycle with an ordinary pointer.
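
As an illustration of that fix, consider a parent that owns a child which needs to refer back to it. The sketch below carries its own cut-down Refcount and SP so that it stands alone; those, and the classes Parent and Child, are assumptions for the example and not Fungimol code.

```cpp
class Refcount
{
   int m_count;
public:
   Refcount () : m_count (0) {}
   virtual ~Refcount () {}
   void ref () { ++m_count; }
   void deref () { --m_count; }              // never deallocates
   int count () const { return m_count; }
};

template <class T> class SP
{
   T *m_ptr;
   SP &operator= (const SP &);               // assignment omitted for brevity
public:
   SP (T *p) : m_ptr (p) { if (m_ptr) m_ptr->ref (); }
   SP (const SP &other) : m_ptr (other.m_ptr) { if (m_ptr) m_ptr->ref (); }
   ~SP () {
      if (m_ptr) {
         m_ptr->deref ();
         if (m_ptr->count () == 0) delete m_ptr;
      }
   }
};

class Parent;

class Child
  : public Refcount
{
public:
   static int s_live;
   Parent *m_parent;   // ordinary back pointer: holds no reference
   Child (Parent *parent) : m_parent (parent) { ++s_live; }
   ~Child () { --s_live; }
};

int Child::s_live = 0;

class Parent
  : public Refcount
{
public:
   static int s_live;
   SP<Child> m_child;  // owning smart pointer
   Parent () : m_child (new Child (this)) { ++s_live; }
   ~Parent () { --s_live; }
};

int Parent::s_live = 0;
```

If Child::m_parent were an SP<Parent> instead, the parent and child would keep each other's counts above zero forever. With the ordinary pointer, dropping the last smart pointer to the parent frees both objects. The price is that the child must not use m_parent after the parent is gone.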
Array allocation and bounds checking
Most of the arrays in the program are instances of the template Dynavec. This defines reasonably efficient variable-length arrays. A debug build will do bounds checking on array references.
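
The general shape of such a template can be sketched as below. This is not Dynavec: the class CheckedVec and its interface are invented for illustration, it leans on std::vector for storage, and it uses the standard assert so that the check disappears when NDEBUG is defined, which is one plausible way to get checking only in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A variable-length array whose element access is bounds checked
// unless the build defines NDEBUG.
template <class T> class CheckedVec
{
   std::vector<T> m_items;
public:
   void push (const T &item) { m_items.push_back (item); }
   std::size_t size () const { return m_items.size (); }
   T &operator[] (std::size_t i) {
      assert (i < m_items.size ());   // compiled out when NDEBUG is defined
      return m_items[i];
   }
   const T &operator[] (std::size_t i) const {
      assert (i < m_items.size ());
      return m_items[i];
   }
};
```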

Copyright 2000 Tim Freeman <tim@infoscreen.com>