Jonathan Livingston Seagull

is the title of an excellent book by Richard Bach. It's well worth rereading every year or so. As usual I'm going to quote from a few pages:
...to eat, to stay alive as long as we possibly can.
Jonathan Seagull discovered that boredom and fear and anger are the reasons that a gull's life is so short, and with these gone from his thought, he lived a long life indeed.
We choose our next world through what we learn in this one.
Heaven is not a place, and it is not a time. Heaven is being perfect.
You have less fear of learning than any gull I've seen in ten thousand years.
"Why is it", Jonathan puzzled, "that the hardest thing in the world is to convince a bird that he is free, and that he can prove it for himself if he'd just spend a little time practising? Why should that be so hard?
You don't love hatred and evil, of course. You have to practise and see the real gull, the good in every one of them, and to help them see it in themselves.
Don't believe what your eyes are telling you. All they show is limitation. Look with your understanding, to find out what you already know, and you'll see the way to fly.

Shadow Data Types

This article appeared in the December 2009 issue of the accu Overload magazine.

Shadow Data Types

Suppose we have a type called wibble defined as a concrete data type (that is, a type whose representation is fully visible) as follows:
// wibble.h
#include "grommet.h"
#include "flange.h"

typedef struct
{
   grommet g;
   flange f;
} wibble;

void wopen(wibble * w, int i);
...
void wclose(wibble * w);
The definition of wibble exposes the types involved in its representation (grommet and flange in this made up example), and hence requires a #include for those types in its header file. This exposure has a price. One cost is that any change to the grommet or flange header files, or any header files they in turn #include, at any depth, will require a recompilation of any source file that #includes wibble.h (either directly or transitively). Another cost is that client code can easily become reliant on the exposed representation rather than relying solely on the functional api. Note that in C++ you can avoid this problem by declaring your data members private.

Abstract Data Types

These costs are sufficiently high that software developers have invented techniques to hide a type's representation; to turn a type into an Abstract Data Type. An Abstract Data Type is simply a type that does not reveal its representation; a type that abstracts away its representation. In this article I'll look at two abstract data type implementation techniques: Opaque Data Types, and Shadow Data Types.

Opaque Data Types

The term Opaque Data Type is a well established term for the technique of exposing only the name of the type in its header file. This is done with a forward type declaration. This certainly has the affect of not exposing any representation!
// wibble.h
typedef struct wibble wibble;

wibble * wopen(int);
...
void wclose(wibble *);

Storage class restrictions

A definite downside with this approach is that clients cannot create objects.
#include "wibble.h"

void eg(void)
{
   wibble * ptr; // ok
   wibble value; // constraint violation
   ptr = malloc(sizeof *ptr); // constraint violation
}
The wibble type's representation is defined in its source file and so only code in the source file can create wibble objects. Furthermore, these wibble objects have to be returned to users as pointers. These two constraints mean the created objects cannot have auto storage class. This is a great loss since auto storage class alone of the three storage class options allows the clients to decide where the objects live which can greatly improve locality of reference.
// wibble.c
...
wibble * wopen(int value)
{
   wibble opened;
   ...
   return &opened; // very very bad
}
...
A second possibility is to create the objects with static storage class:
// wibble.c
...
static wibble wstorage[42];
static size_t windex = 0;
...
wibble * wopen(int value)
{
   wibble * opened = &wstorage[windex];
   windex++;
   ...
   return opened;
}
...
The final possibility is to create the objects with allocated storage class. That is, to create the objects dynamically on the heap:
// wibble.c
...
wibble * wopen(int value)
{
   wibble * opened = malloc(sizeof *opened);
   ...
   return opened;
}
...
The static and the allocated approaches have opposing advantages and disadvantages. Static storage is very fast and doesn't fragment the memory but the type has to decide the maximum number of objects the application will need. That might be a dependency going in the wrong direction. Allocated storage is much slower and can create memory fragmentation issues, but the application decides how many objects it needs. In short, the classic ADT technique creates an abstraction that is very opaque and pays a hefty price for this "over-abstraction". Abstracting away the representation also abstracts away the size details of a type and it is the loss of the size information that creates the storage class restrictions. The Shadow Data Type implementation technique attempts to rebalance these forces of abstraction by separating size abstraction from representation abstraction.

Shadow Data Types

The term Shadow Data Type, in contrast to Opaque Data Type, is not a well established term. The technique I'm calling Shadow Data Type has probably been around for a long time, it's just that it doesn't seem to have ever been documented anywhere and so a term for it has never become established. I've chosen the term Shadow Data Type to try and convey the idea that when you shine a light on an object it casts a shadow which reveals something of the size of the object but nothing of the details of the object. In other words, a Shadow Data Type has a "full" type declaration (rather than a forward type declaration) but one revealing only the size of type.
// wibble.h
typedef struct
{
   unsigned char size_shadow[16];
} wibble;

void wopen(wibble *, int);
...
void wclose(wibble *);
The "true" definition of the type (together with its accompanying #includes) moves into the source file:
// wibble.c
#include "wibble.h"
#include "grommet.h"
#include "flange.h"
#include <string.h>

typedef struct
{
   grommet g;
   flange f;
} wibble_rep;

// sizeof(wibble) >= sizeof(wibble_rep)

void wopen(wibble * w, int value)
{
   wibble_rep rep;
   ...
   ...
   memcpy(w, &rep, sizeof rep);
}
...
However, there are two problems needing attention.

Synchronized Alignment?

Firstly, there is no guarantee the two types (wibble and wibble_rep) are alignment compatible. We can solve this problem. The trick is to create a union containing all the basic types. We don't know which basic types have the strictest alignments but if the union contains them all the union must also have the strictest alignment.
// alignment.h
typedef union
{
   // one of each of all the basic types go here
   // including data pointers and function pointers
} alignment;
We redefine wibble to be a union with two members; one member to take care of the memory footprint and one member to take care of the alignment:
// wibble.h
#include "alignment.h"

typedef union
{
   unsigned char size_shadow[16];
   alignment universal;
} wibble;
...
The main problem with wibble being a union is that unions are rare. Suppose you want to forward declare the wibble type in a header. You're quite likely to forget it's a union.
typedef struct wibble wibble; // Oooops
We can fix this by simply putting the union inside a struct!
// wibble.h
#include "alignment.h"

typedef struct
{
union
{
   unsigned char size[16];
   alignment universal;
} shadow;
} wibble;
...
This is now sufficiently tricky to warrant an abstraction of its own:
// shadow_type.h
#ifndef SHADOW_TYPE_INCLUDED
#define SHADOW_TYPE_INCLUDED

#include "alignment.h"

#define SHADOW_TYPE(name, size) \
typedef struct \
{ \
   union \
   { \
       unsigned char bytes[size]; \
       alignment universal; \
   } shadow; \
} name

#endif
// wibble.h
#include "shadow_type.h"

SHADOW_TYPE(wibble, 16);

Synchronized Size?

The second problem is hinted at by the comment in wibble.c
// sizeof(wibble) >= sizeof(wibble_rep)
This comment, like all comments, has no teeth. Ideally we'd like an assurance that if the types' sizes lose synchronization we're told about it. This can be done by asserting the relationship inside a unit test of course. The problem with this the possibility that the runtime check inside a unit-test won't get run. Or, more likely, that the unit-test simply won't get written at all. Fortunately in this case we can check the relationship using a compile time assertion. We start with the fact that you cannot declare an array of negative size:
extern char wont_compile[-1];
extern char will_compile[+1];
Now we have to select a size of either +1 or -1 if the asserted expression is true or false respectively.
// may or may not compile
extern char compile_time_assert[sizeof(wibble) >= sizeof(wibble_rep) ? +1 : -1];
Hiding this mechanism behind a macro inside a dedicated header helps to make the code more Intention Revealing.
// compile_time_assert.h
#define COMPILE_TIME_ASSERT(description, expression) \
extern char description[ (expression) ? 1 : -1 ]
// wibble.c
...
#include "compile_time_assert.h"
...
COMPILE_TIME_ASSERT(sizeof_wibble_is_not_less_than_sizeof_wibble_rep,
sizeof(wibble) >= sizeof(wibble_rep));
...
It's worth spending a few moments to think about alignment carefully. The wibble type contains a union to give us the strictest alignment. This means a single wibble_rep and a single wibble can overlay each other in either direction. If we create an array of wibbles the compiler will ensure the address of each wibble is suitably aligned. To do this it may add trailing padding to the wibble type but this padding will be reflected by sizeof(wibble). Similarly, any padding for the wibble_rep type will also be reflected by sizeof(wibble_rep). Importantly, since sizeof(wibble_rep) may be strictly less than sizeof(wibble) we cannot overlay an array of either type directly onto an array of the other type. However, we are only concerned with creating an array of wibbles since that is the type the client uses. There should never be any need to create an array of wibble_reps. Nevertheless, the .c file implementation must always do any array pointer arithmetic in terms of wibbles and never in terms of wibble_reps. Note also that using >= rather than == also allows binary compatibility with any alternative smaller representation.

Casting the shadow

Inside the source file we can create a helper function to overlay the true representation onto the clients memory (the fragment below uses the dot designator syntax introduced in c99):
// wibble.c
static inline void shadow(wibble * dst, wibble_rep * src)
{
   memcpy(dst, src, sizeof *src);
}

bool wopen(wibble * w, const char * name)
{
   wibble_rep rep =
   {
       .g = ...,
       .f = ...,
   };
   shadow(w, &rep);
   ...
}
Careful use of memcpy can help to make the wibble functions behave atomically from the users perspective. That is to say, the function can do the work off to the side in a local wibble_rep, and copy back into the shadow only if everything is successful. An alternative to memcpy is to cast the pointer on each access:
// wibble.c
static inline wibble_rep * light(wibble * w)
{
   return (wibble_rep *)w;
}

void wclose(wibble * w)
{
   wibble_rep * rep = light(w);
   rep->g = ...;
   rep->f = ...;
   ...
}

Constness?

It makes no sense to declare a wibble object with a const modifier unless the object can be initialized.

void pointless(void)
{
   const wibble w; // :-(
   // ... ?
}
However, this is not an issue since the wibble type is opaque anyway. Nevertheless, a slight redesign can accommodate const wibble objects if desired, at the cost of copying struct objects:
...
wibble wopen(int value)
{
   wibble_rep rep = { ...value... };
   wibble w;
   memcpy(&w, &rep, sizeof rep);
   return w;
}
void ok(void)
{
   const wibble w = wopen(42);
   ...
}

Summary

In C it is impossible to expose a type's size without also exposing its representation. It is possible to define a type concretely and to explicitly specify its representation as being "unpublished". However, since C does not offer the C++ private keyword using the representation is always possible and remains a constant temptation. Once one piece of client code succumbs more are sure to follow and like a dam bursting the client and implementation quickly become tightly coupled and any separation is washed away. Completely hiding a type's representation behind an opaque pointer/handle removes the temptation and creates a powerful abstraction but at the price of hiding the size of the type and the consequent restriction on the storage class of client memory. A shadow data type offers a half-way house where a type is effectively split into two, with one part exposing the size and the other part holding the representation. The alignment and sizes of the two parts must correspond. Client code is then able to use all three storage class options. The implementation code takes the full load of the extra complexity mapping/overlaying between the split parts. One interesting observation is that the client code would be uneffected (other than needing recompilation) if the representation was moved back into the client side type (to try and improve performance perhaps). No mechanism is universally applicable and the shadow data type is no exception! Experience and time alone will tell if and how useful it is. Caveat emptor.

Perfect Software and other illusions about testing

is the title of an excellent book by Jerry Weinberg. As usual I'm going to quote from a few pages:
Information is neutral but people's reactions to information are rarely neutral.
I'm not in the proof business; I'm in the evidence and inference business.
A process description is just that, a description ... not of what has actually been done.
To qualify as a test, an action has to seek information that will influence an action.
Fixing under pressure tends to raise the fault-feedback-ratio.
Anyone who makes promises about the future must be some sort of devil.
Errors are made, not born.
In a moment of weakness it is difficult to resist infantile suggestions.
Good tools amplify effectiveness; if your effectiveness is negative, adding tools will amplify only the negativity.

enums in C



Enums are notoriously weakly typed in C. While pondering this the other day it occurred to me that instead of writing an enum like this:
enum suit { clubs, hearts, diamonds, spades };
you could write it like this:
struct suit
{
    enum { clubs, hearts, diamonds, spades } value;
};
The struct wrapper adds a modicum of type safety. It also allows you to forward declare suit (you can't forward declare enums). The enumerators (clubs, hearts, diamonds, clubs) are still visible and can be used when a compile time value is required (e.g., switch cases). As usual, caveat emptor.