5.21.2011

Stripped down linked list in C

I found a special solution for how to implement linked lists in C without sacrifying the format freedom of C++ templates. I've also seen generic linked lists as a combination of the link informations (all connected other items in an array or more primitive list), so it's no wonder that I haven't seen that linked lists are defined by their links and not by their data! Take an array for example: it's data is sequentiell, the indices are simple ints and you know that the next or previous element will always come in a fixed format, place and formula. A linked list on the other side can't only be described by their data, but also by a set of pointer to other entries. It's a rather a descriptor of a layout and not a data storage. So, what's left if you strip down the data part is the sole list navigation made of pointers. When I was programming around with some more generic list variants than my usual doubly-linked one I noticed how useful it would me to simply design the link layout instead of connecting it to data. Yeah, that's rather something theoretical, but it takes away some functions one would have to add otherwise. Writing linked lists is kinda annoying and doesn't make any real fun. It's more interesting to dump stuff on the harddrive, whatever. However, this is a point where I'm thinking about also moving this set of classes to C by providing a similar interface. Normally I would have let's use inheritance this time to add an additional data element to the item structure itself. But there's the problem you get in C then: where to inherit? Where to type the data if not static inside the structure? Well, that's a real conceptual problem... Of course could I simply use a void pointer like glib does it, but I find that rather ineffective for builtin datatypes (Do you want to malloc twice? I don't). So I started thinking about abusing memory alignment: What if the navigation structure of entry is stored before the actual data and can be seen as an offset addresse before the actual data? May it be possible to abuse the C standard's data layout definition to simply calculate the data parts offset and use a special get/set method to access different types? Well, that's not that easy. There are several things preventing this, especially the fact that C/C++ both don't give you the exact structure as you've depicted it. It's easier to imagine when taking a CPU's register sizes: you have several registers that can work as machine words, half words, bytes etc... You can split most of them and put stuff in it in whatever way. You therefore also store local or loaded structure variables completely made of registers if your structure layout complies with the possible registers and register splits of the target machine. I was surprised when I found this out! It's a pretty effective way to avoid RAM usage for temporary variables, though it'S probably not as useful when too many temporary variables get used. I don't dare complaining about it not beeing exactly as I specified in the code cause i know that it's more worth to store such data in registers. So it's not exactly possible to combine it this way. BUT - and that's defined by the C standard - each structure doesn't have any changed arrangement (padding) for the first member! So, choosing a list navigation structure as the first element of another structure should possible force the compiler (if it's standard-compliant) to keep two seperate structures, each differently aligned for them self. So, a getter functions could be used to cast the pointer to the element to the appropriate custom data structure! So you get the new layout depending on what you cast and as what you initially allocated the structure. It's the only possible way I can imagine for C++ to keep it's inheritance system working. Parent structures need to find their data and child structures, too - even in C you have to guerantee that ever struct X inside struct Y has the same layout as a single struct X. So yeah... I did it! Of course, it's not really type safe. But whatever, C programmers have sacrifice a few things sometimes... However, it makes me glad to know that double malloc isn't necessary. I'll write down a quick prototype of it to see whether this actually works as expected... As long as it only works with properties defined by atleast C99, it should work exceptionally fine I think.

No comments: