1.30.2012

Hashing

I had my thoughts about hash arrays today and whatever way you try make them useful, you'll either end up with too much required memory, horrible performance or a limited number of keys to store per hash array cell (can be increased, too, but will then result in poor performance, too). The best approach is then always to have an awesome hash function. Therefore, hash functions are very most important thing an hash array (interesting that I'm going through all the stuff from the first few semester...). And since I'm friend of tinkering too much with hash functions, I'm tried to somehow find a new way of storing the associated hash values with static memory. This is not trivial I have to say. Using new and delete for linked lists is very easy way to do this, but personally I want to use hash arrays to map indices from the hash array slots (n per hash array, forming a twodimensional array) to an array of actual data. This is of course not a very popular way of using hash arrays, but it's more dynamic than using a pool with an iterating index cause you WANT to directly the associated hash array data and don't share with anyone else. So it's more like a pool but with the possibility to get data by a key value. Anyway, I want to give the possiblity to occupy and de-occupy slots in the hash array to make it function more like proper pool. Occupying an entry is rather simple and only requires a few iteration through the hash array until a valid one was found. But - and that's the point where I annoyingness comes in - when do you if the requested key does already exist? When do you that there's no associated entry so that you'll have create a new one? This is exactly the thing that's bugging me. You'll probably need to iterate through all cells (or maximum number of them, limiting a cell to only take from n close neighbours) to cover this problems if you don't want to utilize linked lists. So what I'm trying to figure out is a way to not use linked lists but still have a runtime behaviour until you notice that you'll need to allocate a new entry. Currently, I'm thinking about splitting the key into several parts and split the hash array slots/cells into different dimensions. That means that I'll be able to sort large keys with different binary signature into different parts of a multidimensional array. This Each part will be hashed on it's own, using a different direction in which they will go if all associated slots got empty. I think this could work better than a plain single hash array and I can use some more fuzziness, utilizing the randomness of hash keys that are system handles or addresses. Good thing is, it's a bit different from the project itself and I'm glad that it doesn't have to do anything with mutexes or threads. I mean the tech is working quitely nicely right now, though I haven't done any more wild tests. However, I'll this project in the current state, comment the critical debug sections and be done with it for now.

Aaaaand, I'm gonna buy a headset for doing Let's Plays! Yay! That'll be fun I tell you. Will cost me quite a bit but good hardware doesn't usually come at low cost. Especially not for a Sennheiser fanboy.

No comments: