5.21.2011

From behind

I read a bit about the issues with using ZIP-compressed files as the Open Pandora application distribution format and noticed that somebody wrote that ZIP files get read "from behind" when unzipping. Some minutes later I knew why: ZIP compresses each file separately and adds a file tree information block at the end of the archive so that the decompressor knows in which order it has to interpret the files it reads. Why from behind? Wouldn't it be easier and safer to put it in front, so that incomplete archives can still be read somehow? Actually, no. The secret lies in the fact that while compressing you don't know in advance how big the output will be. So you can't say for sure how long the compressed file chunks are, and a file tree written up front with block positions inside the file is impossible to create. So ZIP simply compresses the files one after another, puts them into the archive and appends its file tree after all files have been compressed. That's quite ingenious! Of course it gives no guarantees for damaged files, but it has the advantage of being able to write data without knowing how big it will become and of restoring it later much more quickly.

I was always annoyed that one has to know the size of a linked list to be able to read it into an array later. That's over now! I can simply dump it all into a file and count the elements while doing so. At the end I write the size, and the reader can read the file backwards to learn the size before it has to build a list and then an array from it (there's a small sketch of this below). It's way better to know the size immediately, especially for data that doesn't change later. So I'm thinking about making a theoretical design for writing data dynamically and then reading it statically. It's not much of a problem, but it goes well together with the idea of bringing more structure into my way of handling file formats.

I used to only use text files for communication/settings exchange and binary files for temporary (or permanent) dumping of raw data, with a simple three-letter header for verification and error checking while reading. That was always enough and I never had problems with it. However, it may be a good idea to think about something more... archive-like. Something to order huge and small blocks of data, binary and textual, sequential or hierarchical... all read from behind to enable quick writing and quick reading in any case. I guess I can ignore that thing about damaged files because I assume mine won't be damaged. This format will require a two-way signature to allow a quick format check a) when reading from the beginning and b) when reading from the end. So both signatures need to exist and be correct for loading to happen (also sketched below). And storing everything in a hierarchical tree makes it possible to scan through the data, check for the saved format and so on...

Yeah, that's another component bringing me closer to some more file-based processing. I like creating those super-multifunctional ideas and concepts. It gives me a clue about what makes a good format in many ways and how to reduce the information required to process data. Especially with trees, files or linked lists, iterating through them just to find out their size can take forever.
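
Here is a minimal sketch of that "write dynamically, read statically" trick in C. The Record type, the node struct and the function names are made up just for illustration; the only point is the order of operations: dump the nodes while counting them, append the count as the last thing in the file, and later read the count from the end before allocating the array.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record and list node types, just for this sketch. */
typedef struct { int value; } Record;

typedef struct Node {
    Record data;
    struct Node *next;
} Node;

/* Write the list front to back, counting as we go,
   and append the count as the very last thing in the file. */
static void dump_list(FILE *f, const Node *head)
{
    unsigned int count = 0;
    const Node *n;
    for (n = head; n != NULL; n = n->next) {
        fwrite(&n->data, sizeof(Record), 1, f);
        count++;
    }
    fwrite(&count, sizeof(count), 1, f);
}

/* Read "from behind": fetch the count from the end of the file,
   allocate the array in one go, then read the records from the front. */
static Record *load_array(FILE *f, unsigned int *count_out)
{
    unsigned int count;
    Record *arr;

    fseek(f, -(long)sizeof(count), SEEK_END);
    fread(&count, sizeof(count), 1, f);

    arr = malloc(count * sizeof(Record));
    fseek(f, 0, SEEK_SET);
    fread(arr, sizeof(Record), count, f);

    *count_out = count;
    return arr;
}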
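
And the two-way signature idea, again only a sketch with made-up four-byte markers: the file is only accepted if both the signature at the start and the one at the end are present and correct, so the format can be verified quickly no matter from which side the reader starts.

#include <stdio.h>
#include <string.h>

/* Made-up signatures, just for this sketch. */
static const char HEAD_SIG[4] = { 'A', 'R', 'C', '0' };
static const char TAIL_SIG[4] = { '0', 'C', 'R', 'A' };

/* Both markers have to exist and match: the one at the start
   (quick check when reading forwards) and the one at the end
   (quick check when reading "from behind"). */
static int check_signatures(FILE *f)
{
    char head[4], tail[4];

    fseek(f, 0, SEEK_SET);
    if (fread(head, 1, 4, f) != 4) return 0;

    fseek(f, -4L, SEEK_END);
    if (fread(tail, 1, 4, f) != 4) return 0;

    return memcmp(head, HEAD_SIG, 4) == 0 &&
           memcmp(tail, TAIL_SIG, 4) == 0;
}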
