4.20.2012

Awesomeness on rails

While waiting for an annoyingly delayed train, I did my usual random thinkage about the stuff I like to work on. More specifically, I came to an interesting realization about my graphics engine's VM and the way it handles input via buffers. I did mention that this concept might be useful for more than this sort of high-level calculation - and indeed: it could be quite an interesting way to handle formulas in interpreted or VM-based languages that need to do repeated operations consisting of a number of identical sub-operations. Just think about it: a formula with n operations, each of which you have to parse (binary or text) and dispatch to the code that executes it. If you're using identifiers and jump tables, you can do this as often as you want. If not, if your compiler doesn't optimize switch cases well enough, or if you're using function calls or something worse (like plain interpretation), you'll have quite an overhead. For the general case, function pointers and minimal memory usage are the way to go - that brings acceptable performance and you definitely know that no identifier iteration is done. But still, if you have to do this operation a few hundred times, you'll need to repeat this process over and over again - even though you know exactly what to do and might be better off with dynamic compilation methods if you're really after this sort of stuff.
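To make that overhead concrete, here's a minimal sketch of the classic per-element dispatch in C - all the names and the formula representation are my own invention for illustration, not my actual engine code:

#include <stddef.h>

/* One primitive operation: takes the accumulator and a constant argument. */
typedef float (*op_fn)(float acc, float arg);

static float op_add(float acc, float arg) { return acc + arg; }
static float op_mul(float acc, float arg) { return acc * arg; }

/* One formula step: which operation to run and its constant argument. */
typedef struct { op_fn fn; float arg; } op;

/* n elements times op_count steps -> n * op_count indirect calls.
 * Every single element pays the full dispatch cost for every step. */
static void run_per_element(float *data, size_t n,
                            const op *ops, size_t op_count) {
    for (size_t i = 0; i < n; ++i) {
        float acc = data[i];
        for (size_t k = 0; k < op_count; ++k)
            acc = ops[k].fn(acc, ops[k].arg);
        data[i] = acc;
    }
}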

All this is usually impossible to prevent - each command requires a matching piece of code, and that's a lot of stuff to check, even for simple alpha blending operations. Anyway, ever thought about splitting the whole formula into several batch operations and doing each block n times in an optimized loop? No, because it may require more memory when done naively? Well, that's certainly true, but so many operations are done on many elements anyway, so it's usually left to the language designer or scripter to write the loop manually. But still: 5 operations done together 100 times requires 500 command dispatches and just a few variables - maybe some stack operations, too, if it's not a register machine or requires additional data. Doing each of the 5 operations 100 times in a row instead, we get only 5 command dispatches and 500 operations that run as optimized as possible (when split and unspecialized, of course). We're talking about non-native execution, so it's always better to keep these portions optimized inside the VM. Even better, this suggests specialized commands for common combinations - it still won't run faster than a real native program, of course, but it narrows the gap. I need to integrate this concept when implementing the multitarget system for my scripting language.
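Here's a sketch of the batched variant under the same made-up setup: each handler loops over the whole block itself, so the dispatch cost is paid once per formula step instead of once per element per step, and the inner loops are plain C the compiler can actually optimize:

#include <stddef.h>

/* A batched operation processes the entire block in one call. */
typedef void (*block_fn)(float *data, size_t n, float arg);

static void block_add(float *data, size_t n, float arg) {
    for (size_t i = 0; i < n; ++i) data[i] += arg;
}
static void block_mul(float *data, size_t n, float arg) {
    for (size_t i = 0; i < n; ++i) data[i] *= arg;
}

typedef struct { block_fn fn; float arg; } block_op;

/* op_count indirect calls in total, no matter how large n gets. */
static void run_batched(float *data, size_t n,
                        const block_op *ops, size_t op_count) {
    for (size_t k = 0; k < op_count; ++k)
        ops[k].fn(data, n, ops[k].arg);
}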

Pre-post Edit: It works! Totally awesome. I tested this stuff with a simple VM that knows basic commands both on single values and on full array blocks, and got execution time decreased by around 60% on average for simple alpha blending operations on a few million elements. It's the same ratio when using a specialized alpha blending command, so it's even more worth doing (in that case it seems to take just a quarter of the original duration). It's also roughly the same ratio with all optimizations turned on, so yeah: this stuff works very well for VMs. I decided to design the VM first and then put the parser and code generation on top of it. That way, I can clearly separate what's syntactic sugar and what's not. I'll try to keep the instruction set to the most useful minimum with a reasonably human-readable format, so that it's possible to write it by hand.
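For comparison, a specialized batch command could look something like this - again just a hedged sketch using the standard alpha blending formula, not the actual instruction from my VM. One dispatch covers the whole formula for the whole block:

#include <stddef.h>

/* Hypothetical fused instruction: dst = src * alpha + dst * (1 - alpha),
 * applied across an entire block with a single dispatch. */
static void block_alpha_blend(float *dst, const float *src,
                              const float *alpha, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * alpha[i] + dst[i] * (1.0f - alpha[i]);
}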

Excitement! So, once again, taking your time to think about stuff DOES bring good ideas. This also works very well for the multitarget concept mentioned above, since I can directly generate proper VM instructions by splitting the formula properly. The optimal case would be a syntax where I can completely drop handmade loops and leave most of the algorithmic execution time to the VM. Functions will still be sort of slow to optimize, but that makes inlining and code generation even more valuable.
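Just to illustrate what dropping handmade loops could mean in practice (reusing the hypothetical block_op machinery from the sketch above): a formula like dst = dst * 0.5 + 0.25 over a whole array would compile down to a tiny list of batch instructions, with no loop anywhere in the script:

/* Generated program for: dst = dst * 0.5 + 0.25 (per element). */
static const block_op program[] = {
    { block_mul, 0.5f  },  /* dst *= 0.5 across the whole block */
    { block_add, 0.25f },  /* dst += 0.25 across the whole block */
};

/* run_batched(data, n, program, 2) then executes the whole formula
 * on n elements with only two dispatches. */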
