[SDL] Blit1to2 optimization

Gianni Trevisti g.trevisti at gmail.com
Wed Dec 27 15:05:38 PST 2006


Hello all,
I'm porting a game to an embedded Linux box with a 16 bit/pixel display.
The game is 8 bit/pixel, and I noticed that a lot of time is lost in the function
that converts an SDL_Surface from 8 to 16 bit/pixel: Blit1to2.
I tried a modified version of Blit1to2 that uses a lookup table indexed by two
palette indices at once (65536 entries), built this way:

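Uint32 lookup_map[65536];   /* one entry per pair of 8-bit source pixels */
Uint32 cc;
/* map is the existing 256-entry 8 -> 16 bit palette table (Uint16 entries) */
for (cc = 0; cc < 65536; ++cc) {
    /* low byte = first pixel, high byte = second pixel (little-endian) */
    lookup_map[cc] = (Uint32)map[cc & 0xFF] | ((Uint32)map[cc >> 8] << 16);
}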

then, instead of converting one source byte at a time, as the original code does:
    *(Uint16 *)dst = map[*src++];

I convert two source bytes at once:
    *(Uint32 *)dst = lookup_map[*(Uint16 *)src];
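
For anyone who wants to try the idea outside SDL, here is a minimal standalone sketch of the table build and the two-pixels-at-a-time row conversion described above. It is not the actual SDL code: the names build_pair_table and blit_row_1to2 are made up for illustration, a little-endian host is assumed, and memcpy is used instead of pointer casts so that unaligned rows are handled safely. An odd trailing pixel falls back to the plain 256-entry map.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 65536-entry pair table from a 256-entry
 * 8 -> 16 bit palette map. Assumes a little-endian host, so the byte at the
 * lower address ends up in the low half of each 32-bit entry. */
static void build_pair_table(const uint16_t map[256], uint32_t pair[65536])
{
    for (uint32_t i = 0; i < 65536; ++i) {
        uint32_t first  = map[i & 0xFF];         /* pixel at lower address  */
        uint32_t second = map[(i >> 8) & 0xFF];  /* pixel at higher address */
        pair[i] = first | (second << 16);
    }
}

/* Hypothetical helper: convert one row of 8-bit pixels to 16-bit pixels,
 * two source pixels per table lookup. */
static void blit_row_1to2(const uint8_t *src, uint16_t *dst, size_t width,
                          const uint16_t map[256], const uint32_t pair[65536])
{
    size_t x = 0;

    for (; x + 1 < width; x += 2) {
        uint16_t two_src;
        uint32_t two_dst;

        /* memcpy avoids unaligned access and strict-aliasing problems */
        memcpy(&two_src, src + x, sizeof two_src);
        two_dst = pair[two_src];
        memcpy(dst + x, &two_dst, sizeof two_dst);
    }

    /* Odd width: convert the last pixel with the original 256-entry map */
    if (x < width)
        dst[x] = map[src[x]];
}
```

The output should match the one-pixel-at-a-time loop exactly; only the number of table lookups per row changes. On a big-endian machine the two halves of each table entry would have to be swapped.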

I tried this version on my game and gained about five frames per second (28 -> 32).
Valgrind tells me the new version is about twice as fast as the
previous one. Obviously it requires a 64K-entry table (256 KB if each entry is
32 bits) for every palette table, though in my case there is only one, and cache
misses will increase (the new table is a lot bigger than the old one).
Nevertheless, I gained 5 frames on a Geode processor, which has a _very_ small
cache.

Do you think it would be worthwhile to create a patch and submit it to this
newsgroup? I ask because a lot of work is still needed to
turn my test code into a real patch :-)

Gianni




