[SDL] [PATCH] Altivec blitters...
bob at redivi.com
Sat Feb 19 20:58:53 PST 2005
On Feb 18, 2005, at 11:14 PM, Bob Ippolito wrote:
> On Feb 15, 2005, at 7:03, Ryan C. Gordon wrote:
>> Here's another revision of my Altivec blitter patch. This one cleans
>> up a lot of the FIXMEs and generalizes the 32bit-to-32bit swizzler to
>> be able to convert between any 8888 format generically. It obsoletes
>> the previous patch, and can be applied directly to CVS.
>> I've also added a new program to the test directory named
>> testblitspeed.c; it's got about a million options, but to get an idea
>> of how the Altivec code path did against the standard C version, I
>> ran this:
>> ./testblitspeed --dstbpp 32 --dstwidth 640 --dstheight 480 --srcbpp 32
>> --srcwidth 640 --srcheight 480 --seconds 10 --dstrmask 0x00FF0000
>> --dstgmask 0x0000FF00 --dstbmask 0x000000FF --dstamask 0x00000000
>> --srcrmask 0x000000FF --srcgmask 0x00FF0000 --srcbmask 0x0000FF00
>> --srcamask 0x00000000
>> The Altivec code was more than 3 times faster than the C codepath in
>> the above test.
>> testblitspeed is in CVS, the Altivec patch is attached. I'd like to
>> hear from PowerPC users that aren't MacOS-based to make sure this
>> compiles cleanly and functions elsewhere.
>> Ideally, we'd get 32->16 (or more importantly, 16->32) Altivec
>> blitters in here to complete the speed boost for the rest of the
>> feasible scenarios, but I have no plans to do these.
> Here's my revised version of the patch.
> It's largely just a cleanup, but provides a marginal amount of extra
> functionality. Only tested on Mac OS X 10.3:
> - I made sure the configure.in checks to see that the syntax extension
> you're using compiles.
> - It no longer tries to execute the code (in the case you're compiling
> altivec support on a G3 for some reason)
> - It checks for altivec on darwin (I assume you were testing from
> - Checks a sysctl to see if it should use prefetch or not (if L3 cache
> present, or not OS X, it uses prefetch -- optimal for G4)
> - Instead of changing the size of the blit function table, I changed
> one of the fields to be a bitflag rather than a bool. Right now MMX
> is 1, Altivec is 2, and don't-use-prefetch is 4.
> - prefetch and no-prefetch 32-32 blits are separate functions (could
> be the same function with userdata I guess).
> Using the same test as above, I was able to reproduce the 3x speed
> bump on a dual 2 GHz G5 (with second CPU disabled cause it's broken,
> I'm going to profile some real-world SDL games (specifically the ones
> that I sort-of-maintain OS X ports for) to see which of the other blit
> functions I should vectorize, if any.
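To illustrate the bitflag change described in the quoted list above, here is a minimal sketch in C of how a blit table keyed on feature bits might be searched. Only the values 1, 2 and 4 come from the message; the names BLIT_FEATURE_*, blit_kind, blit_entry, blit_table and choose_blit are hypothetical and are not taken from the patch, and an enum stands in for the real function pointers.

```c
#include <stddef.h>

/* Feature bits. The numeric values are the ones quoted in the message;
   the identifier names are made up for this sketch. */
enum {
    BLIT_FEATURE_MMX         = 1,
    BLIT_FEATURE_ALTIVEC     = 2,
    BLIT_FEATURE_NO_PREFETCH = 4
};

/* Stand-in for the blit function pointers a real table would hold. */
enum blit_kind {
    BLIT_C,
    BLIT_ALTIVEC_PREFETCH,
    BLIT_ALTIVEC_NO_PREFETCH
};

struct blit_entry {
    unsigned required;      /* every bit set here must be present at runtime */
    enum blit_kind kind;
};

/* Most specific entries first: the first match wins. */
static const struct blit_entry blit_table[] = {
    { BLIT_FEATURE_ALTIVEC | BLIT_FEATURE_NO_PREFETCH, BLIT_ALTIVEC_NO_PREFETCH },
    { BLIT_FEATURE_ALTIVEC,                            BLIT_ALTIVEC_PREFETCH },
    { 0,                                               BLIT_C }
};

/* Return the first entry whose required bits are all set in `features`. */
static enum blit_kind choose_blit(unsigned features)
{
    size_t i;
    for (i = 0; i < sizeof blit_table / sizeof blit_table[0]; i++) {
        if ((blit_table[i].required & features) == blit_table[i].required)
            return blit_table[i].kind;
    }
    return BLIT_C; /* not reached while the table ends with a 0 entry */
}
```

With this ordering, choose_blit(BLIT_FEATURE_ALTIVEC | BLIT_FEATURE_NO_PREFETCH) selects the no-prefetch variant, and a machine that reports no features falls through to the plain C entry. A single flags field lets one table carry several CPU-specific variants without adding columns.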
I've revised the patch again.
After trying it on Blob Wars, I saw that the original calc_swizzle32
wasn't implemented correctly, so I rewrote it. I also noticed that a
lot of the memory shuffling that it does might not be necessary on OS
X, but since it's so much faster as-is I'm not going to prematurely
It still has some issues, for example, the configure test that checks
to see whether altivec code will compile is probably going to fail on a
G3 because main() will have vector stuff in it. I can't really test
that, so I won't fix it. It's also not tested anywhere but OS X, so it
probably won't work elsewhere unless someone with another PPC platform
is willing to step up and test.
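For anyone who wants to check results from another platform, the following is a scalar reference sketch in C of what a generic 8888-to-8888 swizzle has to compute. It is not the calc_swizzle32 from the patch, which builds a vector permute mask instead. The names fmt8888, mask_shift, swizzle_pixel, example_src and example_dst are hypothetical, and the sketch assumes every non-zero channel mask is a contiguous 8-bit field. The example masks are the ones from the testblitspeed command quoted above.

```c
#include <stdint.h>

/* Channel masks of a 32-bit 8888 pixel format (hypothetical struct). */
struct fmt8888 {
    uint32_t rmask, gmask, bmask, amask;
};

/* Masks from the testblitspeed invocation quoted earlier in the thread. */
static const struct fmt8888 example_src = {
    0x000000FFu, 0x00FF0000u, 0x0000FF00u, 0x00000000u
};
static const struct fmt8888 example_dst = {
    0x00FF0000u, 0x0000FF00u, 0x000000FFu, 0x00000000u
};

/* Bit position of the lowest set bit of a mask, e.g. 0x00FF0000 -> 16.
   Returns -1 when the mask is empty (channel not present). */
static int mask_shift(uint32_t mask)
{
    int shift = 0;
    if (mask == 0)
        return -1;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/* Move each 8-bit channel of `px` from its source position to its
   destination position. A channel missing on either side is skipped,
   so the corresponding destination bits stay 0 in this sketch. */
static uint32_t swizzle_pixel(uint32_t px,
                              const struct fmt8888 *src,
                              const struct fmt8888 *dst)
{
    uint32_t smask[4];
    uint32_t dmask[4];
    uint32_t out = 0;
    int c;

    smask[0] = src->rmask; smask[1] = src->gmask;
    smask[2] = src->bmask; smask[3] = src->amask;
    dmask[0] = dst->rmask; dmask[1] = dst->gmask;
    dmask[2] = dst->bmask; dmask[3] = dst->amask;

    for (c = 0; c < 4; c++) {
        int ss = mask_shift(smask[c]);
        int ds = mask_shift(dmask[c]);
        if (ss < 0 || ds < 0)
            continue;
        out |= ((px >> ss) & 0xFFu) << ds;
    }
    return out;
}
```

Running a handful of pixels through swizzle_pixel and comparing them with the output of the vectorized path is a cheap way to spot a wrong permute mask. How the patch fills the alpha byte when the destination has no alpha mask is not covered here.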
Now, off to do some more profiling and perhaps implement the 16bit