[SDL] [PATCH] Altivec blitters...

Bob Ippolito bob at redivi.com
Thu Jan 6 21:41:01 PST 2005

On Jan 6, 2005, at 10:15, Ryan C. Gordon wrote:

> I keep hearing from people who complain that SDL is slow on MacOSX. 
> Having shipped commercial projects using it, I couldn't ever 
> understand why they'd say that. I think I figured out why: the GL 
> codepath is fast, but the 2D codepaths are not.
> I was surprised to find that a 2D MacOSX project I am working on was 
> sitting in BlitNtoN() for 55% of my CPU time, so I set out to optimize 
> a little.
> The project in question blits a 32-bit surface to the screen surface 
> once per frame, usually the whole 640x480 area (but less, in some 
> cases). Preconverting or producing source surfaces in screen format 
> isn't practical, so the conversion gets done in SDL_BlitSurface(). The 
> application wants to write exclusively to a BGRA8888 surface. MacOS is 
> handing me an ARGB8888 surface, so a blit requires some basic swizzling 
> but no serious conversion. Since SDL has no optimized blitters for 
> anything but MMX-based CPUs, we fall into BlitNtoN, which is inefficient 
> for several reasons, even for scalar code.
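
In packed 32-bit terms, the swizzle described above is a pure byte reversal, since no channel changes width. A minimal scalar sketch, assuming BGRA8888 means B in the high byte and A in the low byte, and ARGB8888 means A in the high byte and B in the low byte (the function names are hypothetical, not SDL's):

```c
#include <stdint.h>

/* Hypothetical helper: convert one packed BGRA8888 pixel
 * (B in bits 24-31, G 16-23, R 8-15, A 0-7) to ARGB8888
 * (A in bits 24-31, R 16-23, G 8-15, B 0-7).
 * With these layouts the conversion is a pure byte reversal. */
static uint32_t bgra_to_argb(uint32_t p)
{
    return ((p & 0x000000FFu) << 24) |  /* A: byte 0 -> byte 3 */
           ((p & 0x0000FF00u) << 8)  |  /* R: byte 1 -> byte 2 */
           ((p & 0x00FF0000u) >> 8)  |  /* G: byte 2 -> byte 1 */
           ((p & 0xFF000000u) >> 24);   /* B: byte 3 -> byte 0 */
}

/* Convert one row; no per-channel scaling or masking by format is needed. */
static void swizzle_row(const uint32_t *src, uint32_t *dst, int width)
{
    for (int i = 0; i < width; ++i)
        dst[i] = bgra_to_argb(src[i]);
}
```

For example, bgra_to_argb(0x11223344) yields 0x44332211. A fully generic converter driven by arbitrary channel masks has to unpack and repack every channel per pixel, which is more work than this case needs.
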
> Attached is a patch to add the start of Altivec-based blitters. 
> Besides the needed structure, I've filled in just the one function, 
> which swizzles from one 8888 format to another, and even there, just 
> the format I need for my project vs what OSX gives me. Adding new 
> swizzlers can use the same function, at the cost of 64 bytes of data 
> per swizzler. It's a total win.
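
Because an 8888-to-8888 swizzle only reorders bytes, AltiVec's vec_perm can handle four pixels in one vperm given a 16-byte selector, and a new swizzler is just a new selector table. The sketch below is a portable scalar model of that idea, not the patch's code: perm16 mimics vec_perm's byte selection, and the table and function names are hypothetical. Real AltiVec code would load and store 16-byte vectors (with alignment handling) instead of looping over bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of AltiVec vec_perm(a, b, mask): output byte i is byte
 * (mask[i] & 0x1F) of the 32-byte concatenation a:b. */
static void perm16(const uint8_t a[16], const uint8_t b[16],
                   const uint8_t mask[16], uint8_t out[16])
{
    uint8_t tmp[16];                 /* lets out alias a or b */
    for (int i = 0; i < 16; ++i) {
        uint8_t sel = mask[i] & 0x1F;
        tmp[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
    memcpy(out, tmp, 16);
}

/* Hypothetical swizzle table: reverse the bytes of each of four pixels,
 * i.e. BGRA8888 -> ARGB8888 in big-endian (PowerPC) memory order. */
static const uint8_t bgra_to_argb_mask[16] = {
     3,  2,  1,  0,   7,  6,  5,  4,
    11, 10,  9,  8,  15, 14, 13, 12
};

/* Swizzle npixels 32-bit pixels: four at a time through perm16, then a
 * per-pixel tail. Assumes mask[0..3] index within the first pixel. */
static void swizzle_8888(const uint8_t *src, uint8_t *dst, size_t npixels,
                         const uint8_t mask[16])
{
    size_t i = 0;
    for (; i + 4 <= npixels; i += 4)
        perm16(src + 4 * i, src + 4 * i, mask, dst + 4 * i);
    for (; i < npixels; ++i) {
        uint8_t px[4];               /* copy first so src == dst is safe */
        memcpy(px, src + 4 * i, 4);
        for (int k = 0; k < 4; ++k)
            dst[4 * i + k] = px[mask[k] & 3];
    }
}
```

Supporting another pair of 8888 formats in this model only means adding another 16-entry table; the loop stays the same.
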
> Other blitters (16-bit handlers, etc) would need more work, but are 
> possible.
> The end result was hard to gauge, since Shark seems to kill the 
> performance boost you'd get from cache prefetching, but before adding 
> that, the CPU time spent in the blitter dropped from 55% to about 13%. 
> My framerate went from around a consistent 25-27 to 150-300. Once I 
> added prefetching, it went up to over 4500. Not a typo. Like I said, 
> total win.
> Improvements needed:
> - Needs other swizzle data filled in.
> - Needs non 32-bit blitters written.
> - Move this to a separate file; SDL_blit_N.c is getting cluttered.
> - vec_dst gives a HUGE improvement on a G4, but apparently stalls the 
> pipeline on a G5. Someone should fix that by figuring out how to 
> toggle use_software_prefetch to 0 on a G5 system (and how to do that 
> on non-MacOS platforms).
> - Configure.in should let you enable/disable the altivec code, and 
> should let non-Macs (AmigaOS, PowerPC Linux, etc) use it. Right now 
> it's a hardcoded #define to turn it on.
> - Configure.in _must_ add -faltivec to gcc's CFLAGS or it won't 
> compile...I hacked the generated Makefile because I'm lazy.
> - Someone should have MacOSX builds compile with -O3 instead of -O2 
> (this is on Apple's general recommendation that -O3 is a significant 
> boost over -O2, unlike, say, on x86 Linux). -falign-loops=32 can be a 
> big help in some cases (especially in the blitters on a G5, if I had to 
> guess).

This is cool!  I think there should be three code paths though:
- G3 (just use normal C code)
- G4 (use vec_dst)
- G5 (don't use vec_dst)

At startup, you should be able to determine the current architecture 
and pick the right function pointers.
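
A minimal sketch of that dispatch, assuming some platform-specific probe (not shown) has already classified the CPU at startup. The ppc_class enum and all function names are hypothetical, and the two AltiVec variants are placeholders that fall back to the scalar loop where real vec_perm code (with or without vec_dst) would go.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU classes, assumed to be filled in once at startup by a
 * platform-specific probe (not shown). */
typedef enum { PPC_G3, PPC_G4, PPC_G5 } ppc_class;

typedef void (*blit_fn)(const uint32_t *src, uint32_t *dst, size_t n);

/* Plain C path: byte-reverse each pixel (BGRA8888 -> ARGB8888). */
static void blit_scalar(const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t p = src[i];
        dst[i] = (p << 24) | ((p & 0xFF00u) << 8) |
                 ((p >> 8) & 0xFF00u) | (p >> 24);
    }
}

/* Placeholders: real builds would put AltiVec vec_perm loops here, one
 * issuing vec_dst prefetches (G4) and one without (G5). Both fall back
 * to the scalar loop so this sketch runs anywhere. */
static void blit_altivec_dst(const uint32_t *src, uint32_t *dst, size_t n)
{
    blit_scalar(src, dst, n);
}

static void blit_altivec_nodst(const uint32_t *src, uint32_t *dst, size_t n)
{
    blit_scalar(src, dst, n);
}

/* Pick the blitter once; callers keep the returned pointer. */
static blit_fn select_blitter(ppc_class cpu)
{
    switch (cpu) {
    case PPC_G4: return blit_altivec_dst;   /* AltiVec with vec_dst prefetch */
    case PPC_G5: return blit_altivec_nodst; /* AltiVec, skip vec_dst */
    case PPC_G3:
    default:     return blit_scalar;        /* no AltiVec unit */
    }
}
```

The selection happens once (e.g. store select_blitter(cpu) in a static pointer at init), so the per-row cost is a single indirect call rather than a runtime check inside the inner loop.
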

