[SDL] [PATCH] Altivec blitters...
bob at redivi.com
Thu Jan 6 21:41:01 PST 2005
On Jan 6, 2005, at 10:15, Ryan C. Gordon wrote:
> I keep hearing from people that complain that SDL is slow on MacOSX.
> Having shipped commercial projects using it, I couldn't ever
> understand why they'd say that. I think I figured out why: the GL
> codepath is fast, but the 2D codepaths are not.
> I was surprised to find that a 2D MacOSX project I am working on was
> sitting in BlitNtoN() for 55% of my CPU time, so I set out to optimize
> a little.
> The project in question blits a 32-bit surface to the screen surface
> once per frame, usually the whole 640x480 area (but less, in some
> cases). Preconverting or producing source surfaces in screen format
> isn't practical, so the conversion gets done in SDL_BlitSurface(). The
> application wants to write exclusively to a BGRA8888 surface. MacOS is
> handing me an ARGB8888 surface, so a blit requires some basic swizzling
> but no serious conversion. Having no optimized blitters for anything
> but MMX-based CPUs, we fall into BlitNtoN, which is inefficient for
> several reasons, even for scalar code.
> Attached is a patch to add the start of Altivec-based blitters.
> Besides the needed structure, I've filled in just the one function,
> which swizzles from one 8888 format to another, and even there, just
> the format I need for my project vs what OSX gives me. Adding new
> swizzlers can use the same function, at the cost of 64 bytes of data
> per swizzler. It's a total win.
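
For anyone following along without the patch open, here is a rough scalar sketch of what an 8888-to-8888 swizzle does. It is not the code from the patch: the Swizzle8888 struct, the kBGRAtoARGB table and swizzle_row_8888() are made-up names for illustration, and the 4-byte table here is not the 64-byte-per-swizzler layout the patch uses. It also assumes the bytes sit in memory in the order the format names read (B,G,R,A in, A,R,G,B out).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of SDL: for each destination byte
   of a pixel, the index of the source byte that should land there. */
typedef struct {
    uint8_t src_index[4];
} Swizzle8888;

/* Assumed memory order: source is B,G,R,A and destination is A,R,G,B,
   so destination byte 0 (A) comes from source byte 3, and so on. */
const Swizzle8888 kBGRAtoARGB = { { 3, 2, 1, 0 } };

/* Reorder the four bytes of every pixel in one row.
   src and dst must not overlap; each holds pixels * 4 bytes. */
void swizzle_row_8888(const uint8_t *src, uint8_t *dst,
                      size_t pixels, const Swizzle8888 *map)
{
    size_t i;
    for (i = 0; i < pixels; ++i) {
        const uint8_t *s = src + 4 * i;
        uint8_t *d = dst + 4 * i;
        d[0] = s[map->src_index[0]];
        d[1] = s[map->src_index[1]];
        d[2] = s[map->src_index[2]];
        d[3] = s[map->src_index[3]];
    }
}
```

Supporting another 8888 pair in this sketch only means adding another table, which is the same idea as "adding new swizzlers can use the same function" above. A vector version would presumably apply one permute to several pixels at a time instead of going byte by byte.
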
> Other blitters (16-bit handlers, etc) would need more work, but are
> The end result was hard to gauge, since Shark seems to kill the
> performance boost you'd get from cache prefetching, but before adding
> that, the CPU time spent in the blitter dropped from 55% to about 13%.
> My framerate went from around a consistent 25-27 to 150-300. Once I
> added prefetching, it went up to over 4500. Not a typo. Like I said,
> total win.
> Improvements needed:
> - Needs other swizzle data filled in.
> - Needs non 32-bit blitters written.
> - Move this to a separate file; SDL_blit_N.c is getting cluttered.
> - vec_dst gives a HUGE improvement on a G4, but apparently stalls the
> pipeline on a G5. Someone should fix that by figuring out how to
> toggle use_software_prefetch to 0 on a G5 system (and how to do that
> on non-MacOS platforms).
> - Configure.in should let you enable/disable the altivec code, and
> should let non-Macs (AmigaOS, PowerPC Linux, etc) use it. Right now
> it's a hardcoded #define to turn it on.
> - Configure.in _must_ add -faltivec to gcc's CFLAGS or it won't
> compile...I hacked the generated Makefile because I'm lazy.
> - Someone should have MacOSX builds compile with -O3 instead of -O2
> (this comes at Apple's general recommendation that O3 is a significant
> boost over O2, unlike, say, x86 Linux). -falign-loops=32 can be a big
> help in some cases (especially in the blitters on a G5, if I had to
This is cool! I think there should be three code paths though:
- G3 (just use normal C code)
- G4 (use vec_dst)
- G5 (don't use vec_dst)
At startup, you should be able to determine the current architecture
and pick the right function pointers.
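
Something along these lines is what I have in mind. This is only a sketch: CpuKind, BlitFunc, choose_blitter() and the three blit_* functions are names I made up, none of them exist in SDL. How the CPU kind gets detected is left to the caller, since that part is platform-specific. The two "Altivec" variants are plain C stand-ins so the example compiles anywhere; real ones would use vector code, with and without vec_dst.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU classification, filled in by platform-specific code. */
typedef enum { CPU_G3, CPU_G4, CPU_G5 } CpuKind;

/* Hypothetical blitter signature: convert a run of 32-bit pixels. */
typedef void (*BlitFunc)(const uint32_t *src, uint32_t *dst, size_t pixels);

/* Shared scalar body: reverse the byte order of each 32-bit pixel,
   which turns a 0xBBGGRRAA value into 0xAARRGGBB. */
static void swap_bytes_row(const uint32_t *src, uint32_t *dst, size_t pixels)
{
    size_t i;
    for (i = 0; i < pixels; ++i) {
        uint32_t p = src[i];
        dst[i] = (p >> 24) | ((p >> 8) & 0x0000FF00u) |
                 ((p << 8) & 0x00FF0000u) | (p << 24);
    }
}

/* G3: normal C code. */
static void blit_scalar(const uint32_t *src, uint32_t *dst, size_t pixels)
{
    swap_bytes_row(src, dst, pixels);
}

/* G4: stand-in for an Altivec blitter that prefetches with vec_dst. */
static void blit_altivec_prefetch(const uint32_t *src, uint32_t *dst,
                                  size_t pixels)
{
    swap_bytes_row(src, dst, pixels);
}

/* G5: stand-in for an Altivec blitter without vec_dst. */
static void blit_altivec_noprefetch(const uint32_t *src, uint32_t *dst,
                                    size_t pixels)
{
    swap_bytes_row(src, dst, pixels);
}

/* Pick the blitter once at startup and keep the pointer around. */
static BlitFunc choose_blitter(CpuKind cpu)
{
    switch (cpu) {
    case CPU_G4: return blit_altivec_prefetch;
    case CPU_G5: return blit_altivec_noprefetch;
    case CPU_G3:
    default:     return blit_scalar;
    }
}
```

The point of doing it this way is that the choice is made once, so the per-blit cost is a single indirect call and there is no use_software_prefetch check inside the inner loop.
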