[SDL] Segmentation Fault (Parachute Deployed)
zombie_1985 at hotmail.com
Sat Jan 22 20:35:42 PST 2005
Ryan C. Gordon wrote:
> I keep hearing from people that complain that SDL is slow on MacOSX.
> Having shipped commercial projects using it, I couldn't ever
> understand why they'd say that. I think I figured out why: the GL
> codepath is fast, but the 2D codepaths are not.
> I was surprised to find that a 2D MacOSX project I am working on was
> sitting in BlitNtoN() for 55% of my CPU time, so I set out to optimize
> a little.
> The project in question blits a 32-bit surface to the screen surface
> once per frame, usually the whole 640x480 area (but less, in some
> cases). Preconverting or producing source surfaces in screen format
> isn't practical, so the conversion gets done in SDL_BlitSurface(). The
> application wants to write exclusively to a BGRA8888 surface. MacOS is
> handing me an ARGB8888 surface, so a blit requires some basic swizzling
> but no serious conversion. Having no optimized blitters for anything
> but MMX-based CPUs, we fall into BlitNtoN, which is inefficient for
> several reasons, even for scalar code.
> Attached is a patch to add the start of Altivec-based blitters.
> Besides the needed structure, I've filled in just the one function,
> which swizzles from one 8888 format to another, and even there, just
> the format I need for my project vs what OSX gives me. Adding new
> swizzlers can use the same function, at the cost of 64 bytes of data
> per swizzler. It's a total win.
> Other blitters (16-bit handlers, etc) would need more work, but are
> The end result was hard to gauge, since Shark seems to kill the
> performance boost you'd get from cache prefetching, but before adding
> that, the CPU time spent in the blitter dropped from 55% to about 13%.
> My framerate went from around a consistent 25-27 to 150-300. Once I
> added prefetching, it went up to over 4500. Not a typo. Like I said,
> total win.
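For anyone following along, the swizzle described in the quoted message can be sketched in plain scalar C. This is not the attached Altivec patch, just a rough illustration of the per-pixel work. It assumes SDL-style mask naming, where BGRA8888 keeps B in the top byte of the 32-bit pixel and ARGB8888 keeps A there, so the conversion is a reversal of the four channels. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the four 8-bit channels of one packed pixel.
 * Assumes BGRA8888 = B in bits 24-31 ... A in bits 0-7, and
 * ARGB8888 = A in bits 24-31 ... B in bits 0-7. */
static inline uint32_t swizzle_bgra_to_argb(uint32_t p)
{
    return (p >> 24) |
           ((p >> 8) & 0x0000FF00u) |
           ((p << 8) & 0x00FF0000u) |
           (p << 24);
}

/* Hypothetical helper: convert one row of n pixels from src to dst. */
static inline void swizzle_row(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = swizzle_bgra_to_argb(src[i]);
}
```

A vector version would do the same byte shuffle on several pixels at once instead of shifting and masking one pixel at a time.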
Only half related to this patch, but thinking about some older SDL
discussion wrt portable vector assembly, the idea occurred to me that it
is possible to use prefetch in a portable fashion. So I wrapped a
prefetch macro that does this on ia64 and ia32 (the ia32 one also works
in x86_64 mode of course), with quite a bit of success.
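To give a rough idea of the shape of such a wrapper, here is a minimal sketch. It is not the macro I actually used: it leans on the GCC/Clang builtin `__builtin_prefetch` rather than per-architecture instructions, and falls back to a no-op elsewhere. The names `PREFETCH_READ`, `PREFETCH_AHEAD` and `copy_row_prefetch`, and the distance of 64 pixels, are made up for the example and would need tuning per CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable prefetch wrapper: use the compiler builtin where
 * available, otherwise compile to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 0)
#else
#define PREFETCH_READ(addr) ((void)(addr))
#endif

/* Assumed values: 16 32-bit pixels per 64-byte cache line, and a
 * prefetch distance of 64 pixels. */
#define PIXELS_PER_LINE 16
#define PREFETCH_AHEAD  64

/* Copy n pixels, hinting the cache about source data a few lines ahead. */
static inline void copy_row_prefetch(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Issue one hint per cache line, and only for addresses inside
         * the source row. */
        if ((i % PIXELS_PER_LINE) == 0 && i + PREFETCH_AHEAD < n)
            PREFETCH_READ(src + i + PREFETCH_AHEAD);
        dst[i] = src[i];
    }
}
```

The prefetch is only a hint, so the loop produces the same result with or without it; whether it helps depends on the CPU and the access pattern.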
So, what about having SDL blitters with portable prefetch support as a
second choice of optimization? If you bring the powerpc prefetch into the
mix, that's 4 architectures supported. This would allow fairly good
optimizations for a wider range of systems than just ia32 cpus, with a
lot less work.
Also, if it's 2D and you have OpenGL (who doesn't on OSX?), try glSDL :)