[SDL] Segmentation Fault (Parachute Deployed)

Daniel zombie_1985 at hotmail.com
Sat Jan 22 20:35:42 PST 2005


Ryan C. Gordon wrote:

>
> I keep hearing from people that complain that SDL is slow on MacOSX. 
> Having shipped commercial projects using it, I couldn't ever 
> understand why they'd say that. I think I figured out why: the GL 
> codepath is fast, but the 2D codepaths are not.
>
> I was surprised to find that a 2D MacOSX project I am working on was 
> sitting in BlitNtoN() for 55% of my CPU time, so I set out to optimize 
> a little.
>
> The project in question blits a 32-bit surface to the screen surface 
> once per frame, usually the whole 640x480 area (but less, in some 
> cases). Preconverting or producing source surfaces in screen format 
> isn't practical, so the conversion gets done in SDL_BlitSurface(). The 
> application wants to write exclusively to a BGRA8888 surface. MacOS is 
> handing me an ARGB8888 surface, so a blit requires some basic swizzling 
> but no serious conversion. Having no optimized blitters for anything 
> but MMX-based CPUs, we fall into BlitNtoN, which is inefficient for 
> several reasons, even for scalar code.
>
> Attached is a patch to add the start of Altivec-based blitters. 
> Besides the needed structure, I've filled in just the one function, 
> which swizzles from one 8888 format to another, and even there, just 
> the format I need for my project vs what OSX gives me. Adding new 
> swizzlers can use the same function, at the cost of 64 bytes of data 
> per swizzler. It's a total win.
>
> Other blitters (16-bit handlers, etc) would need more work, but are 
> possible.
>
> The end result was hard to gauge, since Shark seems to kill the 
> performance boost you'd get from cache prefetching, but before adding 
> that, the CPU time spent in the blitter dropped from 55% to about 13%. 
> My framerate went from around a consistent 25-27 to 150-300. Once I 
> added prefetching, it went up to over 4500. Not a typo. Like I said, 
> total win.
>

Only half related to this patch, but while thinking about some older SDL 
discussion on portable vector assembly, it occurred to me that prefetch 
can be used in a portable fashion. So I wrapped prefetch in a macro that 
works on ia64 and ia32 (the ia32 version also works in x86_64 mode, of 
course), with quite a bit of success.
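
To make the idea concrete, here is a minimal sketch of such a wrapper. 
It is not the macro described above: it assumes a GCC-compatible 
compiler and leans on __builtin_prefetch, which the compiler lowers to 
the native prefetch instruction where one exists. The names 
PORTABLE_PREFETCH, PREFETCH_AHEAD and sum_with_prefetch are hypothetical 
and not part of SDL; the 64-byte distance is a guess at a cache line 
size, not a measured value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper, not part of SDL. On GCC-compatible compilers
 * this becomes a native prefetch hint; elsewhere it is a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define PORTABLE_PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#  define PORTABLE_PREFETCH(addr) ((void)(addr))
#endif

/* Assumed prefetch distance in bytes (a guess at one cache line). */
#define PREFETCH_AHEAD 64

/* Sums a byte buffer, hinting the cache one line ahead of the read
 * position. The bounds check keeps the hinted address inside the
 * buffer, so no out-of-range pointer is ever formed. */
uint32_t sum_with_prefetch(const uint8_t *data, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if ((i % PREFETCH_AHEAD) == 0 && i + PREFETCH_AHEAD < n)
            PORTABLE_PREFETCH(data + i + PREFETCH_AHEAD);
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without 
it; whether it helps depends on the CPU and the access pattern.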

So, what about having SDL blitters with portable prefetch support as a 
second choice of optimization? If you bring the PowerPC prefetch into 
the mix, that's 4 architectures supported. This would allow fairly good 
optimizations for a wider range of systems than just ia32 CPUs, with a 
lot less work.
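
As a rough illustration of what such a blitter could look like, here is 
a scalar sketch of the 8888 swizzle from the quoted message with a 
prefetch hint added. It is not code from SDL or from the patch. It 
assumes that BGRA8888 and ARGB8888 name the channel order from the most 
to the least significant byte of a packed 32-bit pixel, in which case 
the swizzle is a plain byte reversal. BLIT_PREFETCH, PIXELS_AHEAD, 
bgra_to_argb and blit_bgra_to_argb are hypothetical names, pitches are 
given in pixels rather than bytes, and the prefetch relies on the 
GCC-style __builtin_prefetch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch wrapper; a no-op on non-GCC-style compilers. */
#if defined(__GNUC__) || defined(__clang__)
#  define BLIT_PREFETCH(addr) __builtin_prefetch((addr), 0, 0)
#else
#  define BLIT_PREFETCH(addr) ((void)(addr))
#endif

/* Assumed prefetch distance: 16 pixels = 64 bytes (a guess). */
#define PIXELS_AHEAD 16

/* Packed BGRA (B in the top byte) to packed ARGB (A in the top byte):
 * under the stated assumption this reverses the four bytes. */
uint32_t bgra_to_argb(uint32_t p)
{
    return ((p & 0x000000FFu) << 24) |
           ((p & 0x0000FF00u) << 8)  |
           ((p & 0x00FF0000u) >> 8)  |
           ((p & 0xFF000000u) >> 24);
}

/* Converts a width x height block row by row. Pitches are in pixels.
 * The source row is hinted 16 pixels ahead, never past the row end. */
void blit_bgra_to_argb(const uint32_t *src, size_t src_pitch,
                       uint32_t *dst, size_t dst_pitch,
                       size_t width, size_t height)
{
    size_t x, y;

    for (y = 0; y < height; y++) {
        const uint32_t *s = src + y * src_pitch;
        uint32_t *d = dst + y * dst_pitch;

        for (x = 0; x < width; x++) {
            if ((x % PIXELS_AHEAD) == 0 && x + PIXELS_AHEAD < width)
                BLIT_PREFETCH(s + x + PIXELS_AHEAD);
            d[x] = bgra_to_argb(s[x]);
        }
    }
}
```

A vector version would replace the inner loop but could keep the same 
prefetch hint, which is the portable part.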

Also, if it's 2D and you have OpenGL (who doesn't on OSX?), try glSDL :)

Stephane






