[SDL] [PATCH] Altivec blitters...
Ryan C. Gordon
icculus at clutteredmind.org
Sat Feb 19 04:39:58 PST 2005
> - I made sure the configure.in checks to see that the syntax extension
> you're using compiles.
Apparently Apple's GCC uses -faltivec, but the FSF GCC uses -maltivec
and specifies vector constants differently; can someone on Linux/PPC
please test this patch?
> - It no longer tries to execute the code (in the case you're compiling
> altivec support on a G3 for some reason)
You need to put it in a separate, non-inline function if you use vector
intrinsics: GCC inserts an Altivec opcode at the top of the function if
it sees a vector thing...so it'll still crash on a G3 as-is; the "if
(0)" isn't enough to prevent it.
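
To illustrate the point about keeping vector code out of the caller, here is a minimal sketch, not the code from the patch. The names copy32, copy32_altivec and copy32_scalar are made up for the example, and the cpu_has_altivec flag stands in for whatever runtime check the caller already has. The Altivec path is only compiled when the compiler defines __ALTIVEC__, and it is only taken when both pointers are 16-byte aligned, since vec_ld and vec_st ignore the low address bits.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>

/* All vector intrinsics live in this function. noinline asks GCC not to
   merge it into the caller, so any Altivec setup the compiler emits stays
   behind the runtime check in copy32(). */
static __attribute__((noinline))
void copy32_altivec(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
    /* Four 32-bit pixels per 16-byte vector; caller guarantees alignment. */
    for (; i + 4 <= n; i += 4) {
        vector unsigned int v = vec_ld(0, (const unsigned int *)(src + i));
        vec_st(v, 0, (unsigned int *)(dst + i));
    }
    /* Scalar tail for the last 0..3 pixels. */
    for (; i < n; i++) {
        dst[i] = src[i];
    }
}
#endif

/* Plain C fallback, safe on any CPU. */
static void copy32_scalar(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Dispatcher: contains no vector types or intrinsics itself.
   cpu_has_altivec is a hypothetical flag supplied by the caller. */
void copy32(uint32_t *dst, const uint32_t *src, size_t n, int cpu_has_altivec)
{
#if defined(__ALTIVEC__)
    if (cpu_has_altivec &&
        ((((uintptr_t)dst) | ((uintptr_t)src)) & 15) == 0) {
        copy32_altivec(dst, src, n);
        return;
    }
#else
    (void)cpu_has_altivec; /* no Altivec path compiled in */
#endif
    copy32_scalar(dst, src, n);
}
```

The dispatcher is the only function a G3 ever enters, which is the property the "if (0)" test cannot give you when the intrinsics sit in the same function.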
> - It checks for altivec on darwin (I assume you were testing from
I tested from the command line, but not on a real Darwin system, just
> - Checks a sysctl to see if it should use prefetch or not (if L3 cache
> present, or not OS X, it uses prefetch -- optimal for G4)
I got a huge boost on my PowerBook (L2, but not L3 cache) with the
prefetch. On a G5, the prefetch instructions cause pipeline stalls
(which seems a really silly design decision from where I'm sitting, but
whatever), so those should always avoid the prefetch. The G5, however,
starts automatically prefetching when you touch a few cachelines
linearly, which we do in this function, so it should get the same result
as long as you don't try to force it with vec_dst().
I'm not sure how to check for this reliably; there's a way to ask MacOS
"am I on a G5?" but I'm not sure what that does when you are one day on
a G6...there might be a sysctl or Gestalt to query if there's an
automatic prefetch, though.
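
For reference, the L3 check described in the quoted patch notes could look roughly like the sketch below. It does not answer the G5 question: I don't know of a key that reports automatic prefetching, so none is used here. The function names l3_cache_size and should_prefetch are made up for the example, and it assumes the Darwin sysctl key "hw.l3cachesize" is what the patch queries.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns the L3 cache size in bytes, or 0 if there is no L3 cache or the
   size cannot be determined. Only the Darwin branch asks the OS. */
uint64_t l3_cache_size(void)
{
#if defined(__APPLE__)
    uint64_t size = 0;
    size_t len = sizeof(size);
    /* Assumption: "hw.l3cachesize" is the key being checked. */
    if (sysctlbyname("hw.l3cachesize", &size, &len, NULL, 0) != 0) {
        return 0;
    }
    return size;
#else
    return 0;
#endif
}

/* Policy as stated in the quoted notes: prefetch if an L3 cache is
   present, or if we are not on OS X at all. */
int should_prefetch(void)
{
#if defined(__APPLE__)
    return l3_cache_size() > 0;
#else
    return 1;
#endif
}
```

Note that this policy would turn prefetch off on a machine like my PowerBook, which has no L3 but clearly benefits from it.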
The existence of G5-style prefetching is the only time we should avoid
> - prefetch and no-prefetch 32-32 blits are separate functions (could be
> the same function with userdata I guess).
There were three conditionals regardless of dataset; I'm not really sure
it's worth splitting it into a separate function.
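
A rough sketch of the single-function alternative, under some loud assumptions: blit32_copy is a made-up name, use_prefetch stands in for a flag carried in the blit's userdata, the 8-pixel chunk assumes a 32-byte cache line, and GCC's __builtin_prefetch is used as a portable stand-in for vec_dst() so the example builds on non-PowerPC machines.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n 32-bit pixels, optionally hinting the next chunk of the source.
   The flag is tested once per chunk, not once per pixel. */
void blit32_copy(uint32_t *dst, const uint32_t *src, size_t n,
                 int use_prefetch)
{
    const size_t chunk = 8; /* assumed: 32-byte cache line = 8 pixels */

    for (size_t i = 0; i < n; i += chunk) {
        size_t end = (i + chunk < n) ? (i + chunk) : n;

        /* Only hint when a full further chunk exists, so the pointer
           passed to the builtin always points inside the source. */
        if (use_prefetch && end + chunk <= n) {
            __builtin_prefetch(src + end, 0, 0);
        }

        for (size_t j = i; j < end; j++) {
            dst[j] = src[j];
        }
    }
}
```

On a G5 the caller would pass 0 for use_prefetch and let the hardware's own linear prefetch do the work.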
> Using the same test as above, I was able to reproduce the 3x speed bump
> on a dual 2ghz G5 (with second CPU disabled cause it's broken, argh).
A broken G5? That sucks!
> I'm going to profile some real-world SDL games (specifically the ones
> that I sort-of-maintain OS X ports for) to see which of the other blit
> functions I should vectorize, if any.
There's probably a bunch of games that want to write to a 16-bit surface
regardless of the screen format...there are also a LOT of people that
think running their system in 16-bit color will give them a better
I can't think of a useful way to vectorize 8-bit blits, but there's
probably some clever way to do this.
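
For the 16-bit case mentioned above, this is the per-pixel work a vectorized blitter would have to reproduce, written as a plain scalar sketch. The pixel layouts are assumptions for the example (source 0x00RRGGBB, destination RGB565), and pack565 and blit_8888_to_565 are made-up names, not SDL functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one 0x00RRGGBB pixel into RGB565 by dropping the low bits of
   each channel (5 bits red, 6 bits green, 5 bits blue). */
uint16_t pack565(uint32_t rgb)
{
    uint32_t r = (rgb >> 16) & 0xFF;
    uint32_t g = (rgb >> 8) & 0xFF;
    uint32_t b = rgb & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a run of n pixels from 32-bit to 16-bit. */
void blit_8888_to_565(uint16_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = pack565(src[i]);
    }
}
```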