[SDL] [PATCH] Altivec blitters...

Ryan C. Gordon icculus at clutteredmind.org
Sat Feb 19 04:39:58 PST 2005


> - I made sure the configure.in checks to see that the syntax extension 
> you're using compiles.

Apparently Apple's GCC uses -faltivec, but the FSF GCC uses -maltivec 
and specifies vector constants differently; can someone on Linux/PPC 
please test this patch?

> - It no longer tries to execute the code (in the case you're compiling 
> altivec support on a G3 for some reason)

You need to put it in a separate, non-inline function if you use vector 
intrinsics: GCC inserts an Altivec opcode at the top of the function if 
it sees a vector thing...so it'll still crash on a G3 as-is; the "if 
(0)" isn't enough to prevent it.
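
A minimal sketch of that split, under some loud assumptions: 
cpu_has_altivec(), copy32() and friends are hypothetical names, not 
anything from the patch, and the runtime check is hard-wired to 0 so the 
example runs anywhere. The point is only that the dispatcher contains no 
vector types at all, so nothing Altivec-related can land in its prologue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime check. A real build would ask the OS; this
 * placeholder always says "no" so the sketch is safe on any CPU. */
static int cpu_has_altivec(void)
{
    return 0;
}

#if defined(__ALTIVEC__)
#include <altivec.h> /* assumption: FSF GCC with -maltivec */

/* Every vector type and intrinsic stays inside this one function.
 * noinline stops GCC from folding it into the dispatcher below. */
__attribute__((noinline))
static void copy32_altivec(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    /* vec_ld/vec_st need 16-byte aligned addresses, so only take the
     * vector path when both pointers are aligned. */
    if ((((uintptr_t)dst | (uintptr_t)src) & 15) == 0) {
        for (; i + 4 <= n; i += 4) {
            vector unsigned int v = vec_ld(0, (const unsigned int *)(src + i));
            vec_st(v, 0, (unsigned int *)(dst + i));
        }
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
#endif

/* Plain C fallback, safe on a G3. */
static void copy32_scalar(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Dispatcher: no vector code in here, only a call. */
void copy32(uint32_t *dst, const uint32_t *src, size_t n)
{
    if (cpu_has_altivec()) {
#if defined(__ALTIVEC__)
        copy32_altivec(dst, src, n);
        return;
#endif
    }
    copy32_scalar(dst, src, n);
}
```

Callers only ever see copy32(); which path runs is decided at runtime 
rather than by an "if (0)" sitting in the same function as the vector code.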

> - It checks for altivec on darwin (I assume you were testing from 
> Xcode?)

I tested from the command line, but not on a real Darwin system, just 
Panther.

> - Checks a sysctl to see if it should use prefetch or not (if L3 cache 
> present, or not OS X, it uses prefetch -- optimal for G4)

I got a huge boost on my PowerBook (L2, but not L3 cache) with the 
prefetch. On a G5, the prefetch instructions cause pipeline stalls 
(which seems a really silly design decision from where I'm sitting, but 
whatever), so G5s should always avoid the prefetch. The G5, however, 
starts automatically prefetching when you touch a few cachelines 
linearly, which we do in this function, so it should get the same result 
as long as you don't try to force it with vec_dst().

I'm not sure how to check for this reliably; there's a way to ask MacOS 
"am I on a G5?" but I'm not sure what that does when you are one day on 
a G6...there might be a sysctl or Gestalt to query if there's an 
automatic prefetch, though.

The presence of G5-style automatic prefetching is the only case where 
we should avoid vec_dst*, though.
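
Put as code, the policy above might look like the sketch below. The 
struct, enum and function names are all hypothetical, and how 
has_auto_prefetch would actually get filled in is exactly the open 
question; this only pins down the decision once that is known.

```c
#include <stdbool.h>

/* Hypothetical feature record; nothing here queries real hardware. */
struct cpu_info {
    bool has_altivec;
    bool has_auto_prefetch; /* G5-style automatic hardware prefetch */
};

enum blit_kind {
    BLIT_SCALAR,
    BLIT_ALTIVEC_PREFETCH,   /* uses vec_dst* hints */
    BLIT_ALTIVEC_NOPREFETCH  /* leaves prefetching to the hardware */
};

/* Skip vec_dst* only when the CPU already prefetches on its own.
 * L2 vs. L3 cache does not enter into it. */
enum blit_kind choose_blit(const struct cpu_info *cpu)
{
    if (!cpu->has_altivec)
        return BLIT_SCALAR;
    if (cpu->has_auto_prefetch)
        return BLIT_ALTIVEC_NOPREFETCH;
    return BLIT_ALTIVEC_PREFETCH;
}
```

Keying on the prefetch behavior rather than on "is this a G5?" means a 
later chip with the same behavior would fall into the right branch 
without a new check.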

> - prefetch and no-prefetch 32-32 blits are separate functions (could be 
> the same function with userdata I guess).

There were three conditionals regardless of dataset; I'm not really sure 
it's worth splitting it into a separate function.
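
For comparison, a single-function version with a flag could look 
roughly like this. It is a sketch, not the patch: blit32() and its 
parameters are made up, pitches are counted in pixels rather than bytes, 
and GCC's __builtin_prefetch stands in for vec_dst() so the example 
builds without Altivec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for vec_dst(): a plain prefetch hint where the compiler
 * supports it, a no-op elsewhere. */
#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)0)
#endif

/* One 32-bit to 32-bit blit for both cases; use_prefetch picks the
 * behavior. Pitches are in pixels (an assumption of this sketch). */
void blit32(uint32_t *dst, size_t dst_pitch,
            const uint32_t *src, size_t src_pitch,
            size_t width, size_t height, int use_prefetch)
{
    size_t y;

    for (y = 0; y < height; y++) {
        /* Hint the next source row, but never point past the last one. */
        if (use_prefetch && y + 1 < height)
            PREFETCH(src + (y + 1) * src_pitch);
        memcpy(dst + y * dst_pitch, src + y * src_pitch,
               width * sizeof(uint32_t));
    }
}
```

The flag costs one test per row here, which is small next to the copy 
itself.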

> Using the same test as above, I was able to reproduce the 3x speed bump 
> on a dual 2ghz G5 (with second CPU disabled cause it's broken, argh).

A broken G5? That sucks!

> I'm going to profile some real-world SDL games (specifically the ones 
> that I sort-of-maintain OS X ports for) to see which of the other blit 
> functions I should vectorize, if any.

There's probably a bunch of games that want to write to a 16-bit surface 
regardless of the screen format...there are also a LOT of people that 
think running their system in 16-bit color will give them a better 
framerate.

I can't think of a useful way to vectorize 8-bit blits, but there's 
probably some clever way to do this.

--ryan.




