[SDL] [PATCH] Re: SDL_memcpy variants used in SDL_BlitCopy
avcp-sdlmail at usa.net
Tue Sep 13 15:57:28 PDT 2005
Stephane Marchesin wrote:
> Well, there's a second reason. Most distros are compiled for i386, i586
> or i686 so a memcpy implementation is at most rep movsd, and mmx is not
> used since it's not part of i686 (remember i686 includes pentium pro
> which doesn't have mmx). I'm not sure about source distros, these could
> have an edge on that since you build your own libc, but for some reason
> they don't (not gentoo at least).
I am testing inlined SDL_memcpy variants against an inlined 'rep movsd' (not
libc/msvcrt memcpy()), so no function calls are involved.
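For reference, the kind of row loop being timed looks roughly like the sketch below. This is only an illustration of what srcskip means (the padding bytes between the end of one source row and the start of the next), not the actual SDL_BlitCopy source. The function name and parameters are made up for the example, and memcpy() is just a placeholder for whichever inlined variant is under test.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: copy h rows of w bytes each.
 * srcskip/dstskip are the bytes of padding after each row, so the
 * distance between row starts is w + srcskip (or w + dstskip). */
void blit_copy_rows(unsigned char *dst, const unsigned char *src,
                    size_t w, size_t h,
                    size_t srcskip, size_t dstskip)
{
    size_t y;
    for (y = 0; y < h; ++y) {
        /* Placeholder for the variant being measured
         * (inlined 'rep movsd', MMX, or SSE). */
        memcpy(dst + y * (w + dstskip), src + y * (w + srcskip), w);
    }
}
```

With srcskip=0 the source rows are contiguous, so the whole blit is effectively one long linear read. With srcskip>0 each row starts at a new offset, which is where the prefetch behaviour starts to matter.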
I just tested different size blocks with some interesting results.
For short rows (in the 0x100-0x200 byte range):
With srcskip=0, the MMX version is faster than the inlined 'rep movsd', and
the SSE version is horribly slow.
With srcskip>0x20, the MMX version is slower than the inlined 'rep movsd',
and the SSE version is faster.
The SSE version is much faster when row length and srcskip are 0x40-aligned.
For longer rows (above 0x300):
The MMX version performs about the same as the inlined 'rep movsd' or slower,
with minor variations caused by srcskip.
With srcskip=0, the SSE version is quite a bit slower.
With srcskip around 0x40, the SSE version is faster, but as srcskip
increases, prefetch stops covering the next row and it slows down to a
Again, this is all on a dual Xeon 1.7, where the SMP arch could affect the
Draw your own conclusions ;). But IMHO, the SSE version should never be
called with srcskip=0 -- the speed loss is too great. It should also not be
called when prefetch stops covering the major portion of the next row, or
you get a performance penalty as well.
> Also, choosing the right memcpy version at runtime is not an option,
> for obvious performance reasons when doing small copies.
The version is already being chosen at runtime based on SDL_HasXXX, so I am
not sure what you mean. Rejecting the SSE version based on srcskip and row
width would not be terribly expensive.
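A rough sketch of such a check, to show how cheap it could be. Everything here is an assumption for illustration: the function name is made up, and prefetch_reach (how many bytes past the end of the current row the SSE loop's prefetch is taken to cover) is a hypothetical tuning parameter, since the right value would have to be measured per CPU. "Major portion" is interpreted as at least half of the next row.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the SSE copy looks worthwhile for
 * rows of w bytes separated by srcskip bytes, 0 otherwise.
 * prefetch_reach is an assumed tuning value, not an SDL constant. */
int sse_copy_is_worthwhile(size_t w, size_t srcskip, size_t prefetch_reach)
{
    size_t covered;

    if (w == 0)
        return 0;               /* nothing to copy */
    if (srcskip == 0)
        return 0;               /* SSE was much slower with srcskip=0 */
    if (srcskip >= prefetch_reach)
        return 0;               /* prefetch never reaches the next row */

    /* Bytes of the next row that the prefetch still covers. */
    covered = prefetch_reach - srcskip;
    if (covered > w)
        covered = w;

    /* Accept only if at least half of the next row is covered. */
    return covered >= w - covered;
}
```

This would run once per blit, after the SDL_HasXXX test and before the row loop, so the cost is a handful of compares rather than anything per row.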