[SDL] HW acceleration
chuck at valen.vvisions.com
Wed Aug 18 08:16:11 PDT 1999
On Wed, 18 Aug 1999 dcsin at islandnet.com wrote:
> dest = (source << 8) | source;
> 80x386 method:
> mov al, source
> mov ah, al
> mov dest, ax
DEAR GOD, THE PIPELINE STALL!!!
> Wow. It's been a LOOONG time since I've used assembly.
I see that. However, I agree that this is better than using a lookup
table. How about this asm instead:
lodsd ; read 4 bytes from [esi] into eax, and increment esi
mov ebx,eax ; save rest for later
mov edx,eax ; load into dx and bp for masking
shl edx,16 ; move into position
and eax,0x000000FF ; isolate bitmasks, and merge into output pixels
stosd ; save eax to [edi], and increment edi
;; do something similar for next 16 bits in ebx
This is off the top of my head, so you might come up with something that
uses fewer shifts and masks, and doesn't use EBP as a scrap register
(however, I left ECX open for counting the loops) but the advantage here
is that it utilizes the pipelines better (I'm assuming a Pentium or
better, here). So even though it is twice the size, it can do several of
these operations simultaneously, and when you're done, it has extended
four pixels instead of one, using all 32 bits of the CPU.
On second thought, I think it would be better to load 16 bits at a time
from the input, so it wouldn't waste the bx register. Oh well.
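For anyone who would rather see the idea in C, here is a rough sketch of the same four-pixels-per-load approach. It is not a translation of the asm above, just one way it might look. The names `expand8` and `expand_row` are made up for illustration, and the wide path assumes a little-endian CPU such as x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pixel at a time: dest = (source << 8) | source. */
static uint16_t expand8(uint8_t s)
{
    return (uint16_t)((s << 8) | s);
}

/* Hypothetical helper: expand `count` 8-bit values into 16-bit values,
 * four per iteration (one 32-bit load, two 32-bit stores).
 * ASSUMPTION: little-endian byte order, as on x86. */
static void expand_row(uint16_t *dst, const uint8_t *src, size_t count)
{
    size_t i = 0;

    for (; i + 4 <= count; i += 4) {
        uint32_t in, lo, hi;

        memcpy(&in, src + i, 4);            /* in = 0xDDCCBBAA, like lodsd */

        /* Spread each byte into the low half of its own 16-bit lane. */
        lo = (in & 0x000000FFu) | ((in & 0x0000FF00u) << 8);         /* 0x00BB00AA */
        hi = ((in >> 16) & 0x000000FFu) | ((in >> 8) & 0x00FF0000u); /* 0x00DD00CC */

        /* Duplicate each byte into the high half of its lane. */
        lo |= lo << 8;                      /* 0xBBBBAAAA */
        hi |= hi << 8;                      /* 0xDDDDCCCC */

        memcpy(dst + i, &lo, 4);            /* like stosd */
        memcpy(dst + i + 2, &hi, 4);
    }

    /* Leftover pixels when count is not a multiple of four. */
    for (; i < count; i++)
        dst[i] = expand8(src[i]);
}
```

Whether this beats the plain one-pixel loop depends on the compiler and the CPU, so time it before trusting it.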
(I apologize, I've been itching to write some asm for a while...)