[SDL] HW acceleration

Chuck Homic chuck at valen.vvisions.com
Wed Aug 18 08:16:11 PDT 1999


On Wed, 18 Aug 1999 dcsin at islandnet.com wrote:

> dest = (source << 8) | source;
> 
> 80x386 method:
> 
> mov   al, source
> mov   ah, al
> mov   dest, ax

DEAR GOD, THE PIPELINE STALL!!!
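
For comparison, here is a C sketch of the quoted one-liner applied to a whole row. It is only a minimal sketch. The function name `expand_pixels_simple` and the buffer layout (packed 8-bit source, 16-bit destination) are assumptions for illustration. It handles one pixel per step. In the quoted assembly, each of the three instructions needs the result of the one before it, which is presumably the stall being objected to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand n 8-bit pixels to 16-bit by copying each
 * byte into both halves, i.e. dest = (source << 8) | source, per pixel. */
void expand_pixels_simple(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t s = src[i];
        dst[i] = (uint16_t)((s << 8) | s);
    }
}
```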

> Wow. It's been a LOOONG time since I've used assembly.

I see that.  However, I agree that this is better than using a lookup
table. How about this asm instead:

---

lodsd		; read 4 bytes from [esi] into eax, and increment esi
mov ebx,eax	; save rest for later

mov edx,eax	; copy into edx and ebp for masking
mov ebp,eax	
shl edx,16	; move into position
shl ebp,8

and eax,0x000000FF	; keep only the wanted bytes, then merge into output pixels
and edx,0xFF000000
and ebp,0x00FFFF00
or eax,edx
or eax,ebp

stosd		; save eax to [edi], and increment edi

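mov eax,ebx	; now the next 16 bits, saved in ebx
shr eax,16	; move the upper two source bytes down
mov edx,eax
mov ebp,eax
shl edx,16
shl ebp,8

and eax,0x000000FF
and edx,0xFF000000
and ebp,0x00FFFF00
or eax,edx
or eax,ebp

stosd		; second output dword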

---
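
The shift-and-mask steps can be checked against a C sketch of one loop iteration. The names `double_low_two` and `expand4` are hypothetical. The upper 16 bits (the part saved in ebx) are handled by shifting the word right by 16 and applying the same masks. The sketch assumes a little-endian machine such as x86, where `memcpy` of a 32-bit word matches what `lodsd`/`stosd` read and write.

```c
#include <stdint.h>
#include <string.h>

/* Double the two low bytes of w: [.. .. b1 b0] -> [b1 b1 b0 b0],
 * using the same shifts and masks as the asm (eax, edx, ebp). */
static uint32_t double_low_two(uint32_t w)
{
    uint32_t lo  = w & 0x000000FFu;          /* and eax,0x000000FF        */
    uint32_t hi  = (w << 16) & 0xFF000000u;  /* shl edx,16 / and edx,...  */
    uint32_t mid = (w << 8)  & 0x00FFFF00u;  /* shl ebp,8  / and ebp,...  */
    return lo | hi | mid;                    /* or eax,edx / or eax,ebp   */
}

/* One iteration: read 4 source pixels, write 8 bytes (four 16-bit pixels).
 * Assumes little-endian byte order. */
static void expand4(const uint8_t *src, uint8_t *dst)
{
    uint32_t w, out;
    memcpy(&w, src, 4);              /* lodsd: one 32-bit read        */
    out = double_low_two(w);         /* pixels 0 and 1                */
    memcpy(dst, &out, 4);            /* first stosd                   */
    out = double_low_two(w >> 16);   /* upper 16 bits: pixels 2 and 3 */
    memcpy(dst + 4, &out, 4);        /* second stosd                  */
}
```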

This is off the top of my head, so you might come up with something that
uses fewer shifts and masks and doesn't use EBP as a scratch register
(I did leave ECX free for the loop counter), but the advantage here is
that it uses the pipelines better (I'm assuming a Pentium or better).
So even though it is twice the size, several of these operations can
run simultaneously, and each pass expands four pixels instead of one,
using all 32 bits of the CPU.
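
A whole-buffer version of the same idea might look like this in C. It processes four pixels per iteration, and the loop index plays the role the post reserves for ECX. It is a sketch under assumptions: `expand_pixels_x4` is a hypothetical name, the CPU is little-endian, and any 0 to 3 leftover pixels fall back to the one-pixel formula.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* [.. .. b1 b0] -> [b1 b1 b0 b0], same masks as the asm. */
static uint32_t double_low_two(uint32_t w)
{
    return (w & 0x000000FFu) | ((w << 16) & 0xFF000000u) | ((w << 8) & 0x00FFFF00u);
}

/* Hypothetical routine: expand n 8-bit pixels into n 16-bit pixels,
 * four per iteration. Assumes a little-endian CPU (e.g. x86). */
void expand_pixels_x4(const uint8_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {         /* i stands in for the ECX counter */
        uint32_t w, a, b;
        memcpy(&w, src + i, 4);          /* one 32-bit read, like lodsd     */
        a = double_low_two(w);           /* pixels i, i+1                   */
        b = double_low_two(w >> 16);     /* pixels i+2, i+3                 */
        memcpy(dst + i, &a, 4);          /* two 32-bit writes, like stosd   */
        memcpy(dst + i + 2, &b, 4);
    }
    for (; i < n; i++)                   /* leftover pixels, one at a time  */
        dst[i] = (uint16_t)((src[i] << 8) | src[i]);
}
```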

On second thought, I think it would be better to load 16 bits at a time
from the input, so it wouldn't tie up the EBX register.  Oh well.
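
The 16-bits-at-a-time variant suggested in the paragraph above can also be sketched in C. The name `expand_pixels_x2` is hypothetical, and a little-endian CPU is assumed. Each step reads two source pixels and writes them as one 32-bit word, so nothing has to be held over for a second pass.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant: read two 8-bit pixels (16 bits) per step and write
 * one 32-bit word holding two 16-bit pixels. Little-endian CPU assumed. */
void expand_pixels_x2(const uint8_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        uint16_t s;
        uint32_t w, out;
        memcpy(&s, src + i, 2);          /* 16-bit read: [b1 b0]           */
        w = s;
        out = (w & 0x000000FFu)          /* b0 -> bits 0-7                 */
            | ((w << 16) & 0xFF000000u)  /* b1 -> bits 24-31               */
            | ((w << 8) & 0x00FFFF00u);  /* b1 -> bits 16-23, b0 -> 8-15   */
        memcpy(dst + i, &out, 4);        /* 32-bit write: [b1 b1 b0 b0]    */
    }
    if (i < n)                           /* odd pixel count: last one alone */
        dst[i] = (uint16_t)((src[i] << 8) | src[i]);
}
```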

(I apologize, I've been itching to write some asm for a while...)

 -Chuck



