[SDL] HW acceleration
dcsin at islandnet.com
Thu Aug 19 07:10:33 PDT 1999
On Wed, 18 Aug 1999 11:16:11 -0400 (EDT), you wrote:
>On Wed, 18 Aug 1999 dcsin at islandnet.com wrote:
>> dest = (source << 8) | source;
>> 80386 method:
>> mov al, source
>> mov ah, al
>> mov dest, ax
>DEAR GOD, THE PIPELINE STALL!!!
Yeah, I thought as much. My assembly days were pre-Pentium, so I don't
know much about the details of optimizing for it. Also, I've never
actually owned an Intel CPU - I've always bought AMDs, so the details
are different for those.
>> Wow. It's been a LOOONG time since I've used assembly.
>I see that. However, I agree that this is better than using a lookup
>table. How about this asm instead:
It was sort of meant to be pseudo-assembly :)
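For reference, here is a minimal C sketch of what the pseudo-assembly was
meant to express: copying an 8-bit value into both bytes of a 16-bit
value. The names expand8to16 and expand_row are made up for illustration
and are not part of SDL.

```c
#include <stddef.h>
#include <stdint.h>

/* Replicate one 8-bit value into both bytes of a 16-bit value,
 * i.e. dest = (source << 8) | source. */
static uint16_t expand8to16(uint8_t source)
{
    return (uint16_t)(((unsigned)source << 8) | source);
}

/* Hypothetical helper: expand a run of n 8-bit values, one at a time. */
static void expand_row(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = expand8to16(src[i]);
}
```

A compiler is free to pick whatever instruction sequence suits the target
CPU here, which sidesteps the hand-scheduling question for a first pass.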
>lodsd ; read 4 bytes from [esi] into eax, and increment esi
>mov ebx,eax ; save rest for later
>mov edx,eax ; load into dx and bp for masking
>shl edx,16 ; move into position
>and eax,0x000000FF ; isolate bitmasks, and merge into output pixels
>stosd ; save eax to [edi], and increment edi
>;; do something similar for next 16 bits in ebx
I don't feel like dissecting that right now, but why the ANDs? Using
masks like that requires a memory access, which is of course slow.
>This is off the top of my head, so you might come up with something that
>uses fewer shifts and masks, and doesn't use EBP as a scrap register
>(however, I left ECX open for counting the loops) but the advantage here
>is that it utilizes the pipelines better (I'm assuming a Pentium or
>better, here). So even though it is twice the size, it can do several of
>these operations simultaneously, and when you're done, it has extended
>four pixels instead of one, using all 32 bits of the CPU.
Maybe MMX would help out in this situation. Too bad I don't know
anything about those instructions.
>On second thought, I think it would be better to load 16 bits at a time
>from the input, so it wouldn't waste the bx register. Oh well.
Agreed. That could save a lot of memory accesses.
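A rough C sketch of the 16-bits-at-a-time idea, as I understand it: read
two 8-bit values in one load and write both expanded values in one 32-bit
store. The name expand2 is made up, and the sketch assumes the first pixel
sits in the low byte, as it would after a little-endian load on x86.

```c
#include <stdint.h>

/* Expand two packed 8-bit values into two packed 16-bit values.
 * Assumption: the first pixel is in the low byte of 'in'. */
static uint32_t expand2(uint16_t in)
{
    uint32_t p0 = in & 0xFFu;        /* first pixel (low byte)   */
    uint32_t p1 = (uint32_t)in >> 8; /* second pixel (high byte) */

    /* Multiplying a byte by 0x0101 is the same as (b << 8) | b. */
    return (p0 * 0x0101u) | ((p1 * 0x0101u) << 16);
}
```

For example, an input of 0x2211 comes out as 0x22221111, so each loop
iteration does one 16-bit read and one 32-bit write.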
>(I apologize, I've been itching to write some asm for a while...)
No problem. I've been wishing I had the time to learn Pentium
optimizations and MMX for a while now. Especially after writing some
stuff that could really use it (like a lovely little 2D bumpmapper).