[SDL] single pixel drawing

Neil Bradley nb at synthcom.com
Wed Jul 17 14:02:01 PDT 2002

> > How so? At least on the Pentium, Multiplies are 1 clock or less, just like
> > shifts.
> But since Pentium Pro you can issue two instructions per cycle. Those can be
> 2 adds, 1 add and 1 shift etc. You cannot however issue more than one
> multiplication and it blocks both ALUs. Moreover shifts and adds have
> latency of 1 cycle, while multiplication has latency of 4 cycles. In total,
> if you have shift+add it will execute in between 2 and 4 cycles, while
> mul+add will execute in between 5 and 7 cycles. That's twice the
> performance.

I can't find any references to anything stating that a multiply is any
more than 1 instruction on a PII/PIII/P4. Not only that, you haven't
considered that you'll stall one of the pipelines if the add relies on the
multiply (which it will in this case).

FWIW, I compiled my emulator, which does lots of raster graphics, with
all throttling removed and the shifts changed to multiplies. It made zero
difference between multiplies and shifts (and yes, I shut off optimization
and disassembled the code to ensure it wasn't turning my multiplies into
shifts). This is on a 1 GHz PIII, and the graphics processing takes 60% of
overall execution time.
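
A minimal sketch of that kind of comparison, not the emulator code itself.
It assumes an 8-bit buffer whose pitch is a power of two, and all of the
names (offset_shift, offset_mul, time_fill_shift, time_fill_mul) are made
up for illustration. The pitch and shift count are passed in at run time
so the compiler can't trivially turn the multiply into a shift, but you
would still want to check the disassembly as described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Byte offset of pixel (x, y) when the pitch is 1 << pitch_shift. */
static size_t offset_shift(unsigned x, unsigned y, unsigned pitch_shift)
{
    return ((size_t)y << pitch_shift) + x;
}

/* Same offset, computed with a multiply by the pitch in bytes. */
static size_t offset_mul(unsigned x, unsigned y, size_t pitch)
{
    return (size_t)y * pitch + x;
}

/* Write every pixel of a w x h region 'passes' times using the shift
 * form; returns the CPU time consumed, in seconds. */
static double time_fill_shift(uint8_t *buf, unsigned w, unsigned h,
                              unsigned pitch_shift, int passes)
{
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        for (unsigned y = 0; y < h; y++)
            for (unsigned x = 0; x < w; x++)
                buf[offset_shift(x, y, pitch_shift)] =
                    (uint8_t)(x ^ y ^ (unsigned)p);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same fill using the multiply form. */
static double time_fill_mul(uint8_t *buf, unsigned w, unsigned h,
                            size_t pitch, int passes)
{
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        for (unsigned y = 0; y < h; y++)
            for (unsigned x = 0; x < w; x++)
                buf[offset_mul(x, y, pitch)] =
                    (uint8_t)(x ^ y ^ (unsigned)p);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, allocate two buffers of pitch * h bytes, call time_fill_shift
with the shift count and time_fill_mul with the matching pitch (e.g. 10
and 1024), and compare the two times over enough passes to get above
timer resolution. Both buffers should end up identical.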

There are more factors than just the instructions themselves. Depending
upon the shift (a variable shift count has to go in cl on x86), it may
require reloading the cl register, which can do who knows what to the
optimization capabilities of the compiler. At this point it's a wash.


Neil Bradley            What are burger lovers saying
Synthcom Systems, Inc.  about the new BK Back Porch Griller?
ICQ #29402898           "It tastes like it came off the back porch." - Me
