[SDL] [optimaze] Duff's device in clean c syntax ?
skywind
skywind3000 at 163.com
Sat Sep 22 10:35:25 PDT 2007
Hello List,
Duff's device is a very dangerous trick because it relies on strange, non-standard C syntax.
There is a compiler problem with the Duff's device in SDL_blit.h. The comment above it says:
"There's a bug in the Visual C++ 7 optimizer when compiling this code"
I don't think it is an optimizer bug, but a bug in Duff's device itself: its non-standard C syntax usage.
/* 4-times unrolled loop */
#define DUFFS_LOOP4(pixel_copy_increment, width) \
{ int n = (width+3)/4; \
    switch (width & 3) { \
    case 0: do { pixel_copy_increment; \
    case 3:      pixel_copy_increment; \
    case 2:      pixel_copy_increment; \
    case 1:      pixel_copy_increment; \
            } while ( --n > 0 ); \
    } \
}
It can be replaced with clean C code that compiles without any compiler error:
#define DUFFS_LOOP4_FIXED(pixel_copy_increment, width) \
{ \
    unsigned long __width = (unsigned long)(width); \
    unsigned long __increment = __width >> 2; \
    for (; __increment > 0; __increment--) { \
        pixel_copy_increment; \
        pixel_copy_increment; \
        pixel_copy_increment; \
        pixel_copy_increment; \
    } \
    switch (__width & 3) \
    { \
    case 3: pixel_copy_increment; \
    case 2: pixel_copy_increment; \
    case 1: pixel_copy_increment; \
    case 0: break; \
    } \
}
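A quick way to check that the replacement does the same work is to run both macros over the same data. The sketch below is only an illustration: copy_duff, copy_fixed and macros_agree are hypothetical helper names, the pixel operation is a plain byte copy, and the macro locals are renamed to w_ and blocks_ because identifiers beginning with a double underscore are reserved in C.

```c
#include <assert.h>
#include <string.h>

/* Original macro as quoted in the post. */
#define DUFFS_LOOP4(pixel_copy_increment, width)            \
{ int n = (width+3)/4;                                      \
    switch (width & 3) {                                    \
    case 0: do { pixel_copy_increment;                      \
    case 3:      pixel_copy_increment;                      \
    case 2:      pixel_copy_increment;                      \
    case 1:      pixel_copy_increment;                      \
            } while ( --n > 0 );                            \
    }                                                       \
}

/* Replacement from the post; only the local names differ (assumption:
   w_ and blocks_ stand in for __width and __increment). */
#define DUFFS_LOOP4_FIXED(pixel_copy_increment, width)      \
{                                                           \
    unsigned long w_ = (unsigned long)(width);              \
    unsigned long blocks_ = w_ >> 2;                        \
    for (; blocks_ > 0; blocks_--) {                        \
        pixel_copy_increment;                               \
        pixel_copy_increment;                               \
        pixel_copy_increment;                               \
        pixel_copy_increment;                               \
    }                                                       \
    switch (w_ & 3) {                                       \
    case 3: pixel_copy_increment;                           \
    case 2: pixel_copy_increment;                           \
    case 1: pixel_copy_increment;                           \
    case 0: break;                                          \
    }                                                       \
}

/* Copy bytes with the original macro; returns how many times the body ran. */
static int copy_duff(unsigned char *dst, const unsigned char *src, int width)
{
    int copied = 0;
    DUFFS_LOOP4({ *dst++ = *src++; copied++; }, width);
    return copied;
}

/* Same copy with the replacement macro. */
static int copy_fixed(unsigned char *dst, const unsigned char *src, int width)
{
    int copied = 0;
    DUFFS_LOOP4_FIXED({ *dst++ = *src++; copied++; }, width);
    return copied;
}

/* Returns 1 if both macros run the body exactly `width` times and produce
   identical output. Valid for 1 <= width <= 64. */
static int macros_agree(int width)
{
    unsigned char src[64], a[64] = {0}, b[64] = {0};
    int i;
    for (i = 0; i < 64; i++)
        src[i] = (unsigned char)(i + 1);
    return copy_duff(a, src, width) == width
        && copy_fixed(b, src, width) == width
        && memcmp(a, b, sizeof a) == 0;
}
```

Calling macros_agree(w) for every w from 1 to 64 should return 1 each time. The two macros do differ at width 0: the original enters at case 0 and runs the body four times, while the replacement runs it zero times.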
The only disadvantage is that it generates more code. Because of the strange, non-standard
C syntax usage, we find bugs in VC7 today, and maybe in other compilers tomorrow.
Duff may have felt elated at mixing switch/do/case/while together, but that mixing is
irrelevant to loop unrolling.
DUFFS_LOOP_DOUBLE2 can also be replaced with clean C:
#define DUFFS_LOOP_DOUBLE2_FIXED(working, workingx2, width) \
{ \
    unsigned long __width = (unsigned long)(width); \
    unsigned long __increment = __width >> 2; \
    for (; __increment > 0; __increment--) { \
        workingx2; \
        workingx2; \
    } \
    switch (__width & 3) { \
    case 2: workingx2; break; \
    case 3: workingx2; \
    case 1: working; break; \
    } \
}
The old version, "DUFFS_LOOP_DOUBLE2", is likely to copy 1-3 pixels first, before the
unrolled loop, when "width % 4 != 0". It will not get the best performance when the src/dst
addresses are both aligned and width mod 4 isn't zero, because the unrolled part may then
no longer start at the aligned address.
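To see the order in which DUFFS_LOOP_DOUBLE2_FIXED issues its operations, the following sketch records a '2' for every double-pixel step and a '1' for every single-pixel step. trace_double2 is a hypothetical helper written for this illustration, and the macro locals are renamed to w_ and blocks_ because names beginning with a double underscore are reserved in C.

```c
#include <assert.h>
#include <string.h>

/* Replacement macro from the post, with renamed locals (assumption). */
#define DUFFS_LOOP_DOUBLE2_FIXED(working, workingx2, width) \
{                                                           \
    unsigned long w_ = (unsigned long)(width);              \
    unsigned long blocks_ = w_ >> 2;                        \
    for (; blocks_ > 0; blocks_--) {                        \
        workingx2;                                          \
        workingx2;                                          \
    }                                                       \
    switch (w_ & 3) {                                       \
    case 2: workingx2; break;                               \
    case 3: workingx2;                                      \
    case 1: working; break;                                 \
    }                                                       \
}

/* Writes one character per step into `out` ('2' = double-pixel step,
   '1' = single-pixel step), NUL-terminates it, and returns the number
   of pixels covered. `out` needs room for width/2 + 2 characters. */
static int trace_double2(int width, char *out)
{
    int pixels = 0;
    char *p = out;
    DUFFS_LOOP_DOUBLE2_FIXED({ *p++ = '1'; pixels += 1; },
                             { *p++ = '2'; pixels += 2; },
                             width);
    *p = '\0';
    return pixels;
}
```

For width 7 the trace is "2221": all double-pixel steps come first, starting at the beginning of the row, and the single leftover pixel is handled last.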
I think it is good for us to write everything in standard C syntax.
------------------
skywind
2007-09-23