[SDL] [optimize] Duff's device in clean C syntax?

skywind skywind3000 at 163.com
Sat Sep 22 10:35:25 PDT 2007


Hello List,

Duff's device is a very dangerous trick because of its strange, rarely used C syntax: a switch whose case labels jump into the middle of a do-while loop.

There is a compiler problem with Duff's device in SDL_blit.h. The comment above it says:
"There's a bug in the Visual C++ 7 optimizer when compiling this code"

I don't think it is really an optimizer bug; the problem comes from Duff's device itself and its unusual C syntax.

/* 4-times unrolled loop */
#define DUFFS_LOOP4(pixel_copy_increment, width)			\
{ int n = (width+3)/4;							\
	switch (width & 3) {						\
	case 0: do {	pixel_copy_increment;				\
	case 3:		pixel_copy_increment;				\
	case 2:		pixel_copy_increment;				\
	case 1:		pixel_copy_increment;				\
		} while ( --n > 0 );					\
	}								\
}
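To see how the switch and the do-while cooperate, the macro can be run with a counter in place of a pixel copy. This is a small sketch: DUFFS_LOOP4 is pasted unchanged from above, while duffs_loop4_count and check_duffs_loop4 are hypothetical helpers for illustration. The switch jumps into the middle of the loop to run the width % 4 leftover iterations first, and the do-while then runs whole groups of four. It also exposes one more pitfall of the trick: with width 0 the body still runs four times, because the loop condition is only tested after the first pass.

```c
#include <assert.h>

/* DUFFS_LOOP4 exactly as quoted above (SDL_blit.h). */
#define DUFFS_LOOP4(pixel_copy_increment, width)			\
{ int n = (width+3)/4;							\
	switch (width & 3) {						\
	case 0: do {	pixel_copy_increment;				\
	case 3:		pixel_copy_increment;				\
	case 2:		pixel_copy_increment;				\
	case 1:		pixel_copy_increment;				\
		} while ( --n > 0 );					\
	}								\
}

/* Hypothetical helper: how many times does the loop body run? */
static int duffs_loop4_count(int width)
{
	int count = 0;
	DUFFS_LOOP4(count++, width);
	return count;
}

/* Hypothetical self-check over a range of widths. */
static void check_duffs_loop4(void)
{
	int w;
	for (w = 1; w <= 16; w++)
		assert(duffs_loop4_count(w) == w);
	/* Quirk: case 0 enters the do-while before --n > 0 is ever
	 * tested, so width 0 still executes the body 4 times. */
	assert(duffs_loop4_count(0) == 4);
}
```

For any positive width the count equals the width, so callers of this macro have to guarantee width > 0.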

It can be replaced with clean C code that does not depend on this trick:

#define DUFFS_LOOP4_FIXED(pixel_copy_increment, width)  \
{ \
	unsigned long duff_width_ = (unsigned long)(width); /* evaluate once */ \
	unsigned long duff_blocks_ = duff_width_ >> 2; /* groups of 4 */ \
	for (; duff_blocks_ > 0; duff_blocks_--) { \
		pixel_copy_increment; \
		pixel_copy_increment; \
		pixel_copy_increment; \
		pixel_copy_increment; \
	}	\
	/* copy the 0-3 leftover pixels; the cases fall through on purpose */ \
	switch (duff_width_ & 3) \
	{		\
	case 3: pixel_copy_increment; \
	case 2: pixel_copy_increment; \
	case 1: pixel_copy_increment; \
	case 0:	break;	\
	}		\
}
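One way to check that the replacement copies exactly width pixels is to run it on a 16-bit row and compare the result with memcpy. This is a sketch under assumptions: copy_row16 and copy_row16_ok are hypothetical helpers, uint16_t stands in for a 16-bit pixel format, and the macro's locals do not begin with a double underscore (such identifiers are reserved for the implementation in C). A sentinel pixel just past the row catches any overrun.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DUFFS_LOOP4_FIXED(pixel_copy_increment, width)	\
{	\
	unsigned long duff_width_ = (unsigned long)(width);	\
	unsigned long duff_blocks_ = duff_width_ >> 2;	\
	for (; duff_blocks_ > 0; duff_blocks_--) {	\
		pixel_copy_increment;	\
		pixel_copy_increment;	\
		pixel_copy_increment;	\
		pixel_copy_increment;	\
	}	\
	switch (duff_width_ & 3) {	\
	case 3: pixel_copy_increment;	\
	case 2: pixel_copy_increment;	\
	case 1: pixel_copy_increment;	\
	case 0: break;	\
	}	\
}

/* Hypothetical row copy in the style of an SDL blitter. */
static void copy_row16(uint16_t *dst, const uint16_t *src, int width)
{
	DUFFS_LOOP4_FIXED(*dst++ = *src++, width);
}

/* Copy `width` pixels and compare with the source; dst[width] is a
 * sentinel that must not be overwritten. */
static int copy_row16_ok(int width)
{
	uint16_t src[64], dst[65];
	int i;

	assert(width >= 0 && width <= 64);
	for (i = 0; i < 64; i++)
		src[i] = (uint16_t)(i * 7 + 1);
	memset(dst, 0, sizeof dst);
	dst[width] = 0xBEEF;		/* sentinel just past the row */
	copy_row16(dst, src, width);
	return memcmp(dst, src, (size_t)width * sizeof *src) == 0
	    && dst[width] == 0xBEEF;
}
```

Unlike the original macro, this version does nothing when width is 0.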

The only disadvantage is that it generates more code: pixel_copy_increment is expanded
seven times instead of four. Because of this unusual syntax, we found a bug in VC7 today,
and maybe we will find one in other compilers tomorrow. Duff may have enjoyed mixing
switch/do/case/while together, but that trick is not needed for loop unrolling.

DUFFS_LOOP_DOUBLE2 can also be replaced with clean C. Here working copies one pixel and workingx2 copies two:

#define DUFFS_LOOP_DOUBLE2_FIXED(working, workingx2, width)  \
{ \
	unsigned long duff_width_ = (unsigned long)(width); /* evaluate once */ \
	unsigned long duff_blocks_ = duff_width_ >> 2; /* groups of 4 */ \
	for (; duff_blocks_ > 0; duff_blocks_--) { \
		workingx2;	\
		workingx2;	\
	}	\
	/* leftovers: 2 = one pair, 3 = pair + single, 1 = single */ \
	switch (duff_width_ & 3) { \
	case 2: workingx2; break; \
	case 3: workingx2; /* fall through */ \
	case 1: working; break; \
	}	\
}
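The same kind of check works for the two-pixel variant. In this sketch, count_ops, copy_row16_x2 and copy_row16_x2_ok are hypothetical helpers: working copies one uint16_t pixel, and workingx2 copies two pixels with a single 4-byte memcpy (assumed here as a stand-in for a 32-bit move; memcpy sidesteps the alignment and aliasing problems of a uint32_t cast). For width 7 the macro should issue three pairs and one single pixel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DUFFS_LOOP_DOUBLE2_FIXED(working, workingx2, width)	\
{	\
	unsigned long duff_width_ = (unsigned long)(width);	\
	unsigned long duff_blocks_ = duff_width_ >> 2;	\
	for (; duff_blocks_ > 0; duff_blocks_--) {	\
		workingx2;	\
		workingx2;	\
	}	\
	switch (duff_width_ & 3) {	\
	case 2: workingx2; break;	\
	case 3: workingx2;	\
	case 1: working; break;	\
	}	\
}

/* Count single-pixel and pixel-pair operations for a given width. */
static void count_ops(int width, int *singles, int *pairs)
{
	int s = 0, p = 0;
	DUFFS_LOOP_DOUBLE2_FIXED(s++, p++, width);
	*singles = s;
	*pairs = p;
}

/* Hypothetical 16-bit row copy: one pixel at a time, or two at once. */
static void copy_row16_x2(uint16_t *dst, const uint16_t *src, int width)
{
	DUFFS_LOOP_DOUBLE2_FIXED(
		(*dst++ = *src++),
		(memcpy(dst, src, 2 * sizeof *dst), dst += 2, src += 2),
		width);
}

/* Compare with the source; dst[width] is a sentinel that must survive. */
static int copy_row16_x2_ok(int width)
{
	uint16_t src[64], dst[65];
	int i;

	assert(width >= 0 && width <= 64);
	for (i = 0; i < 64; i++)
		src[i] = (uint16_t)(i * 7 + 1);
	memset(dst, 0, sizeof dst);
	dst[width] = 0xBEEF;
	copy_row16_x2(dst, src, width);
	return memcmp(dst, src, (size_t)width * sizeof *src) == 0
	    && dst[width] == 0xBEEF;
}
```

The parentheses around the workingx2 argument keep its commas from being read as macro argument separators.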

The old version, DUFFS_LOOP_DOUBLE2, copies the leftover 1-3 pixels first, before the
unrolled part, when "width % 4 != 0". The unrolled copies then start at a shifted address,
so even when the src/dst addresses are both aligned, it will not get the best performance
if the width is not a multiple of 4.
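The alignment argument can be illustrated by tracking byte offsets only, without copying any pixels. This is a simplified model, not a benchmark, and misaligned_groups is a hypothetical helper. It assumes a row that starts at an aligned offset 0, bpp bytes per pixel, and an alignment unit align that each unrolled 4-pixel group would like to start on (for example 8 bytes, if four 16-bit pixels were moved as one 64-bit word).

```c
#include <assert.h>

#define GROUP_PIXELS 4	/* pixels per unrolled iteration */

/* Count how many of the width / 4 unrolled groups start at a byte offset
 * that is not a multiple of `align`, for a row beginning at offset 0.
 * leftovers_first != 0 models copying width % 4 pixels before the
 * unrolled loop; 0 models copying them after it. */
static int misaligned_groups(int width, int bpp, int align, int leftovers_first)
{
	int first_pixel = leftovers_first ? width % GROUP_PIXELS : 0;
	int groups = width / GROUP_PIXELS;
	int misaligned = 0;
	int g;

	assert(width >= 0 && bpp > 0 && align > 0);
	for (g = 0; g < groups; g++) {
		int offset = (first_pixel + g * GROUP_PIXELS) * bpp;
		if (offset % align != 0)
			misaligned++;
	}
	return misaligned;
}
```

With 16-bit pixels and align = 8, a width of 11 gives 2 misaligned groups when the leftovers go first and 0 when they go last. With align = 4 and width 10, both orders stay aligned, so how much this matters depends on the pixel size and the width of the moves.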

I think we should write everything in plain, conventional C syntax.



------------------				 
skywind
2007-09-23
 				



