[SDL] SDL_ttf - UTF8_to_UNICODE (bug report)

Duy-Luan TAO duyluan at caramail.com
Wed Oct 13 16:27:36 PDT 2004


Hi,

I was recently reading about Unicode and UTF-8 encoding
while playing with the SDL_ttf library, and I noticed
a small slip in the function UTF8_to_UNICODE(),
taken from SDL_ttf.c in CVS.

The UTF-8 details below come from RFC 3629:
http://www.ietf.org/rfc/rfc3629.txt

Char. number range  |   UTF-8 octet sequence
   (hexadecimal)    |      (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
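
As a rough illustration of the table above, the number of octets in a
sequence and the payload mask for its first octet can be read straight
off the leading bits. The helper names utf8_seq_len and utf8_lead_mask
are hypothetical and are not part of SDL_ttf; this is only a sketch that
follows the bit patterns and does not reject overlong forms or values
above 0x10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper: number of octets in the UTF-8 sequence that
   starts with `lead`, or 0 if `lead` cannot start a sequence. */
static int utf8_seq_len(uint8_t lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or 11111xxx */
}

/* Hypothetical helper: mask that keeps only the payload (x) bits
   of a first octet, per the table. */
static uint8_t utf8_lead_mask(uint8_t lead)
{
    switch (utf8_seq_len(lead)) {
    case 1:  return 0x7F;  /* 7 payload bits */
    case 2:  return 0x1F;  /* 5 payload bits */
    case 3:  return 0x0F;  /* 4 payload bits */
    case 4:  return 0x07;  /* 3 payload bits */
    default: return 0x00;  /* not a valid first octet */
    }
}
```

Continuation octets (10xxxxxx) always carry 6 payload bits, so their
mask is 0x3F in every case.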


Here is the code, with my comments marked NOTE1 and NOTE2:

static Uint16 *UTF8_to_UNICODE(Uint16 *unicode, const char *utf8, int len)
{
	int i, j;
	Uint16 ch;

	for ( i=0, j=0; i < len; ++i, ++j ) {
		ch = ((const unsigned char *)utf8)[i];
		if ( ch >= 0xF0 ) {
			ch  =  (Uint16)(utf8[i]&0x07) << 18;
			ch |=  (Uint16)(utf8[++i]&0x3F) << 12;
			ch |=  (Uint16)(utf8[++i]&0x3F) << 6;
			ch |=  (Uint16)(utf8[++i]&0x3F);
		} else
		if ( ch >= 0xE0 ) {
			
			// ****** NOTE1 ******
			// The first octet of a 3-octet sequence (1110xxxx)
			// carries only 4 payload bits, so the mask should
			// be 0x0F instead of 0x3F.
			// With 0x3F, bit 5 (the 6th bit) of that octet,
			// which is always 1, survives the mask. After the
			// shift the intermediate value is no longer in
			// 0000 0800 - 0000 FFFF (2,048 to 65,535)
			// but at 131,072 (0x20000) or above.
			// Only the truncation on assignment to the 16-bit
			// ch drops that stray bit again.
			//
			// It should be:
			//   ch = (Uint16)(utf8[i] & 0x0F) << 12;
			// instead of:
			ch  =  (Uint16)(utf8[i]&0x3F) << 12;
			// **** END NOTE1 ****
			
			ch |=  (Uint16)(utf8[++i]&0x3F) << 6;
			ch |=  (Uint16)(utf8[++i]&0x3F);
		} else
		if ( ch >= 0xC0 ) {
			
			// ****** NOTE2 ******
			// Here the mask should be 0x1F instead of 0x3F,
			// because the first octet of a 2-octet sequence
			// (110xxxxx) carries 5 payload bits, not 6.
			// This branch only sees octets 0xC0-0xDF, whose
			// bit 5 (the 6th bit) is 0 anyway, so the result
			// is still correct.
			ch  =  (Uint16)(utf8[i]&0x3F) << 6;
			// **** END NOTE2 ****
			
			ch |=  (Uint16)(utf8[++i]&0x3F);
		}
		unicode[j] = ch;
	}
	unicode[j] = 0;

	return unicode;
}
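
For comparison, here is a minimal sketch of a decoder that uses the
first-octet masks from the RFC 3629 table (0x1F, 0x0F, 0x07). It is not
the SDL_ttf code: the name utf8_decode, the 32-bit output type, and the
use of U+FFFD for malformed input are all assumptions made for this
example. A 32-bit type is used because code points above FFFF do not fit
in a Uint16. The sketch does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of SDL_ttf: decode `len` bytes of UTF-8
   into 32-bit code points. `out` must have room for len + 1 entries.
   Returns the number of code points written, not counting the 0 terminator. */
static size_t utf8_decode(uint32_t *out, const char *utf8, size_t len)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t i = 0, j = 0;

    while (i < len) {
        uint32_t ch = s[i];
        size_t extra;  /* continuation octets still expected */

        if (ch < 0x80)                { extra = 0; }               /* 0xxxxxxx */
        else if ((ch & 0xE0) == 0xC0) { ch &= 0x1F; extra = 1; }   /* 110xxxxx */
        else if ((ch & 0xF0) == 0xE0) { ch &= 0x0F; extra = 2; }   /* 1110xxxx */
        else if ((ch & 0xF8) == 0xF0) { ch &= 0x07; extra = 3; }   /* 11110xxx */
        else                          { ch = 0xFFFD; extra = 0; }  /* stray or invalid octet */

        ++i;
        /* Fold in 6 payload bits per continuation octet; give up on
           truncated input or on a byte that is not 10xxxxxx. */
        while (extra > 0) {
            if (i >= len || (s[i] & 0xC0) != 0x80) {
                ch = 0xFFFD;
                break;
            }
            ch = (ch << 6) | (uint32_t)(s[i] & 0x3F);
            ++i;
            --extra;
        }
        out[j++] = ch;
    }
    out[j] = 0;
    return j;
}
```

With this sketch, the three octets E2 82 AC decode to 20AC, and the
four octets F0 9F 98 80 decode to 1F600.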