[SDL] Text output - getting more speed in SDL

Kein-Hong Man keinhong at gmail.com
Mon Apr 16 01:48:10 PDT 2007


Hi all,

Ryan C. Gordon wrote:
>> It looks rather horrifying.
> 
> I wouldn't have gone that far.
> 
> But I'm curious if keeping an array of memory offsets where a glyph's 
> pixels exist would actually beat SDL's blitters...the method we're 
> discussing writes to less memory than a default surface blit, but it 
> doesn't take advantage of the possibility of blitting between hardware 
> surfaces, or that pixel formats, alpha blending, and conversions can be 
> handled without cluttering your application and possibly with 
> MMX/Altivec/SSE behind the scenes.
> 
> Colorkey blitting with RLE encoding might achieve close to the same 
> speed inside SDL for about the same memory usage, too.
> [snip]

Here's some data; I hope it will be helpful in some way for people 
still targeting 2D modes. Test machines: the desktop is a Sempron 
3000+ (power saving mode) with a basic Nvidia card (WinXP), while 
the laptop is an old Celeron 766MHz with a 2D Trident (WinME). SDL 
1.2.11 is from libsdl's MinGW developer's library; text_speed.c was 
compiled using MinGW (gcc 3.4.2) with -O2. Display modes are 
32-bit, at least 1024x768.

Timings are for a single run, copied verbatim from the stdout.txt 
files, and are quite stable from run to run. All figures are in 
milliseconds. For info, in the first item the screen is written 
1000 times (about 1372 of the 1920 positions are actually written 
to).

No changes (desktop)
--------------------

Time needed to display text pages: 15280.
      - spent in text output: 984
      - spent in screen refreshes: 14293
      - spent in miscellaneous: 3

Time needed to display text lines: 20600.
      - spent in text output: 931
      - spent in screen refreshes: 19648
      - spent in miscellaneous: 21

Time needed to display individual characters: 9858.
      - spent in text output: 1037
      - spent in screen refreshes: 8761
      - spent in miscellaneous: 60

The bulk of the time is spent in SDL_UpdateRects. One screen 
update (the first item) takes about 15.3ms (15280/1000).
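
For anyone who wants to redo the arithmetic on their own numbers, 
here is a minimal sketch. The struct and function names are made up 
for illustration and are not from text_speed.c; the fields simply 
mirror the four figures printed in each block above, in 
milliseconds.

```c
/* One timing block as printed in stdout.txt; values in milliseconds.
 * The names here are hypothetical, chosen for this sketch only. */
struct timing {
    long total;    /* "Time needed to display ..." */
    long text;     /* "spent in text output" */
    long refresh;  /* "spent in screen refreshes" */
    long misc;     /* "spent in miscellaneous" */
};

/* Average cost of one iteration, in milliseconds. */
static double ms_per_update(const struct timing *t, long iterations)
{
    return (double)t->total / (double)iterations;
}

/* Share of the total time spent in screen refreshes, in percent. */
static double refresh_share(const struct timing *t)
{
    return 100.0 * (double)t->refresh / (double)t->total;
}
```

With the desktop figures (15280, 984, 14293, 3) and 1000 
iterations, this gives about 15.3ms per update, with roughly 93.5% 
of it spent in screen refreshes.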

No changes (laptop)
-------------------

Time needed to display text pages: 41370.
      - spent in text output: 14326
      - spent in screen refreshes: 26158
      - spent in miscellaneous: 886

Time needed to display text lines: 57253.
      - spent in text output: 15748
      - spent in screen refreshes: 40265
      - spent in miscellaneous: 1240

Time needed to display individual characters: 43425.
      - spent in text output: 13043
      - spent in screen refreshes: 23522
      - spent in miscellaneous: 6860

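The old Celeron is much slower at crunching the pixel array: 41.4ms 
per screen update, with text output alone taking about 14.3ms 
against about 1.0ms on the desktop. Sluggish, but turn-based games 
should survive.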

No changes (desktop, directx)
-----------------------------

Time needed to display text pages: 3967.
      - spent in text output: 931
      - spent in screen refreshes: 3028
      - spent in miscellaneous: 8

Time needed to display text lines: 5366.
      - spent in text output: 984
      - spent in screen refreshes: 4362
      - spent in miscellaneous: 20

Time needed to display individual characters: 40438.
      - spent in text output: 898
      - spent in screen refreshes: 39478
      - spent in miscellaneous: 62

One screen update is done in about 4ms. However, with directx the 
individual characters test is much slower (40438 versus 9858 with 
windib). Is this due to some kind of windib optimization?

Single UpdateRects of 800x480 (desktop)
---------------------------------------

Time needed to display text pages: 21451.
      - spent in text output: 957
      - spent in screen refreshes: 20490
      - spent in miscellaneous: 4

This refreshes the entire 80x24 screen area, as implemented in the 
code. Since more positions are drawn (1920 versus 1372), it is 
slower, and a simple calculation confirms that rect update time 
scales roughly linearly: 14293 x 1920/1372 is about 20000, against 
20490 measured. So, to minimize redraw with windib, a textmode 
buffer can be used to cut down the number of positions that change, 
if that's absolutely necessary.
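
A textmode buffer of that kind could look roughly like the sketch 
below. This is not code from text_speed.c: the names (textbuf, 
textbuf_update) are hypothetical, and it assumes one byte per 
position on an 80x24 grid. It keeps a copy of what is on screen and 
reports only the positions that differ in the new page.

```c
#include <string.h>

enum { COLS = 80, ROWS = 24, CELLS = COLS * ROWS };

/* Hypothetical shadow copy of what the window currently shows. */
struct textbuf {
    unsigned char shown[CELLS];
};

/* Start from a blank screen. */
static void textbuf_clear(struct textbuf *tb)
{
    memset(tb->shown, ' ', CELLS);
}

/*
 * Compare a new page with the shadow copy and bring the copy up to date.
 * Indices of changed positions are stored in dirty[] (at most max of
 * them). The return value is the total number of changed positions; if
 * it is larger than max, the caller should fall back to a full redraw.
 */
static int textbuf_update(struct textbuf *tb, const unsigned char *page,
                          int *dirty, int max)
{
    int count = 0;

    for (int i = 0; i < CELLS; i++) {
        if (tb->shown[i] == page[i])
            continue;               /* unchanged, nothing to redraw */
        tb->shown[i] = page[i];
        if (count < max)
            dirty[count] = i;
        count++;
    }
    return count;
}
```

Each index i maps to column i % COLS and row i / COLS. With 800x480 
for 80x24 positions a cell is 10x20 pixels, so only those cells 
would need to be redrawn and handed to SDL_UpdateRects.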

Single UpdateRects of 800x480 (desktop)
Array of display SDL_Surface
---------------------------------------

Time needed to display text pages: 21962.
      - spent in text output: 2037
      - spent in screen refreshes: 19919
      - spent in miscellaneous: 6

This draws the first test with surfaces generated using 
SDL_DisplayFormat, followed by a single SDL_UpdateRects. The text 
output section takes about double the time of Leon's method (2037 
versus 957), but the overall performance impact is small (22.0ms 
versus 21.5ms). However, if there is a need to set many background 
and foreground colours, some kind of caching will be desirable.
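
One possible shape for such a cache is sketched below; I have not 
tested this approach against the timings above. It is a small 
direct-mapped table keyed on character, foreground and background. 
Everything in it is hypothetical: the names, the slot count, the 
assumption that colours are small indices, and the render callback. 
The stored pointer would be an SDL_Surface * in real code. 
render_stub is only a stand-in so the sketch can be exercised 
without SDL.

```c
#include <stddef.h>

enum { CACHE_SLOTS = 64 };   /* arbitrary size chosen for the sketch */

/* Hypothetical cache key: character plus colour indices. */
struct glyph_key {
    unsigned char ch, fg, bg;
};

struct glyph_cache {
    struct glyph_key key[CACHE_SLOTS];
    void *surface[CACHE_SLOTS];   /* NULL means the slot is empty */
    unsigned long hits, misses;
};

static unsigned slot_for(struct glyph_key k)
{
    return (k.ch * 31u + k.fg * 7u + k.bg) % CACHE_SLOTS;
}

/*
 * Return the cached surface for k, calling render(k) on a miss.
 * The cache must start out zero-initialised. A colliding entry is
 * simply overwritten; real code would free the evicted surface first.
 */
static void *glyph_cache_get(struct glyph_cache *c, struct glyph_key k,
                             void *(*render)(struct glyph_key))
{
    unsigned s = slot_for(k);

    if (c->surface[s] != NULL && c->key[s].ch == k.ch &&
        c->key[s].fg == k.fg && c->key[s].bg == k.bg) {
        c->hits++;
        return c->surface[s];
    }
    c->misses++;
    c->key[s] = k;
    c->surface[s] = render(k);
    return c->surface[s];
}

/* Stand-in renderer: counts calls and returns a dummy non-NULL pointer. */
static int render_calls;

static void *render_stub(struct glyph_key k)
{
    (void)k;
    render_calls++;
    return &render_calls;
}
```

A direct-mapped table keeps lookups cheap, at the price of 
re-rendering whenever two colour combinations land in the same 
slot.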

Single UpdateRects of 800x480 (desktop)
Array of 8-bit SDL_Surface
---------------------------------------

Time needed to display text pages: 22418.
      - spent in text output: 2479
      - spent in screen refreshes: 19935
      - spent in miscellaneous: 4

This draws the first test with 8-bit surfaces generated by SDL_ttf, 
followed by a single SDL_UpdateRects. It is only slightly slower 
than drawing opaque SDL_DisplayFormat surfaces. Scaling from the 
15.3ms in the first results, the same redraw would probably take 
about 17ms with this method. With this format, any fg/bg colour 
choice can be made, with the blit operation doing all the hard 
work.
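
To illustrate why an 8-bit glyph allows any fg/bg choice, here is a 
plain C sketch of a palette-lookup blit. It is not SDL's blitter 
and not code from text_speed.c; the function name, the tightly 
packed source layout and the two-entry palette (index 0 for 
background, index 1 for foreground) are assumptions made for the 
example.

```c
#include <stdint.h>

/*
 * Expand an 8-bit indexed glyph (w x h, tightly packed) into 32-bit
 * pixels by looking each index up in a palette. dst_pitch is the width
 * of a destination row, counted in pixels.
 */
static void blit_indexed(const uint8_t *src, int w, int h,
                         const uint32_t *palette,
                         uint32_t *dst, int dst_pitch)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* The glyph stores indices only; colour comes from the palette. */
            dst[y * dst_pitch + x] = palette[src[y * w + x]];
        }
    }
}
```

Changing the colours means changing two palette entries and 
blitting again; the glyph data itself is never touched.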

Single UpdateRects of 800x480 (desktop)
Array of 8-bit SDL_Surface + SDL_SRCCOLORKEY|SDL_RLEACCEL
---------------------------------------------------------

Time needed to display text pages: 22256.
      - spent in text output: 2344
      - spent in screen refreshes: 19907
      - spent in miscellaneous: 5

Using SDL_SRCCOLORKEY|SDL_RLEACCEL with 8-bit surfaces is slightly 
faster than without those flags (2344 versus 2479 in text output). 
There is a single clear-to-background call for each Term_text_sdl 
call, so if each position can have its own fg/bg colour, this will 
be slower.

Single UpdateRects of 800x480 (desktop)
Array of display SDL_Surface + SDL_SRCCOLORKEY|SDL_RLEACCEL
-----------------------------------------------------------

Time needed to display text pages: 21481.
      - spent in text output: 1456
      - spent in screen refreshes: 20017
      - spent in miscellaneous: 8

Using SDL_SRCCOLORKEY|SDL_RLEACCEL with SDL_DisplayFormat surfaces 
is much faster than without those flags (1456 versus 2037 in text 
output). There is a single clear-to-background call for each 
Term_text_sdl call, so if each position can have its own fg/bg 
colour, this will be slower.

It looks like there's not a whole lot to be gained from all the 
effort of optimizing this thingy for 2D video modes, especially if 
the game is a turn-based one (just assuming) and windib is used. 
Other graphics cards might give significantly different results.

-- 
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia



