[SDL] Re: X11 again

Dan Maas dmaas at SPAMdcine.com
Tue Apr 4 23:00:43 PDT 2000


> we really need to get the X server out of the 2D rendering cycle during
> the running phase.

I too have been interested in different types of optimizations that could
speed up X. Basically you have two problems: latency and bandwidth. Latency
is the time between the user clicking on a button and the button graphic
becoming highlighted/depressed on screen. Bandwidth is how many 640x480
32-bit images you can display per second. Just to throw out some numbers,
I'd really love to see latency below 10ms and bandwidth of at least 30MB/sec...

Traditional X uses a UNIX domain socket between client and server, a pretty
bad form of data transport (from the viewpoint of optimizing latency and
bandwidth). In the best case, latency isn't too bad; all that separates a
client drawing request from the server carrying it out is a single context
switch. As the system load increases, latency gets worse since there is more
contention for the CPU... Note that Windows NT used a similar transport
method through v3.51; however, NT got better performance under heavy loads
thanks to some nasty scheduler tricks. Specifically, when a client sent a
drawing request to the server, it could use a system call that would DEMAND
that the server thread run next, bypassing all other threads waiting for
the CPU. One could theoretically implement this on Linux by, say, adding a
system call to switch right to the server, instead of what the scheduler has
planned. (someday when I get a chance I'll try this out; it should reduce
worst-case latency to a single context switch, plus the time for
encoding/decoding the drawing request into the IPC buffer. I *believe* a
context switch would still allow really smoking performance, especially with
shared-memory transport... has anyone timed context switches?)
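
For what it's worth, here's a rough sketch, in C, of one way to get a
number: ping-pong a single byte between two processes over a pair of pipes
and divide the round-trip time. Each round trip costs two context switches
plus the pipe overhead, so treat the result as an upper bound on the
switch itself.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

#define ROUNDS 10000

int main(void)
{
    int p2c[2], c2p[2];            /* parent->child, child->parent */
    char byte = 0;
    struct timeval start, end;
    double usec;
    int i;

    if (pipe(p2c) < 0 || pipe(c2p) < 0) { perror("pipe"); exit(1); }

    if (fork() == 0) {             /* child just echoes everything back */
        for (i = 0; i < ROUNDS; i++) {
            read(p2c[0], &byte, 1);
            write(c2p[1], &byte, 1);
        }
        exit(0);
    }

    gettimeofday(&start, NULL);
    for (i = 0; i < ROUNDS; i++) {
        write(p2c[1], &byte, 1);
        read(c2p[0], &byte, 1);
    }
    gettimeofday(&end, NULL);

    usec = (end.tv_sec - start.tv_sec) * 1e6
         + (end.tv_usec - start.tv_usec);
    printf("%.2f us per round trip (~2 context switches)\n", usec / ROUNDS);
    return 0;
}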

Bandwidth, on the other hand, really blows with UNIX sockets. You haven't
got a chance at displaying full-motion video at 30fps, since all that data
must be serialized and then passed through the tiny socket buffer. (A while
ago I measured the throughput of a UNIX pipe at ~1 MB/sec, ugh). The obvious
way around this is to share memory between client and server, as in X-SHM. A
large enough shared segment could contain all drawing requests and bitmap
data; then you could also switch to a simple semaphore as the IPC mechanism.
Together with a switch_to_server_thread_NOW() system call, this would
approximate NT 3.51 pretty closely. (you could additionally optimize the
server->hardware path by, say, directly DMA'ing between the shared memory
bitmaps and the framebuffer; this could push your rendering bandwidth into
the hundreds of MB/sec on AGP cards!)
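
To make the shared-segment-plus-semaphore idea concrete, here's a minimal
client-side sketch using SysV shared memory and a SysV semaphore. The key
values, segment size, and "command format" are made up for illustration,
and error checking is omitted; a real client would negotiate all of this
with the server.

#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>

#define SHM_KEY  0x5344              /* arbitrary keys for the example */
#define SEM_KEY  0x4C31
#define SEG_SIZE (4 * 1024 * 1024)   /* 4 MB for commands + bitmaps */

int main(void)
{
    /* create (or attach to) the shared segment and the semaphore */
    int shmid = shmget(SHM_KEY, SEG_SIZE, IPC_CREAT | 0600);
    int semid = semget(SEM_KEY, 1, IPC_CREAT | 0600);
    char *seg = shmat(shmid, NULL, 0);
    struct sembuf wake = { 0, +1, 0 };

    /* deposit a drawing request (and, at some known offset, the pixel
       data itself) directly in the segment -- nothing gets serialized
       through a socket buffer */
    memcpy(seg, "BLIT 0 0 640 480", 17);

    /* kick the server: V() on the semaphore it sleeps on */
    semop(semid, &wake, 1);

    shmdt(seg);
    return 0;
}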

Client-side rendering, the method you outlined, is currently in use by
Windows NT4, as well as the XFree86 4.0 DRI drivers. I honestly don't believe
going this far is necessary, since client-side hardware access is very, very
hard to manage, and all it buys you is one measly context switch. You really
don't want to try multiplexing MMU-less consumer graphics hardware across
many simultaneously rendering processes. Presenting a uniform view of the
hardware to each process at the very least requires lots of kernel support;
you'd need a DRI-like interface that dispatches DMA buffers to the card,
plus some synchronization mechanisms. And how do you manage video memory? Do
you really want to write kernel code to swap bitmaps in and out of card
memory on context switches? And what about security?

For those of us yearning for Windows-like 2D and 3D graphics speed on Linux,
I believe a more realistic approach is a minimalistic client-server
architecture. As in X, the server manages all hardware access and input
devices. Set up several MB of shared memory for each client to transmit
bitmaps and drawing commands. Synchronize the client and server processes
with a simple UNIX semaphore, and modify the Linux scheduler to guarantee
that the server process can run immediately after the client deposits
drawing requests in shared memory. All the server has to do is decode the
simple drawing commands and start DMA'ing bitmaps right into the
framebuffer. You could also head more in the direction of DRI, and allow the
client to build up DMA buffers itself; that might make more sense for 3D
APIs where lots of number-crunching might have to occur to go from drawing
commands to register programming.
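
The matching server loop could be as simple as the following sketch. The
keys mirror the client example above, and blit_to_framebuffer() is just a
stand-in for whatever DMA or framebuffer path the driver actually
provides, not a real call.

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>

#define SHM_KEY  0x5344
#define SEM_KEY  0x4C31
#define SEG_SIZE (4 * 1024 * 1024)

static void blit_to_framebuffer(const char *cmd)
{
    /* placeholder: a real server would decode the request and program
       the card (or start a DMA transfer from the shared segment into
       video memory) here */
    printf("server: would execute \"%s\"\n", cmd);
}

int main(void)
{
    int shmid = shmget(SHM_KEY, SEG_SIZE, 0600);
    int semid = semget(SEM_KEY, 1, 0600);
    char *seg = shmat(shmid, NULL, 0);
    struct sembuf wait_req = { 0, -1, 0 };

    for (;;) {
        /* P(): sleep until a client has deposited a request; this is
           the spot where a switch_to_server_thread_NOW()-style
           scheduler hint would pay off */
        semop(semid, &wait_req, 1);
        blit_to_framebuffer(seg);
    }
}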

I've seriously thought about implementing the above, and I might try it when
I get some free time this summer. I can't wait for the day when dragging
windows around on Linux will be just as fast as NT...

Comments are welcome,
Dan




