[SDL] Re: X11 again

Pierre Phaneuf pphaneuf at sx.nec.com
Wed Apr 5 09:22:02 PDT 2000

Dan Maas wrote:

> encoding/decoding the drawing request into the IPC buffer. I *believe* a
> context switch would still allow really smoking performance, especially with
> shared-memory transport... has anyone timed context switches?)

The problem I have with context switches isn't so much their own cost
as the hit they inflict on everything else. My main problem is that a
switch completely wrecks the cache. I work hard to make optimized
blitting and culling routines, aligning everything and making sure my
structures all fit in the cache, and what do I get? The data gets
kicked out and I have a processor doing wait-states, waiting for it to
be BACK when it could have stayed there the whole time.
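
For the "has anyone timed context switches?" part, here is a rough
sketch of the usual pipe ping-pong measurement. The function name
pingpong_usec is made up for illustration. It only gives an upper
bound on the direct cost of a switch (pipe overhead is included, and on
a multi-processor box the two processes may not even share a CPU), and
it says nothing about the cache damage described above, since neither
process has a working set worth evicting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Average round-trip time, in microseconds, of `rounds` one-byte
 * ping-pongs between a parent and a forked child over two pipes.
 * Returns a negative value on error. */
double pingpong_usec(int rounds)
{
    int to_child[2], to_parent[2];
    struct timespec start, end;
    char byte = 'x';
    double elapsed_ns;
    pid_t pid;
    int i;

    if (rounds <= 0 || pipe(to_child) < 0)
        return -1.0;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: echo each byte back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1 &&
               write(to_parent[1], &byte, 1) == 1)
            ;
        _exit(0);
    }

    close(to_child[0]);
    close(to_parent[1]);
    if (pid < 0) {
        close(to_child[1]);
        close(to_parent[0]);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < rounds; i++) {
        /* Each write wakes the peer; each read blocks until it answers. */
        if (write(to_child[1], &byte, 1) != 1 ||
            read(to_parent[0], &byte, 1) != 1)
            break;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(to_child[1]);          /* child sees EOF and exits */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    if (i != rounds)
        return -1.0;

    elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9 +
                 (end.tv_nsec - start.tv_nsec);
    assert(elapsed_ns >= 0.0);   /* CLOCK_MONOTONIC never runs backwards */
    return elapsed_ns / 1000.0 / rounds;
}
```

Calling pingpong_usec(100000) and halving the result gives a ballpark
per-switch figure when both processes are on one CPU. To see the cache
effect, each side would have to walk a buffer about the size of the
cache between messages.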

Even without realtime scheduling, you would get higher performance if
you removed the context switch. You'd only get a hitch once in a while
when a background process had to do something, but when they are all
sleeping, you don't get pre-empted and switched. If you use realtime
scheduling, then you *don't* get pre-empted, period.
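
As a rough sketch of what "use realtime scheduling" means in practice,
assuming the POSIX sched_setscheduler() interface with the SCHED_FIFO
policy: the helper name try_realtime is made up here, and on a normal
system the call is expected to fail with EPERM unless the process is
privileged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Ask the kernel to run the calling process under SCHED_FIFO.
 * `priority` is clamped to the range the system reports for that
 * policy. Returns 0 on success, or the errno value on failure
 * (typically EPERM when the caller may not go realtime). */
int try_realtime(int priority)
{
    struct sched_param param = {0};
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (lo < 0 || hi < 0)
        return errno ? errno : EINVAL;
    if (priority < lo)
        priority = lo;
    if (priority > hi)
        priority = hi;

    param.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &param) < 0)
        return errno;

    /* Sanity check: the policy should now read back as SCHED_FIFO. */
    assert(sched_getscheduler(0) == SCHED_FIFO);
    return 0;
}
```

A SCHED_FIFO process that never blocks will starve every normal
process on that CPU, so a program using this wants a fallback path for
the EPERM case and should sleep whenever it has nothing to draw.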

I personally don't mind getting pre-empted once in a while by a syslogd
wake-up, so I'm not thinking that hard about realtime scheduling. Now,
using realtime scheduling has some effect (at least *your* cache
doesn't get fucked up), but the X server (which is doing the bulk of
the job) gets the same mistreatment. It could be even WORSE, because
now it has to contend with a hungry, over-prioritized realtime process!

> Client-side rendering, the method you outlined, is currently in use by
> Windows NT4, as well as the XFree 4.0 DRI drivers. I honestly don't believe
> going this far is necessary, since client-side hardware access is very, very
> hard to manage, and all it buys you is one measly context switch. You really
> don't want to try multiplexing MMU-less consumer graphics hardware across
> many simultaneously rendering processes. Presenting a uniform view of the

I think most video cards (including the cheaper top-end professional
offerings) do not have MMUs. The only exceptions I know of are the
ccNUMA SGI workstations, which are obviously MMU-ish.

> hardware to each process at the very least requires lots of kernel support;
> you'd need a DRI-like interface that dispatches DMA buffers to the card,
> plus some synchronization mechanisms. And how do you manage video memory? Do
> you really want to write kernel code to swap bitmaps in and out of card
> memory on context switches? And what about security?

It's already there and done (that kernel support), for DRI. Yes, it was
a lot of work. And it's done, isn't it wonderful? So now, let's USE it!

Security is covered by the DRM. The /dev/drm device won't talk to a
process unless it is given a cookie that comes from the X server. I
*think* that root privilege is not required, but that they recommend
making this device accessible only to a group, and putting the
appropriate people in that group (like the "floppy" group that older
Red Hat releases used to control access to the /dev/fd0 device).

> For those of us yearning for Windows-like 2D and 3D graphics speed on Linux,
> I believe a more realistic approach is a minimalistic client-server
> architecture. As in X, the server manages all hardware access and input
> devices. Set up several MB of shared memory for each client to transmit
> bitmaps and drawing commands. Synchronize the client and server processes
> with a simple UNIX semaphore, and modify the Linux scheduler to guarantee
> that the server process can run immediately after the client deposits
> drawing requests in shared memory. All the server has to do is decode the
> simple drawing commands and start DMA'ing bitmaps right into the
> framebuffer. You could also head more in the direction of DRI, and allow the
> client to build up DMA buffers itself; that might make more sense for 3D
> API's where lots of number-crunching might have to occur to go from drawing
> commands to register programming.

This would be a lot of work (work yet to be done, as opposed to the lot
of work already done on DRI). It would also be highly unportable
(because of that "schedule this process next" thing). It actually
sounds a lot like what X currently is (with XShm), the only addition
being the kernel scheduler modification (which you referred to as
"nasty" earlier, not a good sign, eh?).
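
To make the comparison concrete, here is a minimal sketch of the
shared-memory-plus-semaphore handoff, assuming System V shared memory
(the kind XShm is built on) and System V semaphores. The struct layout,
the names submit_and_wait and sem_add, and the "server" that merely
sums the command words are all made up for illustration. None of this
is how the X server or any proposed design actually works.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CMDS    64
#define SEM_REQUEST 0   /* client -> server: commands are ready */
#define SEM_DONE    1   /* server -> client: commands were consumed */

/* Hypothetical request buffer shared between client and server. */
struct shared_queue {
    int  count;
    int  cmds[MAX_CMDS];
    long result;        /* filled in by the server */
};

/* semctl() expects the caller to define this union itself. */
union semun_arg { int val; struct semid_ds *buf; unsigned short *array; };

static int sem_add(int semid, unsigned short which, short delta)
{
    struct sembuf op;
    op.sem_num = which;
    op.sem_op  = delta;
    op.sem_flg = 0;
    return semop(semid, &op, 1);
}

/* Hand `n` commands to a forked "server" through shared memory and
 * wait for it to finish. The server sums them as a stand-in for
 * decoding. Returns 0 and stores the sum in *out, or -1 on error. */
int submit_and_wait(const int *cmds, int n, long *out)
{
    union semun_arg zero;
    struct shared_queue *q;
    int shmid, semid, i, rc = -1;
    pid_t pid;

    assert(out != NULL);
    if (n < 0 || n > MAX_CMDS || (n > 0 && cmds == NULL))
        return -1;
    shmid = shmget(IPC_PRIVATE, sizeof *q, IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;
    q = shmat(shmid, NULL, 0);
    shmctl(shmid, IPC_RMID, NULL);      /* freed once everyone detaches */
    if (q == (void *)-1)
        return -1;
    semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (semid < 0) {
        shmdt(q);
        return -1;
    }
    zero.val = 0;
    semctl(semid, SEM_REQUEST, SETVAL, zero);
    semctl(semid, SEM_DONE, SETVAL, zero);

    pid = fork();
    if (pid == 0) {
        /* Server: sleep until a request arrives, then consume it. */
        long sum = 0;
        if (sem_add(semid, SEM_REQUEST, -1) < 0)
            _exit(1);
        for (i = 0; i < q->count; i++)
            sum += q->cmds[i];
        q->result = sum;
        sem_add(semid, SEM_DONE, 1);
        _exit(0);
    }
    if (pid > 0) {
        /* Client: deposit the commands, wake the server, wait for it. */
        q->count = n;
        if (n > 0)
            memcpy(q->cmds, cmds, (size_t)n * sizeof *cmds);
        if (sem_add(semid, SEM_REQUEST, 1) == 0 &&
            sem_add(semid, SEM_DONE, -1) == 0) {
            *out = q->result;
            rc = 0;
        }
    }
    semctl(semid, 0, IPC_RMID);         /* also wakes a stuck server */
    if (pid > 0)
        waitpid(pid, NULL, 0);
    shmdt(q);
    return rc;
}
```

Every request here still costs two trips through the scheduler, and
nothing in user space can ask for the server to run immediately after
the semaphore is raised. That guarantee is the part that would need
the kernel change.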

Note that there is a flaw in the current system: DMA buffers have to be
physically contiguous (AGP lifts that restriction, though), so DMA
directly from shared memory is usually not possible.

Pierre Phaneuf
Systems Exorcist
