[SDL] Re: threading overhead
pphaneuf at sx.nec.com
Fri Apr 14 10:38:20 PDT 2000
Mattias Engdegård wrote:
> > Freshly malloc()ed memory sometimes is not actually allocated, but this
> > is slightly different from COW, any access (including reading) to the
> > page will do the actual allocation, right?
> Remember this was Solaris, and I wasn't sure how it handled allocation and
> whether it did overcommit. Setting fresh anonymous mappings to COW the zero
> page is a quite reasonable implementation, and reading it would just read
> one small (8K) physical page.
Yes, I should write all over the buffer once before starting, so that any
overcommitted memory is actually committed. Even though it doesn't change
anything on Solaris... 8K pages on UltraSPARC? I didn't know that... :-)
> > My impression is that for a game, it is cache misses that are the
> > biggest loss, as they mess up things like culling, blitting and mixing.
> My impression is that the biggest loss for MT development is the debugging
> nightmare :-). I agree about the cache importance though.
Argh! Yeah, don't get me started on debugging! Notice that the locking
situation in this small piece of code is really simple, but that may not
be the case in a real-life situation. The "no locking at all" approach of
the non-threaded version is actually infinitely simpler (and easier to
debug). "Infinitely easier": just think about how much easier that is! :-)
> The comparatively small overhead in my case probably stems from Solaris
> threads being NxM (N API threads mapping on M kernel threads, N >= M),
> effectively giving non-preemptive user-mode threading for your benchmark.
> I suppose one thread would run to completion, then the other one.
Ahh, non-preemptive threading is okay with me! It *is* more complicated
than no threading at all (it can require keeping structures locked across
a blocking call that passes control to another thread), but it has good
performance.
I actually think that the NxM threading model is probably one of the most
efficient, with the number of kernel threads equal to the number of CPUs
you want to use. But since I like being explicit as a way of being clear
and simple, I have a slight preference for forking a real process with
shared-memory communication, or spawning a thread for a single job.
Whatever is simpler is probably better.