[SDL] [OT] Resource file

Andre de Leiradella leiradella at bigfoot.com
Thu Feb 21 06:39:05 PST 2008


> Well, mmap is a bit faster than reading/writing a file. I'm rather fond of
> it myself. Not to mention that having the data in a binary format is very
> nice. Do remember that addresses may not stay meaningful when mmaped into
> memory. The worst case is when they do stay meaningful on your machine, and
> on some other machines, but are not meaningful on every machine. This can
> lead you to putting addresses into the file and then having to redo
> everything so that you store offsets.
>   
I like it a lot too. Implementing decompressing readers with mmapped 
files is so much easier than reading chunks of data when the input 
buffer of the decompressor is empty.

I'm not storing pointers in the file, not least because the file can be 
mapped to a different address each time, even on the same machine. All I 
have are offsets, and they're all in the file header; a copy-on-write 
mmap helps when converting them to platform-dependent pointers. For each 
offset I have 16 bytes of extra space in the file to accommodate the 
corresponding pointer.
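
A minimal sketch of that conversion, assuming a POSIX system. The slot 
layout (slot_t), the function names and the assumption that the slots 
are 8-byte aligned in the file are made up for illustration and are not 
the real format. MAP_PRIVATE makes the mapping copy-on-write, so the 
pointers written into the spare space never reach the file on disk.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical header slot: a 64-bit file offset followed by the
   16 spare bytes that receive the pointer once the file is mapped. */
typedef struct {
    uint64_t      offset;
    unsigned char ptr_space[16];
} slot_t;

/* Map the whole file copy-on-write. Returns NULL on failure. */
void *map_private(const char *path, size_t *size_out)
{
    struct stat st;
    void *base;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    /* Writes to a MAP_PRIVATE mapping stay in this process. */
    base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return NULL;
    *size_out = (size_t)st.st_size;
    return base;
}

/* Store base + offset in the spare space of each slot. */
void fix_up_slots(unsigned char *base, slot_t *slots, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        void *p = base + slots[i].offset;
        memcpy(slots[i].ptr_space, &p, sizeof p);
    }
}

/* Read back the pointer stored by fix_up_slots. */
void *slot_pointer(const slot_t *slot)
{
    void *p;

    memcpy(&p, slot->ptr_space, sizeof p);
    return p;
}
```

The pointer is copied with memcpy so the sketch does not depend on the 
pointer size or on the alignment of the spare bytes.
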
>> B. Spare space where I could insert the digital signature of the file.
>> This space must be filled with zeroes while computing the hash for the
>> signature.
>>
>
>
> That isn't really a big deal, you can always create a zeroed out item in any
> resource file that can be used to store the signature after it is computed.
> Even better, you can just put the signature in a wrapper so that you don't
> actually change the signature of the file by adding the signature to the
> file. A simple header consisting of the length of the signature followed by
> the signature can be prepended to the file to sign it. Or, even simpler, you
> can just append the signature to the file and never care about the format.
>   
Yeah, you're right.
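
The "just append it" variant could look something like the sketch 
below. SIG_SIZE and split_signed_file are hypothetical names, and a 
fixed-length signature is an assumption; the hash would then be computed 
over the payload bytes only, so nothing has to be zeroed.

```c
#include <stddef.h>

/* Assumed fixed signature length in bytes (hypothetical value). */
#define SIG_SIZE 64

/* Split a signed file image into payload and trailing signature.
   Returns 0 on success, -1 if the image is too short to hold one. */
int split_signed_file(const unsigned char *file, size_t file_size,
                      size_t *payload_size,
                      const unsigned char **signature)
{
    if (file_size < SIG_SIZE)
        return -1;
    *payload_size = file_size - SIG_SIZE;
    *signature = file + *payload_size;
    return 0;
}
```
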
>> The preliminary version is working quite well. Data is accessed by name
>> (char *), with a reader that supports basic data input operations being
>> returned. Since the file is mmaped, the location of a given chunk of
>> data within the file is quickly found via a bsearch call. Data can be
>> stored without compression (good for mp3, ogg, jpeg...) or bzipped. The
>> resource file can even be part of another file, the most common use
>> being to append the resource file to an executable.
>>
>> There is also support for transparently using a directory instead of a
>> real resource file just like zziplib, and reading entries via SDL_RWops.
>>
>> So my questions are:
>>
>> 1. Is there anything bad about mmapping a resource file? The file size
>> can easily be greater than 4 GiB.
>>     
>
>
> Well, by allowing your file to be so large you restrict yourself to 64 bit
> architectures. In general, 32 bit machines cannot address more than 4 gigs
> of process space. In reality they can rarely address more than 2 gigs of
> process space. If you don't care about 32 bit machines then there is really
> nothing wrong with what you are doing. Just remember that your addresses and
> offsets need to be 64 bits.
>   
Sure, they are. Do you know of any gotchas when mmapping files bigger 
than 2 GiB into an application's address space?
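
One check that follows from the 32 bit point above, as a small sketch: 
the length passed to mmap is a size_t, so a 64 bit file size has to be 
tested against SIZE_MAX before mapping the whole file. The function name 
is hypothetical. On 32 bit glibc systems, defining _FILE_OFFSET_BITS to 
64 is also what makes off_t wide enough to report such a size at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a file of this size can be described by a size_t, which
   is necessary (but not sufficient) for mapping it in one piece. */
int fits_in_address_space(uint64_t file_size)
{
    return file_size <= (uint64_t)SIZE_MAX;
}
```

Passing the check does not guarantee that mmap succeeds; the process 
still needs that much contiguous free address space.
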
> 2. Are there other resource file formats that provide A and B above?
>
>
> Well, your "A" requirement is a requirement of the implementation of the
> access library and has absolutely nothing to do with the file format. So
> basically all and/or no file format gives you that. You could take any
> existing file format and create a library for accessing it that uses mmap.
> If you look deep down in the file code for your favorite compiler you might
> find that it already uses mmap to implement read and write in which case all
> libraries have this feature.
>   
I partially agree. The file format is being designed so that all offsets 
are stored in the header, so that I don't have to walk through the whole 
file with a pointer to find an entry, which would cause the OS to bring 
many pages into RAM. I didn't really check whether other formats behave 
the same way. TAR is one I know that doesn't. Besides, the format I'm 
designing will allow a simple bsearch call to find an entry instead of 
comparing all entries' names. I know the speed gain is small if one is 
going to read a large entry, but I like to think that many small speed 
gains add up to an overall speed gain.
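
A sketch of that lookup, assuming the header has already been turned 
into an in-memory table sorted by name with strcmp. entry_t and 
find_entry are illustrative names, not the actual interface.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of one header entry. */
typedef struct {
    const char *name;   /* table must be sorted by name (strcmp order) */
    uint64_t    offset; /* where the entry's data starts in the file */
    uint32_t    size;   /* entry size in bytes */
} entry_t;

/* bsearch comparator: key is the name, elem is a table entry. */
static int compare_entry_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const entry_t *)elem)->name);
}

/* Returns the matching entry, or NULL if the name is not present. */
const entry_t *find_entry(const entry_t *table, size_t count,
                          const char *name)
{
    return bsearch(name, table, count, sizeof *table,
                   compare_entry_name);
}
```

Only the header pages are touched during the search, which is the point 
of keeping all names and offsets together.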

I'm aware that some compilers might use mmap behind the scenes for 
regular file IO, but as I said before decompressing things with mmapped 
files is a breeze.
>  And your "B" requirement can be met by simply appending the signature to an
> existing file or by writing it to a different file, so pretty much all other
> file formats meet this requirement.
>
>> 3. Are there other important characteristics for a resource file format
>> I'm missing?
>>
>
> That it is available right now and you don't have to write it from scratch?
>   
Yeah :-) I'm trying to use readily available code for everything in my 
projects. zziplib already does almost everything I need, and I could 
implement the signature like you suggested. But although it can be used 
to open ZIP files that are part of larger files, the documentation does 
not say how this can be done, and I'm not really into studying someone 
else's source code; I prefer to work it out and do this myself.
> 4. Do you have requirements for a new resource file?
>
>
> No.
>
> The exposed interface so far is:
>
>
> Hmmm, you aren't being consistent about the use of uint32_t and int. The
> interface as written may blow up if you try to do arithmetic on offsets or
> lengths because you are mixing unsigned and signed integers for lengths and
> offsets. It depends on whether the "int" variables are 32 bits or 64 bits.
>   
Entries are 4 GiB maximum each. But you are right, af_entry_read, 
af_entry_seek and af_entry_tell should take and return uint32_t values 
too. I was just closely following the SDL_RWops interface; my file 
format will be used in an SDL application later on and I wanted them to 
have the same interface.
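
To illustrate the kind of trouble the mixing causes: with a 32 bit int, 
the comparison -1 < 10u is false in C, because the int is converted to 
unsigned first. A sketch of a relative seek that avoids this by doing 
the arithmetic in a signed 64 bit type follows. seek_checked is a 
hypothetical helper, not one of the af_entry_* functions.

```c
#include <stdint.h>

/* Move *pos by delta within an entry of the given size.
   Returns 0 and updates *pos on success; returns -1 and leaves *pos
   untouched if the target would fall outside [0, size]. */
int seek_checked(uint32_t *pos, uint32_t size, int64_t delta)
{
    int64_t target;

    /* Reject deltas that could never be valid; this also rules out
       overflow in the addition below. */
    if (delta < -(int64_t)UINT32_MAX || delta > (int64_t)UINT32_MAX)
        return -1;
    target = (int64_t)*pos + delta;
    if (target < 0 || target > (int64_t)size)
        return -1;
    *pos = (uint32_t)target;
    return 0;
}
```
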
> Above you mentioned that the file size can easily be bigger than 4 gigs but
> you seem to only have 32 bits of offset in this format which restricts you
> to <= 4 gigs. You need to change this interface to use 64 bit offsets.
>
> Bob Pendleton
>   
Thanks for your input.

Cheers,

Andre


