Monday, October 12, 2009

Adventures in virtual texture space, part 10

Texture compression

Last night I managed to spend some time on my virtual texture project.
I got sick of just fiddling around with all the stuff I'd already built, especially since it wasn't producing many results.
So I figured it would be better to implement YCoCg/DXT5 compression.
Since time is scarce for me, I simply copied the code from the excellent "Real-Time YCoCg/DXT5 Compression" paper and converted the C code into C# (the programming language I'm using right now).

Porting the code was pretty trivial, but unfortunately there seemed to be some inconsistencies between the various papers on this subject, so it took me longer than I would've liked to find the right combination of compression and decompression code.
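Since the tricky part is making the compression and decompression sides agree, it helps to have the color transform pair written down in one place. Below is a minimal C sketch of the forward and inverse YCoCg transform on 8-bit values. It assumes a YCoCg/DXT5-style channel layout (Co in red, Cg in green, luma in alpha) and deliberately leaves out the per-block scale factor that the real encoder stores in blue; the function names are hypothetical, not the paper's.

```c
/* Sketch only: YCoCg transform for an assumed YCoCg/DXT5 layout
 * (Co -> red, Cg -> green, Y -> alpha). The per-block scale factor the
 * real encoder keeps in blue is omitted; blue is simply set to 0. */

/* Clamps to [0, 255] and rounds to the nearest byte. */
static unsigned char to_byte(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 255.0f) v = 255.0f;
    return (unsigned char)(v + 0.5f);
}

static int clamp_int(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* RGB -> (Co, Cg, unused, Y); Co and Cg are offset by 128 to fit a byte. */
static void rgb_to_cocg_y(const unsigned char rgb[3], unsigned char out[4]) {
    float r = rgb[0], g = rgb[1], b = rgb[2];
    float y  =  0.25f * r + 0.5f * g + 0.25f * b;
    float co =  0.5f  * r - 0.5f * b;
    float cg = -0.25f * r + 0.5f * g - 0.25f * b;
    out[0] = to_byte(co + 128.0f);
    out[1] = to_byte(cg + 128.0f);
    out[2] = 0;                 /* scale factor slot, unused here */
    out[3] = to_byte(y);
}

/* (Co, Cg, unused, Y) -> RGB: the inverse a decompression shader applies. */
static void cocg_y_to_rgb(const unsigned char in[4], int rgb[3]) {
    int co = in[0] - 128;
    int cg = in[1] - 128;
    int y  = in[3];
    int t  = y - cg;            /* t = (R + B) / 2 */
    rgb[0] = clamp_int(t + co);
    rgb[1] = clamp_int(y + cg);
    rgb[2] = clamp_int(t - co);
}
```

The round trip is only approximate because of the 8-bit rounding: a mid-gray texel survives exactly, while pure red comes back as (255, 0, 1).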
It also turned out that somewhere in my image conversion chain my red and blue channels were swapped. I'm still not sure exactly where it went wrong, and I didn't have much time, so I simply swapped them back, which worked.
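A common culprit for this kind of swap is an image loader that keeps pixels in BGRA order while the rest of the pipeline expects RGBA; that is an assumption about the cause, not something established here. The fix-up itself is a per-texel byte swap, sketched below in C for a tightly packed 4-bytes-per-texel buffer (the function name is hypothetical).

```c
#include <stddef.h>

/* Swaps the red and blue bytes of every texel in a tightly packed
 * 4-byte-per-texel buffer (RGBA <-> BGRA), in place. */
static void swap_red_blue(unsigned char *pixels, size_t texel_count) {
    for (size_t i = 0; i < texel_count; i++) {
        unsigned char *p = pixels + 4 * i;
        unsigned char tmp = p[0];
        p[0] = p[2];
        p[2] = tmp;
    }
}
```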

Unfortunately, the quality was a big letdown.
I'm guessing id Software simply uses higher-resolution textures for Rage, which makes the artifacts less obvious, but with some of the Quake 4 textures they get really ugly.

I noticed that in the "Real-Time DXT Compression" paper (the predecessor of the YCoCg/DXT5 paper), at the top of page 12, the authors mention that a line of code in 'EmitColorIndices' can be simplified from:
result = ( !b3 & b4 ) | ( b2 & b5 ) | 
         ( ( ( b0 & b3 ) | 
             ( b1 & b2 ) ) << 1 );
to:
result = ( b2 & b5 ) | 
         ( ( ( b0 & b3 ) | 
             ( b1 & b2 ) ) << 1 );
Their justification:

Evaluating the above expression reveals that the sub expression ( !b3 & b4 ) can be omitted because it does not significantly contribute to the final result.

Well, maybe it's just me, but I DO see significant differences, and replacing their 'improved' version with the 'unimproved' one increased quality a lot for me, especially for anything yellow. (I'll try to post some images later on.)

(Don't get me wrong though: apart from this little thing, it's an excellent paper and a very valuable resource!)
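One way to settle whether the dropped term matters is to compare the branchless expression against a brute-force reference that simply picks the nearest of the four palette colors for each texel. Below is a minimal C sketch of such a reference. It assumes DXT1 four-color ordering (index 0 = max endpoint, 1 = min endpoint, 2 and 3 = the two interpolated colors), uses the sum of absolute RGB differences as the distance, and skips the 5:6:5 quantization of the endpoints that a real encoder applies; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Index (0..3) of the palette entry nearest to texel p (3 bytes, RGB),
 * using the sum of absolute channel differences. Palette order follows
 * DXT1 four-color mode: 0 = max, 1 = min, 2 = (2*max + min)/3,
 * 3 = (max + 2*min)/3. Endpoints are kept at full 8-bit precision. */
static int nearest_color_index(const unsigned char *p,
                               const unsigned char *max,
                               const unsigned char *min) {
    int palette[4][3];
    for (int c = 0; c < 3; c++) {
        palette[0][c] = max[c];
        palette[1][c] = min[c];
        palette[2][c] = (2 * max[c] + min[c]) / 3;
        palette[3][c] = (max[c] + 2 * min[c]) / 3;
    }
    int best = 0;
    int best_dist = -1;
    for (int i = 0; i < 4; i++) {
        int d = abs(palette[i][0] - p[0]) + abs(palette[i][1] - p[1]) +
                abs(palette[i][2] - p[2]);
        if (best_dist < 0 || d < best_dist) {
            best = i;
            best_dist = d;
        }
    }
    return best;
}

/* Packs the 2-bit indices of a 4x4 block (16 RGB texels, 48 bytes) into a
 * 32-bit word with texel 0 in the lowest two bits, as DXT stores them. */
static uint32_t emit_color_indices_reference(const unsigned char *block,
                                             const unsigned char *max,
                                             const unsigned char *min) {
    uint32_t result = 0;
    for (int i = 0; i < 16; i++)
        result |= (uint32_t)nearest_color_index(block + 3 * i, max, min) << (2 * i);
    return result;
}
```

Counting how often each version of the expression disagrees with this reference over blocks from real textures would show how much the omitted subexpression actually contributes.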

Anyway, with YCoCg/DXT5 compression my page tiles shrank 4x and performance increased a lot, so I'm quite happy about that.
I had a couple of frustrating moments in the past where I thought I'd fixed my page loading performance, but it turned out to be caching, so I replaced everything with uncached memory mapping, and everything still ran rather smoothly.
Even though reducing the size of my pages on disk helps with locality and therefore latency, I honestly thought that throughput wasn't really part of my performance problems, at least not yet.
Apparently I was wrong.
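The 4x figure follows directly from the formats: uncompressed 8-bit RGBA costs 4 bytes per texel, while DXT5 stores each 4x4 block in 16 bytes (8 bytes of alpha data plus 8 bytes of color data), i.e. 1 byte per texel. A small C sketch of the arithmetic; the function names are hypothetical, and the 128x128 page size used below is just an example rather than a size taken from this post.

```c
#include <stddef.h>

/* Bytes for a w x h image stored as uncompressed 8-bit RGBA. */
static size_t rgba8_size(size_t w, size_t h) { return w * h * 4; }

/* Bytes for a w x h image in DXT5: one 16-byte block per 4x4 texels,
 * with partial blocks at the edges rounded up to whole blocks. */
static size_t dxt5_size(size_t w, size_t h) {
    return ((w + 3) / 4) * ((h + 3) / 4) * 16;
}
```

For a 128x128 page that works out to 65,536 bytes uncompressed versus 16,384 bytes in DXT5.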

I haven't tried the more advanced "Real-Time Texture Streaming & Decompression" techniques yet, but they should increase my performance even more.

Now that I've mentioned all these papers, I should also mention "Geospatial Texture Streaming From Slow Storage Devices", another paper by id Software (the latest one, if I'm not mistaken).
Oddly enough, it's about the older megatexture/clipmapping technology rather than virtual texturing, which makes me wonder if they're mixing megatextures for terrain with virtual texturing for everything else. Either way, there's some rather interesting stuff in there, so it's well worth the read.

Possible analytical solution

Other than that, I have a vague idea concerning virtual texturing. I could have a hashed grid for each mip level, and then store which pages are potentially visible from within each grid cell. I could then use this to sort the pages in my virtual texture file, and perhaps also use it at runtime to determine which pages are (potentially) visible from the camera's current position.

It would avoid having to do a read-back, and it would actually work with transparent textures.
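The hashed grid could look something like the minimal C sketch below. Everything in it is an assumption for illustration: the names, the fixed bucket count, and the hash function (the prime-multiply-and-XOR spatial hash often used for hashed grids). The mip level is folded into the cell key, which behaves like one grid per mip level, and the per-cell lists are assumed to be filled offline by some visibility pass that is not shown.

```c
#include <stdint.h>
#include <stdlib.h>

#define BUCKET_COUNT 4096u /* hypothetical table size */

/* Grid cell key; folding in the mip level acts like one grid per mip. */
typedef struct { int mip, cx, cy, cz; } CellKey;

/* One record: page_id is potentially visible from cell `key`. */
typedef struct { CellKey key; uint32_t page_id; } Entry;

/* Growable array of records whose keys hash to the same bucket. */
typedef struct { Entry *entries; size_t count, capacity; } Bucket;

typedef struct {
    float cell_size;              /* world-space edge length of a cell */
    Bucket buckets[BUCKET_COUNT];
} VisibilityGrid;

/* floor(v / size) without pulling in libm. */
static int floor_div(float v, float size) {
    float q = v / size;
    int i = (int)q;
    return (q < (float)i) ? i - 1 : i;
}

static CellKey cell_of(const VisibilityGrid *g, int mip, float x, float y, float z) {
    CellKey k = { mip, floor_div(x, g->cell_size), floor_div(y, g->cell_size),
                  floor_div(z, g->cell_size) };
    return k;
}

/* Prime-multiply-and-XOR spatial hash (an assumed choice). */
static size_t hash_cell(CellKey k) {
    uint32_t h = ((uint32_t)k.cx * 73856093u) ^ ((uint32_t)k.cy * 19349663u) ^
                 ((uint32_t)k.cz * 83492791u) ^ ((uint32_t)k.mip * 2654435761u);
    return h % BUCKET_COUNT;
}

static int keys_equal(CellKey a, CellKey b) {
    return a.mip == b.mip && a.cx == b.cx && a.cy == b.cy && a.cz == b.cz;
}

/* Records that page_id is potentially visible from the cell containing
 * (x, y, z) at the given mip level. Returns 0 only on allocation failure. */
static int grid_add(VisibilityGrid *g, int mip, float x, float y, float z, uint32_t page_id) {
    CellKey k = cell_of(g, mip, x, y, z);
    Bucket *b = &g->buckets[hash_cell(k)];
    for (size_t i = 0; i < b->count; i++)
        if (keys_equal(b->entries[i].key, k) && b->entries[i].page_id == page_id)
            return 1; /* already recorded */
    if (b->count == b->capacity) {
        size_t cap = b->capacity ? b->capacity * 2 : 8;
        Entry *e = realloc(b->entries, cap * sizeof *e);
        if (e == NULL) return 0;
        b->entries = e;
        b->capacity = cap;
    }
    b->entries[b->count].key = k;
    b->entries[b->count].page_id = page_id;
    b->count++;
    return 1;
}

/* Copies up to max_out page ids for the cell containing (x, y, z) into out;
 * returns the total number found (which may exceed max_out). */
static size_t grid_query(const VisibilityGrid *g, int mip, float x, float y, float z,
                         uint32_t *out, size_t max_out) {
    CellKey k = cell_of(g, mip, x, y, z);
    const Bucket *b = &g->buckets[hash_cell(k)];
    size_t n = 0;
    for (size_t i = 0; i < b->count; i++) {
        if (!keys_equal(b->entries[i].key, k)) continue; /* hash collision */
        if (n < max_out) out[n] = b->entries[i].page_id;
        n++;
    }
    return n;
}

static void grid_free(VisibilityGrid *g) {
    for (size_t i = 0; i < BUCKET_COUNT; i++) free(g->buckets[i].entries);
}
```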
Another advantage is that it would allow me to 'look ahead' and determine which pages might become visible as I move in the direction I'm facing (or something similar).
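A crude version of that look-ahead could step from the camera position along the view direction and collect the distinct grid cells it passes through, nearest first; the pages listed for those cells would then be prefetch candidates. A minimal C sketch with hypothetical names: keeping the step at or below the cell size avoids skipping cells outright, but fixed-step sampling can still miss a cell the line only clips at a corner, which an exact voxel traversal (such as Amanatides and Woo's) would avoid.

```c
#include <stddef.h>

typedef struct { int x, y, z; } Cell;

/* floor(v / cell_size) without pulling in libm. */
static int to_cell(float v, float cell_size) {
    float q = v / cell_size;
    int i = (int)q;
    return (q < (float)i) ? i - 1 : i;
}

/* Samples p + d * (step * i) for i = 0..steps and records each distinct
 * cell once, nearest first. A straight line cannot re-enter a cell it has
 * left, so comparing against the previous cell is enough to deduplicate.
 * Returns the number of cells written to out (at most max_out). */
static size_t cells_ahead(const float p[3], const float d[3], float cell_size,
                          float step, int steps, Cell *out, size_t max_out) {
    size_t n = 0;
    for (int i = 0; i <= steps && n < max_out; i++) {
        float t = step * (float)i;
        Cell c = { to_cell(p[0] + d[0] * t, cell_size),
                   to_cell(p[1] + d[1] * t, cell_size),
                   to_cell(p[2] + d[2] * t, cell_size) };
        if (n > 0 && out[n - 1].x == c.x && out[n - 1].y == c.y && out[n - 1].z == c.z)
            continue; /* still inside the previous cell */
        out[n++] = c;
    }
    return n;
}
```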
I would need to load more pages though, most of which, I'm guessing, should be cached on the CPU side. Hopefully this would be offset by the better (theoretical) locality of it all.
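For that CPU-side cache, least-recently-used eviction over a fixed set of page slots is a simple policy to start from. A minimal C sketch, assuming LRU (a choice made for this sketch only), a hypothetical slot count, and a linear scan, which is fine for a handful of slots but would want a hash map plus a linked list once the cache holds thousands of pages.

```c
#include <stdint.h>

#define CACHE_SLOTS 4 /* hypothetical; a real cache would hold many more pages */

typedef struct {
    uint32_t page_id;
    uint64_t last_used;  /* cache clock value at the most recent access */
    int valid;
} Slot;

typedef struct {
    Slot slots[CACHE_SLOTS];
    uint64_t clock;
} PageCache;

/* Returns the slot holding page_id. On a hit, *hit is set to 1. On a miss,
 * the first empty slot (or else the least recently used one) is reassigned
 * to page_id, *hit is set to 0, and the caller is expected to load the
 * page data into that slot. */
static int cache_acquire(PageCache *c, uint32_t page_id, int *hit) {
    int victim = 0;
    c->clock++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].valid && c->slots[i].page_id == page_id) {
            c->slots[i].last_used = c->clock;
            *hit = 1;
            return i;
        }
    }
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!c->slots[i].valid) { victim = i; break; }
        if (c->slots[i].last_used < c->slots[victim].last_used) victim = i;
    }
    c->slots[victim].page_id = page_id;
    c->slots[victim].last_used = c->clock;
    c->slots[victim].valid = 1;
    *hit = 0;
    return victim;
}
```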