Wednesday, October 28, 2009

Adventures in virtual texture space, part 12

Last time I mentioned that I had managed to compress my textures to about 6Kb, and theorized that this is about the same compression ratio that Id software gets.
Obviously Id software, next to simply being awesome, has many more resources than I do to get their texture compression just perfect.
So I was quite happy with myself that, using their published papers, I was able to get the same results.

Unfortunately, as Brian Karis mentioned in the comments, I was wrong.

6Kb wasn't per texture but for all the different material channels, diffuse/bump/specular, combined.

To get the same compression ratio I would need to get it down to 2Kb.

Of course I could just crank up the compression ratio, but I didn't really like the results if I did.
That being said, if I convert the results to DXT5 after decompression I'd lose some more quality anyway, so in the end cranking up the compression ratio might be acceptable.
But I really wanted to see how far I could go without decreasing the quality.

So I started to experiment with the Burrows-Wheeler transform, which does some clever stuff to reorder the bytes so that when you perform RLE (Run Length Encoding) on them, the compression ratio ends up being better.
And look at that, it worked like a charm!
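
For illustration only, here is a minimal Python sketch of a naive Burrows-Wheeler transform. It is not the code from the demo (which is C#), and the function names are made up for this example. It sorts every rotation of the buffer, which is the simple textbook approach and also the reason a naive version gets slow on larger buffers.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last_column = bytes(doubled[i + n - 1] for i in order)
    return last_column, order.index(0)  # row index of the unrotated input


def inverse_bwt(last_column: bytes, primary: int) -> bytes:
    """Rebuild the input by walking the first-column to last-column mapping."""
    # A stable sort pairs each byte of the first column with its
    # matching occurrence in the last column.
    order = sorted(range(len(last_column)), key=lambda i: last_column[i])
    out = bytearray()
    row = primary
    for _ in range(len(last_column)):
        row = order[row]
        out.append(last_column[row])
    return bytes(out)


def count_runs(data: bytes) -> int:
    """Number of runs an RLE pass would have to emit."""
    return sum(1 for i in range(len(data)) if i == 0 or data[i] != data[i - 1])


transformed, primary = bwt(b"banana")
print(transformed, primary)
print(count_runs(b"banana"), "runs before,", count_runs(transformed), "runs after")
```

The transform does not shrink anything by itself. It only groups equal bytes together, so the RLE pass that follows has fewer and longer runs to encode.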

The file size got down to 2Kb.

But it's slow. REALLY slow.

The algorithm scales horribly, and on a 64Kb buffer it took seconds to finish.
Decompression took >10ms, which is a lot faster than compression, but still not good enough.

Sometime later, after trying all kinds of things, I tried doing RLE both before & after the transformation, just to see what would happen.
The reasoning was that if I first decrease the size of the buffer, by compressing it a little with RLE, then the Burrows Wheeler process would be faster.
After that I would then do another RLE over the resulting buffer to increase my compression ratio somewhat.
I expected this to be a little faster, and probably have somewhat worse compression overall.
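
The two-stage idea can be sketched roughly like this in Python. Everything here is an assumption made for the example: the (count, value) RLE format, the 4-byte header for the transform's row index, and the function names. The real implementation is C# and may differ in every detail.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (run length, value) byte pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k + 1]]) * data[k]
    return bytes(out)


def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive transform: sort all rotations, return last column and input row."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    return bytes(doubled[i + n - 1] for i in order), order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    order = sorted(range(len(last)), key=lambda i: last[i])  # stable sort
    out, row = bytearray(), primary
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return bytes(out)


def compress(data: bytes) -> bytes:
    stage1 = rle_encode(data)        # shrink the buffer first
    last, primary = bwt(stage1)      # the slow step now sees less data
    return primary.to_bytes(4, "little") + rle_encode(last)


def decompress(packed: bytes) -> bytes:
    primary = int.from_bytes(packed[:4], "little")
    last = rle_decode(packed[4:])
    return rle_decode(inverse_bwt(last, primary))


sample = bytes([0] * 500 + [7] * 300 + list(range(50)) * 4)
packed = compress(sample)
print(len(sample), "->", len(rle_encode(sample)), "after first RLE ->", len(packed))
```

Note that this simple pair-based RLE can grow data that has no runs at all, so how much the second pass gains depends heavily on the input.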

To my surprise, the total compression process was now MUCH FASTER, taking only 14ms.
Decompression only took 3.2ms overall, and the file got about 100 bytes SMALLER, which is somewhat counter-intuitive.

Result: 1899 bytes for a 128x128 RGBA texture.
(the alpha channel is a solid color though)

Friday, October 23, 2009

Adventures in virtual texture space, part 11

I've been working on implementing the texture compression ideas from the "Real-Time Texture Streaming & Decompression" paper in my virtual texture implementation.
I couldn't just use the code and port it this time because this paper only has the decompression code, and I figured it would probably be easier to write everything from scratch than to try to figure out how the compression side would have to look to work perfectly with the given decompression code.
It's not implemented into my VT demo (yet), but I have some results.



Right now I can decompress a 128x128 texture in 2.5ms, and it ends up at about 5-6Kb, and that includes an alpha channel.
I don't think I can get it much smaller, at least I don't know how to do that without seriously degrading performance, and the quality is already less than I would like.
Eventually I'll see how much of a difference it makes when I turn this into a YCoCg/DXT5 texture, in which case there might not be much difference in quality when comparing the original and the compressed one (since DXT5 already reduces the quality).
The speed can probably be increased several times considering this is written in plain vanilla C#. Languages like C or C++ are much better at this sort of thing because you can do all kinds of pointer tricks that you cannot do easily in a managed language, although I might try C++/CLI eventually.

Like the paper, I'm converting my RGB(A) texture into YCoCg(A), separating the channels, and then downsampling the Co and Cg channels (which has no real visible impact).
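
As a rough sketch of that first step, here is the standard YCoCg transform plus a simple 2x2 box downsample in Python. The box filter and the function names are assumptions for this example. The paper and the C# code may use a different filter or an integer variant of the transform.

```python
def rgb_to_ycocg(r, g, b):
    """Standard YCoCg transform (floating point, no offsets applied)."""
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg


def ycocg_to_rgb(y, co, cg):
    """Exact inverse of rgb_to_ycocg."""
    return y + co - cg, y + cg, y - co - cg


def downsample_2x2(plane):
    """Average each 2x2 group of samples; assumes even width and height."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


print(rgb_to_ycocg(255, 255, 0))              # pure yellow
print(ycocg_to_rgb(*rgb_to_ycocg(255, 255, 0)))
```

Downsampling Co and Cg in both directions leaves each chroma plane with a quarter of its samples, while the Y plane stays at full resolution.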
I then slice the buffers up into 8x8 blocks, pass them through a DCT converter and quantize it all.
Unlike the paper, I'm not doing run-length and Huffman encoding on a per-block basis; I'm doing this over all blocks. Eventually it might make sense to do the encoding per block, to make it easier to multithread the decompression, but I'm not so sure it's a good idea to have a Huffman header for every block. I think it would be more efficient to have one per texture.
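
To make the block step concrete, here is a small, unoptimized Python sketch of an 8x8 DCT followed by quantization. It uses the JPEG-style DCT-II scaling and a single uniform quantization step, both of which are assumptions for this example. Real codecs typically use a per-frequency quantization table and a fast DCT, and all names below are hypothetical.

```python
import math

N = 8


def _scale(k):
    return math.sqrt(0.5) if k == 0 else 1.0


def _cos(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_8x8(block):
    """Forward 2D DCT-II of an 8x8 block (direct O(N^4) form)."""
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += block[y][x] * _cos(x, u) * _cos(y, v)
            out[v][u] = 0.25 * _scale(u) * _scale(v) * s
    return out


def idct_8x8(coeffs):
    """Inverse of dct_8x8."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for v in range(N):
                for u in range(N):
                    s += _scale(u) * _scale(v) * coeffs[v][u] * _cos(x, u) * _cos(y, v)
            out[y][x] = 0.25 * s
    return out


def quantize(coeffs, step):
    """Lossy step: divide by a uniform step size and round to integers."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(quantized, step):
    return [[q * step for q in row] for row in quantized]


flat = [[100.0] * N for _ in range(N)]
print(quantize(dct_8x8(flat), 16)[0])  # a flat block keeps only its DC term
```

After quantization most high-frequency coefficients of a smooth block become zero, which is what makes the run-length and Huffman stages worthwhile.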

In the "From Texture Virtualization to Massive Parallelization" paper they mention they have "diffuse, specular, bump and cover/alpha" channels for their tiles, and that they have "Typically 2-6kB input, 40kB output".
Which makes me wonder if they have this 2-6Kb input per 'texture', or for all those channels combined.
I can't imagine that it's for all the channels combined, because 40Kb is a 128x128 texture with 2.5 bytes per texel, and they use 128x128 tile sizes as an example in the same PDF.
So it seems to me they have 2-6Kb per tile, which is about the same as I have right now.
The lower limit of 2Kb is probably for grey textures that only have 1 channel.
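
The arithmetic behind those numbers, as a quick Python check. The only inputs are the figures quoted from the paper. The variable names are just for this example.

```python
TILE_SIZE = 128          # texels per side, the example tile size from the PDF
BYTES_PER_TEXEL = 2.5    # implied by 40kB of output for one 128x128 tile

output_bytes = TILE_SIZE * TILE_SIZE * BYTES_PER_TEXEL
print(output_bytes / 1024, "KB output")

for input_kb in (2, 6):
    ratio = output_bytes / (input_kb * 1024)
    print(input_kb, "KB input gives roughly", round(ratio, 1), "to 1")
```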

Tuesday, October 13, 2009

The birdmen cometh

So yesterday my parents came back from France, from their second home that I mentioned in a previous post.

While they were in France, my brother and his family visited them and he helped them cut down a couple of trees.

So my parents showed me a video of my brother perched up in a tree,
sawing off the branches.

A little while later his 3-year-old son comes out of the house,
barely awake, just woken up.

He looks up, and sees his father sitting there in the tree,
which he obviously didn't expect to see.
Somewhat puzzled, he then says (roughly translated from Dutch):
"Daddy, surely you're not a bird!?"

Classic.

Monday, October 12, 2009

Adventures in virtual texture space, part 10B

Like I promised in my last post, here are some images.
On the left you can see the original texture, the other two are the result of YCoCg/DXT5 compression.
The middle one is with the simplification used in the
"Real-Time DXT Compression" paper I talked about, the right one is without it.



Here you can see that there's an orange border around
all the yellow edges. In the middle one there are also some
bright green blocks.



Everything is blocky after compression, has a
washed out red appearance, and in the middle
there are some purple smudges here and there.


In the middle image there's a red glow around the top of
the yellow stripes, while the black stripes have a reddish glow.
The right one isn't that much better, but the glow is more yellow,
which makes more sense since there's no red in the original.
When you look at the yellow stripes from some distance,
the middle one looks green, unlike the left and right
ones, which remain yellow.

I'm pretty sure these artifacts would be far less visible on higher resolution textures, but here all these artifacts are rather visible.

Update: Updated images with higher resolution versions.

Adventures in virtual texture space, part 10

Texture compression

Yesterday night I managed to spend some time on my virtual texture project.
I got sick of just fiddling around with all the stuff I had already built, especially since it didn't really produce that many results.
So I figured it would be better to implement YCoCg/DXT5 compression.
Since time is a problem for me, I simply copied the code from the excellent
"Real-Time YCoCg/DXT5 Compression" paper and converted the C code into C# (since that's the programming language I'm using right now).

Conversion was pretty trivial, but unfortunately there seemed to be some inconsistencies between all the papers on this subject, so it took me more time than I would've liked to find the right combination of compression and decompression code.
It also turned out that somewhere in my image conversion chain my red and blue channels were reversed somehow. I'm still not sure exactly where it went wrong and didn't have much time, so I simply swapped them, which worked.


Unfortunately the quality was a big letdown.
I'm guessing Id software simply uses higher resolution textures for Rage, so it's less obvious, but with some of the Quake 4 textures the artifacts get really ugly.


I noticed that in the "Real-Time DXT Compression" paper (the prequel to the former paper) at the top of page 12 they mentioned that a line of code from 'EmitColorIndices' can be simplified from:
result = ( !b3 & b4 ) | ( b2 & b5 ) | 
         ( ( ( b0 & b3 ) | 
             ( b1 & b2 ) ) << 1 );
to
result = ( b2 & b5 ) | 
         ( ( ( b0 & b3 ) | 
             ( b1 & b2 ) ) << 1 );
mentioning that

Evaluating the above expression reveals that the sub expression ( !b3 & b4 ) can be omitted because it does not significantly contribute to the final result.

Well maybe it's just me, but I DO see significant differences, and replacing their 'improved' version with the 'unimproved' version increased quality a lot for me, especially with anything yellow. (I'll try to post some images later on)

(Don't get me wrong though, other than this little thing, it's an excellent paper and very valuable resource!)
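
To get a feel for how often the two expressions can disagree, here is a small Python truth-table check. It treats b0 to b5 as independent bits, which is an assumption for this example. In the paper they come from distance comparisons, so not every combination can actually occur in practice.

```python
from itertools import product


def original(b0, b1, b2, b3, b4, b5):
    # ( !b3 & b4 ) | ( b2 & b5 ) | ( ( ( b0 & b3 ) | ( b1 & b2 ) ) << 1 )
    return ((1 - b3) & b4) | (b2 & b5) | (((b0 & b3) | (b1 & b2)) << 1)


def simplified(b0, b1, b2, b3, b4, b5):
    # ( b2 & b5 ) | ( ( ( b0 & b3 ) | ( b1 & b2 ) ) << 1 )
    return (b2 & b5) | (((b0 & b3) | (b1 & b2)) << 1)


differing = [
    bits for bits in product((0, 1), repeat=6)
    if original(*bits) != simplified(*bits)
]
print(len(differing), "of 64 bit combinations give a different colour index")
```

Whenever the results differ, the simplified version has lost the low bit of the index, so a texel ends up on a neighbouring entry of the block's palette.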

Anyway, with YCoCg/DXT5 compression my page tiles shrunk 4x and performance increased a lot, so I'm quite happy about that.
I had a couple of frustrating moments in the past where I thought I had fixed my page loading performance, but it turned out to be caching, so I replaced everything with uncached memory mapping, and everything still ran rather smoothly.
Even though reducing the size of my pages on disk helps with locality and therefore latency, I honestly thought that throughput wasn't really part of my performance problems, at least not yet.
Apparently I was wrong.
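
The 4x figure matches what the formats predict: uncompressed RGBA takes 4 bytes per texel, while DXT5 stores each 4x4 block in 16 bytes. A quick Python check, assuming 128x128 page tiles for the example:

```python
def rgba8_bytes(width, height):
    """Uncompressed 8-bit RGBA: 4 bytes per texel."""
    return width * height * 4


def dxt5_bytes(width, height):
    """DXT5: 16 bytes per 4x4 block, partial blocks rounded up."""
    return ((width + 3) // 4) * ((height + 3) // 4) * 16


raw = rgba8_bytes(128, 128)
packed = dxt5_bytes(128, 128)
print(raw, "bytes raw,", packed, "bytes as DXT5, ratio", raw / packed)
```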

I haven't tried the more advanced "Real-Time Texture Streaming & Decompression" stuff yet, but that should increase my performance even more.


Now that I've mentioned all these papers, I should also mention "Geospatial Texture Streaming From Slow Storage Devices" which is another paper written by Id Software, the latest one if I'm not mistaken.
Even though it's, oddly enough, about the older megatexture/clipmapping technology rather than virtual texture technology (which makes me wonder if they're mixing megatexture for terrain with virtual texturing for everything else), there is some rather interesting stuff in there, so it's well worth the read.



Possible analytical solution

Other than that, I have this vague idea concerning virtual texturing. I could have a hashed grid for each mip-level, and then store which pages are potentially visible from within each hashed-grid cell. I can then use this to sort my pages in my virtual texture file, and perhaps also use this at runtime to determine which pages are (potentially) visible when.

It would avoid having to do a read-back, and it would actually work with transparent textures.
Another thing is that it would allow me to 'look ahead' to determine which pages might be visible in the future when I move in the direction I'm facing (or something similar).
I would need to load more pages though, most of which, I'm guessing, should be cached on the CPU side. Hopefully this would be offset by the better (theoretical) locality of it all.
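
A very rough Python sketch of what such a structure could look like. Nothing here exists in the demo. The class, its methods and the uniform cell size are all hypothetical, and the step that decides which pages are visible from a cell is left out entirely.

```python
import math
from collections import defaultdict


class PageVisibilityGrid:
    """Maps (mip level, grid cell) to the pages potentially visible from that cell."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)

    def _key(self, mip, position):
        cell = tuple(math.floor(c / self.cell_size) for c in position)
        return (mip, cell)

    def record(self, mip, position, page_id):
        """Offline step: note that page_id can be seen from this position."""
        self.cells[self._key(mip, position)].add(page_id)

    def visible_pages(self, mip, position):
        return set(self.cells.get(self._key(mip, position), ()))

    def look_ahead(self, mip, position, direction, distance):
        """Pages visible from the cell reached by moving along direction."""
        ahead = tuple(p + d * distance for p, d in zip(position, direction))
        return self.visible_pages(mip, ahead)

    def file_order(self):
        """Order pages so that pages sharing a cell sit near each other on disk."""
        ordered, seen = [], set()
        for key in sorted(self.cells):
            for page in sorted(self.cells[key]):
                if page not in seen:
                    seen.add(page)
                    ordered.append(page)
        return ordered


grid = PageVisibilityGrid(cell_size=10.0)
grid.record(0, (1.0, 2.0, 3.0), page_id=42)
grid.record(0, (1.0, 2.0, 3.0), page_id=7)
grid.record(0, (15.0, 2.0, 3.0), page_id=99)
print(grid.visible_pages(0, (5.0, 5.0, 5.0)))
print(grid.look_ahead(0, (5.0, 5.0, 5.0), (1.0, 0.0, 0.0), 10.0))
print(grid.file_order())
```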

Wednesday, October 7, 2009

Somebody beat me to it!

I was planning to, eventually, try to optimize some parts of virtual texturing using OpenCL.

But now it seems someone has done this pioneering work before me!
(using CUDA)

Check out this post from Charles-Fredrik Hollemeersch.
Perhaps some of you may recognize the name as the guy behind Tenebrae.

According to their poster, they seem to have presented their results at the GPU Technology Conference in San Jose, California.

A chapter about this can be found in the upcoming book
"GPU Pro: Advanced Rendering Techniques"

Awesome.

Friday, October 2, 2009

Adventures in virtual texture space, part 9

Awesome, I discovered that if I use bigger page sizes, my level would display much more smoothly.
Not because bigger page sizes are better, but because the code I was using to update my indirection table was god-awfully slow!
(and the bigger the page size, the smaller the indirection table)
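
To put numbers on that, here is a small Python sketch that counts indirection table entries, assuming a square virtual texture with one entry per page at every mip level. The 32768 texel virtual texture size is a made-up example value.

```python
def indirection_entries(virtual_size, page_size):
    """Total indirection table entries over all mip levels (one entry per page)."""
    pages_per_side = virtual_size // page_size
    total = 0
    while pages_per_side >= 1:
        total += pages_per_side * pages_per_side
        pages_per_side //= 2
    return total


for page_size in (128, 256):
    print(page_size, "texel pages:", indirection_entries(32768, page_size), "entries")
```

Doubling the page size cuts the table to roughly a quarter, so a slow update routine has far less work to do.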

In the Crytek paper they mention a trick where they basically render their page changes into the indirection table instead of uploading a texture.
Sounds like the next step for me!

Now if only I had some time to actually do this.. hmmm

Oh, and defrag your hard disk when working on virtual texture tech!
It helps!
A LOT