Friday, October 23, 2009

Adventures in virtual texture space, part 11

I've been working on implementing the texture compression ideas from the "Real-Time Texture Streaming & Decompression" paper in my virtual texture implementation.
I couldn't just take the code and port it this time, because the paper only includes the decompression code, and I figured it would probably be easier to write everything from scratch than to try to figure out what the compression side would have to look like to work perfectly with the given decompression code.
It's not implemented into my VT demo (yet), but I have some results.

Right now I can decompress a 128x128 texture in 2.5 ms, and it ends up at about 5-6 kB, including an alpha channel.
I don't think I can get it much smaller, at least not without seriously degrading performance, and the quality is already less than I would like.
Eventually I'll see how much of a difference it makes when I turn this into a YCoCg/DXT5 texture; in that case there might not be much difference in quality between the original and the compressed one, since DXT5 already reduces the quality.
The speed can probably be increased several times, considering this is written in plain vanilla C#. Languages like C or C++ are much better at this sort of thing because you can do all kinds of pointer tricks that aren't easy in a managed language, although I might try C++/CLI eventually.

Like the paper, I convert my RGB(A) texture to YCoCg(A), separate the channels, and then downsample the Co and Cg channels (which actually has no real visual impact).
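The forward transform itself is just cheap integer math. Sketched in C++ (since that's where this will probably end up anyway; the helper name and the byte layout with biased chroma are my own, not from the paper):

```cpp
#include <cstdint>

// Forward RGB -> YCoCg for one pixel. Co and Cg are signed, so they
// are biased by 128 here to fit in a byte. Illustrative sketch only.
void RgbToYCoCg(uint8_t r, uint8_t g, uint8_t b,
                uint8_t &y, uint8_t &co, uint8_t &cg)
{
    y  = (uint8_t)((r + 2 * g + b) / 4);        // luma
    co = (uint8_t)((r - b) / 2 + 128);          // orange chroma, biased
    cg = (uint8_t)((-r + 2 * g - b) / 4 + 128); // green chroma, biased
}
```

Because Co and Cg vary slowly across most textures, averaging each 2x2 block of those planes down to one sample quarters their size, which is where the "free" savings from the downsampling step comes from.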
I then split the buffers up into 8x8 blocks, pass them through a DCT, and quantize the results.
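Roughly, the per-block step looks like this naive C++ sketch (a real implementation would use a fast separable DCT, and the `quant` divisor is a made-up quality knob, not a value from the paper):

```cpp
#include <cmath>

// Naive 8x8 forward DCT-II followed by uniform quantization, the
// same per-block step JPEG uses. Larger 'quant' means smaller,
// lossier output. Illustrative sketch, not optimized.
void DctQuantize8x8(const float in[64], int quant, int out[64])
{
    const float pi = 3.14159265358979f;
    for (int v = 0; v < 8; ++v)
        for (int u = 0; u < 8; ++u) {
            float sum = 0.0f;
            for (int y = 0; y < 8; ++y)
                for (int x = 0; x < 8; ++x)
                    sum += in[y * 8 + x]
                         * std::cos((2 * x + 1) * u * pi / 16.0f)
                         * std::cos((2 * y + 1) * v * pi / 16.0f);
            float cu = (u == 0) ? 0.70710678f : 1.0f; // 1/sqrt(2)
            float cv = (v == 0) ? 0.70710678f : 1.0f;
            out[v * 8 + u] = (int)std::lround(0.25f * cu * cv * sum / quant);
        }
}
```

After quantization most of the high-frequency coefficients end up as zeros, which is what makes the entropy coding step worthwhile.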
Unlike the paper, I'm not doing the run-length and Huffman encoding on a per-block basis, but over all blocks at once. Eventually it might make sense to do the encoding per block, to make it easier to multithread the decompression, but I'm not sure it's a good idea to have a Huffman header for every block; I think a single header per texture would be more efficient.
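A minimal version of the run-length pass over the whole coefficient stream could look like this (the format, a 0 marker followed by the run length, is my own illustration, and a Huffman coder with one shared table per texture would then compress this stream):

```cpp
#include <vector>
#include <cstddef>

// Collapse runs of zeros in the quantized coefficients of all blocks
// at once, rather than per block. Each zero run becomes a 0 marker
// followed by the run length; nonzero values pass through unchanged.
std::vector<int> RunLengthEncodeZeros(const std::vector<int> &coeffs)
{
    std::vector<int> out;
    for (std::size_t i = 0; i < coeffs.size(); ) {
        if (coeffs[i] == 0) {
            std::size_t run = 0;
            while (i + run < coeffs.size() && coeffs[i + run] == 0)
                ++run;
            out.push_back(0);
            out.push_back((int)run);
            i += run;
        } else {
            out.push_back(coeffs[i++]);
        }
    }
    return out;
}
```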

In the "From Texture Virtualization to Massive Parallelization" paper they mention they have "diffuse, specular, bump and cover/alpha" channels for their tiles, and that they have "Typically 2-6kB input, 40kB output".
Which makes me wonder whether that 2-6 kB input is per 'texture', or for all those channels combined.
I can't imagine it's for all the channels combined, because 40 kB corresponds to a 128x128 texture at 2.5 bytes per texel, and they use 128x128 tile sizes as an example in the same PDF.
So it seems to me they have 2-6 kB per tile, which is about the same as I have right now.
The lower limit of 2 kB is probably for grey textures that only have one channel.