Thursday, September 23, 2010

Mr Elusive strikes again!

Mr Elusive, or I should say J.M.P. van Waveren from Id Software, has released a new presentation about virtual texturing called "Using Virtual Texturing to Handle Massive Texture Data".

Most of it has been covered in previous presentations, but there are a couple of interesting tidbits that hint at how Id Software's virtual texture implementation works.

  • Their "sparse texture quad tree" is implemented as a MIP mapped texture, just like everybody else. I wondered about this because they never explicitly mentioned it before, and only mentioned a quad tree.
  • About feedback rendering, the presentation says that a "factor 10 smaller is OK" and "~ .5 msec on CPU for 80 x 60", which surprises me, because it's definitely not okay in my Quake 4 test levels. But I suppose if the artwork has been made to work with virtual texturing, it might actually work. (A sketch of that kind of feedback analysis also follows further down.)
  • It also mentions that:
      • diffuse + specular + normal + alpha + power = 10 channels - notice the 10
      • 128k x 128k x 12 channels = 256 GB - now where did that 12 come from??
      • 53 GB DXT compressed (1 x DXT1 + 2 x DXT5)
  • They use brute force scene visibility to throw away data - I was wondering about that; I could only think of brute force solutions, so I guess the same goes for Id!
      • down to 20 – 50 GB (uncompressed) - it says uncompressed, but it must still be DXT compressed, otherwise the numbers simply don't make any sense
      • 4 – 10 GB DXT compressed - so this must simply mean 'compressed' then.
So 128x128x12 texels per page = 192kb per page.
192kb / 6 = 21.5x compression. (their worst case compression ratio)

(50 GB / 10 GB = 5x compression, which explains why that 50 GB must be DXT compressed)
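
To make the page table bullet above a bit more concrete, here is a rough CPU-side sketch of how such a MIP mapped page table typically works. This is just my own illustration with made-up names and sizes, not Id's actual code: each mip level of the table covers the virtual texture at a coarser page granularity, and a lookup falls back to coarser levels until it hits a resident page.

    // Minimal sketch of a page table stored as a mip pyramid (CPU-side illustration).
    // levels[0] is the finest level; each coarser level has half the pages per side.
    #include <cstdint>
    #include <vector>

    struct PageTableEntry {
        bool     resident = false; // is this page currently in the physical texture cache?
        uint16_t cacheX   = 0;     // where the page lives in the physical cache
        uint16_t cacheY   = 0;
    };

    struct PageTable {
        int numLevels = 0;                               // e.g. 11 levels for 1024 x 1024 pages at the finest level
        std::vector<std::vector<PageTableEntry>> levels; // levels[i] holds (pagesPerSide at level i)^2 entries

        // pageX/pageY are page coordinates at mip level 'mip'. If that page is not
        // resident, fall back to coarser levels, just like sampling a lower MIP.
        PageTableEntry lookup(int pageX, int pageY, int mip) const {
            for (int level = mip; level < numLevels; ++level) {
                int pagesPerSide = 1 << (numLevels - 1 - level);
                int x = pageX >> (level - mip);
                int y = pageY >> (level - mip);
                const PageTableEntry& e = levels[level][y * pagesPerSide + x];
                if (e.resident)
                    return e;
            }
            return PageTableEntry{}; // nothing resident; shouldn't happen if the coarsest level is always loaded
        }
    };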
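
And for the feedback rendering bullet, here is a minimal sketch of the CPU side of analyzing a reduced-resolution feedback buffer. The page ID packing (12 bits x, 12 bits y, 8 bits mip) and the clear value are purely assumptions on my part.

    // Minimal sketch of CPU feedback analysis, assuming the feedback pass wrote one
    // 32-bit value per pixel at reduced resolution (e.g. 80 x 60), packed as
    // pageX (12 bits) | pageY (12 bits) | mip (8 bits). The packing is an assumption.
    #include <cstddef>
    #include <cstdint>
    #include <unordered_set>
    #include <vector>

    struct PageRequest {
        uint16_t pageX, pageY;
        uint8_t  mip;
        bool operator==(const PageRequest& o) const {
            return pageX == o.pageX && pageY == o.pageY && mip == o.mip;
        }
    };

    struct PageRequestHash {
        std::size_t operator()(const PageRequest& r) const {
            return (std::size_t(r.mip) << 24) ^ (std::size_t(r.pageY) << 12) ^ r.pageX;
        }
    };

    // Scan the read back buffer and collect the unique pages this frame touched.
    std::unordered_set<PageRequest, PageRequestHash>
    analyzeFeedback(const std::vector<uint32_t>& feedback) // width * height entries
    {
        std::unordered_set<PageRequest, PageRequestHash> requests;
        for (uint32_t texel : feedback) {
            if (texel == 0xFFFFFFFFu) // clear value: this pixel never sampled the virtual texture
                continue;
            PageRequest r;
            r.pageX = uint16_t(texel & 0xFFFu);         // 12 bits
            r.pageY = uint16_t((texel >> 12) & 0xFFFu); // 12 bits
            r.mip   = uint8_t(texel >> 24);             // 8 bits
            requests.insert(r);
        }
        return requests; // the streaming side can then prioritize and load these pages
    }

At 80 x 60 that's only 4800 entries to scan per frame, which at least makes the ~ .5 msec figure sound plausible.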

It also mentions somewhere that "lossy compression is perfectly acceptable", so I'm guessing there are probably a lot of artifacts in their textures, otherwise they wouldn't be able to achieve such high compression ratios.

They also mention that decompressing costs "1 to 2 milliseconds per page on a single CPU core", which I assume is for their entire 12 channel page.

My results are about 3ms per 4 channel texture, which would be 3 x 3 = 9ms for all 12 channels.
Of course my compression code is rather unoptimized; I chose to focus on compression ratio and image quality, so I'm not at all surprised that they have far better results than my part-time experimental efforts.

Another thing is that they specifically mention read back buffers, so their implementation is not analytical after all.
Which makes me wonder about this screenshot ...

[screenshot from the presentation, with the page/MIP boundaries drawn in]

... which is incidentally the exact same screenshot as in the previous presentation. In it the MIP boundaries always lie exactly on the page boundaries, which is not at all how a normal MIP boundary would look.
So I'm thinking it's probably just an artificial representation of the pages, not at all representative of their technology.

Afterwards it presents a high-level overview of their decompression pipeline, which I haven't had any time yet to look at in detail.

Now if you'll excuse me I have to go back to my 2 simultaneous (and impossible) deadlines, thank you.

13 comments:

  1. Was hoping for some news about how the texture cache is managed and some shader tricks... Still, it's interesting...

    In "Halo Wars Vector Terrain" GDC talk, they refer to "Efficient Cache Replacement using the Age and Cost Metrics" article published in GPG7 as the best way to do that... but nodoby else ever mentionned it afaik.
    A great talk to watch btw.Some virtual texture compression bits too there, editor info and much more performance tricks (like rendering multitexture/pass tile in UV space, compress, and store, avoiding huge vertex redraw)

    ReplyDelete
  2. Awesome, I haven't seen that one yet. I'll check it out after these stupid clients let me stop working on their pesky little projects.
    Pah, can't they just pay me without having to do the work?? I have better things to do! Things like virtual texturing! And taking over the world!

    ReplyDelete
  3. Looks like the link is broken... try this one: http://mrelusive.com/publications/presentations/gtc2010/GTC_2010_Virtual_Texture$.pdf

    ReplyDelete
  4. Huh, okay, that's weird.
    It's as if he deliberately changed the letter s to an $.
    Maybe problems with his bandwidth quota?

    ReplyDelete
  5. I removed the link... so I guess it's down for now.

    ReplyDelete
  6. "now where did that 12 come from??"

    I presume that's just the number of channels in raw 3-channel image form (.tga, etc.).

    Also, I'm guessing that their brute force visibility really *is* able to get the footprint down to 20-50GB uncompressed. That sounds staggering, but perhaps they're also throwing away MIP data that just won't be accessed within the confines of the playable area (some racing games do this).

    "(50 GB / 10 GB = 5x compression, which explains why that 50 GB must be DXT compressed)"

    That makes no sense; what is this magic step that goes from 50GB DXT compressed to 10GB DXT compressed? :) The 5x compression ratio is the same as before the 'visibility' optimisation.

    ReplyDelete
  7. "I presume that's just the number of channels in raw 3-channel image form"

    Perhaps, I don't know.

    "Also, I'm guessing that their brute force visibility really *is* able to get the footprint down to 20-50GB uncompressed"

    Well the thing is, the document says an uncompressed page is 192kb, and that the final page is 1-6kb in size.
    192kb / 6kb = 32x compression ratio, worst case.
    (how the hell did I get the 21.5 before? weird... I'm waaay too tired to do even simple math apparently *sigh*)
    If this is true, how could they only get 5x compression ratio for the entire virtual texture?

    Unless the paper is talking about 192kb for an entire page with all its channels, and 1-6kb is per channel?
    That's not how I read it though, and I specifically looked for that in the paper.
    Without the original paper, it's kinda hard to go back and re-read it. :(

    ReplyDelete
  8. "That makes no sense; what is this magic step that goes from 50GB DXT compressed to 10GB DXT compressed? :) The 5x compression ratio is the same as before the 'visibility' optimisation"

    They have two compression steps, one is to turn their textures into DXT textures. The other is their own compression algorithm.

    Reading it back I must've misread it, but in my defense I've been working 12hr days for weeks now on 3 deadlines (one done), and to say I'm tired is an understatement ;)

    ReplyDelete
  9. I could be way off, but I vaguely remember something about all GPU textures really being 4 channel textures no matter what type you actually use (int, int2, int3, int4). Although the compiler might try to coalesce them into as few textures as possible.

    If that is true, then we have

    diffuse (rgb - 3 values)
    specular (s - 1 value)
    normal (xyz - 3 values)
    alpha (a - 1 value)
    power (p - 1 value)

    So that's 9 values, dunno why they said 10, maybe one of those pieces of data has an extra channel I forgot, but my reasoning for 12 channels would work anyway. Packing those 9 channels onto the GPU requires 3 textures, which take up 12 (8-bit) channels of data. If Id changed the normal to x,y (calculating the z at run-time) they could probably fit it into 8 channels, which is 2 textures, and save bandwidth. But maybe run-time cost is what they currently care about?
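
    Something like this, just to illustrate the kind of packing I mean (the struct names and layout are made up, obviously not Id's actual ones):

        // Purely illustrative: packing the 9 values into three RGBA8 texels
        // (12 stored channels); the names and layout are guesses.
        #include <cstdint>

        struct SurfaceSample {
            uint8_t diffuseR, diffuseG, diffuseB; // 3 values
            uint8_t specular;                     // 1
            uint8_t normalX, normalY, normalZ;    // 3
            uint8_t alpha;                        // 1
            uint8_t power;                        // 1 -> 9 values total
        };

        struct PackedTexels {
            uint8_t tex0[4]; // diffuse.rgb + alpha
            uint8_t tex1[4]; // normal.xyz + specular
            uint8_t tex2[4]; // power + 3 unused channels (the padding is what gets you from 9 to 12)
        };

        PackedTexels pack(const SurfaceSample& s) {
            PackedTexels p = {};
            p.tex0[0] = s.diffuseR; p.tex0[1] = s.diffuseG; p.tex0[2] = s.diffuseB; p.tex0[3] = s.alpha;
            p.tex1[0] = s.normalX;  p.tex1[1] = s.normalY;  p.tex1[2] = s.normalZ;  p.tex1[3] = s.specular;
            p.tex2[0] = s.power;    // tex2[1..3] are wasted
            return p;
        }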

    I could be totally off, it's just the first idea I came up with from reading what you wrote.

    ReplyDelete
  10. Apparently the slides can be found in the Google cache now.
    They've been converted to HTML though, and contain no images ... but it's all we've got at the moment.

    ReplyDelete
  11. You can also find a video of the presentation here:
    http://developer.download.nvidia.com/compute/cuda/docs/GTC_2010_Archives.htm

    ReplyDelete

To you spammers out there:
Spam will be deleted before it shows up, so don't bother.