Tuesday, September 22, 2009

Adventures in virtual texture space, part 6

Yesterday I was working on my virtual texture code, and there's only one bug left (that I know of).

I haven't had much time to look at it, but for some reason it fails to load some pages when the page cache is full, so it probably has to do with the code that kicks out unused pages when the cache is full.

That aside, I'm having some problems with loading my pages from disk.

If I schedule pages to be loaded right when I discover I need them, and just keep adding them to the list, I end up with an ever-growing list if I move around fast enough, because the disk IO simply can't keep up.

So I tried a couple of things, one of them being counting how many times each page is referenced in my readback buffer, and giving the pages with the highest number of references the highest priority.

After all, the more visible a page is, the more important it is to load!
That helped somewhat, but it still caused a lot of pages to be loaded and uploaded that aren't visible, and those might actually kick out pages that are more likely to be visible soon.
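
For illustration, a minimal sketch of that reference-counting idea (names made up, simplified from what the real code has to deal with):

    #include <algorithm>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // A PageId identifies a (mip level, x, y) page; an opaque 32-bit id here.
    typedef uint32_t PageId;

    // Count how often each page shows up in this frame's readback buffer,
    // then sort the pending load requests so the most-referenced pages load first.
    void PrioritizeByReferenceCount(const std::vector<PageId>& readback,
                                    std::vector<PageId>& loadQueue)
    {
        std::unordered_map<PageId, int> refCount;
        for (size_t i = 0; i < readback.size(); ++i)
            ++refCount[readback[i]];

        std::sort(loadQueue.begin(), loadQueue.end(),
                  [&](PageId a, PageId b) { return refCount[a] > refCount[b]; });
    }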

The next thing I tried was removing all pages from my scheduler that aren't visible in the current frame, but that caused pages to not get loaded at all while the camera is moving.

A couple of things that I'm going to try:
  • A CPU-side disk cache; I only have one on the GPU at the moment (see the sketch after this list).
  • Compression, hoping that will decrease all these problems simply because loading would be faster and the list wouldn't grow as quickly.
  • Using adjacency information. By pre-calculating a world-space bounding box of each page's texel area, I can sort all the pages by how close they are to each other, and perhaps do some analysis on what is likely to become visible soon and what isn't.
  • The source material I'm working with is pretty horrible, because it has a lot of decal textures which take up extra pages on screen, and there are simply too many different pages on screen at the same time. Building my own CSG preprocessor would allow me to optimize this more easily, as opposed to trying to fix a relatively arbitrary list of triangles with all kinds of materials.
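
For the CPU-side cache, I'm thinking of a plain LRU along these lines (just a sketch, names made up):

    #include <cstdint>
    #include <list>
    #include <unordered_map>
    #include <vector>

    typedef uint32_t PageId;

    // A fixed-capacity LRU cache that keeps page data in system memory, so a
    // page evicted from the GPU cache can be re-uploaded without a disk read.
    class CpuPageCache
    {
    public:
        explicit CpuPageCache(size_t capacity) : m_capacity(capacity) {}

        // Returns the cached page data, or null on a miss.
        const std::vector<uint8_t>* Get(PageId id)
        {
            auto it = m_lookup.find(id);
            if (it == m_lookup.end())
                return 0;
            m_lru.splice(m_lru.begin(), m_lru, it->second); // mark as most recently used
            return &it->second->data;
        }

        void Put(PageId id, std::vector<uint8_t>& data)
        {
            if (m_lookup.count(id))
                return;
            if (m_lru.size() == m_capacity) // full: evict the least recently used page
            {
                m_lookup.erase(m_lru.back().id);
                m_lru.pop_back();
            }
            m_lru.push_front(Entry());
            m_lru.front().id = id;
            m_lru.front().data.swap(data);
            m_lookup[id] = m_lru.begin();
        }

    private:
        struct Entry { PageId id; std::vector<uint8_t> data; };
        size_t m_capacity;
        std::list<Entry> m_lru; // front = most recently used
        std::unordered_map<PageId, std::list<Entry>::iterator> m_lookup;
    };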

My good friend Volker suggested a couple of things I could try:
  • If you know a page has to be loaded for the current frame, it may be wise to put the higher and lower mip levels of that page in the queue too (but only if there is nothing else to load).
  • Kick out pages at random instead of using least-recently-used or least-frequently-used caching schemes; apparently the worst case is better, so that's worth a shot (see the sketch after this list).
    (Obviously no pages that are visible in the current frame should be kicked out.)
  • Using memory-mapped IO.
    Volker: "in my tests, mmap io made a difference from 10mb/s to 500mb/s. (using the disk cache)"
    My biggest problem is latency, though.
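
A minimal sketch of the random eviction idea (assuming a flat array of cache slots, and that the cache is never entirely visible in a single frame):

    #include <cstdlib>
    #include <vector>

    struct CacheSlot
    {
        bool visibleThisFrame; // set from the current frame's readback buffer
        // ... page id, cache texture coordinates, etc.
    };

    // Pick a random slot to evict, retrying while we land on a page that is
    // visible in the current frame.
    int PickSlotToEvict(const std::vector<CacheSlot>& slots)
    {
        int slot;
        do {
            slot = std::rand() % (int)slots.size();
        } while (slots[slot].visibleThisFrame);
        return slot;
    }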

So, does anyone out there have any good ideas on how to determine which pages should be loaded and which should be kicked out?

I'm going to see if I can upload a new video of my test level tonight.

5 comments:

  1. Keep up the great work. Looking forward to when you get to the Deferred Virtual Texture Shading part!

    id Software uses something (according to their SIGGRAPH talk) where, if they cannot fit all the required pages in the page cache, they reduce the global quality level until everything fits (you might also want to try reducing quality progressively with depth).

    Disk streaming is a complex problem. If you go with regular IO, you get to use the OS page cache at the cost of an extra memory copy (tiny); the advantage is that once the kernel IO call has finished, it's likely safe to assume the data is resident in memory. If you use memory-mapped IO, keep in mind that any new memory read is a possible page fault (pages often get loaded when they are first touched, unless the access pattern gets picked up by the pre-fetch logic). The point here is that you will need to keep memory-mapped IO reads on a separate thread (assume they will block). A third option is async raw IO, which avoids the page cache entirely; that's a great option for profiling and optimizing the worst-case limiters on performance.

    Having the virtual texture space organized with some data locality might help the latency problem. The idea being that disk seek time is going to dominate the latency, it's best to try to load data for many possible textures per IO call...
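
    A rough sketch of the "separate thread" point, purely illustrative (POSIX):

        #include <fcntl.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>
        #include <cstring>

        // Map the whole page file once. Individual reads then become plain
        // memcpys, but each one can page fault and block while the OS streams
        // the data in, so only ever touch the mapping from a loader thread.
        void* MapPageFile(const char* path, size_t* outSize)
        {
            int fd = open(path, O_RDONLY);
            if (fd < 0) return 0;
            struct stat st;
            fstat(fd, &st);
            void* base = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            close(fd); // the mapping keeps the file referenced
            *outSize = (size_t)st.st_size;
            return (base == MAP_FAILED) ? 0 : base;
        }

        // Runs on the loader thread; assume it blocks.
        void ReadPage(const void* base, size_t offset, size_t size, void* dest)
        {
            memcpy(dest, (const char*)base + offset, size);
        }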

  2. We have the same problem in our implementation. What we do right now is to keep a priority queue for all tiles required in this or past frames which are not present in the cache. The priority of a tile is calculated based on three factors:
    1) The time the tile was requested: tiles needed for this frame get a higher priority than tiles requested in past frames.
    2) The number of frames a tile has been requested in. E.g. if you have two tiles which are needed in the same frame, the tile which has been requested more times in past frames gets a higher priority (note that we don't count the number of pixels a tile covers in the readback buffer, in order to keep things as fast as possible).
    3) The mipmap level of the tile. Tiles from higher mipmap levels (lower resolution) cover a larger area of the texture and as a result get a higher priority.

    When the priority queue has been updated, the system requests the first N tiles from a worker thread, which either generates the data on the fly or reads it from disk (if the tile is in the HDD-level cache).

    In the case of a fast-moving camera, the tiles requested from the worker thread will be those corresponding to lower-resolution mipmaps; higher-detail tiles will always get a lower priority due to factors 2 and 3. In the case of a stationary camera, higher-detail tiles will gradually pop into view, because higher mipmap levels are always in the cache (and as a result they don't get a slot in the priority queue).
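
    In code, the priority boils down to something like this (simplified; the actual weighting is different):

        #include <cstdint>

        struct TileRequest
        {
            uint32_t lastRequestFrame; // most recent frame the tile was requested in
            uint32_t requestCount;     // number of frames it has been requested in
            uint32_t mipLevel;         // higher level = lower resolution
        };

        // Higher score = served first. Being needed this frame dominates,
        // then the mipmap level, then the request count.
        uint32_t TilePriority(const TileRequest& r, uint32_t currentFrame)
        {
            uint32_t neededNow = (r.lastRequestFrame == currentFrame) ? 1u : 0u;
            uint32_t count = (r.requestCount > 0xFF) ? 0xFF : r.requestCount;
            return (neededNow << 31) | ((r.mipLevel & 0xFF) << 8) | count;
        }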

    The system isn't perfect by any means. The problem is that I can't think of anything different which would be as fast and have the same or smaller memory requirements. I'm always interested in your findings. Keep up the good work.

  3. @HellRaiZer & @Timothy Farrar:
    Thanks!

    @Timothy Farrar:
    Good point about reducing quality, but obviously I want to try to get this to work without having to reduce/increase quality on the fly... it's a last resort, after all. I'm already using a thread for the loading. Why is async IO the worst case?
    You're right about trying to load data for as many pages at the same time as possible. Right now I'm reading them one at a time, with a seek before each read, which could possibly be bad for caching behavior (I haven't looked into that).

    @HellRaiZer:
    Thanks for the information, very interesting and helpful points!

    Async raw IO (skip the page cache, go direct to the hardware) would represent the sustained worst case (a page-cache miss) minus OS overhead (page-table remapping, etc.), if you were looking to tune an implementation which used the OS page caching.

    However, if I were to write my own virtual texturing, I would actually use async raw IO and do my own caching. The reason being that I would have better control over page faults, blocking, IO, etc.
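
    On Linux, for example, that looks something like this (illustrative only; O_DIRECT requires sector-aligned buffers, sizes and offsets):

        #include <fcntl.h>   // O_DIRECT is exposed with _GNU_SOURCE (g++ defines it)
        #include <cstdlib>
        #include <unistd.h>

        // Open the page file bypassing the OS page cache entirely.
        int OpenRaw(const char* path)
        {
            return open(path, O_RDONLY | O_DIRECT);
        }

        // Allocate a sector-aligned buffer (4KB covers common sector sizes).
        void* AllocAligned(size_t size)
        {
            void* p = 0;
            return (posix_memalign(&p, 4096, size) == 0) ? p : 0;
        }

        // Blocking read of one aligned block; issue these from IO worker
        // threads, or use io_submit/aio_read for true async behavior.
        ssize_t ReadRaw(int fd, void* dest, size_t size, off_t offset)
        {
            return pread(fd, dest, size, offset);
        }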

  5. Right, that's a good point, thanks. I've actually been having a hard time with the OS disk cache because it makes things suddenly go faster or slower, turning profiling into a guessing game.


To you spammers out there:
Spam will be deleted before it shows up, so don't bother.