Sanders' blog: September 2009

Tuesday, September 29, 2009

Adventures in virtual texture space, part 8

So yesterday I had a little bit of time to try a couple of small things.

I changed the page sorting, first sorting the pages from high mipmap level to low mipmap level, then from more to less visible (by counting how many samples I find for each page in the readback buffer).
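To make that ordering concrete, here is a minimal Python sketch (the project itself is .NET, so this is only an illustration). It assumes the readback buffer has already been decoded into `(mip_level, page_x, page_y)` tuples, one per sample; the function name and tuple layout are made up for this example.

```python
from collections import Counter

def sort_pages(readback_samples):
    """Order pages: highest mip level first, then most-sampled first.

    readback_samples: iterable of (mip_level, page_x, page_y) tuples,
    one entry per sample found in the readback buffer (assumed layout).
    """
    counts = Counter(readback_samples)  # page -> number of samples seen
    return sorted(counts, key=lambda page: (-page[0], -counts[page]))
```

Sorting on a two-part key keeps the mip level as the primary criterion and only uses the sample count to break ties within a level.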

I also decreased the number of pages I store in my GPU-side page cache, because I suspect that OpenGL updates the entire texture even when I only update a small part of it, and a smaller texture would upload faster.

Besides that, I fixed a small bug where a buffer wasn't reset properly.

Oh, and I defragmented my system :)

And lo and behold... my virtual texture app worked flawlessly!?
No popping at all!

So naturally I wanted to figure out which of my changes was responsible for this!
.. and then my wife pulled me away from my computer ;)

I guess I'll have to look at it next time I have time, but I'm really happy at the current results!

Monday, September 28, 2009

Adventures in virtual texture space, part 7

Yesterday I spent a little time on my VT project, fiddled with priorities, and got a good speedup by rewriting my threading code.
I wasn't using lock-free data structures before, and now that I am, things are much better.

That said, I did notice in my profiler that I'm now spending *a lot* of cycles in my lock-free data structure.
I'm assuming this is because the IO thread is often waiting for things to load.

I've now capped my IO thread queue to x items, assuming that before the thread could ever load the last texture in the queue, the queue would already have been refilled with new items anyway.
This way the response time improves, and I don't run the risk of my queue growing without bound.
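As a rough illustration of the capping idea, here is a small Python sketch. It assumes each request is a `(priority, page_id)` tuple where a larger priority is more urgent; `cap_requests` and that tuple shape are invented for the example and say nothing about the real lock-free queue.

```python
import heapq

def cap_requests(requests, max_items):
    """Keep only the max_items most urgent requests.

    requests: list of (priority, page_id) tuples (assumed shape);
    the result is ordered from most to least urgent.
    """
    return heapq.nlargest(max_items, requests)
```

Everything that falls off the end is simply forgotten; if the page is still needed, a later frame will request it again.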

This is still a part that I'm experimenting with, and I still need to try a lot of the stuff that's been suggested to me in the comment sections of the last couple of posts.

Once again, my test data is far from optimal.
In a perfect situation you'd have a roughly 1:1 ratio of screen pixels to virtual page texels, but I sometimes need 32x more than that.
This is because of lots of textures 'stamped' onto the geometry, and lots of tiny slivers of geometry that use their own unique textures.

This makes me believe that there are two ways of building virtual texture content.

One is that you build your virtual texture as a big texture-atlas, where although the textures themselves are only stored once (subdivided into pages), the pages themselves would actually be used many times in the indirection table.
UV coordinates would be calculated to take maximum advantage of this.
It might be somewhat tricky to do with multiple mip-levels, but it would speed up file IO and would remove some of the pressure on the page cache.
The UV coordinates would require more area, and the more uniquely textured area you have, the less efficient this would become.
This technique would be very close to regular, non virtual, texturing approaches, but would basically give you automatic texture management.
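A tiny Python sketch of why the atlas approach eases IO and cache pressure: several entries in the indirection table can point at the same stored page, so only the unique ones need loading. The dictionary layout and the names below are assumptions for illustration only.

```python
def pages_to_load(visible_virtual_pages, indirection):
    """Return the set of stored pages needed for the visible virtual pages.

    indirection: dict mapping a virtual page (mip, x, y) to the id of the
    page that is actually stored on disk (hypothetical layout).
    """
    return {indirection[virtual_page] for virtual_page in visible_virtual_pages}

# Two virtual pages share the same stored 'brick' page.
indirection = {
    (0, 0, 0): "brick",
    (0, 1, 0): "brick",
    (0, 2, 0): "metal",
}
needed = pages_to_load(indirection.keys(), indirection)
# Three virtual pages are visible, but only two pages have to be loaded.
```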

The other approach is to uv-unwrap all the geometry, avoiding overlapping unless the geometry truly lies on the same plane in world space.
The new UV coordinates would have to be aligned as much as possible with the original UV coordinate axes, to avoid the texturing looking different.
After this the original UV coordinates, and the textures belonging to each piece of geometry, would need to be rendered into the virtual texture.

This would more easily remove all the stamping problems I'm having, since the 'stamp' textures would be rendered on top of the original geometry (it would require some sorting though, somehow), and improve page locality.
Texel density would also be uniform across the geometry, which can be a bad thing too: lo-res textures could potentially use much more memory (like cube-maps, which are processed as regular textures).
It would also be harder to re-use identical pages with this technique, since the chance that 2 pages are identical would probably be much smaller.

Also, I tried rendering all my textures transparently, just to see how it looked, and I realized that using the readback approach to discover which pages are visible has a serious defect: it can't look beyond the first surface, and since you might not have a page loaded yet, this won't even work with texture masks.
This is clearly a situation where an analytical approach would be superior.

Alas, I won't be able to spend as much time on virtual textures as the last couple of weeks.
I'm going to work at a client for the next two months, instead of at home, so the little time I can find will be fragmented at best.

Wednesday, September 23, 2009

Adventures in virtual texture space, part 6 B

As promised in my last post, here's a video of what I've got so far.

Random thought:
I'm wondering if it would be useful to have automatically generated page 'samples' spread around the level, each containing a list of which pages are near it. That way it would be relatively simple to 'look ahead' and determine which pages could potentially be required in the near future.

Each sample would contain, for each mip level, all the pages in every direction within the distance from the sample at which that mip level would be used.

Since this would only be possible for static pages / geometry, pages for dynamic objects would have to be handled differently.

Potentially, this could remove the need to do a readback completely..

But would this approach be faster? It could be more accurate though.

Maybe navigation meshes, if they're accurate enough, could be used to simplify the process of creating the samples.
The same information could be used to determine which pages could never be visible.
It would look pretty horrible, though, when pages that were assumed to be invisible turn out to be visible after all.

Tuesday, September 22, 2009

Adventures in virtual texture space, part 6

Yesterday I was working on my virtual texture code, and there's only one bug left (that I know of).

I haven't had much time to look at it, but for some reason it fails to load some pages when the page cache is full, so it probably has to do with the code that kicks out unused pages when the cache is full.

That aside, I'm having some problems with loading my pages from disk.

If I schedule pages to be loaded right when I discover I need them, and just keep adding them to the list, I end up with an ever-growing list if I move around fast enough, because the disk IO simply can't keep up.

So I tried a couple of things, one of them being to count how many references to each page I have in my readback buffer, and to give the pages with the highest number of references the highest priority.

After all, the more visible it is, the more important it is to load!
That helped somewhat, but it still caused a lot of pages to be loaded and uploaded that aren't visible, and those might actually kick out pages that are more likely to be visible soon.

The next thing I tried was to remove all the pages from my scheduler that aren't visible in the current frame, but that caused pages to not get loaded at all while the camera is moving.

A couple of things that I'm going to try:

A CPU side disk cache, I only have one on the GPU at the moment.
Compression, hoping that'll reduce all these problems simply because loading would be faster and the list wouldn't grow as quickly.
Using adjacency information. By pre-calculating a bounding box of all the texel area in world-space, I can sort all the pages by how close they are to each other, and perhaps do some analysis on what is likely to be visible soon, and what is unlikely to be visible soon.
The source material I'm working with is pretty horrible because it has a lot of decal textures which take up extra pages on the screen, and there are simply too many different pages on the screen at the same time. Building my own CSG preprocessor would allow me to optimize this more easily, as opposed to trying to fix a relatively arbitrary list of triangles with all kinds of materials.
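The adjacency idea from the list above could start from something as simple as a point-to-box distance. This is only a Python sketch under assumptions: each page has a precomputed axis-aligned world-space bounding box, and sorting by distance from the camera is my own stand-in for the "likely to be visible soon" analysis, which hasn't been worked out yet.

```python
import math

def distance_to_box(point, box_min, box_max):
    """Distance from a point to an axis-aligned box (0.0 when inside)."""
    # Per axis: how far the point lies outside the [lo, hi] interval.
    deltas = [max(lo - p, 0.0, p - hi)
              for p, lo, hi in zip(point, box_min, box_max)]
    return math.sqrt(sum(d * d for d in deltas))

def sort_pages_by_distance(camera, page_boxes):
    """page_boxes: dict of page id -> (box_min, box_max), hypothetical layout."""
    return sorted(page_boxes,
                  key=lambda page: distance_to_box(camera, *page_boxes[page]))
```

Pages whose boxes are far from the camera would then be natural candidates for being kicked out first, or loaded last.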

My good friend Volker suggested a couple of things I could try:

If you know that a page is to be loaded for the current frame, it may be wise to put the higher and lower mip levels of that page in the queue too. (but only if there is nothing else)
Kick out pages at random instead of using last-used or least-frequently-used caching schemes; apparently the worst case is better, so that's worth a shot.
(Obviously no page that's visible in the current frame should be kicked out)
Using memory mapped IO.
Volker: "in my tests, mmap io made a difference from 10mb/s to 500mb/s. (using the disk cache)"
My biggest problem is latency though.
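Volker's random-eviction suggestion, including the rule about visible pages, might look roughly like this in Python. `pick_victim` and its parameters are names made up for this sketch, not code from the project.

```python
import random

def pick_victim(cached_pages, visible_pages, rng=random):
    """Pick a random cached page to kick out, never a visible one.

    Returns None when every cached page is visible in the current frame.
    """
    candidates = [page for page in cached_pages if page not in visible_pages]
    if not candidates:
        return None
    return rng.choice(candidates)
```

Passing `rng` in makes it possible to seed the choice, which helps when trying to reproduce a caching problem.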

So does anyone out there have any good ideas on how to determine which pages should be loaded & which pages should be kicked out?

I'm going to see if I can upload a new video of my test level tonight.

Thursday, September 17, 2009

Adventures in virtual texture space, part 5

Today I spent some time improving my tools, and they're much simpler and faster than before: it only takes about 20 seconds to convert a Quake 4 level into a 16384x16384 virtual texture file and a separate geometry file. I'm quite happy with the tool as it is. There are only a handful of things I still want to do, such as trying to rotate an allocated texture space to see if it'll fit better, and forcing the texel density to some sane (maximum) limit.

Before, I used 128x128 pages and texture arrays (which are basically 3D textures where the z direction always uses nearest filtering) with a depth of 512 layers.

This worked out pretty well because you never have any bleeding artifacts between pages.
It is possible to have visible seams between 2 pages that are rendered next to each other if they have a strong enough contrast right on the edge between them; the seam looks as if it's an aliased edge.
However, I would consider it extremely rare. I only managed to see such a seam when I purposely made some handmade pages to see if it would happen at all, and I couldn't find one when I looked at more real-life artwork.

So today I tried using smaller pages, but this caused some problems.
First of all, since smaller pages (say 64x64 or 32x32) each cover less texture space, I need more pages for the same number of texels.
In theory, smaller pages should be able to match what I render on screen more precisely.
However, since my texture array has a hard limit of 512 layers, even when the width/height is 4x smaller, I had no choice but to create a version of my texture cache that works with a giant 2D texture.
I haven't bothered to put borders around my pages (yet), so there are plenty of artifacts when rendering like that.
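The numbers behind that layer limit are easy to check. The Python sketch below uses the sizes mentioned in this post (a 16384x16384 virtual texture, 512 layers) and ignores mip levels; the helper names are made up.

```python
def cache_texels(page_size, layers=512):
    """Texels a texture-array cache can hold with one page per layer."""
    return page_size * page_size * layers

def pages_in_virtual_texture(texture_size, page_size):
    """Number of pages at the highest-resolution level (mips ignored)."""
    per_side = texture_size // page_size
    return per_side * per_side

big = cache_texels(128)    # 8,388,608 texels
small = cache_texels(64)   # 2,097,152 texels, a quarter of the above
more_pages = pages_in_virtual_texture(16384, 64)  # 65,536 instead of 16,384
```

So halving the page size quarters what a 512-layer cache can hold, while the virtual texture needs four times as many pages.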

But when I started rendering, lots of other artifacts started popping up, which apparently were somehow more likely to happen with smaller pages.
So I fixed a couple of these artifacts, some of which helped improve performance, and I still have a couple of mysterious ones left.

Eventually I'll probably build something where I can record and replay a certain path through my test level, and use it to compare all the different parameters that can be used to build and render a virtual texture, to see which ones are more efficient than the others.
This will also help to measure performance improvements, or the reverse, when I try to implement stuff like texture compression.

Unfortunately I probably won't have too much time to work on this in the near future, so I'm kinda unsure if I should start on a CSG preprocessor for the level at this moment, because it'll take a couple of days to build and test.

On the upside I actually managed to get into NVIDIA's "GPU Computing Registered Developer Program"!
That means I have access to OpenCL, which I would like to experiment with to see if I can use it to optimize virtual texturing.
I can imagine that determining which pages are currently visible could be done more efficiently through OpenCL.
It could be done mostly on the GPU, saving CPU time, and would reduce the amount of data to be downloaded back to the CPU.
Another thing it could help with would be to improve texture decompression speed.

Tuesday, September 15, 2009

Virtual Texturing part 4; importing madness

So after rewriting the code to load the pages in the background on a secondary thread, I started to write some code to import a Quake 4 level (.proc files) and modify the geometry so it could be displayed using a virtual texture, which would be automatically created from the textures in the level.

The red in the screenshot above are textures that I couldn't automatically discover without hacking. The black areas are supposed to be transparent, but I'm not handling that at the moment.

Here's a screenshot where every color is a different page, which shows that I have way too many different pages on screen at the same time:

There are a couple of things that I've learned about converting existing (Quake 4, but the same will apply to other sources as well) geometry to take advantage of virtual textures:

Quake 4 uses a z prepass to take care of occlusion, so its geometry is optimized for number of triangles and not so much for using as little geometry area as possible, which means a lot of wasted texel space.
Quake 4 has a lot of transparent textures that are placed on top of other textures, which again leads to wasted texel space. As you can see in my screenshots, I'm actually not handling transparency.
Since Quake 4 has separate geometry for each type of shader, you might end up with lots of patches of geometry that each have completely different pages. If this had been built with virtual textures in mind, it would've been continuous. This is bad because it means more pages need to be loaded into memory.
Sometimes large textures are assigned to a relatively small area. If you don't take that into account you'll be assigning large areas of texture space to something which is tiny.
Without parsing materials (which I'm not doing), discovering the right textures is sometimes impossible.

These problems are causing me some headaches with my test scene because I'm loading waaaay more pages than I would need to in a scene that had been built with virtual textures in mind.
I could solve this by building my own Quake 4 map CSG code, which I might do eventually, as I already have some experience with CSG.

However, if you're building geometry from scratch this would all be easier, as long as you try to keep texel density at a sane level and keep surface area to a minimum (aka don't assign texel space to something which is never visible).
Sounds rather straightforward, I know, but if you don't think about this up front you might end up with some nasty surprises later on.

Also, allocated texture space should be aligned to page boundaries; if it isn't, you might end up loading 4 pages when 1 would've been sufficient.
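The "4 pages instead of 1" case is easy to demonstrate. Here is a small Python sketch, assuming 128x128 pages and a rectangle given in texels; `pages_touched` is a name made up for this example.

```python
def pages_touched(x, y, width, height, page_size=128):
    """Count the pages overlapped by a texel rectangle in the virtual texture."""
    first_col = x // page_size
    last_col = (x + width - 1) // page_size
    first_row = y // page_size
    last_row = (y + height - 1) // page_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)

aligned = pages_touched(0, 0, 128, 128)      # 1 page
unaligned = pages_touched(64, 64, 128, 128)  # 4 pages for the same area
```

The same 128x128 allocation costs four page loads as soon as it straddles a page boundary in both directions.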

One mistake I made throughout this whole process was thinking about a virtual texture as a giant texture, and processing it as such.
The problem with this is that you cannot handle unused pages easily.

Update: scratch that. Automatically fixing existing geometry so it can be used -efficiently- with virtual texturing is hard enough to be considered a dead end.
I'm going to rebuild the geometry with my own CSG process instead.

Monday, September 7, 2009

Virtual Texturing part 3

My virtual texture implementation now reads pages from disk.

I'm doing it the completely naive way, reading and uploading a page to the video card *just* before I need it, and I'm absolutely surprised at how small a performance penalty I'm seeing; only 0.1 - 0.2ms!

I'm guessing that this is due, at least in part, to disk caching, since I'm using the regular .NET disk functions at the moment (and I -just- generated my virtual texture disk file, so it's fresh in the cache).
Update: Confirmed, after a reboot I get horrible spikes of +/- 30ms when I move the camera around on the virtual texture!

Unfortunately .NET won't get any memory mapped IO until .NET 4.0 comes out, so I won't be able to try this unless I port everything over to C++.
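For reference, this is roughly what reading a page through memory mapped IO looks like, sketched in Python with its standard `mmap` module rather than .NET or C++. The file layout is an assumption made for the example: fixed-size uncompressed pages stored back to back, with `PAGE_BYTES` and `read_page` being made-up names.

```python
import mmap

PAGE_BYTES = 128 * 128 * 4  # assumed: 128x128 page, uncompressed RGBA

def read_page(path, page_index, page_bytes=PAGE_BYTES):
    """Copy one page out of a memory-mapped virtual texture file."""
    with open(path, "rb") as page_file:
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        with mmap.mmap(page_file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = page_index * page_bytes
            return bytes(mapped[start:start + page_bytes])
```

A real loader would presumably keep the mapping open for the lifetime of the app instead of remapping per page; it is reopened here only to keep the example self-contained.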

I really should 'acquire' some more interesting test scenes.. a flat polygon is simply too ..erhm.. simple.