So I had a little bit of time between things to try working out which of my pages are potentially visible at any given time when moving around in my test scene (imported from Quake 4).
At first I was amazed: my 1.5 GB virtual texture file had shrunk to about 50 MB! I wanted to tell the world!
But no, that was a bug. Crap.
(Good thing I double checked!!)
When I got it working properly, it only removed 9 pages :(
It took about 5 minutes to process the entire Quake 4 level, and 2 minutes to create the virtual texture file.
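For illustration, here's a minimal sketch of the "throw away pages nobody can ever see" step. It's not my actual tool code: it assumes the visibility pass has already produced, for each sampled viewpoint, a set of page ids that are potentially visible from there. The names `find_unused_pages` and `visible_per_viewpoint` are made up for this example.

```python
def find_unused_pages(all_pages, visible_per_viewpoint):
    """Return the pages that no sampled viewpoint can ever see.

    all_pages: iterable of page ids stored in the virtual texture file.
    visible_per_viewpoint: dict mapping a viewpoint id to the set of
        page ids potentially visible from it (assumed to be precomputed).
    """
    seen = set()
    for pages in visible_per_viewpoint.values():
        seen.update(pages)
    # Anything never seen from any viewpoint can be dropped from the file.
    return set(all_pages) - seen


if __name__ == "__main__":
    visibility = {"room_a": {0, 1}, "room_b": {1, 2, 4}}
    print(sorted(find_unused_pages(range(6), visibility)))
```

With heavy page reuse, a page only ends up in the result if every surface that uses it is invisible from everywhere, which fits with so few pages being removed.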
That said, in my current implementation I'm reusing the pages across surfaces, although they're still unique on the GPU side of things (so decals can still be added in the virtual texture). So it might be a bigger improvement when all pages are unique. Besides that, I used the gathered information to sort all the pages according to locality, which helped reduce latency when loading the pages.
Considering that I'm reusing pages (hell, even the texture coordinates are created carefully to maximize page reuse), I'm actually surprised that I only managed to get rid of 9 pages. I guess a lot of the 'reusing the pages' idea gets wasted when pages are combined into higher MIPs, which automatically makes them less reusable. It would be better to store the lower-resolution pages that make up the lower-resolution MIPs separately. That way the virtual texture file could never be larger than the combined source textures. It would also allow finer sorting (of sub-pages) and let us downscale higher-resolution pages into a MIP, hopefully removing the need to load some pages. But it all gets awfully complicated real quick.
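One simple way to get a locality-based ordering out of the visibility data could look like the sketch below. This is an assumed approach for illustration, not necessarily what my tool does: walk the viewpoints in some spatially coherent order (the hypothetical `viewpoint_order`) and write each page to the file the first time a viewpoint needs it, so pages that are visible together end up close together on disk.

```python
def order_pages_by_locality(visible_per_viewpoint, viewpoint_order):
    """Return page ids in file order, grouped by the first viewpoint that sees them.

    visible_per_viewpoint: dict mapping viewpoint id -> set of visible page ids.
    viewpoint_order: viewpoint ids in a spatially coherent order (an
        assumption; e.g. a walk through neighbouring areas of the level).
    """
    order = []
    placed = set()
    for viewpoint in viewpoint_order:
        # Sort within a viewpoint only to make the result deterministic.
        for page in sorted(visible_per_viewpoint[viewpoint]):
            if page not in placed:
                placed.add(page)
                order.append(page)
    return order


if __name__ == "__main__":
    visibility = {"a": {5, 1}, "b": {1, 3}, "c": {2}}
    print(order_pages_by_locality(visibility, ["a", "b", "c"]))
```

Pages shared between viewpoints stay where they were first placed, so the layout favours whichever viewpoint comes first in the walk.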
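To make the MIP problem a bit more concrete, here's a toy model (purely illustrative, with made-up names). The finest level is a square grid of source page ids, where the same id showing up more than once means reuse. Each coarser level merges 2x2 blocks of pages, and a merged page is identified by the four pages it was built from, so two merged pages can only be shared if all four of their children match.

```python
def unique_pages_per_mip(grid):
    """Count distinct pages at each MIP level of a toy page grid.

    grid: square list of lists of hashable page ids, side length a power of two.
    Returns a list of counts, finest level first.
    """
    counts = []
    level = [list(row) for row in grid]
    while True:
        counts.append(len({page for row in level for page in row}))
        if len(level) == 1:
            break
        half = len(level) // 2
        # A coarser page is modelled as the tuple of its four child pages.
        level = [
            [
                (level[2 * y][2 * x], level[2 * y][2 * x + 1],
                 level[2 * y + 1][2 * x], level[2 * y + 1][2 * x + 1])
                for x in range(half)
            ]
            for y in range(half)
        ]
    return counts


if __name__ == "__main__":
    grid = [
        ["A", "B", "A", "B"],
        ["B", "A", "B", "A"],
        ["A", "A", "B", "B"],
        ["A", "A", "B", "B"],
    ]
    print(unique_pages_per_mip(grid))
```

In this example the finest level only needs 2 distinct pages for 16 slots, but the next level already needs 3 distinct pages for just 4 slots, so most of the reuse is gone after a single merge.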
And I'm not sure how much that would help with true unique texturing.
The Quake 4 levels are really worst cases for virtual texturing, considering that the texture placement is made to minimize the number of textures in the level, not so much to minimize texture area, and not worrying about the source of the texture. So many tiny pieces of geometry are rendered with unique pages, which could all be combined into single surfaces. So I really should try to either combine all the surfaces and re-project the textures onto them, or create geometry from scratch with a more virtual-texture-friendly texture layout.