So, last time I mentioned that I had managed to compress my textures to about 6 KB, and theorized that this was about the same compression ratio that id Software gets.
Obviously id Software, besides simply being awesome, has far more resources than I do to get their texture compression just right.
So I was quite happy with myself that, using their published papers, I was able to get the same results.
Unfortunately, as Brian Karis mentioned in the comments, I was wrong.
Their 6 KB wasn't per texture, but for all of the material channels (diffuse, bump and specular) combined.
To get the same compression ratio, I would need to get my textures down to 2 KB.
Of course I could just crank up the compression ratio, but I didn't really like the results when I did.
That being said, if I convert the results to DXT5 after decompression, I'll lose some more quality anyway, so in the end cranking up the compression ratio might be acceptable.
But I really wanted to see how far I could go without decreasing the quality.
So I started to experiment with the Burrows-Wheeler transform, which cleverly (and reversibly) reorders the bytes so that identical bytes tend to end up next to each other; when you then perform RLE (run-length encoding) on the result, the compression ratio ends up being better.
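To make the idea concrete, here is a minimal Python sketch of a naive forward Burrows-Wheeler transform followed by a simple RLE pass. The function names and the (count, value) RLE byte format are hypothetical choices for illustration, not necessarily the format I used; this version sorts every rotation of the input, which is the textbook approach and also shows why a naive BWT scales so badly.

```python
def bwt_forward(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform (illustrative sketch).

    Sorts every rotation of `data` and keeps the last byte of each sorted
    rotation. Also returns the row where the original data ended up, which
    the inverse transform needs. Sorting n rotations of length n costs
    roughly O(n^2 log n) in the worst case.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # rotation i is doubled[i:i + n]
    rows = sorted(range(n), key=lambda i: doubled[i:i + n])
    last_column = bytes(doubled[i + n - 1] for i in rows)
    return last_column, rows.index(0)


def rle_encode(data: bytes) -> bytes:
    """Hypothetical simple RLE: each run becomes a (count, value) byte pair."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the byte repeats, capping the count at 255.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


last, index = bwt_forward(b"banana")
print(last, index)                 # b'nnbaaa' 3
print(len(rle_encode(b"banana")))  # 12: six runs of length 1
print(len(rle_encode(last)))       # 6: runs "nn", "b", "aaa"
```

On "banana" the transform turns six single-byte runs into three runs, which halves the RLE output in this toy format.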
And look at that, it worked like a charm!
The file size got down to 2 KB.
But it's slow. REALLY slow.
The algorithm scales horribly, and on a 64 KB buffer it took seconds to finish.
Decompression took more than 10 ms, which is a lot faster than compression, but still not good enough.
Some time later, after trying all kinds of things, I tried doing RLE both before and after the transformation, just to see what would happen.
The reasoning was that if I first decreased the size of the buffer by compressing it a little with RLE, then the Burrows-Wheeler transform would be faster.
After that, I would do another RLE pass over the transformed buffer to improve my compression ratio somewhat.
I expected this to be a little faster, with probably somewhat worse compression overall.
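The RLE, BWT, RLE pipeline can be sketched like this, again in Python with hypothetical function names and the same assumed (count, value) RLE format. The BWT row index has to be stored next to the payload for decompression; how it is serialized is left out. Decompression does not sort anything: the inverse transform only needs a counting pass and the so-called LF mapping. As an aside, bzip2 also runs an RLE pass before its BWT stage, so shrinking runs before the transform is an established trick.

```python
def rle_encode(data: bytes) -> bytes:
    """Hypothetical RLE: runs become (count, value) byte pairs, count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes((data[i + 1],)) * data[i]
    return bytes(out)


def bwt_forward(data: bytes) -> tuple[bytes, int]:
    """Naive BWT: sort all rotations, return last column and original row."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data
    rows = sorted(range(n), key=lambda i: doubled[i:i + n])
    return bytes(doubled[i + n - 1] for i in rows), rows.index(0)


def bwt_inverse(last: bytes, index: int) -> bytes:
    """Invert the BWT with the LF mapping (linear apart from the 256-entry tables)."""
    n = len(last)
    counts = [0] * 256
    for b in last:
        counts[b] += 1
    # starts[b]: first row in the sorted first column that begins with byte b.
    starts, total = [0] * 256, 0
    for b in range(256):
        starts[b], total = total, total + counts[b]
    # lf[i]: the row whose rotation is row i's rotation shifted right by one.
    seen, lf = [0] * 256, [0] * n
    for i, b in enumerate(last):
        lf[i] = starts[b] + seen[b]
        seen[b] += 1
    # Walk backwards from the original row, emitting bytes right to left.
    out, row = bytearray(n), index
    for k in range(n - 1, -1, -1):
        out[k] = last[row]
        row = lf[row]
    return bytes(out)


def compress(data: bytes) -> tuple[bytes, int]:
    pre = rle_encode(data)          # 1st RLE: shrink what the BWT must sort
    last, index = bwt_forward(pre)  # BWT: cluster similar bytes together
    return rle_encode(last), index  # 2nd RLE: squeeze the clustered runs


def decompress(payload: bytes, index: int) -> bytes:
    return rle_decode(bwt_inverse(rle_decode(payload), index))


sample = bytes(4096) + bytes(range(256)) * 2
payload, index = compress(sample)
assert decompress(payload, index) == sample
```

The first RLE pass is what keeps the naive sort affordable here: the 4096 zero bytes collapse to a few dozen bytes before the BWT ever sees them.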
To my surprise, the whole compression process was now MUCH FASTER, taking only 14 ms.
Decompression took only 3.2 ms overall, and the file got about 100 bytes SMALLER, which is somewhat counter-intuitive.
Result: 1899 bytes for a 128x128 RGBA texture.
(The alpha channel is a solid color, though.)
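For perspective, a quick back-of-the-envelope calculation, assuming the uncompressed texture is stored at 8 bits per channel (which matches the 64 KB buffer size mentioned earlier), with DXT5 at its fixed 16 bytes per 4x4 block for comparison:

```latex
\begin{aligned}
\text{raw size} &= 128 \times 128 \times 4\ \text{bytes} = 65\,536\ \text{bytes}\ (64\ \text{KB}) \\
\text{compression ratio} &= \frac{65\,536}{1\,899} \approx 34.5 : 1 \\
\text{bit rate} &= \frac{1\,899 \times 8\ \text{bits}}{128 \times 128\ \text{pixels}} \approx 0.93\ \text{bits per pixel} \\
\text{DXT5 size} &= 128 \times 128 \times 1\ \text{byte} = 16\,384\ \text{bytes}\ (8\ \text{bits per pixel})
\end{aligned}
```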