Thursday, September 28, 2017

Half-edge data structure considered harmful

Note this blog post assumes you're already familiar with half-edges.

I've been using the half-edge data structure for mesh/topology operations for a while now and I've come to the conclusion that it's actually an anti-pattern.
The problem is that it has a very large potential 'problem surface area'.

Let's investigate:

struct HalfEdge
{
  HalfEdgeIndex next;
  //HalfEdgeIndex prev; //(optional) 
  HalfEdgeIndex twin;
  VertexIndex vertex;
  PolygonIndex polygon;
}

struct Polygon
{
  HalfEdgeIndex first;
}

Note that i'm using indices here instead of pointers.
You should never use pointers in data structures like this because:

  1. Pointers are large (64-bit on 64-bit platforms) 
  2. It's more intuitive to bounds check an index than a pointer 
  3. Your debugger will actually show you legible values.
  4. Copying an entire mesh is trivial.

So what are the potential problems with a half-edge based mesh?

struct HalfEdge:
  • next, prev:
    potentially points to wrong edge, potentially points to self, or out of bounds index.
  • twin:
    potentially points to wrong edge, potentially points to self, or out of bounds index.
    twin could lead to edge on same polygon.
  • vertex:
    potentially points to wrong vertex, or out of bounds index
  • polygon:
    potentially points to wrong polygon, or out of bounds index
  • twin + next, prev + twin: (looping over vertex)
    if either twin or next/prev is wrong, you might end up in an infinite loop.
  • The edge could be disconnected and not belong to any polygon
  • Every time you follow an edge to another edgevertex or polygon there's a potential cache-miss since edges, vertices and polygons can be placed anywhere in memory relative to the current edge.

struct Polygon:
  • first
    potentially points to wrong edge, potentially points to self, or out of bounds index. The first edge might point to a chain of edges that never actually loops back to first, resulting in an infinite loop if you simply try to loop over the edges of a polygon.
    This is very easy to do unfortunately.
  • The polygon might be concave, self-intersecting or infinitely thin
  • The polygon could only have 1 or 2 edges.
  • All vertices on the polygon could lie on the same spot (singularity)
  • The order of vertices could be flipped and the polygon could be orientated in the wrong direction (note; you'd have other connectivity problems on the polygon as well)
  • The polygon could contain the same edge multiple times
  • The polygon could contain the same vertex multiple times

Literally every step you take could potentially be an issue. And if you can't control your input data you have to take each and every situation into account. You can't very well check the bounds of every next/prev/twin you follow so I hope you didn't make a mistake anywhere!

While developing/implementing an algorithm anything can go wrong at any time due to either a bug, or a situation you haven't considered. Worse still, the actual cause could be far removed from the place where you notice a problem since it's all a big pile of interconnected elements.

Half-edges are very hard to debug. You can't see in your debugger what your polygon looks like, so you need to draw it out by hand or, for instance, create some extra code to generate an html page with svg graphics to render it for you, and then spit out html pages at certain points in your code to visualize what is going on.


Now every data structure will have at least some of these problems.
But is there a data structure that at least decreases the problem surface area?
Well, recently I come across the data-structure at the end of this pdf.

Unfortunately it's not named here, they just called it 'triangle data structure'.
I modified it slightly and call it a 'connected triangle' because calling it 'triangle ds' will just be too ambiguous. (let me know if there's an official name for this or if you have a better suggestion to name this)

Update: In the book "Real-Time Collision Detection" an almost identical data structure is named "winged-triangle".

struct Triangle
{
  TriangleEdge neighbors[3];
  VertexIndex vertices[3];
}

struct TriangleEdge
{
  uint32 triangle : 30;
  uint32 edge : 2;
}

struct TriangleVertex
{
  Vector3 vertex;

  // To allow you to quickly find a starting point to loop over a vertex
  TriangleEdge triangleEdge;
}

struct Triangle:
  • neighbor[x].triangle
    potentially points to wrong triangle, potentially points to self, or out of bounds index.
  • neighbor[x].edge:
    potentially points to wrong edge on destination triangle, or out of bounds index.
    could be 0-3 where 3 is an invalid value (only 3 edges on triangle)
  • vertices[x]:
    potentially points to wrong vertex, or out of bounds index.
  • neighbor + next edge index, prev edge index + neighbor: (looping over vertex)
    could potentially be an infinite loop
  • The triangle could contain the same vertex multiple times.
  • The vertices could be in wrong index compared to neighbors
  • The order of vertices could be flipped and the triangle could be orientated in the wrong direction (note; you'd have other connectivity problems on the triangle as well)
  • You could have multiple neighbors pointing to same edge on same triangle
  • All vertices on the triangle could be aligned (infinitely thin triangle)
  • All vertices on the triangle could lie on the same spot (singularity triangle)

As you can see there is only one potential infinite loop; while looping over a vertex. But in practice that situation has never happened for me using both half-edges or the connected triangle ds. Also a triangle is always a triangle, it is always convex, never self-intersecting.

Cache misses are less likely to happen, compared to half-edges, since more data you'd be using at the same time is actually stored together.


Note that the classic half-edge structure doesn't have an equivalent to 'TriangleVertex', even though that would be trivial to add.
So for completeness sake:

struct TriangleVertex:
  • triangleEdge.triangle:
    potentially points to wrong triangle, or out of bounds index.
  • triangleEdge.edge:potentially points to wrong edge on destination triangle, or out of bounds index.
    could be 0-3 where 3 is an invalid value(only 3 edges on triangle)


Now a triangle based data structure might not always be what you want. Sometimes you really need to deal with polygons, but if you don't I would strongly suggest to not use half-edges.
Keep in mind that you'll probably need to end up with triangles at the end of your pipeline anyway.

A connected triangle ds could be modified to separate its neighbors and vertex indices, and its TriangleVertex could be split into to vertex and triangleEdge parts. You could just have 2 equal sized arrays for both. This way you would simply end up with triangle indices and vertices which could then be rendered directly. 

Keep in mind that this could potentially be bad cache wise, but it really depends on your algorithms.


Also. it is potentially possible to store your half-edges contiguously in memory, maybe store your half-edges as an array in your polygon somehow etc. etc. and solve some of the potential problems that half-edges have. It'll definitely increase complexity in other areas like memory management though.


P.S.
For either data structure I strongly suggest you make a 'Validate' method that ensures that the topology is valid and that there is no issue. You can then use this validate method after any operation to ensure that your algorithms do what they're supposed to do and don't leave your topology in an invalid state. This method could be optional when compiling under Debug, for instance.

For me, the connected triangles my Validate method fit on my screen, for half-edges it's several hundreds of lines of code for me.

Let me know if I missed anything and I'll update the article accordingly.


[UPDATE]
Eric Arnebäck tweeted about this blog post and it got a lot of reactions, some of these reactions had some interesting links, which I'll share here.