Flattening trees

Trees are one of the most popular structures in game development. Your skeletal hierarchy, your attachments – all this can be represented using trees. They lend nicely to short & clean code, especially with recursive functions – you typically perform your operation on a root, then call the same function for all children. Problem is – it’s not always the most effective way of doing things. Consider the most canonic form of tree traversal:

void NodeFunc(Node* n)
{
  DoSomethingWithNode(n);
  for (int  i = 0; i < n->children.size(); ++i)
  {
    NodeFunc(n->children[i]);
  }
}

Clean, concise and not very cache friendly. Unless we’re careful with our allocation strategies, accessing nodes results in a cache miss in majority of cases. On top of that, C++ compilers usually have a hard time with optimizing recursion, so we’ll pay the price for pushing/popping arguments and it also makes it harder for compiler to precalc things that do not change during traversal. Here’s a quick trick that I like to employ to flatten tree structures. Basically, we represent whole hierarchy as a single, linear array. Node structure may look like this:

struct FlatNode
{
  NodeData data;
  int numChildren;
  int firstChildIndex;
  int parentIndex;
};

This is for the most general case, often you have extra information about your tree and can use it to simplify the code. For example, if you know that tree node can have up to 2 children (ie. binary tree) and you’re fine with having empty cells, you don’t need to store child indices (see Realtime Collision Detection for more info). In many cases you can use 8/16 bit indices. Here’s how we can easily flatten our hierarchy:

void FlattenHierarchy(TreeNode* node, FlatNode& flatNode, int nodeIndex,
  int parentIndex)
{
  flatNode.nodeData = node->nodeData;
  flatNode.parentIndex = parentIndex;
  flatNode.numChildren = node->numChildren;
  const int firstChildIndex = mFlatHierarchy.size();
  mFlatHierarchy.resize(mFlatHierarchy.size() + flatNode.numChildren);
  for (int child = 0; child < flatNode.numChildren; ++child)
  {
    const int childIndex = firstChildIndex + child;
    FlattenHierarchy(node->GetChild(child), mFlatHierarchy[childIndex],
        childIndex, nodeIndex);
  }
  ...
  // For root.
  mFlatHierarchy.resize(1);
  FlattenHierarchy(rootNode, mFlatHierarchy.front(), 0, -1);

Now, ideally we’d like to get rid of runtime recursion as well. Most popular solution is to maintain a stack of nodes to visit and loop until it’s not empty (pushing children for every valid node). It works fine, but we can do better. In many cases we only traverse the tree in one manner - be it breadth-first, depth-first or whatever else you want. This means, we can prepare our array so that if we iterate it linearly, we’ll get the desired visit order. What’s more - in many cases, we don’t even care about the exact order, as long as parents are processed before children. That’s the case when calculating node transformation matrices for example. Using our flat hierarchy generated above, we can do:

for (FlatNode* flatNode = mFlatHierarchy.begin(); flatNode != mFlatHierarchy.end();
    ++flatNode)
{
  if (flatNode->parentIndex >= 0)
  {
    Matrix xform = mFlatHierarchy[flatNode->parentIndex].xform * flatNode->xform;
  }
}

Code for any other tree-traversing method is roughly the same, the difference is now in a way we generate the tree. Generated assembly for this function is as simple as it gets. Final step would be optimization specific to what’s happening during the traversal - e.g. we could store identity matrix in the first node, to get rid of ‘if’ or parallelize the whole process (we’d need to mark independent branches first), but that’s another story.

Old comments

More Reading
Newer// Hard Reset
Older// Hidden enemy