While trying to convert several HTML pages to the DITA XML format, James Edwards ran into a problem involving recursion:
But a problem I came across several times was the sheer complexity of recursive element conversion: <code> becomes <jsvalue> (or one of a dozen similar elements), <a> becomes <xref> ... and that's all simple enough; but each of these elements might contain the other, or further child elements like <em>, and as we walk through the DOM so the incidence of potential recursion increases, until it gets to the point where my brain explodes.
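To make the problem concrete, here is a minimal sketch of the recursive DOM walk the quote describes. The post doesn't say what language the conversion was written in; this uses Python's standard `xml.dom.minidom`. The `RENAMES` table and the `convert` function are hypothetical: the table holds only the two mappings quoted above, and copying attributes unchanged is an assumption, not part of any real DITA mapping.

```python
from xml.dom import minidom

# Hypothetical mapping: only the two pairs quoted above. A real
# HTML-to-DITA conversion has "a dozen similar elements" or more.
RENAMES = {"code": "jsvalue", "a": "xref"}

def convert(node, doc):
    """Return a converted copy of node, recursing into every child."""
    if node.nodeType != node.ELEMENT_NODE:
        # Text, comments, etc. are copied as-is.
        return node.cloneNode(True)
    new_name = RENAMES.get(node.tagName, node.tagName)
    out = doc.createElement(new_name)
    # Assumption: attributes carry over unchanged (e.g. href on <a> -> <xref>).
    for name, value in node.attributes.items():
        out.setAttribute(name, value)
    # The recursion: any child may itself be <code>, <a>, <em>, ...
    for child in node.childNodes:
        out.appendChild(convert(child, doc))
    return out

doc = minidom.parseString(
    '<body><p>Call <a href="x.html"><code>init()</code> <em>first</em></a>.</p></body>'
)
p = doc.getElementsByTagName("p")[0]
p.parentNode.replaceChild(convert(p, doc), p)
print(doc.documentElement.toxml())
# <body><p>Call <xref href="x.html"><jsvalue>init()</jsvalue> <em>first</em></xref>.</p></body>
```

Even this toy version has to rebuild every element and descend into every child; each extra rule that depends on an element's parent or contents adds another branch inside the recursion.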
His solution combines regular expressions with document fragments. He loads the node he wants to work with; it's parsed to prepare it and then passed off for the "text-based mangling" that updates it. The result is then pushed back into an XML object (a fragment), and that fragment is inserted into the main document with a replaceChild call.
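A rough sketch of that round trip, again in Python's `xml.dom.minidom` and under stated assumptions: the node is serialized to a string, a regular expression renames tags at every nesting depth in one pass, the result is parsed into a separate XML object, and that object is imported into the main document and swapped in with `replaceChild`. The names `RENAMES`, `mangle` and `convert_node` are hypothetical, and the regex is our own guess at "text-based mangling", not the author's code.

```python
import re
from xml.dom import minidom

# Hypothetical subset of the element mapping.
RENAMES = {"code": "jsvalue", "a": "xref"}

# Matches the start of an opening or closing tag and captures its name:
# "<code", "</code", "<a". Serialized XML escapes "<" in text and
# attribute values as "&lt;", so only real tags match here.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)")

def mangle(markup):
    """Text-based mangling: rename tags with a regex instead of walking the DOM."""
    return TAG_RE.sub(
        lambda m: "<" + m.group(1) + RENAMES.get(m.group(2), m.group(2)),
        markup,
    )

def convert_node(node):
    doc = node.ownerDocument
    # 1. Serialize the node (and all its descendants) to a string.
    markup = node.toxml()
    # 2. Rename every tag, however deeply nested, in a single pass.
    markup = mangle(markup)
    # 3. Parse the mangled string into a separate XML object.
    parsed = minidom.parseString(markup).documentElement
    # 4. Import it into the main document and swap it for the original.
    new_node = doc.importNode(parsed, True)
    node.parentNode.replaceChild(new_node, node)
    return new_node

doc = minidom.parseString(
    '<body><p>Call <a href="x.html"><code>init()</code> <em>first</em></a>.</p></body>'
)
convert_node(doc.getElementsByTagName("p")[0])
print(doc.documentElement.toxml())
# <body><p>Call <xref href="x.html"><jsvalue>init()</jsvalue> <em>first</em></xref>.</p></body>
```

The appeal is that nesting stops mattering: the regex sees a flat string, so `<a>` inside `<code>` inside `<a>` needs no extra handling. The trade-off is that any rule depending on an element's context is harder to express as a pattern.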