Document.processTagOpen

These three functions, processTagOpen, processTagClose, and processNodeWhileParsing, allow you to process elements as they are parsed and choose to not append them to the dom tree.

class Document
void
processTagOpen

Detailed Description

processTagOpen is called as soon as the parser has read the tag name and attributes into the passed Element structure, in order of appearance in the file. processTagClose is called similarly, when that tag has been closed. In between, all descendant nodes, including tags as well as text and other nodes, are passed to processNodeWhileParsing. Finally, after processTagClose, the node itself is passed to processNodeWhileParsing, so a node always arrives there after its children.

So, given:

<thing>
	<child>
		<grandchild></grandchild>
	</child>
</thing>

It would call:

  1. processTagOpen(thing)
  2. processNodeWhileParsing(thing, whitespace text) // the newlines, spaces, and tabs between the thing tag and child tag
  3. processTagOpen(child)
  4. processNodeWhileParsing(child, whitespace text)
  5. processTagOpen(grandchild)
  6. processTagClose(grandchild)
  7. processNodeWhileParsing(child, grandchild)
  8. processNodeWhileParsing(child, whitespace text) // whitespace after the grandchild
  9. processTagClose(child)
  10. processNodeWhileParsing(thing, child)
  11. processNodeWhileParsing(thing, whitespace text)
  12. processTagClose(thing)
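The call order above can be observed with a small tracing subclass. This is a minimal sketch, not part of the library: the names TracingDocument, callLog, describe, and traceCalls are made up for illustration, and the override signatures (a single Element for the tag functions, parent and child for processNodeWhileParsing) and the parseUtf8 arguments are assumptions based on the description in this page.

```d
import arsd.dom;
import std.stdio;

// Hypothetical subclass for illustration: records every callback as a line of text.
class TracingDocument : Document {
	string[] callLog;

	// Label a node: "null" if absent, "text" for text nodes, otherwise the tag name.
	private static string describe(Element e) {
		if (e is null)
			return "null";
		return (cast(TextNode) e) !is null ? "text" : e.tagName;
	}

	override void processTagOpen(Element what) {
		callLog ~= "processTagOpen(" ~ what.tagName ~ ")";
	}

	override void processTagClose(Element what) {
		callLog ~= "processTagClose(" ~ what.tagName ~ ")";
	}

	override void processNodeWhileParsing(Element parent, Element child) {
		callLog ~= "processNodeWhileParsing(" ~ describe(parent) ~ ", " ~ describe(child) ~ ")";
		parent.appendChild(child); // keep the default tree-building behavior
	}
}

// Parse the given xml and return the recorded calls in order.
string[] traceCalls(string xml) {
	auto doc = new TracingDocument;
	doc.parseUtf8(xml, true, true); // assumed arguments: case sensitive, strict
	return doc.callLog;
}

void main() {
	auto xml = "<thing>\n\t<child>\n\t\t<grandchild></grandchild>\n\t</child>\n</thing>";
	foreach (line; traceCalls(xml))
		writeln(line);
}
```

Running it on the sample document should print one line per call, which you can compare against the numbered list.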

The Element objects passed to those functions are the same ones you'd see in the dom tree; the tag open and tag close calls receive the same object, so you can compare them with the is operator if you want.

By default, processTagOpen and processTagClose do nothing. processNodeWhileParsing's default behavior is to call parent.appendChild(child), in order to build the dom tree. If you do not want the dom tree, you can override this function to do nothing. You might use processTagOpen and processTagClose to keep a stack or other state variables to help you make those decisions.
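As a rough illustration of keeping state between the tag callbacks, the sketch below builds only the small subtree under each item element and discards everything else. The class ItemCollector, its fields, the helper collectItems, and the item tag name are all hypothetical; the override signatures and the parseUtf8 arguments are assumed as described on this page.

```d
import arsd.dom;
import std.stdio;

// Hypothetical example: collect the text of every <item> without building the full tree.
class ItemCollector : Document {
	string[] collectedItems;
	private int insideItem; // how many <item> elements are currently open

	override void processTagOpen(Element what) {
		if (what.tagName == "item")
			insideItem++;
	}

	override void processTagClose(Element what) {
		if (what.tagName == "item")
			insideItem--;
	}

	override void processNodeWhileParsing(Element parent, Element child) {
		if (insideItem > 0) {
			// Still inside an <item>: build its subtree as the default would.
			parent.appendChild(child);
		} else if (child.tagName == "item") {
			// processTagClose already ran for this <item>, so it is complete.
			// Use it, then let it go by not appending it anywhere.
			collectedItems ~= child.innerText;
		}
		// Every other node is dropped, so no document-wide tree is kept.
	}
}

// Parse the given xml and return the text of each top-level <item>.
string[] collectItems(string xml) {
	auto doc = new ItemCollector;
	doc.parseUtf8(xml, true, true); // assumed arguments: case sensitive, strict
	return doc.collectedItems;
}

void main() {
	auto xml = "<list><item>one</item><skip>x</skip><item>two</item></list>";
	foreach (text; collectItems(xml))
		writeln(text);
}
```

A counter is enough here because only one tag name matters; a stack of Element references would suit decisions that depend on the full path to the current node.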

If you choose not to append child to parent in processNodeWhileParsing, the garbage collector is free to clean up the node even while the document is still being parsed, allowing memory use to stay lower. Memory use will tend to scale approximately with the max depth in the element tree rather than the entire document size.

To cancel processing before the end of a document, you'll have to throw an exception and catch it at your call to parse. There is no other way to stop early and there are no concrete plans to add one.
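A minimal sketch of that pattern follows. The exception class StopParsing, the subclass FirstNameFinder, the helper findFirstName, and the name tag are invented for illustration; the override signature and the parseUtf8 arguments are assumptions.

```d
import arsd.dom;
import std.stdio;

// Hypothetical exception type, used only to unwind out of the parser.
class StopParsing : Exception {
	this() { super("stopped parsing early"); }
}

// Hypothetical subclass: remembers the first <name> element's text, then stops.
class FirstNameFinder : Document {
	string foundName;

	override void processTagClose(Element what) {
		if (what.tagName == "name") {
			// The default processNodeWhileParsing has appended the children by now.
			foundName = what.innerText;
			throw new StopParsing; // abandon the rest of the document
		}
	}
}

// Return the text of the first <name> element, or null if there is none.
string findFirstName(string xml) {
	auto doc = new FirstNameFinder;
	try {
		doc.parseUtf8(xml, true, true); // assumed arguments: case sensitive, strict
	} catch (StopParsing) {
		// Expected: this is how the early exit is signalled.
	}
	return doc.foundName;
}

void main() {
	writeln(findFirstName("<record><name>Hello</name><rest>more</rest></record>"));
}
```

Catching only the dedicated exception type keeps genuine parse errors from being swallowed along with the early exit.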

Bugs

Even if you use a Utf8Stream to feed data and decline to append to the tree, the entire xml text is likely to end up in memory anyway.

See Also

Document#examples for the streaming example.

Meta

History

processNodeWhileParsing was added January 6, 2023.

processTagOpen and processTagClose were added February 21, 2025.
