| Field | Value | Date |
|---|---|---|
| author | Matt Butcher <[email protected]> | 2013-04-13 13:50:03 -0500 |
| committer | Matt Butcher <[email protected]> | 2013-04-13 13:50:03 -0500 |
| commit | 42bf55b1e5a7c4e1a7d182e8bf93b0676a7b9392 | |
| tree | bca9202307ee2ba56074d069d6fbaac44d38458e | |
| parent | 8daa07439ea3869021cc328b2509966eb853c6aa | |
Updated the README with design info.
-rw-r--r-- | README.md | 25 |
1 files changed, 24 insertions, 1 deletions
@@ -70,7 +70,30 @@ syntax checking.
 The unit tests exercise each piece of the API, and every public
 function is well-documented.
 
-## Notes on Serialized Formats
+### Parser Design
+
+The parser is designed as follows:
+
+- The `InputStream` portion handles direct I/O.
+- The `Scanner` handles scanning on behalf of the parser.
+- The `Tokenizer` requests data from the scanner, parses it, classifies
+it, and sends it to an `EventHandler`. It is a *recursive descent parser.*
+- The `EventHandler` receives notifications and data for each specific
+semantic event that occurs during tokenization.
+- The `DOMBuilder` is an `EventHandler` that listens for tokenizing
+events and builds a document tree (`DOMDocument`) based on the events.
+
+### Serializer Design
+
+The serializer takes a data structure (the `DOMDocument`) and transforms
+it into a character representation: an HTML5 document.
+
+The serializer is broken into two parts:
+
+- The `Traverser` is a special-purpose tree walker. It visits
+each node and transforms it into a string.
+- The `Serializer` manages the `Traverser` and stores the resultant data
+in the correct place.
 
 The serializer (`save()`, `saveHTML()`) follows
 [section 8.9 of the HTML 5.0 spec](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#serializing-html-fragments).
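The parser pipeline described in the README diff can be illustrated with a toy sketch. It is written in Python for brevity rather than in the library's own language, and it assumes a tiny grammar of nested `<name>...</name>` elements and text; it is not HTML5 tokenization (no attributes, void elements, or error recovery, and end-tag names are not checked). The class roles follow the README (`InputStream`, `Scanner`, `Tokenizer`, `EventHandler`, `DOMBuilder`), but every method name here (`peek()`, `take_until()`, `start_tag()`, `end_tag()`, `text()`) and the `Element` node type are hypothetical, not the library's API. What it is meant to show is the decoupling: the recursive descent `Tokenizer` only emits events, so any `EventHandler` can consume them, and `DOMBuilder` is just one such handler.

```python
from dataclasses import dataclass, field
from typing import List, Union


class InputStream:
    """Handles direct I/O; in this sketch it only wraps an in-memory string."""

    def __init__(self, data: str):
        self.data = data

    def char_at(self, pos: int) -> str:
        # An empty string signals end of input.
        return self.data[pos] if pos < len(self.data) else ""


class Scanner:
    """Reads characters from the InputStream on behalf of the Tokenizer."""

    def __init__(self, stream: InputStream):
        self.stream = stream
        self.pos = 0

    def peek(self, offset: int = 0) -> str:
        return self.stream.char_at(self.pos + offset)

    def next(self) -> str:
        ch = self.peek()
        self.pos += 1
        return ch

    def take_until(self, stop: str) -> str:
        # Consume characters up to (not including) the first char in `stop`.
        chars = []
        while self.peek() and self.peek() not in stop:
            chars.append(self.next())
        return "".join(chars)


class EventHandler:
    """Receives one callback per semantic event; the defaults ignore them."""

    def start_tag(self, name: str) -> None:
        pass

    def end_tag(self, name: str) -> None:
        pass

    def text(self, data: str) -> None:
        pass


class Tokenizer:
    """Recursive descent over the toy grammar:
    node := element | text;  element := "<" name ">" node* "</" name ">"."""

    def __init__(self, scanner: Scanner, handler: EventHandler):
        self.scanner = scanner
        self.handler = handler

    def parse(self) -> None:
        while self.scanner.peek():
            self.node()

    def node(self) -> None:
        if self.scanner.peek() == "<":
            self.element()
        else:
            self.handler.text(self.scanner.take_until("<"))

    def element(self) -> None:
        self.scanner.next()                      # consume "<"
        name = self.scanner.take_until(">")
        self.scanner.next()                      # consume ">"
        self.handler.start_tag(name)
        while self.scanner.peek() and not self.at_end_tag():
            self.node()                          # recurse into child nodes
        self.scanner.take_until(">")             # skip "</name" (name unchecked)
        self.scanner.next()                      # consume ">"
        self.handler.end_tag(name)

    def at_end_tag(self) -> bool:
        return self.scanner.peek() == "<" and self.scanner.peek(1) == "/"


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


class DOMBuilder(EventHandler):
    """An EventHandler that turns events into a tree rooted at `document`."""

    def __init__(self):
        self.document = Element("#document")
        self.open = [self.document]              # stack of open elements

    def start_tag(self, name: str) -> None:
        element = Element(name)
        self.open[-1].children.append(element)
        self.open.append(element)

    def end_tag(self, name: str) -> None:
        if len(self.open) > 1:                   # never pop the document root
            self.open.pop()

    def text(self, data: str) -> None:
        self.open[-1].children.append(data)


builder = DOMBuilder()
Tokenizer(Scanner(InputStream("<p>Hi <b>there</b></p>")), builder).parse()
print(builder.document)
```

Because the `Tokenizer` never touches a tree, swapping `DOMBuilder` for a handler that, say, only counts elements needs no change to the parsing code.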
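The two-part serializer from the README diff can likewise be sketched in Python. Only the `Traverser`/`Serializer` split comes from the README; the rest is assumption. `serialize()` and the `Element` node type are hypothetical and do not correspond to the library's `save()` or `saveHTML()`, and reading "stores the resultant data in the correct place" as "return a string, or write to a caller-supplied stream" is a guess. Only a sliver of section 8.9 is modeled: escaping `&`, U+00A0, `<` and `>` in text nodes, and omitting end tags for a subset of void elements. Attributes, comments, and raw-text elements such as `script` are left out.

```python
import io
from dataclasses import dataclass, field
from typing import List, Optional, TextIO, Union

# A subset of the HTML void elements, which are serialized without an end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def escape_text(text: str) -> str:
    # Escape "&" first so the entities added afterwards are not re-escaped.
    return (text.replace("&", "&amp;")
                .replace("\u00a0", "&nbsp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


class Traverser:
    """Special-purpose tree walker: visits each node and writes its string form."""

    def __init__(self, out: TextIO):
        self.out = out

    def walk(self, node: Union[Element, str]) -> None:
        if isinstance(node, str):
            self.out.write(escape_text(node))
        elif node.name == "#document":
            self.out.write("<!DOCTYPE html>")
            self.walk_children(node)
        else:
            self.out.write(f"<{node.name}>")
            if node.name not in VOID_ELEMENTS:
                self.walk_children(node)
                self.out.write(f"</{node.name}>")

    def walk_children(self, node: Element) -> None:
        for child in node.children:
            self.walk(child)


class Serializer:
    """Drives a Traverser and decides where the output ends up."""

    def __init__(self, document: Element):
        self.document = document

    def serialize(self, out: Optional[TextIO] = None) -> Optional[str]:
        # No destination given: buffer in memory and return the markup.
        if out is None:
            buffer = io.StringIO()
            Traverser(buffer).walk(self.document)
            return buffer.getvalue()
        # Otherwise stream directly into the caller's file-like object.
        Traverser(out).walk(self.document)
        return None


doc = Element("#document", [Element("p", ["a < b", Element("br"), "c"])])
print(Serializer(doc).serialize())  # <!DOCTYPE html><p>a &lt; b<br>c</p>
```

Keeping the node-visiting logic in `Traverser` means the destination (string or stream) can change in `Serializer` without touching the walk itself.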