author Matt Butcher <[email protected]> 2013-04-13 13:50:03 -0500
committer Matt Butcher <[email protected]> 2013-04-13 13:50:03 -0500
commit 42bf55b1e5a7c4e1a7d182e8bf93b0676a7b9392
tree bca9202307ee2ba56074d069d6fbaac44d38458e
parent 8daa07439ea3869021cc328b2509966eb853c6aa
Updated the README with design info.
-rw-r--r-- README.md 25
1 files changed, 24 insertions, 1 deletions
diff --git a/README.md b/README.md
index 69cfc1b..ed05cc9 100644
--- a/README.md
+++ b/README.md
@@ -70,7 +70,30 @@ syntax checking.
The unit tests exercise each piece of the API, and every public function
is well-documented.
-## Notes on Serialized Formats
+### Parser Design
+
+The parser is designed as follows:
+
+- The `InputStream` portion handles direct I/O.
+- The `Scanner` handles scanning on behalf of the parser.
+- The `Tokenizer` requests data from the scanner, parses it, classifies
+it, and sends it to an `EventHandler`. It is a *recursive descent parser.*
+- The `EventHandler` receives notifications and data for each specific
+semantic event that occurs during tokenization.
+- The `DOMBuilder` is an `EventHandler` that listens for tokenizing
+events and builds a document tree (`DOMDocument`) based on the events.
+
+### Serializer Design
+
+The serializer takes a data structure (the `DOMDocument`) and transforms
+it into a character representation -- an HTML5 document.
+
+The serializer is broken into two parts:
+
+- The `Traverser`, which is a special-purpose tree walker. It visits
+each node and transforms it into a string.
+- The `Serializer` manages the `Traverser` and stores the resultant data
+in the correct place.
The serializer (`save()`, `saveHTML()`) follows
[section 8.9 of the HTML 5.0 spec](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#serializing-html-fragments).
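
The division of labor described in this change can be illustrated with a small, language-neutral sketch. The Python below is not the library's code or API: every class and method name (`TreeBuilder`, `Node`, `take_until`, `round_trip`, and so on) is hypothetical, the scanner reads from a string rather than a separate input stream, and the tokenizer is a flat loop over bare tags rather than a real recursive descent parser. It only shows how a tokenizer can push events to a handler that builds a tree, and how a traverser and a serializer can split the work of writing that tree back out.

```python
import io


class Scanner:
    """Cursor over the input so the tokenizer never indexes the raw data."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def peek(self):
        return self.data[self.pos] if self.pos < len(self.data) else None

    def skip(self):
        self.pos += 1

    def take_until(self, stop):
        # Consume characters up to (not including) `stop` or end of input.
        start = self.pos
        while self.pos < len(self.data) and self.data[self.pos] != stop:
            self.pos += 1
        return self.data[start:self.pos]


class Tokenizer:
    """Classifies input as tags or text and notifies an event handler."""

    def __init__(self, scanner, handler):
        self.scanner, self.handler = scanner, handler

    def parse(self):
        s = self.scanner
        while s.peek() is not None:
            if s.peek() == "<":
                s.skip()                      # consume "<"
                name = s.take_until(">")
                s.skip()                      # consume ">"
                if name.startswith("/"):
                    self.handler.end_tag(name[1:])
                else:
                    self.handler.start_tag(name)
            else:
                self.handler.text(s.take_until("<"))


class Node:
    def __init__(self, name=None, text=None):
        self.name, self.text, self.children = name, text, []


class TreeBuilder:
    """Event handler that assembles a tree from tokenizer events."""

    def __init__(self):
        self.root = Node("#document")
        self.stack = [self.root]

    def start_tag(self, name):
        node = Node(name)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def end_tag(self, name):
        self.stack.pop()  # assumes balanced tags; no error recovery

    def text(self, data):
        self.stack[-1].children.append(Node(text=data))


class Traverser:
    """Tree walker: turns each node into its string form."""

    def walk(self, node):
        if node.text is not None:
            return node.text
        inner = "".join(self.walk(child) for child in node.children)
        if node.name == "#document":
            return inner
        return f"<{node.name}>{inner}</{node.name}>"


class Serializer:
    """Drives the traverser and decides where the output goes."""

    def __init__(self, traverser):
        self.traverser = traverser

    def save(self, document, out):
        out.write(self.traverser.walk(document))


def round_trip(markup):
    builder = TreeBuilder()
    Tokenizer(Scanner(markup), builder).parse()
    out = io.StringIO()
    Serializer(Traverser()).save(builder.root, out)
    return out.getvalue()
```

Because the tokenizer only talks to the handler interface, a different handler (for example, one that just counts tags) could be swapped in without touching the scanning code. Likewise, the serializer can write to any file-like object while the traverser stays unaware of the destination.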