authorTechnosophos <[email protected]>2013-04-18 21:20:56 -0500
committerTechnosophos <[email protected]>2013-04-18 21:20:56 -0500
commite1edb351535c6a6706e289dd90b3905d176af14a (patch)
tree0e9b322c762ca91d8490548eff7c20f7b150c371 /src
parent3b0441037f6ee680a0099b91e87f4fd1544e59e8 (diff)
Updating Parser docs.
Diffstat (limited to 'src')
-rw-r--r--src/HTML5/Parser/README.md50
1 files changed, 50 insertions, 0 deletions
diff --git a/src/HTML5/Parser/README.md b/src/HTML5/Parser/README.md
new file mode 100644
index 0000000..a21ee9a
--- /dev/null
+++ b/src/HTML5/Parser/README.md
@@ -0,0 +1,50 @@
+# The Parser Model
+
+The parser model here follows the model in section
+[8.2.1](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#parsing)
+of the HTML5 specification, though we do not assume a networking layer.
+
+ [ InputStream ] // Generic support for reading input.
+ ||
+ [ Scanner ] // Breaks down the stream into characters.
+ ||
+ [ Tokenizer ] // Groups characters into syntactic units.
+ ||
+ [ Tree Builder ] // Organizes units into a tree of objects.
+ ||
+ [DOM Document] // The final state of the parsed document.
+
+
+## InputStream
+
+This is an interface with at least two concrete implementations:
+
+- StringInputStream: Reads an HTML5 string.
+- FileInputStream: Reads an HTML5 file.
+
+## Scanner
+
+This is a mechanical piece of the parser: it breaks the stream into characters.
+
+## Tokenizer
+
+This follows section 8.2.4 of the HTML5 spec. It is (roughly) a recursive
+descent parser, though it includes plenty of optimizations that are less
+than purely functional.
+
+## EventHandler and DOMTree
+
+EventHandler is the interface for tree builders. Since not all
+implementations will necessarily build trees, we've chosen a more
+generic name.
+
+The tokenizer emits tokens to the event handler during tokenization.
+
+The DOMTree is an event handler that builds a DOM tree. The output of
+the DOMTree builder is a DOMDocument.
+
+## DOMDocument
+
+PHP has a built-in DOMDocument class (technically, it's part of the DOM
+extension, which is built on libxml). We use that, so the output of this
+process is compatible with SimpleXML, QueryPath, and many other XML/HTML tools.
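
The DOMDocument section of the README says the parser's output can be handed to other XML/HTML tools. A minimal sketch of what that could look like follows. It assumes nothing about this parser's own API, which the README does not show: the document is built by hand as a stand-in for the tree builder's output. Only PHP's built-in `DOMDocument` and `simplexml_import_dom()` are used.

```php
<?php
// Hypothetical stand-in for the tree builder's result. The parser's real
// entry point is not shown in the README, so the document is built manually.
$dom = new DOMDocument('1.0', 'UTF-8');
$html = $dom->createElement('html');
$body = $dom->createElement('body');
$para = $dom->createElement('p', 'Hello, parser');
$para->setAttribute('class', 'greeting');
$body->appendChild($para);
$html->appendChild($body);
$dom->appendChild($html);

// Hand the same document to SimpleXML without serializing and re-parsing it.
$xml = simplexml_import_dom($dom);
$text = (string) $xml->body->p;
$class = (string) $xml->body->p['class'];

// The regular DOM API keeps working on the same object.
$count = $dom->getElementsByTagName('p')->length;

echo $text, ' (class=', $class, ', paragraphs=', $count, ")\n";
```

Because `simplexml_import_dom()` works on the existing node tree, any consumer that accepts a `DOMDocument` or a `SimpleXMLElement` should be able to use the parsed result directly.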