From 0226e0ca0dc70f9a0310b3eef045ee1c1e0ca3ac Mon Sep 17 00:00:00 2001
From: Andrew Dolgov
Date: Tue, 13 Dec 2022 20:00:46 +0300
Subject: split into a separate repo

---
 .../masterminds/html5/src/HTML5/Parser/README.md | 53 ++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 vendor/masterminds/html5/src/HTML5/Parser/README.md

(limited to 'vendor/masterminds/html5/src/HTML5/Parser/README.md')

diff --git a/vendor/masterminds/html5/src/HTML5/Parser/README.md b/vendor/masterminds/html5/src/HTML5/Parser/README.md
new file mode 100644
index 0000000..9f92957
--- /dev/null
+++ b/vendor/masterminds/html5/src/HTML5/Parser/README.md
@@ -0,0 +1,53 @@
+# The Parser Model
+
+The parser model here follows the model in section
+[8.2.1](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#parsing)
+of the HTML5 specification, though we do not assume a networking layer.
+
+     [ InputStream ]    // Generic support for reading input.
+           ||
+      [ Scanner ]       // Breaks down the stream into characters.
+           ||
+     [ Tokenizer ]      // Groups characters into syntactic units.
+           ||
+    [ Tree Builder ]    // Organizes units into a tree of objects.
+           ||
+    [ DOM Document ]    // The final state of the parsed document.
+
+
+## InputStream
+
+This is an interface with at least two concrete implementations:
+
+- StringInputStream: Reads an HTML5 string.
+- FileInputStream: Reads an HTML5 file.
+
+## Scanner
+
+This is a mechanical piece of the parser.
+
+## Tokenizer
+
+This follows section 8.2.4 of the HTML5 spec. It is (roughly) a recursive
+descent parser, though there are plenty of optimizations that are less
+than purely functional.
+
+## EventHandler and DOMTree
+
+EventHandler is the interface for tree builders. Since not all
+implementations will necessarily build trees, we've chosen a more
+generic name.
+
+The tokenizer emits tokens to the event handler during tokenization.
+
+The DOMTree is an event handler that builds a DOM tree. The output of
+the DOMTree builder is a DOMDocument.
+
+## DOMDocument
+
+PHP has a built-in DOMDocument class (technically, part of the DOM
+extension, which wraps libxml). We use that, making the output compatible
+with SimpleXML, QueryPath, and many other XML/HTML processing tools.
+
+For cases where the HTML5 is a fragment of an HTML5 document, a
+DOMDocumentFragment is returned instead. This is another built-in class.
-- 
cgit v1.2.3
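
Note on the DOMDocument section of the added README: the compatibility claim can be illustrated with PHP's built-in classes alone. The following is a minimal sketch, not code from this patch. It builds a small DOMDocument by hand as a stand-in for the parser's output (an assumption, so that the snippet runs without the vendored library), hands it to SimpleXML, and then shows how a DOMDocumentFragment carries several sibling nodes. The variable names are illustrative only.

```php
<?php
// Stand-in for the parser's output (assumption): a DOMDocument built by hand
// with the built-in DOM extension instead of the vendored HTML5 parser.
$doc  = new DOMDocument('1.0', 'UTF-8');
$html = $doc->createElement('html');
$body = $doc->createElement('body');
$body->appendChild($doc->createElement('p', 'Hello'));
$html->appendChild($body);
$doc->appendChild($html);

// SimpleXML can wrap an existing DOM tree without re-parsing it.
// The returned element corresponds to the document's root element (<html>).
$sx   = simplexml_import_dom($doc);
$text = (string) $sx->body->p;

// A DOMDocumentFragment holds sibling nodes that have no single root element.
$fragment = $doc->createDocumentFragment();
$fragment->appendChild($doc->createElement('em', 'one'));
$fragment->appendChild($doc->createElement('strong', 'two'));
$fragmentChildren = $fragment->childNodes->length;

// Appending a fragment moves its children into the target node;
// the fragment itself is not inserted.
$body->appendChild($fragment);
$bodyChildren = $body->childNodes->length;
```

Because the tree is an ordinary DOMDocument, any tool that accepts one (SimpleXML here) can consume it directly, which is the design choice the README describes.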