author | Matt Butcher <[email protected]> | 2013-04-13 13:43:52 -0500 |
---|---|---|
committer | Matt Butcher <[email protected]> | 2013-04-13 13:43:52 -0500 |
commit | 8daa07439ea3869021cc328b2509966eb853c6aa | |
tree | 4a31bc46186eaa015f990ad3dcc0c903a336aadd | |
parent | 7e7c5c3dd2f9716f5e8cfd2988a59be49b76c3d7 | |
Updated the README.
-rw-r--r-- | README.md | 44 |
1 files changed, 34 insertions, 10 deletions
@@ -1,21 +1,23 @@
 # HTML5-PHP
 
-This is a **highly experimental** fork of the html5lib PHP parser.
+This is a **highly experimental** HTML5 Parser.
 
-The need for an HTML5 parser in PHP is clear. This project extends on
-the work of a previous (but seemingly abandoned) PHP parser. Beginning
-with the [original source](https://code.google.com/p/html5lib/source/checkout), we have
-create a newer version and are working to add the following features:
+The need for an HTML5 parser in PHP is clear. This project initially
+began with the seemingly abandoned `html5lib` project [original source](https://code.google.com/p/html5lib/source/checkout).
+But after some initial refactoring work, we began a new parser.
 
 - An HTML5 serializer [in progress; early alpha]
 - Support for PHP namespace [done]
 - Composer support [in progress]
+- Event-based (SAX-like) parser [in progress]
+- DOM tree builder [in progress]
 - Interoperability with QueryPath [not started]
-- Add non-HTML namespace support to parser.
 
-## Usage
+## Basic Usage
 
-This is how you use the `HTML5` library:
+HTML5-PHP has a high-level API and a low-level API.
+
+Here is how you use the high-level `HTML5` library API:
 
 ```php
 <?php
@@ -39,7 +41,6 @@ HERE;
 // Parse the document. $dom is a DOMDocument.
 $dom = HTML5::parse($html);
 
-
 // Render it as HTML5:
 print HTML5::saveHTML($dom);
 
@@ -52,6 +53,23 @@ HTML5::save('out.html');
 The `$dom` created by the parser is a full `DOMDocument` object. And the
 `save()` and `saveHTML()` methods will take any DOMDocument.
 
+
+### The Low-Level API
+
+This library provides the following low-level APIs that you can use to
+create more customized HTML5 tools:
+
+- An `InputStream` abstraction that can work with different kinds of
+input source (not just files and strings).
+- A SAX-like event-based parser that you can hook into for special kinds
+of parsing.
+- A flexible error-reporting mechanism that can be tuned to document
+syntax checking.
+- A DOM implementation that uses PHP's built-in DOM library.
+
+The unit tests exercise each piece of the API, and every public function
+is well-documented.
+
 ## Notes on Serialized Formats
 
 The serializer (`save()`, `saveHTML()`) follows the
@@ -66,6 +84,12 @@ So tags are serialized according to these rules:
 
 We owe a huge debt of gratitude to the original authors of html5lib.
 
+While not much of the orignal parser remains, we learned a lot from
+reading the html5lib library. And some pieces remain here. In
+particular, much of the UTF-8 and Unicode handling is derived from the
+html5lib project.
+
 ## License
 
-This software is released under the MIT license.
+This software is released under the MIT license. The original html5lib
+library was also released under the MIT license.