author Matt Butcher <[email protected]> 2013-04-13 13:43:52 -0500
committer Matt Butcher <[email protected]> 2013-04-13 13:43:52 -0500
commit 8daa07439ea3869021cc328b2509966eb853c6aa
tree 4a31bc46186eaa015f990ad3dcc0c903a336aadd
parent 7e7c5c3dd2f9716f5e8cfd2988a59be49b76c3d7
Updated the README.
-rw-r--r-- README.md 44
1 files changed, 34 insertions, 10 deletions
diff --git a/README.md b/README.md
index 3256429..69cfc1b 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,23 @@
# HTML5-PHP
-This is a **highly experimental** fork of the html5lib PHP parser.
+This is a **highly experimental** HTML5 Parser.
-The need for an HTML5 parser in PHP is clear. This project extends on
-the work of a previous (but seemingly abandoned) PHP parser. Beginning
-with the [original source](https://code.google.com/p/html5lib/source/checkout), we have
-create a newer version and are working to add the following features:
+The need for an HTML5 parser in PHP is clear. This project initially
+began with the seemingly abandoned `html5lib` project [original source](https://code.google.com/p/html5lib/source/checkout).
+But after some initial refactoring work, we began a new parser.
- An HTML5 serializer [in progress; early alpha]
- Support for PHP namespace [done]
- Composer support [in progress]
+- Event-based (SAX-like) parser [in progress]
+- DOM tree builder [in progress]
- Interoperability with QueryPath [not started]
-- Add non-HTML namespace support to parser.
-## Usage
+## Basic Usage
-This is how you use the `HTML5` library:
+HTML5-PHP has a high-level API and a low-level API.
+
+Here is how you use the high-level `HTML5` library API:
```php
<?php
@@ -39,7 +41,6 @@ HERE;
// Parse the document. $dom is a DOMDocument.
$dom = HTML5::parse($html);
-
// Render it as HTML5:
print HTML5::saveHTML($dom);
@@ -52,6 +53,23 @@ HTML5::save('out.html');
The `$dom` created by the parser is a full `DOMDocument` object. And the
`save()` and `saveHTML()` methods will take any DOMDocument.
+
+### The Low-Level API
+
+This library provides the following low-level APIs that you can use to
+create more customized HTML5 tools:
+
+- An `InputStream` abstraction that can work with different kinds of
+input source (not just files and strings).
+- A SAX-like event-based parser that you can hook into for special kinds
+of parsing.
+- A flexible error-reporting mechanism that can be tuned to document
+syntax checking.
+- A DOM implementation that uses PHP's built-in DOM library.
+
+The unit tests exercise each piece of the API, and every public function
+is well-documented.
+
## Notes on Serialized Formats
The serializer (`save()`, `saveHTML()`) follows the
@@ -66,6 +84,12 @@ So tags are serialized according to these rules:
We owe a huge debt of gratitude to the original authors of html5lib.
+While not much of the original parser remains, we learned a lot from
+reading the html5lib library. And some pieces remain here. In
+particular, much of the UTF-8 and Unicode handling is derived from the
+html5lib project.
+
## License
-This software is released under the MIT license.
+This software is released under the MIT license. The original html5lib
+library was also released under the MIT license.