# HTML5-PHP
This is a **highly experimental** fork of the html5lib PHP parser.
The need for an HTML5 parser in PHP is clear. This project builds on
the work of a previous (but seemingly abandoned) PHP parser. Beginning
with the [original source](https://code.google.com/p/html5lib/source/checkout), we have
created a newer version and are working to add the following features:
- An HTML5 serializer [in progress; early alpha]
- Support for PHP namespaces [done]
- Composer support [in progress]
- Interoperability with QueryPath [not started]
- Non-HTML namespace support in the parser
## Usage
This is how you use the `HTML5` library:
```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
// An example HTML document:
$html = <<< 'HERE'
<html>
<head>
<title>TEST</title>
</head>
<body id='foo'>
<h1>Hello World</h1>
<p>This is a test of the HTML5 parser.</p>
</body>
</html>
HERE;
// Parse the document. $dom is a DOMDocument.
$dom = HTML5::parse($html);
// Render it as HTML5:
print HTML5::saveHTML($dom);
// Or save it to a file:
HTML5::save($dom, 'out.html');
?>
```
The `$dom` created by the parser is a full `DOMDocument` object, and the
`save()` and `saveHTML()` methods accept any `DOMDocument`.
## Notes on Serialized Formats
The serializer (`save()`, `saveHTML()`) follows
[section 8.9 of the HTML 5.0 spec](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#serializing-html-fragments).
Tags are therefore serialized according to these rules:
- A tag with children: `<foo>CHILDREN</foo>`
- A tag that cannot have content: `<foo>` (no closing tag)
- A tag that could have content, but doesn't: `<foo></foo>`
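The three rules above can be illustrated with a small hand-rolled function. This is only a sketch, not the serializer this library ships: the name `sketchSerialize()` and the `VOID_ELEMENTS` list are hypothetical, the list of content-less tags is not exhaustive, and special cases such as `<script>` and `<style>` content are ignored. It uses nothing beyond PHP's built-in DOM extension.

```php
<?php
// Illustrative sketch only: NOT the library's implementation.
// VOID_ELEMENTS and sketchSerialize() are hypothetical names.

// Common tags that cannot have content (not an exhaustive list).
const VOID_ELEMENTS = [
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
    'link', 'meta', 'param', 'source', 'track', 'wbr',
];

function sketchSerialize(DOMNode $node): string
{
    // Text nodes are emitted with the basic HTML escapes applied.
    if ($node instanceof DOMText) {
        return htmlspecialchars($node->nodeValue, ENT_NOQUOTES);
    }
    // Comments, doctypes, and other node types are skipped in this sketch.
    if (!($node instanceof DOMElement)) {
        return '';
    }

    // Build the opening tag, including any attributes.
    $name = strtolower($node->tagName);
    $open = '<' . $name;
    foreach ($node->attributes as $attr) {
        $open .= ' ' . $attr->name . '="'
            . htmlspecialchars($attr->value, ENT_COMPAT) . '"';
    }
    $open .= '>';

    // A tag that cannot have content: opening tag only, no closing tag.
    if (in_array($name, VOID_ELEMENTS, true)) {
        return $open;
    }

    // A tag with children, or one that could have content but doesn't:
    // both get an explicit closing tag; the inner part may be empty.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= sketchSerialize($child);
    }
    return $open . $inner . '</' . $name . '>';
}

// Build a few nodes by hand and serialize them.
$doc = new DOMDocument();
$p = $doc->createElement('p');
$p->appendChild($doc->createTextNode('Hello'));
$p->appendChild($doc->createElement('br'));
$div = $doc->createElement('div');

print sketchSerialize($p) . "\n";   // <p>Hello<br></p>
print sketchSerialize($div) . "\n"; // <div></div>
```

Note that the empty `<div>` is written as `<div></div>` rather than the XML-style `<div/>`, which is the main visible difference from serializing the same `DOMDocument` as XML.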
## Thanks to...
We owe a huge debt of gratitude to the original authors of html5lib.
## License
This software is released under the MIT license.