author: Andres Rey <[email protected]> 2016-11-08 14:23:53 +0000
committer: Andres Rey <[email protected]> 2016-11-08 14:23:53 +0000
commit: 9f80666c5c34880a8174c02d97c60d1ef3b0ca3d
tree: 7129e881a451384d9e3375b5d6c49f4fa931ed80 /README.md
parent: bdbf33732a7093a7db9405b13dd891d229e522ee
Readme updated, added final return
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 16
1 files changed, 15 insertions, 1 deletions
diff --git a/README.md b/README.md
index 3d97f5a..3dee659 100644
--- a/README.md
+++ b/README.md
@@ -3,10 +3,14 @@
PHP port of *Mozilla's* **[Readability.js](https://github.com/mozilla/readability)**. Parses HTML text (usually news and other articles) and tries to return the title, byline and text content. Analyzes each text node, gives it a score and orders the nodes based on this calculation.
-**Requires**: PHP 5.3+
+**Requires**: PHP 5.4+
**Lead Developer**: Andres Rey
+## Status
+
+Current status is *ultra-mega-alpha*. It is broken right now and it will change dramatically until the first 1.0 release. Expect wild changes. Submit pull requests. Argue with me.
+
## How to use it
First you have to require the library using composer:
@@ -39,10 +43,20 @@ $result = [
## Limitations
+Of course the main limitation is PHP. Websites that load their content through lazy loading, AJAX, or any other kind of JavaScript-driven call will be ignored (actually, the scripts are *not run*) and the resulting text will be incorrect compared to the readability.js results. Every article you want to parse with readability.php needs to be complete, with all of its content already present in the HTML.
+
## Known Issues
+None so far.
+
## To-do
+100% of the original readability code was ported, at least up to the last commit at the time I started this project ([13 Aug 2016](https://github.com/mozilla/readability/commit/71aa562387fa507b0bac30ae7144e1df7ba8a356)). There are a lot of `TODO`s around the code, which mark the parts that still need to be finished.
+
## Dependencies
Readability uses the Element interface and class from *The PHP League's* **[html-to-markdown](https://github.com/thephpleague/html-to-markdown/)**. The Readability object is an extension of the Element class. It overrides some methods but relies on it for basic DOMElement parsing.
+
+## How it works
+
+Readability parses all the text with DOMDocument, scans the text nodes and gives them a score, based on the number of words, links and type of element. Then it selects the highest scoring element and creates a new DOMDocument with all its siblings. Each sibling is scored to discard useless elements, like nav bars, empty nodes, etc.
\ No newline at end of file
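
The scoring idea described under "How it works" can be sketched roughly as follows. This is only an illustration and not the code in this repository: the function names `scoreElement` and `findTopCandidate`, the tag bonus values, and the choice to scan only `<div>` elements are all assumptions made for the example. It uses only the standard `DOMDocument` API and sticks to syntax available in PHP 5.4.

```php
<?php
// Hypothetical sketch: the names and weights below are assumptions,
// not readability.php's actual API or scoring rules.

/**
 * Score an element by its word count plus a per-tag bonus,
 * scaled down by the share of its text that sits inside links.
 */
function scoreElement(DOMElement $element)
{
    $text = trim($element->textContent);
    $wordCount = str_word_count($text);
    if ($wordCount === 0) {
        return 0.0; // empty nodes are useless
    }

    // Link density: characters inside <a> tags divided by all characters.
    $linkLength = 0;
    foreach ($element->getElementsByTagName('a') as $link) {
        $linkLength += strlen(trim($link->textContent));
    }
    $linkDensity = $linkLength / strlen($text);

    // Assumed bonus per element type.
    $tagBonus = ['div' => 5, 'p' => 3];
    $tag = strtolower($element->tagName);
    $bonus = isset($tagBonus[$tag]) ? $tagBonus[$tag] : 0;

    return ($wordCount + $bonus) * (1 - $linkDensity);
}

/**
 * Parse the HTML and return the highest scoring <div>, or null if none scores.
 */
function findTopCandidate($html)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $best = null;
    $bestScore = 0.0;
    foreach ($dom->getElementsByTagName('div') as $element) {
        $score = scoreElement($element);
        if ($score > $bestScore) {
            $best = $element;
            $bestScore = $score;
        }
    }
    return $best;
}

$html = '<html><body>'
    . '<div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>'
    . '<div id="content"><p>This is the main article text with plenty of words in it.</p></div>'
    . '</body></html>';

$top = findTopCandidate($html);
echo $top->getAttribute('id'), "\n"; // prints "content"
```

In this sketch the navigation block loses because almost all of its text is link text, while the article block has many words and no links. The second pass described above, which scores the siblings of the winner, would reuse the same scoring function on each sibling.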