path: root/src
Age | Commit message | Author
2017-11-09 | Remove extra brs between p nodes after processing the article | Andres Rey
2017-11-09 | Remove reverse traversing when scanning for brs and convert the DOMNodeList to an array before looping over it | Andres Rey
2017-11-09 | Scan nodes in reverse in removing functions. | Andres Rey
    In other words: Node shifting is a bitch
2017-11-09 | Better detection of empty paragraphs | Andres Rey
2017-11-08 | Remove BR cleaning on text nodes temporarily | Andres Rey
2017-11-07 | Clean style attributes inside tags | Andres Rey
2017-11-07 | Merge branch 'master' into development-update-to-f0edc77cb58ef52890e3065cf2b0e334d940feb2 | Andres Rey
2017-11-07 | Add article text direction to response | Andres Rey
2017-11-07 | Update logic to remove nodes when cleaning conditionally | Andres Rey
2017-11-07 | Mark datatables and avoid removing them during cleaning | Andres Rey
2017-11-06 | Check for maxDepth before continuing | Andres Rey
2017-11-06 | Get the article direction | Andres Rey
    TODO: Make the metadata array an object with getters and setters
2017-11-06 | Keep potential top candidate's parent node to try to get text direction of it later. | Andres Rey
2017-11-05 | If the top candidate is the only child, use parent instead. This will help sibling joining logic when adjacent content is actually located in parent's sibling node. | Andres Rey
2017-11-05 | Find a better top candidate node if it contains (at least three) nodes which belong to `topCandidates` array and whose scores are quite close to that of the current `topCandidate` node. | Andres Rey
2017-11-05 | Cleanup | Andres Rey
2017-11-05 | Check for text node contents before converting them to P tags | Andres Rey
2017-11-05 | Add isElementWithoutContent function | Andres Rey
2017-11-05 | Clean extra fields when prepping the article | Andres Rey
2017-11-04 | Add hierarchical separators detection on titles | Andres Rey
2017-11-02 | Minor cleanup | Andres Rey
2017-11-02 | Update the unlikelyCandidates regex | Andres Rey
2017-10-05 | Apply fixes from StyleCI | Andres Rey
2017-10-05 | Merge pull request #24 from jagermesh/UndefinedIndex | Andres Rey
    Fix for Notice: Undefined index: og:image
2017-10-05 | fix for | Sergiy Lavryk
    Notice: Undefined index: og:image in /andreskrey/readability.php/src/HTMLParser.php, line 469
2017-09-14 | Add summonCthulhu config option + test cases | Andres Rey
2017-06-15 | Safecheck for really bad HTML | Andres Rey
2017-05-31 | Apply fixes from StyleCI | Andres Rey
2017-05-30 | Minor fix | Andres Rey
2017-05-21 | Minor fix | Andres Rey
2017-05-20 | Move the removeScripts and prepDocument functions inside the loadHTML function. Performance will suffer (as the system has to reparse the HTML every time it cycles) but it is the only solution AFAIK. | Andres Rey
2017-05-20 | Merge remote-tracking branch 'origin/pr-20-new-backup-approach' into pr-20-new-backup-approach | Andres Rey
2017-05-20 | Add new backup approach. Cloning the original DOM object is not useful to keep a backup of it because there seems to be a connection between the original object and the clone. Making a change on the original object translates it to the backup one, so the HTML must be reloaded every time the algorithm cycles. | Andres Rey
2017-03-26 | Added normalizeEntities flag. | Andres Rey
2017-03-10 | Apply fixes from StyleCI | Andres Rey
2017-03-10 | Merge pull request #18 from andreskrey/development | Andres Rey
    Prepare for release v0.2.0
2017-03-10 | Apply fixes from StyleCI | Andres Rey
2017-03-10 | Fixed all test cases and bugs, now 100% of our tests pass. BREAK OUT THE CHAMPAGNE! | Andres Rey
2017-03-09 | Fixed small mistake when getting the articleByLine. Corrected test case | Andres Rey
2017-03-07 | Fuck this, we are not going to normalize blank space. | Andres Rey
2017-03-03 | Functions to normalize space and disable substitute entities | Andres Rey
2017-02-21 | Fixed test cases and added function to replace font tags with span + param to setNodeTag to keep attributes from original node. | Andres Rey
2017-02-12 | Extract top image when og:image and twitter:image are missing on the HTML | Andres Rey
2017-02-04 | prevents an exception being thrown | David Fricker
    prevents an exception being thrown by postProcessContent when $result is a bool not a DOM object.
2016-12-28 | Removed the private var title since it wasn't used | Andres Rey
2016-12-26 | Restored the if to select the main image of the article | Andres Rey
2016-12-26 | Restored the if to select the main image of the article | Andres Rey
2016-12-26 | Updated the getMetadata function | Andres Rey
2016-12-24 | Added function to clean Style tags and refactored the _clean function to traverse the DOM backwards | Andres Rey
2016-12-23 | Apply fixes from StyleCI | Andres Rey