path: root/src/HTMLParser.php
Age | Commit message | Author
2017-09-14 | Add summonCthulhu config option + test cases | Andres Rey
2017-06-15 | Safecheck for really bad HTML | Andres Rey
2017-05-31 | Apply fixes from StyleCI | Andres Rey
2017-05-30 | Minor fix | Andres Rey
2017-05-21 | Minor fix | Andres Rey
2017-05-20 | Move the removeScripts and prepDocument functions inside the loadHTML function. Performance will suffer (as the system has to reparse the HTML every time it cycles) but it is the only solution AFAIK. | Andres Rey
2017-05-20 | Merge remote-tracking branch 'origin/pr-20-new-backup-approach' into pr-20-new-backup-approach | Andres Rey
2017-05-20 | Add new backup approach. Cloning the original DOM object does not give a usable backup because the clone seems to stay connected to the original: a change made to the original object also shows up in the backup, so the HTML must be reloaded every time the algorithm cycles. | Andres Rey
2017-03-26 | Added normalizeEntities flag. | Andres Rey
2017-03-10 | Apply fixes from StyleCI | Andres Rey
2017-03-10 | Merge pull request #18 from andreskrey/development | Andres Rey
    Prepare for release v0.2.0
2017-03-10 | Apply fixes from StyleCI | Andres Rey
2017-03-10 | Fixed all test cases and bugs, now 100% of our tests pass. BREAK OUT THE CHAMPAGNE! | Andres Rey
2017-03-09 | Fixed small mistake when getting the articleByLine. Corrected test case | Andres Rey
2017-03-07 | Fuck this, we are not going to normalize blank space. | Andres Rey
2017-03-03 | Functions to normalize space and disable substitute entities | Andres Rey
2017-02-21 | Fixed test cases and added function to replace font tags with span + param to setNodeTag to keep attributes from original node. | Andres Rey
2017-02-12 | Extract top image when og:image and twitter:image are missing on the HTML | Andres Rey
2017-02-04 | prevents an exception being thrown | David Fricker
    prevents an exception being thrown by postProcessContent when $result is a bool not a DOM object.
2016-12-28 | Removed the private var title since it wasn't used | Andres Rey
2016-12-26 | Restored the if to select the main image of the article | Andres Rey
2016-12-26 | Restored the if to select the main image of the article | Andres Rey
2016-12-26 | Updated the getMetadata function | Andres Rey
2016-12-24 | Added function to clean Style tags and refactored the _clean function to traverse the DOM backwards | Andres Rey
2016-12-23 | Apply fixes from StyleCI | Andres Rey
2016-12-23 | Updated README and fixed initial options. | Andres Rey
2016-12-23 | IF statements are hard | Andres Rey
2016-12-23 | New function to solve relative URLs | Andres Rey
2016-12-22 | Node shifting is a bitch | Andres Rey
2016-12-21 | Added safe check when getting the article title. | Andres Rey
2016-12-15 | Added prepDocument function. | Andres Rey
2016-12-15 | Moved the position of the backupnode creation since we need it without the script tags. | Andres Rey
2016-12-15 | Added a hack to load HTML with UTF-8 characters | Andres Rey
2016-12-12 | Added recursion to re-run the algorithm in case no quality content is found. | Andres Rey
2016-12-11 | Added backupdom property, which will hold the original HTML in case it's needed to create a fake top candidate | Andres Rey
2016-12-11 | Added option to filter empty DOMText nodes while getting children. | Andres Rey
2016-12-11 | Removed conditional cleaning of crlfs | Andres Rey
2016-12-10 | Small fix to avoid adding empty DOMText nodes to the selected nodes. | Andres Rey
2016-12-10 | Fixed wrong iteration on hasSingleChildBlockElement. Funny how one single line in JS turns into 10 in PHP. Not because of PHP, more like because I'm a subpar dev :F | Andres Rey
2016-12-10 | Solved some mistakes during node parsing (before scoring) | Andres Rey
2016-12-08 | Added option to remove the data-readability tags. | Andres Rey
2016-12-07 | Added some basic Unit Testing. HTML samples taken from readability.js | Andres Rey
2016-11-27 | Fixed node traversal while cleaning conditionally | Andres Rey
2016-11-27 | Added cleanExtraParagraphs | Andres Rey
2016-11-27 | Added cleanConditionally | Andres Rey
2016-11-25 | Updated readme for release. Added cleanheaders. | Andres Rey
2016-11-24 | Progress over prepArticle | Andres Rey
2016-11-24 | Initial approach to prepArticle | Andres Rey
2016-11-23 | Added horrible way to restore score. Should be gone in next versions (or when I learn to code properly) | Andres Rey
2016-11-22 | Removed old reference to elementsToScore, switched the moment when elements are initialized | Andres Rey