path: root/src
Age | Commit message | Author
2017-11-09 | Remove extra brs between p nodes after processing the article | Andres Rey
2017-11-09 | Remove reverse traversing when scanning for brs and convert the DOMNodeList t... | Andres Rey
2017-11-09 | Scan nodes in reverse in removing functions. | Andres Rey
2017-11-09 | Better detection of empty paragraphs | Andres Rey
2017-11-08 | Remove BR cleaning on text nodes temporarily | Andres Rey
2017-11-07 | Clean style attributes inside tags | Andres Rey
2017-11-07 | Merge branch 'master' into development-update-to-f0edc77cb58ef52890e3065cf2b0... | Andres Rey
2017-11-07 | Add article text direction to response | Andres Rey
2017-11-07 | Update logic to remove nodes when cleaning conditionally | Andres Rey
2017-11-07 | Mark datatables and avoid removing them during cleaning | Andres Rey
2017-11-06 | Check for maxDepth before continuing | Andres Rey
2017-11-06 | Get the article direction | Andres Rey
2017-11-06 | Keep potential top candidate's parent node to try to get text direction of it... | Andres Rey
2017-11-05 | If the top candidate is the only child, use parent instead. This will help si... | Andres Rey
2017-11-05 | Find a better top candidate node if it contains (at least three) nodes which ... | Andres Rey
2017-11-05 | Cleanup | Andres Rey
2017-11-05 | Check for text node contents before converting them to P tags | Andres Rey
2017-11-05 | Add isElementWithoutContent function | Andres Rey
2017-11-05 | Clean extra fields when prepping the article | Andres Rey
2017-11-04 | Add hierarchical separators detection on titles | Andres Rey
2017-11-02 | Minor cleanup | Andres Rey
2017-11-02 | Update the unlikelyCandidates regex | Andres Rey
2017-10-05 | Apply fixes from StyleCI | Andres Rey
2017-10-05 | Merge pull request #24 from jagermesh/UndefinedIndex | Andres Rey
2017-10-05 | fix for | Sergiy Lavryk
2017-09-14 | Add summonCthulhu config option + test cases | Andres Rey
2017-06-15 | Safecheck for really bad HTML | Andres Rey
2017-05-31 | Apply fixes from StyleCI | Andres Rey
2017-05-30 | Minor fix | Andres Rey
2017-05-21 | Minor fix | Andres Rey
2017-05-20 | Move the removeScripts and prepDocument functions inside the loadHTML functio... | Andres Rey
2017-05-20 | Merge remote-tracking branch 'origin/pr-20-new-backup-approach' into pr-20-ne... | Andres Rey
2017-05-20 | Add new backup approach. Cloning the original DOM object is not useful to kee... | Andres Rey
2017-03-26 | Added normalizeEntities flag. | Andres Rey
2017-03-10 | Apply fixes from StyleCI | Andres Rey
2017-03-10 | Merge pull request #18 from andreskrey/development | Andres Rey
2017-03-10 | Apply fixes from StyleCI | Andres Rey
2017-03-10 | Fixed all test cases and bugs, now 100% of our test pass. BREAK OUT THE CHAMP... | Andres Rey
2017-03-09 | Fixed small mistake when getting the articleByLine. Corrected test case | Andres Rey
2017-03-07 | Fuck this, we are not going to normalize blank space. | Andres Rey
2017-03-03 | Functons to normalize space and disable subtitute entities | Andres Rey
2017-02-21 | Fixed test cases and added function to replace font tags with span + param to... | Andres Rey
2017-02-12 | Extract top image when og:image and twitter:image are missing on the HTML | Andres Rey
2017-02-04 | prevents an exception being thrown | David Fricker
2016-12-28 | Removed the private var title since it wasn't used | Andres Rey
2016-12-26 | Restored the if to select the main image of the article | Andres Rey
2016-12-26 | Restored the if to select the main image of the article | Andres Rey
2016-12-26 | Updated the getMetadata function | Andres Rey
2016-12-24 | Added function to clean Style tags and refactored the _clean function to trav... | Andres Rey
2016-12-23 | Apply fixes from StyleCI | Andres Rey