path: root/src/HTML5/Parser/Tokenizer.php
Age | Commit message | Author
2020-08-24 | fix: character entity parsing | Kieran Brahney
2020-02-06 | prevent infinite loop on unterminated entity declaration at end of stream | Asmir Mustafic
2018-11-27 | Optimize the processing of text between nodes | Christophe Coevoet
Instead of processing the text token one by one in the main loop, it is now processed in batch until the next special token (< and & which have special handling in the main loop and NUL characters which need to report a parse error).
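The batching idea in this commit message can be sketched as follows. This is a minimal illustration in Python, not the library's PHP code; the function name `consume_text` and the regular-expression approach are assumptions made for the example.

```python
import re
from typing import Tuple

# Characters that end a run of plain text: '<' and '&' get special handling
# in the main loop, and NUL has to be reported as a parse error.
_SPECIAL = re.compile(r"[<&\x00]")


def consume_text(data: str, pos: int) -> Tuple[str, int]:
    """Return the plain-text run starting at pos and the index where it stops.

    Hypothetical helper: the whole run is taken in one step instead of
    looping over it one character at a time.
    """
    match = _SPECIAL.search(data, pos)
    end = match.start() if match else len(data)
    return data[pos:end], end
```

The caller would emit the returned run as a single text token and resume its main loop at the returned index, where the special character (if any) is handled as before.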
2018-11-27 | Merge pull request #161 from stof/optimize_main_loop | Asmir Mustafic
Optimize main loop
2018-11-26 | Optimize the main loop | Christophe Coevoet
2018-11-26 | Merge pull request #155 from stof/optimize_attributes | Asmir Mustafic
Optimize the parsing of unquoted attributes
2018-11-26 | Merge pull request #154 from stof/optimize_token_comparison | Asmir Mustafic
Optimize the token comparison
2018-11-26 | Remove useless condition for the parsing of cdata | Christophe Coevoet
The caller already ensures that the current token is the right one.
2018-11-26 | Simplify the doctype matching | Christophe Coevoet
- the doctype() function is only called for a D or d token, so there is no need to check again inside the method
- checking that we have the DOCTYPE string can use a sequence matching
2018-11-26 | Optimize the handling of the EOF detection in the main loop | Christophe Coevoet
The eof() method is a no-op when the token is not false. As the main loop already needs to identify that case anyway, skipping the method call reduces the cost of parsing text tokens.
2018-11-26 | Optimize the parsing of unquoted attributes | Christophe Coevoet
2018-11-26 | Optimize the token comparison | Christophe Coevoet
Tokens are always a single char. Using strspn to find whether they belong to a fixed list is slower than comparing them directly.
2018-11-26 | Replace next calls with consume calls when the return value is ignored | Christophe Coevoet
2018-11-25 | Normalize PHPDoc comments | Titouan Galopin
2018-11-25 | Fix coding style | Titouan Galopin
2018-11-24 | Optimize consuming whitespaces | Christophe Coevoet
Places consuming whitespaces don't care about the matched substring. They either need its length, or nothing. Returning only the length directly avoids computing the substring.
2018-11-24 | Merge pull request #152 from stof/fix_typo | Asmir Mustafic
Fix typo in an error message
2018-11-24 | Fix typo in an error message | Christophe Coevoet
2018-11-24 | Optimize the handling of references when consuming data | Christophe Coevoet
2018-11-08 | move sequenceMatches to the Scanner | Asmir Mustafic
2018-11-06 | Remove another current call | Titouan Galopin
2018-11-05 | Inline tag open in Tokenizer to further improve performances | Titouan Galopin
2018-11-05 | Improve Tokenizer performance by inlining text parsing and removing some Scanner::current calls | Titouan Galopin
2018-11-02 | Add more extensions on composer.json, improve phpdocs and remove dead code | Titouan Galopin
2017-12-04 | #136: Respect self-closing tags only on foreign elements | Albert Peschar
2017-09-01 | Merge pull request #134 from Masterminds/ampersand-in-urls | Asmir Mustafic
Raw & in attributes
2017-08-31 | reduce number of times "current" is invoked | Asmir Mustafic
2017-08-28 | Fixes https://github.com/Masterminds/html5-php/issues/124 | Asmir Mustafic
Reference: https://www.w3.org/TR/html52/syntax.html#character-reference-state If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, switch to the character reference end state. If the last character matched is not a U+003B SEMICOLON character (;), this is a parse error.
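The quoted rule can be read as a small predicate. The sketch below is an illustration in Python, not the library's PHP code; the function name `keep_reference_literal` and its parameters are hypothetical, and it covers only the "historical reasons" condition, not the separate parse-error reporting.

```python
def keep_reference_literal(in_attribute: bool, matched: str, next_char: str) -> bool:
    """Return True when a matched character reference must stay as literal text.

    in_attribute: the reference was consumed inside an attribute value
    matched:      the text matched so far, e.g. "&copy" or "&copy;"
    next_char:    the next input character ("" at end of input)
    """
    # Outside attribute values, or when the reference ends with ';',
    # the historical exception does not apply.
    if not in_attribute or matched.endswith(";"):
        return False
    # Inside an attribute, a reference without ';' that is followed by '='
    # or an ASCII alphanumeric is not decoded.
    return next_char == "=" or (next_char.isascii() and next_char.isalnum())
```

For an attribute such as `href="?a=1&copy=2"`, the match `&copy` is followed by `=`, so the predicate returns True and the ampersand sequence is kept as written rather than decoded to ©.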
2017-07-26 | Fix https://github.com/Masterminds/html5-php/issues/131 | Asmir Mustafic
2016-08-17 | In XML mode, tags are case sensitive | Asmir Mustafic
Fixes #114
2015-06-22 | doctype method has no arguments. Fixing. | Matt Farina
2015-03-08 | Closes #78: Fixes bug where unmatched entity like string drops everything after &. | Matt Farina
2015-02-02 | Allow whitespaces in RCDATA end tags | Zhaofeng Li
Fixes #75
Signed-off-by: Zhaofeng Li <[email protected]>
2014-12-17 | Merge pull request #64 from goetas/i63 | Asmir Mustafic
Case insensitive tags
2014-12-14 | Case insensitive comparison only for html5 tags | Asmir Mustafic
2014-12-01 | Added support for dashes in element tag names (closes #65) | Asmir Mustafic
2014-11-24 | Case insensitive tags | Asmir Mustafic
fixes #63
2014-08-01 | Closes #56 | Asmir Mustafic
2014-06-17 | PSR-2 formatting | Asmir Mustafic
2014-06-11 | PSR-2 code style | Asmir Mustafic
2014-06-11 | PSR-0 vendor namespace | Asmir Mustafic
2014-05-25 | Parse RCDATA the right way | KITAITI Makoto
2014-04-16 | Don't throw an exception for invalid tag names | Mišo Belica
2014-02-21 | Ignore attributes with illegal chars in name (fixes #23) | Mišo Belica
This is necessary because the method "DOMElement::setAttribute" throws an exception for invalid names, so DOM elements can't contain these attributes.
2014-02-11 | Merge pull request #28 from miso-belica/fix-infinite-cycle | Matt Butcher
Fixed infinite loop for char "&" in unquoted attribute
2014-02-11 | Merge branch 'master' of github.com:Masterminds/html5-php | Matt Butcher
2014-02-11 | Fix for #25: Handle missing tag close in attribute list. | Matt Butcher
2014-02-11 | Fixed infinite loop for char "&" in unquoted attribute | Mišo Belica
2014-02-11 | Removed trailing whitespace | Mišo Belica
2014-02-07 | #26: Updated the case handling for tags to allow for uppercase tags and normalizing tag names to lowercase (per 8.2.4.9) except for SVG foreign tags that are case sensitive. | Matt Farina