Convert HTML parser to handle 8-bit resources without converting to UChar*
https://bugs.webkit.org/show_bug.cgi?id=90321
Reviewed by Adam Barth.
Source/WebCore:
No new tests. No new functionality, so this is covered by existing tests.
Changed the parsing of HTML to handle 8-bit strings without converting the
whole string to 16 bits. The primary change was to use the appropriate
character pointer type (LChar* or UChar*) depending on the source string.
This access is abstracted in SegmentedSubstring by the getCurrentChar{8,16} and
incrementAndGetCurrentChar{8,16} methods. In SegmentedString, the advance()
and advanceAndUpdateLineNumber() methods have been refactored into a pair of
function pointers managed by a state machine. The candidate functions factor
out not only the 8-bit versus 16-bit distinction but also the
"exclude line numbers" logic for advanceAndUpdateLineNumber().
This change also improves on http://trac.webkit.org/changeset/123679.
The most prominent improvement is inlining the 8-bit paths for both advance()
and advanceAndUpdateLineNumber().
This provides a net speed-up even with the added 8/16-bit test.
Other changes involve using String methods to access and compare with the
source data.
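
The function-pointer state machine described above can be sketched roughly as
follows. This is a simplified illustration, not the code in this patch: the
CharCursor class, its member names, and the drain() helper are hypothetical,
and the sketch assumes a single substring with no pushed-back characters and
no line-number tracking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for SegmentedString: one substring, no pushed chars.
class CharCursor {
public:
    CharCursor(const uint8_t* data, size_t length)
        : m_length(length), m_is8Bit(true)
    {
        m_data.ptr8 = data;
        selectAdvance();
    }

    CharCursor(const char16_t* data, size_t length)
        : m_length(length), m_is8Bit(false)
    {
        m_data.ptr16 = data;
        selectAdvance();
    }

    bool isEmpty() const { return !m_length; }
    char16_t currentChar() const { return m_current; }

    // Dispatch through whichever advance flavor the current state selected.
    void advance() { (this->*m_advance)(); }

private:
    using AdvanceFunc = void (CharCursor::*)();

    // Pick the advance function that matches the width and remaining length.
    void selectAdvance()
    {
        if (!m_length) {
            m_current = 0;
            m_advance = &CharCursor::advanceEmpty;
            return;
        }
        m_current = m_is8Bit ? *m_data.ptr8 : *m_data.ptr16;
        if (m_length == 1)
            m_advance = &CharCursor::advanceLast;
        else
            m_advance = m_is8Bit ? &CharCursor::advance8 : &CharCursor::advance16;
    }

    // Fast path for 8-bit data: valid only while more than one char remains.
    void advance8()
    {
        assert(m_length > 1);
        m_current = *++m_data.ptr8;
        if (--m_length == 1)
            m_advance = &CharCursor::advanceLast;
    }

    // Fast path for 16-bit data, same contract as advance8().
    void advance16()
    {
        assert(m_length > 1);
        m_current = *++m_data.ptr16;
        if (--m_length == 1)
            m_advance = &CharCursor::advanceLast;
    }

    // Slow path: consumes the final character without reading past the end.
    void advanceLast()
    {
        assert(m_length == 1);
        m_length = 0;
        m_current = 0;
        m_advance = &CharCursor::advanceEmpty;
    }

    // Terminal state: input exhausted, advancing is a no-op.
    void advanceEmpty() { m_current = 0; }

    union {
        const uint8_t* ptr8;
        const char16_t* ptr16;
    } m_data;
    size_t m_length;
    bool m_is8Bit;
    char16_t m_current = 0;
    AdvanceFunc m_advance = &CharCursor::advanceEmpty;
};

// Reads every remaining character, widening 8-bit data to 16-bit on the fly.
std::u16string drain(CharCursor& cursor)
{
    std::u16string out;
    while (!cursor.isEmpty()) {
        out.push_back(cursor.currentChar());
        cursor.advance();
    }
    return out;
}
```

In this sketch the width test happens only when the state changes, so the
per-character fast paths never branch on 8-bit versus 16-bit data.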
* html/FTPDirectoryDocument.cpp:
(WebCore::FTPDirectoryDocumentParser::append): Changed to use currentChar()
instead of SegmentedString '*' operator.
* html/parser/HTMLEntityParser.cpp: Changed to use currentChar() instead of
SegmentedString '*' operator.
* html/parser/HTMLParserIdioms.cpp:
(WebCore::stripLeadingAndTrailingHTMLSpaces): Added 8-bit fast path.
* html/parser/HTMLSourceTracker.cpp:
(WebCore::HTMLSourceTracker::sourceForToken): Changed to use currentChar() instead of
SegmentedString '*' operator.
* loader/cache/CachedScript.cpp:
(WebCore::CachedScript::script): Updated to use new String::sizeInBytes() method.
* platform/text/SegmentedString.cpp:
(WebCore::SegmentedString::SegmentedString):
(WebCore::SegmentedString::operator=):
(WebCore::SegmentedString::clear):
(WebCore::SegmentedString::append):
(WebCore::SegmentedString::prepend):
(WebCore::SegmentedString::advanceSubstring):
(WebCore::SegmentedString::advance):
(WebCore::SegmentedString::advance8): Fast path advance function for 8 bit data.
(WebCore::SegmentedString::advance16): Fast path advance function for 16 bit data.
(WebCore::SegmentedString::advanceAndUpdateLineNumber8): Fast path
advanceAndUpdateLineNumber for 8 bit data when we need to update the line number.
(WebCore::SegmentedString::advanceAndUpdateLineNumber16): Fast path
advanceAndUpdateLineNumber for 16 bit data when we need to update the line number.
(WebCore::SegmentedString::advanceSlowCase): Slow case advance function for both
8 and 16 bit advance.
(WebCore::SegmentedString::advanceAndUpdateLineNumberSlowCase): Slow case
advanceAndUpdateLineNumber function for both 8 and 16 bit data.
(WebCore::SegmentedString::advanceEmpty): Advance function when input has been exhausted.
(WebCore::SegmentedString::updateSlowCaseFunctionPointers): Set advance function
pointers to slow case functions.
* platform/text/SegmentedString.h:
(WebCore::SegmentedSubstring::SegmentedSubstring):
(WebCore::SegmentedSubstring::clear):
(SegmentedSubstring):
(WebCore::SegmentedSubstring::is8Bit): New getter.
(WebCore::SegmentedSubstring::appendTo):
(WebCore::SegmentedSubstring::getCurrentChar8): New getter for the 8 bit case.
(WebCore::SegmentedSubstring::getCurrentChar16): New getter for the 16 bit case.
(WebCore::SegmentedSubstring::incrementAndGetCurrentChar8): New wrapper function
that pre-increments the 8 bit pointer and returns the next character.
(WebCore::SegmentedSubstring::incrementAndGetCurrentChar16): New wrapper function
that pre-increments the 16 bit pointer and returns the next character.
(WebCore::SegmentedSubstring::currentSubString):
(WebCore::SegmentedSubstring::getCurrentChar):
(WebCore::SegmentedSubstring::incrementAndGetCurrentChar):
(WebCore::SegmentedString::SegmentedString):
(WebCore::SegmentedString::push):
(WebCore::SegmentedString::isEmpty):
(WebCore::SegmentedString::lookAhead):
(WebCore::SegmentedString::lookAheadIgnoringCase):
(WebCore::SegmentedString::advance): Wrapper that contains 8 bit fast path
or calls via member function pointer to appropriate advance() flavor.
(WebCore::SegmentedString::advanceAndUpdateLineNumber): Wrapper that
contains 8 bit fast path or calls via member function pointer to appropriate
advanceAndUpdateLineNumber() flavor.
(WebCore::SegmentedString::advanceAndASSERT):
(WebCore::SegmentedString::advanceAndASSERTIgnoringCase):
(WebCore::SegmentedString::advancePastNonNewline):
(WebCore::SegmentedString::advancePastNewlineAndUpdateLineNumber):
(WebCore::SegmentedString::currentChar):
(WebCore::SegmentedString::decrementAndCheckLength): Decrement substring length and
change to slow case functions when only one character is left.
(WebCore::SegmentedString::updateAdvanceFunctionPointers): Select appropriate
advance functions based on current substring.
(WebCore::SegmentedString::lookAheadInline): Changed to use String::startsWith().
(WebCore::SegmentedString::lookAheadSlowCase):
* xml/parser/CharacterReferenceParserInlineMethods.h:
(WebCore::consumeCharacterReference): Changed to use currentChar() instead of
SegmentedString '*' operator.
* xml/parser/MarkupTokenizerBase.h:
(WebCore::MarkupTokenizerBase::InputStreamPreprocessor::nextInputCharacter):
(WebCore::MarkupTokenizerBase::InputStreamPreprocessor::peek): Changed to use
currentChar() instead of SegmentedString '*' operator.
(WebCore::MarkupTokenizerBase::InputStreamPreprocessor::advance):
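
The 8-bit fast path listed above for stripLeadingAndTrailingHTMLSpaces relies
on one templated routine instantiated for both character widths, with a single
width check at the entry point. A minimal sketch of that pattern, using
standard library strings instead of WTF::String, might look like this. The
Text struct and the stripHTMLSpaces() and strip() names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// HTML space characters: space, tab, line feed, form feed, carriage return.
template <typename CharType>
bool isHTMLSpace(CharType c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

// One implementation, instantiated once per character width.
template <typename CharType>
std::basic_string<CharType> stripHTMLSpaces(const std::basic_string<CharType>& input)
{
    size_t begin = 0;
    size_t end = input.size();
    while (begin < end && isHTMLSpace(input[begin]))
        ++begin;
    while (end > begin && isHTMLSpace(input[end - 1]))
        --end;
    return input.substr(begin, end - begin);
}

// Hypothetical string that holds either 8-bit or 16-bit data, never both.
struct Text {
    bool is8Bit;
    std::string narrow;
    std::u16string wide;
};

// Entry point: test the width once, then run the matching instantiation.
Text strip(const Text& text)
{
    if (text.is8Bit)
        return { true, stripHTMLSpaces(text.narrow), {} };
    return { false, {}, stripHTMLSpaces(text.wide) };
}
```

The 8-bit input is never widened here, which is the point of the fast path.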
Source/WTF:
* wtf/text/WTFString.h:
(WTF::String::sizeInBytes): New method to return the number of bytes
the string requires. This replaces a hardcoded calculation based on
length and sizeof(UChar).
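
The byte-size calculation described above (added in the diff as
String::sizeInBytes()) can be illustrated with a small stand-alone sketch.
The StringInfo struct is hypothetical, and the LChar and UChar aliases are
stand-ins for the WTF types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using LChar = uint8_t;   // stand-in for WTF's 8-bit Latin-1 code unit
using UChar = char16_t;  // stand-in for WTF's 16-bit UTF-16 code unit

// Hypothetical minimal descriptor; the real WTF::String stores far more.
struct StringInfo {
    size_t length;
    bool is8Bit;
};

// Bytes needed for the character data, based on the actual storage width.
size_t sizeInBytes(const StringInfo& s)
{
    return s.length * (s.is8Bit ? sizeof(LChar) : sizeof(UChar));
}
```

With the old length * sizeof(UChar) formula, an 8-bit string would be reported
at twice its actual storage size.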
git-svn-id: http://svn.webkit.org/repository/webkit/trunk@124679 268f45cc-cd09-0410-ab3c-d52691b4dbfc
diff --git a/Source/WTF/ChangeLog b/Source/WTF/ChangeLog
index 0113eb1..3805100 100644
--- a/Source/WTF/ChangeLog
+++ b/Source/WTF/ChangeLog
@@ -1,3 +1,15 @@
+2012-08-03 Michael Saboff <msaboff@apple.com>
+
+ Convert HTML parser to handle 8-bit resources without converting to UChar*
+ https://bugs.webkit.org/show_bug.cgi?id=90321
+
+ Reviewed by Adam Barth.
+
+ * wtf/text/WTFString.h:
+ (WTF::String::sizeInBytes): New method to return the number of bytes
+ the string requires. This replaces a hardcoded calculation based on
+ length and sizeof(UChar).
+
2012-08-03 Benjamin Poulain <benjamin@webkit.org>
StringImpl created from literal should be BufferInternal
diff --git a/Source/WTF/wtf/text/WTFString.h b/Source/WTF/wtf/text/WTFString.h
index 3e09a20..e949843 100644
--- a/Source/WTF/wtf/text/WTFString.h
+++ b/Source/WTF/wtf/text/WTFString.h
@@ -190,6 +190,13 @@
bool is8Bit() const { return m_impl->is8Bit(); }
+ unsigned sizeInBytes() const
+ {
+ if (!m_impl)
+ return 0;
+ return m_impl->length() * (is8Bit() ? sizeof(LChar) : sizeof(UChar));
+ }
+
WTF_EXPORT_STRING_API CString ascii() const;
WTF_EXPORT_STRING_API CString latin1() const;
WTF_EXPORT_STRING_API CString utf8(bool strict = false) const;
diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
index 3bed2f4..714c9f6 100644
--- a/Source/WebCore/ChangeLog
+++ b/Source/WebCore/ChangeLog
@@ -1,3 +1,141 @@
+2012-08-03 Michael Saboff <msaboff@apple.com>
+
+ Convert HTML parser to handle 8-bit resources without converting to UChar*
+ https://bugs.webkit.org/show_bug.cgi?id=90321
+
+ Reviewed by Adam Barth.
+
+ No new tests. No new functionality, so this is covered by existing tests.
+
+ Changed the parsing of HTML to handle 8-bit strings without converting the
+ whole string to 16 bits. The primary change was to use the appropriate
+ character pointer type (LChar* or UChar*) depending on the source string.
+ This access is abstracted in SegmentedSubstring by the getCurrentChar{8,16} and
+ incrementAndGetCurrentChar{8,16} methods. In SegmentedString, the advance()
+ and advanceAndUpdateLineNumber() methods have been refactored into a pair of
+ function pointers managed by a state machine. The candidate functions factor
+ out not only the 8-bit versus 16-bit distinction but also the
+ "exclude line numbers" logic for advanceAndUpdateLineNumber().
+ This change also has improvements over http://trac.webkit.org/changeset/123679.
+ The most prominent change was to inline the 8 bit paths for both advance()
+ and advanceAndUpdateLineNumber().
+ This provides a net speed-up even with the 8/16 bit testing.
+ Other changes involve using String methods to access and compare with the
+ source data.
+
+ * html/FTPDirectoryDocument.cpp:
+ (WebCore::FTPDirectoryDocumentParser::append): Changed to use currentChar()
+ instead of SegmentedString '*' operator.
+ * html/parser/HTMLEntityParser.cpp: Changed to use currentChar() instead of
+ SegmentedString '*' operator.
+ * html/parser/HTMLParserIdioms.cpp:
+ (WebCore::stripLeadingAndTrailingHTMLSpaces): Added 8-bit fast path.
+ * html/parser/HTMLSourceTracker.cpp:
+ (WebCore::HTMLSourceTracker::sourceForToken): Changed to use currentChar() instead of
+ SegmentedString '*' operator.
+ * loader/cache/CachedScript.cpp:
+ (WebCore::CachedScript::script): Updated to use new String::sizeInBytes() method.
+ * platform/text/SegmentedString.cpp:
+ (WebCore::SegmentedString::SegmentedString):
+ (WebCore::SegmentedString::operator=):
+ (WebCore::SegmentedString::clear):
+ (WebCore::SegmentedString::append):
+ (WebCore::SegmentedString::prepend):
+ (WebCore::SegmentedString::advanceSubstring):
+ (WebCore::SegmentedString::advance):
+ (WebCore::SegmentedString::advance8): Fast path advance function for 8 bit data.
+ (WebCore::SegmentedString::advance16): Fast path advance function for 16 bit data.
+ (WebCore::SegmentedString::advanceAndUpdateLineNumber8): Fast path
+ advanceAndUpdateLineNumber for 8 bit data when we need to update the line number.
+ (WebCore::SegmentedString::advanceAndUpdateLineNumber16): Fast path
+ advanceAndUpdateLineNumber for 16 bit data when we need to update the line number.
+ (WebCore::SegmentedString::advanceSlowCase): Slow case advance function for both
+ 8 and 16 bit advance.
+ (WebCore::SegmentedString::advanceAndUpdateLineNumberSlowCase): Slow case
+ advanceAndUpdateLineNumber function for both 8 and 16 bit data.
+ (WebCore::SegmentedString::advanceEmpty): Advance function when input has been exhausted.
+ (WebCore::SegmentedString::updateSlowCaseFunctionPointers): Set advance function
+ pointers to slow case functions.
+ * platform/text/SegmentedString.h:
+ (WebCore::SegmentedSubstring::SegmentedSubstring):
+ (WebCore::SegmentedSubstring::clear):
+ (SegmentedSubstring):
+ (WebCore::SegmentedSubstring::is8Bit): New getter.
+ (WebCore::SegmentedSubstring::appendTo):
+ (WebCore::SegmentedSubstring::getCurrentChar8): New getter for the 8 bit case.
+ (WebCore::SegmentedSubstring::getCurrentChar16): New getter for the 16 bit case.
+ (WebCore::SegmentedSubstring::incrementAndGetCurrentChar8): New wrapper function
+ that pre-increments the 8 bit pointer and returns the next character.
+ (WebCore::SegmentedSubstring::incrementAndGetCurrentChar16): New wrapper function
+ that pre-increments the 16 bit pointer and returns the next character.
+ (WebCore::SegmentedSubstring::currentSubString):
+ (WebCore::SegmentedSubstring::getCurrentChar):
+ (WebCore::SegmentedSubstring::incrementAndGetCurrentChar):
+ (WebCore::SegmentedString::SegmentedString):
+ (WebCore::SegmentedString::push):
+ (WebCore::SegmentedString::isEmpty):
+ (WebCore::SegmentedString::lookAhead):
+ (WebCore::SegmentedString::lookAheadIgnoringCase):
+ (WebCore::SegmentedString::advance): Wrapper that contains 8 bit fast path
+ or calls via member function pointer to appropriate advance() flavor.
+ (WebCore::SegmentedString::advanceAndUpdateLineNumber): Wrapper that
+ contains 8 bit fast path or calls via member function pointer to appropriate
+ advanceAndUpdateLineNumber() flavor.
+ (WebCore::SegmentedString::advanceAndASSERT):
+ (WebCore::SegmentedString::advanceAndASSERTIgnoringCase):
+ (WebCore::SegmentedString::advancePastNonNewline):
+ (WebCore::SegmentedString::advancePastNewlineAndUpdateLineNumber):
+ (WebCore::SegmentedString::currentChar):
+ (WebCore::SegmentedString::decrementAndCheckLength): Decrement substring length and
+ change to slow case functions when only one character is left.
+ (WebCore::SegmentedString::updateAdvanceFunctionPointers): Select appropriate
+ advance functions based on current substring.
+ (WebCore::SegmentedString::lookAheadInline): Changed to use String::startsWith().
+ (WebCore::SegmentedString::lookAheadSlowCase):
+ * xml/parser/CharacterReferenceParserInlineMethods.h:
+ (WebCore::consumeCharacterReference): Changed to use currentChar() instead of
+ SegmentedString '*' operator.
+ * xml/parser/MarkupTokenizerBase.h:
+ (WebCore::MarkupTokenizerBase::InputStreamPreprocessor::nextInputCharacter):
+ (WebCore::MarkupTokenizerBase::InputStreamPreprocessor::peek): Changed to use
+ currentChar() instead of SegmentedString '*' operator.
+ (WebCore::MarkupTokenizerBase::InputStreamPreprocessor::advance):
+
+2012-08-03 Arnaud Renevier <a.renevier@sisa.samsung.com>
+
+ Update TypedArrays to throw RangeError or similar instead of INDEX_SIZE_ERR
+ https://bugs.webkit.org/show_bug.cgi?id=45118
+
+ Reviewed by Kenneth Russell.
+
+ Update TypedArrays to raise JavaScript RangeError instead of dom
+ INDEX_SIZE_ERR exceptions. Also, update TypedArrays to raise TypeError
+ instead of JavaScript SyntaxError or dom exceptions SYNTAX_ERR when
+ calling set method with invalid arguments.
+
+ Specification does not define the type of exceptions to raise, but
+ other browsers raise JavaScript errors, so those changes will improve
+ compatibility.
+
+ New Test: fast/canvas/webgl/array-set-invalid-arguments.html
+
+ Updated Tests expectations:
+ fast/canvas/webgl/array-set-out-of-bounds.html
+ fast/canvas/webgl/data-view-crash.html
+ fast/canvas/webgl/data-view-test.html
+
+ * bindings/js/JSArrayBufferViewHelper.h:
+ (WebCore):
+ (WebCore::setWebGLArrayWithTypedArrayArgument):
+ (WebCore::setWebGLArrayHelper):
+ (WebCore::constructArrayBufferViewWithTypedArrayArgument):
+ (WebCore::constructArrayBufferViewWithArrayBufferArgument):
+ (WebCore::constructArrayBufferView):
+ * bindings/v8/custom/V8ArrayBufferViewCustom.h:
+ (WebCore):
+ (WebCore::constructWebGLArrayWithArrayBufferArgument):
+ (WebCore::setWebGLArrayHelper):
+
2012-08-03 Dave Tu <dtu@chromium.org>
[chromium] Fix Renderer4.CompositorThreadImplDrawDelay histogram.
diff --git a/Source/WebCore/html/FTPDirectoryDocument.cpp b/Source/WebCore/html/FTPDirectoryDocument.cpp
index 5016df7..8fefc20 100644
--- a/Source/WebCore/html/FTPDirectoryDocument.cpp
+++ b/Source/WebCore/html/FTPDirectoryDocument.cpp
@@ -366,7 +366,7 @@
m_dest = m_buffer;
SegmentedString str = source;
while (!str.isEmpty()) {
- UChar c = *str;
+ UChar c = str.currentChar();
if (c == '\r') {
*m_dest++ = '\n';
diff --git a/Source/WebCore/html/parser/HTMLEntityParser.cpp b/Source/WebCore/html/parser/HTMLEntityParser.cpp
index 442cedb..1871805 100644
--- a/Source/WebCore/html/parser/HTMLEntityParser.cpp
+++ b/Source/WebCore/html/parser/HTMLEntityParser.cpp
@@ -87,7 +87,7 @@
StringBuilder consumedCharacters;
HTMLEntitySearch entitySearch;
while (!source.isEmpty()) {
- cc = *source;
+ cc = source.currentChar();
entitySearch.advance(cc);
if (!entitySearch.isEntityPrefix())
break;
@@ -114,13 +114,13 @@
const int length = entitySearch.mostRecentMatch()->length;
const UChar* reference = entitySearch.mostRecentMatch()->entity;
for (int i = 0; i < length; ++i) {
- cc = *source;
+ cc = source.currentChar();
ASSERT_UNUSED(reference, cc == *reference++);
consumedCharacters.append(cc);
source.advanceAndASSERT(cc);
ASSERT(!source.isEmpty());
}
- cc = *source;
+ cc = source.currentChar();
}
if (entitySearch.mostRecentMatch()->lastCharacter() == ';'
|| !additionalAllowedCharacter
diff --git a/Source/WebCore/html/parser/HTMLInputStream.h b/Source/WebCore/html/parser/HTMLInputStream.h
index da6932f..593290b 100644
--- a/Source/WebCore/html/parser/HTMLInputStream.h
+++ b/Source/WebCore/html/parser/HTMLInputStream.h
@@ -74,7 +74,7 @@
{
// FIXME: This should use InputStreamPreprocessor::endOfFileMarker
// once InputStreamPreprocessor is split off into its own header.
- static const UChar endOfFileMarker = 0;
+ static const LChar endOfFileMarker = 0;
m_last->append(SegmentedString(String(&endOfFileMarker, 1)));
m_last->close();
}
diff --git a/Source/WebCore/html/parser/HTMLParserIdioms.cpp b/Source/WebCore/html/parser/HTMLParserIdioms.cpp
index c3becc3..8f057b1 100644
--- a/Source/WebCore/html/parser/HTMLParserIdioms.cpp
+++ b/Source/WebCore/html/parser/HTMLParserIdioms.cpp
@@ -34,13 +34,13 @@
namespace WebCore {
-String stripLeadingAndTrailingHTMLSpaces(const String& string)
+template <typename CharType>
+static String stripLeadingAndTrailingHTMLSpaces(String string, CharType characters, unsigned length)
{
- const UChar* characters = string.characters();
- unsigned length = string.length();
+ unsigned numLeadingSpaces = 0;
+ unsigned numTrailingSpaces = 0;
- unsigned numLeadingSpaces;
- for (numLeadingSpaces = 0; numLeadingSpaces < length; ++numLeadingSpaces) {
+ for (; numLeadingSpaces < length; ++numLeadingSpaces) {
if (isNotHTMLSpace(characters[numLeadingSpaces]))
break;
}
@@ -48,17 +48,32 @@
if (numLeadingSpaces == length)
return string.isNull() ? string : emptyAtom.string();
- unsigned numTrailingSpaces;
- for (numTrailingSpaces = 0; numTrailingSpaces < length; ++numTrailingSpaces) {
+ for (; numTrailingSpaces < length; ++numTrailingSpaces) {
if (isNotHTMLSpace(characters[length - numTrailingSpaces - 1]))
break;
}
ASSERT(numLeadingSpaces + numTrailingSpaces < length);
+ if (!(numLeadingSpaces | numTrailingSpaces))
+ return string;
+
return string.substring(numLeadingSpaces, length - (numLeadingSpaces + numTrailingSpaces));
}
+String stripLeadingAndTrailingHTMLSpaces(const String& string)
+{
+ unsigned length = string.length();
+
+ if (!length)
+ return string.isNull() ? string : emptyAtom.string();
+
+ if (string.is8Bit())
+ return stripLeadingAndTrailingHTMLSpaces(string, string.characters8(), length);
+
+ return stripLeadingAndTrailingHTMLSpaces(string, string.characters(), length);
+}
+
String serializeForNumberType(const Decimal& number)
{
if (number.isZero()) {
diff --git a/Source/WebCore/html/parser/HTMLSourceTracker.cpp b/Source/WebCore/html/parser/HTMLSourceTracker.cpp
index e7c6876..b042dd6 100644
--- a/Source/WebCore/html/parser/HTMLSourceTracker.cpp
+++ b/Source/WebCore/html/parser/HTMLSourceTracker.cpp
@@ -71,12 +71,12 @@
size_t i = 0;
for ( ; i < length && !m_previousSource.isEmpty(); ++i) {
- source.append(*m_previousSource);
+ source.append(m_previousSource.currentChar());
m_previousSource.advance();
}
for ( ; i < length; ++i) {
ASSERT(!m_currentSource.isEmpty());
- source.append(*m_currentSource);
+ source.append(m_currentSource.currentChar());
m_currentSource.advance();
}
diff --git a/Source/WebCore/loader/cache/CachedScript.cpp b/Source/WebCore/loader/cache/CachedScript.cpp
index a2da744..086fa1b 100644
--- a/Source/WebCore/loader/cache/CachedScript.cpp
+++ b/Source/WebCore/loader/cache/CachedScript.cpp
@@ -71,7 +71,7 @@
if (!m_script && m_data) {
m_script = m_decoder->decode(m_data->data(), encodedSize());
m_script += m_decoder->flush();
- setDecodedSize(m_script.length() * sizeof(UChar));
+ setDecodedSize(m_script.sizeInBytes());
}
m_decodedDataDeletionTimer.startOneShot(0);
diff --git a/Source/WebCore/platform/text/SegmentedString.cpp b/Source/WebCore/platform/text/SegmentedString.cpp
index 63373ba..9edb83c 100644
--- a/Source/WebCore/platform/text/SegmentedString.cpp
+++ b/Source/WebCore/platform/text/SegmentedString.cpp
@@ -28,13 +28,17 @@
, m_currentString(other.m_currentString)
, m_substrings(other.m_substrings)
, m_closed(other.m_closed)
+ , m_empty(other.m_empty)
+ , m_fastPathFlags(other.m_fastPathFlags)
+ , m_advanceFunc(other.m_advanceFunc)
+ , m_advanceAndUpdateLineNumberFunc(other.m_advanceAndUpdateLineNumberFunc)
{
- if (other.m_currentChar == &other.m_pushedChar1)
- m_currentChar = &m_pushedChar1;
- else if (other.m_currentChar == &other.m_pushedChar2)
- m_currentChar = &m_pushedChar2;
+ if (m_pushedChar2)
+ m_currentChar = m_pushedChar2;
+ else if (m_pushedChar1)
+ m_currentChar = m_pushedChar1;
else
- m_currentChar = other.m_currentChar;
+ m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
}
const SegmentedString& SegmentedString::operator=(const SegmentedString& other)
@@ -43,17 +47,23 @@
m_pushedChar2 = other.m_pushedChar2;
m_currentString = other.m_currentString;
m_substrings = other.m_substrings;
- if (other.m_currentChar == &other.m_pushedChar1)
- m_currentChar = &m_pushedChar1;
- else if (other.m_currentChar == &other.m_pushedChar2)
- m_currentChar = &m_pushedChar2;
+ if (m_pushedChar2)
+ m_currentChar = m_pushedChar2;
+ else if (m_pushedChar1)
+ m_currentChar = m_pushedChar1;
else
- m_currentChar = other.m_currentChar;
+ m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
+
m_closed = other.m_closed;
+ m_empty = other.m_empty;
+ m_fastPathFlags = other.m_fastPathFlags;
m_numberOfCharactersConsumedPriorToCurrentString = other.m_numberOfCharactersConsumedPriorToCurrentString;
m_numberOfCharactersConsumedPriorToCurrentLine = other.m_numberOfCharactersConsumedPriorToCurrentLine;
m_currentLine = other.m_currentLine;
+ m_advanceFunc = other.m_advanceFunc;
+ m_advanceAndUpdateLineNumberFunc = other.m_advanceAndUpdateLineNumberFunc;
+
return *this;
}
@@ -96,6 +106,10 @@
m_currentLine = 0;
m_substrings.clear();
m_closed = false;
+ m_empty = true;
+ m_fastPathFlags = NoFastPath;
+ m_advanceFunc = &SegmentedString::advanceEmpty;
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
}
void SegmentedString::append(const SegmentedSubstring& s)
@@ -107,8 +121,10 @@
if (!m_currentString.m_length) {
m_numberOfCharactersConsumedPriorToCurrentString += m_currentString.numberOfCharactersConsumed();
m_currentString = s;
+ updateAdvanceFunctionPointers();
} else
m_substrings.append(s);
+ m_empty = false;
}
void SegmentedString::prepend(const SegmentedSubstring& s)
@@ -125,13 +141,16 @@
// cases in the future.
m_numberOfCharactersConsumedPriorToCurrentString += m_currentString.numberOfCharactersConsumed();
m_numberOfCharactersConsumedPriorToCurrentString -= s.m_length;
- if (!m_currentString.m_length)
+ if (!m_currentString.m_length) {
m_currentString = s;
- else {
+ updateAdvanceFunctionPointers();
+ } else {
// Shift our m_currentString into our list.
m_substrings.prepend(m_currentString);
m_currentString = s;
+ updateAdvanceFunctionPointers();
}
+ m_empty = false;
}
void SegmentedString::close()
@@ -152,7 +171,7 @@
for (; it != e; ++it)
append(*it);
}
- m_currentChar = m_pushedChar1 ? &m_pushedChar1 : m_currentString.m_current;
+ m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
}
void SegmentedString::prepend(const SegmentedString& s)
@@ -166,7 +185,7 @@
prepend(*it);
}
prepend(s.m_currentString);
- m_currentChar = m_pushedChar1 ? &m_pushedChar1 : m_currentString.m_current;
+ m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
}
void SegmentedString::advanceSubstring()
@@ -178,8 +197,14 @@
// string, we now account for those characters as part of the current
// string, not as part of "prior to current string."
m_numberOfCharactersConsumedPriorToCurrentString -= m_currentString.numberOfCharactersConsumed();
- } else
+ updateAdvanceFunctionPointers();
+ } else {
m_currentString.clear();
+ m_empty = true;
+ m_fastPathFlags = NoFastPath;
+ m_advanceFunc = &SegmentedString::advanceEmpty;
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
+ }
}
String SegmentedString::toString() const
@@ -204,22 +229,72 @@
{
ASSERT(count <= length());
for (unsigned i = 0; i < count; ++i) {
- consumedCharacters[i] = *current();
+ consumedCharacters[i] = currentChar();
advance();
}
}
+void SegmentedString::advance8()
+{
+ ASSERT(!m_pushedChar1);
+ decrementAndCheckLength();
+ m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+}
+
+void SegmentedString::advance16()
+{
+ ASSERT(!m_pushedChar1);
+ decrementAndCheckLength();
+ m_currentChar = m_currentString.incrementAndGetCurrentChar16();
+}
+
+void SegmentedString::advanceAndUpdateLineNumber8()
+{
+ ASSERT(!m_pushedChar1);
+ ASSERT(m_currentString.getCurrentChar() == m_currentChar);
+ if (m_currentChar == '\n') {
+ ++m_currentLine;
+ m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
+ }
+ decrementAndCheckLength();
+ m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+}
+
+void SegmentedString::advanceAndUpdateLineNumber16()
+{
+ ASSERT(!m_pushedChar1);
+ ASSERT(m_currentString.getCurrentChar() == m_currentChar);
+ if (m_currentChar == '\n') {
+ ++m_currentLine;
+ m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
+ }
+ decrementAndCheckLength();
+ m_currentChar = m_currentString.incrementAndGetCurrentChar16();
+}
+
void SegmentedString::advanceSlowCase()
{
if (m_pushedChar1) {
m_pushedChar1 = m_pushedChar2;
m_pushedChar2 = 0;
- } else if (m_currentString.m_current) {
- ++m_currentString.m_current;
+
+ if (m_pushedChar1) {
+ m_currentChar = m_pushedChar1;
+ return;
+ }
+
+ updateAdvanceFunctionPointers();
+ } else if (m_currentString.m_length) {
if (--m_currentString.m_length == 0)
advanceSubstring();
+ } else if (!isComposite()) {
+ m_currentString.clear();
+ m_empty = true;
+ m_fastPathFlags = NoFastPath;
+ m_advanceFunc = &SegmentedString::advanceEmpty;
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
}
- m_currentChar = m_pushedChar1 ? &m_pushedChar1 : m_currentString.m_current;
+ m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
}
void SegmentedString::advanceAndUpdateLineNumberSlowCase()
@@ -227,16 +302,45 @@
if (m_pushedChar1) {
m_pushedChar1 = m_pushedChar2;
m_pushedChar2 = 0;
- } else if (m_currentString.m_current) {
- if (*m_currentString.m_current++ == '\n' && m_currentString.doNotExcludeLineNumbers()) {
+
+ if (m_pushedChar1) {
+ m_currentChar = m_pushedChar1;
+ return;
+ }
+
+ updateAdvanceFunctionPointers();
+ } else if (m_currentString.m_length) {
+ if (m_currentString.getCurrentChar() == '\n' && m_currentString.doNotExcludeLineNumbers()) {
++m_currentLine;
// Plus 1 because numberOfCharactersConsumed value hasn't incremented yet; it does with m_length decrement below.
m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
}
if (--m_currentString.m_length == 0)
advanceSubstring();
+ else
+ m_currentString.incrementAndGetCurrentChar(); // Only need the ++
+ } else if (!isComposite()) {
+ m_currentString.clear();
+ m_empty = true;
+ m_fastPathFlags = NoFastPath;
+ m_advanceFunc = &SegmentedString::advanceEmpty;
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
}
- m_currentChar = m_pushedChar1 ? &m_pushedChar1 : m_currentString.m_current;
+
+ m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
+}
+
+void SegmentedString::advanceEmpty()
+{
+ ASSERT(!m_currentString.m_length && !isComposite());
+ m_currentChar = 0;
+}
+
+void SegmentedString::updateSlowCaseFunctionPointers()
+{
+ m_fastPathFlags = NoFastPath;
+ m_advanceFunc = &SegmentedString::advanceSlowCase;
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceAndUpdateLineNumberSlowCase;
}
OrdinalNumber SegmentedString::currentLine() const
diff --git a/Source/WebCore/platform/text/SegmentedString.h b/Source/WebCore/platform/text/SegmentedString.h
index 777a65e..0a8ecb2 100644
--- a/Source/WebCore/platform/text/SegmentedString.h
+++ b/Source/WebCore/platform/text/SegmentedString.h
@@ -33,20 +33,32 @@
public:
SegmentedSubstring()
: m_length(0)
- , m_current(0)
, m_doNotExcludeLineNumbers(true)
+ , m_is8Bit(false)
{
+ m_data.string16Ptr = 0;
}
SegmentedSubstring(const String& str)
: m_length(str.length())
- , m_current(str.isEmpty() ? 0 : str.characters())
- , m_string(str)
, m_doNotExcludeLineNumbers(true)
+ , m_string(str)
{
+ if (m_length) {
+ if (m_string.is8Bit()) {
+ m_is8Bit = true;
+ m_data.string8Ptr = m_string.characters8();
+ } else {
+ m_is8Bit = false;
+ m_data.string16Ptr = m_string.characters16();
+ }
+ } else
+ m_is8Bit = false;
}
- void clear() { m_length = 0; m_current = 0; }
+ void clear() { m_length = 0; m_data.string16Ptr = 0; m_is8Bit = false;}
+
+ bool is8Bit() { return m_is8Bit; }
bool excludeLineNumbers() const { return !m_doNotExcludeLineNumbers; }
bool doNotExcludeLineNumbers() const { return m_doNotExcludeLineNumbers; }
@@ -57,19 +69,70 @@
void appendTo(StringBuilder& builder) const
{
- if (m_string.characters() == m_current)
- builder.append(m_string);
- else
- builder.append(String(m_current, m_length));
+ int offset = m_string.length() - m_length;
+
+ if (!offset) {
+ if (m_length)
+ builder.append(m_string);
+ } else
+ builder.append(m_string.substring(offset, m_length));
+ }
+
+ UChar getCurrentChar8()
+ {
+ return *m_data.string8Ptr;
+ }
+
+ UChar getCurrentChar16()
+ {
+ return m_data.string16Ptr ? *m_data.string16Ptr : 0;
+ }
+
+ UChar incrementAndGetCurrentChar8()
+ {
+ ASSERT(m_data.string8Ptr);
+ return *++m_data.string8Ptr;
+ }
+
+ UChar incrementAndGetCurrentChar16()
+ {
+ ASSERT(m_data.string16Ptr);
+ return *++m_data.string16Ptr;
+ }
+
+ String currentSubString(unsigned length)
+ {
+ int offset = m_string.length() - m_length;
+ return m_string.substring(offset, length);
+ }
+
+ ALWAYS_INLINE UChar getCurrentChar()
+ {
+ ASSERT(m_length);
+ if (is8Bit())
+ return getCurrentChar8();
+ return getCurrentChar16();
+ }
+
+ ALWAYS_INLINE UChar incrementAndGetCurrentChar()
+ {
+ ASSERT(m_length);
+ if (is8Bit())
+ return incrementAndGetCurrentChar8();
+ return incrementAndGetCurrentChar16();
}
public:
+ union {
+ const LChar* string8Ptr;
+ const UChar* string16Ptr;
+ } m_data;
int m_length;
- const UChar* m_current;
private:
- String m_string;
bool m_doNotExcludeLineNumbers;
+ bool m_is8Bit;
+ String m_string;
};
class SegmentedString {
@@ -82,6 +145,10 @@
, m_numberOfCharactersConsumedPriorToCurrentLine(0)
, m_currentLine(0)
, m_closed(false)
+ , m_empty(true)
+ , m_fastPathFlags(NoFastPath)
+ , m_advanceFunc(&SegmentedString::advanceEmpty)
+ , m_advanceAndUpdateLineNumberFunc(&SegmentedString::advanceEmpty)
{
}
@@ -89,12 +156,17 @@
: m_pushedChar1(0)
, m_pushedChar2(0)
, m_currentString(str)
- , m_currentChar(m_currentString.m_current)
+ , m_currentChar(0)
, m_numberOfCharactersConsumedPriorToCurrentString(0)
, m_numberOfCharactersConsumedPriorToCurrentLine(0)
, m_currentLine(0)
, m_closed(false)
+ , m_empty(!str.length())
+ , m_fastPathFlags(NoFastPath)
{
+ if (m_currentString.m_length)
+ m_currentChar = m_currentString.getCurrentChar();
+ updateAdvanceFunctionPointers();
}
SegmentedString(const SegmentedString&);
@@ -114,14 +186,15 @@
{
if (!m_pushedChar1) {
m_pushedChar1 = c;
- m_currentChar = m_pushedChar1 ? &m_pushedChar1 : m_currentString.m_current;
+ m_currentChar = m_pushedChar1 ? m_pushedChar1 : m_currentString.getCurrentChar();
+ updateSlowCaseFunctionPointers();
} else {
ASSERT(!m_pushedChar2);
m_pushedChar2 = c;
}
}
- bool isEmpty() const { return !current(); }
+ bool isEmpty() const { return m_empty; }
unsigned length() const;
bool isClosed() const { return m_closed; }
@@ -132,66 +205,82 @@
NotEnoughCharacters,
};
- LookAheadResult lookAhead(const String& string) { return lookAheadInline<SegmentedString::equalsLiterally>(string); }
- LookAheadResult lookAheadIgnoringCase(const String& string) { return lookAheadInline<SegmentedString::equalsIgnoringCase>(string); }
+ LookAheadResult lookAhead(const String& string) { return lookAheadInline(string, true); }
+ LookAheadResult lookAheadIgnoringCase(const String& string) { return lookAheadInline(string, false); }
void advance()
{
- if (!m_pushedChar1 && m_currentString.m_length > 1) {
- --m_currentString.m_length;
- m_currentChar = ++m_currentString.m_current;
+ if (m_fastPathFlags & Use8BitAdvance) {
+ ASSERT(!m_pushedChar1);
+ bool haveOneCharacterLeft = (--m_currentString.m_length == 1);
+ m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+
+ if (!haveOneCharacterLeft)
+ return;
+
+ updateSlowCaseFunctionPointers();
+
return;
}
- advanceSlowCase();
+
+ (this->*m_advanceFunc)();
+ }
+
+ inline void advanceAndUpdateLineNumber()
+ {
+ if (m_fastPathFlags & Use8BitAdvance) {
+ ASSERT(!m_pushedChar1);
+
+ bool haveNewLine = (m_currentChar == '\n') & !!(m_fastPathFlags & Use8BitAdvanceAndUpdateLineNumbers);
+ bool haveOneCharacterLeft = (--m_currentString.m_length == 1);
+
+ m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+
+ if (!(haveNewLine | haveOneCharacterLeft))
+ return;
+
+ if (haveNewLine) {
+ ++m_currentLine;
+ m_numberOfCharactersConsumedPriorToCurrentLine = m_numberOfCharactersConsumedPriorToCurrentString + m_currentString.numberOfCharactersConsumed();
+ }
+
+ if (haveOneCharacterLeft)
+ updateSlowCaseFunctionPointers();
+
+ return;
+ }
+
+ (this->*m_advanceAndUpdateLineNumberFunc)();
}
void advanceAndASSERT(UChar expectedCharacter)
{
- ASSERT_UNUSED(expectedCharacter, *current() == expectedCharacter);
+ ASSERT_UNUSED(expectedCharacter, currentChar() == expectedCharacter);
advance();
}
void advanceAndASSERTIgnoringCase(UChar expectedCharacter)
{
- ASSERT_UNUSED(expectedCharacter, WTF::Unicode::foldCase(*current()) == WTF::Unicode::foldCase(expectedCharacter));
+ ASSERT_UNUSED(expectedCharacter, WTF::Unicode::foldCase(currentChar()) == WTF::Unicode::foldCase(expectedCharacter));
+ advance();
+ }
+
+ void advancePastNonNewline()
+ {
+ ASSERT(currentChar() != '\n');
advance();
}
void advancePastNewlineAndUpdateLineNumber()
{
- ASSERT(*current() == '\n');
+ ASSERT(currentChar() == '\n');
if (!m_pushedChar1 && m_currentString.m_length > 1) {
int newLineFlag = m_currentString.doNotExcludeLineNumbers();
m_currentLine += newLineFlag;
if (newLineFlag)
m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
- --m_currentString.m_length;
- m_currentChar = ++m_currentString.m_current;
- return;
- }
- advanceAndUpdateLineNumberSlowCase();
- }
-
- void advancePastNonNewline()
- {
- ASSERT(*current() != '\n');
- if (!m_pushedChar1 && m_currentString.m_length > 1) {
- --m_currentString.m_length;
- m_currentChar = ++m_currentString.m_current;
- return;
- }
- advanceSlowCase();
- }
-
- void advanceAndUpdateLineNumber()
- {
- if (!m_pushedChar1 && m_currentString.m_length > 1) {
- int newLineFlag = (*m_currentString.m_current == '\n') & m_currentString.doNotExcludeLineNumbers();
- m_currentLine += newLineFlag;
- if (newLineFlag)
- m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
- --m_currentString.m_length;
- m_currentChar = ++m_currentString.m_current;
+ decrementAndCheckLength();
+ m_currentChar = m_currentString.incrementAndGetCurrentChar();
return;
}
advanceAndUpdateLineNumberSlowCase();
@@ -216,9 +305,7 @@
String toString() const;
- const UChar& operator*() const { return *current(); }
- const UChar* operator->() const { return current(); }
-
+ UChar currentChar() const { return m_currentChar; }
// The method is moderately slow, comparing to currentLine method.
OrdinalNumber currentColumn() const;
@@ -228,30 +315,77 @@
void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength);
private:
+ enum FastPathFlags {
+ NoFastPath = 0,
+ Use8BitAdvanceAndUpdateLineNumbers = 1 << 0,
+ Use8BitAdvance = 1 << 1,
+ };
+
void append(const SegmentedSubstring&);
void prepend(const SegmentedSubstring&);
+ void advance8();
+ void advance16();
+ void advanceAndUpdateLineNumber8();
+ void advanceAndUpdateLineNumber16();
void advanceSlowCase();
void advanceAndUpdateLineNumberSlowCase();
+ void advanceEmpty();
void advanceSubstring();
- const UChar* current() const { return m_currentChar; }
+
+ void updateSlowCaseFunctionPointers();
- static bool equalsLiterally(const UChar* str1, const UChar* str2, size_t count) { return !memcmp(str1, str2, count * sizeof(UChar)); }
- static bool equalsIgnoringCase(const UChar* str1, const UChar* str2, size_t count) { return !WTF::Unicode::umemcasecmp(str1, str2, count); }
+ void decrementAndCheckLength()
+ {
+ ASSERT(m_currentString.m_length > 1);
+ if (--m_currentString.m_length == 1)
+ updateSlowCaseFunctionPointers();
+ }
- template<bool equals(const UChar* str1, const UChar* str2, size_t count)>
- inline LookAheadResult lookAheadInline(const String& string)
+ void updateAdvanceFunctionPointers()
+ {
+ if ((m_currentString.m_length > 1) && !m_pushedChar1) {
+ if (m_currentString.is8Bit()) {
+ m_advanceFunc = &SegmentedString::advance8;
+ m_fastPathFlags = Use8BitAdvance;
+ if (m_currentString.doNotExcludeLineNumbers()) {
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceAndUpdateLineNumber8;
+ m_fastPathFlags |= Use8BitAdvanceAndUpdateLineNumbers;
+ } else
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advance8;
+ return;
+ }
+
+ m_advanceFunc = &SegmentedString::advance16;
+ m_fastPathFlags = NoFastPath;
+ if (m_currentString.doNotExcludeLineNumbers())
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceAndUpdateLineNumber16;
+ else
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advance16;
+ return;
+ }
+
+ if (!m_currentString.m_length && !isComposite()) {
+ m_advanceFunc = &SegmentedString::advanceEmpty;
+ m_fastPathFlags = NoFastPath;
+ m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
+ }
+
+ updateSlowCaseFunctionPointers();
+ }
+
+ inline LookAheadResult lookAheadInline(const String& string, bool caseSensitive)
{
if (!m_pushedChar1 && string.length() <= static_cast<unsigned>(m_currentString.m_length)) {
- if (equals(string.characters(), m_currentString.m_current, string.length()))
+ String currentSubstring = m_currentString.currentSubString(string.length());
+ if (currentSubstring.startsWith(string, caseSensitive))
return DidMatch;
return DidNotMatch;
}
- return lookAheadSlowCase<equals>(string);
+ return lookAheadSlowCase(string, caseSensitive);
}
-
- template<bool equals(const UChar* str1, const UChar* str2, size_t count)>
- LookAheadResult lookAheadSlowCase(const String& string)
+
+ LookAheadResult lookAheadSlowCase(const String& string, bool caseSensitive)
{
unsigned count = string.length();
if (count > length())
@@ -260,7 +394,7 @@
String consumedString = String::createUninitialized(count, consumedCharacters);
advance(count, consumedCharacters);
LookAheadResult result = DidNotMatch;
- if (equals(string.characters(), consumedCharacters, count))
+ if (consumedString.startsWith(string, caseSensitive))
result = DidMatch;
prepend(SegmentedString(consumedString));
return result;
@@ -271,12 +405,16 @@
UChar m_pushedChar1;
UChar m_pushedChar2;
SegmentedSubstring m_currentString;
- const UChar* m_currentChar;
+ UChar m_currentChar;
int m_numberOfCharactersConsumedPriorToCurrentString;
int m_numberOfCharactersConsumedPriorToCurrentLine;
int m_currentLine;
Deque<SegmentedSubstring> m_substrings;
bool m_closed;
+ bool m_empty;
+ unsigned char m_fastPathFlags;
+ void (SegmentedString::*m_advanceFunc)();
+ void (SegmentedString::*m_advanceAndUpdateLineNumberFunc)();
};
}
diff --git a/Source/WebCore/xml/parser/CharacterReferenceParserInlineMethods.h b/Source/WebCore/xml/parser/CharacterReferenceParserInlineMethods.h
index d83cecb..92df636 100644
--- a/Source/WebCore/xml/parser/CharacterReferenceParserInlineMethods.h
+++ b/Source/WebCore/xml/parser/CharacterReferenceParserInlineMethods.h
@@ -68,7 +68,7 @@
StringBuilder consumedCharacters;
while (!source.isEmpty()) {
- UChar cc = *source;
+ UChar cc = source.currentChar();
switch (entityState) {
case Initial: {
if (cc == '\x09' || cc == '\x0A' || cc == '\x0C' || cc == ' ' || cc == '<' || cc == '&')
diff --git a/Source/WebCore/xml/parser/MarkupTokenizerBase.h b/Source/WebCore/xml/parser/MarkupTokenizerBase.h
index 0a3cada..93a0ed5 100644
--- a/Source/WebCore/xml/parser/MarkupTokenizerBase.h
+++ b/Source/WebCore/xml/parser/MarkupTokenizerBase.h
@@ -66,7 +66,7 @@
{
}
- UChar nextInputCharacter() const { return m_nextInputCharacter; }
+ ALWAYS_INLINE UChar nextInputCharacter() const { return m_nextInputCharacter; }
// Returns whether we succeeded in peeking at the next character.
// The only way we can fail to peek is if there are no more
@@ -74,7 +74,7 @@
ALWAYS_INLINE bool peek(SegmentedString& source)
{
PeekAgain:
- m_nextInputCharacter = *source;
+ m_nextInputCharacter = source.currentChar();
// Every branch in this function is expensive, so we have a
// fast-reject branch for characters that don't require special
@@ -91,7 +91,7 @@
source.advancePastNewlineAndUpdateLineNumber();
if (source.isEmpty())
return false;
- m_nextInputCharacter = *source;
+ m_nextInputCharacter = source.currentChar();
}
if (m_nextInputCharacter == '\r') {
m_nextInputCharacter = '\n';
@@ -116,7 +116,7 @@
}
// Returns whether there are more characters in |source| after advancing.
- bool advance(SegmentedString& source)
+ ALWAYS_INLINE bool advance(SegmentedString& source)
{
source.advanceAndUpdateLineNumber();
if (source.isEmpty())