Refactored script content removal in the fragment parser for clarity and speed
https://bugs.webkit.org/show_bug.cgi?id=112734

Reviewed by Enrica Casucci.

Source/WebCore: 

* WebCore.exp.in: Export!

* dom/DocumentFragment.cpp:
(WebCore::DocumentFragment::parseHTML):
(WebCore::DocumentFragment::parseXML):
* dom/DocumentFragment.h:
(DocumentFragment): Updated for rename of FragmentScriptingPermission to
ParserContentPolicy.

* dom/Element.cpp:
(WebCore::isEventHandlerAttribute):
(WebCore::Element::isJavaScriptURLAttribute):
(WebCore::Element::isJavaScriptAttribute): Fixed a FIXME by factoring
out some helper functions that reuse isURLAttribute(). This makes our
attribute removal slightly more precise, as reflected in test changes.

(WebCore::Element::stripJavaScriptAttributes): Factored this function out
of parserSetAttributes to clarify that the parser is responsible for
fully removing scripts before inserting anything into the DOM.

Now that this is a helper function, we can avoid doing any work in the
common case, where script content is allowed. Also, when we do have to
strip attributes, we use "two finger compaction" to avoid copying the
vector, and to avoid O(N) vector removal operations when there is
something to remove.

(WebCore::Element::parserSetAttributes):
* dom/Element.h:

* dom/FragmentScriptingPermission.h:
(WebCore::scriptingContentIsAllowed):
(WebCore::disallowScriptingContent):
(WebCore::pluginContentIsAllowed):
(WebCore::allowPluginContent): Renamed for clarity, and added some helper
functions for reading values out of this enum.

* dom/ScriptableDocumentParser.cpp:
(WebCore::ScriptableDocumentParser::ScriptableDocumentParser): Moved
a settings check into the parser constructor so we're sure that all
clients get the right behavior.

* dom/ScriptableDocumentParser.h:
(WebCore::ScriptableDocumentParser::parserContentPolicy):
(ScriptableDocumentParser):
* editing/markup.cpp:
(WebCore::createFragmentFromMarkup):
(WebCore::createFragmentFromMarkupWithContext):
(WebCore::createFragmentForInnerOuterHTML):
(WebCore::createContextualFragment):
* editing/markup.h: Updated for renames.

* html/HTMLAnchorElement.cpp:
(WebCore::HTMLAnchorElement::isURLAttribute): Fixed a bug where
isURLAttribute would ignore href attributes in other namespaces, like
xlink:href.

* html/HTMLBaseElement.cpp:
(WebCore::HTMLBaseElement::isURLAttribute): Same bug.

* html/HTMLElement.cpp:
(WebCore::HTMLElement::isURLAttribute): Fixed a logic error where HTMLElement
wouldn't call through to its base class.

* html/HTMLLinkElement.cpp:
(WebCore::HTMLLinkElement::isURLAttribute): Same isURLAttribute namespace
bug as above.

* html/parser/HTMLConstructionSite.cpp:
(WebCore::setAttributes): Helper function for optionally stripping
disallowed attributes before setting them on an element. This helps all
clients get the right behavior without sprinkling checks everywhere.

(WebCore::HTMLConstructionSite::HTMLConstructionSite):
(WebCore::HTMLConstructionSite::insertHTMLHtmlStartTagBeforeHTML):

(WebCore::HTMLConstructionSite::insertScriptElement): Don't schedule the
element for insertion into the document if the element is forbidden. This
is slightly clearer than leaving an empty forbidden element in the document.

(WebCore::HTMLConstructionSite::createElement):
(WebCore::HTMLConstructionSite::createHTMLElement):
* html/parser/HTMLConstructionSite.h:
(HTMLConstructionSite):
(WebCore::HTMLConstructionSite::parserContentPolicy):
* html/parser/HTMLDocumentParser.cpp:
(WebCore::HTMLDocumentParser::HTMLDocumentParser):
(WebCore::HTMLDocumentParser::runScriptsForPausedTreeBuilder):
(WebCore::HTMLDocumentParser::parseDocumentFragment):
* html/parser/HTMLDocumentParser.h:
(HTMLDocumentParser):
(WebCore::HTMLDocumentParser::create):
* html/parser/HTMLTreeBuilder.cpp:
(WebCore::HTMLTreeBuilder::HTMLTreeBuilder):
(WebCore::HTMLTreeBuilder::FragmentParsingContext::FragmentParsingContext):
Updated for renames and interface changes above.

(WebCore::HTMLTreeBuilder::processStartTagForInBody):
(WebCore::HTMLTreeBuilder::processEndTag): Removed isParsingFragment()
checks to make it possible to use ParserContentPolicy in more places.

Removed call to removeChildren() because, if an element is forbidden,
we fully remove the element now. This brings behavior for <script>
elements in line with behavior for plug-in elements. It also brings
behavior of the HTML parser in line with behavior of the XML parser.

* html/parser/HTMLTreeBuilder.h:
(WebCore::HTMLTreeBuilder::create):
(FragmentParsingContext):
(WebCore::HTMLTreeBuilder::FragmentParsingContext::contextElement):
* platform/blackberry/PasteboardBlackBerry.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/chromium/DragDataChromium.cpp:
(WebCore::DragData::asFragment):
* platform/chromium/PasteboardChromium.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/gtk/PasteboardGtk.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/mac/PasteboardMac.mm:
(WebCore::Pasteboard::documentFragment):
* platform/qt/DragDataQt.cpp:
(WebCore::DragData::asFragment):
* platform/qt/PasteboardQt.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/win/ClipboardUtilitiesWin.cpp:
(WebCore::fragmentFromCFHTML):
(WebCore::fragmentFromHTML):
* platform/wx/PasteboardWx.cpp:
(WebCore::Pasteboard::documentFragment): Updated for renames and interface
changes.

* svg/SVGAElement.cpp:
(WebCore::SVGAElement::isURLAttribute): Fixed a bug where SVG anchor
elements didn't identify their URL attributes.

* svg/SVGAElement.h:
(SVGAElement):

* xml/XMLErrors.cpp:
(WebCore::createXHTMLParserErrorHeader):
(WebCore::XMLErrors::insertErrorMessageBlock): No need to disallow
scripting attributes here because we're creating the attributes 
ourselves and we know they're not scripting attributes.

* xml/parser/XMLDocumentParser.cpp:
(WebCore::XMLDocumentParser::parseDocumentFragment):
* xml/parser/XMLDocumentParser.h:
(WebCore::XMLDocumentParser::create):
(XMLDocumentParser): Updated for renames and interface changes above.

Removed the 8 inline capacity in the attribute vector so we could share
helper functions with the HTML parser, which didn't have it.

* xml/parser/XMLDocumentParserLibxml2.cpp:
(WebCore::setAttributes):
(WebCore):
(WebCore::XMLDocumentParser::XMLDocumentParser):
(WebCore::handleNamespaceAttributes):
(WebCore::handleElementAttributes):
(WebCore::XMLDocumentParser::startElementNs):
(WebCore::XMLDocumentParser::endElementNs):
* xml/parser/XMLDocumentParserQt.cpp:
(WebCore::setAttributes):
(WebCore):
(WebCore::XMLDocumentParser::XMLDocumentParser):
(WebCore::handleNamespaceAttributes):
(WebCore::handleElementAttributes):
(WebCore::XMLDocumentParser::parseStartElement):
(WebCore::XMLDocumentParser::parseEndElement): Same changes as for the
HTML parser.

LayoutTests: 

Updated tests to improve coverage and reflect behavior tweaks to improve
clarity.

* editing/pasteboard/paste-noscript-expected.txt: 
    - The "href", "source", and "action" attributes are fully removed now,
    instead of being set to the empty string, because for clarity we
    fully remove script attributes instead of setting their values to
    the empty string.

    - The "formaction" attribute on the form control is not removed because,
    even though it seems to contain javascript content, the formaction
    attribute doesn't map to anything on a form element, and won't ever
    run as script.

    - I added a button with a "formaction" attribute, to verify that it
    does get stripped, since this is the case where the "formaction"
    attribute can run as script.

* editing/pasteboard/paste-noscript-svg-expected.txt:
    - The "xlink:href" attribute is fully removed now. See above.

* editing/pasteboard/paste-noscript-xhtml-expected.txt:
* editing/pasteboard/paste-noscript.html:
    - The "href", "source", and "action" attributes are fully removed now.
    See above.

    - The <script> element is fully removed now. See above.

    - The "formaction" attribute on the form control is not removed.
    See above.

    - I added a button with a "formaction" attribute. See above.


* editing/pasteboard/paste-visible-script-expected.txt:
    - The <script> elements are fully removed now. See above.

* editing/pasteboard/resources/paste-noscript-content.html:
    - The "formaction" attribute on the form control is not removed.
    See above.

    - I added a button with a "formaction" attribute. See above.


git-svn-id: http://svn.webkit.org/repository/webkit/trunk@146264 268f45cc-cd09-0410-ab3c-d52691b4dbfc
diff --git a/Source/WebCore/dom/DocumentFragment.h b/Source/WebCore/dom/DocumentFragment.h
index 1aca4cc..d23c482 100644
--- a/Source/WebCore/dom/DocumentFragment.h
+++ b/Source/WebCore/dom/DocumentFragment.h
@@ -33,8 +33,8 @@
 public:
     static PassRefPtr<DocumentFragment> create(Document*);
 
-    void parseHTML(const String&, Element* contextElement, FragmentScriptingPermission = AllowScriptingContent);
-    bool parseXML(const String&, Element* contextElement, FragmentScriptingPermission = AllowScriptingContent);
+    void parseHTML(const String&, Element* contextElement, ParserContentPolicy = AllowScriptingContent);
+    bool parseXML(const String&, Element* contextElement, ParserContentPolicy = AllowScriptingContent);
     
     virtual bool canContainRangeEndPoint() const { return true; }
     virtual bool isTemplateContent() const { return false; }