Refactored script content removal in the fragment parser for clarity and speed
https://bugs.webkit.org/show_bug.cgi?id=112734

Reviewed by Enrica Casucci.

Source/WebCore: 

* WebCore.exp.in: Export!

* dom/DocumentFragment.cpp:
(WebCore::DocumentFragment::parseHTML):
(WebCore::DocumentFragment::parseXML):
* dom/DocumentFragment.h:
(DocumentFragment): Updated for rename of FragmentScriptingPermission to
ParserContentPolicy.

* dom/Element.cpp:
(WebCore::isEventHandlerAttribute):
(WebCore::Element::isJavaScriptURLAttribute):
(WebCore::Element::isJavaScriptAttribute): Fixed a FIXME by factoring
out some helper functions that reuse isURLAttribute(). This makes our
attribute removal slightly more precise, as reflected in test changes.

(WebCore::Element::stripJavaScriptAttributes): Factored this function out
of parserSetAttributes to clarify that the parser is responsible for
fully removing scripts before inserting anything into the DOM.

Now that this is a helper function, we can avoid doing any work in the
common case, where script content is allowed. Also, when we do have to
strip attributes, we use "two finger compaction" to avoid copying the
vector, and to avoid O(N) vector removal operations when there is
something to remove.

(WebCore::Element::parserSetAttributes):
* dom/Element.h:

* dom/FragmentScriptingPermission.h:
(WebCore::scriptingContentIsAllowed):
(WebCore::disallowScriptingContent):
(WebCore::pluginContentIsAllowed):
(WebCore::allowPluginContent): Renamed for clarity, and added some helper
functions for reading values out of this enum.

* dom/ScriptableDocumentParser.cpp:
(WebCore::ScriptableDocumentParser::ScriptableDocumentParser): Moved
a settings check into the parser constructor so we're sure that all
clients get the right behavior.

* dom/ScriptableDocumentParser.h:
(WebCore::ScriptableDocumentParser::parserContentPolicy):
(ScriptableDocumentParser):
* editing/markup.cpp:
(WebCore::createFragmentFromMarkup):
(WebCore::createFragmentFromMarkupWithContext):
(WebCore::createFragmentForInnerOuterHTML):
(WebCore::createContextualFragment):
* editing/markup.h: Updated for renames.

* html/HTMLAnchorElement.cpp:
(WebCore::HTMLAnchorElement::isURLAttribute): Fixed a bug where
isURLAttribute would ignore href attributes in other namespaces, like
xlink:href.

* html/HTMLBaseElement.cpp:
(WebCore::HTMLBaseElement::isURLAttribute): Same bug.

* html/HTMLElement.cpp:
(WebCore::HTMLElement::isURLAttribute): Fixed a logic error where HTMLElement
wouldn't call through to its base class.

* html/HTMLLinkElement.cpp:
(WebCore::HTMLLinkElement::isURLAttribute): Same isURLAttribute namespace
bug as above.

* html/parser/HTMLConstructionSite.cpp:
(WebCore::setAttributes): Helper function for optionally stripping
disallowed attributes before setting them on an element. This helps all
clients get the right behavior without sprinkling checks everywhere.

(WebCore::HTMLConstructionSite::HTMLConstructionSite):
(WebCore::HTMLConstructionSite::insertHTMLHtmlStartTagBeforeHTML):

(WebCore::HTMLConstructionSite::insertScriptElement): Don't schedule the
element for insertion into the document if the element is forbidden. This
is slightly clearer than leaving an empty forbidden element in the document.

(WebCore::HTMLConstructionSite::createElement):
(WebCore::HTMLConstructionSite::createHTMLElement):
* html/parser/HTMLConstructionSite.h:
(HTMLConstructionSite):
(WebCore::HTMLConstructionSite::parserContentPolicy):
* html/parser/HTMLDocumentParser.cpp:
(WebCore::HTMLDocumentParser::HTMLDocumentParser):
(WebCore::HTMLDocumentParser::runScriptsForPausedTreeBuilder):
(WebCore::HTMLDocumentParser::parseDocumentFragment):
* html/parser/HTMLDocumentParser.h:
(HTMLDocumentParser):
(WebCore::HTMLDocumentParser::create):
* html/parser/HTMLTreeBuilder.cpp:
(WebCore::HTMLTreeBuilder::HTMLTreeBuilder):
(WebCore::HTMLTreeBuilder::FragmentParsingContext::FragmentParsingContext):
Updated for renames and interface changes above.

(WebCore::HTMLTreeBuilder::processStartTagForInBody):
(WebCore::HTMLTreeBuilder::processEndTag): Removed isParsingFragment()
checks to make it possible to use ParserContentPolicy in more places.

Removed call to removeChildren() because, if an element is forbidden,
we fully remove the element now. This brings behavior for <script>
elements in line with behavior for plug-in elements. It also brings
behavior of the HTML parser in line with behavior of the XML parser.

* html/parser/HTMLTreeBuilder.h:
(WebCore::HTMLTreeBuilder::create):
(FragmentParsingContext):
(WebCore::HTMLTreeBuilder::FragmentParsingContext::contextElement):
* platform/blackberry/PasteboardBlackBerry.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/chromium/DragDataChromium.cpp:
(WebCore::DragData::asFragment):
* platform/chromium/PasteboardChromium.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/gtk/PasteboardGtk.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/mac/PasteboardMac.mm:
(WebCore::Pasteboard::documentFragment):
* platform/qt/DragDataQt.cpp:
(WebCore::DragData::asFragment):
* platform/qt/PasteboardQt.cpp:
(WebCore::Pasteboard::documentFragment):
* platform/win/ClipboardUtilitiesWin.cpp:
(WebCore::fragmentFromCFHTML):
(WebCore::fragmentFromHTML):
* platform/wx/PasteboardWx.cpp:
(WebCore::Pasteboard::documentFragment): Updated for renames and interface
changes.

* svg/SVGAElement.cpp:
(WebCore::SVGAElement::isURLAttribute): Fixed a bug where SVG anchor
elements didn't identify their URL attributes.

* svg/SVGAElement.h:
(SVGAElement):

* xml/XMLErrors.cpp:
(WebCore::createXHTMLParserErrorHeader):
(WebCore::XMLErrors::insertErrorMessageBlock): No need to disallow
scripting attributes here because we're creating the attributes 
ourselves and we know they're not scripting attributes.

* xml/parser/XMLDocumentParser.cpp:
(WebCore::XMLDocumentParser::parseDocumentFragment):
* xml/parser/XMLDocumentParser.h:
(WebCore::XMLDocumentParser::create):
(XMLDocumentParser): Updated for renames and interface changes above.

Removed the 8 inline capacity in the attribute vector so we could share
helper functions with the HTML parser, which didn't have it.

* xml/parser/XMLDocumentParserLibxml2.cpp:
(WebCore::setAttributes):
(WebCore):
(WebCore::XMLDocumentParser::XMLDocumentParser):
(WebCore::handleNamespaceAttributes):
(WebCore::handleElementAttributes):
(WebCore::XMLDocumentParser::startElementNs):
(WebCore::XMLDocumentParser::endElementNs):
* xml/parser/XMLDocumentParserQt.cpp:
(WebCore::setAttributes):
(WebCore):
(WebCore::XMLDocumentParser::XMLDocumentParser):
(WebCore::handleNamespaceAttributes):
(WebCore::handleElementAttributes):
(WebCore::XMLDocumentParser::parseStartElement):
(WebCore::XMLDocumentParser::parseEndElement): Same changes as for the
HTML parser.

LayoutTests: 

Updated tests to improve coverage and reflect behavior tweaks to improve
clarity.

* editing/pasteboard/paste-noscript-expected.txt: 
    - The "href", "source", and "action" attributes are fully removed now,
    instead of being set to the empty string, because for clarity we
    fully remove script attributes instead of setting their values to
    the empty string.

    - The "formaction" attribute on the form control is not removed because,
    even though it seems to contain javascript content, the formaction
    attribute doesn't map to anything on a form element, and won't ever
    run as script.

    - I added a button with a "formaction" attribute, to verify that it
    does get stripped, since this is the case where the "formaction"
    attribute can run as script.

* editing/pasteboard/paste-noscript-svg-expected.txt:
    - The "xlink:href" attribute is fully removed now. See above.

* editing/pasteboard/paste-noscript-xhtml-expected.txt:
* editing/pasteboard/paste-noscript.html:
    - The "href", "source", and "action" attributes are fully removed now.
    See above.

    - The <script> element is fully removed now. See above.

    - The "formaction" attribute on the form control is not removed.
    See above.

    - I added a button with a "formaction" attribute. See above.


* editing/pasteboard/paste-visible-script-expected.txt:
    - The <script> elements are fully removed now. See above.

* editing/pasteboard/resources/paste-noscript-content.html:
    - The "formaction" attribute on the form control is not removed.
    See above.

    - I added a button with a "formaction" attribute. See above.


git-svn-id: http://svn.webkit.org/repository/webkit/trunk@146264 268f45cc-cd09-0410-ab3c-d52691b4dbfc
diff --git a/Source/WebCore/dom/DocumentFragment.cpp b/Source/WebCore/dom/DocumentFragment.cpp
index e660248..342d171 100644
--- a/Source/WebCore/dom/DocumentFragment.cpp
+++ b/Source/WebCore/dom/DocumentFragment.cpp
@@ -75,14 +75,14 @@
     return clone.release();
 }
 
-void DocumentFragment::parseHTML(const String& source, Element* contextElement, FragmentScriptingPermission scriptingPermission)
+void DocumentFragment::parseHTML(const String& source, Element* contextElement, ParserContentPolicy parserContentPolicy)
 {
-    HTMLDocumentParser::parseDocumentFragment(source, this, contextElement, scriptingPermission);
+    HTMLDocumentParser::parseDocumentFragment(source, this, contextElement, parserContentPolicy);
 }
 
-bool DocumentFragment::parseXML(const String& source, Element* contextElement, FragmentScriptingPermission scriptingPermission)
+bool DocumentFragment::parseXML(const String& source, Element* contextElement, ParserContentPolicy parserContentPolicy)
 {
-    return XMLDocumentParser::parseDocumentFragment(source, this, contextElement, scriptingPermission);
+    return XMLDocumentParser::parseDocumentFragment(source, this, contextElement, parserContentPolicy);
 }
 
 }
diff --git a/Source/WebCore/dom/DocumentFragment.h b/Source/WebCore/dom/DocumentFragment.h
index 1aca4cc..d23c482 100644
--- a/Source/WebCore/dom/DocumentFragment.h
+++ b/Source/WebCore/dom/DocumentFragment.h
@@ -33,8 +33,8 @@
 public:
     static PassRefPtr<DocumentFragment> create(Document*);
 
-    void parseHTML(const String&, Element* contextElement, FragmentScriptingPermission = AllowScriptingContent);
-    bool parseXML(const String&, Element* contextElement, FragmentScriptingPermission = AllowScriptingContent);
+    void parseHTML(const String&, Element* contextElement, ParserContentPolicy = AllowScriptingContent);
+    bool parseXML(const String&, Element* contextElement, ParserContentPolicy = AllowScriptingContent);
     
     virtual bool canContainRangeEndPoint() const { return true; }
     virtual bool isTemplateContent() const { return false; }
diff --git a/Source/WebCore/dom/Element.cpp b/Source/WebCore/dom/Element.cpp
index 6e89c01..aa7f52d 100644
--- a/Source/WebCore/dom/Element.cpp
+++ b/Source/WebCore/dom/Element.cpp
@@ -1019,57 +1019,61 @@
 // It is a simple solution that has the advantage of not requiring any
 // code or configuration change if a new event handler is defined.
 
-static bool isEventHandlerAttribute(const QualifiedName& name)
+static inline bool isEventHandlerAttribute(const Attribute& attribute)
 {
-    return name.namespaceURI().isNull() && name.localName().startsWith("on");
+    return attribute.name().namespaceURI().isNull() && attribute.name().localName().startsWith("on");
 }
 
-// FIXME: Share code with Element::isURLAttribute.
-static bool isAttributeToRemove(const QualifiedName& name, const AtomicString& value)
+bool Element::isJavaScriptURLAttribute(const Attribute& attribute)
 {
-    return (name.localName() == hrefAttr.localName() || name.localName() == nohrefAttr.localName()
-        || name == srcAttr || name == actionAttr || name == formactionAttr) && protocolIsJavaScript(stripLeadingAndTrailingHTMLSpaces(value));
+    if (!isURLAttribute(attribute))
+        return false;
+    if (!protocolIsJavaScript(stripLeadingAndTrailingHTMLSpaces(attribute.value())))
+        return false;
+    return true;
 }
 
-void Element::parserSetAttributes(const Vector<Attribute>& attributeVector, FragmentScriptingPermission scriptingPermission)
+bool Element::isJavaScriptAttribute(const Attribute& attribute)
+{
+    if (isEventHandlerAttribute(attribute))
+        return true;
+    if (isJavaScriptURLAttribute(attribute))
+        return true;
+    return false;
+}
+
+void Element::stripJavaScriptAttributes(Vector<Attribute>& attributeVector)
+{
+    size_t destination = 0;
+    for (size_t source = 0; source < attributeVector.size(); ++source) {
+        if (isJavaScriptAttribute(attributeVector[source]))
+            continue;
+
+        if (source != destination)
+            attributeVector[destination] = attributeVector[source];
+
+        ++destination;
+    }
+    attributeVector.shrink(destination);
+}
+
+void Element::parserSetAttributes(const Vector<Attribute>& attributeVector)
 {
     ASSERT(!inDocument());
     ASSERT(!parentNode());
-
     ASSERT(!m_elementData);
 
     if (attributeVector.isEmpty())
         return;
 
-    Vector<Attribute> filteredAttributes = attributeVector;
-
-    // If the element is created as result of a paste or drag-n-drop operation
-    // we want to remove all the script and event handlers.
-    if (!scriptingContentIsAllowed(scriptingPermission)) {
-        size_t i = 0;
-        while (i < filteredAttributes.size()) {
-            Attribute& attribute = filteredAttributes[i];
-            if (isEventHandlerAttribute(attribute.name())) {
-                filteredAttributes.remove(i);
-                continue;
-            }
-
-            if (isAttributeToRemove(attribute.name(), attribute.value()))
-                attribute.setValue(emptyAtom);
-            i++;
-        }
-    }
-
     if (document() && document()->sharedObjectPool())
-        m_elementData = document()->sharedObjectPool()->cachedShareableElementDataWithAttributes(filteredAttributes);
+        m_elementData = document()->sharedObjectPool()->cachedShareableElementDataWithAttributes(attributeVector);
     else
-        m_elementData = ShareableElementData::createWithAttributes(filteredAttributes);
+        m_elementData = ShareableElementData::createWithAttributes(attributeVector);
 
-    // Iterate over the set of attributes we already have on the stack in case
-    // attributeChanged mutates m_elementData.
-    // FIXME: Find a way so we don't have to do this.
-    for (unsigned i = 0; i < filteredAttributes.size(); ++i)
-        attributeChanged(filteredAttributes[i].name(), filteredAttributes[i].value());
+    // Use attributeVector instead of m_elementData because attributeChanged might modify m_elementData.
+    for (unsigned i = 0; i < attributeVector.size(); ++i)
+        attributeChanged(attributeVector[i].name(), attributeVector[i].value());
 }
 
 bool Element::hasAttributes() const
diff --git a/Source/WebCore/dom/Element.h b/Source/WebCore/dom/Element.h
index 9ca4a57..baa9d42 100644
--- a/Source/WebCore/dom/Element.h
+++ b/Source/WebCore/dom/Element.h
@@ -378,7 +378,9 @@
     virtual void parseAttribute(const QualifiedName&, const AtomicString&) { }
 
     // Only called by the parser immediately after element construction.
-    void parserSetAttributes(const Vector<Attribute>&, FragmentScriptingPermission);
+    void parserSetAttributes(const Vector<Attribute>&);
+
+    void stripJavaScriptAttributes(Vector<Attribute>&);
 
     const ElementData* elementData() const { return m_elementData.get(); }
     UniqueElementData* ensureUniqueElementData();
@@ -720,6 +722,9 @@
 
     void createRendererIfNeeded();
 
+    bool isJavaScriptAttribute(const Attribute&);
+    bool isJavaScriptURLAttribute(const Attribute&);
+
     RefPtr<ElementData> m_elementData;
 };
     
diff --git a/Source/WebCore/dom/FragmentScriptingPermission.h b/Source/WebCore/dom/FragmentScriptingPermission.h
index 97a71ca..21b3552 100644
--- a/Source/WebCore/dom/FragmentScriptingPermission.h
+++ b/Source/WebCore/dom/FragmentScriptingPermission.h
@@ -23,30 +23,44 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#ifndef FragmentScriptingPermission_h
-#define FragmentScriptingPermission_h
+// FIXME: Move this file to ParserContentPolicy.h.
+
+#ifndef ParserContentPolicy_h
+#define ParserContentPolicy_h
 
 namespace WebCore {
 
-// FIXME: This enum is poorly named.  It is used to remove script contents when
-// generating DocumentFragments for paste in platform/*/Pasteboard.*.
-enum FragmentScriptingPermission {
-    DisallowScriptingAndPluginContentIfNeeded,
+enum ParserContentPolicy {
+    DisallowScriptingAndPluginContent,
     DisallowScriptingContent,
     AllowScriptingContent,
     AllowScriptingContentAndDoNotMarkAlreadyStarted,
 };
 
-static inline bool scriptingContentIsAllowed(FragmentScriptingPermission scriptingPermission) 
+static inline bool scriptingContentIsAllowed(ParserContentPolicy parserContentPolicy) 
 {
-    return scriptingPermission == AllowScriptingContent || scriptingPermission == AllowScriptingContentAndDoNotMarkAlreadyStarted;
+    return parserContentPolicy == AllowScriptingContent || parserContentPolicy == AllowScriptingContentAndDoNotMarkAlreadyStarted;
 }
 
-static inline bool pluginContentIsAllowed(FragmentScriptingPermission scriptingPermission)
+static inline ParserContentPolicy disallowScriptingContent(ParserContentPolicy parserContentPolicy)
 {
-    return scriptingPermission != DisallowScriptingAndPluginContentIfNeeded;
+    if (!scriptingContentIsAllowed(parserContentPolicy))
+        return parserContentPolicy;
+    return DisallowScriptingContent;
+}
+
+static inline bool pluginContentIsAllowed(ParserContentPolicy parserContentPolicy)
+{
+    return parserContentPolicy != DisallowScriptingAndPluginContent;
+}
+
+static inline ParserContentPolicy allowPluginContent(ParserContentPolicy parserContentPolicy)
+{
+    if (pluginContentIsAllowed(parserContentPolicy))
+        return parserContentPolicy;
+    return DisallowScriptingContent;
 }
 
 } // namespace WebCore
 
-#endif // FragmentScriptingPermission_h
+#endif // ParserContentPolicy_h
diff --git a/Source/WebCore/dom/ScriptableDocumentParser.cpp b/Source/WebCore/dom/ScriptableDocumentParser.cpp
index 708dc1d..23bb887 100644
--- a/Source/WebCore/dom/ScriptableDocumentParser.cpp
+++ b/Source/WebCore/dom/ScriptableDocumentParser.cpp
@@ -26,12 +26,20 @@
 #include "config.h"
 #include "ScriptableDocumentParser.h"
 
+#include "Document.h"
+#include "Frame.h"
+#include "ScriptController.h"
+#include "Settings.h"
+
 namespace WebCore {
 
-ScriptableDocumentParser::ScriptableDocumentParser(Document* document)
+ScriptableDocumentParser::ScriptableDocumentParser(Document* document, ParserContentPolicy parserContentPolicy)
     : DecodedDataDocumentParser(document)
     , m_wasCreatedByScript(false)
+    , m_parserContentPolicy(parserContentPolicy)
 {
+    if (!pluginContentIsAllowed(m_parserContentPolicy) && (!document->settings() || document->settings()->unsafePluginPastingEnabled()))
+        m_parserContentPolicy = allowPluginContent(m_parserContentPolicy);
 }
 
 };
diff --git a/Source/WebCore/dom/ScriptableDocumentParser.h b/Source/WebCore/dom/ScriptableDocumentParser.h
index b140cf1..39223ce 100644
--- a/Source/WebCore/dom/ScriptableDocumentParser.h
+++ b/Source/WebCore/dom/ScriptableDocumentParser.h
@@ -27,6 +27,7 @@
 #define ScriptableDocumentParser_h
 
 #include "DecodedDataDocumentParser.h"
+#include "FragmentScriptingPermission.h"
 #include <wtf/text/TextPosition.h>
 
 namespace WebCore {
@@ -50,14 +51,17 @@
     void setWasCreatedByScript(bool wasCreatedByScript) { m_wasCreatedByScript = wasCreatedByScript; }
     bool wasCreatedByScript() const { return m_wasCreatedByScript; }
 
+    ParserContentPolicy parserContentPolicy() { return m_parserContentPolicy; }
+
 protected:
-    explicit ScriptableDocumentParser(Document*);
+    explicit ScriptableDocumentParser(Document*, ParserContentPolicy = AllowScriptingContent);
 
 private:
     virtual ScriptableDocumentParser* asScriptableDocumentParser() { return this; }
 
     // http://www.whatwg.org/specs/web-apps/current-work/#script-created-parser
     bool m_wasCreatedByScript;
+    ParserContentPolicy m_parserContentPolicy;
 };
 
 }