Article headlines are split across multiple lines after translating tagesschau.de
https://bugs.webkit.org/show_bug.cgi?id=214652
<rdar://problem/63995571>
Reviewed by Tim Horton.
Source/WebCore:
Treat block-level heading elements (`h1` through `h6`) as item boundaries. On tagesschau.de, we currently vend
the main heading and sub-heading as separate tokens in the same item. Because the machine translation framework
treats the combined main heading and sub-heading text as a single sentence to be translated, it sometimes tries
to rearrange and split tokens.
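As a rough illustration of the rule described above, the sketch below models the boundary check outside of WebKit.
`FakeElement`, `isHeadingTag`, and `isItemBoundary` are hypothetical names introduced only for this example, and the
local `DisplayType` enum is a stand-in, not the real WebCore type. The actual change lives in
`WebCore::isEnclosingItemBoundaryElement` and uses `Element::hasTagName`.

```cpp
#include <array>
#include <string_view>

// Hypothetical stand-ins for WebCore's DisplayType and Element (assumptions for illustration only).
enum class DisplayType { Inline, InlineBlock, Block };

struct FakeElement {
    std::string_view tagName;
    DisplayType display;
};

// Returns true for the six HTML heading tags, h1 through h6.
bool isHeadingTag(std::string_view tag)
{
    static constexpr std::array<std::string_view, 6> headings { "h1", "h2", "h3", "h4", "h5", "h6" };
    for (auto heading : headings) {
        if (tag == heading)
            return true;
    }
    return false;
}

// Simplified model of the two checks visible in the patch: inline-block spans
// and block-level headings each start a new item.
bool isItemBoundary(const FakeElement& element)
{
    if (element.tagName == "span" && element.display == DisplayType::InlineBlock)
        return true;

    if (element.display == DisplayType::Block && isHeadingTag(element.tagName))
        return true;

    return false;
}
```

With this rule, a block-level `h1` and a following block-level `h2` each become their own item, so their text is
no longer handed to the translator as one sentence. A heading styled as inline is not affected by this check.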
Test: TextManipulation.StartTextManipulationExtractsHeadingElementsAsSeparateItems
* editing/TextManipulationController.cpp:
(WebCore::isEnclosingItemBoundaryElement):
Tools:
Adds a new API test (a reduced version of article headlines from the front page of tagesschau.de).
* TestWebKitAPI/Tests/WebKitCocoa/TextManipulation.mm:
git-svn-id: http://svn.webkit.org/repository/webkit/trunk@264729 268f45cc-cd09-0410-ab3c-d52691b4dbfc
diff --git a/Source/WebCore/editing/TextManipulationController.cpp b/Source/WebCore/editing/TextManipulationController.cpp
index 03a9d82..b31dac5 100644
--- a/Source/WebCore/editing/TextManipulationController.cpp
+++ b/Source/WebCore/editing/TextManipulationController.cpp
@@ -316,6 +316,10 @@
     if (element.hasTagName(HTMLNames::spanTag) && displayType == DisplayType::InlineBlock)
         return true;
+    if (displayType == DisplayType::Block && (element.hasTagName(HTMLNames::h1Tag) || element.hasTagName(HTMLNames::h2Tag) || element.hasTagName(HTMLNames::h3Tag)
+        || element.hasTagName(HTMLNames::h4Tag) || element.hasTagName(HTMLNames::h5Tag) || element.hasTagName(HTMLNames::h6Tag)))
+        return true;
+
     return false;
 }