
Strip Zero Width Whitespace from PDFium text strings

When getting text from PDFium, the library does not filter out ZWW
(U+200B ZERO WIDTH SPACE), because it is a valid non-control character.
It is ignorable, though, so the embedder (i.e. Chrome) can choose whether
or not to display this character. Since it should have no visual
representation, including it in the displayed text can lead to odd UI
behavior, such as the length of the text being longer than the number of
characters displayed, or cursor navigation requiring multiple key presses
to move past the ZWW.
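To make the length mismatch concrete, here is a small sketch that compares a UTF-16 string's code-unit count with the number of characters that would actually render. It uses `std::u16string` as a stand-in for Chromium's `base::string16`, and `VisibleLength` is a hypothetical helper written for illustration; it only discounts U+200B and assumes no surrogate pairs or other invisible characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// U+200B ZERO WIDTH SPACE, the character the commit calls ZWW.
constexpr char16_t kZeroWidthWhitespace = 0x200B;

// Hypothetical helper: number of code units that would render visibly.
// Simplifying assumption: U+200B is the only invisible code unit and the
// text contains no surrogate pairs.
std::size_t VisibleLength(const std::u16string& text) {
  const auto zww_count =
      std::count(text.begin(), text.end(), kZeroWidthWhitespace);
  return text.size() - static_cast<std::size_t>(zww_count);
}
```

For `u"ab\u200Bc"`, `size()` is 4 while `VisibleLength` is 3, so a caret that advances one code unit per key press needs four presses to cross what looks like three characters.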

BUG=chromium:743522

Change-Id: I5312a3aad4a752659fb4455853cd1030f0660bd9
Reviewed-on: https://chromium-review.googlesource.com/1210966
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#589271}
Ryan Harrison
2018-09-06 20:22:37 +00:00
committed by Commit Bot
parent cbd64a18f4
commit a7a26d22d4

@@ -12,6 +12,8 @@ namespace chrome_pdf {

 namespace {

+constexpr base::char16 kZeroWidthWhitespace = 0x200B;
+
 void AdjustForBackwardsRange(int* index, int* count) {
   int& char_index = *index;
   int& char_count = *count;
@@ -105,6 +107,9 @@ base::string16 PDFiumRange::GetText() const {
     api_string_adapter.Close(written);
   }

+  // Strip ignorable non-displaying whitespace
+  rv.erase(std::remove(rv.begin(), rv.end(), kZeroWidthWhitespace), rv.end());
+
   return rv;
 }
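The second hunk relies on the erase-remove idiom: `std::remove` shifts every element that is not U+200B toward the front (keeping their order) and returns the new logical end, and `erase` then trims the leftover tail. Below is a standalone sketch of that step, assuming `std::u16string` in place of `base::string16`; the function name `StripZeroWidthWhitespace` is hypothetical and not part of the Chromium change.

```cpp
#include <algorithm>
#include <string>

// U+200B ZERO WIDTH SPACE, matching kZeroWidthWhitespace in the diff.
constexpr char16_t kZeroWidthWhitespace = 0x200B;

// Hypothetical standalone version of the stripping step added to
// PDFiumRange::GetText(), using std::u16string instead of base::string16.
std::u16string StripZeroWidthWhitespace(std::u16string text) {
  // std::remove compacts all code units other than U+200B to the front,
  // preserving order, and returns an iterator to the new logical end.
  auto new_end = std::remove(text.begin(), text.end(), kZeroWidthWhitespace);
  // erase drops the stale tail so size() matches the remaining text.
  text.erase(new_end, text.end());
  return text;
}
```

With this sketch, `StripZeroWidthWhitespace(u"ab\u200Bc")` yields `u"abc"`. Since C++20, `std::erase(text, kZeroWidthWhitespace)` from `<string>` performs the same removal in a single call.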