Strip Zero Width Whitespace from PDFium text strings
When getting text from PDFium, the library does not filter ZWW (0x200B), since it is a valid non-control character. It is ignorable, though, so the embedder (here, Chrome) can choose whether or not to display it. Because the character has no visual representation, including it in the displayed text can lead to odd UI behavior: the reported text length can be longer than the number of characters displayed, and moving the cursor past the ZWW can require multiple key presses.

BUG=chromium:743522
Change-Id: I5312a3aad4a752659fb4455853cd1030f0660bd9
Reviewed-on: https://chromium-review.googlesource.com/1210966
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#589271}

Committed by: Commit Bot
Parent: cbd64a18f4
Commit: a7a26d22d4
@@ -12,6 +12,8 @@ namespace chrome_pdf {
 
 namespace {
 
+constexpr base::char16 kZeroWidthWhitespace = 0x200B;
+
 void AdjustForBackwardsRange(int* index, int* count) {
   int& char_index = *index;
   int& char_count = *count;
@@ -105,6 +107,9 @@ base::string16 PDFiumRange::GetText() const {
     api_string_adapter.Close(written);
   }
 
+  // Strip ignorable non-displaying whitespace
+  rv.erase(std::remove(rv.begin(), rv.end(), kZeroWidthWhitespace), rv.end());
+
   return rv;
 }
 
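
The erase call in the diff above uses the standard erase-remove idiom. A minimal standalone sketch of the same technique follows, assuming std::u16string and char16_t as stand-ins for base::string16 and base::char16 so that it builds outside Chromium. The helper name StripZeroWidthSpace and the constant name kZeroWidthSpace are hypothetical and do not appear in the commit.

```cpp
#include <algorithm>
#include <string>

// Stand-in for kZeroWidthWhitespace from the diff. char16_t replaces
// base::char16 here (an assumption, so the sketch compiles without Chromium).
constexpr char16_t kZeroWidthSpace = 0x200B;

// Hypothetical helper, not part of the commit: returns a copy of |text|
// with every U+200B code unit removed.
std::u16string StripZeroWidthSpace(std::u16string text) {
  // std::remove shifts the kept code units to the front and returns an
  // iterator to the new logical end; erase then drops the leftover tail.
  text.erase(std::remove(text.begin(), text.end(), kZeroWidthSpace),
             text.end());
  return text;
}
```

With this sketch, an input of six code units such as u"ab\u200Bcd\u200B" comes back as the four-unit string u"abcd", so the string length matches the number of characters a user would see.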