src/ash/projector
Benjamin Zielinski 64cb892523 Respect original transcript spacing when splitting by sentences
Starting in V2, we split transcripts into individual sentences.
Currently, sentences are built up from transcript hypothesis parts,
joined with a single type of spacing per language. However, the speech
model sometimes returns a mix of character types, so we should instead
rely on the whitespace given in the full text of the transcript.

Additionally, the speech models sometimes return a delimiter character
inside hypothesis parts, so we must remove it before using any
hypothesis parts.
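
A rough sketch of the approach described above, in Python for brevity
rather than the language of the actual change. The names
split_sentences, DELIMITER, and SENTENCE_END are hypothetical, and the
delimiter value is an assumption, since this message does not name the
character. Each part is stripped of the delimiter and located in the
full text, and each sentence is sliced out of the full text so that the
original spacing is preserved.

```python
# Hypothetical delimiter; the commit message does not say which character it is.
DELIMITER = "\u2581"
# Assumed sentence terminators, covering Latin and CJK punctuation.
SENTENCE_END = (".", "!", "?", "\u3002", "\uff01", "\uff1f")


def split_sentences(full_text, parts):
    """Group hypothesis parts into sentences, taking spacing from full_text."""
    sentences = []
    cursor = 0     # position in full_text just past the last matched part
    start = None   # start index of the sentence currently being built
    for raw in parts:
        # Remove the delimiter before using the part at all.
        word = raw.replace(DELIMITER, "").strip()
        if not word:
            continue
        index = full_text.find(word, cursor)
        if index < 0:
            continue  # part does not appear in the full text; skip it
        if start is None:
            start = index
        cursor = index + len(word)
        if word.endswith(SENTENCE_END):
            # Slice from the full text so the original whitespace is kept.
            sentences.append(full_text[start:cursor])
            start = None
    if start is not None:
        sentences.append(full_text[start:cursor])
    return sentences


full_text = "Hello world. \u4f60\u597d\u4e16\u754c\u3002Next one"
parts = ["\u2581Hello", "\u2581world.", "\u4f60\u597d", "\u4e16\u754c\u3002", "\u2581Next", "one"]
result = split_sentences(full_text, parts)
```

Because the sentences are slices of the full text, the Latin words keep
their spaces and the CJK characters stay unspaced without any
per-language spacing rule.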

Bug: b/330271007
Change-Id: Ib7dda966140fdd824a4a80a454e3fcc62c06d01b
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5402048
Reviewed-by: Ahmed Nasr <anasr@google.com>
Reviewed-by: Li Lin <llin@chromium.org>
Commit-Queue: Benjamin Zielinski <bzielinski@google.com>
Cr-Commit-Position: refs/heads/main@{#1280884}
2024-04-01 21:38:02 +00:00