Nguyen Van Anh Tuan
tuanio
9 followers · 82 following
https://tuanio.github.io/
tuanio
AI & ML interests
Natural Language Processing and Speech Processing
Recent Activity
reacted to sanchit-gandhi's post with ❤️ 20 days ago
Why does returning timestamps help Whisper reduce hallucinations? 🧐

Empirically, most practitioners have found that setting `return_timestamps=True` helps reduce hallucinations, particularly when doing long-form evaluation with Transformers’ “chunked” algorithm. But why does this work?

My interpretation is that forcing the model to predict timestamps is contradictory to hallucinations. Suppose you have the transcription:

```markdown
The cat sat on the on the on the mat.
```

Here we have a repeated hallucination for “on the”. If we ask the model to predict timestamps, then the “on the” has to contribute to the overall segment-level timing, e.g.:

```markdown
<|0.00|> The cat sat on the on the on the mat.<|5.02|>
```

However, it’s impossible to fit 3 copies of “on the” within the time allocation given to the segment, so the probability of this hallucinatory sequence becomes lower, and the model actually predicts the correct transcription with the highest probability:

```markdown
<|0.00|> The cat sat on the mat.<|5.02|>
```

In this sense, the end timestamp is the opposite of the initial timestamp constraint described in Section 4.5 of the paper https://huggingface.co/papers/2212.04356 → it helps the model remove extra words at the end of the sequence (whereas the initial timestamp helps when the model ignores words at the start), but the overall principle is the same: using timestamps to improve the probability of more realistic sequences.

Leaving it open to you: why do you think timestamps reduce Whisper hallucinations?
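To make the timing argument concrete, here is a minimal sketch that parses timestamped strings like the ones in the post and compares how many words each candidate packs into the same segment. The helper names `parse_segments` and `words_per_second` are hypothetical and written only for this illustration; they are not part of Transformers or Whisper, and the word rate is a rough stand-in for the model's own notion of plausibility, not how Whisper actually scores sequences.

```python
import re

# Matches one segment delimited by Whisper-style timestamp tokens,
# e.g. "<|0.00|> some text<|5.02|>". Pattern is an assumption based on
# the token format shown in the post.
SEGMENT_RE = re.compile(r"<\|(\d+\.\d+)\|>(.*?)<\|(\d+\.\d+)\|>", re.DOTALL)


def parse_segments(text):
    """Return a list of {"start", "end", "text"} dicts (hypothetical helper)."""
    segments = []
    for start, body, end in SEGMENT_RE.findall(text):
        segments.append(
            {"start": float(start), "end": float(end), "text": body.strip()}
        )
    return segments


def words_per_second(segment):
    """Crude speaking-rate estimate for one parsed segment (hypothetical helper)."""
    duration = segment["end"] - segment["start"]
    if duration <= 0:
        return float("inf")
    return len(segment["text"].split()) / duration


clean = parse_segments("<|0.00|> The cat sat on the mat.<|5.02|>")[0]
repeated = parse_segments(
    "<|0.00|> The cat sat on the on the on the mat.<|5.02|>"
)[0]

# Both candidates share the same time span, so the repeated one has to
# squeeze more words into it.
print(words_per_second(clean), words_per_second(repeated))
```

Both candidates cover the same 5.02 seconds, so the one with the repeated “on the” implies a higher word rate, which is the intuition behind the end timestamp making that sequence less likely.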
liked a dataset 28 days ago
chuuhtetnaing/myanmar-speech-dataset-openslr-80
liked a model about 1 month ago
pyannote/speaker-diarization-community-1
Organizations
tuanio's datasets (13)
tuanio/sample_data_colab • Preview • Updated Apr 4 • 5
tuanio/example • Viewer • Updated Apr 4 • 1 • 9
tuanio/test_audio • Viewer • Updated Mar 10 • 1 • 17
tuanio/chart-qa-vietnamese • Updated Sep 15, 2024 • 3
tuanio/LaVy-Bench-GPT4o • Viewer • Updated Jul 19, 2024 • 60 • 43 • 1
tuanio/bad_word_sent_cls • Viewer • Updated Jul 17, 2024 • 500k • 23
tuanio/chart-vllm-ver1 • Viewer • Updated Jul 4, 2024 • 443 • 14 • 2
tuanio/bad_word_token_cls • Viewer • Updated Feb 14, 2024 • 500k • 9
tuanio/processed_bad_word_cls_dataset • Viewer • Updated Jan 23, 2024 • 100k • 17 • 1
tuanio/book_corpus-input_ids-invalid-random_shuffle-len256 • Viewer • Updated Oct 26, 2023 • 6.15M • 6
tuanio/book_corpus-input_ids-valid-len256 • Viewer • Updated Oct 26, 2023 • 6.16M • 17
tuanio/cddata • Updated Aug 2, 2023 • 3
tuanio/fashion-images-crawled • Updated Mar 30, 2023 • 4