Collections
Collections including paper arxiv:2309.13876

- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models (Paper • 2309.15701 • Published • 2)
- CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders (Paper • 2309.07707 • Published • 1)
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (Paper • 2311.00430 • Published • 57)
- Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data (Paper • 2309.13876 • Published • 1)
- Large-Scale Automatic Audiobook Creation (Paper • 2309.03926 • Published • 55)
- UniAudio: An Audio Foundation Model Toward Universal Audio Generation (Paper • 2310.00704 • Published • 21)
- Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts (Paper • 2309.11977 • Published • 2)
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models (Paper • 2308.16692 • Published • 1)
- Woodpecker: Hallucination Correction for Multimodal Large Language Models (Paper • 2310.16045 • Published • 17)
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models (Paper • 2310.14566 • Published • 27)
- SILC: Improving Vision Language Pretraining with Self-Distillation (Paper • 2310.13355 • Published • 9)
- Conditional Diffusion Distillation (Paper • 2310.01407 • Published • 20)
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (Paper • 2311.00430 • Published • 57)
- Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data (Paper • 2309.13876 • Published • 1)
- Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition (Paper • 2310.06434 • Published • 4)
- Large-Scale Automatic Audiobook Creation (Paper • 2309.03926 • Published • 55)
- Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts (Paper • 2309.11977 • Published • 2)
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models (Paper • 2308.16692 • Published • 1)
- AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining (Paper • 2308.05734 • Published • 37)
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs (Paper • 2310.13961 • Published • 5)
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation (Paper • 2202.07922 • Published • 1)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (Paper • 2310.13671 • Published • 19)
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs (Paper • 2309.09582 • Published • 4)
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (Paper • 2311.00430 • Published • 57)
- Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data (Paper • 2309.13876 • Published • 1)
- Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition (Paper • 2310.06434 • Published • 4)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models (Paper • 2309.15701 • Published • 2)
- CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders (Paper • 2309.07707 • Published • 1)
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (Paper • 2311.00430 • Published • 57)
- Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data (Paper • 2309.13876 • Published • 1)
- Large-Scale Automatic Audiobook Creation (Paper • 2309.03926 • Published • 55)
- Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts (Paper • 2309.11977 • Published • 2)
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models (Paper • 2308.16692 • Published • 1)
- AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining (Paper • 2308.05734 • Published • 37)
- Large-Scale Automatic Audiobook Creation (Paper • 2309.03926 • Published • 55)
- UniAudio: An Audio Foundation Model Toward Universal Audio Generation (Paper • 2310.00704 • Published • 21)
- Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts (Paper • 2309.11977 • Published • 2)
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models (Paper • 2308.16692 • Published • 1)
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs (Paper • 2310.13961 • Published • 5)
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation (Paper • 2202.07922 • Published • 1)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (Paper • 2310.13671 • Published • 19)
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs (Paper • 2309.09582 • Published • 4)
- Woodpecker: Hallucination Correction for Multimodal Large Language Models (Paper • 2310.16045 • Published • 17)
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models (Paper • 2310.14566 • Published • 27)
- SILC: Improving Vision Language Pretraining with Self-Distillation (Paper • 2310.13355 • Published • 9)
- Conditional Diffusion Distillation (Paper • 2310.01407 • Published • 20)