stereoplegic's Collections

Byte-level
- ByT5: Towards a token-free future with pre-trained byte-to-byte models (arXiv:2105.13626)
- Beyond Language Models: Byte Models are Digital World Simulators (arXiv:2402.19155)
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers (arXiv:2305.07185)
- Byte-Level Recursive Convolutional Auto-Encoder for Text (arXiv:1802.01817)
- Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering (arXiv:2403.09622)
- Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes (arXiv:1811.09021)
- Neural Machine Translation with Byte-Level Subwords (arXiv:1909.03341)
- Neural Machine Translation without Embeddings (arXiv:2008.09396)
- ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 (arXiv:2110.15248)
- MonoByte: A Pool of Monolingual Byte-level Language Models (arXiv:2209.11035)
- Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation (arXiv:2302.14220)
- Bilingual End-to-End ASR with Byte-Level Subwords (arXiv:2205.00485)
- MambaByte: Token-free Selective State Space Model (arXiv:2401.13660)
- CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation (arXiv:2103.06874)
- SpaceByte: Towards Deleting Tokenization from Large Language Modeling (arXiv:2404.14408)
- Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation (arXiv:2405.19290)
- Word-Level Representation From Bytes For Language Modeling (arXiv:2211.12677)
- byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings (arXiv:2106.13302)