neuraloverflow's Collections

 - BitNet: Scaling 1-bit Transformers for Large Language Models
   Paper • 2310.11453 • Published • 105
 - Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
   Paper • 2310.11511 • Published • 78
 - In-Context Learning Creates Task Vectors
   Paper • 2310.15916 • Published • 43
 - Matryoshka Diffusion Models
   Paper • 2310.15111 • Published • 43
 - Contrastive Preference Learning: Learning from Human Feedback without RL
   Paper • 2310.13639 • Published • 25
 - Safe RLHF: Safe Reinforcement Learning from Human Feedback
   Paper • 2310.12773 • Published • 28
 - An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
   Paper • 2310.12274 • Published • 13
 - Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
   Paper • 2310.11441 • Published • 29
 - In-Context Pretraining: Language Modeling Beyond Document Boundaries
   Paper • 2310.10638 • Published • 30
 - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
   Paper • 2310.03502 • Published • 78
 - How FaR Are Large Language Models From Agents with Theory-of-Mind?
   Paper • 2310.03051 • Published • 35
 - Large Language Models Cannot Self-Correct Reasoning Yet
   Paper • 2310.01798 • Published • 36
 - Enable Language Models to Implicitly Learn Self-Improvement From Data
   Paper • 2310.00898 • Published • 23
 - PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
   Paper • 2310.00426 • Published • 61
 - Conditional Diffusion Distillation
   Paper • 2310.01407 • Published • 20
 - Vision Transformers Need Registers
   Paper • 2309.16588 • Published • 83
 - Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
   Paper • 2310.04378 • Published • 22
 - CodeFusion: A Pre-trained Diffusion Model for Code Generation
   Paper • 2310.17680 • Published • 73
 - Personas as a Way to Model Truthfulness in Language Models
   Paper • 2310.18168 • Published • 5
 - A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
   Paper • 2310.16656 • Published • 50
 - Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
   Paper • 2311.00430 • Published • 57
 - LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
   Paper • 2311.00571 • Published • 43
 - Controllable Music Production with Diffusion Models and Guidance Gradients
   Paper • 2311.00613 • Published • 26
 - De-Diffusion Makes Text a Strong Cross-Modal Interface
   Paper • 2311.00618 • Published • 23
 - The Generative AI Paradox: "What It Can Create, It May Not Understand"
   Paper • 2311.00059 • Published • 20
 - Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
   Paper • 2311.00047 • Published • 10
 - CapsFusion: Rethinking Image-Text Data at Scale
   Paper • 2310.20550 • Published • 27
 - Beyond U: Making Diffusion Models Faster & Lighter
   Paper • 2310.20092 • Published • 12
 - LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
   Paper • 2310.18356 • Published • 24
 - Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
   Paper • 2310.20587 • Published • 18
 - TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
   Paper • 2305.07759 • Published • 36
 - Textbooks Are All You Need
   Paper • 2306.11644 • Published • 146
 - QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
   Paper • 2310.16795 • Published • 27
 - FLAP: Fast Language-Audio Pre-training
   Paper • 2311.01615 • Published • 18
 - LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
   Paper • 2311.05556 • Published • 87
 - Levels of AGI for Operationalizing Progress on the Path to AGI
   Paper • 2311.02462 • Published • 38
 - The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
   Paper • 2311.07361 • Published • 14
 - Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
   Paper • 2311.08263 • Published • 16
 - Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure
   Paper • 2311.07590 • Published • 17
 - Music ControlNet: Multiple Time-varying Controls for Music Generation
   Paper • 2311.07069 • Published • 45
 - Prompt Engineering a Prompt Engineer
   Paper • 2311.05661 • Published • 25
 - PolyMaX: General Dense Prediction with Mask Transformer
   Paper • 2311.05770 • Published • 11
 - UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
   Paper • 2311.09257 • Published • 48
 - The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
   Paper • 2311.10093 • Published • 59
 - mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
   Paper • 2311.04257 • Published • 22
 - Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
   Paper • 2311.10642 • Published • 26
 - Orca 2: Teaching Small Language Models How to Reason
   Paper • 2311.11045 • Published • 77
 - Exponentially Faster Language Modelling
   Paper • 2311.10770 • Published • 119
 - MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
   Paper • 2311.11501 • Published • 37
 - System 2 Attention (is something you might need too)
   Paper • 2311.11829 • Published • 44
 - GAIA: a benchmark for General AI Assistants
   Paper • 2311.12983 • Published • 237
 - Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
   Paper • 2311.13231 • Published • 29
 - Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
   Paper • 2312.03818 • Published • 34
 - Magicoder: Source Code Is All You Need
   Paper • 2312.02120 • Published • 81
 - FaceStudio: Put Your Face Everywhere in Seconds
   Paper • 2312.02663 • Published • 33
 - Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
   Paper • 2312.03491 • Published • 35
 - Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
   Paper • 2312.04474 • Published • 33
 - DeepCache: Accelerating Diffusion Models for Free
   Paper • 2312.00858 • Published • 24
 - Your ViT is Secretly an Image Segmentation Model
   Paper • 2503.19108 • Published • 23
 - Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
   Paper • 2503.19757 • Published • 51
 - Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
   Paper • 2504.17192 • Published • 120
 - TTRL: Test-Time Reinforcement Learning
   Paper • 2504.16084 • Published • 120
 - Absolute Zero: Reinforced Self-play Reasoning with Zero Data
   Paper • 2505.03335 • Published • 185