Ubaidullayev Nurmukhamed
AI & ML interests: None yet
Organizations: None yet
llm-practical
- Language models are weak learners
  Paper • 2306.14101 • Published • 10
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
  Paper • 2306.07075 • Published • 10
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
  Paper • 2307.08674 • Published • 48
- Nougat: Neural Optical Understanding for Academic Documents (see the sketch after this list)
  Paper • 2308.13418 • Published • 41
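A note on the Nougat entry: the collection only links the paper, but the released model is also usable through the Hugging Face transformers library. The snippet below is a minimal sketch of running OCR on one rendered page, assuming a recent transformers version with Nougat support, the facebook/nougat-base checkpoint, and a placeholder image path.

```python
import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Load the released checkpoint (assumption: the base-size model is wanted).
processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

# "page.png" is a placeholder for a rasterized PDF page.
image = Image.open("page.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Autoregressively decode the page image into markup text.
with torch.no_grad():
    outputs = model.generate(pixel_values, max_new_tokens=512)

text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
text = processor.post_process_generation(text, fix_markdown=True)
print(text)
```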
good-papers
- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
  Paper • 2310.19909 • Published • 21
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 37
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34
llm-performance
- QLoRA: Efficient Finetuning of Quantized LLMs (see the sketch after this list)
  Paper • 2305.14314 • Published • 57
- Training Transformers with 4-bit Integers
  Paper • 2306.11987 • Published • 22
- FasterViT: Fast Vision Transformers with Hierarchical Attention
  Paper • 2306.06189 • Published • 31
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 19
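A note on the QLoRA entry: the collection only links the paper, but the recipe it describes (a 4-bit NF4-quantized base model with trainable LoRA adapters) is commonly reproduced with the transformers, bitsandbytes, and peft libraries. The sketch below assumes those libraries, a CUDA GPU, and a placeholder model id; it illustrates the setup rather than reproducing the authors' reference code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "facebook/opt-1.3b"  # placeholder; any causal LM checkpoint works

# 4-bit NF4 quantization with double quantization, as described in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Attach small trainable LoRA adapters; only these weights are updated during finetuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```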
todo
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 44
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 30
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77