-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30
Collections
Discover the best community collections!
Collections including paper arxiv:1512.03385
-
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
RTSeg: Real-time Semantic Segmentation Comparative Study
Paper • 1803.02758 • Published • 2
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 5 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 14 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
-
Visual Instruction Tuning
Paper • 2304.08485 • Published • 20 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 39 -
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper • 2309.14525 • Published • 31
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 19 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
Enabling Memory Safety of C Programs using LLMs
Paper • 2404.01096 • Published • 1
-
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Characterizing signal propagation to close the performance gap in unnormalized ResNets
Paper • 2101.08692 • Published • 2 -
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Paper • 2105.03536 • Published • 2 -
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Paper • 2106.01548 • Published • 2
-
Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
Paper • 2110.05909 • Published • 2 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8 -
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Paper • 2404.10407 • Published • 1
-
Attention Is All You Need
Paper • 1706.03762 • Published • 91 -
ImageNet Large Scale Visual Recognition Challenge
Paper • 1409.0575 • Published • 9 -
Sequence to Sequence Learning with Neural Networks
Paper • 1409.3215 • Published • 3 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 17
-
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19 -
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
Paper • 2309.13018 • Published • 9 -
Robust Speech Recognition via Large-Scale Weak Supervision
Paper • 2212.04356 • Published • 39 -
Language models in molecular discovery
Paper • 2309.16235 • Published • 10
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 19 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
Enabling Memory Safety of C Programs using LLMs
Paper • 2404.01096 • Published • 1
-
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
RTSeg: Real-time Semantic Segmentation Comparative Study
Paper • 1803.02758 • Published • 2
-
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Characterizing signal propagation to close the performance gap in unnormalized ResNets
Paper • 2101.08692 • Published • 2 -
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Paper • 2105.03536 • Published • 2 -
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Paper • 2106.01548 • Published • 2
-
Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
Paper • 2110.05909 • Published • 2 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8 -
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Paper • 2404.10407 • Published • 1
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 5 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 14 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
-
Attention Is All You Need
Paper • 1706.03762 • Published • 91 -
ImageNet Large Scale Visual Recognition Challenge
Paper • 1409.0575 • Published • 9 -
Sequence to Sequence Learning with Neural Networks
Paper • 1409.3215 • Published • 3 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 17
-
Visual Instruction Tuning
Paper • 2304.08485 • Published • 20 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 39 -
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper • 2309.14525 • Published • 31
-
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19 -
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
Paper • 2309.13018 • Published • 9 -
Robust Speech Recognition via Large-Scale Weak Supervision
Paper • 2212.04356 • Published • 39 -
Language models in molecular discovery
Paper • 2309.16235 • Published • 10