LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training — arXiv:2509.23661, published Sep 28, 2025
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features — arXiv:2502.14786, published Feb 20, 2025
Building and Better Understanding Vision-Language Models: Insights and Future Directions — arXiv:2408.12637, published Aug 22, 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration — arXiv:2311.04257, published Nov 7, 2023