AI & ML interests

None defined yet.

Recent Activity

ginipick posted an update 11 days ago
🍌 Huggwarts Banana - Nanobanana + Long-form Text Rendering Add-on

🎯 The Critical Flaw in Nanobanana
Google's Nanobanana is excellent, but has one critical weakness:
Long-form text rendering breaks easily.
Huggwarts Banana = Nanobanana + Long-form Text Rendering Add-on

ginigen/AI

✨ Core Value
🔥 Nanobanana's Foundation

Text-to-Image generation
Multiple image models
Style customization
High-quality outputs

🎨 + Our Add-on Enhancement
100% Backward Compatible - Use it exactly like Nanobanana!

🚀 Extended Features
📌 1. Perfect Long-form Text Rendering

Natural line-breaking for lengthy sentences
Full multilingual support (Korean, English, Japanese, Chinese, etc.)
Complex character combinations (double consonants, compound vowels, special symbols) handled flawlessly
Mixed emoji 😊 and special characters ♥
8 optimized fonts

🎬 2. Auto Image-to-Video Conversion
One-click video generation
Up to 12 seconds video length
Auto-applied effects: fade-in, zoom, slide transitions
SNS-optimized formats: 9:16 (Reels), 1:1 (Feed), 16:9 (YouTube)

🎨 3. Advanced Text Effects
Gradient colors: rainbow to subtle color transitions
Shadow effects: 3D depth perception
Neon glow: eye-catching luminous text
Custom backgrounds: AI-generated or custom upload

πŸ“ 4. Multiple Aspect Ratios
16:9 / 9:16 / 1:1 / 4:3, 3:4, 3:2, 2:3

🎯 5. Smart Model Selection
Automatic optimal model selection
Text-focused vs. Image-focused modes
Reference image support
High-resolution output (up to 4096x4096)


💡 Simple 4-Step Workflow
1️⃣ Input Text → English/Korean/Multilingual/Mixed content
2️⃣ Select Style → Font, color, effects
3️⃣ Generate Image → 10-30 seconds
4️⃣ Convert to Video (Optional) → One-click automation

🎯 Use Cases
✅ SNS Creators - Long-form text Reels/Shorts production
✅ Marketers - Rapid text-based promotional images/videos
✅ Educators - Visual text learning materials
Nymbo posted an update 11 days ago
Two new tools added to the Nymbo/Tools MCP server: File_System and Shell_Exec. You can theoretically do basically anything with these two tools, and they should enable support for many Claude Skills.

GPT-5-Codex proves that for many cases, shell commands really are all you need, and Claude Skills seem to lean into this. The thing is, nothing about the design of Claude Skills actually restricts them to proprietary models!

# File_System

There's a new directory inside the repo called Filesystem; that's the agent's "root". It can perform the following actions: list, read, write, append, mkdir, move, copy, delete, info, help. It keeps this all within the scope of one tool call by making the Action field required and all other fields optional. Using a filesystem shouldn't require 15 different tools.

Files created in the public HF space live in the space's running container and get cleared when the space is restarted. When running the server locally, files are actually stored on disk.
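The single-tool, action-dispatch design can be sketched in Python. This is a minimal sketch, not the server's actual code: the function name, field names, and the temp-directory sandbox standing in for the repo's Filesystem folder are all illustrative, and the move/copy/info/help actions from the post are omitted for brevity.

```python
import os
import shutil
import tempfile

ROOT = tempfile.mkdtemp()  # stand-in for the repo's Filesystem directory

def _resolve(rel_path):
    # Relative paths only: resolve against ROOT and refuse anything that escapes it
    full = os.path.normpath(os.path.join(ROOT, rel_path or ""))
    if full != ROOT and not full.startswith(ROOT + os.sep):
        raise ValueError("path escapes the sandbox root")
    return full

def file_system(action, path=None, content=None):
    """One tool, many actions: only `action` is required, other fields optional."""
    p = _resolve(path) if path is not None else ROOT
    if action == "list":
        return sorted(os.listdir(p))
    if action == "read":
        with open(p) as f:
            return f.read()
    if action == "write":
        with open(p, "w") as f:
            f.write(content or "")
        return f"wrote {path}"
    if action == "append":
        with open(p, "a") as f:
            f.write(content or "")
        return f"appended to {path}"
    if action == "mkdir":
        os.makedirs(p, exist_ok=True)
        return f"created {path}"
    if action == "delete":
        shutil.rmtree(p) if os.path.isdir(p) else os.remove(p)
        return f"deleted {path}"
    raise ValueError(f"unknown action: {action}")
```

Dispatching on a required Action field keeps the whole surface area inside one tool schema, which is the point the post makes about not needing 15 separate tools.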

# Shell_Exec

What good is a filesystem if you can't execute commands in it? This tool automatically detects whether the server is running on Windows or Linux and suggests the appropriate shell (PowerShell/Bash). Both of these new tools require that the agent use relative paths rather than absolute paths. I could be convinced to backpedal on this.
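The platform detection amounts to a few lines; a minimal sketch, assuming the tool shells out with `subprocess` (the actual flags and invocation may differ):

```python
import platform
import subprocess

def shell_exec(command, timeout=30):
    """Run `command` in the platform-appropriate shell and capture output."""
    if platform.system() == "Windows":
        args = ["powershell", "-NoProfile", "-Command", command]  # Windows shell
    else:
        args = ["bash", "-c", command]  # Linux/macOS shell
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return result.stdout, result.stderr, result.returncode
```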

# Closing Thoughts

The File_System and Shell_Exec tools aren't super polished yet; I'll continue to improve the agent's instructions and the UX of using the new tools. Most of my testing was done with gpt-oss-20b, and even when it messes up, it gets the gist after one failed tool call. It should work perfectly fine for the GPU poor.
multimodalart posted an update 14 days ago
Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert any entire HF repo (Model, Dataset, or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
Nymbo posted an update 16 days ago
I've made some improvements to my custom Deep_Research tool in the Nymbo/Tools MCP server. I've added a second LLM process and it still takes less than 1 minute to complete!

The original version of my Deep_Research tool would basically dump up to 50 fetched webpages onto the Researcher model (Qwen3-235B), with only a little bit of context shown from each page.

# New "Filterer" Process

The new process includes another LLM call before the researcher process. The Filterer (also Qwen3-235B) gets the query summary and the original 50 pages with low context, and decides which pages are most relevant to the research topic. The Filterer then outputs the URLs to the relevant pages, which are then re-fetched (with more context) and sent to the Researcher.

# Researcher Context

The Researcher now gets only the relevant webpages before it begins writing the report. When testing with 50 initial results, the Researcher would often end up with 10-20 pages of relevant context.

It still finishes everything in less than a minute, thanks entirely to Cerebras inference: about 35-45 seconds from the moment the tool is run.

It's also worth noting that both the Filterer and the Researcher are now given the current date/time before they see the content, reducing hallucinations caused by knowledge cutoffs.
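In outline, the two-stage flow looks something like this. It's a sketch of the described pipeline, not the tool's actual code: the LLM calls and the re-fetcher are injected as callables here, whereas the real tool wires them to Qwen3-235B on Cerebras.

```python
def deep_research(query, pages, filterer_llm, researcher_llm, refetch):
    """Two-stage research: the Filterer picks relevant URLs from
    low-context snippets; the Researcher writes the report over
    full-context re-fetches of just those pages."""
    snippets = "\n".join(f"{p['url']}: {p['snippet']}" for p in pages)
    # Stage 1: the Filterer sees every page with little context, returns URLs
    relevant_urls = filterer_llm(
        f"Query: {query}\nSelect the URLs relevant to the query:\n{snippets}"
    )
    # Stage 2: re-fetch only the relevant pages with a larger context budget
    full_pages = [refetch(url) for url in relevant_urls]
    context = "\n\n".join(full_pages)
    return researcher_llm(f"Query: {query}\nSources:\n{context}\nWrite a report.")
```

Injecting the callables keeps the control flow testable without any inference provider attached.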
Nymbo posted an update 26 days ago
I have a few Sora-2 invites - 15509N
Nymbo posted an update about 1 month ago
There's now a custom Deep_Research tool in my Nymbo/Tools MCP server! TL;DR: The agent using the tools writes a summary of your requests and up to five DuckDuckGo searches (up to 50 results). Each of the webpages found in the searches is then fetched and given to our researcher (Qwen3-235B-A22B-Thinking-2507). The researcher sees the summary, searched queries, and fetched links, then writes a thorough research report. The agent using the tool provides the user with a summary of the report and a link to download research_report.txt. The researcher's instructions are similar to some leaked Perplexity sys prompts.

# Deep_Research Tool

It accomplishes everything in under a minute so it doesn't hit MCP's 60-second timeout, mostly thanks to Cerebras. The only thing required to make this work is an HF_READ_TOKEN for inference.

The Deep_Research tool could certainly be improved. It still needs some sort of mechanism for sorting URLs by importance (I've got some ideas, but I don't want it to be the responsibility of the agent using the tool). I'll probably add a second researcher to filter out the bad sources before invoking the big researcher. I'm hellbent on keeping this all within the scope of one tool call.
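One way to keep 50 page fetches inside the 60-second budget is to run them concurrently. The post doesn't say how the server does this, so this is purely an assumption, sketched with an injected fetcher:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch_one, max_workers=16):
    """Fetch many result pages in parallel (an assumed strategy, not the
    server's confirmed one); results come back in the same order as `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))
```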

# More Fetch/Web Search Improvements

The Search_DuckDuckGo tool has been further enhanced. It now allows the agent to browse through all pages of results, and the results now include the published date (if detected). It also now supports every DDG search type! The default DDG search is called text, but it can also search by news, images, videos, and books.

The Fetch_Webpage tool now specifies how much of the page has been truncated and a cursor index, allowing it to pick up where it left off without re-consuming tokens. The model can now also choose to strip CSS selectors to remove excess noise, and there's a new URL Scraper mode that only returns URLs found on the full page.
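The truncation-plus-cursor behavior can be sketched as follows; the function and field names are illustrative, not the tool's actual schema:

```python
def fetch_page_chunk(page_text, cursor=0, max_chars=2000):
    """Return one chunk of a fetched page plus resume metadata, so the
    model can pick up where it left off without re-reading earlier text."""
    chunk = page_text[cursor : cursor + max_chars]
    next_cursor = cursor + len(chunk)
    return {
        "content": chunk,
        "next_cursor": next_cursor,  # pass back on the next call to resume
        "truncated": next_cursor < len(page_text),
        "remaining_chars": len(page_text) - next_cursor,
    }
```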

More to come soon ~
ehristoforu posted an update about 2 months ago
🚀 Hello from the Project Fluently team!

✨ We are happy to share with you our new universal LLM models based on Qwen3 1.7B and 4B: powerful, multilingual, and ready to solve a wide range of problems!

🛠️ We have conducted additional training and carefully merged them to achieve even better results and maximize the potential of the models.

🆓 And most importantly: the models are completely open and free under the Apache-2.0 license!

🔗 Links to repositories:
- FluentlyQwen3-4B: fluently/FluentlyQwen3-4B
- FluentlyQwen3-1.7B: fluently/FluentlyQwen3-1.7B

😍 We will be very glad to hear your feedback and impressions! Your opinion is very important to us!
Nymbo posted an update about 2 months ago
I have a few updates to my MCP server I wanna share: New Memory tool, improvements to web search & speech generation.

# Memory_Manager Tool

We now have a Memory_Manager tool. Ask ChatGPT to write all its memories verbatim, then tell gpt-oss-20b to save each one using the tool, then take them anywhere! It stores memories in a memories.json file in the repo, no external database required.

The Memory_Manager tool is currently hidden from the HF space because it's intended for local use. It's enabled by providing an HF_READ_TOKEN in the env secrets, although it doesn't actually use the key for anything. There's probably a cleaner way of ensuring memory is only used locally; I'll come back to this.
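A memories.json store of this kind needs very little code. A minimal sketch, assuming a flat list of strings; the real tool's schema and actions may differ, and the temp-directory path stands in for the file in the repo.

```python
import json
import os
import tempfile

MEMORY_FILE = os.path.join(tempfile.mkdtemp(), "memories.json")  # stand-in path

def load_memories():
    """Read all saved memories; an absent file means no memories yet."""
    if not os.path.exists(MEMORY_FILE):
        return []
    with open(MEMORY_FILE) as f:
        return json.load(f)

def save_memory(text):
    """Append one memory and rewrite the JSON file - no database needed."""
    memories = load_memories()
    memories.append(text)
    with open(MEMORY_FILE, "w") as f:
        json.dump(memories, f, indent=2)
```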

# Fetch & Websearch

The Fetch_Webpage tool has been simplified a lot. It now converts the page to Markdown and returns the page with three length settings (Brief, Standard, Full). This is a lot more reliable than the old custom extraction method.
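The three length settings amount to capping the converted Markdown. A sketch of the idea: the character limits below are illustrative guesses, and the HTML-to-Markdown converter is injected rather than naming a specific library.

```python
LENGTH_LIMITS = {"Brief": 500, "Standard": 2000, "Full": None}  # illustrative caps

def fetch_as_markdown(html, html_to_md, length="Standard"):
    """Convert a page to Markdown, then trim it to the requested length."""
    md = html_to_md(html)
    limit = LENGTH_LIMITS[length]
    return md if limit is None else md[:limit]
```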

The Search_DuckDuckGo tool has a few small improvements. The input is easier for small models to get right, and the output is more readable.

# Speech Generation

I've added the remaining voices for Kokoro-82M, it now supports all 54 voices with all accents/languages.

I also removed the 30-second cap by making sure it computes all chunks in sequence, not just the first. I've tested it on outputs that are ~10 minutes long. Do note that when used as an MCP server, the tool will time out after 1 minute; nothing I can do about that right now.
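Removing a per-call cap by chunking can be sketched like this. The sentence-based chunker is simplified, and the synthesizer is injected in place of the actual Kokoro-82M call:

```python
def synthesize_long(text, synth_chunk, max_chars=300):
    """Split text into sentence-aligned chunks, synthesize each in
    sequence, and concatenate the audio samples."""
    chunks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence + ". "
    if current:
        chunks.append(current)
    samples = []
    for chunk in chunks:
        samples.extend(synth_chunk(chunk))  # one model call per chunk
    return samples
```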

# Other Thoughts

Lots of MCP use cases involve manipulating media (image editing, ASR, etc.). I've avoided adding tools like this so far for two reasons:

1. Most of these solutions would require assigning the space a ZeroGPU slot.
2. The current process of uploading files like images to a Gradio space is still a bit rough. It's doable but requires additional tools.

Both of these points make it a bit painful for local usage. I'm open to suggestions for other tools that rely on text.
ginipick posted an update 2 months ago
🍌 Nano Banana + Video: AI Image Style Transfer & Video Generation Tool

🎨 Key Features
1️⃣ Image Style Transfer

ginigen/Nano-Banana-Video

📸 Upload up to 2 images for style fusion
✨ High-quality image generation with Google Nano Banana model
🎭 Apply desired styles with text prompts

2️⃣ Video Generation

🎬 Convert generated images to videos
📐 Maintain original aspect ratio option
⏱️ Adjustable duration (1-4 seconds)

🚀 How to Use
Step-by-Step Guide
Step 1: Image Generation 🖼️

Enter style description
Upload 1-2 images (optional)
Click "Generate Magic ✨"

Step 2: Video Creation 📹

Send generated image to video tab
Set animation style
Generate video!

💡 Use Cases

🏞️ Transform landscape photos into artistic masterpieces
🤖 Bring static images to life
🎨 Mix styles from two different images
📱 Create short videos for social media

⚡ Tech Stack
Google Nano Banana · Stable Video Diffusion · Gradio · Replicate API

#AIVideoGenerator #ImageToVideoConverter #StyleTransferAI #GoogleNanoBanana #StableVideoDiffusion #AIAnimationTool #TextToVideo #ImageAnimationSoftware #AIArtGenerator #VideoCreationTool #MachineLearningVideo #DeepLearningAnimation #HuggingFaceSpaces #ReplicateAPI #GradioApplication #ZeroGPUComputing #AIStyleMixing #AutomatedVideoProduction #NeuralStyleTransfer #AIPoweredCreativity
ginipick posted an update 2 months ago
🎉 Fashion Fit 360: The New Standard in AI Virtual Try-On!

🚀 Now Live and Free to Use! Say goodbye to online shopping uncertainty - "Will this look good on me?" - with our revolutionary solution! Fashion Fit 360 is a cutting-edge AI-powered virtual fitting service that transforms your fashion shopping experience.

LINK: ginigen/Fashion-Fit360

✨ Core Features
🔄 360-Degree Multi-Pose Generation
Transform a single front-facing photo into 6 different viewing angles!
Front, side, and back views for complete visualization
Experience a real fitting room mirror effect
Check fit and style from every perspective

👗 15 Fashion Item Categories
Apparel: Tops, bottoms, dresses
Jewelry: Necklaces, earrings, rings, bracelets
Accessories: Sunglasses, eyewear, hats, ties, bow ties, belts
Essentials: Bags, shoes

🎯 Perfect For:
πŸ›οΈ Online Shopping Enthusiasts: Preview before purchase - zero return hassles!
πŸ’ Jewelry Lovers: Virtually try expensive pieces before investing
🎁 Thoughtful Gift-Givers: Test items on recipient photos beforehand
πŸ‘” Business Professionals: Preview suit and tie combinations
πŸ‘— Fashion Designers: Rapidly visualize design samples

💡 Why Fashion Fit 360? Fashion Fit 360 delivers innovation beyond conventional services. While most virtual fitting platforms only support clothing, we offer complete support for 15 accessory types. Unlike competitors providing only front views, Fashion Fit 360 generates 6 poses for true 360-degree visualization, ensuring you can verify actual fit perfectly. Performance is unmatched: get results in under 20 seconds with one-click simplicity and no complex configurations. Plus, download all generated images as a convenient ZIP file, eliminating tedious individual saves.

🔥 Key Differentiators
🎨 360-Degree Multi-Pose Image Generation
🤖 FLUX.1-Fill based OmniTry integrated model with Flux.1 KONTEXT LoRA technology
Nymbo posted an update 2 months ago
I built a general use MCP space ~ Fetch webpages, DuckDuckGo search, Python code execution, Kokoro TTS, Image Gen, Video Gen.

# Tools

1. Fetch webpage
2. Web search via DuckDuckGo (very concise, low excess context)
3. Python code executor
4. Kokoro-82M speech generation
5. Image Generation (use any model from HF Inference Providers)
6. Video Generation (use any model from HF Inference Providers)

The first four tools can be used without any API keys whatsoever. DDG search is free, and the code execution and speech gen are done on CPU. Having an HF_READ_TOKEN in the env variables will show all tools; if there isn't a key present, the Image/Video Gen tools are hidden.

Nymbo/Tools
ginipick posted an update 2 months ago
✨ HairPick | Preview Your Perfect Hair Transformation in 360° ✨

🎊 Free Trial for Hugging Face Launch! Hurry! ⏰
Hello! Introducing an innovative AI service that helps you choose the perfect hairstyle without any regrets before visiting the salon!

🎯 Try It Now
ginigen/Hair-Pick

🔄 What Makes HairPick Special? 360° Complete Preview!
Other hair simulators only show the front view? 😑

HairPick is different!
✅ Front + 4 random angles = Total 5 multi-angle images generated
✅ Perfect check from side profile 👀 diagonal 📐 back view 👥!
✅ 100+ trendy hairstyle library 💇‍♀️

💡 Highly Recommended For:
🎯 "I really don't want to fail this time!"
→ Check side volume and back lines thoroughly
🎯 "It's hard to explain exactly to my stylist"
→ Perfect communication with 360° result images!
🎯 "I have a profile photo/photoshoot coming up"
→ Preview your best look from every angle
🚀 Super Simple Usage (Just 1 Minute!)

1️⃣ One Selfie 📸
Take a front-facing photo in bright light (show your forehead and face outline clearly!)
2️⃣ Choose Your Style 💫
Select from 100+ options: short cuts, medium, long hair, layered, bangs, and more
3️⃣ Check 360° Results 🔄
Compare front + side + back + diagonal angles all at once!
4️⃣ Go to the Salon! ✂️
Save your favorite result → Show it to your stylist

📸 Pro Tips for Perfect Results!
💡 Lighting: Natural light or bright, even indoor lighting
💡 Angle: Camera at eye level, facing straight ahead
💡 Preparation: No hats ❌ No sunglasses ❌ Hair tucked behind ears ⭕

🎁 Now's Your Chance!
"The era of deciding based on front view only is over!"
HairPick isn't just simple hair synthesis; it's a next-level AI hair simulator that predicts your actual appearance in 360°.

🔥 Limited free access for Hugging Face launch!
🔥 100+ latest trend styles!
🔥 ZERO failures with 360° perfect prediction!

✂️ Click before you cut! Take on the perfect hair transformation with HairPick! 🌟

#HairPick #AIHairSimulator #360HairPreview
Nymbo posted an update 2 months ago
If you're using Jan-v1-4B for local MCP-based web search, I highly recommend you try out Intelligent-Internet/II-Search-4B

Very impressed with this lil guy and it deserves more downloads. It's based on the original version of Qwen3-4B but find that it questions reality way less often. Jan-v1 seems to think that everything it sees is synthetic data and constantly gaslights me