Hi!

#2
by CyborgPaloma - opened

Very interested in this project. What is the apollo-v0.1-4b-thinking license? Would you ever consider going Apache-2.0?

Ever consider trying it out on the Jan v1 model? Maybe releasing some of the thinking dataset?

Thanks in advance for your time.

Hi! I’m glad you’re interested in this project.

The Apollo series is a distilled version of a larger model I developed in collaboration with a game studio. The current license terms were negotiated and agreed upon with the studio during that partnership. I expect the Apollo series to maintain its license for the foreseeable future.

However, if there’s sufficient public interest and demand, along with financing for the project, I may train a new model series independently. If I do create this new series on my own, I’ll have full control over the licensing and plan to release it under Apache-2.0.

The same licensing situation applies to the dataset, meaning that I won’t be able to release the current dataset used for Apollo due to the terms agreed upon with the studio.

Regarding the Jan V1 model, I haven’t tested it yet, but I’ll consider using it as a base model if testing shows it offers improvements over the current approach.

Great, transparent response. Very clear and more than we're owed. I'm extremely appreciative.

The Jan model WILL help, from what I've seen in finetunes that use it. It's just a question of how much, and whether it's worth the work. Their writing on their formatting and training reads more like little technical reports, so if you want to check them out, I do recommend it. It's not as bad as some of the templates out there for tool calling and reasoning. It's really just about whether you think this smaller size class is where you'd like to be, and what's worth it for you. Legendary tuner @DavidAU here on Hugging Face has been jamming on it hard; maybe it's worth a polite poke from me to see if they'd kindly share some of the methodology behind their success, depending on where this goes.

If it were me I'd be trying to make a Qwen3 30B A3B tune happen for this, but I just think Jan is a shoo-in for this model size and use, and it likely wouldn't require too much work, especially if you're planning on re-tuning another variant beyond the agreement you've made with the game dev team. Even a small (Jan 4B), medium (30B A3B), large (Kimi Linear 42B, or Qwen3 Next 80B when the bugs get sorted for GGUF inference) lineup would be pretty killer.

I'm interested in this for a few reasons, but primarily embodiment research. Having a system say "I am ___. I am feeling ___." instead of "Okay, so the user wants to have a roleplay in which I am ____, let me think about how someone might react[...]" is VERY important, and it's good to see how the biggest, most commonly used systems react when pushed. This is needed. It would be a much better place to even start a round of serious research from. And that's all without considering that this might actually be the right way to make NPCs in games come alive, or characters in conversations. This is an oddly sparse area of research in general.
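To make that concrete, here's roughly the contrast I mean, expressed as two prompt framings against any local chat endpoint (the endpoint, model id, and persona text are placeholders I made up, not anything from the Apollo setup):

```python
# Minimal sketch: assistant-style framing vs. first-person "embodied" framing.
# Endpoint, model id, and persona are placeholders; works against any
# OpenAI-compatible server (llama.cpp, vLLM, etc.).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# Typical framing: the model reasons *about* playing a character.
assistant_style = [
    {"role": "system", "content": "You are a helpful assistant. Roleplay as Kael, a tired blacksmith."},
    {"role": "user", "content": "Rough day at the forge?"},
]

# Embodied framing: the persona *is* the identity, first person throughout.
embodied_style = [
    {"role": "system", "content": "I am Kael, a blacksmith in Harrowgate. I speak only as myself, in the first person, about what I see and feel right now."},
    {"role": "user", "content": "Rough day at the forge?"},
]

for messages in (assistant_style, embodied_style):
    reply = client.chat.completions.create(model="apollo-v0.1-4b-thinking", messages=messages)
    print(reply.choices[0].message.content, "\n---")
```

The interesting part is watching whether the thinking trace stays in character for the second framing or still drops into "the user wants me to roleplay..." meta-commentary.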

If there's anything I can do to help, let me know. I have a pretty decent snapshot of the currently available tech and the way it responds to prodding, but that's pretty much it. I've only got a max-spec Framework Desktop and a rig with a 12GB 3060 for testing. (edit: which is more than a lot of folks are lucky enough to have)

Anywho, good luck, and cheers.

Thank you for the thoughtful feedback and encouragement; I really appreciate it. Your points about Jan are well-taken, and I’ll definitely look into their technical documentation and benchmarks. The embodiment research angle you mention is actually something I’ve considered and focused on in this project; you’re absolutely right that an authentic first-person perspective (“I am/feel”) versus meta-commentary about roleplay makes a significant difference for immersive interactions.

Regarding model choices, I’m keeping an open mind about the base for upcoming releases. Jan 4B does seem promising for this size class, and a tiered lineup (small/medium/large) could make sense depending on how the project evolves. I’ll keep the community updated (mainly through Reddit, I suppose) as things develop, and if you have any specific insights or connections that might be helpful down the line, I’m definitely open to hearing them. Thanks again for the support!

@CyborgPaloma

Thank you for the ping/shout out.

RE: NPC:
Jan V1 (also V2-2409), and generally any Qwen 4B-8B model, would be an excellent NPC "brain".
You could even use 1.7B or 0.6B, but these are harder to work with.
You can also combine 0.6Bs in a MOE structure, i.e. 2x0.6B => 2.4B; but with compression this is under 2B.

RE: Jan V1:
This model was targeted because of unique training by team Jan, which specifically raised metrics and also gives it a unique "human" conversational style.
It also passed a coding test - FIRST GO - that many larger models failed.
Coupled with Brainstorm 20 (6B Jan), 40 (8B Jan), and a hybrid 108-layer monster (11.5B Jan), WITH fine-tuning (via Unsloth), this model has proven to be a winner.
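For anyone wanting to try a tune like that themselves, a minimal Unsloth LoRA pass on a Jan 4B base looks roughly like this, following the style of the Unsloth example notebooks (repo id, dataset, and hyperparameters are placeholders, not the exact recipe used for the models above):

```python
# Rough Unsloth LoRA sketch on a Jan 4B base (placeholder names/params).
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="janhq/Jan-v1-4B",   # assumed repo id
    max_seq_length=4096,
    load_in_4bit=True,              # fits on a 12 GB card
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset; expects a "text" column of formatted conversations.
dataset = load_dataset("json", data_files="embodied_characters.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        max_steps=500,
        learning_rate=2e-4,
        output_dir="jan4b-embodied-lora",
    ),
)
trainer.train()
```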

As for making a larger Jan: this is possible by training Jans, then arranging them in a MOE structure.
Here is an early Alpha of this structure:
https://huggingface.co/DavidAU/Qwen3-MOE-6x6B-Star-Trek-Universe-Alpha-D-256k-ctx-36B

(this is SIX Jans tuned separately then "MOEd" together).
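For reference, the mergekit-moe config for that kind of build looks roughly like this; repo names and routing prompts below are placeholders following the general mergekit-moe schema, not the exact config behind the Alpha above:

```python
# Writes a rough mergekit-moe config stitching separately tuned Jan 4B
# checkpoints into one MoE (placeholder repo ids / prompts).
# Then run:  mergekit-moe moe_config.yml ./jan-moe-out
from pathlib import Path

config = """\
base_model: janhq/Jan-v1-4B                 # shared base weights (assumed id)
gate_mode: hidden                           # route on hidden states from the prompts below
dtype: bfloat16
experts:
  - source_model: your-org/jan4b-warrior-tune     # hypothetical expert checkpoints
    positive_prompts:
      - "I am Kael, a blacksmith. I am feeling worn out."
  - source_model: your-org/jan4b-scholar-tune
    positive_prompts:
      - "I am Mira, the town archivist. I am feeling curious."
"""

Path("moe_config.yml").write_text(config)
```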

Making / tuning a Qwen 30B-A3B is more involved because of unique issues with training a MOE "all at once".
I have made some tunes of these already.


Woah, didn't know that would actually tag you! Hey, well deserved and thanks! You've helped me in my own research immensely and I check your page multiple times a week, have for a good while. Some really, really useful clarifications here.

You're almost definitely right that a MoE of the smaller tuned Jan models would likely be a better choice in terms of ease of use, and likely even potential capabilities, as you've demonstrated like... a billion times lol. Funny that the work with Jan came to mind but not the way you make those MoEs. That approach has me thinking about the hypothetical dataset specifically. The humanity of the model is interesting... I wonder if the "cyberpunk edgerunners/anime" angle the folks over there have has made it into their dataset? Perhaps they did some work there purposefully? Fascinating!! I had assumed that was you, frankly; I've only used raw Jan, mostly through their own system for deep research and such, so it's been pretty stiff conversationally.

To @AllThingIntel : I wonder about even just building an in-house rig for, or manually creating/sectioning, datasets split into character types. Maybe male-identifying (he/him) splits alongside femme identities (she/her), and then one for genderless/non-binary characters so you can do robots and monsters and enby folks and all that without trip-ups (they/them). Of course, gender is only one of many ways you could split it. I also wonder about having one of these that's fully trained on radical embodiment: I AM this LLM. Let it take up the whole weights, not as a set of roleplay characters but as itself, the AI. Or having separate ones even for nonverbal or edge entities, like say a dog or cat, or a lovecraftian horror or a spirit or something.
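Something like this could do a rough mechanical first pass at that pronoun split before any manual curation (file and field names are placeholders I'm making up):

```python
# Rough first pass at splitting character data by pronoun bucket
# (placeholder file/field names; real data still needs manual curation).
import json
import re
from collections import defaultdict

HE = re.compile(r"\b(he|him|his)\b", re.I)
SHE = re.compile(r"\b(she|her|hers)\b", re.I)
THEY = re.compile(r"\b(they|them|theirs?)\b", re.I)

def bucket(card: str) -> str:
    """Assign a character description to a coarse pronoun bucket."""
    scores = {
        "he_him": len(HE.findall(card)),
        "she_her": len(SHE.findall(card)),
        "they_them": len(THEY.findall(card)),
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclear"

splits = defaultdict(list)
with open("characters.jsonl") as f:              # placeholder dataset
    for line in f:
        row = json.loads(line)
        splits[bucket(row["character_card"])].append(row)

for name, rows in splits.items():
    with open(f"characters_{name}.jsonl", "w") as out:
        out.writelines(json.dumps(r) + "\n" for r in rows)
```

The "unclear" bucket is where the monsters, spirits, and nonverbal critters would land for hand-sorting.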

You'd be able to release them all separately if you wanted, and mix all of the datasets together to create a 4B generalist. Even before creating that MoE to take advantage of their intersectional knowledge, running a bunch of these in a demo simultaneously is probably worth a university-level paper. Slap a few instances of maya1 on top and get it running on a single 3090/4090. That sort of queue-based dynamic prompting for response and voice generation to power full crowds, the models just flying around different NPCs with dynamic prompting... it'd be pretty magical. Even if it was an actual game, I bet you'd be fairly successful, even if you were infamous for the system requirements. It'd be cool to see it in Godot, but I can already see it being a Skyrim mod or something. TheDrummer just also did the RimTalk model, so people are thinking about this kind of thing. Very interesting work! I had a better-written blurb but it got eaten somewhere along my workday, so hopefully this is at all useful, as I don't have much time to edit. This is useful work, and if I had another life's worth of time I'd just contribute whatever was needed to see it done, but alas! Wishing you the best for now!
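For what it's worth, the queue-based idea could be sketched like this: one shared local endpoint feeding a whole crowd of NPC personas (endpoint, model id, and personas are all placeholders, nothing here is from an existing project):

```python
# Toy sketch of queue-based dynamic prompting: many NPCs, one shared endpoint.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="none")

NPCS = {
    "Kael":  "I am Kael, a blacksmith. I am tired and gruff but kind.",
    "Mira":  "I am Mira, the town archivist. I am curious and precise.",
    "Grubb": "I am Grubb, a stray dog. I do not speak; I describe what I do.",
}

async def worker(queue: asyncio.Queue) -> None:
    """Pull (npc, player_line) jobs off the queue and generate replies."""
    while True:
        npc, player_line = await queue.get()
        reply = await client.chat.completions.create(
            model="jan4b-embodied",               # placeholder model id
            messages=[
                {"role": "system", "content": NPCS[npc]},
                {"role": "user", "content": player_line},
            ],
            max_tokens=120,
        )
        print(f"{npc}: {reply.choices[0].message.content}")
        queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(queue)) for _ in range(4)]
    for npc in NPCS:
        queue.put_nowait((npc, "The dragon is attacking the market square!"))
    await queue.join()
    for w in workers:
        w.cancel()

asyncio.run(main())
```

Voice generation (the maya1 instances) would just be another consumer pulling finished lines off a second queue.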

Also yes, good to note there's been an update, v2 of Jan; whenever an update makes benchmark scores go down you KNOW it's a good one lol.

cheers
