- You can train a model in a language it has never seen before by starting from the pretrained (PT) model. There's no need for large datasets.
- With the PT model, you can easily replicate the voice of any character you want. Just 1k samples are enough.
- You can add emotion support with a small dataset.
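The post doesn't name a training stack, so here is a minimal sketch of what a small-data fine-tune of a pretrained TTS checkpoint can look like, using SpeechT5 as a stand-in; the dataset repo and its columns are hypothetical placeholders, not the author's actual recipe.

```python
# Hedged sketch: fine-tune a pretrained TTS checkpoint on ~1k samples.
# "your-org/character-voice-1k" and its columns are hypothetical placeholders.
from datasets import load_dataset
from transformers import (
    SpeechT5ForTextToSpeech,
    SpeechT5Processor,
    Trainer,
    TrainingArguments,
)

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")

# ~1k (text, audio) pairs for the target voice or new language.
dataset = load_dataset("your-org/character-voice-1k", split="train")

def preprocess(example):
    # Tokenize the text and convert the target audio into log-mel labels.
    audio = example["audio"]
    batch = processor(
        text=example["text"],
        audio_target=audio["array"],
        sampling_rate=audio["sampling_rate"],
    )
    batch["labels"] = batch["labels"][0]
    return batch

dataset = dataset.map(preprocess, remove_columns=dataset.column_names)

# A real SpeechT5 run also needs speaker embeddings and a padding data
# collator (omitted here for brevity); small data favors a low learning
# rate and a short schedule.
args = TrainingArguments(
    output_dir="tts-finetune",
    per_device_train_batch_size=4,
    learning_rate=1e-5,
    warmup_steps=100,
    max_steps=2000,
)

Trainer(model=model, args=args, train_dataset=dataset).train()
```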
Samsung Hacking Incident: Samsung Electronics' Official Hugging Face Account Compromised

Samsung Electronics' official Hugging Face account has been hacked. Approximately 17 hours ago, two new large language models (LLMs) were registered under Samsung Electronics' official Hugging Face account.
The model descriptions contain absurd, false claims, such as being trained on "1 million W200 GPUs", hardware that doesn't even exist. Community members on Hugging Face who have noticed the issue keep posting that Samsung Electronics' account has been compromised. If users who don't know about the hack download these LLMs, trusting Samsung's reputation, there is a risk of secondary and tertiary damage. Samsung Electronics appears to be unaware of the situation, as it has not yet taken any visible measures, such as changing the account password.

Source: https://discord.gg/openfreeai
smolagents v1.14.0 is out!

- MCPClient: a sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
- Amazon Bedrock: native support for Bedrock-hosted models.

smolagents is now more powerful, flexible, and enterprise-ready.
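For context, a minimal sketch of the new MCPClient, following the pattern in the smolagents docs; the SSE URL below is a local placeholder, and exact parameter names may vary between versions.

```python
# Hedged sketch: hand tools from a remote MCP server to a CodeAgent.
# The SSE URL is a placeholder for your own MCP server endpoint.
from smolagents import CodeAgent, HfApiModel, MCPClient

with MCPClient({"url": "http://127.0.0.1:8000/sse"}) as tools:
    agent = CodeAgent(tools=tools, model=HfApiModel())
    agent.run("Which tools does the server expose, and what do they do?")
```

Using the client as a context manager keeps the server connection scoped to the agent run, which is what makes remote integrations easy to compose.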
MESA: a text-based terrain generation model

MESA is a novel generative model based on latent denoising diffusion, capable of generating 2.5D representations of terrain (co-registered colour and depth maps) conditioned on text prompts.
Work developed by Paul Borne-Pons (@NewtNewt) during his joint internship at Adobe & ESA, and in collaboration with asterisk labs.
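The post doesn't show MESA's inference API; as an illustrative sketch only, a diffusers-style text-to-terrain call might look like the following, where the repo id and the output attribute names are hypothetical placeholders, not MESA's confirmed interface.

```python
# Hypothetical sketch only: the repo id and output attribute names are
# placeholders, not MESA's confirmed API.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "your-org/MESA",        # placeholder repo id
    trust_remote_code=True,  # custom pipelines ship their own code
    torch_dtype=torch.float16,
).to("cuda")

result = pipe("snow-capped ridgeline above a glacial valley")
color_map = result.images[0]  # RGB terrain texture
depth_map = result.depth[0]   # hypothetical co-registered depth output
```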
It looks like the Llama 4 team gamed the LMArena benchmark by making their Maverick model output emojis, longer responses, and ultra-high enthusiasm! Is that ethical or not? They could certainly have done a better job by working with teams like llama.cpp, just like the Qwen team did with Qwen 3 before releasing the model.
In 2024 I started playing with LLMs, just before the release of Llama 3. I think Meta has contributed a lot to this field and is still contributing: most LLM fine-tuning tools are based on their models, and the inference tool llama.cpp even carries their name. Llama 4 is fast, and maybe not the greatest in real performance, but it still deserves respect. But my enthusiasm towards Llama models is probably because they rank highest on my AHA Leaderboard:
This time, though, it looks like they did a worse job compared to Llama 3.1, which has been on top for a while.
Ranking high on my leaderboard is not correlated with technological progress or parameter size. In fact, if LLM training is drifting away from human alignment, thanks to synthetic datasets or something else (?), it could easily be inversely correlated with technological progress. There does seem to be a correlation with where the builders are located (West or East): Western models rank higher. This has become more visible as the leaderboard has progressed; in the past there was less correlation. And Europeans seem to be in the middle!
Whether you like positive vibes from AI or not, maybe we are getting closer to a time when humans can be gamed by an AI? What do you think?
Multimodal
> Moonshot AI released Kimi VL Thinking, the first working open-source multimodal reasoning model, and Kimi VL Instruct, both 16B MoEs with 3B active params (OS)
> InternVL3 released, built on Qwen2.5VL, with 7 checkpoints at various sizes (1B to 78B)
LLMs
> NVIDIA released Llama-3_1-Nemotron-Ultra-253B-v1, an LLM built on Llama 405B for reasoning, chat, and tool use
> Agentica released DeepCoder-14B-Preview, a fine-tuned version of DeepSeek-R1-Distilled-Qwen-14B on problem-test pairs, along with the compiled dataset
> Zyphra/ZR1-1.5B is a new small reasoning LLM built on R1-Distill-1.5B (OS)
> Skywork-OR1-32B-Preview is a new reasoning model by Skywork
Image Generation
> HiDream released three new models for image generation: HiDream I1 Dev, I1 Full, and I1 Fast (OS)