Zmushko Philip

fzmushko

AI & ML interests

None yet

Organizations

None yet

submitted a paper to Daily Papers 9 days ago

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

Paper • 2606.30634 • Published 11 days ago • 24

upvoted a paper about 2 months ago

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

Paper • 2605.07850 • Published May 8 • 18

upvoted a paper 3 months ago

Reasoning Shift: How Context Silently Shortens LLM Reasoning

Paper • 2604.01161 • Published Apr 1 • 32

upvoted 2 papers 5 months ago

Rethinking Global Text Conditioning in Diffusion Transformers

Paper • 2602.09268 • Published Feb 9 • 8

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

Paper • 2601.22813 • Published Jan 30 • 63

upvoted 2 papers 9 months ago

When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA

Paper • 2510.04849 • Published Oct 6, 2025 • 117

Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization

Paper • 2509.23202 • Published Sep 27, 2025 • 30

upvoted a paper 10 months ago

Benchmarking Optimizers for Large Language Model Pretraining

Paper • 2509.01440 • Published Sep 1, 2025 • 25

authored a paper 12 months ago

Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning

Paper • 2412.11689 • Published Dec 16, 2024 • 2

upvoted an article 12 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 781

liked a Space 12 months ago

KVPress Leaderboard

KVPress leaderboard: benchmark KV Cache compression methods

upvoted an article about 1 year ago

Efficient Deep Learning: A Comprehensive Overview of Optimization Techniques 👐 📚

Article • Isayoften • Aug 26, 2024 • 92

upvoted 8 papers about 1 year ago

NoLoCo: No-all-reduce Low Communication Training Method for Large Models

Paper • 2506.10911 • Published Jun 12, 2025 • 9

Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning

Paper • 2506.09501 • Published Jun 11, 2025 • 20

Reparameterized LLM Training via Orthogonal Equivalence Transformation

Paper • 2506.08001 • Published Jun 9, 2025 • 6

Mathesis: Towards Formal Theorem Proving from Natural Languages

Paper • 2506.07047 • Published Jun 8, 2025 • 6

Unified Scaling Laws for Compressed Representations

Paper • 2506.01863 • Published Jun 2, 2025 • 19

SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models

Paper • 2505.17967 • Published May 23, 2025 • 17

Alchemist: Turning Public Text-to-Image Data into Generative Gold

Paper • 2505.19297 • Published May 25, 2025 • 85

Quartet: Native FP4 Training Can Be Optimal for Large Language Models

Paper • 2505.14669 • Published May 20, 2025 • 79