ManniX-ITA posted an update 7 days ago
v1.1.0 was Claude + Ollama chat. Eight releases later, the stack is a grounded research pipeline plus a local-first memory layer; the token crunch is now an operational concern, not a quality wall.

🚀 claude-hooks v1.8.3: highlights since v1.1.0.

🧠 /consultants v2: the agentic council, matured.
🛠 tool_executor: a PLAN→REPORT lane runs read_file / grep / glob over the codebase before the researcher speaks, so claims are grounded in tool output, not vibes.
✏️ coder: a sandboxed write_file role with per-language model routing (caps: 50 KB per file, 1 MB per lane).
🛡️ CitationLinter: a three-layer verifier at the researcher boundary; every path:line claim is checked against an mtime-cached code_graph. It catches fabricated filenames before they launder through the critics and the synthesizer.
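A minimal sketch of the path:line check, assuming claims look like file.ext:123 and using a per-file line-count cache keyed on mtime as a stand-in for the code_graph. CLAIM_RE, LineCountCache, lint_citations and demo are hypothetical names, and the sketch covers only two checks (the file exists, the line is in range), not the linter's three layers.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Hypothetical claim pattern: "some/dir/file.ext:123".
CLAIM_RE = re.compile(r"([\w./-]+\.\w+):(\d+)")


class LineCountCache:
    """Per-file line counts, recomputed only when the file's mtime changes."""

    def __init__(self) -> None:
        self._cache: dict[str, tuple[float, int]] = {}

    def line_count(self, path: Path) -> int | None:
        try:
            mtime = path.stat().st_mtime
        except OSError:
            return None  # missing file: the cited path was fabricated
        key = str(path)
        cached = self._cache.get(key)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged since last count
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            count = sum(1 for _ in fh)
        self._cache[key] = (mtime, count)
        return count


def lint_citations(text: str, root: Path, cache: LineCountCache) -> list[str]:
    """Return one problem string per path:line claim that does not resolve."""
    problems: list[str] = []
    for path_str, line_str in CLAIM_RE.findall(text):
        line_no = int(line_str)
        count = cache.line_count(root / path_str)
        if count is None:
            problems.append(f"{path_str}:{line_no}: file not found")
        elif not 1 <= line_no <= count:
            problems.append(f"{path_str}:{line_no}: line out of range ({count} lines)")
    return problems


def demo() -> list[str]:
    """Lint a sample answer against a throwaway two-line file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.py").write_text("a = 1\nb = 2\n", encoding="utf-8")
        return lint_citations("see app.py:2, ghost.py:10 and app.py:9", root, LineCountCache())


if __name__ == "__main__":
    print(demo())
```

Here app.py:2 passes, while ghost.py:10 (no such file) and app.py:9 (past the end of a two-line file) are reported.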

💾 M14 cross-session memory (on by default).
LangGraph BaseStore is wired across four namespaces: research / tool_results / project / user. Per-namespace TTL: research=30d, tool_results=24h, project+user=forever. An hourly Caliber-style distillation reaper summarizes expiring research into the durable project namespace BEFORE deletion: episodic → semantic, like human memory consolidation. Originals are dropped only after a successful summary write.
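The consolidate-then-delete ordering can be sketched with plain dicts standing in for LangGraph's BaseStore (this is not the BaseStore API). TTLs mirror the ones listed above; ToyStore and reap are hypothetical, summarize is a hypothetical callable (in practice an LLM call), and the hourly scheduling is left out. The property that matters is that research items are deleted only if the summary write succeeded.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

DAY = 86_400
# Per-namespace TTLs from the post; None means "keep forever".
TTL_SECONDS: dict[str, int | None] = {
    "research": 30 * DAY,
    "tool_results": DAY,
    "project": None,
    "user": None,
}


@dataclass
class Item:
    value: str
    created_at: float


@dataclass
class ToyStore:
    """Namespaced key/value store standing in for BaseStore (not its API)."""

    data: dict[str, dict[str, Item]] = field(
        default_factory=lambda: {ns: {} for ns in TTL_SECONDS}
    )

    def put(self, ns: str, key: str, value: str, now: float) -> None:
        self.data[ns][key] = Item(value, now)

    def expired(self, ns: str, now: float) -> list[str]:
        ttl = TTL_SECONDS[ns]
        if ttl is None:
            return []
        return [k for k, item in self.data[ns].items() if now - item.created_at >= ttl]


def reap(store: ToyStore, summarize: Callable[[list[str]], str], now: float) -> None:
    """One reaper pass: distill expiring research into project, then delete it."""
    keys = store.expired("research", now)
    if keys:
        try:
            summary = summarize([store.data["research"][k].value for k in keys])
            store.put("project", f"distilled:{int(now)}", summary, now)
        except Exception:
            keys = []  # summary not written: keep the originals for the next pass
    for k in keys:
        del store.data["research"][k]
    # tool_results has no durable counterpart, so it simply expires.
    for k in store.expired("tool_results", now):
        del store.data["tool_results"][k]
```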

🔍 sqlite_vec: full pgvector parity (v1.7).
Hybrid recall via RRF (reciprocal rank fusion) over vector cosine + BM25 (FTS5). KG surface: kg_create_entities / kg_add_observations / kg_create_relations / kg_search_nodes. The bundled sqlite-vec-mcp launcher went from 3 to 8 tools, so Cursor / Codex / OpenWebUI / Claude Desktop share the same .db. Lazy schema migration upgrades v1.6.x dbs in place, non-destructively.
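Reciprocal rank fusion scores each document as the sum of 1/(k + rank) over the ranked lists it appears in, so a hit that ranks well in both the vector and the BM25 list beats one that tops only a single list. A minimal sketch, assuming both lists have already been fetched (for example from a sqlite-vec cosine query and an FTS5 bm25() query) and using k = 60, the constant from the original RRF paper; the value claude-hooks actually uses is not stated here, and rrf_fuse is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With a vector ranking of [a, b, c] and a BM25 ranking of [b, d, a], the fused order is b, a, d, c: b's ranks (2, 1) edge out a's (1, 3), and documents found by only one retriever trail both.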

🧩 llamafile chat + embed (v1.4 + v1.5).
HyDE / reflect / consolidate / get-advice / consultants route to a daemon-supervised local llamafile via the llamafile:// model prefix. Multi-instance LRU, per-label idle reap, sticky CPU fallback. The stack now runs offline.
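A rough sketch of llamafile:// routing with a multi-instance LRU and per-label idle reaping. LlamafilePool is hypothetical, as are its start and stop callables (start would spawn a llamafile server for a label, stop would terminate it) and the default limits; daemon supervision and the sticky CPU fallback are not modeled.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable

PREFIX = "llamafile://"


class LlamafilePool:
    """Hypothetical pool of local instances keyed by label (LRU + idle reap)."""

    def __init__(self, start: Callable[[str], object], stop: Callable[[object], None],
                 max_instances: int = 2, idle_seconds: float = 300.0) -> None:
        if max_instances < 1:
            raise ValueError("max_instances must be >= 1")
        self._start, self._stop = start, stop
        self._max, self._idle = max_instances, idle_seconds
        # label -> (handle, last_used); insertion order doubles as LRU order.
        self._live: OrderedDict[str, tuple[object, float]] = OrderedDict()

    def get(self, model: str, now: float | None = None) -> object:
        """Return a running instance for a llamafile:// model, starting one if needed."""
        if not model.startswith(PREFIX):
            raise ValueError(f"not a llamafile model: {model!r}")
        label = model[len(PREFIX):]
        now = time.monotonic() if now is None else now
        if label in self._live:
            handle, _ = self._live.pop(label)
        else:
            while len(self._live) >= self._max:  # evict the least recently used label
                _, (old, _) = self._live.popitem(last=False)
                self._stop(old)
            handle = self._start(label)
        self._live[label] = (handle, now)  # (re)insert as most recently used
        return handle

    def reap_idle(self, now: float | None = None) -> list[str]:
        """Stop every label unused for at least idle_seconds; return their labels."""
        now = time.monotonic() if now is None else now
        idle = [lbl for lbl, (_, last) in self._live.items() if now - last >= self._idle]
        for lbl in idle:
            handle, _ = self._live.pop(lbl)
            self._stop(handle)
        return idle
```

Calling reap_idle periodically frees labels that have sat unused past idle_seconds, while get() evicts the least recently used label only when a new one would exceed max_instances.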

๐Ÿง Linux / macOS / Windows. PostgreSQL OR SQLite. Local OR cloud LLMs.

🔗 github.com/mann1x/claude-hooks

The v1.1.0 post had a single "one model wins all" PROD line; v1.8 has per-axis baselines from three live cohorts (May 16-17 '26).

✏️ /consultants coder: per-language routing (mlang v1.0.1, suite hash ddef8095, 6 langs × 13 questions).

Lang     Primary                   Fallback
c        glm-5.1:cloud             deepseek-v4-pro:cloud
cpp      deepseek-v4-flash:cloud   kimi-k2.6:cloud
csharp   deepseek-v4-pro:cloud     kimi-k2.6:cloud
go       kimi-k2.6:cloud           deepseek-v4-pro:cloud
python   glm-5.1:cloud             kimi-k2.6:cloud
rust     deepseek-v4-flash:cloud   deepseek-v4-pro:cloud

No single model wins across languages. Out-of-cohort langs (ts/java/ruby/swift/shell) fall back to the global default (glm-5.1:cloud → kimi-k2.6:cloud).
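The routing table reduces to a dictionary lookup with a global default for out-of-cohort languages. A sketch; ROUTES, coder_models and run_with_fallback are hypothetical names, call stands in for whatever actually sends the request, and falling back on any exception is a simplifying assumption.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")

# (primary, fallback) per language, copied from the table above.
ROUTES: dict[str, tuple[str, str]] = {
    "c": ("glm-5.1:cloud", "deepseek-v4-pro:cloud"),
    "cpp": ("deepseek-v4-flash:cloud", "kimi-k2.6:cloud"),
    "csharp": ("deepseek-v4-pro:cloud", "kimi-k2.6:cloud"),
    "go": ("kimi-k2.6:cloud", "deepseek-v4-pro:cloud"),
    "python": ("glm-5.1:cloud", "kimi-k2.6:cloud"),
    "rust": ("deepseek-v4-flash:cloud", "deepseek-v4-pro:cloud"),
}
# Global default for out-of-cohort languages (ts, java, ruby, swift, shell, ...).
GLOBAL_DEFAULT: tuple[str, str] = ("glm-5.1:cloud", "kimi-k2.6:cloud")


def coder_models(lang: str) -> tuple[str, str]:
    """Return (primary, fallback) for a language, or the global default."""
    return ROUTES.get(lang.lower(), GLOBAL_DEFAULT)


def run_with_fallback(lang: str, call: Callable[[str], T]) -> T:
    """Try the primary model; on any error, retry once with the fallback."""
    primary, fallback = coder_models(lang)
    try:
        return call(primary)
    except Exception:
        return call(fallback)
```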

🛠 tool_executor (48 trials, suite 7921555c):
• gemma4:31b-cloud - 87.5% / Q=5.00 ← winner (tiebreak: quality → wall → call count)
• kimi-k2.6 / deepseek-v4-pro / glm-5.1 - all 87.5%, but lost the tiebreakers
• gemini-3-flash-preview - 75% (qualifies)
• qwen3-coder-next - 62.5%, DISQUALIFIED. The Python coder winner is NOT a good tool_executor: "best at writing code" ≠ "best at mechanical tool chains for reading code".
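The ordering above (pass rate first, then the quality → wall → call count tiebreak) fits in a single sort key. A sketch with a hypothetical TrialResult record: higher pass rate and quality win, lower wall time and fewer tool calls win.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    model: str
    pass_rate: float  # fraction of trials passed, e.g. 0.875
    quality: float    # average judge score
    wall_s: float     # wall-clock seconds (lower is better)
    calls: int        # tool-call count (lower is better)


def rank(results: list[TrialResult]) -> list[TrialResult]:
    """Order by pass rate, then quality (both descending), then wall time and calls (ascending)."""
    return sorted(results, key=lambda r: (-r.pass_rate, -r.quality, r.wall_s, r.calls))
```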

โฑ Stall thresholds (M11a, suite c8306c62):
โ€ข kimi-k2.6:cloud cold TTFT = 150s โ†’ needs stall=390s. The global 300s default would falsely STARTUP_STALL it.
โ€ข qwen3-coder-next fastest startup (390ms).
โ€ข deepseek-v4-flash slowest p99 wall (255s) โ†’ hard_cap raised to 780s.
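Per-model stall thresholds are naturally an override map layered over the global default. A sketch using only the values quoted above; what the stall timer measures (here, seconds without any output) is an assumption, and stall_threshold and classify are hypothetical names. A hard_cap override for deepseek-v4-flash would be a second map of the same shape.

```python
from __future__ import annotations

DEFAULT_STALL_S = 300.0  # global default from the post
# Per-model overrides; only the value quoted in the post.
STALL_OVERRIDES_S: dict[str, float] = {"kimi-k2.6:cloud": 390.0}


def stall_threshold(model: str) -> float:
    """Per-model override if one exists, else the global default."""
    return STALL_OVERRIDES_S.get(model, DEFAULT_STALL_S)


def classify(model: str, seconds_without_output: float) -> str:
    """Hypothetical classifier: flag STARTUP_STALL once the wait exceeds the threshold."""
    return "STARTUP_STALL" if seconds_without_output > stall_threshold(model) else "OK"
```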

🧮 Rubric: pass_rate ≥ 70% AND avg_quality ≥ 3.5. Baselines are append-only, in docs/consultants-skill-eval-baselines.md. Re-bench any new model with claude-consultants skill-eval <suite> --live --accept-cost.
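The rubric is a conjunctive gate, and an append-only baseline amounts to opening the file in append mode. A sketch; qualifies and append_baseline are hypothetical names, rates are expressed as fractions, and the row format is an illustration, not the actual layout of docs/consultants-skill-eval-baselines.md.

```python
from __future__ import annotations

from pathlib import Path

MIN_PASS_RATE = 0.70
MIN_AVG_QUALITY = 3.5


def qualifies(pass_rate: float, avg_quality: float) -> bool:
    """Both conditions must hold: pass_rate >= 70% AND avg_quality >= 3.5."""
    return pass_rate >= MIN_PASS_RATE and avg_quality >= MIN_AVG_QUALITY


def append_baseline(path: Path, suite: str, model: str,
                    pass_rate: float, avg_quality: float) -> str:
    """Append one row (mode "a" never rewrites earlier rows) and return it."""
    verdict = "qualifies" if qualifies(pass_rate, avg_quality) else "DISQUALIFIED"
    row = f"| {suite} | {model} | {pass_rate:.1%} | {avg_quality:.2f} | {verdict} |\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(row)
    return row
```

Under this gate, a 75% pass rate clears the bar (given enough quality) while 62.5% does not, matching the tool_executor verdicts above.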

🔗 Bench dirs: benchmarks/consultants/results/2026-05-17/
