---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- text-generation-inference
- reasoning
- math
- coding
- frontier
- aquif-3.5
- moe
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
base_model:
- aquif-ai/aquif-3.5-Plus-30B-A3B
---

<small>*Disclaimer: aquif-3.5-Plus was made with the merging technique used in Qwen3-30B-A3B-YOYO-V3 to enable hybrid reasoning. aquif-3.5-Max was fine-tuned from Plus, expanded with DavidAU's brainstorm technique, and then further trained with RL and SFT focused on reasoning and coding.*</small>

# aquif-3.5-Plus & aquif-3.5-Max

The pinnacle of the aquif-3.5 series, released November 3, 2025. These models combine advanced reasoning capabilities, hybrid reasoning modes, and 1M-token context windows to achieve state-of-the-art performance in their respective categories.

**aquif-3.5-Plus** combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications.

**aquif-3.5-Max** delivers frontier-level capabilities on top of the Plus architecture, outscoring Plus on every benchmark in the AAII table.

## Model Repository Links

| Model | HuggingFace Repository |
|-------|----------------------|
| aquif-3.5-Plus | [aquif-ai/aquif-3.5-Plus-30B-A3B](https://huggingface.co/aquif-ai/aquif-3.5-Plus-30B-A3B) |
| aquif-3.5-Max | [aquif-ai/aquif-3.5-Max-42B-A3B](https://huggingface.co/aquif-ai/aquif-3.5-Max-42B-A3B) |

## Model Overview

| Model | Total Params (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|-------|-----------|-------------------|-----------|-----------------|----------------|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Reasoning-Only | 1M | ✅ Interchangeable |

## Model Details

### aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes)

A hybrid reasoning model that can switch between thinking and non-thinking modes without reloading: keep reasoning enabled when a task needs it, or disable it to prioritize speed in time-sensitive applications.

## Artificial Analysis Intelligence Index (AAII) Benchmarks

### Core Performance Metrics

| Benchmark                | Plus (Non-Reasoning) | Plus (Reasoning) | Max (Non-Reasoning) | Max (Reasoning) |
| :----------------------- | -------------------: | ---------------: | ------------------: | --------------: |
| MMLU-Pro                 |                 80.2 |             82.8 |                82.8 |            85.4 |
| GPQA Diamond             |                 72.1 |             79.7 |                75.6 |            83.2 |
| AIME 2025                |                 64.7 |             90.3 |                69.0 |            94.6 |
| LiveCodeBench            |                 50.5 |             76.4 |                55.9 |            81.6 |
| Humanity’s Last Exam     |                  4.3 |             12.1 |                 7.8 |            15.6 |
| TAU2-Telecom             |                 34.2 |             41.5 |                43.2 |            51.3 |
| IFBench                  |                 39.3 |             54.3 |                49.3 |            65.4 |
| TerminalBench-Hard       |                 10.1 |             15.2 |                18.0 |            23.9 |
| AA-LCR                   |                 30.4 |             59.9 |                31.7 |            61.2 |
| SciCode                  |                 29.5 |             35.7 |                34.7 |            40.9 |
| **AAII Composite Score** |        **42 (41.5)** |    **55 (54.8)** |       **47 (46.8)** |   **60 (60.3)** |
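The composite values in parentheses match the unweighted mean of the ten benchmark scores in each column, and the bold integer is that mean rounded to the nearest whole number. A minimal Python sketch that reproduces this arithmetic from the table, assuming equal weighting of the ten evaluations:

```python
# Scores copied from the AAII table, in row order:
# MMLU-Pro, GPQA Diamond, AIME 2025, LiveCodeBench, Humanity's Last Exam,
# TAU2-Telecom, IFBench, TerminalBench-Hard, AA-LCR, SciCode
SCORES = {
    "Plus (Non-Reasoning)": [80.2, 72.1, 64.7, 50.5, 4.3, 34.2, 39.3, 10.1, 30.4, 29.5],
    "Plus (Reasoning)": [82.8, 79.7, 90.3, 76.4, 12.1, 41.5, 54.3, 15.2, 59.9, 35.7],
    "Max (Non-Reasoning)": [82.8, 75.6, 69.0, 55.9, 7.8, 43.2, 49.3, 18.0, 31.7, 34.7],
    "Max (Reasoning)": [85.4, 83.2, 94.6, 81.6, 15.6, 51.3, 65.4, 23.9, 61.2, 40.9],
}


def mean(values: list[float]) -> float:
    """Equal-weighted mean of the benchmark scores (assumed composite formula)."""
    return sum(values) / len(values)


for config, values in SCORES.items():
    avg = mean(values)
    # Prints the table's format: rounded integer, then one decimal in parentheses.
    print(f"{config}: {round(avg)} ({avg:.1f})")
```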

### Long Context Evals (RULER)

| Model                      | Avg. Acc. |    4k |    8k |  16k |  32k |  64k |  96k | 128k | 192k | 256k | 384k | 512k | 640k | 768k | 896k | 1000k |
| :------------------------- | -------: | ----: | ----: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ----: |
| aquif-3.5-Plus (Reasoning) | **91.4** |  99.6 | 100.0 | 99.2 | 98.2 | 97.4 | 96.8 | 96.8 | 94.8 | 89.6 | 90.2 | 84.0 | 82.6 | 81.9 | 80.1 |  77.5 |
| aquif-3.5-Max (Reasoning)  | **92.1** | 100.0 | 100.0 | 99.7 | 98.5 | 97.8 | 97.1 | 96.9 | 95.8 | 92.1 | 91.1 | 85.5 | 84.8 | 80.0 | 79.9 |  79.6 |


### Comparable Models by Configuration

**aquif-3.5-Plus (Non-Reasoning) — AAII 42**

| Model | AAII Score |
|-------|-----------|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| Qwen3 Coder 480B A35B | 42 |
| **aquif-3.5-Plus (Non-Reasoning)** | **42** |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |

**aquif-3.5-Plus (Reasoning) — AAII 55**

| Model | AAII Score |
|-------|-----------|
| GLM-4.6 | 56 |
| Claude Haiku 4.5 | 55 |
| **aquif-3.5-Plus (Reasoning)** | **55** |
| Gemini 2.5 Flash 2509 | 54 |
| Qwen3 Next 80B A3B | 54 |

**aquif-3.5-Max (Non-Reasoning) — AAII 47**

| Model | AAII Score |
|-------|-----------|
| Gemini 2.5 Flash 2509 | 47 |
| **aquif-3.5-Max (Non-Reasoning)** | **47** |
| DeepSeek-V3.2 Exp | 46 |
| Ling-1T | 45 |
| GLM-4.6 | 45 |
| Qwen3 235B A22B 2507 | 45 |

**aquif-3.5-Max (Reasoning) — AAII 60**

| Model | AAII Score |
|-------|-----------|
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| **aquif-3.5-Max (Reasoning)** | **60** |
| Claude Opus 4.1 | 59 |
| DeepSeek-V3.1-Terminus | 58 |

## Key Features

**Massive Context Windows**: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation.
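At very long contexts, memory use is dominated by the key/value cache. A rough sizing sketch in Python using the standard per-token formula (2 tensors × layers × KV heads × head dimension × bytes per element). The layer and head counts below are hypothetical placeholders, not values read from these models' `config.json`, so substitute the real ones before relying on the numbers; the estimate covers one sequence with an unquantized BF16/FP16 cache.

```python
def kv_cache_bytes(tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size for a single sequence.

    Each layer stores one key and one value vector per KV head per token,
    hence the leading factor of 2. bytes_per_elem=2 corresponds to BF16/FP16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * tokens


# HYPOTHETICAL attention config: placeholders only, check the model's config.json.
LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128

for tokens in (32_768, 131_072, 1_000_000):
    gb = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 1e9  # 1 GB = 1e9 bytes
    print(f"{tokens:>9,} tokens -> {gb:6.1f} GB of KV cache (BF16)")
```

Because the cache grows linearly with sequence length, a 1M-token request needs roughly 30× the cache of a 32k-token one under the same configuration.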

**Efficient Architecture**: Despite frontier-level performance, both models stay efficient through a sparse mixture-of-experts design that activates only 3.3B parameters per token (out of 30.5B total for Plus and 42.4B for Max).
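To illustrate what "active parameters" means, here is a generic top-k mixture-of-experts routing step in plain Python: a router scores every expert, only the k highest-scoring experts run, and their outputs are mixed with softmax weights normalized over the selected experts. The expert count, k, and the toy scalar "experts" are hypothetical; this is a textbook sketch, not this model's actual configuration or routing code.

```python
import math
from typing import Callable, Sequence


def top_k_route(router_logits: Sequence[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their logits.

    Returns (expert_index, gate_weight) pairs whose weights sum to 1.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int) -> float:
    """Run only the selected experts; all other experts are skipped entirely."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# HYPOTHETICAL toy setup: 8 scalar "experts", 2 active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
print(top_k_route(logits, k=2))
print(moe_forward(1.0, logits, experts, k=2))
```

Only k of the experts' weights participate in each token's computation, which is why per-token compute tracks the active parameter count rather than the total.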

**Flexible Reasoning**: aquif-3.5-Plus and Max provide interchangeable thinking modes: enable reasoning for complex problems, or disable it for faster inference on straightforward tasks.

**Multilingual Support**: Native support for English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.

## Usage Recommendations

**aquif-3.5-Plus:**
- Complex reasoning requiring flexibility between speed and depth
- Scientific analysis and mathematical problem-solving with thinking enabled
- Rapid-response applications with thinking disabled
- Code generation and review
- Multilingual applications up to 1M token contexts

**aquif-3.5-Max:**
- Frontier-level problem-solving without compromise
- Advanced research and scientific computing
- Competition mathematics and algorithmic challenges
- Comprehensive code analysis and generation
- Complex multilingual tasks requiring maximum reasoning capability

## Setting Thinking Mode (aquif-3.5-Plus)

Toggle between thinking and non-thinking modes by setting the `thinking` variable in the chat template (Jinja), using one of the following lines:

```jinja
{%- set thinking = true %}   {#- Enable reasoning mode #}
{%- set thinking = false %}  {#- Disable reasoning mode (faster inference) #}
```

Set the variable in your chat template before inference to switch modes; no model reloading is required.

## Technical Specifications

Both models support:
- BF16 and FP16 precision
- Mixture-of-Experts architecture with sparse expert routing (3.3B active parameters per token)
- Efficient attention mechanisms with optimized KV caching
- Up to 1M-token context window
- Multi-head attention
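Because BF16 and FP16 both store 2 bytes per parameter, the weight footprint follows directly from the total parameter counts in the Model Overview table. A back-of-the-envelope Python sketch, assuming all experts stay resident in memory (the usual case without offloading) and counting weights only; the KV cache, activations, and framework overhead come on top:

```python
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2}

# Total parameter counts (billions) from the Model Overview table.
TOTAL_PARAMS_B = {"aquif-3.5-Plus": 30.5, "aquif-3.5-Max": 42.4}


def weight_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Weight memory in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * BYTES_PER_PARAM[dtype]


for name, params in TOTAL_PARAMS_B.items():
    print(f"{name}: ~{weight_gb(params):.1f} GB of weights in BF16")
```

Note that memory scales with the total parameter count, while per-token compute scales with the 3.3B active parameters.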

## Acknowledgements

- **Qwen Team**: Base architecture contributions
- **Meta Llama Team**: Core model foundations
- **Hugging Face**: Model hosting and training infrastructure

## License

This project is released under the Apache 2.0 License. See the LICENSE file for details.

---

*Made in 🇧🇷*

© 2025 aquif AI. All rights reserved.