Model Architecture Naming: KDA
#11 by dkleine
In the illustration of the model architecture (Figure 3 in the paper), the KDA block contains a component that is itself labeled "Kimi Delta Attention". This naming could be a bit confusing: the label appears inside the KDA block, but it actually refers to the modified gated delta rule itself.