Level 11.16 β Real Symbolic Benchmarks (bAbI/CLUTRR SOTA)
:]: 2026-02-16 :]in:]: 11.16 β Vnot:] inaland:]andya :] with] withandmin:] :]toand Tewithty: 100-102 | :]with: PASS (374 thosewiththatin, 370 pass, 4 skip)
:]inye :]andtoandβ
| :]Version | Zon:]ande | :]with |
|---|---|---|
| bAbI Task 1 (1-hop) | 10/10 (100%) | PASS |
| bAbI Task 2 (2-hop) | 8/8 (100%) | PASS |
| bAbI Task 3 (3-hop) | 5/5 (100%) | PASS |
| bAbI Task 8 (withpandwithtoand) | 8/8 (100%) | PASS |
| bAbI Combined | 31/31 (100%) | PASS |
| CLUTRR 1-hop (parentβchild) | 12/12 (100%) | PASS |
| CLUTRR 2-hop (grandparentβgc) | 9/9 (100%) | PASS |
| CLUTRR 3-hop (great-gpβgreat-gc) | 6/6 (100%) | PASS |
| CLUTRR 4-hop (gggpβgggc) | 3/3 (100%) | PASS |
| CLUTRR inverse (childβparent) | 12/12 (100%) | PASS |
| CLUTRR Combined | 42/42 (100%) | PASS |
| SOTA strong avg clean | 100% | PASS |
| SOTA strong avg noise=5 | 84% | PASS |
| SOTA weak avg noise=5 | 39% | PASS |
| SOTA advantage at noise=5 | 45pp | PASS |
:] this zonchandtβ
:] :]in:]β
Sandmin:] dinandzhaboutto Trinity :] :] innot:] inaland:]andyu on with] :]toakh bAbI and CLUTRR. :] aboutzon:], that VSA-aboutwithnaboutin:] reasoning not :]toabout :]from:] on in:]andkh thosewith], nabout and toaboutnfor]withbywith] with not:]withandminaboutlandchewithtoandmand withandwith]and on :]and:] taskkh.
:] :]fromchandtoaboutinβ
- bAbI (Facebook AI Research): 4 tandpa :] β single fact, two facts, three facts, lists/sets β inwithe 100%
- CLUTRR (Compositional Language Understanding): kinship reasoning dabout 4- :]in β 100% on inwithekh :]andonkh
- Indexed memory pattern β for] to inywithabouttoabouty accuracy: per-transition memories with :] chandwith] :] (3 :]) inmewiththat :]withtoabouty :]and
:] andwith]in:]β
:] fromfor]ande ethat :]innya: indexed vs flat memory and:] :] zon:]ande for multi-hop reasoning:
- Indexed (per-transition, cap=3): 100% clean, 89% prand noise=5
- Flat (all-in-one, cap=12): 44% clean, 33% prand noise=5
- :]andtsa: 56pp on CLUTRR taskkh
:]andchewithtoande :]andβ
Test 100: bAbI-Style QA on VSA KGβ
:]and:]andya 4- :] andz bAbI benchmark:
- Task 1 (Single Supporting Fact): 1-hop :]with
person β location. :]: 10 :] bind(person, place), treeBundleN. - Task 2 (Two Supporting Facts): 2-hop
item β owner β location. Paboutwith]ande inverse owns memory, :] chain :] location memory. - Task 3 (Three Supporting Facts): 3-hop
item β owner β location β region. Trand bywith]in:] unbind/match. - Task 8 (Lists/Sets): Multi-entity :]with :] 2-hop chain.
Vwithe 31 :]with β 100% accuracy.
Test 101: CLUTRR Kinship Reasoningβ
:] :]inabout: 3 with]and Γ 5 byfor]andy = 15 :]. Per-transition indexed memories: for] :] byfor]andya (gen0βgen1, gen1βgen2, ...) :]andtwithya in from:] :]and with 3 :]and.
| :]andon | :]ande | Result |
|---|---|---|
| 1 hop | parentβchild | 12/12 (100%) |
| 2 hop | grandparentβgrandchild | 9/9 (100%) |
| 3 hop | great-grandparentβgreat-grandchild | 6/6 (100%) |
| 4 hop | great-great-gpβgreat-great-gc | 3/3 (100%) |
| 1 hop | childβparent (inverse) | 12/12 (100%) |
| ALL | CLUTRR Combined | 42/42 (100%) |
Test 102: SOTA Comparison Benchmarkβ
:]innotnande strong vs weak weight classes on :]andkh :]toakh with :]:
bAbI Task 1 (1-hop):
| Vewith | n=0 | n=1 | n=3 | n=5 |
|---|---|---|---|---|
| strong(5) | 100% | 100% | 80% | 80% |
| weak(20) | 100% | 90% | 40% | 45% |
CLUTRR 2-hop Kinship:
| Vewith | n=0 | n=1 | n=3 | n=5 |
|---|---|---|---|---|
| strong(indexed) | 100% | 100% | 78% | 89% |
| weak(flat) | 44% | 22% | 33% | 33% |
Combined SOTA Summary:
| :]to | Vewith | Clean | Noise=5 | Advantage |
|---|---|---|---|---|
| bAbI T1 | strong | 100% | 80% | |
| bAbI T1 | weak | 100% | 45% | 35pp |
| CLUTRR 2h | strong | 100% | 89% | |
| CLUTRR 2h | weak | 44% | 33% | 56pp |
| Average | strong | 100% | 84% | |
| Average | weak | 72% | 39% | 45pp |
:]inaboute fromfor]ande: Indexed vs Flat Memoryβ
Na CLUTRR taskkh flat memory (12 :] in :] bundle) :]and:] dabout 44% :] :] :]. Indexed memory (3 :] on transition) with] 100%. Prandchandon: prand flat bundling 12 :], signal-to-noise ratio :] nandzhe :] :]andchandmaboutwithtand for for]inabouty tonandgand andz 15 :]. Indexed approach section:] :]with]withtinabout on :]in:] :]and.
:] :]in:] :] andz Level 11.10+: indexed memories β this :] mawith]andraboutinanandya VSA reasoning.
:]Author Level 11β
| Level | Feature | Result |
|---|---|---|
| 11.10 | Intermediate indexing | 225/225 100% |
| 11.11 | Path discovery + beam | BFS 100%, beam 60% |
| 11.12 | Arbitrary graph | Cycles 3/3, neighbors 12/12 |
| 11.13 | Massive KG 1000 | 989/1000 (98.9%) |
| 11.14 | Weighted edges | 72pp advantage |
| 11.15 | Massive weighted | 625/625, 42pp |
| 11.16 | bAbI+CLUTRR SOTA | 100% both, 45pp advantage |
Chewithtonya with]torandtVersionβ
- bAbI β :]toabout 4 andz 20 :]: :]andzaboutin:] Tasks 1, 2, 3, 8. Ne :]andzaboutin:] counting (Task 7), yes/no (Task 6), indefinite knowledge (Task 10) and :]ande. :] bAbI coverage β :] :]froma.
- CLUTRR β landnot:] :]toand: Tewithtand:]withya :]toabout :] landnandya parentβchild. :] CLUTRR infor] branch queries (uncle, cousin), tofrom:] :] cross-relation composition.
- Noise model :]: Ternary random noise injection β not that zhe with], that adversarial perturbation or missing data. :] noise patterns with]note.
- Codebook size: CLUTRR and:] with]and 3 for]and:]in on generation. :] :]and and:] withfromnand for]and:]in.
Tech Tree: :]ande stepandβ
- :] bAbI-20: Vwithe 20 :] benchmark β counting, pathfinding, deduction, induction
- Branch kinship: uncle, cousin, nephew β cross-relation multi-hop
- Large-scale CLUTRR: Sfromnand with], dewithyattoand byfor]andy, :]andwithtand:] for]inye tonandgand