ICLR 2026 Accepted

Learning Human Habits with Rule-Guided Active Inference

A biologically inspired wake-sleep framework that unifies world-model planning and symbolic habit rules to model human-like decision dynamics in sequential and visual domains.

Zhiren Gong¹,², Chao Yang¹, Wendi Ren¹, Shuang Li¹

¹ The Chinese University of Hong Kong, Shenzhen ² Nanyang Technological University

Model framework for rule-guided active inference — Figure. End-to-end model framework with world model, habitual policy, wake phase, and sleep phase.

Accuracy Gain in Real Trajectories

On NBA SportVU, our wake-sleep consolidation reaches 91.32% Acc@3, 7.75 percentage points above DreamerV2 (83.57%), the most accurate baseline on this dataset. The gap indicates that reusable rule priors improve prediction quality without sacrificing model-based reasoning.

Latency Advantage in Online Decision

In the same setup, online inference latency drops to 35.92 ms versus 52.73 ms for DreamerV2 (about 32% lower), showing that rule-triggered habitual shortcuts reduce deliberation cost while preserving robust behavior in sequential planning tasks.

Diverse domains: 4
NBA Acc@3 (ours): 91.32%
Car-Following Acc@3 (ours): 95.87%
Atari Acc@3 (ours): 77.20%
Latency range across domains (ours): 10.44 to 159.45 ms

Abstract

Humans combine deliberate planning in novel contexts with fast habitual responses in familiar contexts. This paper models that switching behavior in a unified active-inference framework where habits are represented as interpretable symbolic rules.

The core training design is a biologically inspired wake-sleep cycle: wake extracts candidate rules from real trajectories when they consistently reduce free energy; sleep performs generative replay to consolidate, prune, and semantically anchor those rules.

Across sports trajectories, driving behavior, medical diagnosis, and Atari strategy, the approach improves predictive accuracy and efficiency against logic-based, deep learning, active inference, model-based RL, and LLM-based baselines, while yielding interpretable habit structures.

Method

1) Wake Phase: Harvest Habits

Run active inference on real trajectories, infer latent state and intention, and mine high-confidence rules when free-energy reductions recur.

Output: candidate rule tuples and confidence updates.
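The harvesting step can be illustrated with a minimal sketch. It assumes each trajectory step has already been reduced to a discrete (state prototype, intention, action) triple plus a measured free-energy change. The names `Step`, `harvest_rules`, `min_support`, and `min_confidence` are hypothetical and are not taken from the paper; the confidence score here is simply the fraction of occurrences that lowered free energy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    state: int       # hypothetical: index of the nearest latent-state prototype
    intention: int   # hypothetical: inferred discrete intention
    action: int      # action observed in the real trajectory
    delta_f: float   # free-energy change after the action (negative = reduction)


def harvest_rules(steps, min_support=3, min_confidence=0.8):
    """Return {(state, intention, action): confidence} for recurring reductions."""
    seen = defaultdict(int)     # how often each triple was observed
    reduced = defaultdict(int)  # how often it coincided with a reduction
    for s in steps:
        key = (s.state, s.intention, s.action)
        seen[key] += 1
        if s.delta_f < 0:
            reduced[key] += 1
    rules = {}
    for key, n in seen.items():
        confidence = reduced[key] / n
        # keep only triples that recur and reduce free energy consistently
        if n >= min_support and confidence >= min_confidence:
            rules[key] = confidence
    return rules


steps = (
    [Step(0, 1, 2, -0.5)] * 4                     # consistent reduction
    + [Step(0, 1, 3, -0.1), Step(0, 1, 3, 0.4),
       Step(0, 1, 3, 0.2)]                        # inconsistent reduction
    + [Step(5, 0, 1, -0.9)]                       # seen only once
)
candidate_rules = harvest_rules(steps)
print(candidate_rules)
```

Only the first triple survives in this toy run: the second reduces free energy in one of three occurrences, and the third is too rare to count as a habit.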

2) Sleep Phase: Consolidate Rules

Perform generative replay with the world model, jointly update model/policy parameters, refine rule prototypes, and prune weak rules.

Output: compact, reusable, semantically anchored habit library.
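As a rough illustration of the pruning part of this phase, the sketch below updates each rule's confidence from replayed rollouts and drops rules that fall below a threshold. It is a simplification under stated assumptions: the replay outcomes are supplied as plain 0/1 lists rather than generated by a world model, the joint model/policy update and prototype refinement are omitted, and `consolidate`, `lr`, and `prune_below` are hypothetical names.

```python
def consolidate(rules, replay_hits, lr=0.5, prune_below=0.3):
    """Update rule confidences from replay and prune weak rules.

    rules:       {rule_id: confidence in [0, 1]}
    replay_hits: {rule_id: list of 0/1 outcomes}, where 1 means the rule's
                 action was supported in one replayed (model-generated) rollout.
    Rules that never fire during replay decay toward zero.
    """
    kept = {}
    for rule_id, conf in rules.items():
        outcomes = replay_hits.get(rule_id, [])
        target = sum(outcomes) / len(outcomes) if outcomes else 0.0
        new_conf = (1 - lr) * conf + lr * target  # exponential moving average
        if new_conf >= prune_below:
            kept[rule_id] = new_conf
    return kept


rules = {"r1": 0.9, "r2": 0.5, "r3": 0.4}
replay_hits = {"r1": [1, 1, 1, 0], "r2": [0, 0, 0, 0]}  # r3 never fired
library = consolidate(rules, replay_hits)
print(library)
```

In this toy run only `r1` is kept: `r2` is contradicted by every replayed rollout and `r3` is never triggered, so both decay below the pruning threshold.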

3) Online Fusion: Plan + Habit

Blend expected-free-energy planning with rule-triggered priors, enabling low-latency responses in familiar states and flexible planning in novel states.

Result: interpretable acceleration without losing adaptability.
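One way to picture the fusion is the common active-inference form in which a habit prior is added, in log space, to the negative expected free energy before a softmax. The sketch below follows that form and adds a confidence-gated shortcut that skips the planner entirely. The gating threshold `shortcut_at`, the confidence weighting, and the function names are assumptions made for illustration, not the paper's arbitration rule.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(efe, habit_prior=None, confidence=0.0, gamma=1.0, shortcut_at=0.9):
    """Return (action distribution, used_planner).

    efe:         zero-argument callable returning the expected free energy of
                 each action (lower is better); called only when planning runs.
    habit_prior: action probabilities proposed by a triggered rule, or None.
    """
    if habit_prior is not None and confidence >= shortcut_at:
        return list(habit_prior), False        # familiar state: skip the planner
    logits = [-gamma * g for g in efe()]       # novel or uncertain state: plan
    if habit_prior is not None:
        # weight the log-prior by rule confidence; epsilon avoids log(0)
        logits = [l + confidence * math.log(p + 1e-8)
                  for l, p in zip(logits, habit_prior)]
    return softmax(logits), True


def planner():
    return [2.0, 1.0, 3.0]                     # toy expected free energies


print(fuse(planner))                                              # planning only
print(fuse(planner, habit_prior=[0.05, 0.05, 0.9], confidence=0.95))  # shortcut
```

Passing the planner as a callable makes the latency argument explicit: when a high-confidence rule fires, the expensive expected-free-energy evaluation is never invoked.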

Core Design Insight

Rules are learned, not fixed: they emerge from repeated free-energy reductions on real behavior traces.
Rules are grounded: each rule is linked to latent state prototypes and discrete intentions, bridging neural and symbolic reasoning.
Rules are calibrated: sleep-phase replay stabilizes and denoises rules before deployment.
Rules do not replace planning: arbitration keeps model-based reasoning active under uncertainty.

Method Evidence

NBA training dynamics of free-energy terms — Figure. Decreasing free-energy components during wake-sleep optimization indicate stable convergence.

Rule visualization and envelopes in latent space — Figure. Rule envelopes in latent space expose interpretable habit structures.

Main Results

Best-Configuration Comparison Across Domains

Method	NBA SportVU Acc@3 (%)	NBA SportVU Lat (ms)	Car-Following Acc@3 (%)	Car-Following Lat (ms)	DDXPlus Acc@3 (%)	DDXPlus Lat (ms)	Atari-Berzerk Acc@3 (%)	Atari-Berzerk Lat (ms)
RNNLogic	60.55	26.90	68.14	7.58	16.29	124.32	27.50	72.46
STLR	74.67	174.18	76.57	58.29	18.33	872.00	38.72	432.35
Re-Net	68.45	218.42	70.71	72.23	20.18	1112.42	32.48	723.02
DAI	70.58	262.33	73.35	146.33	39.27	2033.25	52.28	977.24
DAI-MC	80.61	386.50	82.87	189.75	52.15	2304.23	58.20	1429.00
LaTee	73.32	1244.20	74.75	528.33	22.14	95028.72	54.21	3230.43
Qwen-0.5B	64.18	2845.35	68.32	1256.82	19.62	125842.15	51.25	4856.72
DreamerV2	83.57	52.73	85.38	38.57	61.48	452.25	72.18	108.02
Ours	91.32	35.92	95.87	10.44	73.58	159.45	77.20	92.63

Table values follow the compact best-configuration comparison in the accepted manuscript. Acc@3 is reported in percent (higher is better) and latency in milliseconds (lower is better).
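To make the comparison with DreamerV2 concrete, the short script below recomputes the accuracy gain (in percentage points) and the relative latency reduction per domain. The numbers are copied from the table; the helper name `compare` is ours.

```python
# (Acc@3 in %, latency in ms), copied from the table
dreamer_v2 = {
    "NBA SportVU": (83.57, 52.73),
    "Car-Following": (85.38, 38.57),
    "DDXPlus": (61.48, 452.25),
    "Atari-Berzerk": (72.18, 108.02),
}
ours = {
    "NBA SportVU": (91.32, 35.92),
    "Car-Following": (95.87, 10.44),
    "DDXPlus": (73.58, 159.45),
    "Atari-Berzerk": (77.20, 92.63),
}


def compare(model, baseline):
    """Return {domain: (accuracy gain in points, latency reduction in %)}."""
    result = {}
    for domain, (acc, lat) in model.items():
        base_acc, base_lat = baseline[domain]
        acc_gain = acc - base_acc
        lat_cut = 100.0 * (1.0 - lat / base_lat)
        result[domain] = (round(acc_gain, 2), round(lat_cut, 1))
    return result


gains = compare(ours, dreamer_v2)
for domain, (acc_gain, lat_cut) in gains.items():
    print(f"{domain}: +{acc_gain:.2f} points Acc@3, {lat_cut:.1f}% lower latency")
```

On NBA SportVU this gives +7.75 points and roughly 32% lower latency; the largest relative latency reduction is on Car-Following, the smallest on Atari-Berzerk.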

NBA testing metrics including accuracy and latency — Figure. Held-out testing curves (Acc@k, HHAR, latency) on NBA show strong and stable performance.

Figure. Atari testing metrics indicate sustained improvements under temporal visual complexity.
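For readers unfamiliar with the metric, the sketch below computes Acc@k under the usual top-k reading, which we assume is the one intended here: a prediction counts as correct when the true action is among the k highest-scored actions. The function name and the toy scores are illustrative only.

```python
def acc_at_k(scores, targets, k=3):
    """Share of samples whose true action is among the k highest-scored actions."""
    hits = 0
    for row, target in zip(scores, targets):
        # action indices sorted by descending score; ties keep index order
        ranked = sorted(range(len(row)), key=lambda a: row[a], reverse=True)
        hits += target in ranked[:k]
    return hits / len(targets)


scores = [
    [0.10, 0.50, 0.20, 0.20],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
targets = [2, 3, 3]
print(acc_at_k(scores, targets, k=1))
print(acc_at_k(scores, targets, k=3))
```

Acc@k can only stay equal or grow as k increases, which is why Acc@3 is a more forgiving target than top-1 accuracy in large action spaces.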

Rule-Performance Trade-off

The paper reports a consistent Pareto knee across datasets: increasing the rule count lowers latency, but accuracy follows an inverted-U curve because rules start to conflict once the rule bank grows past the compact optimum.

Insights

Knee is operationally optimal: the best deployment point is not maximal rule count but balanced rule compactness.
Coverage and precision diverge: beyond the knee, rule-hit rate can still rise while Acc@k drops due to competing rules.
Memory-latency-accuracy triad: larger rule banks increase memory and can erode accuracy gains despite lower planning cost.
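The paper's Pareto fronts are traced over rule-bank sizes, and those per-size points are not listed on this page. The sketch below therefore only illustrates the dominance test behind a Pareto front, applied to the NBA SportVU accuracy and latency values from the results table. The function name `pareto_front` is ours.

```python
def pareto_front(points):
    """Keep the entries of {name: (accuracy, latency)} that nothing dominates.

    A dominates B when A is at least as accurate and at least as fast as B,
    and strictly better on one of the two.
    """
    front = []
    for name, (acc, lat) in points.items():
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for other, (a, l) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# (Acc@3 in %, latency in ms) on NBA SportVU, copied from the results table
nba = {
    "RNNLogic": (60.55, 26.90),
    "STLR": (74.67, 174.18),
    "Re-Net": (68.45, 218.42),
    "DAI": (70.58, 262.33),
    "DAI-MC": (80.61, 386.50),
    "LaTee": (73.32, 1244.20),
    "Qwen-0.5B": (64.18, 2845.35),
    "DreamerV2": (83.57, 52.73),
    "Ours": (91.32, 35.92),
}
print(pareto_front(nba))
```

On these values only RNNLogic (lowest latency) and Ours (highest accuracy at the second-lowest latency) remain on the front. The same test applied to points indexed by rule count would expose the knee described above.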

NBA tradeoff Pareto front — Figure. NBA Pareto front: compact rule banks deliver the best accuracy-latency balance.

Car-following tradeoff Pareto front — Figure. Car-following tradeoff: low latency with high Acc@3 near the Pareto knee.

Figure. DDXPlus Pareto front: accuracy-latency tradeoff under a large action space (225 actions).

Atari tradeoff Pareto front — Figure. Atari-Berzerk tradeoff validates robustness in temporal visual decision tasks.

Ablation & Mechanism Checks

Ablation summary figure — Figure. Ablation confirms each component contributes distinct capabilities (speed, precision, stability).

Paper-Aligned Ablation Insights

Removing rules hurts both accuracy and latency.
Removing latent intention reduces precision and increases runtime.
Dropping VFE consistency causes major predictive degradation.
Greedy rule selection can be fast but unstable and less accurate.

The full model outperforms partial variants by combining rule grounding, planning arbitration, and wake-sleep consolidation.

Interpretable Case Views

How to Read These Cases

Each panel visualizes the same mechanism in a different domain: the world model provides predictive structure, while rule triggers serve as low-cost habitual shortcuts when confidence is high. The key signal is not only trajectory plausibility but also whether decisions stay stable under domain-specific complexity.

NBA: rules align with tactical motion patterns and reduce unnecessary deliberation.
Car-Following: compact habits preserve high accuracy with very low latency.
DDXPlus: rule grounding improves rare-action diagnostic behavior in a large action space.
Atari: latent-state reconstruction and rule fusion stabilize temporal visual decision-making.

NBA interpretable trajectory analysis — Figure. NBA rule-guided inference with world-model overlays and interpretable trajectory control.

Car-following interpretable analysis — Figure. Car-following case: habitual control supports low-latency robust predictions.

DDXPlus interpretable diagnosis analysis — Figure. DDXPlus case: rule-guided reasoning improves rare-action diagnostic pathways.

Atari interpretable analysis — Figure. Atari-Berzerk case: world-model reconstruction and rule-triggered action selection.

Resources

Paper

OpenReview page for the ICLR 2026 publication: https://openreview.net/forum?id=FZXwkBH6s7

Contact

zhiren001@e.ntu.edu.sg

BibTeX

@inproceedings{
zhiren2026learning,
title={Learning Human Habits with Rule-Guided Active Inference},
author={GONG ZHIREN and Chao Yang and Wendi Ren and Shuang Li},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=FZXwkBH6s7}
}