Plug-and-Play Reasoning Router

LoTR: Logic-of-Thought Routing for LLM Reasoning

LoTR is a lightweight module that complements external reasoning strategies by routing internal attention-head pathways according to the current logic state of each reasoning step.

Zhiren Gong¹˒², Ming Xiao⁴, Chau Yuen³, Wei Yang Bryan Lim¹

¹ College of Computing and Data Science, Nanyang Technological University
² Interdisciplinary Graduate Programme, Nanyang Technological University
³ School of Electrical and Electronic Engineering, Nanyang Technological University
⁴ Department of Information Science and Engineering, KTH Royal Institute of Technology


Figure. End-to-end LoTR pipeline: plug-and-play reasoning through logic-conditioned internal pathway routing.

Figure. Paradigm-level accuracy profile (radar chart) across three backbones.

Figure. Reasoning tokens and latency per backbone under LoTR-enhanced paradigms.

Cross-Setting Effectiveness

Across 3 backbones, 8 benchmarks, and 8 reasoning paradigms, LoTR improves average accuracy by 7.93% while increasing reasoning tokens by only 3.18% overall. The largest relative uplift appears on Llama-3.1-8B (16.67%), suggesting that logic-state-conditioned pathway routing can unlock substantial gains under diverse wrappers.

Practical Efficiency Profile

LoTR is not a heavy re-reasoning system: it adds only a lightweight probe-and-route controller, yet it still reduces latency on two of the three backbones (-10.76% on Llama-3.1-8B, -12.15% on Mixtral-8x7B). This indicates that better internal pathway matching can improve quality without relying on brute-force growth in compute.

Summary statistic	Value
Backbones	3
Benchmarks	8
Reasoning paradigms evaluated	8
Overall average accuracy gain	7.93%
Accuracy gain on Llama-3.1-8B	16.67%
Accuracy gain on Qwen2.5-14B	4.25%
Accuracy gain on Mixtral-8x7B	4.94%
Overall reasoning-token overhead	+3.18%
Latency change on Llama-3.1-8B	-10.76%
Latency change on Mixtral-8x7B	-12.15%

Abstract

Existing reasoning paradigms prescribe stronger external procedures, but they rarely control whether the model's internal computation pathway matches the dynamic logic required by each step.

LoTR addresses this gap with logic-conditioned routing: it identifies logic-state mixtures from internal information transfer and softly composes state-specific head-pathway templates online.

The result is a plug-and-play module that improves diverse reasoning wrappers without changing frozen backbone weights or redesigning external reasoning programs.

Method

1) Offline Logic Basis

Cluster sentence-level internal readouts collected from multiple paradigms into a compact basis of recurring logic states.

Output: logic centroids and soft state targets.
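A minimal sketch of step 1, assuming plain k-means over sentence-level readout vectors and soft targets given by a temperature softmax over squared distances to the centroids (tau here stands in for the soft-target temperature stress-tested later). The function names, the distance choice, and the farthest-point initialisation are our assumptions for illustration, not LoTR's released code; the readouts are synthetic.

```python
import numpy as np


def kmeans(X, k, iters=50):
    """Lloyd's k-means with farthest-point initialisation (an assumed stand-in clusterer)."""
    # Farthest-point init: start from the first readout, then repeatedly add the
    # readout farthest from all centroids chosen so far (deterministic).
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1).min(1)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # Squared distance from every readout to every centroid, shape (N, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids


def soft_targets(X, centroids, tau=1.0):
    """Per-readout mixture over logic states: softmax(-squared distance / tau)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = -d2 / tau
    logits -= logits.max(1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(1, keepdims=True)


# Toy usage: synthetic "sentence readouts" drawn around three well-separated regimes.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.1, size=(20, 4)) for c in (0.0, 3.0, 6.0)])
C = kmeans(X, k=3)           # logic centroids
P = soft_targets(X, C, 1.0)  # soft state targets; each row sums to 1
```

In this formulation, shrinking tau pushes each row of P toward a one-hot assignment, which is the kind of contrast the "Hard targets only" ablation examines.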

2) Probe Learning

Train lightweight layer-wise probes to infer state mixtures from current sentence representations.

Output: online logic-state occupancy estimates.
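Step 2 can be sketched as independent multinomial logistic-regression probes, one per layer, fitted to the soft targets from step 1 with a soft-label cross-entropy. Linear probes, plain gradient descent, the hyperparameters, and the layer indices 10 and 20 are hypothetical choices; the page only states that the probes are lightweight and layer-wise.

```python
import numpy as np


def softmax(z):
    z = z - z.max(1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(1, keepdims=True)


def train_probe(H, P, lr=0.1, steps=500):
    """Linear softmax probe fitted to soft targets P (N, K) from features H (N, D)."""
    N, D = H.shape
    W = np.zeros((D, P.shape[1]))
    b = np.zeros(P.shape[1])
    for _ in range(steps):
        Q = softmax(H @ W + b)
        G = (Q - P) / N  # gradient of mean soft-label cross-entropy w.r.t. the logits
        W -= lr * (H.T @ G)
        b -= lr * G.sum(0)
    return W, b


def train_layerwise_probes(hidden_by_layer, P, **kw):
    """One independent probe per layer (a 'shared probe' variant would reuse one)."""
    return {layer: train_probe(H, P, **kw) for layer, H in hidden_by_layer.items()}


def predict_mixture(probe, h):
    """Online estimate of logic-state occupancy for one or more sentence representations."""
    W, b = probe
    return softmax(np.atleast_2d(h) @ W + b)


# Toy usage with synthetic data: 300 sentences, K=3 states, two hypothetical layers.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
P = np.full((300, 3), 0.1)
P[np.arange(300), y] = 0.8  # soft targets, e.g. produced by the offline step
hidden = {
    layer: np.eye(3)[y] @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(300, 8))
    for layer in (10, 20)
}
probes = train_layerwise_probes(hidden, P)
mix = predict_mixture(probes[20], hidden[20])
```

Keeping a separate probe per layer is what the "Shared probe across layers" ablation row removes.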

3) Pathway Routing

Compose state-coupled head templates with probe outputs and softly route attention pathways per step.

Output: logic-matched internal execution under a fixed external strategy.
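The page does not specify how templates are parameterised or where routing is applied. This sketch assumes each logic state k owns a cached template T[k] of per-head gates in [0, 1] (shape layers × heads), composes them linearly with the probe's mixture p, and rescales each head's output before the heads are concatenated for the output projection. All names and shapes are hypothetical.

```python
import numpy as np


def compose_gates(mixture, templates):
    """g[l, h] = sum_k p_k * T[k, l, h]; mixture (K,), templates (K, L, H) -> (L, H)."""
    return np.tensordot(mixture, templates, axes=1)


def route_heads(head_outputs, layer_gates):
    """Rescale each head's output (H, T, d_head) by its gate for this layer."""
    return head_outputs * layer_gates[:, None, None]


def merge_heads(routed):
    """Concatenate heads into (T, H * d_head), the usual input to the output projection."""
    n_heads, n_tokens, d_head = routed.shape
    return routed.transpose(1, 0, 2).reshape(n_tokens, n_heads * d_head)


# Toy usage: K=2 hypothetical states, 1 layer, 4 heads, 3 tokens, d_head=2.
templates = np.array([
    [[1.0, 1.0, 0.0, 0.0]],  # state 0 favours heads 0-1
    [[0.0, 0.0, 1.0, 1.0]],  # state 1 favours heads 2-3
])
mixture = np.array([0.75, 0.25])           # probe output for the current step
gates = compose_gates(mixture, templates)  # [[0.75, 0.75, 0.25, 0.25]]
routed = route_heads(np.ones((4, 3, 2)), gates[0])
merged = merge_heads(routed)               # shape (3, 8)
```

Because the templates are cached, the per-step work beyond the probe is a K × (layers · heads) weighted sum plus an elementwise rescale, consistent with the page's description of the controller as lightweight.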

Figure. Different reasoning paradigms occupy different logic-state regions and transition patterns.

Figure. Distinct recurring logic states couple to distinct head-level routing templates.

Figure. Trajectory-conditioned logic-state marginals under a K=5 logic basis.

Figure. Offline clustering diagnostics across choices of K for selecting the logic-state basis.

Method-Level Insight

Logic states are not static labels; they are recurring execution regimes inferred from sentence-level internal information transfer.
Routing quality depends on both state separability and stable pathway templates; the K-sweep diagnostics show why a moderate K is preferred.
The online controller works because it composes cached pathway templates with the current state mixture, which keeps inference overhead low.

Main Results

Llama-3.1-8B Paradigm Averages (Accuracy %)

Paradigm	Baseline	Best Prior Plug-in	LoTR	Gain vs Baseline (points)
Vanilla	45.8	47.6	48.3	+2.5
Chain-of-Thought	34.2	43.4	43.8	+9.6
Plan-and-Solve	24.4	30.5	32.8	+8.4
Self-Refine	43.7	47.2	47.3	+3.6
Self-Consistency	31.6	39.7	41.2	+9.6
Best-of-N	31.3	39.8	40.2	+8.9
Constrained Beam	41.1	43.3	45.3	+4.2
MCTS	40.7	43.4	43.2	+2.5

Llama-3.1-8B Full Benchmark Table (Baseline vs LoTR)

Paradigm	Variant	GSM8K	MATH	BoolQ	HumanEval	MMLU	FOLIO	HotpotQA	NarrativeQA	Average
Vanilla	Baseline	78.9	54.1	74.7	62.0	40.9	29.1	7.1	19.8	45.8
Vanilla	+ LoTR	82.0	57.0	82.2	62.0	45.4	30.0	7.6	19.9	48.3
Chain-of-Thought	Baseline	75.1	55.4	32.8	53.0	33.2	18.0	2.0	4.2	34.2
Chain-of-Thought	+ LoTR	79.6	62.5	39.0	53.0	37.2	18.7	29.4	30.9	43.8
Plan-and-Solve	Baseline	68.5	34.0	13.5	24.0	29.7	20.9	1.7	2.8	24.4
Plan-and-Solve	+ LoTR	72.2	39.2	20.0	24.0	32.8	21.7	21.4	31.0	32.8
Self-Refine	Baseline	69.6	38.4	68.2	50.0	56.4	29.8	15.8	21.6	43.7
Self-Refine	+ LoTR	75.8	46.0	71.5	50.0	58.4	30.5	17.3	29.0	47.3
Self-Consistency	Baseline	79.7	47.7	40.1	35.0	24.6	19.4	1.9	4.1	31.6
Self-Consistency	+ LoTR	84.4	55.2	41.8	41.0	32.0	18.7	26.0	30.7	41.2
Best-of-N	Baseline	79.6	46.8	39.3	35.0	24.9	19.1	1.9	4.0	31.3
Best-of-N	+ LoTR	85.2	53.5	39.2	34.0	32.2	20.2	26.7	31.0	40.2
Constrained Beam	Baseline	43.1	26.4	80.4	52.4	45.4	38.3	15.7	27.2	41.1
Constrained Beam	+ LoTR	46.8	37.2	84.8	55.0	53.0	40.9	14.9	29.7	45.3
MCTS	Baseline	51.5	28.0	75.0	54.0	42.6	34.1	13.2	26.9	40.7
MCTS	+ LoTR	52.6	29.0	82.2	58.0	46.8	34.5	13.6	28.6	43.2
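The Average column is consistent with an unweighted mean over the eight benchmarks, even though they mix Exact Match, F1, and Pass@1; that reading is ours, not stated on the page. The short check below recomputes two rows. It also surfaces a small detail: the Gain vs Baseline column in the paradigm-average table appears to subtract the rounded averages (Vanilla: 48.3 - 45.8 = 2.5), whereas the unrounded means differ by about 2.44.

```python
from statistics import mean

# Scores for GSM8K, MATH, BoolQ, HumanEval, MMLU, FOLIO, HotpotQA, NarrativeQA,
# copied from two paradigms of the Llama-3.1-8B table above: (Baseline, + LoTR).
TABLE = {
    "Vanilla": (
        [78.9, 54.1, 74.7, 62.0, 40.9, 29.1, 7.1, 19.8],
        [82.0, 57.0, 82.2, 62.0, 45.4, 30.0, 7.6, 19.9],
    ),
    "Chain-of-Thought": (
        [75.1, 55.4, 32.8, 53.0, 33.2, 18.0, 2.0, 4.2],
        [79.6, 62.5, 39.0, 53.0, 37.2, 18.7, 29.4, 30.9],
    ),
}


def paradigm_summary(table):
    """Unweighted mean over the eight benchmarks, and the absolute gain in points."""
    summary = {}
    for paradigm, (base, lotr) in table.items():
        base_avg, lotr_avg = mean(base), mean(lotr)
        summary[paradigm] = (base_avg, lotr_avg, lotr_avg - base_avg)
    return summary


summary = paradigm_summary(TABLE)
for name, (base_avg, lotr_avg, gain) in summary.items():
    print(f"{name}: baseline {base_avg:.2f}, LoTR {lotr_avg:.2f}, gain {gain:+.2f}")
```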

More Experimental Blocks

Backbone	Evaluation scope	Quality effect	Efficiency effect	Paradigm-averaged accuracy gain
Llama-3.1-8B	8 paradigms × 8 benchmarks	Largest overall uplift, especially under CoT, PS, SC, and BoN wrappers	Latency reduced by 10.76% in aggregate	16.67%
Qwen2.5-14B	8 paradigms × 8 benchmarks	Stable gains across diverse wrappers, with stronger results on logic-sensitive tasks	Token usage reduced by 2.28% in the backbone-level summary	4.25%
Mixtral-8x7B	8 paradigms × 8 benchmarks	Consistent quality gains despite sparse mixture-of-experts (MoE) execution	Latency reduced by 12.15% in the backbone-level summary	4.94%


Benchmark	Type	Metric	Role in evaluation
GSM8K	Mathematical reasoning	Exact Match	Arithmetic multi-step consistency.
MATH	Advanced math reasoning	Exact Match	Difficult symbolic derivation.
BoolQ	Reading comprehension	Exact Match	Boolean factual verification.
HumanEval	Code generation	Pass@1	Program correctness under reasoning wrappers.
MMLU	General knowledge	Exact Match	Cross-domain academic reasoning.
FOLIO	Formal logic	Exact Match	Logical entailment consistency.
HotpotQA	Multi-hop QA	F1	Long-range evidence aggregation.
NarrativeQA	Long-form QA	Exact Match	Narrative-level reasoning fidelity.

Efficiency & Trade-off

Llama-3.1-8B Efficiency by Paradigm

Paradigm	Baseline Tokens / Latency	LoTR Tokens / Latency	Latency change
Vanilla	128.6 / 2.87s	109.4 / 2.17s	-24.4%
Chain-of-Thought	258.0 / 5.79s	286.7 / 6.12s	+5.7%
Plan-and-Solve	196.0 / 5.69s	270.7 / 7.36s	+29.3%
Self-Refine	50.8 / 8.96s	46.0 / 7.72s	-13.8%
Self-Consistency	242.1 / 18.95s	256.4 / 15.86s	-16.3%
Best-of-N	242.2 / 16.73s	255.6 / 15.68s	-6.3%
Constrained Beam	22.0 / 9.15s	40.3 / 7.32s	-20.0%
MCTS	34.1 / 9.63s	41.1 / 7.17s	-25.5%
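The Latency change column is consistent with a relative change, (LoTR - baseline) / baseline. Pooling latency over the eight paradigms before taking the change (equivalently, comparing mean latencies) reproduces the -10.76% Llama-3.1-8B figure quoted earlier, whereas averaging the eight per-paradigm percentages gives roughly -8.9%. This is our reading of the aggregation; the page does not state it.

```python
# (baseline latency, LoTR latency) in seconds, copied from the table above.
LATENCY = {
    "Vanilla": (2.87, 2.17),
    "Chain-of-Thought": (5.79, 6.12),
    "Plan-and-Solve": (5.69, 7.36),
    "Self-Refine": (8.96, 7.72),
    "Self-Consistency": (18.95, 15.86),
    "Best-of-N": (16.73, 15.68),
    "Constrained Beam": (9.15, 7.32),
    "MCTS": (9.63, 7.17),
}


def pct_change(before, after):
    """Relative change in percent; negative means LoTR is faster."""
    return 100.0 * (after - before) / before


per_paradigm = {p: pct_change(b, a) for p, (b, a) in LATENCY.items()}
pooled = pct_change(
    sum(b for b, _ in LATENCY.values()),
    sum(a for _, a in LATENCY.values()),
)
mean_of_changes = sum(per_paradigm.values()) / len(per_paradigm)
print(f"pooled {pooled:.2f}%, mean of per-paradigm changes {mean_of_changes:.2f}%")
```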


Figure. Accuracy-efficiency trade-off on Llama-3.1-8B.

Figure. Accuracy-efficiency trade-off on Qwen2.5-14B.

Figure. Accuracy-efficiency trade-off on Mixtral-8x7B.

Trade-off Interpretation

LoTR shifts the frontier by improving accuracy while keeping latency competitive, rather than by simply spending a larger token budget.
The gain is most visible under search-heavy wrappers: LoTR improves execution quality at a smaller additional cost than brute-force scaling.
Cross-backbone consistency suggests that routing by logic is a transferable control mechanism rather than a single-model trick.
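One simple per-paradigm reading of "shifting the frontier" is to ask whether the LoTR point Pareto-dominates the baseline point: accuracy no lower and latency no higher, with at least one strictly better. The sketch below applies that check to the Llama-3.1-8B paradigm averages and latencies from the tables above; the frontier figures themselves may use a different cost axis, so treat this as an illustrative check only.

```python
# (accuracy %, latency s) per paradigm on Llama-3.1-8B: accuracy from the
# paradigm-average table, latency from the efficiency table. (baseline, LoTR)
POINTS = {
    "Vanilla": ((45.8, 2.87), (48.3, 2.17)),
    "Chain-of-Thought": ((34.2, 5.79), (43.8, 6.12)),
    "Plan-and-Solve": ((24.4, 5.69), (32.8, 7.36)),
    "Self-Refine": ((43.7, 8.96), (47.3, 7.72)),
    "Self-Consistency": ((31.6, 18.95), (41.2, 15.86)),
    "Best-of-N": ((31.3, 16.73), (40.2, 15.68)),
    "Constrained Beam": ((41.1, 9.15), (45.3, 7.32)),
    "MCTS": ((40.7, 9.63), (43.2, 7.17)),
}


def dominates(p, q):
    """True if p = (accuracy, latency) is no worse than q on both axes and better on one."""
    (acc_p, lat_p), (acc_q, lat_q) = p, q
    no_worse = acc_p >= acc_q and lat_p <= lat_q
    strictly_better = acc_p > acc_q or lat_p < lat_q
    return no_worse and strictly_better


lotr_dominates = [name for name, (base, lotr) in POINTS.items() if dominates(lotr, base)]
print(lotr_dominates)
```

Under this check LoTR dominates its baseline in six of the eight paradigms; Chain-of-Thought and Plan-and-Solve instead trade higher latency for higher accuracy.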

Ablation & Stress Tests

Ablation (Llama-3.1-8B)

Variant	BoolQ Acc / Tok	MMLU Acc / Tok	HotpotQA Acc / Tok
Full LoTR	60.5 / 93.3	43.2 / 139.4	21.0 / 137.5
Collapsed states (K=1)	60.1 / 93.6	42.8 / 139.3	21.0 / 139.1
Hard targets only	59.5 / 94.6	42.1 / 140.4	19.8 / 135.5
Shared probe across layers	60.4 / 92.3	42.9 / 140.5	21.0 / 136.0
One global template	58.8 / 92.6	41.8 / 138.9	21.3 / 138.1
Trajectory-level routing	59.1 / 93.2	42.9 / 138.7	21.8 / 136.7
No state-conditioned routing	58.8 / 93.0	43.1 / 140.0	21.4 / 138.3

Figure. Stress test over the number of logic clusters K.

Figure. Stress test over the soft-target temperature tau.

Ablation Insight

Removing state conditioning or collapsing the logic basis (K=1) generally hurts the quality-efficiency balance: both variants lose accuracy on BoolQ and MMLU, while HotpotQA stays within 0.4 points of full LoTR.
Soft targets and localized layer-wise probes are important for stable performance across factual and multi-hop benchmarks.
Stress tests suggest LoTR does not depend on knife-edge tuning: performance stays robust across moderate ranges of K and tau.
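To make the tau stress test concrete, the sketch below assumes, as the page does not spell out, that soft targets are a temperature softmax over squared distances to the K centroids. Small tau gives near-hard assignments (the regime probed by the "Hard targets only" ablation) and large tau flattens the mixture toward uniform, with entropy rising monotonically in between. The distances are made up for illustration.

```python
import numpy as np


def soft_targets(sq_dists, tau):
    """Softmax over negative squared distances to the K centroids at temperature tau."""
    logits = -np.asarray(sq_dists, dtype=float) / tau
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()


def entropy(p):
    """Shannon entropy in nats; ignores zero entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


# Hypothetical squared distances from one sentence readout to K=5 centroids.
d2 = [0.5, 1.0, 2.5, 4.0, 6.0]
taus = [0.05, 0.5, 2.0, 10.0]
targets = {tau: soft_targets(d2, tau) for tau in taus}
entropies = [entropy(targets[tau]) for tau in taus]
for tau, h in zip(taus, entropies):
    print(f"tau={tau:<5} targets={np.round(targets[tau], 3)} entropy={h:.3f}")
```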

Case Study

Under the same Chain-of-Thought wrapper, LoTR maintains step-level constraint verification and avoids premature answer commitment. The trajectory shows that improvements come from better internal pathway alignment, not from changing the external reasoning script itself.

Case Interpretation

The qualitative trajectory shows LoTR does not replace the external wrapper; instead, it reduces internal pathway mismatch. In practice this means fewer premature commitments and better late-step constraint verification under the same prompt scaffold.

Resources

Video Tutorial

A narrated, subtitled 4½-minute walkthrough of the problem, the idea, the method and the results.

Paper

Coming soon.

Code

Coming soon.

Citation

Formal BibTeX will be released soon.

Contact

For questions or collaborations, contact zhiren001@e.ntu.edu.sg.