Plug-and-Play Reasoning Router

LoTR: Logic-of-Thought Routing for LLM Reasoning

LoTR is a lightweight module that complements external reasoning strategies by routing internal attention-head pathways according to the current logic state of each reasoning step.

Zhiren Gong1,2, Ming Xiao4, Chau Yuen3, Wei Yang Bryan Lim1

1 College of Computing and Data Science, Nanyang Technological University
2 Interdisciplinary Graduate Programme, Nanyang Technological University
3 School of Electrical and Electronic Engineering, Nanyang Technological University
4 Department of Information Science and Engineering, KTH Royal Institute of Technology

Figure. LoTR as plug-and-play reasoning through logic-conditioned internal pathway routing (end-to-end routing pipeline).
Figure. Paradigm-level accuracy profile (radar chart) across three backbones.
Figure. Reasoning tokens and latency per backbone under LoTR-enhanced paradigms.

Cross-Setting Effectiveness

Across 3 backbones, 8 benchmarks, and 8 reasoning paradigms, LoTR improves average accuracy by 7.93% while adding only 3.18% more reasoning tokens overall. The largest relative uplift appears on Llama-3.1-8B (16.67%), showing that logic-state-conditioned pathway routing can unlock substantial gains under diverse wrappers.

Practical Efficiency Profile

LoTR is not a heavy re-reasoning system: it adds a lightweight probe-and-route controller and still achieves latency reductions on two backbones (-10.76% on Llama, -12.15% on Mixtral). This indicates that better internal pathway matching can improve quality without relying on brute-force compute growth.

  • Backbones: 3
  • Benchmarks: 8
  • Reasoning paradigms evaluated: 8
  • Overall average accuracy gain: 7.93%
  • Llama-3.1-8B gain: 16.67%
  • Qwen2.5-14B gain: 4.25%
  • Mixtral-8x7B gain: 4.94%
  • Token overhead overall: +3.18%
  • Latency on Llama: -10.76%
  • Latency on Mixtral: -12.15%

Abstract

Existing reasoning paradigms prescribe stronger external procedures, but they rarely control whether the model's internal computation pathway matches the dynamic logic required by each step.

LoTR addresses this gap with logic-conditioned routing: it identifies logic-state mixtures from internal information transfer and softly composes state-specific head-pathway templates online.

The result is a plug-and-play module that improves diverse reasoning wrappers without changing frozen backbone weights or redesigning external reasoning programs.

Method

1) Offline Logic Basis

Cluster sentence-level internal readouts from multiple paradigms to build a compact recurring logic-state basis.

Output: logic centroids and soft state targets.
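
As a rough illustration of how logic centroids could yield soft state targets, the sketch below applies a temperature-scaled softmax to negative squared distances. This exact form is an assumption, not the paper's stated formula; the function name `soft_state_targets`, the toy centroids, and the use of Euclidean distance are all hypothetical. The centroids are assumed to come from some prior clustering step (for example k-means).

```python
import math

def soft_state_targets(readout, centroids, tau=1.0):
    """Turn distances to K logic centroids into a soft target over states.

    Assumed form: softmax over negative squared Euclidean distance,
    divided by the temperature tau (smaller tau gives harder targets).
    """
    sq_dists = [sum((r - c) ** 2 for r, c in zip(readout, centroid))
                for centroid in centroids]
    logits = [-d / tau for d in sq_dists]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: three hypothetical centroids in a 2-D readout space.
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
readout = [0.9, 0.1]
soft = soft_state_targets(readout, centroids, tau=0.5)   # mass spread over states
hard = soft_state_targets(readout, centroids, tau=0.01)  # nearly one-hot
```

Under this assumed form, lowering tau concentrates the target on the nearest centroid, which is one way to read the distinction between soft targets and hard targets.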

2) Probe Learning

Train lightweight layer-wise probes to infer state mixtures from current sentence representations.

Output: online logic-state occupancy estimates.
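
A minimal sketch of what a lightweight probe might look like, assuming a single linear map plus softmax per layer trained with cross-entropy against soft state targets. The class `LinearProbe`, the learning rate, and the toy data are hypothetical; the source does not specify the probe architecture or training objective.

```python
import math

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

class LinearProbe:
    """Hypothetical per-layer probe: one linear map followed by softmax."""

    def __init__(self, dim, num_states):
        self.weights = [[0.0] * dim for _ in range(num_states)]
        self.bias = [0.0] * num_states

    def predict(self, x):
        logits = [sum(w * v for w, v in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def sgd_step(self, x, soft_target, lr=0.5):
        # For softmax + cross-entropy, d(loss)/d(logit_k) = prediction_k - target_k.
        pred = self.predict(x)
        for k, (p, t) in enumerate(zip(pred, soft_target)):
            err = p - t
            self.bias[k] -= lr * err
            for j, v in enumerate(x):
                self.weights[k][j] -= lr * err * v

# Toy data: (sentence representation, soft state target) pairs.
data = [([1.0, 0.0], [0.9, 0.1]), ([0.0, 1.0], [0.2, 0.8])]
probe = LinearProbe(dim=2, num_states=2)
for _ in range(200):
    for x, target in data:
        probe.sgd_step(x, target)

occupancy = probe.predict([1.0, 0.0])  # estimated state mixture for one sentence
```

Because the probe is trained on soft targets, its output is a mixture over states rather than a single label, which is what a soft routing step would consume.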

3) Pathway Routing

Compose state-coupled head templates with probe outputs and softly route attention pathways per step.

Output: logic-matched internal execution under fixed external strategy.
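
The composition step can be pictured as a weighted mix of per-state head templates. The sketch below assumes each template is a vector of per-head gates and that routing rescales each head's output by its mixed gate; both are illustrative assumptions, since the source does not say how templates are applied to attention heads. All names and numbers are hypothetical.

```python
def compose_head_gates(state_mixture, templates):
    """Mix K per-state head templates into one gate per attention head."""
    num_heads = len(templates[0])
    return [sum(p * template[h] for p, template in zip(state_mixture, templates))
            for h in range(num_heads)]

def route_head_outputs(head_outputs, gates):
    """Softly rescale each head's output vector by its gate."""
    return [[g * v for v in out] for out, g in zip(head_outputs, gates)]

# Two hypothetical logic states, three attention heads.
templates = [
    [1.0, 0.2, 0.5],  # cached template for state 0
    [0.1, 1.0, 0.5],  # cached template for state 1
]
state_mixture = [0.75, 0.25]  # probe output for the current step
gates = compose_head_gates(state_mixture, templates)

head_outputs = [[2.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
routed = route_head_outputs(head_outputs, gates)
```

Since the templates are cached and the mix is a small weighted sum, the per-step cost of this kind of controller stays low relative to a forward pass.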

Figure. Different reasoning paradigms occupy different logic-state regions and transition patterns.
Figure. Distinct recurring logic states couple to distinct head-level routing templates.
Figure. Trajectory-conditioned logic-state marginals under a K=5 logic basis.
Figure. Offline clustering diagnostics across K choices for logic-state basis selection.

Method-Level Insight

  • Logic states are not static labels; they are recurring execution regimes inferred from sentence-level internal transfer.
  • Routing quality depends on both state separability and stable pathway templates; the K-sweep figure shows why moderate K is preferred.
  • The online controller works because it composes cached pathway templates with the current state mixture, which keeps inference overhead lightweight.

Main Results

Llama-3.1-8B Paradigm Averages (Accuracy %)

Paradigm | Baseline | Best Prior Plug-in | LoTR | Gain vs Baseline
Vanilla | 45.8 | 47.6 | 48.3 | +2.5
Chain-of-Thought | 34.2 | 43.4 | 43.8 | +9.6
Plan-and-Solve | 24.4 | 30.5 | 32.8 | +8.4
Self-Refine | 43.7 | 47.2 | 47.3 | +3.6
Self-Consistency | 31.6 | 39.7 | 41.2 | +9.6
Best-of-N | 31.3 | 39.8 | 40.2 | +8.9
Constrained Beam | 41.1 | 43.3 | 45.3 | +4.2
MCTS | 40.7 | 43.4 | 43.2 | +2.5

Llama-3.1-8B Full Benchmark Table (Baseline vs LoTR)

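Paradigm | Variant | GSM8K | MATH | BoolQ | HumanEval | MMLU | FOLIO | HotpotQA | NarrativeQA | Average
Vanilla | Baseline | 78.9 | 54.1 | 74.7 | 62.0 | 40.9 | 29.1 | 7.1 | 19.8 | 45.8
Vanilla | + LoTR | 82.0 | 57.0 | 82.2 | 62.0 | 45.4 | 30.0 | 7.6 | 19.9 | 48.3
Chain-of-Thought | Baseline | 75.1 | 55.4 | 32.8 | 53.0 | 33.2 | 18.0 | 2.0 | 4.2 | 34.2
Chain-of-Thought | + LoTR | 79.6 | 62.5 | 39.0 | 53.0 | 37.2 | 18.7 | 29.4 | 30.9 | 43.8
Plan-and-Solve | Baseline | 68.5 | 34.0 | 13.5 | 24.0 | 29.7 | 20.9 | 1.7 | 2.8 | 24.4
Plan-and-Solve | + LoTR | 72.2 | 39.2 | 20.0 | 24.0 | 32.8 | 21.7 | 21.4 | 31.0 | 32.8
Self-Refine | Baseline | 69.6 | 38.4 | 68.2 | 50.0 | 56.4 | 29.8 | 15.8 | 21.6 | 43.7
Self-Refine | + LoTR | 75.8 | 46.0 | 71.5 | 50.0 | 58.4 | 30.5 | 17.3 | 29.0 | 47.3
Self-Consistency | Baseline | 79.7 | 47.7 | 40.1 | 35.0 | 24.6 | 19.4 | 1.9 | 4.1 | 31.6
Self-Consistency | + LoTR | 84.4 | 55.2 | 41.8 | 41.0 | 32.0 | 18.7 | 26.0 | 30.7 | 41.2
Best-of-N | Baseline | 79.6 | 46.8 | 39.3 | 35.0 | 24.9 | 19.1 | 1.9 | 4.0 | 31.3
Best-of-N | + LoTR | 85.2 | 53.5 | 39.2 | 34.0 | 32.2 | 20.2 | 26.7 | 31.0 | 40.2
Constrained Beam | Baseline | 43.1 | 26.4 | 80.4 | 52.4 | 45.4 | 38.3 | 15.7 | 27.2 | 41.1
Constrained Beam | + LoTR | 46.8 | 37.2 | 84.8 | 55.0 | 53.0 | 40.9 | 14.9 | 29.7 | 45.3
MCTS | Baseline | 51.5 | 28.0 | 75.0 | 54.0 | 42.6 | 34.1 | 13.2 | 26.9 | 40.7
MCTS | + LoTR | 52.6 | 29.0 | 82.2 | 58.0 | 46.8 | 34.5 | 13.6 | 28.6 | 43.2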
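
The Average column appears to be the unweighted mean of the eight benchmark scores, and the per-paradigm gain the difference of the two rounded averages. The short check below recomputes both for the Vanilla and Chain-of-Thought rows; the variable names are illustrative, and the scores are copied from the table above.

```python
# Benchmark order: GSM8K, MATH, BoolQ, HumanEval, MMLU, FOLIO, HotpotQA, NarrativeQA
rows = {
    ("Vanilla", "Baseline"): [78.9, 54.1, 74.7, 62.0, 40.9, 29.1, 7.1, 19.8],
    ("Vanilla", "+ LoTR"): [82.0, 57.0, 82.2, 62.0, 45.4, 30.0, 7.6, 19.9],
    ("Chain-of-Thought", "Baseline"): [75.1, 55.4, 32.8, 53.0, 33.2, 18.0, 2.0, 4.2],
    ("Chain-of-Thought", "+ LoTR"): [79.6, 62.5, 39.0, 53.0, 37.2, 18.7, 29.4, 30.9],
}

# Unweighted mean over the eight benchmarks, rounded to one decimal.
averages = {key: round(sum(scores) / len(scores), 1) for key, scores in rows.items()}

# Absolute gain in accuracy points: LoTR average minus baseline average.
gains = {
    paradigm: round(averages[(paradigm, "+ LoTR")] - averages[(paradigm, "Baseline")], 1)
    for paradigm in ("Vanilla", "Chain-of-Thought")
}

print(averages)  # Vanilla: 45.8 and 48.3; Chain-of-Thought: 34.2 and 43.8
print(gains)     # Vanilla: 2.5; Chain-of-Thought: 9.6
```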


Backbone | Evaluation scope | Quality effect | Efficiency effect | Paradigm-Averaged Improvement
Llama-3.1-8B | 8 paradigms × 8 benchmarks | Largest overall uplift, especially under CoT/PS/SC/BoN wrappers | Latency reduced by 10.76% in aggregate comparison | 16.67%
Qwen2.5-14B | 8 paradigms × 8 benchmarks | Stable gains under diverse wrappers with stronger realization of logic-sensitive tasks | Token usage reduced by 2.28% on backbone-level summary | 4.25%
Mixtral-8x7B | 8 paradigms × 8 benchmarks | Consistent quality gains despite sparse MoE execution characteristics | Latency reduced by 12.15% on backbone-level summary | 4.94%

This summary is organized with unified columns to avoid mixed numeric/text layouts and keep cross-backbone comparison visually consistent.

Benchmark | Type | Metric | Role in evaluation
GSM8K | Mathematical reasoning | Exact Match | Arithmetic multi-step consistency.
MATH | Advanced math reasoning | Exact Match | Difficult symbolic derivation.
BoolQ | Reading comprehension | Exact Match | Boolean factual verification.
HumanEval | Code generation | Pass@1 | Program correctness under reasoning wrappers.
MMLU | General knowledge | Exact Match | Cross-domain academic reasoning.
FOLIO | Formal logic | Exact Match | Logical entailment consistency.
HotpotQA | Multi-hop QA | F1 | Long-range evidence aggregation.
NarrativeQA | Long-form QA | Exact Match | Narrative-level reasoning fidelity.

Efficiency & Trade-off

Llama-3.1-8B Efficiency by Paradigm

Paradigm | Baseline Tok / Lat | LoTR Tok / Lat | Latency change
Vanilla | 128.6 / 2.87s | 109.4 / 2.17s | -24.4%
Chain-of-Thought | 258.0 / 5.79s | 286.7 / 6.12s | +5.7%
Plan-and-Solve | 196.0 / 5.69s | 270.7 / 7.36s | +29.3%
Self-Refine | 50.8 / 8.96s | 46.0 / 7.72s | -13.8%
Self-Consistency | 242.1 / 18.95s | 256.4 / 15.86s | -16.3%
Best-of-N | 242.2 / 16.73s | 255.6 / 15.68s | -6.3%
Constrained Beam | 22.0 / 9.15s | 40.3 / 7.32s | -20.0%
MCTS | 34.1 / 9.63s | 41.1 / 7.17s | -25.5%
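
The latency-change column is consistent with a plain relative difference, (LoTR - baseline) / baseline. The snippet below reproduces four of the rows from the latencies in the table; the helper name `latency_change_pct` is illustrative.

```python
def latency_change_pct(baseline_s, lotr_s):
    """Relative latency change in percent; negative means LoTR is faster."""
    return (lotr_s - baseline_s) / baseline_s * 100.0

# (baseline latency, LoTR latency) in seconds, copied from the table.
latencies = {
    "Vanilla": (2.87, 2.17),
    "Chain-of-Thought": (5.79, 6.12),
    "Plan-and-Solve": (5.69, 7.36),
    "MCTS": (9.63, 7.17),
}

changes = {name: round(latency_change_pct(base, lotr), 1)
           for name, (base, lotr) in latencies.items()}

print(changes)  # Vanilla -24.4, Chain-of-Thought 5.7, Plan-and-Solve 29.3, MCTS -25.5
```
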

Figure. Paradigm-level accuracy profile across three backbones.
Figure. Accuracy-efficiency trade-off on Llama-3.1-8B.
Figure. Accuracy-efficiency trade-off on Qwen2.5-14B.
Figure. Accuracy-efficiency trade-off on Mixtral-8x7B.

Trade-off Interpretation

  • LoTR shifts the frontier by improving accuracy while keeping latency competitive, rather than simply increasing token budget.
  • The gain is most visible when wrappers are search-heavy: LoTR improves execution quality with smaller additional cost than brute-force scaling.
  • Cross-backbone consistency suggests routing-by-logic is a transferable control mechanism, not a single-model trick.

Ablation & Stress Tests

Ablation (Llama-3.1-8B)

Variant | BoolQ Acc / Tok | MMLU Acc / Tok | HotpotQA Acc / Tok
Full LoTR | 60.5 / 93.3 | 43.2 / 139.4 | 21.0 / 137.5
Collapsed states (K=1) | 60.1 / 93.6 | 42.8 / 139.3 | 21.0 / 139.1
Hard targets only | 59.5 / 94.6 | 42.1 / 140.4 | 19.8 / 135.5
Shared probe across layers | 60.4 / 92.3 | 42.9 / 140.5 | 21.0 / 136.0
One global template | 58.8 / 92.6 | 41.8 / 138.9 | 21.3 / 138.1
Trajectory-level routing | 59.1 / 93.2 | 42.9 / 138.7 | 21.8 / 136.7
No state-conditioned routing | 58.8 / 93.0 | 43.1 / 140.0 | 21.4 / 138.3

Figure. Stress test over number of logic clusters K.
Figure. Stress test over soft-target temperature tau.

Ablation Insight

  • Removing state conditioning or collapsing the logic basis lowers accuracy on BoolQ and MMLU at similar token cost; on HotpotQA the differences are small and mixed.
  • Soft targets and localized layer-wise probes are important for stable performance across factual and multi-hop benchmarks.
  • Stress tests show LoTR is not knife-edge tuned: moderate K and tau ranges remain robust.

Case Study

Under the same Chain-of-Thought wrapper, LoTR maintains step-level constraint verification and avoids premature answer commitment. The trajectory shows that improvements come from better internal pathway alignment, not from changing the external reasoning script itself.

Figure. Qualitative case study: with vs without LoTR under the same external reasoning paradigm.

Case Interpretation

The qualitative trajectory shows LoTR does not replace the external wrapper; instead, it reduces internal pathway mismatch. In practice this means fewer premature commitments and better late-step constraint verification under the same prompt scaffold.

Resources

Paper

Coming soon.

Code

Coming soon.

Citation

Formal BibTeX will be released soon.