Liang, Y., & Patel, S. (2025). “Cognitive Paths: Simulating Human-like Reasoning in Large Language Models.” Journal of Artificial Cognition Research, 12(2), 45–67.
Abstract:
This paper proposes a hierarchical reasoning framework that mimics human cognitive stages in LLM inference. By modeling attention transitions between short-term and long-term token dependencies, the framework yields more stable reasoning on arithmetic and causal tasks. Experiments indicate significant gains in logical consistency without adding model parameters.
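The abstract does not specify an implementation; as a loosely hedged sketch, one plausible reading of "attention transitions between short-term and long-term token dependencies" is a per-position gate that blends windowed (local) attention with full causal attention. The gating heuristic, window size, and function name below are assumptions, not the authors' design.

```python
# Hypothetical sketch: gated blend of short-range (windowed) and long-range
# (full causal) attention. All names and the gating rule are assumptions.
import torch
import torch.nn.functional as F


def mixed_attention(q, k, v, window=8):
    """q, k, v: (seq_len, dim). Returns a blend of local and global attention."""
    seq_len, dim = q.shape
    scores = q @ k.T / dim ** 0.5                      # (seq_len, seq_len)

    # Long-term path: ordinary causal attention over the full context.
    causal = torch.tril(torch.ones(seq_len, seq_len)).bool()
    long_out = F.softmax(scores.masked_fill(~causal, float("-inf")), dim=-1) @ v

    # Short-term path: each position attends only to the last `window` tokens.
    idx = torch.arange(seq_len)
    local = causal & (idx[None, :] > idx[:, None] - window)
    short_probs = F.softmax(scores.masked_fill(~local, float("-inf")), dim=-1)
    short_out = short_probs @ v

    # "Transition" between the two paths via a per-position gate based on how
    # concentrated the local attention is (an assumed heuristic).
    gate = short_probs.max(dim=-1, keepdim=True).values
    return gate * short_out + (1 - gate) * long_out
```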
Rodriguez, C., & Ahmed, L. (2024). “From Chains to Trees: A Structured Approach to Multi-Step Reasoning.” Proceedings of the Global NLP Symposium, 2024, 311–322.
Abstract:
The authors introduce a Tree-of-Thought expansion mechanism to enhance multi-step reasoning in generative transformers. Unlike linear chain-based prompts, this approach allows branching exploration of intermediate solutions before final aggregation. Simulation results reveal improved generalization to unseen problem domains.
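As a minimal sketch of the branch-then-aggregate idea, the breadth-limited search below expands candidate intermediate steps and keeps the best-scoring partial solutions. The `propose_steps` and `score_state` callables stand in for model calls and are assumptions, not the authors' actual interface; the aggregation rule shown (keep the top-scoring branches) may also differ from the paper's.

```python
# Hypothetical Tree-of-Thought style expansion: branch on candidate steps,
# score partial solutions, keep a beam, and return the best branch.
from typing import Callable, List


def tree_of_thought(problem: str,
                    propose_steps: Callable[[str], List[str]],
                    score_state: Callable[[str], float],
                    depth: int = 3,
                    beam_width: int = 4) -> str:
    """Breadth-limited tree search over reasoning states."""
    frontier = [problem]                          # state = problem + partial reasoning
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for step in propose_steps(state):     # branching exploration
                candidates.append(state + "\n" + step)
        candidates.sort(key=score_state, reverse=True)
        frontier = candidates[:beam_width]        # simple aggregation by score
    return frontier[0] if frontier else problem
```

In practice both callables would typically be backed by the same generative model, with the scorer prompted to rate partial solutions.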
Okafor, N., & Jensen, H. (2025). “Implicit Reasoning Biases in Instruction-Tuned Language Models.” Computational Semantics Letters, 9(1), 1–19.
Abstract:
This study investigates hidden reasoning biases arising from instruction-tuning datasets. Through counterfactual prompt evaluation, the authors show that models often inherit latent heuristics that distort causal interpretation. The paper proposes a debiasing regularizer to align reasoning outcomes with ground-truth logic.
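The abstract does not give the regularizer's form; a common choice for this kind of counterfactual debiasing is a divergence penalty between the model's answer distributions on a prompt and its counterfactual rewrite, sketched below under that assumption. The KL form and the weight `lam` are illustrative, not the paper's definition.

```python
# Hypothetical counterfactual debiasing regularizer: task loss plus a KL
# penalty between predictions on original and counterfactual prompts.
import torch
import torch.nn.functional as F


def debiased_loss(logits: torch.Tensor,
                  cf_logits: torch.Tensor,
                  targets: torch.Tensor,
                  lam: float = 0.1) -> torch.Tensor:
    """logits, cf_logits: (batch, num_classes); targets: (batch,) class ids."""
    task_loss = F.cross_entropy(logits, targets)
    # KL(p_counterfactual || p_original); F.kl_div expects log-probs as input
    # and probs as target.
    reg = F.kl_div(F.log_softmax(logits, dim=-1),
                   F.softmax(cf_logits, dim=-1),
                   reduction="batchmean")
    return task_loss + lam * reg
```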
Takeda, M., Singh, R., & Zhao, P. (2023). “Self-Evaluative Prompting for Reflective Reasoning.” Transactions on Neural Reasoning Systems, 3(4), 201–223.
Abstract:
The work presents a self-evaluation loop where the model generates reasoning steps, critiques them, and revises conclusions iteratively. This reflective mechanism stabilizes reasoning depth while reducing hallucinated intermediate steps. Results on synthetic reasoning benchmarks demonstrate stronger step-to-answer alignment.
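The generate-critique-revise loop can be sketched as below. The `llm` callable stands in for any text-generation call and is an assumption; the prompts and the "OK" stopping rule are illustrative, not the authors' exact protocol.

```python
# Hypothetical sketch of an iterative generate-critique-revise loop.
from typing import Callable


def reflective_answer(question: str,
                      llm: Callable[[str], str],
                      max_rounds: int = 3) -> str:
    draft = llm(f"Answer step by step:\n{question}")
    for _ in range(max_rounds):
        critique = llm(
            "Check the reasoning below for errors or unsupported steps. "
            "Reply 'OK' if it is sound, otherwise explain the problem.\n"
            f"Question: {question}\nReasoning: {draft}"
        )
        if critique.strip().upper().startswith("OK"):
            break                                  # reasoning judged sound
        draft = llm(
            "Revise the reasoning to fix the issues raised.\n"
            f"Question: {question}\nReasoning: {draft}\nCritique: {critique}"
        )
    return draft
```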
Chen, L., & O’Connor, B. (2025). “ReasonScore: A Unified Metric for Evaluating Logical Soundness in Language Model Outputs.” AI Evaluation Review, 18(3), 155–174.
Abstract:
The paper introduces ReasonScore, a composite metric combining entailment coherence, factual support, and inference trace validity. It enables automated assessment of reasoning-oriented model outputs beyond raw accuracy. Benchmarking shows a strong correlation between ReasonScore and expert human judgments.
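A composite of this kind might be computed as a weighted mean of the three components, as in the sketch below. The three scorer callables and the equal default weights are assumptions for illustration; the paper defines its own components and weighting.

```python
# Hypothetical ReasonScore-style composite: weighted mean of entailment
# coherence, factual support, and inference-trace validity, each in [0, 1].
from typing import Callable, Dict, Optional


def reason_score(answer: str,
                 trace: str,
                 coherence: Callable[[str], float],           # step-to-step entailment coherence
                 factuality: Callable[[str], float],          # factual support of the trace
                 trace_validity: Callable[[str, str], float],  # does the trace entail the answer?
                 weights: Optional[Dict[str, float]] = None) -> float:
    w = weights or {"entail": 1 / 3, "fact": 1 / 3, "trace": 1 / 3}
    components = {
        "entail": coherence(trace),
        "fact": factuality(trace),
        "trace": trace_validity(trace, answer),
    }
    return sum(w[k] * components[k] for k in components)
```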