Divide and Conquer: Exploring Language-centric Tree Reasoning for Video Question-Answering
Zhaohe Liao, Jiangtong Li, Siyu Sun, Qingyang Liu, Fengshun Xiao, Tianjiao Li, Qiang Zhang,
Guang Chen, Li Niu, Changjun Jiang, Liqing Zhang
International Conference on Machine Learning (ICML), 2025
PDF / code / bibtex
@inproceedings{du2025rcp,
title={Divide and Conquer: Exploring Language-centric Tree Reasoning for Video Question-Answering},
author={Zhaohe Liao and Jiangtong Li and Siyu Sun and Qingyang Liu and Fengshun Xiao and Tianjiao Li and Qiang Zhang and Guang Chen and Li Niu and Changjun Jiang and Liqing Zhang},
booktitle={International Conference on Machine Learning},
year={2025}
}
In this work, we propose a novel two-stage Language-centric Tree Reasoning (LTR) framework that enhances the reasoning capabilities and transparency of MLLMs. Experiments across 11 VideoQA benchmarks demonstrate that LTR significantly improves both accuracy and interpretability over state-of-the-art MLLMs. To our knowledge, this is the first work to implement a language-centric logical tree to guide MLLM reasoning in VideoQA, paving the way for language-centric video understanding from perception to cognition.