Junbin Xiao (肖俊斌)

Ph.D. in Computer Science

I am a Research Fellow at NUS, working with Prof. Angela Yao and Prof. Tat-Seng Chua. Previously, I obtained my Ph.D. at the Department of Computer Science, National University of Singapore (NUS), supervised by Prof. Tat-Seng Chua and in close collaboration with Prof. Angela Yao. From Nov. 2021 to Apr. 2022, I worked as a research intern at Sea AI Lab (SAIL), jointly advised by Dr. Pan Zhou and Prof. Shuicheng Yan. Prior to that, I received my M.S.Eng. degree from the Institute of Computing Technology, Chinese Academy of Sciences in 2018 and my B.Eng. degree from Sichuan University in 2015.

I devote myself to developing AI techniques that can understand the physical world and interact and communicate with humans to provide personalized assistance. My topics of interest cover video understanding, question answering, visual grounding, and robotics, with an emphasis on multimodal large language models, robustness, and trustworthiness. I am currently focusing on trustworthy multimodal LLMs and their applications in egocentric embodied assistance. I am actively looking for Research Interns/Assistants/Visiting Students. (NUS master's students with CV/NLP/multimodal experience are highly preferred.)


News

Six papers are accepted to SIGIR'25, ICMR'25, MICCAI'25, and ICCV'25.

| Jun. 2025

Invited to serve as a reviewer for NeurIPS'25 and ACM MM'25.

| Mar. 2025

Invited by Twelve Labs to give a talk on NExT-GQA: Visually Grounded VideoQA.

CVPR'24 | Jul. 2024

Two papers on video-language models and trustworthy K-VQA are accepted to ACL'24 and ACM MM'24, respectively.

| Jul. 2024

Our explorations of VQA trustworthiness, 3D object affordance, and ego-car accidents (3 papers) are all accepted to CVPR'24.

CVPR'24 | Feb. 2024

Invited to serve as a reviewer for CVPR'24 and ICLR'24.

| Oct. 2023

Two papers are accepted to T-PAMI'23 and ACM MM'23 respectively.

| Aug. 2023

Two papers are accepted to T-PAMI'23 and ICCV'23 respectively.

| Jul. 2023

Invited to serve as a PC Member for AAAI'24.

AAAI | Jul. 2023

Invited to serve as a reviewer for the NeurIPS'23 Datasets and Benchmarks track.

NeurIPS | Jun. 2023

Invited to serve as a reviewer for ACM MM'23.

ACM MM | Apr. 2023

Featured Publications

EgoBlind
EgoBlind: Towards Egocentric Visual Assistance for the Blind

Junbin Xiao*, Nanxin Huang*, Hao Qiu, Zhulin Tao, Xun Yang, Richang Hong, Meng Wang, Angela Yao

[ arXiv'25 / Project Page / Github / Cite]
egointention
EgoIntention: Visual Intention Grounding for Egocentric Assistants

Pengzhan Sun, Junbin Xiao*(Corresponding Author), Tze Ho Elden Tse, Yicong Li, Arjun Akula, Angela Yao

[ ICCV'25 / Project Page / Github / Cite]
Vid-Syn
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

Leilei Li, Jianwu Fang, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

[ ICCV'25 / Project Page / Github / Cite]
LASO-Unseen
Intermediate Connectors and Geometric Priors for Language-Guided Affordance Segmentation on Unseen Object Categories

Yicong Li, Yiyang Chen, Zhenyuan Ma, Junbin Xiao, Xiang Wang, Angela Yao

[ ICCV'25 / Project Page / Github / Cite]
BUTD
Bottom-Up and Top-Down Thoughts for Visual Intention Grounding

Kangcheg Liu, Junbin Xiao*(Corresponding Author), Rui Zhang, Hanqi Lv, Zidong Du

[ ICMR'25 / Project Page / Github / Cite]
SubGPT
Unleashing the Power of LLMs for Medical Video Answer Localization

Junbin Xiao*, Qingyun Li*, Yusen Yang, Liang Qiu, Angela Yao

[ MICCAI'25 / Project Page / Github / Cite]
DeVE-QA
Question Answering Dense Video Events

Hangyu Qin, Junbin Xiao*(Corresponding Author), Angela Yao

[ SIGIR'25 / Project Page / Github / Cite]
egotextvqa
EgoTextVQA: Towards Egocentric Scene-Text Video Question Answering

Zhou Sheng, Junbin Xiao*(Corresponding Author), Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-Seng Chua, Angela Yao

[ CVPR'25 / Project Page / Github / Cite]
consist
On the Consistency of Video Large Language Models in Temporal Comprehension

Minjoon Jung, Junbin Xiao*(Corresponding Author), Byoung-Tak Zhang, Angela Yao

[ CVPR'25 / Project Page / Github / Cite]
VideoQA-LLMs
VideoQA in the Era of LLMs: An Empirical Study

Junbin Xiao, Nanxin Huang, Hangyu Qin, Dongyang Li, Yicong Li, Fengbin Zhu, Zhulin Tao, Jianxing Yu, Liang Lin, Tat-Seng Chua, Angela Yao

[ IJCV'25 / Project Page / Github / Cite]
T2S-QA
Scene Text Grounding for Text-based Video Question Answering

Sheng Zhou, Junbin Xiao*(Corresponding Author), Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua

[ TMM'25 / Project Page / Github / Cite]
LASO
LASO: Language-guided Affordance Segmentation on 3D Object

Yicong Li, Na Zhao, Junbin Xiao, Feng Chuan, Xiang Wang, Tat-Seng Chua

[ CVPR'24 / Project Page / Github / Cite]
MM-AU
Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Leilei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

[ CVPR'24 / Project Page / Github / Cite]
NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering

Junbin Xiao, Angela Yao, Yicong Li, Tat-Seng Chua

[ CVPR'24 (Highlight) / Project Page / Github / Cite]
TranSTR
Discovering Spatio-Temporal Rationales for Video Question Answering

Yicong Li, Junbin Xiao*(Corresponding Author), Chun Feng, Xiang Wang*, Tat-Seng Chua

[ ICCV'23 / Project Page / Github / Cite]
CoVGT
Contrastive Video Question Answering via Video Graph Transformer

Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua

[T-PAMI'23 / Project Page / Github / Cite]
TIGV
Transformer-Empowered Invariant Grounding for Video Question Answering

Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua

[T-PAMI'23 / Project Page / Github / Cite]
VideoQA Survey
Video Question Answering: Datasets, Algorithms and Challenges

Yaoyao Zhong*, Junbin Xiao*(Equal Contribution), Wei Ji*, Yicong Li, Weihong Deng, Tat-Seng Chua

[EMNLP'22 / Project Page / Github / Cite]
VGT
Video Graph Transformer for Video Question Answering

Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan

[ ECCV'22 / Project Page / Github / Poster / Cite]
EIGV
Equivariant and Invariant Grounding for Video Question Answering

Yicong Li, Xiang Wang, Junbin Xiao, Tat-Seng Chua

[ACM MM'22 / Project Page / Github / Poster / Cite]
IGV
Invariant Grounding for Video Question Answering

Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua

[CVPR'22, Best Paper Finalist / Project Page / Github / Poster / Cite]
HQGA
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

Junbin Xiao, Angela Yao, Zhiyuan Liu, Yicong Li, Wei Ji, Tat-Seng Chua

[AAAI'22, Oral / Project Page / Github / Poster / Cite]
VidVRD-II
Video Visual Relation Detection via Interactive Inference

Xindi Shang, Yicong Li, Junbin Xiao, Wei Ji, Tat-Seng Chua

[ACM MM'21 / Project Page / Github / Poster / Cite]
NExT-QA Dataset
NExT-QA: Next Phase of Question Answering to Explaining Temporal Actions

Junbin Xiao, Xindi Shang, Angela Yao, Tat-Seng Chua

[CVPR'21, Strong Accept / Project Page / Github / Poster / Cite]
Video Relation Grounding
Visual Relation Grounding in Videos

Junbin Xiao, Xindi Shang, Xun Yang, Sheng Tang, Tat-Seng Chua

[ECCV'20, Spotlight / Project Page / Github / Poster / Cite]
Video Relation Dataset
Annotating Object and Relations in User-Generated Videos

Xindi Shang, Donglin Di, Junbin Xiao, Yu Cao, Xun Yang, Tat-Seng Chua

[ICMR'19, Oral / Project Page / Github / Poster / Cite]

Others

Reviewer for Conferences: NeurIPS (Y23, Y24), ICLR (Y24, Y25), CVPR (Y22-Y25), ICCV (Y23, Y25), ECCV (Y22, Y24), AAAI (Y21-Y25), ACL (Y24), ACM MM (Y19-Y24), EMNLP (Y24), ACCV (Y24), ICASSP (Y21-Y22), etc.

Reviewer for Journals: T-PAMI, IJCV, TIP, TMM, TNNLS, TOMM, IPM, etc.