I devote myself to developing AI techniques that can understand the physical world and interact and communicate with human beings to provide personalized assistance. My topics of interest cover video understanding, question answering, visual grounding, and robotics, with an emphasis on multimodal large language models, robustness, and trustworthiness. I am currently focusing on trustworthy multimodal LLMs and their applications in egocentric embodied assistance. I am actively looking for Research Interns/Assistants/Visiting Students. (NUS master students with CV/NLP/multimodal experience are highly preferred.)
News
Six papers are accepted to SIGIR'25, ICMR'25, MICCAI'25, and ICCV'25.
Invited to be a reviewer for NeurIPS'25 and MM'25.
I will give a talk on NExT-GQA: Visually Grounded VideoQA, invited by Twelve Labs.
Two papers, on video-language models and trustworthy K-VQA, are accepted to ACL'24 and MM'24 respectively.
Our three papers exploring VQA trustworthiness, 3D object affordance, and ego-car accidents are all accepted to CVPR'24.
Invited to be a reviewer for CVPR'24 and ICLR'24.
Two papers are accepted to T-PAMI'23 and ACM MM'23 respectively.
Two papers are accepted to T-PAMI'23 and ICCV'23 respectively.
Invited to serve as a PC Member for AAAI'24.
Invited to be a reviewer for the NeurIPS'23 Datasets and Benchmarks track.
Invited to be a reviewer for ACM MM'23.
Successfully defended my Ph.D.
Thesis: Visual Relation Driven Video Question Answering. Supervisor: Prof. Tat-Seng Chua. Committee: Prof. Mohan Kankanhalli, Prof. Roger Zimmermann. Chair: Prof. Terence Sim.
Featured Publications
Others
Reviewer for Conferences: NeurIPS (Y23, Y24), ICLR (Y24, Y25), CVPR (Y22-Y25), ICCV (Y23, Y25), ECCV (Y22, Y24), AAAI (Y21-Y25), ACL (Y24), ACM MM (Y19-Y24), EMNLP (Y24), ACCV (Y24), ICASSP (Y21-Y22), etc.
Reviewer for Journals: T-PAMI, IJCV, TIP, TMM, TNNLS, ToMM, IPM, etc.