Shulin Tian
shulin002 [at] ntu [dot] edu [dot] sg
I am a PhD student at Nanyang Technological University (NTU), Singapore, supervised by Prof. Ziwei Liu and Dr. Hongyuan Zhu.
Previously, I obtained my Bachelor's degree from NTU. I had a wonderful time working with Prof. Ranjay Krishna at the University of Washington on vision-language model reasoning, and with Prof. Bihan Wen at NTU on low-light image enhancement.
News
[06/2026] HippoCamp and PerceptionComp are accepted to ECCV 2026. Congrats to all coauthors!
[05/2026] I was recognized as an outstanding reviewer for CVPR 2026. News link: here.
[05/2026] Ego-R1 is accepted to TPAMI. Congrats to all coauthors!
[04/2026] We release SimpleStream, a simple baseline for streaming video understanding.
[04/2026] We release HippoCamp and FileGram, two papers on file-system agentic memory.
[03/2026] We release Insight-V++, towards advanced long-chain visual reasoning with multimodal large language models.
[02/2026] We release Demo-ICL, in-context learning for procedural video knowledge acquisition.
[06/2025] Evaluation Agent was selected for an oral presentation and received the SAC Highlight Award (43/8350) at ACL 2025. Congrats to all coauthors!
[06/2025] We release Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning. Code and data can be found here.
[05/2025] The MMInA leaderboard is now live on the MMInA project page.
[05/2025] Two papers are accepted to ACL 2025 (one main and one findings).
[03/2025] I am acknowledged as an outstanding reviewer for ICLR 2025 [SCOPE Workshop].
[01/2025] Our paper "AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation" is accepted to ICLR 2025.
[12/2024] Our paper "Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models" is released.
[08/2024] Starting my PhD at MMLab@NTU.
[04/2024] Our paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" is released.
[06/2023] Our paper "Enhancing Low-Light Images Using Infrared-Encoded Images" is accepted to ICIP 2023.
(* equal contributions, † project lead, corresponding author)
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
Yalun Dai*, Hao Li*, Shulin Tian, Runmao Yao, Yuhao Dong, Fangzhou Hong, Zhaoxi Chen, Fangfu Liu, Baoliang Tian, Dingwen Zhang, Tao Wang, Kim-Hui Yap, Ziwei Liu
arXiv, 2026
Paper / Project Page / Code
Area: Spatial intelligence, tool-use agents, spatial reasoning
HippoCamp: Benchmarking Contextual Agents on Personal Computers
ECCV 2026
Paper / Project Page / Code / Data
Area: Contextual agents, personal AI assistants, multimodal memory
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
arXiv, 2026
Paper / Project Page / Code / Data
Area: Agent personalization, file-system memory, behavioral traces
SimpleStream: A Simple Baseline for Streaming Video Understanding
arXiv, 2026
Paper / Project Page / Code
Area: Streaming video understanding, VLM
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
ECCV 2026
Paper / Project Page / Code / Data
Area: Video benchmark, perception-centric reasoning, multimodal LLM
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models
arXiv, 2026
Paper
Area: Visual reasoning, multimodal LLM
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
ACL 2026 (Main)
Paper / Project Page / Code
Area: Multimodal benchmark, understanding & generation
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
TPAMI | ICCV MMRAgI Workshop 2025
Paper / Project Page / Code / Data
Area: Agentic tool-use, long video reasoning, egocentric
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
arXiv, 2026
Paper / Code
Area: Video reasoning, in-context learning
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
ACL 2025 (Main) 🏆 Oral + SAC Highlight Award
Paper / Project Page / Code
Area: Agent, GenAI
MMInA: Benchmarking Multihop Multimodal Internet Agents
ACL 2025 (Findings)
Paper / Project Page / Code / Data
Area: Multimodal agent benchmark on long-horizon reasoning
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
ICLR 2025
Paper / Project Page
Area: Robotics, VLM
Enhancing Low-Light Images Using Infrared-Encoded Images
ICIP 2023
Paper / Code / Data
Area: Low-light image enhancement
Nanyang Technological University
PhD in Computer Science
Aug. 2024 - Present
Nanyang Technological University
BEng in Electrical & Electronic Engineering (Highest Distinction)
Aug. 2020 - May 2024
Professional Service
Talk Organizer: The AI Talks
Conference Reviewer: ACL 2025-26; ICLR 2026; CVPR 2026; ECCV 2026; ICRA 2025; EMNLP 2025; NAACL 2025
Journal Reviewer: IJCV
Workshop Reviewer: SCOPE @ ICLR 2025; LangRob @ CoRL 2024
Miscellanea
When it comes to music, I do:
When it comes to sports, I always try new things and do:
🤿 Diving: PADI-certified Open Water Diver (2022) & Advanced Open Water Diver (2024)
🏃‍♀️ Others: badminton, hiking...