TIPSv2 (CVPR'26) and TIPS (ICLR'25)
Updated Jun 1, 2026 - Jupyter Notebook
Cambrian-S: Towards Spatial Supersensing in Video
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
Visual Spatial Tuning
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
[ICCV 2025] Enhancing spatial understanding in text-to-image diffusion models
[NeurIPS DB 2025] IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
[CVPR 2026 (Highlight)] 4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
[ICLR 2026] 🦅 FALCON: an effective vision-language-action model that injects rich 3D spatial tokens into the action head, enabling robust spatial understanding and SOTA performance across diverse manipulation tasks.
[AAAI 2026 Oral] STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
POMA-3D: The Point Map Way to 3D Scene Understanding.
[ECCV 2026] One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models (Layered 3D Spatial Understanding)
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
[CVPR 2026] HandVQA: Diagnosing and Improving Fine-Grained Spatial Reasoning about Hands in Vision-Language Models
Autoregressive Mosaics is a project that attempts to force an LLM trained only on text to paint a picture one discrete pixel at a time.