Vision Language Models: Challenges of Real-World Deployment

1st Workshop on VLM4RWD | NeurIPS 2025 | Nov 30th, 2025 | Mexico City, Mexico

Workshop Overview

Vision language models (VLMs) have demonstrated remarkable capabilities in integrating visual perception with natural language understanding, powering applications such as multimodal assistants, robotics, autonomous systems, and accessibility tools. However, their real-world deployment faces significant challenges in efficiency, scalability, and reliability. This workshop will bring together researchers and practitioners from academia and industry to highlight cutting-edge research, systems-level optimizations, and evaluation methodologies that are often overlooked yet pivotal for robust real-world integration. Efficiency, robustness, and reliability will be emphasized as core design principles, essential to advancing VLMs from experimental systems to dependable deployed technologies. By convening researchers at the intersection of multimodal learning, efficient inference and training, robustness and uncertainty estimation, and large-scale systems design, the workshop aims to establish concrete pathways toward building VLMs that can operate reliably under practical constraints. We hope this workshop will serve as a venue for exchanging insights on model design, efficiency techniques, and robustness evaluation that bridge the gap between research and real-world systems.

Call for Papers

Overview


The VLM4RWD workshop @ NeurIPS 2025 invites high-quality submissions that identify key challenges, introduce novel methodologies, and push the boundaries of efficient and reliable vision language models toward the next generation of autonomous systems. In addition to research papers, we also welcome demos and poster abstracts aligned with the workshop theme.
Accepted papers will be presented as posters during the poster sessions. Selected works will also be highlighted as contributed talks. Topics of interest include, but are not limited to:

Topics


  • Data pipelines for efficient multimodal learning, from pretraining to finetuning and, ultimately, test-time adaptation
  • Approaches for accelerating VLM inference through algorithm and systems design
  • Compression and distillation for VLM deployment under constrained resources
  • Sparse, modular, and retrieval-augmented VLM architectures
  • Effective and efficient training for complex reasoning tasks in visual scenes
  • Robust training and robustness evaluation of VLMs
  • Benchmarks for deployment-oriented evaluation of VLMs
  • Mitigating hallucination and improving multimodal grounding
  • Emerging frontiers: agentic VLMs for real-world integration

Submission Guidelines


  • Formatting Instructions: Workshop papers should be at most 8 pages (excluding references and appendices), following the NeurIPS 2025 main conference format. Supplementary materials and appendices are allowed but optional for reviewers.
  • Submission link: Submissions will be managed via the OpenReview submission system.
  • Reviews: The review process will be double-blind. All submissions must be anonymized and must not reveal author identities.
  • What can be submitted: Consistent with common workshop practice, we also welcome demo papers (2-4 pages) and extended abstracts (2 pages). Previously published papers may also be submitted; however, they will not be eligible for awards and will be limited to the poster sessions.

Important Dates

Paper Submission

Nov 5th, 2025 (AoE) (extended from Oct 31st, 2025)

Notification

Nov 12th, 2025

Camera Ready

Nov 22nd, 2025

Workshop Date

Nov 30th, 2025

Schedule

Half-day workshop at NeurIPS 2025, Nov 30th, Mexico City, Mexico

13:00 - 13:10

Opening Remarks

Welcome and introduction to the workshop

13:10 - 13:40

Keynote: Dr. Roozbeh Mottaghi

Meta FAIR / University of Washington

Title: Visual Embodied Planning

13:40 - 14:10

Keynote: Dr. Jiachen Li

University of California, Riverside

Title: Toward Reliable Robotic Foundation Models in the Open World

14:10 - 14:30

Keynote: Dr. Krzysztof Czarnecki

University of Waterloo

Title: EdgeScenes: A Road Hazard Ontology and Video Benchmarking and Engineering AI-Based Driving Automation

14:30 - 15:00

Keynote: Dr. Elahe Arani

Wayve

Title: From Pixels and Words to Action: Vision Language World Models for Embodied AI

15:00 - 15:40

Poster/Interactive Demo Session and Coffee Break

Authors of accepted posters and demos discuss their work

15:40 - 16:10

Keynote: Dr. Yan Wang

NVIDIA

Title: Advancing End-To-End Autonomous Driving With Reasoning Vision-Language-Action Models

16:10 - 16:40

Panel Discussion: Dr. Zhijing Jin, Dr. Mohsen Fayyaz, Dr. Krzysztof Czarnecki

University of Toronto, Microsoft, University of Waterloo

Title: Causal and Temporal Reasoning for Video Understanding

16:40 - 17:10

Keynote: Dr. Behrad Toghi

General Motors

Title: Challenges of Real World Deployments for Physical AI

17:10 - 17:40

Contributed Oral Presentations

Three papers will be presented by their authors.

17:40 - 17:50

Closing Discussion & Best Paper Award

Workshop closing remarks

17:50 - 20:00

Social event

After the workshop, participants are welcome to stay for an informal social gathering. This will be an opportunity for authors and attendees to discuss posters, share ideas, and continue conversations.


Accepted Papers

  • Closed-Task Validation: A More Robust and Efficient Proxy for Guiding VLM Training (Enci Zhang, Z.Q. ZHANG, Jiahao Xie, Ruiqi Lu, Boyan Zhou, Cheng Yang)
  • VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning (Jingkun Ma, Runzhe Zhan, Yang Li, Di Sun, Hou Pong Chan, Lidia S. Chao, Derek F. Wong)
  • From Scenes to Semantics: PersianCLEVR for Bilingual 3D Visual Reasoning (Kianoosh Vadaei, Melika Shirian, Arshia Hemmat, Mohammad Hassan Heydari, Ali Mamanpoosh, Afsaneh Fatemi)
  • Efficient Inference Scaling for Safety Assurance (Ruizhong Qiu, Gaotang Li, Ting-Wei Li, Tianxin Wei, Jingrui He, Hanghang Tong)
  • Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG (Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar)
  • Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning (Zuyao You)
  • MedVCTP: Improving Accuracy and Explainability in Medical Visual Reasoning (Aman Syed, Siwon Ryu, Nayan Saxena, Kevin Zhu)
  • Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning (Qingyuan Wu, Jianheng Liu, Jianye HAO, Jun Wang, Kun Shao)
  • UpstreamQA: A Modular Framework for Explicit Reasoning on Video Question Answering Tasks (Jason Nguyen, Ameet Rao, Alexander Chang, Ishaan Kumar, Erin Tan)
  • AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs (Aahana Basappa, Pranay Goel, Anusri Karra, Anish Karra, Asa Gilmore, Kevin Zhu)
  • Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction (Hangxuan Li, Renjun Jia, Xuezhang Wu, zeqi zheng, Yunjie Qian, Xianling Zhang)
  • Do Vision–Language Models Understand Visual Persuasiveness? (Gyuwon Park)
  • MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models (Yuqing Lei, Yingjun Du, Yawen Huang, Xiantong Zhen, Ling Shao)
  • From Vision to Action: Enabling Real-World Agentic VLMs (Aravilli Atchuta Ram)
  • A Comprehensive Survey of Multimodal LLMs for Scientific Discovery (Liang Yan, Xu Jiang, Jian Ma, Yuhang Liu, Tian Bian, Qichao Wang, Abhishek Basu, Yu Rong, Tingyang Xu, Pengcheng Wu, Le Song, Imran Razzak, Junchi Yan, Zengfeng Huang, Yutong Xie)
  • Scene Understanding via Scene Representation Generation with Vision-Language Models (Yuan Chen, Peng Shi)
  • Efficient Vision-Language Reasoning via Adaptive Token Pruning (Xue li, Xiaonan Song)

Organizing Committee

Contact Us

Questions or feedback? Feel free to reach out. We would love to hear from you.

Email

For general inquiries:

mnasraza@uwaterloo.ca

yimu.wang@uwaterloo.ca

Paper Submissions

Paper submissions will be managed through:

OpenReview Submission System

Workshop Location

NeurIPS 2025

Mexico City, Mexico

Exact venue details will be announced closer to the event.

Frequently Asked Questions

Is the workshop in-person or virtual?

The workshop will be held in-person at NeurIPS 2025 in Mexico City, Mexico.

Will the workshop proceedings be archival?

No, the workshop proceedings will be non-archival. Authors of accepted papers retain the full copyright of their work and are free to submit extended versions to conferences or journals.