Vision Language Models: Challenges of Real-World Deployment

1st Workshop on VLM4RWD | NeurIPS 2025 | Nov 30th, 2025 | Mexico City, Mexico

Workshop Overview

Vision language models (VLMs) have demonstrated remarkable capabilities in integrating visual perception with natural language understanding, powering applications such as multimodal assistants, robotics, autonomous systems, and accessibility tools. However, their real-world deployment faces significant challenges in efficiency, scalability, and reliability. This workshop will bring together researchers and practitioners from academia and industry to highlight cutting-edge research, systems-level optimizations, and evaluation methodologies that are often overlooked yet pivotal for robust real-world integration. Efficiency, robustness, and reliability will be emphasized as core design principles, essential to advancing VLMs from experimental systems to dependable deployed technologies. By convening researchers at the intersection of multimodal learning, efficient inference and training, robustness and uncertainty estimation, and large-scale systems design, the workshop aims to establish concrete pathways toward building VLMs that can operate reliably under practical constraints. We hope this workshop will serve as a venue for exchanging insights on model design, efficiency techniques, and robustness evaluation that bridge the gap between research and real-world systems.

Call for Papers

Overview


The VLM4RWD workshop @ NeurIPS 2025 invites high-quality submissions that identify key challenges, introduce novel methodologies, and push the boundaries of efficient and reliable vision language models toward the next generation of autonomous systems. In addition to research papers, we also welcome demos and poster abstracts aligned with the workshop theme.
Accepted papers will be presented as posters during the poster sessions. Selected works will also be highlighted as contributed talks. Topics of interest include, but are not limited to:

Topics


  • Data pipelines for efficient multimodal learning, from pretraining to finetuning and, ultimately, test-time adaptation
  • Approaches for accelerating VLM inference through algorithm and systems design
  • Compression and distillation for VLM deployment under constrained resources
  • Sparse, modular, and retrieval-augmented VLM architectures
  • Effective and efficient training for complex reasoning tasks in visual scenes
  • Robust training and robustness evaluation of VLMs
  • Benchmarks for deployment-oriented evaluation of VLMs
  • Mitigating hallucination and improving multimodal grounding
  • Emerging frontiers: agentic VLMs for real-world integration

Submission Guidelines


  • Formatting Instructions: Workshop papers should be at most 8 pages (excluding references and appendices), following the NeurIPS 2025 main conference format. Supplementary materials and appendices are allowed but optional for reviewers.
  • Submission link: Submissions will be managed via the OpenReview submission system.
  • Reviews: The review process will be double-blind. All submissions must be anonymized and must not reveal author identities.
  • What can be submitted: Consistent with common workshop practice, we also welcome demo papers (2-4 pages) and extended abstracts (2 pages). Previously published papers may also be submitted; however, they will not be eligible for awards and will be limited to the poster sessions.

Important Dates

Paper Submission

Nov 5th, 2025 (AoE) (extended from Oct 31st, 2025)

Notification

Nov 12th, 2025

Camera Ready

Nov 22nd, 2025

Workshop Date

Nov 30th, 2025

Schedule

Half-day workshop at NeurIPS 2025, Nov 30th, Mexico City, Mexico

13:00 - 13:10

Opening Remarks

Welcome and introduction to the workshop

13:10 - 13:40

Keynote: Dr. Roozbeh Mottaghi

Meta FAIR / University of Washington

Title: Visual Embodied Planning

13:40 - 14:10

Keynote: Dr. Jiachen Li

University of California, Riverside

Title: Toward Reliable Robotic Foundation Models in the Open World

14:10 - 14:30

Keynote: Dr. Krzysztof Czarnecki

University of Waterloo

Title: EdgeScenes: A Road Hazard Ontology and Video Benchmarking and Engineering AI-Based Driving Automation

14:30 - 15:00

Keynote: Dr. Elahe Arani

Wayve

Title: From Pixels and Words to Action: Vision Language World Models for Embodied AI

15:00 - 15:40

Poster/Interactive Demo Session and Coffee Break

Authors of accepted posters and demos discuss their work

15:40 - 16:10

Keynote: Dr. Yan Wang

NVIDIA

Title: Advancing End-To-End Autonomous Driving With Reasoning Vision-Language-Action Models

16:10 - 16:40

Panel Discussion: Dr. Zhijing Jin, Dr. Mohsen Fayyaz, Dr. Krzysztof Czarnecki

University of Toronto, Microsoft, University of Waterloo

Title: Causal and Temporal Reasoning for Video Understanding

16:40 - 17:10

Keynote: Dr. Behrad Toghi

General Motors

Title: Challenges of Real World Deployments for Physical AI

17:10 - 17:40

Contributed Oral Presentations

Three papers will be presented by their authors.

17:40 - 17:50

Closing Discussion & Best Paper Award

Workshop closing remarks

17:50 - 20:00

Social event

After the workshop, participants are welcome to stay for an informal social gathering. This will be an opportunity for authors and attendees to discuss posters, share ideas, and continue conversations.


Accepted Papers

  • Closed-Task Validation: A More Robust and Efficient Proxy for Guiding VLM Training (Enci Zhang, Z.Q. ZHANG, Jiahao Xie, Ruiqi Lu, Boyan Zhou, Cheng Yang)
  • VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning (Jingkun Ma, Runzhe Zhan, Yang Li, Di Sun, Hou Pong Chan, Lidia S. Chao, Derek F. Wong)
  • From Scenes to Semantics: PersianCLEVR for Bilingual 3D Visual Reasoning (Kianoosh Vadaei, Melika Shirian, Arshia Hemmat, Mohammad Hassan Heydari, Ali Mamanpoosh, Afsaneh Fatemi)
  • Efficient Inference Scaling for Safety Assurance (Ruizhong Qiu, Gaotang Li, Ting-Wei Li, Tianxin Wei, Jingrui He, Hanghang Tong)
  • Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG (Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar)
  • Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning (Zuyao You)
  • MedVCTP: Improving Accuracy and Explainability in Medical Visual Reasoning (Aman Syed, Siwon Ryu, Nayan Saxena, Kevin Zhu)
  • Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning (Qingyuan Wu, Jianheng Liu, Jianye HAO, Jun Wang, Kun Shao)
  • UpstreamQA: A Modular Framework for Explicit Reasoning on Video Question Answering Tasks (Jason Nguyen, Ameet Rao, Alexander Chang, Ishaan Kumar, Erin Tan)
  • AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs (Aahana Basappa, Pranay Goel, Anusri Karra, Anish Karra, Asa Gilmore, Kevin Zhu)
  • Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction (Hangxuan Li, Renjun Jia, Xuezhang Wu, zeqi zheng, Yunjie Qian, Xianling Zhang)
  • Do Vision–Language Models Understand Visual Persuasiveness? (Gyuwon Park)
  • MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models (Yuqing Lei, Yingjun Du, Yawen Huang, Xiantong Zhen, Ling Shao)
  • From Vision to Action: Enabling Real-World Agentic VLMs (Aravilli Atchuta Ram)
  • A Comprehensive Survey of Multimodal LLMs for Scientific Discovery (Liang Yan, Xu Jiang, Jian Ma, Yuhang Liu, Tian Bian, Qichao Wang, Abhishek Basu, Yu Rong, Tingyang Xu, Pengcheng Wu, Le Song, Imran Razzak, Junchi Yan, Zengfeng Huang, Yutong Xie)
  • Scene Understanding via Scene Representation Generation with Vision-Language Models (Yuan Chen, Peng Shi)
  • Efficient Vision-Language Reasoning via Adaptive Token Pruning (Xue li, Xiaonan Song)

Organizing Committee

Contact Us

Questions or feedback? Feel free to reach out. We would love to hear from you.

Email

For general inquiries:

mnasraza@uwaterloo.ca

yimu.wang@uwaterloo.ca

Paper Submissions

Paper submissions will be managed through:

OpenReview Submission System

Workshop Location

NeurIPS 2025

Mexico City, Mexico

Exact venue details will be announced closer to the event.

Frequently Asked Questions

Is the workshop in-person or virtual?

The workshop will be held in-person at NeurIPS 2025 in Mexico City, Mexico.

Will the workshop proceedings be archival?

No, the workshop proceedings will be non-archival. Authors of accepted papers retain the full copyright of their work and are free to submit extended versions to conferences or journals.