Vision-Language Models for Embodied AI

Course description

This course focuses on vision-language modeling for robotics, exploring how multimodal foundation models enable robots to perceive, reason, and act from combined visual and linguistic inputs. Students will learn to build and deploy vision-language models for tasks such as visual grounding, instruction following, and action planning in embodied environments, with an emphasis on generative AI techniques and real-world robotic applications.

Instructor

Jihong Park

Number of credits: 12