Vision-Language Models for Embodied AI
Course description
This course focuses on vision-language modelling for robotics, exploring how multimodal foundation models enable robots to perceive, reason, and act from combined visual and linguistic inputs. Students will learn to build and deploy vision-language models for tasks such as grounding, instruction following, and action planning in embodied environments, with an emphasis on generative AI techniques and real-world robotic applications.
Instructor
Jihong Park
Number of credits: 12