Multimodal GenAI in Action: Bridging Text, Vision, and Beyond
28 Mar 2025, 2:03 PM - 3:03 PM
About the Event
Join this insightful session to explore the exciting world of Multimodal Generative AI and its real-world impact. Discover how these powerful systems combine text, images, audio, and video to deliver richer, more human-like interactions. We’ll break down core architectures, explore alignment techniques, and showcase practical applications. Dive into two innovative systems: LLaVA, a vision-language assistant for visual Q&A, and AI Guide Dog (AIGD), which helps visually impaired users navigate in real time. Whether you're an AI enthusiast or a tech professional, this session will equip you with actionable insights into the future of multimodal AI.
Key Takeaways:
- Understand how multimodal Generative AI integrates text, images, audio, and video for richer interactions.
- Explore the core architectures and techniques that align multiple data modalities effectively.
- Discover real-world applications of multimodal GenAI in healthcare, entertainment, and navigation.
- Gain insights into systems like LLaVA and AI Guide Dog, showcasing practical multimodal AI implementations.
Who is this DataHour for?
- AI enthusiasts curious about how multimodal Generative AI combines text, images, audio, and video
- Tech professionals looking to apply multimodal systems such as LLaVA and AI Guide Dog in practice
About the Speaker
Aishwarya Jadhav is a Machine Learning Engineer at Waymo (Google), specializing in perception systems for autonomous robo-taxis. Previously, she worked on Tesla's Autopilot team, contributing to AI models for Full Self-Driving and the Optimus robot. With expertise in computer vision and large-scale ML, Aishwarya has also worked at Google Ads and Morgan Stanley. She holds a Master’s in Computational Data Science from Carnegie Mellon University, where she led the AI Guide Dog project, developing real-time navigation systems for the visually impaired. You can reach her on LinkedIn.
Become a Speaker
Share your vision, inspire change, and leave a mark on the industry. We're calling for innovators and thought leaders to speak at our event.
- Professional Exposure
- Networking Opportunities
- Thought Leadership
- Knowledge Exchange
- Leading-Edge Insights
- Community Contribution
