A Comprehensive Guide to Vision Language Models

About

This talk comprehensively introduces Vision-Language Models (VLMs), their importance, and their wide range of applications. It delves into the technical aspects of pre-training VLMs, covering both common techniques and recent advancements. Attendees will gain hands-on experience through live demonstrations using open-source VLMs and minimal reproducible Colab notebooks. The talk will also provide a step-by-step guide to fine-tuning PaliGemma, Google's latest VLM, for specific tasks.

Key Takeaways:

  • In-depth Understanding of VLMs: Participants will learn the fundamentals of VLMs, their significance, and diverse use cases across various domains.
  • Technical Know-how: The talk will equip attendees with knowledge of pre-training techniques, including both established methods and cutting-edge research directions in the field.
  • Practical Skills: Through live code demonstrations using open-source VLMs and Colab notebooks, participants will gain hands-on experience and learn how to work with these models effectively.
  • Fine-tuning Expertise: The talk will provide a detailed walkthrough of fine-tuning PaliGemma, enabling attendees to adapt the model to their own tasks.
  • Combination of Theory and Practice: The session will balance conceptual depth and practical techniques, ensuring participants grasp both the theoretical underpinnings and the practical applications of VLMs.

Stay informed about DHS 2025
