Supercharge RAG with Multimodality and Azure Document Intelligence

About

The session focused on advances in NLP driven by Retrieval-Augmented Generation (RAG) models, which combine information retrieval with generative language models to produce context-rich answers. It stressed the importance of image understanding and hierarchical document structure analysis for handling the visual data that accompanies text. The session provided an implementation guide using Azure Document Intelligence to convert documents and their images to markdown and to recognize document structure, covering visual feature extraction, visual semantics, multimodal data fusion, structure recognition, semantic role labeling, and structure-aware retrieval. It also detailed the setup of the required Azure services and models, including GPT-4-Vision-Preview and Azure Document Intelligence, and walked attendees through using these tools.
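As a concrete illustration of the conversion step described above, here is a minimal sketch using the azure-ai-documentintelligence Python package to run the prebuilt layout model and request markdown output, which preserves headings, tables, and section hierarchy. The endpoint, key, and file name are placeholders, and parameter names such as output_content_format have shifted between preview and GA versions of the SDK, so treat this as a sketch rather than the session's exact code.

```python
# Sketch: convert a PDF to structure-preserving markdown with
# Azure Document Intelligence (package: azure-ai-documentintelligence).
# Endpoint, key, and file path are placeholders; parameter names vary
# slightly across preview vs. GA SDK versions.
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = "https://<your-resource>.cognitiveservices.azure.com/"  # placeholder
key = "<your-document-intelligence-key>"                           # placeholder

client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))

with open("report.pdf", "rb") as f:  # hypothetical input file
    poller = client.begin_analyze_document(
        "prebuilt-layout",                  # layout model recognizes document structure
        f,
        output_content_format="markdown",   # emit markdown instead of plain text
        content_type="application/octet-stream",
    )

result = poller.result()
print(result.content)  # markdown with headings, tables, and section hierarchy
```

The markdown output is what makes the later structure-aware steps possible: headings and tables survive the conversion, so downstream chunking can follow the document's own hierarchy.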
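On the image-understanding side, a common pattern (not necessarily the session's exact code) is to send each extracted figure to a GPT-4-Vision-Preview deployment on Azure OpenAI and index the returned description alongside the document text. A sketch, assuming the openai Python package (v1+) and a deployment named gpt-4-vision-preview:

```python
# Sketch: describe an extracted figure with an Azure OpenAI
# GPT-4-Vision-Preview deployment so the description can be embedded and
# retrieved. Deployment name, endpoint, and api_version are assumptions.
import base64
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com/",  # placeholder
    api_key="<your-azure-openai-key>",                                  # placeholder
    api_version="2024-02-15-preview",
)

with open("figure_1.png", "rb") as img:  # hypothetical extracted figure
    b64 = base64.b64encode(img.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # your deployment name (assumption)
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this plot: axes, series, and the main trend."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)  # text to index alongside the document
```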

Key Takeaways:

  • Advanced understanding of how to supercharge RAG models with multimodal data processing capabilities.
  • Approaches for improving RAG models so they accurately interpret and incorporate visual information and complex document structures.
  • Practical insights into setting up the environment, coding, and using the tools to achieve a fully functioning multimodal RAG system.
  • Use of semantic chunking for improved accuracy in information retrieval and response generation (see the sketch after this list).
  • The session concluded with an example in which the multimodal RAG model accurately answered a query about a plot in a PDF document, citing its sources.
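To make the semantic-chunking takeaway concrete, the sketch below splits the markdown produced by the layout model along its heading hierarchy, so each chunk corresponds to a logical section rather than a fixed character window. This is one plain-Python way to do it; the session may have used a library splitter instead.

```python
# Sketch: semantic chunking of layout-model markdown by heading hierarchy.
# Each chunk keeps its heading path as metadata, which helps retrieval
# return citable, self-describing passages.
import re

def chunk_markdown(markdown: str) -> list[dict]:
    chunks, path, buf = [], [], []

    def flush():
        text = "\n".join(buf).strip()
        if text:
            chunks.append({"section": " > ".join(path), "text": text})
        buf.clear()

    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()                      # close the previous section's chunk
            level = len(m.group(1))
            path[:] = path[: level - 1]  # trim heading path to this depth
            path.append(m.group(2).strip())
        else:
            buf.append(line)
    flush()
    return chunks

# Example: in practice, feed in `result.content` from the layout sketch above.
for c in chunk_markdown("# Report\n\nIntro text.\n\n## Results\n\nPlot discussion."):
    print(c["section"], "->", c["text"])
```

Because each chunk carries its section path, a retrieved passage can be cited as, for example, "Report > Results", which is how structure-aware retrieval supports the cited answers mentioned in the last takeaway.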
