In the latest episode of Leading with Data, we are thrilled to host Ines Montani, a renowned developer in the field of AI and NLP technology. As the co-founder and CEO of Explosion, and a co-developer of the leading open-source library spaCy and the innovative annotation tool Prodigy, Ines brings a wealth of knowledge and experience. This episode delves into the evolution of spaCy and Prodigy, the unique structure of Explosion, and the transformative impact of generative AI. Join us as we explore insights from the frontlines of NLP and decode the future of data science with Ines Montani.
You can listen to this episode of Leading with Data on popular platforms like Spotify, Google Podcasts, and Apple. Pick your favorite to enjoy the insightful content!
Let’s look into the details of our conversation with Ines Montani:
Since 2017, our focus has been on making it easier for users not just to use off-the-shelf models but to train their own. We’ve seen spaCy evolve with more components and use cases, especially in extracting structure from text. Our goal has been to enable developers to build custom solutions that they can run in-house, just like developing code. We’ve also been addressing the challenges that come with black box models and APIs, empowering developers to take back control of their NLP stack.
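As a rough illustration of what "extracting structure from text" means, the sketch below turns free text into typed span records. The regex and the `EmailMention` record are hypothetical stand-ins: in a real spaCy pipeline, a trained component would do the predicting, but the output shape — typed spans with character offsets — is the same idea.

```python
import re
from dataclasses import dataclass

@dataclass
class EmailMention:
    text: str   # the matched span
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends

# Hypothetical extractor: a trained pipeline component would replace
# this regex, but downstream code consumes the same structured output.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def extract_emails(text: str) -> list[EmailMention]:
    return [EmailMention(m.group(), m.start(), m.end())
            for m in EMAIL_RE.finditer(text)]

mentions = extract_emails("Contact us at hello@explosion.ai for details.")
# mentions[0].text == "hello@explosion.ai"
```

The point of the structured record is that everything downstream — databases, dashboards, business rules — can consume predictable fields instead of parsing prose.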
Explosion is structured around spaCy, our open-source library, and includes consulting and spaCy LLM. We’ve always aimed to build a business on top of spaCy, offering more than just the library while keeping it open source. We didn’t want to lock off features or offer only support, as that would compromise the ease of use. Instead, we developed Prodigy, an annotation tool designed as a developer tool, and we engage in consulting to apply our tools to real-world use cases. This helps us ensure that what we’re building is genuinely useful.
The generative AI wave has been impressive, especially seeing how scaling up models can yield such good results. It’s been a mix of surprise and anticipation, as we’ve been closely watching how it fits into NLP workflows and what specific problems it solves. While there’s excitement about few-shot and zero-shot learning, we believe that structured data remains crucial, and there’s still a need for custom tooling around generative AI.
One major pain point is prompt engineering, which is still more of an art than a science. Another is the specificity required for business applications, as general-purpose models often don’t deliver good results for specialized terminology. Additionally, the dependency on large models and APIs can be economically and operationally challenging, with issues like limited data privacy and non-deterministic output. We’re addressing these with spaCy LLM, which provides structured prediction tasks and a familiar output for developers.
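To make the idea of "structured prediction tasks with a familiar output" concrete, here is an illustrative sketch (not spaCy LLM's actual internals): the model is prompted to reply in a fixed `LABEL: span` format, and a parser converts that free-text reply into structured label/text pairs, so downstream code sees entity-style output rather than raw prose. The response format and parser here are assumptions for illustration.

```python
# Illustrative only: the "LABEL: span" reply format and this parser are
# hypothetical, sketching how an LLM's free-text answer can be mapped
# back into the structured entity output developers already work with.

def parse_ner_response(response: str) -> list[tuple[str, str]]:
    entities = []
    for line in response.splitlines():
        if ":" in line:
            label, _, span = line.partition(":")
            entities.append((label.strip(), span.strip()))
    return entities

# A hypothetical model reply to "List the PERSON and ORG entities":
reply = "PERSON: Ines Montani\nORG: Explosion"
ents = parse_ner_response(reply)
# ents == [("PERSON", "Ines Montani"), ("ORG", "Explosion")]
```

Keeping the parsing step explicit is also one way to cope with non-deterministic output: malformed lines can be dropped or flagged instead of silently breaking downstream code.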
I expect a movement towards smaller models, as there’s a lot of potential for them to be just as effective for specific tasks. There will likely be more discussion around data privacy and explainability, as well as a pushback against the monopolization of AI by big tech. Open-source models will continue to play a significant role, and we’ll see a return to focusing on workflows and tooling that support operations and product questions.
I’m excited about the potential for significantly better systems in structuring unstructured text and the advancements in multimodal data. However, I’m concerned about the overestimation of AI capabilities and the societal impact of misleading perceptions about AI. The misuse of technology and the propagation of bugs are more immediate threats than dystopian scenarios of AI dominance.
Organizations should consider whether they need generative model capabilities at runtime or if they can move this dependency to development. If real-time generation isn’t crucial, open-source models can be more economical and offer greater control. Investing time in creating high-quality data can lead to models that outperform large generative models on specific tasks, making open-source a viable option for many companies.
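A minimal sketch of "moving the dependency to development" might look like this: a large model labels raw examples once, at development time, and the resulting dataset trains a small in-house model that runs without any API calls. The `dev_time_annotate` stub and the labels are hypothetical stand-ins for a real LLM call and a real task.

```python
# Hypothetical sketch: the expensive generative model is used only at
# development time to create labels; runtime then relies on a small,
# locally trained model instead of a live API dependency.

def dev_time_annotate(text: str) -> str:
    """Stand-in for a large-model call made only during development."""
    return "billing" if "invoice" in text.lower() else "other"

def build_training_set(raw_texts: list[str]) -> list[tuple[str, str]]:
    # Each (text, label) pair becomes training data for a small model.
    return [(text, dev_time_annotate(text)) for text in raw_texts]

dataset = build_training_set([
    "Please resend the invoice for March.",
    "What time is the demo tomorrow?",
])
# dataset labels: "billing", then "other"
```

Once the labeled dataset exists, the runtime system has no generative-model dependency at all: only the small trained model ships, which is the economical and controllable path described above.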
Focus on developing core skills like programming and problem-solving rather than chasing the latest technologies. Understanding the basics of language and having subject matter expertise can be invaluable. Think from first principles and prioritize skills that will remain relevant regardless of technological trends.
Our conversation with Ines Montani offered deep insights into the dynamic world of NLP and AI. From the evolution of spaCy and Prodigy to the future trends in the NLP industry, Ines shared invaluable perspectives on the importance of structured data, custom tooling, and the balance between open-source models and big tech APIs. Her advice to young professionals emphasizes foundational skills and subject matter expertise. As we navigate the ever-evolving landscape of AI and machine learning, the insights from Ines Montani will undoubtedly serve as a guiding light. We wish all our listeners the best of luck in their data science journeys!
For more engaging sessions on AI, data science, and GenAI, stay tuned with us on Leading with Data.