In the latest episode of Leading with Data, we are thrilled to host Ines Montani, a renowned developer in the field of AI and NLP technology. As the co-founder and CEO of Explosion, and a co-developer of the leading open-source library spaCy and the innovative annotation tool Prodigy, Ines brings a wealth of knowledge and experience. This episode delves into the evolution of spaCy and Prodigy, the unique structure of Explosion, and the transformative impact of generative AI. Join us as we explore insights from the frontlines of NLP and decode the future of data science with Ines Montani.
You can listen to this episode of Leading with Data on popular platforms like Spotify, Google Podcasts, and Apple. Pick your favorite to enjoy the insightful content!
Let’s look into the details of our conversation with Ines Montani:
Since 2017, our focus has been on making it easier for users not just to use off-the-shelf models but to train their own. We’ve seen spaCy evolve with more components and use cases, especially in extracting structure from text. Our goal has been to enable developers to build custom solutions that they can run in-house, just like developing code. We’ve also been addressing the challenges that come with black box models and APIs, empowering developers to take back control of their NLP stack.
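As a rough illustration of what "extracting structure from text" means, the sketch below turns free text into typed span records. The regex and the `EmailMention` record are hypothetical stand-ins: in a real spaCy pipeline, a trained component would do the predicting, but the output shape — typed spans with character offsets — is the same idea.

```python
import re
from dataclasses import dataclass

@dataclass
class EmailMention:
    text: str   # the matched span
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends

# Hypothetical extractor: a trained pipeline component would replace
# this regex, but downstream code consumes the same structured output.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def extract_emails(text: str) -> list[EmailMention]:
    return [EmailMention(m.group(), m.start(), m.end())
            for m in EMAIL_RE.finditer(text)]

mentions = extract_emails("Contact us at hello@explosion.ai for details.")
# mentions[0].text == "hello@explosion.ai"
```

The point of the structured record is that everything downstream — databases, dashboards, business rules — can consume predictable fields instead of parsing prose.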
Explosion is structured around spaCy, our open-source library, and includes consulting and spaCy LLM. We’ve always aimed to build a business on top of spaCy, offering more than just the library while keeping it open source. We didn’t want to lock off features or offer only support, as that would compromise the ease of use. Instead, we developed Prodigy, an annotation tool designed as a developer tool, and we engage in consulting to apply our tools to real-world use cases. This helps us ensure that what we’re building is genuinely useful.
The generative AI wave has been impressive, especially seeing how scaling up models can yield such good results. It’s been a mix of surprise and anticipation, as we’ve been closely watching how it fits into NLP workflows and what specific problems it solves. While there’s excitement about few-shot and zero-shot learning, we believe that structured data remains crucial, and there’s still a need for custom tooling around generative AI.
One major pain point is prompt engineering, which is still more of an art than a science. Another is the specificity required for business applications, as general-purpose models often don’t deliver good results for specialized terminology. Additionally, the dependency on large models and APIs can be economically and operationally challenging, with issues like limited data privacy and non-deterministic output. We’re addressing these with spaCy LLM, which provides structured prediction tasks and a familiar output for developers.
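To make the idea of "structured prediction tasks with a familiar output" concrete, here is an illustrative sketch (not spaCy LLM's actual internals): the model is prompted to reply in a fixed `LABEL: span` format, and a parser converts that free-text reply into structured label/text pairs, so downstream code sees entity-style output rather than raw prose. The response format and parser here are assumptions for illustration.

```python
# Illustrative only: the "LABEL: span" reply format and this parser are
# hypothetical, sketching how an LLM's free-text answer can be mapped
# back into the structured entity output developers already work with.

def parse_ner_response(response: str) -> list[tuple[str, str]]:
    entities = []
    for line in response.splitlines():
        if ":" in line:
            label, _, span = line.partition(":")
            entities.append((label.strip(), span.strip()))
    return entities

# A hypothetical model reply to "List the PERSON and ORG entities":
reply = "PERSON: Ines Montani\nORG: Explosion"
ents = parse_ner_response(reply)
# ents == [("PERSON", "Ines Montani"), ("ORG", "Explosion")]
```

Keeping the parsing step explicit is also one way to cope with non-deterministic output: malformed lines can be dropped or flagged instead of silently breaking downstream code.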
I expect a movement towards smaller models, as there’s a lot of potential for them to be just as effective for specific tasks. There will likely be more discussion around data privacy and explainability, as well as a pushback against the monopolization of AI by big tech. Open-source models will continue to play a significant role, and we’ll see a return to focusing on workflows and tooling that support operations and product questions.
I’m excited about the potential for significantly better systems in structuring unstructured text and the advancements in multimodal data. However, I’m concerned about the overestimation of AI capabilities and the societal impact of misleading perceptions about AI. The misuse of technology and the propagation of bugs are more immediate threats than dystopian scenarios of AI dominance.
Organizations should consider whether they need generative model capabilities at runtime or if they can move this dependency to development. If real-time generation isn’t crucial, open-source models can be more economical and offer greater control. Investing time in creating high-quality data can lead to models that outperform large generative models on specific tasks, making open-source a viable option for many companies.
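A minimal sketch of "moving the dependency to development" might look like this: a large model labels raw examples once, at development time, and the resulting dataset trains a small in-house model that runs without any API calls. The `dev_time_annotate` stub and the labels are hypothetical stand-ins for a real LLM call and a real task.

```python
# Hypothetical sketch: the expensive generative model is used only at
# development time to create labels; runtime then relies on a small,
# locally trained model instead of a live API dependency.

def dev_time_annotate(text: str) -> str:
    """Stand-in for a large-model call made only during development."""
    return "billing" if "invoice" in text.lower() else "other"

def build_training_set(raw_texts: list[str]) -> list[tuple[str, str]]:
    # Each (text, label) pair becomes training data for a small model.
    return [(text, dev_time_annotate(text)) for text in raw_texts]

dataset = build_training_set([
    "Please resend the invoice for March.",
    "What time is the demo tomorrow?",
])
# dataset labels: "billing", then "other"
```

Once the labeled dataset exists, the runtime system has no generative-model dependency at all: only the small trained model ships, which is the economical and controllable path described above.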
Focus on developing core skills like programming and problem-solving rather than chasing the latest technologies. Understanding the basics of language and having subject matter expertise can be invaluable. Think from first principles and prioritize skills that will remain relevant regardless of technological trends.
Our conversation with Ines Montani offered deep insights into the dynamic world of NLP and AI. From the evolution of spaCy and Prodigy to the future trends in the NLP industry, Ines shared invaluable perspectives on the importance of structured data, custom tooling, and the balance between open-source models and big tech APIs. Her advice to young professionals emphasizes foundational skills and subject matter expertise. As we navigate the ever-evolving landscape of AI and machine learning, the insights from Ines Montani will undoubtedly serve as a guiding light. We wish all our listeners the best of luck in their data science journeys!
For more engaging sessions on AI, data science, and GenAI, stay tuned with us on Leading with Data.