Automate Blog to Twitter Thread using Gemini-2.0, LangChain, and Streamlit

Avijit Biswas Last Updated : 13 Jan, 2025
12 min read

In today’s digital landscape, content repurposing has become crucial for maximizing reach and engagement. One effective strategy is transforming long-form content like blog posts into engaging Twitter threads. However, manually creating these threads can be time-consuming and challenging. In this article, we’ll explore how to build an application to automate blog to Twitter thread creation using Google’s Gemini-2.0 LLM, ChromaDB, and Streamlit.


Learning Objectives

  • Automate blog to Twitter thread transformation using Google’s Gemini-2.0, ChromaDB, and Streamlit for efficient content repurposing.
  • Gain hands-on experience building an automated blog-to-Twitter-thread pipeline with embedding models and AI-driven prompt engineering.
  • Understand the capabilities of Google’s Gemini-2.0 LLM for automated content transformation.
  • Explore the integration of ChromaDB for efficient semantic text retrieval.
  • Build a Streamlit-based web application for seamless PDF-to-Twitter thread conversion.
  • Gain hands-on experience with embedding models and prompt engineering for content generation.

This article was published as a part of the Data Science Blogathon.

What is Gemini-2.0?

Gemini-2.0 is Google’s latest multimodal Large Language Model (LLM), representing a significant advancement in AI capabilities. It is now available as the gemini-2.0-flash-exp API in Vertex AI Studio. It offers improved performance in areas like:

  • Multimodal understanding, coding, complex instruction following, and function calling in natural language.
  • Context-aware content creation.
  • Complex reasoning and analysis.
  • Native image generation, image editing, and controllable text-to-speech generation.
  • Low-latency responses with the Flash variant.

For our project, we are specifically using the gemini-2.0-flash-exp model API, which is optimized for quick responses while maintaining high-quality output.
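As a quick orientation, here is a minimal, hypothetical sketch of calling the model through the langchain-google-genai wrapper used later in this article (it assumes GOOGLE_API_KEY is already set in your environment):

from langchain_google_genai import ChatGoogleGenerativeAI

# Assumes GOOGLE_API_KEY is set in the environment
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0.7)

# invoke() returns an AIMessage; .content holds the generated text
response = llm.invoke("Summarize retrieval-augmented generation in one tweet.")
print(response.content)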

What is the ChromaDB Vector Database?

ChromaDB is an open-source embedding database that excels at storing and retrieving vector embeddings. It is a high-performance database designed for efficient storing, searching, and managing embeddings generated by AI models. It enables similarity searches by indexing and comparing vectors based on their proximity to other similar vectors in multidimensional space.

(Image source: Chroma)

Key features of ChromaDB include:

  • Efficient similarity search capabilities
  • Easy integration with popular embedding models
  • Local storage and persistence
  • Flexible querying options
  • Lightweight deployment

In our application, ChromaDB is the backbone for storing and retrieving relevant chunks of text based on semantic similarity, enabling more contextual and accurate thread generation.
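To make this concrete, here is a minimal, hypothetical sketch using the chromadb Python client directly; our project instead goes through LangChain’s Chroma wrapper, shown later:

import chromadb

# In-memory client; chromadb.PersistentClient(path="./data/chroma_db") would persist to disk
client = chromadb.Client()
collection = client.create_collection(name="blog_chunks")

# Chroma embeds these documents (with its default embedding function) and indexes the vectors
collection.add(
    documents=[
        "Gemini-2.0 is Google's latest multimodal LLM.",
        "ChromaDB stores and retrieves vector embeddings.",
    ],
    ids=["chunk-1", "chunk-2"],
)

# Nearest-neighbour search over the stored embeddings
results = collection.query(query_texts=["Which database stores embeddings?"], n_results=1)
print(results["documents"])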

What is Streamlit UI?

Streamlit is an open-source Python library designed to quickly build interactive and data-driven web applications for AI/ML projects. Its focus on simplicity enables developers to create visually appealing and functional apps with minimal effort.

Key Features:

  • Ease of Use: Developers can turn Python scripts into web apps with a few lines of code.
  • Widgets: It offers a wide range of input widgets (sliders, dropdowns, text inputs) to make applications interactive.
  • Data Visualization: Supports integration with popular Python libraries like Matplotlib, Plotly, and Altair for dynamic visualizations.
  • Real-time Updates: Automatically reruns the app when code or inputs change, providing a seamless user experience.
  • No Web Development Required: Removes the need to learn HTML, CSS, or JavaScript.

Applications of Streamlit

Streamlit is widely used for building dashboards, exploratory data analysis tools, and AI/ML application prototypes. Its simplicity and interactivity make it ideal for rapid prototyping and for sharing insights with non-technical stakeholders. We are using Streamlit to design the interface for our application.
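As a quick illustration of how little code a Streamlit app needs, here is a minimal, hypothetical example (save it as hello.py and run streamlit run hello.py):

import streamlit as st

st.title("Hello, Streamlit!")
name = st.text_input("What's your name?")
if name:
    st.write(f"Nice to meet you, {name} 👋")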

Motivation for Tweet Generation Automation

The primary motivations behind automating tweet thread generation include:

  • Time efficiency: Reducing the manual effort required to create engaging Twitter threads.
  • Consistency: Maintaining a consistent voice and format across all threads.
  • Scalability: Processing multiple articles quickly and efficiently.
  • Enhanced engagement: Leveraging AI to create more compelling and shareable content.
  • Content optimization: Using data-driven approaches to structure threads effectively.

Project Environment Setup Using Conda

To set up the project environment, follow these steps:

#create a new conda env
conda create -n tweet-gen python=3.11
conda activate tweet-gen

Install required packages

pip install langchain langchain-community langchain-google-genai
pip install chromadb streamlit python-dotenv pypdf pydantic

Now create a project folder named BlogToTweet or whatever you wish.

Also, create a .env file in your project root, get your GOOGLE_API_KEY, and put it in the .env file.

GOOGLE_API_KEY="<your API KEY>"
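If you want to sanity-check the setup, a small throwaway snippet (assuming you run it from the project root where the .env file lives) confirms that the key is picked up:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
print("GOOGLE_API_KEY found:", os.getenv("GOOGLE_API_KEY") is not None)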

We are all set up to dive into the main implementation part.

Project Implementation

In our project, there are four important files, each with its own responsibility:

  • services.py: Holds all the core services of the application.
  • models.py: Contains all the important Pydantic data models.
  • main.py: For testing the automation in the terminal.
  • app.py: The Streamlit UI implementation.

Implementing Models

We will start by implementing the Pydantic data models in the models.py file. Pydantic is a Python library for data validation built around type-annotated models.

from typing import Optional, List
from pydantic import BaseModel

class ArticleContent(BaseModel):
    title: str
    content: str
    author: Optional[str]
    url: str

class TwitterThread(BaseModel):
    tweets: List[str]
    hashtags: List[str]

It is a simple yet important model that will give the article content and all the tweets a consistent structure.
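For instance, constructing a TwitterThread is just a matter of passing the fields, and Pydantic validates the types for us (the thread below is purely illustrative):

from models import TwitterThread

thread = TwitterThread(
    tweets=[
        "1/2 LLMs can repurpose your blog posts into Twitter threads automatically 🤖",
        "2/2 Read the full article to see how!",
    ],
    hashtags=["#AI", "#ContentCreation"],
)
print(thread.tweets[0])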

Implementing Services

The ContentRepurposer handles the core functionality of the application. Here is the skeletal structure of that class.

# services.py
import os
from dotenv import load_dotenv
from typing import List
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

from models import ArticleContent, TwitterThread

class ContentRepurposer:
    def __init__(self):
        pass

    def process_pdf(self, pdf_path: str) -> ArticleContent:
        pass

    def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
        pass

    def generate_twitter_thread(self, article: ArticleContent) -> TwitterThread:
        pass

    def process_article(self, pdf_path: str) -> TwitterThread:
        pass

In the __init__ method, we initialize all the important components of the class.

def __init__(self):
        from pydantic import SecretStr

        google_api_key = os.getenv("GOOGLE_API_KEY")
        if google_api_key is None:
            raise ValueError("GOOGLE_API_KEY environment variable is not set")
        _google_api_key = SecretStr(google_api_key)

        # Initialize Gemini model and embeddings
        self.embeddings = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001",
            google_api_key=_google_api_key,
        )
        self.llm = ChatGoogleGenerativeAI(
            model="gemini-2.0-flash-exp",
            temperature=0.7,
            google_api_key=_google_api_key,
        )

        # Initialize text splitter
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n\n", "\n", " ", ""]
        )

Here, we use Pydantic's SecretStr for secure handling of the API key. To embed our articles we use GoogleGenerativeAIEmbeddings with the embedding-001 model, and to generate the tweets from the article we use ChatGoogleGenerativeAI with the latest gemini-2.0-flash-exp model. RecursiveCharacterTextSplitter splits a large document into parts; here we split the document into chunks of 1000 characters with a 200-character overlap.
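To get a feel for what the splitter does, here is a small standalone sketch with a toy chunk size so the overlap is easy to see (the project itself uses 1000/200):

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=60,        # toy size so the overlap is easy to see
    chunk_overlap=15,
    separators=["\n\n", "\n", " ", ""],
)

text = (
    "LangChain splits long documents into overlapping chunks so that "
    "each piece still carries a little of its surrounding context."
)
for i, chunk in enumerate(splitter.split_text(text), 1):
    print(i, chunk)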

Processing PDF

The system processes PDFs using PyPDFLoader from LangChain and implements text chunking.

def process_pdf(self, pdf_path: str) -> ArticleContent:
        """Process local PDF and create embeddings"""
        # Load PDF
        loader = PyPDFLoader(pdf_path)
        pages = loader.load()
        
        # Extract text
        text = " ".join(page.page_content for page in pages)
        
        # Split text into chunks
        chunks = self.text_splitter.split_text(text)
        
        # Create and store embeddings in Chroma
        self.vectordb = Chroma.from_texts(
            texts=chunks,
            embedding=self.embeddings,
            persist_directory="./data/chroma_db"
        )
        
        # Extract title and author
        lines = [line.strip() for line in text.split("\n") if line.strip()]
        title = lines[0] if lines else "Untitled"
        author = lines[1] if len(lines) > 1 else None
        
        return ArticleContent(
            title=title,
            content=text,
            author=author,
            url=pdf_path
        )

In the above code, we implement the PDF processing functionality of the application.

  • Load and Extract PDF Text: The PyPDFLoader reads the PDF file and extracts the text content from all pages, concatenating it into a single string.
  • Split Text into Chunks: The text is divided into smaller chunks using the text_splitter for better processing and embedding creation.
  • Generate Embeddings: Chroma creates vector embeddings from the text chunks and stores them in a persistent database directory.
  • Extract Title and Author: The first non-empty line is used as the title, and the second as the author.
  • Return Article Content: Construct an ArticleContent object containing the title, full text, author, and file path.
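If you want to exercise this step on its own, a short sketch like the following works (the PDF path is hypothetical, and GOOGLE_API_KEY must be available since the constructor initializes the embedding model):

from dotenv import load_dotenv
from services import ContentRepurposer

load_dotenv()  # makes GOOGLE_API_KEY available to the constructor

repurposer = ContentRepurposer()
article = repurposer.process_pdf("data/sample.pdf")  # hypothetical path
print(article.title)
print(f"{len(article.content)} characters extracted from {article.url}")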

Getting the Relevant Chunks

def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
        """Retrieve relevant chunks from vector database"""
        results = self.vectordb.similarity_search(query, k=k)
        return [doc.page_content for doc in results]

This code retrieves the top k (default 3) most relevant text chunks from the vector database based on similarity to the given query.

Generating the Twitter Thread from the Article

This method is the most important one: it brings together the generative model, the embeddings, and the prompt to generate the thread from the uploaded PDF file.

def generate_twitter_thread(self, article: ArticleContent) -> TwitterThread:
        """Generate Twitter thread using Gemini"""
        # First, get the most relevant chunks for different aspects
        intro_chunks = self.get_relevant_chunks("introduction and main points")
        technical_chunks = self.get_relevant_chunks("technical details and implementation")
        conclusion_chunks = self.get_relevant_chunks("conclusion and key takeaways")
       
        thread_prompt = PromptTemplate(
            input_variables=["title", "intro", "technical", "conclusion"],
            template="""
            Write an engaging Twitter thread (8-10 tweets) summarizing this technical article in an approachable and human-like style.

            Title: {title}

            Introduction Context:
            {intro}

            Technical Details:
            {technical}

            Key Takeaways:
            {conclusion}

            Guidelines:
            1. Start with a hook that grabs attention (e.g., a surprising fact, bold statement, or thought-provoking question).
            2. Use a conversational tone and explain complex details simply, without jargon.
            3. Include concise tweets under 280 characters, following the 1/n numbering format.
            4. Break down the key insights logically, and make each tweet build curiosity for the next one.
            5. Include relevant examples, analogies, or comparisons to aid understanding.
            6. End the thread with a strong conclusion and a call to action (e.g., "Read the full article," "Follow for more insights").
            7. Make it relatable, educational, and engaging.

            Output format:
            - A numbered list of tweets, with each tweet on a new line.
            - After the tweets, suggest 3-5 hashtags that summarize the thread, starting with #.
            """
        )
        
        chain = LLMChain(llm=self.llm, prompt=thread_prompt)
        result = chain.run({
            "title": article.title,
            "intro": "\n".join(intro_chunks),
            "technical": "\n".join(technical_chunks),
            "conclusion": "\n".join(conclusion_chunks)
        })
        
        # Parse the result into tweets and hashtags
        lines = result.split("\n")
        tweets = [line.strip() for line in lines if line.strip() and not line.strip().startswith("#")]
        hashtags = [tag.strip() for tag in lines if tag.strip().startswith("#")]
        
        # Ensure we have at least one tweet and hashtag
        if not tweets:
            tweets = ["Thread about " + article.title]
        if not hashtags:
            hashtags = ["#AI", "#TechNews"]
            
        return TwitterThread(tweets=tweets, hashtags=hashtags)

Let’s understand what is happening in the above code, step by step:

  • Retrieve Relevant Chunks: The method first extracts relevant chunks of text for the introduction, technical details, and conclusion using the get_relevant_chunks method.
  • Prepare a Prompt: A PromptTemplate is created with instructions to write an engaging Twitter thread summarizing the article, including details on tone, structure, and formatting guidelines.
  • Run the LLM Chain: An LLMChain is used with the LLM to process the prompt and generate a thread based on the article’s title and the extracted chunks (a more modern alternative to this chain syntax is sketched after this list).
  • Parse Results: The generated output is split into tweets and hashtags, ensuring proper formatting and extracting the necessary components.
  • Return Twitter Thread: The method returns a TwitterThread object containing the formatted tweets and hashtags.
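One note on the chain: LLMChain and chain.run are deprecated in recent LangChain releases. If you see deprecation warnings, the same step can be written with the pipe (LCEL) syntax; a minimal sketch of the drop-in replacement inside generate_twitter_thread is shown below:

# Equivalent chain using the LangChain Expression Language (LCEL)
chain = thread_prompt | self.llm
result = chain.invoke({
    "title": article.title,
    "intro": "\n".join(intro_chunks),
    "technical": "\n".join(technical_chunks),
    "conclusion": "\n".join(conclusion_chunks),
}).content  # the chat model returns an AIMessage; .content is the raw text, like chain.run's output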

Process The Article

This method processes a PDF file to extract its content, generates a Twitter thread summarizing it, and finally returns a TwitterThread object.

def process_article(self, pdf_path: str) -> TwitterThread:
        """Main method to process article and generate content"""
        try:
            article = self.process_pdf(pdf_path)
            thread = self.generate_twitter_thread(article)
            return thread
        except Exception as e:
            print(f"Error processing article: {str(e)}")
            raise

Up to this point, we have implemented all the necessary code for this project. Now there are two ways we can proceed:

  • Implementing the Main file for testing and
  • Implementing Streamlit Application for the web interface

If you don’t want to test the application in terminal mode then you can skip the Main file implementation and go directly to the Streamlit Application implementation.

Implementing the Main file for testing

Now, we put together all the modules to test the application.

import os
from dotenv import load_dotenv
from services import ContentRepurposer


def main():
    # Load environment variables
    load_dotenv()
    google_api_key = os.getenv("GOOGLE_API_KEY")

    if not google_api_key:
        raise ValueError("GOOGLE_API_KEY environment variable not found")

    # Initialize repurposer
    repurposer = ContentRepurposer()

    # Path to your local PDF
    # pdf_path = "data/guide_to_jax.pdf"
    pdf_path = "data/build_llm_powered_app.pdf"

    try:
        thread = repurposer.process_article(pdf_path)

        print("Generated Twitter Thread:")
        for i, tweet in enumerate(thread.tweets, 1):
            print(f"\nTweet {i}/{len(thread.tweets)}:")
            print(tweet)

        print("\nSuggested Hashtags:")
        print(" ".join(thread.hashtags))

    except Exception as e:
        print(f"Failed to process article: {str(e)}")


if __name__ == "__main__":
    main()

Here, you can see that it simply imports all the modules, checks that GOOGLE_API_KEY is available, instantiates the ContentRepurposer class, and then, in the try block, creates a thread by calling the process_article() method on the repurposer object. Finally, it prints the tweets to the terminal and handles any exceptions.

To test the application, create a folder named data in your project root and put your downloaded PDF there. To download an article from Analytics Vidhya, go to any article, click the download button, and save it as a PDF.

Now on your terminal,

python main.py

Example Blog 1 Output

Example Blog 2 Output

I think you get the idea of how useful the application is! Let's now give it a more polished, practical interface.

Implementing the Streamlit APP

Now we will do pretty much the same as above in a more UI-centric way.

Importing Libraries and Env Configuration

import os
import streamlit as st
from dotenv import load_dotenv
from services import ContentRepurposer
import pyperclip
from pathlib import Path

# Load environment variables
load_dotenv()

# Set page configuration
st.set_page_config(page_title="Content Repurposer", page_icon="🐦", layout="wide")

Custom CSS

# Custom CSS
st.markdown(
    """
<style>
    .tweet-box {
        background-color: #181211;
        border: 1px solid #e1e8ed;
        border-radius: 10px;
        padding: 15px;
        margin: 10px 0;
    }
    .copy-button {
        background-color: #1DA1F2;
        color: white;
        border: none;
        border-radius: 5px;
        padding: 5px 10px;
        cursor: pointer;
    }
    .main-header {
        color: #1DA1F2;
        text-align: center;
    }
    .hashtag {
        color: #1DA1F2;
        background-color: #E8F5FE;
        padding: 5px 10px;
        border-radius: 15px;
        margin: 5px;
        display: inline-block;
    }
</style>
""",
    unsafe_allow_html=True,
)

Here, we define some CSS styling for the page elements (tweet boxes, copy buttons, hashtags). If CSS is unfamiliar to you, W3Schools is a good place to start.

Some Important Functions

def create_temp_pdf(uploaded_file):
    """Create a temporary PDF file from uploaded content"""
    temp_dir = Path("temp")
    temp_dir.mkdir(exist_ok=True)

    temp_path = temp_dir / "uploaded_pdf.pdf"
    with open(temp_path, "wb") as f:
        f.write(uploaded_file.getvalue())

    return str(temp_path)


def initialize_session_state():
    """Initialize session state variables"""
    if "tweets" not in st.session_state:
        st.session_state.tweets = None
    if "hashtags" not in st.session_state:
        st.session_state.hashtags = None


def copy_text_and_show_success(text, success_key):
    """Copy text to clipboard and show success message"""
    try:
        pyperclip.copy(text)
        st.success("Copied to clipboard!", icon="✅")
    except Exception as e:
        st.error(f"Failed to copy: {str(e)}")

Here, the create_temp_pdf() method will create a temp directory in the project folder and will put the uploaded PDF there for the entire process.

initialize_session_state() method will check whether the tweets and hashtags are in the Streamlit session or not.

The copy_text_and_show_success() method will use the Pyperclip library to copy the tweets and hashtags to the clipboard and show that the copy was successful.
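One caveat worth flagging: pyperclip copies to the clipboard of the machine running the Streamlit process, so the copy buttons work when the app runs locally but not for remote users of a deployed app. A simple fallback is to render each tweet with st.code, which ships with a built-in copy icon in the browser; a minimal sketch:

# Alternative: let the browser handle copying via st.code's built-in copy icon
for tweet in st.session_state.tweets:
    st.code(tweet, language=None)

This removes the pyperclip dependency entirely if the app is only used remotely.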

Main Function

def main():
    initialize_session_state()

    # Header
    st.markdown(
        "<h1 class='main-header'>📄 Content to Twitter Thread 🐦</h1>",
        unsafe_allow_html=True,
    )

    # Create two columns for layout
    col1, col2 = st.columns([1, 1])

    with col1:
        st.markdown("### Upload PDF")
        uploaded_file = st.file_uploader("Drop your PDF here", type=["pdf"])

        if uploaded_file:
            st.success("PDF uploaded successfully!")

            if st.button("Generate Twitter Thread", key="generate"):
                with st.spinner("Generating Twitter thread..."):
                    try:
                        # Get Google API key
                        google_api_key = os.getenv("GOOGLE_API_KEY")
                        if not google_api_key:
                            st.error(
                                "Google API key not found. Please check your .env file."
                            )
                            return

                        # Save uploaded file
                        pdf_path = create_temp_pdf(uploaded_file)

                        # Process PDF and generate thread
                        repurposer = ContentRepurposer()
                        thread = repurposer.process_article(pdf_path)

                        # Store results in session state
                        st.session_state.tweets = thread.tweets
                        st.session_state.hashtags = thread.hashtags

                        # Clean up temporary file
                        os.remove(pdf_path)

                    except Exception as e:
                        st.error(f"Error generating thread: {str(e)}")

    with col2:
        if st.session_state.tweets:
            st.markdown("### Generated Twitter Thread")

            # Copy entire thread section
            st.markdown("#### Copy Complete Thread")
            all_tweets = "\n\n".join(st.session_state.tweets)
            if st.button("📋 Copy Entire Thread"):
                copy_text_and_show_success(all_tweets, "thread")

            # Display individual tweets
            st.markdown("#### Individual Tweets")
            for i, tweet in enumerate(st.session_state.tweets, 1):
                tweet_col1, tweet_col2 = st.columns([4, 1])

                with tweet_col1:
                    st.markdown(
                        f"""
                    <div class='tweet-box'>
                        <p>{tweet}</p>
                    </div>
                    """,
                        unsafe_allow_html=True,
                    )

                with tweet_col2:
                    if st.button("📋", key=f"tweet_{i}"):
                        copy_text_and_show_success(tweet, f"tweet_{i}")

            # Display hashtags
            if st.session_state.hashtags:
                st.markdown("### Suggested Hashtags")

                # Display hashtags with copy button
                hashtags_text = " ".join(st.session_state.hashtags)
                hashtags_col1, hashtags_col2 = st.columns([4, 1])

                with hashtags_col1:
                    hashtags_html = " ".join(
                        [
                            f"<span class='hashtag'>{hashtag}</span>"
                            for hashtag in st.session_state.hashtags
                        ]
                    )
                    st.markdown(hashtags_html, unsafe_allow_html=True)

                with hashtags_col2:
                    if st.button("📋 Copy Tags"):
                        copy_text_and_show_success(hashtags_text, "hashtags")


if __name__ == "__main__":
    main()

If you read this code closely, you will see that Streamlit creates two columns: one for the PDF uploader function and the other for showing the generated tweets.

In the first column, we have done pretty much the same as the previous main.py with some extra markdown, adding buttons for uploading and generating threads using the Streamlit object.

In the second column, Streamlit iterates over the generated thread, puts each tweet in a tweet box with its own copy button, and finally shows all the hashtags along with a copy button for them.

Now the fun part!!

Open your terminal and type

streamlit run .\app.py

If everything is set up correctly, it will start the Streamlit application in your default browser.


Now, drag and drop your downloaded PDF onto the box; it will automatically be uploaded. Then click the Generate Twitter Thread button to generate the tweets.


You can copy the full thread or individual tweets using the respective copy buttons.

I hope doing hands-on projects like this will help you learn many practical concepts in generative AI, Python libraries, and programming. Happy coding, and stay healthy!

All the code used in this article is here.

Conclusion

This project demonstrates the power of combining modern AI technologies to automate content repurposing. By leveraging Gemini-2.0 and ChromaDB, we have created a system that not only saves time but also maintains high-quality output. The modular architecture ensures easy maintenance and extensibility, while the Streamlit interface makes it accessible to non-technical users.

Key Takeaways

  • The project demonstrates successful integration of cutting-edge AI tools for practical content automation.
  • The architecture’s modularity allows for easy maintenance and future enhancements, making it a sustainable solution for content repurposing.
  • The Streamlit interface makes the tool accessible to content creators without technical expertise, bridging the gap between complex AI technology and practical usage.
  • The implementation can handle various content types and volumes, making it suitable for both individual content creators and large organizations.

Frequently Asked Questions

Q1. How does the system handle long articles?

A. The system uses RecursiveCharacterTextSplitter to break down long articles into manageable chunks, which are then embedded and stored in ChromaDB. When generating threads, it retrieves the most relevant chunks using similarity search.

Q2. What’s the optimal temperature setting for Gemini-2.0 in this application?

A. We used a temperature of 0.7, which provided a good balance between creativity and coherence. You can adjust this setting based on specific needs, with higher values (>0.7) producing more creative output and lower values (<0.7) generating more focused content.

Q3. How does the system ensure tweet length compliance?

A. The prompt template explicitly specifies the 280-character limit, and the LLM generally respects this constraint. You can add additional programmatic validation to guarantee compliance.
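If you want a hard guarantee, a small post-processing step can enforce the limit. Below is a minimal, illustrative sketch that truncates anything over 280 characters (re-prompting the model is another option):

MAX_TWEET_LEN = 280

def enforce_tweet_length(tweets: list[str]) -> list[str]:
    """Truncate any tweet that exceeds the 280-character limit."""
    return [
        t if len(t) <= MAX_TWEET_LEN else t[: MAX_TWEET_LEN - 1] + "…"
        for t in tweets
    ]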

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

A self-taught, project-driven learner, I love to work on complex projects in deep learning, computer vision, and NLP. I always try to get a deep understanding of the topic, whether it is deep learning, machine learning, or physics. I love to create content about what I learn and to share my understanding with the world.
