In today’s digital landscape, content repurposing has become crucial for maximizing reach and engagement. One effective strategy is transforming long-form content like blog posts into engaging Twitter threads. However, manually creating these threads can be time-consuming and challenging. In this article, we’ll explore how to build an application to automate blog to Twitter thread creation using Google’s Gemini-2.0 LLM, ChromaDB, and Streamlit.
This article was published as a part of the Data Science Blogathon.
Gemini-2.0 is Google’s latest multimodal Large Language Model (LLM), representing a significant advancement in AI capabilities. It is now available as the gemini-2.0-flash-exp API in Vertex AI Studio. It offers improved performance in areas such as speed, reasoning, and multimodal understanding.
For our project, we are specifically using the gemini-2.0-flash-exp model API, which is optimized for quick responses while maintaining high-quality output.
ChromaDB is an open-source embedding database that excels at storing and retrieving vector embeddings. It is a high-performance database designed for efficiently storing, searching, and managing embeddings generated by AI models. It enables similarity searches by indexing vectors and comparing them based on their proximity to similar vectors in multidimensional space.
In our application, ChromaDB is the backbone for storing and retrieving relevant chunks of text based on semantic similarity, enabling more contextual and accurate thread generation.
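To make the idea concrete, here is a minimal sketch (with made-up texts and a made-up query, and assuming GOOGLE_API_KEY is already set) of storing a few chunks in Chroma and retrieving the most similar one:

# Minimal sketch: embed a few made-up chunks, then query by similarity.
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
db = Chroma.from_texts(
    texts=["Gemini-2.0 is a multimodal LLM.", "ChromaDB stores vector embeddings."],
    embedding=embeddings,
)
docs = db.similarity_search("Which database stores embeddings?", k=1)
print(docs[0].page_content)  # the ChromaDB chunk should come back first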
Streamlit is an open-source Python library designed to quickly build interactive and data-driven web applications for AI/ML projects. Its focus on simplicity enables developers to create visually appealing and functional apps with minimal effort.
Key features include a simple Python-only API, built-in interactive widgets, and the ability to build visually appealing apps with very little code.
Applications of Streamlit
Streamlit is widely used for building dashboards, exploratory data analysis tools, and AI/ML application prototypes. Its simplicity and interactivity make it ideal for rapid prototyping and for sharing insights with non-technical stakeholders. We are using Streamlit to design the interface for our application.
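If you have never used Streamlit before, here is a minimal sketch of what an app looks like (a hypothetical hello.py, unrelated to this project):

# hello.py -- run with: streamlit run hello.py
import streamlit as st

st.title("Hello, Streamlit!")
name = st.text_input("What is your name?")
if st.button("Greet"):
    st.success(f"Nice to meet you, {name}!")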
The primary motivation behind automating tweet thread generation is simple: manually turning a long-form article into a well-structured thread is time-consuming, and doing it consistently at scale is even harder.
To set up the project environment, follow these steps:
#create a new conda env
conda create -n tweet-gen python=3.11
conda activate tweet-gen
# install required packages
pip install langchain langchain-community langchain-google-genai
pip install chromadb streamlit python-dotenv pypdf pydantic
Now create a project folder named BlogToTweet (or any name you prefer).
Also, create a .env file in your project root. Get your GOOGLE API KEY from here and put it in the .env file.
GOOGLE_API_KEY="<your API KEY>"
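To confirm that the key is picked up correctly, you can run a quick sanity check (a throwaway snippet, not one of the project files):

# check_env.py -- throwaway sanity check for the .env setup
import os
from dotenv import load_dotenv

load_dotenv()
print("GOOGLE_API_KEY loaded:", os.getenv("GOOGLE_API_KEY") is not None)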
We are all set up to dive into the main implementation part.
In our project, there are four important files (models.py, services.py, main.py, and app.py), each with its own responsibility for cleaner development.
We will start by implementing the Pydantic data models in the models.py file. What is Pydantic? Read this.
from typing import Optional, List
from pydantic import BaseModel
class ArticleContent(BaseModel):
title: str
content: str
author: Optional[str]
url: str
class TwitterThread(BaseModel):
tweets: List[str]
hashtags: List[str]
It is a simple yet important model that will give the article content and all the tweets a consistent structure.
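For example, constructing a TwitterThread by hand (with made-up values) shows how the structure is enforced:

# Made-up values: Pydantic validates the field types at construction time.
from models import TwitterThread

thread = TwitterThread(
    tweets=["1/2 Building an AI app is easier than you think...", "2/2 Read the full article!"],
    hashtags=["#AI", "#Python"],
)
print(thread.tweets[0])
# Passing tweets="not a list" would raise a ValidationError.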
The ContentRepurposer handles the core functionality of the application. Here is the skeletal structure of that class.
# services.py
import os
from dotenv import load_dotenv
from typing import List
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from models import ArticleContent, TwitterThread
class ContentRepurposer:
def __init__(self):
pass
def process_pdf(self, pdf_path: str) -> ArticleContent:
pass
def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
pass
def generate_twitter_thread(self, article: ArticleContent) -> TwitterThread:
pass
def process_article(self, pdf_path: str) -> TwitterThread:
pass
In the __init__ method, we initialize all the important components of the class.
def __init__(self):
from pydantic import SecretStr
google_api_key = os.getenv("GOOGLE_API_KEY")
if google_api_key is None:
raise ValueError("GOOGLE_API_KEY environment variable is not set")
_google_api_key = SecretStr(google_api_key)
# Initialize Gemini model and embeddings
self.embeddings = GoogleGenerativeAIEmbeddings(
model="models/embedding-001",
google_api_key=_google_api_key,
)
self.llm = ChatGoogleGenerativeAI(
model="gemini-2.0-flash-exp",
google_api_key=_google_api_key,
temperature=0.7)
# Initialize text splitter
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
separators=["\n\n", "\n", " ", ""]
)
Here, we use Pydantic’s SecretStr for secure handling of the API key. For embedding our articles, we use GoogleGenerativeAIEmbeddings with the embedding-001 model, and to create the tweets from the article we use ChatGoogleGenerativeAI with the latest gemini-2.0-flash-exp model. RecursiveCharacterTextSplitter is used to split a large document into parts; here we split the document into chunks of 1,000 characters with a 200-character overlap.
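To see what the splitter actually does, here is a small standalone sketch (a toy string and much smaller chunk sizes than in the app, so the overlap is visible):

# Toy example of RecursiveCharacterTextSplitter with small sizes for illustration.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
text = "LangChain splits long documents into overlapping chunks so that context is not lost at the boundaries."
for i, chunk in enumerate(splitter.split_text(text), 1):
    print(i, chunk)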
The system processes PDFs using PyPDFLoader from LangChain and implements text chunking.
def process_pdf(self, pdf_path: str) -> ArticleContent:
"""Process local PDF and create embeddings"""
# Load PDF
loader = PyPDFLoader(pdf_path)
pages = loader.load()
# Extract text
text = " ".join(page.page_content for page in pages)
# Split text into chunks
chunks = self.text_splitter.split_text(text)
# Create and store embeddings in Chroma
self.vectordb = Chroma.from_texts(
texts=chunks,
embedding=self.embeddings,
persist_directory="./data/chroma_db"
)
# Extract title and author
lines = [line.strip() for line in text.split("\n") if line.strip()]
title = lines[0] if lines else "Untitled"
author = lines[1] if len(lines) > 1 else None
return ArticleContent(
title=title,
content=text,
author=author,
url=pdf_path
)
In the above code, we implement the PDF processing functionality of the application: the PDF is loaded with PyPDFLoader, the page contents are joined into a single text, the text is split into overlapping chunks, the chunks are embedded and persisted in ChromaDB, and finally the title and author are extracted heuristically from the first two non-empty lines.
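Once this method is in place, you could test it on its own like this (the PDF path is hypothetical; any local PDF will do):

# Standalone test of PDF processing (hypothetical path).
from dotenv import load_dotenv
from services import ContentRepurposer

load_dotenv()  # make sure GOOGLE_API_KEY is available
repurposer = ContentRepurposer()
article = repurposer.process_pdf("data/sample_article.pdf")
print(article.title)
print(len(article.content), "characters extracted")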
def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
"""Retrieve relevant chunks from vector database"""
results = self.vectordb.similarity_search(query, k=k)
return [doc.page_content for doc in results]
This code retrieves the top k (default 3) most relevant text chunks from the vector database based on similarity to the given query.
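Continuing the sketch above (once process_pdf has populated the vector store), a call like the following, with an example query string, returns the three closest chunks:

# Example query against the populated vector store (example query string).
chunks = repurposer.get_relevant_chunks("key takeaways of the article", k=3)
for chunk in chunks:
    print(chunk[:80], "...")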
This method is the most important one: here the generative model, the embeddings, and the prompt come together to generate the thread from the client’s PDF file.
def generate_twitter_thread(self, article: ArticleContent) -> TwitterThread:
"""Generate Twitter thread using Gemini"""
# First, get the most relevant chunks for different aspects
intro_chunks = self.get_relevant_chunks("introduction and main points")
technical_chunks = self.get_relevant_chunks("technical details and implementation")
conclusion_chunks = self.get_relevant_chunks("conclusion and key takeaways")
thread_prompt = PromptTemplate(
input_variables=["title", "intro", "technical", "conclusion"],
template="""
Write an engaging Twitter thread (8-10 tweets) summarizing this technical article in an approachable and human-like style.
Title: {title}
Introduction Context:
{intro}
Technical Details:
{technical}
Key Takeaways:
{conclusion}
Guidelines:
1. Start with a hook that grabs attention (e.g., a surprising fact, bold statement, or thought-provoking question).
2. Use a conversational tone and explain complex details simply, without jargon.
3. Include concise tweets under 280 characters, following the 1/n numbering format.
4. Break down the key insights logically, and make each tweet build curiosity for the next one.
5. Include relevant examples, analogies, or comparisons to aid understanding.
6. End the thread with a strong conclusion and a call to action (e.g., "Read the full article," "Follow for more insights").
7. Make it relatable, educational, and engaging.
Output format:
- A numbered list of tweets, with each tweet on a new line.
- After the tweets, suggest 3-5 hashtags that summarize the thread, starting with #.
"""
)
chain = LLMChain(llm=self.llm, prompt=thread_prompt)
result = chain.run({
"title": article.title,
"intro": "\n".join(intro_chunks),
"technical": "\n".join(technical_chunks),
"conclusion": "\n".join(conclusion_chunks)
})
# Parse the result into tweets and hashtags
lines = result.split("\n")
tweets = [line.strip() for line in lines if line.strip() and not line.strip().startswith("#")]
hashtags = [tag.strip() for tag in lines if tag.strip().startswith("#")]
# Ensure we have at least one tweet and hashtag
if not tweets:
tweets = ["Thread about " + article.title]
if not hashtags:
hashtags = ["#AI", "#TechNews"]
return TwitterThread(tweets=tweets, hashtags=hashtags)
Let’s understand what is happening in the above code step by step:
- First, get_relevant_chunks() is called three times to pull the chunks most related to the introduction, the technical details, and the conclusion of the article.
- A PromptTemplate then lays out the guidelines for tone, tweet length, numbering, and hashtag suggestions.
- An LLMChain combines the Gemini model with this prompt, and chain.run() fills in the title and the three groups of chunks.
- The raw response is split line by line into tweets (lines that do not start with #) and hashtags (lines that do).
- Finally, fallback values guarantee at least one tweet and one hashtag, and everything is wrapped in a TwitterThread.
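To make the parsing step concrete, here is what it does to a made-up model response:

# Made-up model response, just to show how tweets and hashtags are separated.
result = """1/3 Ever wondered how AI can turn a blog post into a thread?
2/3 We use Gemini-2.0 to summarize the article chunk by chunk.
3/3 Read the full article to build it yourself!
#AI #LLM #Python"""

lines = result.split("\n")
tweets = [line.strip() for line in lines if line.strip() and not line.strip().startswith("#")]
hashtags = [tag.strip() for tag in lines if tag.strip().startswith("#")]
print(tweets)    # the three numbered tweets
print(hashtags)  # ['#AI #LLM #Python'] -- hashtags on one line stay together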
This method ties everything together: it processes a PDF file to extract its content, generates a Twitter thread summarizing it, and returns the resulting TwitterThread.
def process_article(self, pdf_path: str) -> TwitterThread:
"""Main method to process article and generate content"""
try:
article = self.process_pdf(pdf_path)
thread = self.generate_twitter_thread(article)
return thread
except Exception as e:
print(f"Error processing article: {str(e)}")
raise
Up to this point, we have implemented all the necessary code for this project; now there are two ways we can proceed.
If you don’t want to test the application in terminal mode, you can skip the main file implementation and go directly to the Streamlit application implementation.
Now, we put together all the modules to test the application.
import os
from dotenv import load_dotenv
from services import ContentRepurposer
def main():
# Load environment variables
load_dotenv()
google_api_key = os.getenv("GOOGLE_API_KEY")
if not google_api_key:
raise ValueError("GOOGLE_API_KEY environment variable not found")
# Initialize repurposer
repurposer = ContentRepurposer()
# Path to your local PDF
# pdf_path = "data/guide_to_jax.pdf"
pdf_path = "data/build_llm_powered_app.pdf"
try:
thread = repurposer.process_article(pdf_path)
print("Generated Twitter Thread:")
for i, tweet in enumerate(thread.tweets, 1):
print(f"\nTweet {i}/{len(thread.tweets)}:")
print(tweet)
print("\nSuggested Hashtags:")
print(" ".join(thread.hashtags))
except Exception as e:
print(f"Failed to process article: {str(e)}")
if __name__ == "__main__":
main()
Here, you can see that it simply imports all the modules, checks that GOOGLE_API_KEY is available, instantiates the ContentRepurposer class, and then, inside the try block, creates a thread by calling the process_article() method on the repurposer object. At the end, it prints the tweets and suggested hashtags to the terminal, with exception handling around the whole process.
To test the application, create a folder named data in your project root and put your downloaded PDF there. To download an article from Analytics Vidhya, open any article, click the download button, and save the PDF.
Now, in your terminal, run:
python main.py
Example Blog 1 Output
Example Blog 2 Output
I think you get the idea of how useful the application is! Let’s now make it more polished and practical with a proper user interface.
Now we will do pretty much the same as above in a more UI-centric way.
import os
import streamlit as st
from dotenv import load_dotenv
from services import ContentRepurposer
import pyperclip
from pathlib import Path
# Load environment variables
load_dotenv()
# Set page configuration
st.set_page_config(page_title="Content Repurposer", page_icon="🐦", layout="wide")
# Custom CSS
st.markdown(
"""
<style>
.tweet-box {
background-color: #181211;
border: 1px solid #e1e8ed;
border-radius: 10px;
padding: 15px;
margin: 10px 0;
}
.copy-button {
background-color: #1DA1F2;
color: white;
border: none;
border-radius: 5px;
padding: 5px 10px;
cursor: pointer;
}
.main-header {
color: #1DA1F2;
text-align: center;
}
.hashtag {
color: #1DA1F2;
background-color: #E8F5FE;
padding: 5px 10px;
border-radius: 15px;
margin: 5px;
display: inline-block;
}
</style>
""",
unsafe_allow_html=True,
)
Here, we have added some CSS styling for the page elements (tweet boxes, copy buttons, hashtags). Is CSS confusing to you? Go to W3Schools.
def create_temp_pdf(uploaded_file):
"""Create a temporary PDF file from uploaded content"""
temp_dir = Path("temp")
temp_dir.mkdir(exist_ok=True)
temp_path = temp_dir / "uploaded_pdf.pdf"
with open(temp_path, "wb") as f:
f.write(uploaded_file.getvalue())
return str(temp_path)
def initialize_session_state():
"""Initialize session state variables"""
if "tweets" not in st.session_state:
st.session_state.tweets = None
if "hashtags" not in st.session_state:
st.session_state.hashtags = None
def copy_text_and_show_success(text, success_key):
"""Copy text to clipboard and show success message"""
try:
pyperclip.copy(text)
st.success("Copied to clipboard!", icon="✅")
except Exception as e:
st.error(f"Failed to copy: {str(e)}")
Here, the create_temp_pdf() function creates a temp directory in the project folder and writes the uploaded PDF there for the duration of the process.
The initialize_session_state() function checks whether tweets and hashtags already exist in the Streamlit session state and initializes them to None if they don’t.
The copy_text_and_show_success() function uses the pyperclip library to copy the tweets or hashtags to the clipboard and shows a success message when the copy succeeds.
def main():
initialize_session_state()
# Header
st.markdown(
"<h1 class='main-header'>📄 Content to Twitter Thread 🐦</h1>",
unsafe_allow_html=True,
)
# Create two columns for layout
col1, col2 = st.columns([1, 1])
with col1:
st.markdown("### Upload PDF")
uploaded_file = st.file_uploader("Drop your PDF here", type=["pdf"])
if uploaded_file:
st.success("PDF uploaded successfully!")
if st.button("Generate Twitter Thread", key="generate"):
with st.spinner("Generating Twitter thread..."):
try:
# Get Google API key
google_api_key = os.getenv("GOOGLE_API_KEY")
if not google_api_key:
st.error(
"Google API key not found. Please check your .env file."
)
return
# Save uploaded file
pdf_path = create_temp_pdf(uploaded_file)
# Process PDF and generate thread
repurposer = ContentRepurposer()
thread = repurposer.process_article(pdf_path)
# Store results in session state
st.session_state.tweets = thread.tweets
st.session_state.hashtags = thread.hashtags
# Clean up temporary file
os.remove(pdf_path)
except Exception as e:
st.error(f"Error generating thread: {str(e)}")
with col2:
if st.session_state.tweets:
st.markdown("### Generated Twitter Thread")
# Copy entire thread section
st.markdown("#### Copy Complete Thread")
all_tweets = "\n\n".join(st.session_state.tweets)
if st.button("📋 Copy Entire Thread"):
copy_text_and_show_success(all_tweets, "thread")
# Display individual tweets
st.markdown("#### Individual Tweets")
for i, tweet in enumerate(st.session_state.tweets, 1):
tweet_col1, tweet_col2 = st.columns([4, 1])
with tweet_col1:
st.markdown(
f"""
<div class='tweet-box'>
<p>{tweet}</p>
</div>
""",
unsafe_allow_html=True,
)
with tweet_col2:
if st.button("📋", key=f"tweet_{i}"):
copy_text_and_show_success(tweet, f"tweet_{i}")
# Display hashtags
if st.session_state.hashtags:
st.markdown("### Suggested Hashtags")
# Display hashtags with copy button
hashtags_text = " ".join(st.session_state.hashtags)
hashtags_col1, hashtags_col2 = st.columns([4, 1])
with hashtags_col1:
hashtags_html = " ".join(
[
f"<span class='hashtag'>{hashtag}</span>"
for hashtag in st.session_state.hashtags
]
)
st.markdown(hashtags_html, unsafe_allow_html=True)
with hashtags_col2:
if st.button("📋 Copy Tags"):
copy_text_and_show_success(hashtags_text, "hashtags")
if __name__ == "__main__":
main()
If you read this code closely, you will see that Streamlit creates two columns: one for the PDF uploader function and the other for showing the generated tweets.
In the first column, we do pretty much the same as in the previous main.py, with some extra markdown and buttons for uploading the PDF and generating the thread using Streamlit widgets.
In the second column, Streamlit iterates over the generated tweets, renders each one in a styled tweet box with its own copy button, and finally displays all the suggested hashtags along with a copy button of their own.
Now the fun part!!
Open your terminal and type
streamlit run .\app.py
If everything is set up correctly, it will start the Streamlit application in your default browser.
Now, drag and drop your downloaded PDF onto the upload box; it will be uploaded automatically. Then click the Generate Twitter Thread button to generate the tweets.
You can copy the full thread or individual tweets using the respective copy buttons.
I hope hands-on projects like this will help you learn many practical concepts in generative AI, Python libraries, and programming in general. Happy coding, and stay healthy.
All the code used in this article is here.
This project demonstrates the power of combining modern AI technologies to automate content repurposing. By leveraging Gemini-2.0 and ChromaDB, we have created a system that not only saves time but also maintains high-quality output. The modular architecture ensures easy maintenance and extensibility, while the Streamlit interface makes it accessible to non-technical users.
Q1. How does the system handle long articles?
A. The system uses RecursiveCharacterTextSplitter to break down long articles into manageable chunks, which are then embedded and stored in ChromaDB. When generating threads, it retrieves the most relevant chunks using similarity search.
Q2. What temperature setting works best for thread generation?
A. We used a temperature of 0.7, which provided a good balance between creativity and coherence. You can adjust this setting based on your specific needs, with higher values (>0.7) producing more creative output and lower values (<0.7) generating more focused content.
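For example, a more conservative configuration (a hypothetical variation, not the setting used in this project) would look like this:

# Hypothetical variation: lower temperature for more focused, less creative output.
from langchain_google_genai import ChatGoogleGenerativeAI

focused_llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0.3)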
Q3. How does the application keep tweets within the 280-character limit?
A. The prompt template explicitly specifies the 280-character limit, and the model is instructed to respect this constraint. You can also add programmatic validation to ensure compliance, as sketched below.
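A minimal sketch of such a check (an illustrative helper, not part of the project code) could look like this:

# Illustrative helper: truncate any generated tweet over the 280-character limit.
def enforce_tweet_limit(tweets, limit=280):
    return [t if len(t) <= limit else t[: limit - 1] + "…" for t in tweets]

safe_tweets = enforce_tweet_limit(thread.tweets)  # thread from generate_twitter_thread()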
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.