How to Create Your Personalized News Digest Using AI Agents?

Santosh 18 Sep, 2024

Introduction

The capabilities of large language models (LLMs) are advancing rapidly. They enable us to build a variety of LLM applications. These range from task automation to workflow optimization. One exciting application is using LLMs to create an intelligent news digest or newsletter agent. This agent can pull in relevant content, summarize it, and deliver it in a customized format. It can interact dynamically with external tools and data sources to fetch relevant information. In this article, let us learn how to build a news digest agent for a personalized daily news digest with LangGraph and external tools like News API.


Overview

  • Understand the architecture of LangGraph and its key components (State, Nodes, and Edges) to build customizable workflow agents.
  • Learn how to integrate external APIs like NewsAPI to fetch real-time data for dynamic content generation in newsletters.
  • Develop the skills to use LLMs for content evaluation by implementing a scoring system that ranks news articles based on quality criteria.
  • Gain practical knowledge of automating email delivery with curated content using Python’s email-sending libraries.

Brief About LangGraph

LangGraph is a framework, built on top of LangChain, for building dynamic workflows that integrate LLMs with custom logic and tools. This allows for highly customized and complex workflows that combine multiple tools and APIs.

LangGraph consists of three core components:

  1. State: The State contains the data that is shared throughout the application. It can be any Python data structure that can hold the data. We can define it using a State object with different parameters, or use the pre-built MessagesState, which contains only a list of messages.
  2. Nodes: Nodes are functions that can read and modify the State. They take the State as their first argument. There is also a START node, whose outgoing edge denotes which node receives the user input and is called first, and an END node, which denotes the end of the graph.
  3. Edges: Edges define the flow of data between nodes. There are also conditional edges, which use a function to determine which node to go to next.

LangGraph lets us customize the agent in many ways, so there can be more than one way to build this agent.

Three components of LangGraph

As shown in the image, edges connect nodes, and nodes read or write the data in the State.
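
To make the roles of State, Nodes, and Edges concrete, here is a toy illustration in plain Python. It is not the LangGraph API: the names `NODES`, `EDGES`, `run`, and the node functions are hypothetical and exist only to show how nodes update a shared state and how a conditional edge picks the next node.

```python
from typing import Callable, Dict

# Plain-Python illustration only; this is NOT the LangGraph API.
State = Dict[str, list]

def greet(state: State) -> State:
    """Node: reads the State and returns an update to it."""
    return {"messages": state["messages"] + ["hello"]}

def shout(state: State) -> State:
    """Node: appends an upper-cased copy of the last message."""
    return {"messages": state["messages"] + [state["messages"][-1].upper()]}

def route_after_greet(state: State) -> str:
    """Conditional edge: pick the next node by inspecting the State."""
    return "shout" if len(state["messages"]) < 3 else "END"

NODES: Dict[str, Callable[[State], State]] = {"greet": greet, "shout": shout}
# A fixed edge maps a node to its successor; a conditional edge maps to a function.
EDGES = {"START": "greet", "greet": route_after_greet, "shout": "END"}

def run(state: State) -> State:
    current = EDGES["START"]
    while current != "END":
        state = {**state, **NODES[current](state)}  # merge the node's update
        nxt = EDGES[current]
        current = nxt(state) if callable(nxt) else nxt
    return state

result = run({"messages": ["hi"]})
print(result["messages"])
```

LangGraph provides the same ideas through its own classes, which the agent later in this article uses.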


Prerequisites

Before we start building the LLM agent, let’s make sure we have the required keys and passwords.

Accessing an LLM via API

Begin by generating an API key for the LLM you are using. Create a text file named ‘.env’ and store the key in it, so that the key stays private and is easily accessible within your project.

Here’s an example of what a .env file looks like:

[Image: example .env file]

Fetching News Data

To gather news content, we will use https://newsapi.org/. Sign up for an API key and store it in the same .env file for secure access.

Sending the Email

To send email using Python, we can enable ‘less secure apps’ and store the Gmail password in the .env file. If that option is not available, we can gain access to Gmail by following the steps mentioned here.
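
Since the rest of the code depends on these keys, it can help to check early that they are all set. The sketch below assumes the variable names used in the code in this article (`OPENAI_API_KEY`, `NEWS_API_KEY`, `GMAIL_PASSWORD`); the helper `missing_keys` is a hypothetical name, and the demonstration uses a stand-in dictionary rather than the real environment.

```python
import os

# Variable names as used in the code in this article.
REQUIRED_KEYS = ["OPENAI_API_KEY", "NEWS_API_KEY", "GMAIL_PASSWORD"]

def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name, "").strip()]

# Demonstration with a stand-in dictionary instead of the real environment.
example_env = {"OPENAI_API_KEY": "sk-placeholder", "NEWS_API_KEY": ""}
print(missing_keys(example_env))
```

Calling `missing_keys()` without arguments after loading the .env file checks the real environment.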

Libraries Required

We have used the following versions for the major libraries:

  • langchain – 0.2.14
  • langgraph – 0.2.14
  • langchain-openai – 0.1.14
  • newsapi-python – 0.2.7

Define the Application Flow

The goal is to query the agent using natural language to gather news on a specific topic and get the newsletter via email. To implement this flow, we will first define three tools to handle each key task and then build the agent to call the LLM and tools.

The three tools are as follows:

  1. Fetching the News: The News API retrieves relevant news articles based on the parsed query.
  2. Scoring the News: The fetched articles are passed to another LLM, which evaluates and scores them for quality. The output is a list of articles sorted by their quality score.
  3. Delivering the News: The top-scoring articles are formatted into a readable HTML email and sent to the user.

Now we can start defining the functions.

Get News

Import the necessary libraries and load the .env file

import os 
import json
import pandas as pd
from datetime import datetime, timedelta
from IPython.display import Image, display
from typing import List, Literal, Optional, TypedDict, Annotated
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

from dotenv import load_dotenv

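# load the keys from the .env file in the current working directory
# (adjust the path if your .env file lives elsewhere)
load_dotenv('.env')

# alternatively, we can read the key from a .txt file as follows
with open('mykey.txt', 'r') as file:
    openai_key = file.read().strip()  # strip the trailing newline, if any
    
os.environ['OPENAI_API_KEY'] = openai_key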

Initialize the news_api client from NewsApiClient using the API key:

from newsapi import NewsApiClient

NEWS_API_KEY = os.environ['NEWS_API_KEY']

news_api = NewsApiClient(api_key=NEWS_API_KEY)

Now let’s define the LangChain tool using the ‘tool’ decorator from LangChain

@tool
def get_news(query: str, past_days: int, domains: str):
    """
    Get news on the given parameters like query, past_days, etc.
    Args:
        query: search news about this topic
        past_days: For how many days in the past should we search?
        domains: search news in these resources
    """
    today = datetime.today()
    from_date = today - timedelta(days=past_days)
    news_details = news_api.get_everything(q=query, from_param=from_date, domains=domains,
                                           sort_by='relevancy')
    return news_details

The News API sorts the returned articles by relevancy, as requested with sort_by='relevancy'. Here’s an example of what the output of this function looks like:

News details
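
For reference, the response has roughly the shape sketched below. All values are made-up placeholders rather than real output, and the helper `headlines` is a hypothetical name used only to show how the `articles` list can be read.

```python
# Illustrative shape of a NewsAPI "everything" response.
# All values are made-up placeholders, not real output.
sample_news_details = {
    "status": "ok",
    "totalResults": 1,
    "articles": [
        {
            "source": {"id": None, "name": "Example Source"},
            "author": "Jane Doe",
            "title": "Example headline",
            "description": "One-line summary of the article.",
            "url": "https://example.com/article",
            "urlToImage": None,
            "publishedAt": "2024-09-01T10:00:00Z",
            "content": "Truncated body text...",
        }
    ],
}

def headlines(news_details):
    """Return (title, url) pairs from a response shaped like the sample."""
    return [(a["title"], a["url"]) for a in news_details.get("articles", [])]

print(headlines(sample_news_details))
```

Note that `author` and `urlToImage` can be missing for some articles, which is why the later steps treat them as optional.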

The ‘@tool’ decorator defines a LangChain tool, which we can then bind to the LLM. The docstring of the function is also important: it is passed to the LLM as the tool description, which tells the LLM which arguments to produce when it calls the tool.

# initialize the LLM
gpt = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# we can bind the tool to the LLM so that the LLM can return a tool call based on the query.
gpt_with_tools = gpt.bind_tools([get_news])
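
When the bound LLM decides to use a tool, it does not run the tool itself; it returns a tool call containing the tool name and arguments. The sketch below shows how such a call can be dispatched by hand. The function `fake_get_news` is a hypothetical stand-in for the real tool so that the example runs without network access, and the argument values (including the domain) are assumptions for illustration.

```python
# Stand-in for the real get_news tool so the sketch runs without network access.
def fake_get_news(query: str, past_days: int, domains: str) -> dict:
    return {"status": "ok", "articles": [], "echo": (query, past_days, domains)}

TOOLS = {"get_news": fake_get_news}

# Hypothetical tool call, shaped like an entry of AIMessage.tool_calls.
tool_call = {
    "name": "get_news",
    "args": {"query": "Indian cricket team", "past_days": 30, "domains": "espncricinfo.com"},
    "id": "call_123",
}

def run_tool_call(call: dict) -> dict:
    """Look up the requested tool by name and call it with the model's arguments."""
    return TOOLS[call["name"]](**call["args"])

result = run_tool_call(tool_call)
print(result["echo"])
```

In the agent built later in this article, the ToolNode performs this dispatch step for us.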

Score News

The score_news function processes news articles by scoring them based on predefined criteria. Then the function returns a sorted list of the highest-quality articles.

Import the required methods

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.messages import HumanMessage

Let us define the function

def score_news(news_details: dict):
    """
    Calculate a score for each news article and sort the articles by score.
        news_details: the graph state; its last message holds the fetched articles.
    """
    # access the last message of the state for the articles.
    # passing all the articles to the LLM will increase the cost,
    # so we score only the first 15 articles.
    json_articles = json.loads(news_details['messages'][-1].content)['articles']
    articles = json_articles[:15]  # slicing also handles lists shorter than 15
    
    # system prompt to guide the LLM to score the articles.
    system_prompt = """
    You are a news quality evaluator.
    I will provide you with a news article, with a title, description, and truncated content and other details. 
    Analyze and score the news article based on the following criteria:

    Clarity: How well the article conveys the message in a concise and understandable manner.
        Scale: 1 (unclear) to 25 (very clear)

    Credibility: Based on the description and other details provided, how likely is the article to be credible and factually accurate?
        Scale: 1 (not credible) to 25 (highly credible)

    Engagement potential: How likely the article is to capture the reader's attention or provoke further thought.
        Scale: 1 (not engaging) to 25 (very engaging)

    Impact: How significant or influential the article is in terms of its potential societal, technological, or political consequences.
        Scale: 1 (minimal impact) to 25 (high impact)

    Provide the total score out of 100 for the news article, adding the scores for each of the above criteria.

    You will be evaluating a lot of news articles. So, score them such that we can sort all of them later.

    """
    prompt_template = ChatPromptTemplate.from_messages([("system", system_prompt), ("human", "{news}")])

    
    # define pydantic class to get the output in a structured format.
   
    class News(BaseModel):
        """News scoring system"""
    
        total_score: int = Field(description='total score for the news article')
        
        source: str = Field(description="The source of the news")
        author: Optional[str] = Field(default=None, description="The author of the news")
        
        title: str = Field(description="The title of the news")
        description: str = Field(description="The description of the news")
        
        url: str = Field(description="The url of the news")
        urlToImage: Optional[str] = Field(default=None, description="The image url of the news")

    # GPT-4o performs better at scoring but is more costly.
    gpt_4o = ChatOpenAI(model='gpt-4o', temperature=0)
    structured_gpt = gpt_4o.with_structured_output(News)
    chain = prompt_template | structured_gpt
    
    # send each article to the LLM to get the score with the other details.
    results = [chain.invoke({'news': article}).dict() for article in articles]

    # sort the articles by total score.
    df = pd.DataFrame(results).sort_values(by='total_score', ascending=False)
    
    return {"messages": [HumanMessage(content=df.to_dict(orient='records'))]}

The function takes the state as input, under the name news_details. Since the state holds all the messages, we can read the articles from the last message. To save costs, we can score only the first few articles. We can also try different system prompts to find the best scoring system.

It is easier to process the data if the output is in a defined format. So, we use the LLM with structured output, where the structure is defined using a Pydantic class.

We then score each article and store the results in a dataframe. Finally, we sort the articles by total score and add them as a message to the state.

Explanation

1. Input

The function receives the state object as input, which contains all messages. The latest message from this state holds the news articles. To minimize costs, instead of scoring all articles, we can limit the number of articles.

2. Scoring Process

We provide a detailed system prompt to the LLM, instructing it to score each article based on the criteria given in the system prompt.

The LLM evaluates each article based on the criteria defined in the system prompt and assigns a total score out of 100, adding scores of each criterion.

3. Structured Output

To ensure the output is structured and easy to process, we define a Pydantic model (News). This model includes fields like `total_score`, `title`, `description`, and `url`. By using this structured format, the LLM can return consistent, well-organized results.

4. LLM Integration

We use GPT-4o to score the articles, as it was found to rate the articles better than GPT-4o-mini, although it is more costly. Each article is passed through the LLM, and each result is converted from the Pydantic object into a dictionary.

5. Sorting and Output

After scoring all the articles, we store them in a Pandas DataFrame and sort them by their `total_score` in descending order. We then return the sorted list as a message to the State, ready to be used in the next part of the workflow.
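
The scoring arithmetic and the final sort can be sketched without an LLM or pandas. In the example below, the per-criterion fields and the helpers `total_score` and `top_articles` are hypothetical: the `News` model above stores only the total, and the numbers are made up for illustration.

```python
CRITERIA = ("clarity", "credibility", "engagement", "impact")

def total_score(scores: dict) -> int:
    """Sum four criterion scores, each required to lie in 1..25."""
    for name in CRITERIA:
        if not 1 <= scores[name] <= 25:
            raise ValueError(f"{name} must be between 1 and 25, got {scores[name]}")
    return sum(scores[name] for name in CRITERIA)

def top_articles(scored: list, k: int = 10) -> list:
    """Sort by total_score (highest first) and keep the top k."""
    return sorted(scored, key=lambda a: a["total_score"], reverse=True)[:k]

# Made-up scores for two hypothetical articles.
scored = [
    {"title": "A", "total_score": total_score(
        {"clarity": 20, "credibility": 18, "engagement": 15, "impact": 10})},
    {"title": "B", "total_score": total_score(
        {"clarity": 22, "credibility": 21, "engagement": 19, "impact": 17})},
]
print([(a["title"], a["total_score"]) for a in top_articles(scored)])
```

A range check like this can also be useful on the LLM output, since nothing in the structured output alone guarantees that the total stays within 100.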

Send Email

The send_email function takes a list of sorted news articles, generates an HTML email, and sends it to the recipient.

Import the libraries

import smtplib, ssl
import email.message

Define the send_email function

def send_email(sorted_news):
 
    # get the sorted news from the last message of the state.
    articles = sorted_news['messages'][-1].content
    
    # If the news_article has image, we can display it in the email.
    news_items_html = ""
    for article in articles[:10]:
        if article['urlToImage'] is not None:
            news_items_html += f"""
            <div class="news-item">
                <img src="{article['urlToImage']}" alt="{article['title']}">
                <div>
                    <h3><a href="{article['url']}">{article['title']}</a></h3>
                    <p>{article['description']}</p>
                </div>
            </div>
            """
        else:
            news_items_html += f"""
            <div class="news-item">
                <div>
                    <h3><a href="{article['url']}">{article['title']}</a></h3>
                    <p>{article['description']}</p>
                </div>
            </div>
            """
            
    # CSS for styling the HTML message. we add the above 'news_items_html' here.
    html = f"""
        <html>
        <head>
            <style>
                body {{
                    font-family: Arial, sans-serif;
                    background-color: #c4c4c4;
                    margin: 0;
                    padding: 0;
                }}
                .container {{
                    width: 80%;
                    max-width: 600px;
                    margin: 0 auto;
                    background-color: #ffffff;
                    padding: 20px;
                    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
                }}
                h1 {{
                    text-align: center;
                    color: #333;
                }}
                .news-item {{
                    display: flex;
                    align-items: center;
                    justify-content: space-between;
                    border-bottom: 1px solid #eeeeee;
                    padding: 15px 0;
                }}
                .news-item h3 {{
                    margin: 0;
                    font-size: 16px;
                    color: #007BFF;
                    margin-left: 5px;
                }}
                .news-item p {{
                    font-size: 14px;
                    color: #666666;
                    margin: 5px 0;
                    margin-left: 5px;
                }}
                .news-item a {{
                    color: #007BFF;
                    text-decoration: none;
                }}
                .news-item img {{
                    width: 100px;
                    height: 100px;
                    object-fit: cover;
                    border-radius: 8px;
                }}
                .footer {{
                    margin-top: 20px;
                    text-align: center;
                    font-size: 12px;
                    color: #999999;
                }}
            </style>
        </head>
        <body>
            <div class="container">
                <h1>Curated News</h1>
                {news_items_html}
                <div class="footer">
                    <p>This is your personalized newsletter.</p>
                </div>
            </div>
        </body>
        </html>
    """
    
    port = 465  # For SSL

    sender_email = "[email protected]"
    password = os.environ['GMAIL_PASSWORD']
    
    context = ssl.create_default_context()
 
    # add the content for the email
    mail = email.message.EmailMessage()
    mail['To'] = "[email protected]"
    mail['From'] = "[email protected]"
    mail['Subject'] = "News Digest"
    mail.set_content(html, subtype='html')

    
    with smtplib.SMTP_SSL("smtp.gmail.com", port, context=context) as server:
        server.login(sender_email, password)
        server.send_message(mail)

Explanation

1. Extracting Sorted News

The function starts by accessing the sorted news articles from the last message in the State. We limit the number of articles displayed in the email to the top 10.

2. Generating HTML Content

The function dynamically constructs the HTML for each news article. If an article includes an image (`urlToImage`), the image is embedded in the email next to the article’s title, link, and description. Otherwise, only the title and description are displayed. This HTML block (`news_items_html`) is generated using a loop that processes each article.
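
Titles and descriptions come from external sources, so they may contain characters such as `<` or `&` that would break the markup. As one possible variation on the loop above (not the version used in this article), the sketch below escapes each field with the standard library before inserting it; `render_item` is a hypothetical helper name.

```python
from html import escape

def render_item(article: dict) -> str:
    """Build one news item, escaping text so it cannot break the markup."""
    title = escape(article["title"])
    description = escape(article.get("description") or "")
    url = escape(article["url"], quote=True)
    image = article.get("urlToImage")
    # Only emit an <img> tag when the article has an image URL.
    img_tag = f'<img src="{escape(image, quote=True)}" alt="{title}">' if image else ""
    return (
        '<div class="news-item">'
        f"{img_tag}"
        f'<div><h3><a href="{url}">{title}</a></h3><p>{description}</p></div>'
        "</div>"
    )

# Made-up article with characters that need escaping.
item = render_item({"title": "Rates <up> & away", "description": None,
                    "url": "https://example.com/a?x=1&y=2", "urlToImage": None})
print(item)
```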

3. HTML and CSS Styling

The HTML email is styled using embedded CSS to ensure a visually appealing layout. The styles cover:

  • Container: The main email content is wrapped in a centered container with a white background and subtle shadow.
  • News Items: Each news article is displayed with its title (as a clickable link), description, and optionally an image. The layout uses flexbox to align the image and text side by side, with a border separating each news item.

4. Composing the Email

The email is set up using Python’s `email.message.EmailMessage` class. The HTML content, subject line (“News Digest”), sender, and recipient are specified. The HTML is included as the main content using `mail.set_content(html, subtype='html')`.

5. Sending the Email

The function uses Gmail’s SMTP server to send the email securely via SSL (port 465). The sender’s Gmail credentials are fetched from the environment variable `GMAIL_PASSWORD` to avoid hardcoding sensitive information. After logging into the SMTP server, the email is sent to the recipient.
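
Some mail clients do not render HTML. As an optional variation (an assumption, not part of the function above), the message can carry a plain-text fallback alongside the HTML version. The sketch below only composes the message and sends nothing; `build_digest` and the addresses are hypothetical placeholders.

```python
from email.message import EmailMessage

def build_digest(html: str, sender: str, recipient: str) -> EmailMessage:
    """Compose a message with a plain-text fallback and an HTML alternative."""
    mail = EmailMessage()
    mail["From"] = sender
    mail["To"] = recipient
    mail["Subject"] = "News Digest"
    # The plain-text part is shown by clients that cannot display HTML.
    mail.set_content("Your news digest is available in the HTML version of this email.")
    mail.add_alternative(html, subtype="html")
    return mail

# Placeholder addresses; nothing is sent in this sketch.
mail = build_digest("<h1>Curated News</h1>", "sender@example.com", "reader@example.com")
print(mail.get_content_type())
```

The resulting object can be passed to `server.send_message(mail)` in the same way as in the function above.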

Building the Agent

Let us build the agent based on the tools and functions defined above.

Step 1. Defining functions to call the models and tools.

from langgraph.prebuilt import ToolNode
from langgraph.graph import StateGraph, MessagesState, START, END

# function to call the model, which returns a tool call based on the query.
def call_model(state: MessagesState):
    messages = state["messages"]
    response = gpt_with_tools.invoke(messages)
    return {"messages": [response]}
    
# if the last message from the above LLM contains tool calls, we go to "tools"; otherwise we end.
def call_tools(state: MessagesState) -> Literal["tools", END]:
    messages = state["messages"]
    last_message = messages[-1]
    if last_message.tool_calls:
        return "tools"
    return END

Step 2. Building the workflow graph. Now we can use all the defined functions to build the agent.

#create a tool node with function so that we can use this in the graph. 
get_news_tool = ToolNode([get_news])


workflow = StateGraph(MessagesState)

# We start the agent from the call_model function.
workflow.add_node("LLM", call_model)
workflow.add_edge(START, "LLM")

# Add the get_news_tool, which is called from the above LLM based on the query.
workflow.add_node("tools", get_news_tool)
workflow.add_conditional_edges("LLM", call_tools)

# then we connect to the score_news function from get_news function
workflow.add_node("score", score_news)
workflow.add_edge("tools", "score")

# then we connect to the send_email function from score_news function
workflow.add_node("mail", send_email)
workflow.add_edge("score", "mail")

# we can end with the agent after sending the mail
workflow.add_edge("mail", END)

Step 3. Compiling the graph.

agent = workflow.compile()
display(Image(agent.get_graph().draw_mermaid_png()))

Workflow

Now we can call the agent with a query.

Let’s use a query that returns only a few articles, so that we can print the outputs at each step of the agent.

query = "what's the news on Indian cricket team in the past month from cricinfo?"

# this query will go to the START node.
inputs = {"messages": [("user", query)]}

# top-level 'async for' works in a notebook; in a script, run it inside an async function.
async for chunk in agent.astream(inputs, stream_mode="values"):
    chunk["messages"][-1].pretty_print()

The output will be in the format shown below. If no articles are returned, we can change the query.

Personalized News Digest Using AI Agents

As we can see, we start with the query. The LLM then calls the ‘get_news’ tool, which returns all the articles. The ‘score_news’ function then processes them and outputs a list of articles with scores. Finally, the ‘send_email’ function sends the email; it does not add any output to the state.

In this way, we can query the agent about any topic and get an email with curated news.
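
The streaming loop above relies on top-level `async for`, which works in a notebook. In a regular Python script, the loop has to live inside a coroutine that is started with `asyncio.run`. The sketch below shows the pattern with `fake_astream`, a hypothetical stand-in for `agent.astream` that yields made-up state snapshots, so it runs without the agent.

```python
import asyncio

async def fake_astream(inputs):
    """Stand-in for agent.astream: yields one state snapshot per step."""
    messages = list(inputs["messages"])
    for step in ("tool call", "articles", "scored articles"):
        messages = messages + [step]
        yield {"messages": messages}

async def main(query: str) -> list:
    seen = []
    # In a script, 'async for' must be inside a coroutine like this one.
    async for chunk in fake_astream({"messages": [query]}):
        seen.append(chunk["messages"][-1])
    return seen

steps = asyncio.run(main("news on the Indian cricket team"))
print(steps)
```

With the real agent, the loop body would call `pretty_print()` on the last message, as in the notebook version.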

Conclusion

Building a newsletter agent using LangGraph and LLMs offers a powerful way to automate news curation and delivery. By combining real-time data from the News API, LLM-based scoring, and email delivery, this approach automates the creation of customized newsletters on any topic we query.

Frequently Asked Questions

Q1. What is LangGraph, and how does it work?

A. LangGraph is a framework for building dynamic workflows that integrate large language models (LLMs) with custom logic. It allows developers to define workflows as graphs using States, Nodes, and Edges, where each Node represents a function or task, and Edges define the flow of data between these tasks.

Q2. What are the main components of LangGraph?

A. LangGraph consists of three core components: State, which holds data shared across the application; Nodes, which represent individual functions that read or modify the State; and Edges, which define the flow of data between Nodes. Conditional Edges allow for flexible, decision-based workflows.

Q3. Can LangGraph integrate external APIs and tools?

A. Yes, LangGraph can integrate external APIs and tools. You can define Nodes to handle specific tasks, such as making API calls or interacting with third-party services, and then use these Nodes within the workflow to create dynamic, real-time applications.

Q4. How does LangGraph handle conditional workflows?

A. LangGraph allows you to define conditional Edges, which use a function to determine the next step in the workflow. This feature makes it easy to handle complex, decision-based scenarios where the flow depends on specific conditions or user input.

Santosh 18 Sep, 2024

I am working as an Associate Data Scientist at Analytics Vidhya, a platform dedicated to building the Data Science ecosystem. My interests lie in the fields of Deep Learning and Natural Language Processing (NLP).
