How to Create Your Personalized News Digest Using AI Agents?

Santhosh Reddy Dandavolu Last Updated : 04 Oct, 2024


Introduction

The capabilities of large language models (LLMs) are advancing rapidly, enabling a variety of LLM applications that range from task automation to workflow optimization. One exciting application is an intelligent news digest or newsletter agent. Such an agent can interact with external tools and data sources to pull in relevant content, summarize it, and deliver it in a customized format. In this article, we will build an agent that produces a personalized daily news digest using LangGraph and external tools such as News API.


Overview

Understand the architecture of LangGraph and its key components (State, Nodes, and Edges) to build customizable workflow agents.
Learn how to integrate external APIs like NewsAPI to fetch real-time data for dynamic content generation in newsletters.
Develop the skills to use LLMs for content evaluation by implementing a scoring system that ranks news articles based on quality criteria.
Gain practical knowledge of automating email delivery with curated content using Python’s email-sending libraries.

Brief About LangGraph
Prerequisites
Define the Application Flow
Building the Agent
Frequently Asked Questions

Brief About LangGraph

LangGraph is a framework built on top of LangChain for building dynamic workflows that integrate LLMs with custom logic and tools. This makes it possible to build highly customized, complex workflows that combine multiple tools and APIs.

LangGraph consists of three core components:

State: The State holds the data shared throughout the application. It can be any Python data structure that can hold this data. We can define it as a State object with different parameters or use the pre-built MessagesState, which contains only a list of messages.
Nodes: Nodes are functions that read and modify the State. Each function takes the State as its first argument. There is also a START node, which marks the node that receives the user input and is called first, and an END node, which marks the end of the graph.
Edges: Edges define the flow of data between nodes. There are also conditional edges, which use a function to decide which node to go to next.

In short, edges connect nodes, and nodes read or write the data in the State. A key advantage of LangGraph is that the agent can be customized in many ways, so there can be more than one way to build this agent.
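
To make the three components concrete, here is a minimal plain-Python sketch of the same idea. It deliberately does not use LangGraph's API: the `run_graph` helper, the node names, and the `State` fields are all hypothetical, chosen only to show how nodes read and update a shared state while edges decide what runs next.

```python
from typing import Callable, Dict, Optional, TypedDict


class State(TypedDict):
    """Shared data that every node can read and update."""
    query: str
    articles: list


def fetch(state: State) -> dict:
    # A node reads the state and returns only the keys it wants to update.
    return {"articles": [f"article about {state['query']}"]}


def shout(state: State) -> dict:
    return {"articles": [a.upper() for a in state["articles"]]}


# Nodes are named functions; edges map each node to the node that follows it.
NODES: Dict[str, Callable[[State], dict]] = {"fetch": fetch, "shout": shout}
EDGES: Dict[str, Optional[str]] = {"fetch": "shout", "shout": None}  # None ends the run


def run_graph(state: State, start: str) -> State:
    """Walk the edges from `start`, merging each node's update into the state."""
    current: Optional[str] = start
    while current is not None:
        state = {**state, **NODES[current](state)}
        current = EDGES[current]
    return state


final_state = run_graph({"query": "cricket", "articles": []}, start="fetch")
print(final_state["articles"])
```

In LangGraph, a conditional edge plays the role of the fixed `EDGES` lookup here, except that a function inspects the state and returns the name of the next node.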


Prerequisites

Before we start building the LLM agent, let’s make sure we have the required keys and passwords.

Accessing an LLM via API

Begin by generating an API key for the LLM you are using. Create a text file named ‘.env’ in your project folder and store the key in it, one NAME=value pair per line (for example, a line that starts with OPENAI_API_KEY=). This keeps the key private and easily accessible within your project.

Fetching News Data

To gather news content, we will use https://newsapi.org/. Sign up for an API key and store it in the same .env file for secure access.

Sending the Email

To send email from Python, we can enable ‘less secure apps’ in Gmail and store the Gmail password in the .env file as GMAIL_PASSWORD. If that option is not available, we can gain access to Gmail by following the steps mentioned here.
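
Once the keys are in place, a quick check at start-up can catch a missing or empty value before any API call fails. The sketch below assumes the three variable names that the code later in this article reads (`OPENAI_API_KEY`, `NEWS_API_KEY`, `GMAIL_PASSWORD`); `missing_keys` is a hypothetical helper, not part of any library.

```python
import os
from typing import List, Mapping

# Variable names read by the code in this article.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEWS_API_KEY", "GMAIL_PASSWORD")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


# Example with a stand-in environment instead of the real os.environ.
example_env = {"OPENAI_API_KEY": "sk-placeholder", "NEWS_API_KEY": ""}
print(missing_keys(example_env))
```

Calling `missing_keys()` with no argument checks the real environment, so it can run right after the .env file has been loaded.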

Libraries Required

We have used the following versions for the major libraries:

langchain – 0.2.14
langgraph – 0.2.14
langchain-openai – 0.1.14
newsapi-python – 0.2.7

Define the Application Flow

The goal is to query the agent using natural language to gather news on a specific topic and get the newsletter via email. To implement this flow, we will first define three tools to handle each key task and then build the agent to call the LLM and tools.

The three tools are as follows:

Fetching the News: The News API retrieves relevant news articles based on the parsed query.
Scoring the News: The fetched articles are passed to another LLM, which evaluates and scores them for quality. The output is a list of articles sorted by their quality score.
Delivering the News: The top-scoring articles are formatted into a readable email and sent to the user.

Now we can start defining the functions.

Get News

Import the necessary libraries and load the .env file

import os 
import json
import pandas as pd
from datetime import datetime, timedelta
from IPython.display import Image, display
from typing import List, Literal, Optional, TypedDict, Annotated
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

from dotenv import load_dotenv

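# Load the variables from the .env file in the current working directory.
load_dotenv('.env')

# As an alternative to the .env file, the key can be read from a .txt file.
if os.path.exists('mykey.txt'):
    with open('mykey.txt', 'r') as file:
        # strip the trailing newline so the key is passed cleanly
        os.environ['OPENAI_API_KEY'] = file.read().strip()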

Initialize news_api from NewsApiClient with the API key

from newsapi import NewsApiClient

NEWS_API_KEY = os.environ['NEWS_API_KEY']

news_api = NewsApiClient(api_key=NEWS_API_KEY)

Now let’s define the LangChain tool using the ‘tool’ decorator from LangChain

@tool
def get_news(query: str, past_days: int, domains: str):
    """
    Get news on the given parameters like query, past_days, etc.
    Args:
        query: search news about this topic
        past_days: For how many days in the past should we search?
        domains: search news in these resources
    """
    today = datetime.today()
    from_date = today - timedelta(days=past_days)
    news_details = news_api.get_everything(q=query, from_param=from_date, domains=domains,
                                           sort_by='relevancy')
    return news_details

With sort_by='relevancy', News API returns the articles sorted by relevance to the query. The function returns the response as a dictionary, with the list of articles under the ‘articles’ key.
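
As a rough guide, the `/v2/everything` endpoint of News API responds with a dictionary shaped like the one below. All values here are made-up placeholders, and `summarize` is a hypothetical helper added only to show how the fields are accessed.

```python
# Placeholder data in the shape of a News API "everything" response.
sample_response = {
    "status": "ok",
    "totalResults": 1,
    "articles": [
        {
            "source": {"id": None, "name": "Example News"},
            "author": None,  # author and urlToImage can be None
            "title": "Placeholder headline",
            "description": "Placeholder description of the article.",
            "url": "https://example.com/placeholder-article",
            "urlToImage": None,
            "publishedAt": "2024-10-01T08:00:00Z",
            "content": "Placeholder content, truncated by the API...",
        }
    ],
}


def summarize(response: dict) -> list:
    """Return (source name, title) pairs for every article in the response."""
    return [(a["source"]["name"], a["title"]) for a in response.get("articles", [])]


print(summarize(sample_response))
```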

The ‘@tool’ decorator defines a LangChain tool, which we can then bind to the LLM. The docstring of the function above is also important: it is passed to the LLM as the tool description, which guides the LLM to include those arguments in its tool call.

# initialize the LLM
gpt = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Bind the tool to the LLM so that it can respond with a call to the tool when the query needs it.
gpt_with_tools = gpt.bind_tools([get_news])

Score News

The score_news function processes news articles by scoring them based on predefined criteria. Then the function returns a sorted list of the highest-quality articles.

Import the required methods

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.messages import HumanMessage

Let us define the function

def score_news(news_details: dict):
    """
    Score the news articles with an LLM and sort them by the score.

    news_details: the graph state; its last message holds the fetched articles.
    """
    # Access the last message of the state for the articles.
    # Passing all the articles to the LLM would increase the cost,
    # so we score at most the first 15.
    json_articles = json.loads(news_details['messages'][-1].content)['articles']
    articles = json_articles[:15]
    
    # system prompt to guide the LLM to score the articles.
    system_prompt = """
    You are a news quality evaluator.
    I will provide you with a news article, with a title, description, and truncated content and other details. 
    Analyze and score the news article based on the following criteria:

    Clarity: How well the article conveys the message in a concise and understandable manner.
        Scale: 1 (unclear) to 25 (very clear)

    Credibility: Based on the description and other details provided, how likely is the article to be credible and factually accurate?
        Scale: 1 (not credible) to 25 (highly credible)

    Engagement potential: How likely the article is to capture the reader's attention or provoke further thought.
        Scale: 1 (not engaging) to 25 (very engaging)

    Impact: How significant or influential the article is in terms of its potential societal, technological, or political consequences.
        Scale: 1 (minimal impact) to 25 (high impact)

    Provide the total score out of 100 for the news article, adding the scores for each of the above criteria.

    You will be evaluating a lot of news articles. So, score them such that we can sort all of them later.

    """
    prompt_template = ChatPromptTemplate.from_messages([("system", system_prompt), ("human", "{news}")])

    
    # define pydantic class to get the output in a structured format.
   
    class News(BaseModel):
        """News scoring system"""
    
        total_score: int = Field(description='total score for the news article')
        
        source: str = Field(description="The source of the news")
        author: Optional[str] = Field(default=None, description="The author to the news")
        
        title: str = Field(description="The title of the news")
        description: str = Field(description="The description to the news")
        
        url: str = Field(description="The url of the news")
        urlToImage: Optional[str] = Field(default=None, description="The image url of the news")

    # GPT-4o performs better at scoring but is more costly.
    gpt_4o = ChatOpenAI(model='gpt-4o', temperature=0)
    structured_gpt = gpt_4o.with_structured_output(News)
    chain = prompt_template | structured_gpt
    
    # send each article to the LLM to get the score with the other details.
    results = [chain.invoke({'news': article}).dict() for article in articles]

    # sort the articles by total score.
    df = pd.DataFrame(results).sort_values(by='total_score', ascending=False)
    
    return {"messages": [HumanMessage(content=df.to_dict(orient='records'))]}

The function takes the state as input under the name news_details. Since the state holds all the messages, we can access the last message to get the articles. To save costs, we can score only some of the top articles. We can also try different system prompts to find the best scoring system.

It is easier to process the data if the output is in a defined format. So, we use the LLM with structured output, where the structure is defined by a Pydantic class.

Then we score each article and store the results in a DataFrame. Finally, we sort the articles by total score and add them as a message to the state.

Explanation

1. Input

The function receives the state object as input, which contains all messages. The latest message from this state holds the news articles. To minimize costs, instead of scoring all articles, we can limit the number of articles.

2. Scoring Process

We provide a detailed system prompt to the LLM. The LLM evaluates each article on the criteria defined in that prompt and assigns a total score out of 100 by adding the scores for each criterion.

3. Structured Output

To ensure the output is structured and easy to process, we define a Pydantic model (News). This model includes fields like `total_score`, `title`, `description`, and `url`. By using this structured format, the LLM can return consistent, well-organized results.

4. LLM Integration

We use GPT-4o to score the articles, as we found it rates them better than GPT-4o-mini. Each article is passed through the LLM, and each result is converted to a dictionary with the Pydantic model’s `.dict()` method.

5. Sorting and Output

After scoring all the articles, we store them in a Pandas DataFrame and sort them by `total_score` in descending order. We then return the sorted list as a message to the State, ready to be used in the next part of the workflow.
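
The sorting step does not depend on pandas. As a minimal sketch, the same ranking can be done with the built-in `sorted`. The records below are made-up stand-ins for the dictionaries produced by the scoring chain, and `total_score` and `rank_articles` are hypothetical helpers that only show how four scores on the 1 to 25 scale add up to a total out of 100.

```python
from typing import Dict, List


def total_score(criterion_scores: Dict[str, int]) -> int:
    """Add per-criterion scores, each expected on the 1-25 scale."""
    for name, value in criterion_scores.items():
        if not 1 <= value <= 25:
            raise ValueError(f"{name} score {value} is outside the 1-25 scale")
    return sum(criterion_scores.values())


def rank_articles(scored: List[dict], top_n: int = 10) -> List[dict]:
    """Sort articles by total_score, highest first, and keep the top_n."""
    return sorted(scored, key=lambda a: a["total_score"], reverse=True)[:top_n]


scored_articles = [
    {"title": "A", "total_score": total_score(
        {"clarity": 20, "credibility": 18, "engagement": 15, "impact": 10})},
    {"title": "B", "total_score": total_score(
        {"clarity": 22, "credibility": 21, "engagement": 19, "impact": 17})},
]
print([a["title"] for a in rank_articles(scored_articles)])
```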

Send Email

The send_email function takes a list of sorted news articles, generates an HTML email, and sends it to the recipient.

Import the libraries

import smtplib, ssl
import base64
import email.message

Define the send_email function

def send_email(sorted_news):
 
    # get the sorted news from the last message of the state.
    articles = sorted_news['messages'][-1].content
    
    # If the news_article has image, we can display it in the email.
    news_items_html = ""
    for article in articles[:10]:
        if article['urlToImage'] is not None:
            news_items_html += f"""
            <div class="news-item">
                <img src="{article['urlToImage']}" alt="{article['title']}">
                <div>
                    <h3><a href="{article['url']}">{article['title']}</a></h3>
                    <p>{article['description']}</p>
                </div>
            </div>
            """
        else:
            news_items_html += f"""
            <div class="news-item">
                <div>
                    <h3><a href="{article['url']}">{article['title']}</a></h3>
                    <p>{article['description']}</p>
                </div>
            </div>
            """
            
    # CSS for styling the HTML message. we add the above 'news_items_html' here.
    html = f"""
        <html>
        <head>
            <style>
                body {{
                    font-family: Arial, sans-serif;
                    background-color: #c4c4c4;
                    margin: 0;
                    padding: 0;
                }}
                .container {{
                    width: 80%;
                    max-width: 600px;
                    margin: 0 auto;
                    background-color: #ffffff;
                    padding: 20px;
                    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
                }}
                h1 {{
                    text-align: center;
                    color: #333;
                }}
                .news-item {{
                    display: flex;
                    align-items: center;
                    justify-content: space-between;
                    border-bottom: 1px solid #eeeeee;
                    padding: 15px 0;
                }}
                .news-item h3 {{
                    margin: 0;
                    font-size: 16px;
                    color: #007BFF;
                    margin-left: 5px;
                }}
                .news-item p {{
                    font-size: 14px;
                    color: #666666;
                    margin: 5px 0;
                    margin-left: 5px;
                }}
                .news-item a {{
                    color: #007BFF;
                    text-decoration: none;
                }}
                .news-item img {{
                    width: 100px;
                    height: 100px;
                    object-fit: cover;
                    border-radius: 8px;
                }}
                .footer {{
                    margin-top: 20px;
                    text-align: center;
                    font-size: 12px;
                    color: #999999;
                }}
            </style>
        </head>
        <body>
            <div class="container">
                <h1>Curated News</h1>
                {news_items_html}
                <div class="footer">
                    <p>This is your personalized newsletter.</p>
                </div>
            </div>
        </body>
        </html>
    """
    
    port = 465  # For SSL

    sender_email = "[email protected]"
    password = os.environ['GMAIL_PASSWORD']
    
    context = ssl.create_default_context()
 
    # add the content for the email
    mail = email.message.EmailMessage()
    mail['To'] = "[email protected]"
    mail['From'] = "[email protected]"
    mail['Subject'] = "News Digest"
    mail.set_content(html, subtype='html')

    
    with smtplib.SMTP_SSL("smtp.gmail.com", port, context=context) as server:
        server.login(sender_email, password)
        server.send_message(mail)

Explanation

1. Extracting Sorted News

The function starts by accessing the sorted news articles from the last message in the State. We limit the number of articles displayed in the email to the top 10.

2. Generating HTML Content

The function dynamically constructs the HTML for each news article. If an article includes an image (`urlToImage`), the image is embedded in the email next to the article’s title, link, and description. Otherwise, only the title and description are displayed. This HTML block (`news_items_html`) is generated using a loop that processes each article.
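
One detail worth considering here: titles and descriptions come from external sources and may contain characters such as `<`, `&`, or quotes, which can break the markup when interpolated directly. A minimal sketch of the same per-article markup using `html.escape` from the standard library follows; `render_item` is a hypothetical helper, not part of the article's code.

```python
from html import escape
from typing import Optional


def render_item(title: str, description: Optional[str], url: str,
                image_url: Optional[str] = None) -> str:
    """Build the HTML for one news item, escaping every external value."""
    image_html = ""
    if image_url:
        image_html = f'<img src="{escape(image_url)}" alt="{escape(title)}">'
    return (
        '<div class="news-item">'
        f"{image_html}"
        "<div>"
        f'<h3><a href="{escape(url)}">{escape(title)}</a></h3>'
        f"<p>{escape(description or '')}</p>"
        "</div>"
        "</div>"
    )


item = render_item('Rates & "risk" <explained>', None, "https://example.com/a?x=1&y=2")
print(item)
```

The helper also tolerates a missing description by rendering an empty paragraph instead of the text "None".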

3. HTML and CSS Styling

The HTML email is styled using embedded CSS to ensure a visually appealing layout. The styles cover:

Container: The main email content is wrapped in a centered container with a white background and subtle shadow.
News Items: Each news article is displayed with its title (as a clickable link), description, and optionally an image. The layout uses flexbox to align the image and text side by side, with a border separating each news item.

4. Composing the Email

The email is set up using Python’s `email.message.EmailMessage` class. The HTML content, subject line (“News Digest”), sender, and recipient are specified. The HTML is included as the main content using `mail.set_content(html, subtype='html')`.
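
As a small variation, assuming some recipients use mail clients that do not render HTML, the message can carry a plain-text fallback alongside the HTML part. The sketch below uses only the standard library; `build_message` and the example addresses are hypothetical.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, html: str, plain_text: str) -> EmailMessage:
    """Create a multipart/alternative message with text and HTML versions."""
    mail = EmailMessage()
    mail["From"] = sender
    mail["To"] = recipient
    mail["Subject"] = "News Digest"
    mail.set_content(plain_text)                # text/plain part
    mail.add_alternative(html, subtype="html")  # text/html part
    return mail


message = build_message(
    "sender@example.com",
    "reader@example.com",
    "<html><body><h1>Curated News</h1></body></html>",
    "Curated News (open in an HTML-capable client for the full layout).",
)
print(message.get_content_type())
```

The resulting object can be passed to `server.send_message` in the same way as the HTML-only message.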

5. Sending the Email

The function uses Gmail’s SMTP server to send the email securely via SSL (port 465). The sender’s Gmail password is read from the environment variable `GMAIL_PASSWORD` to avoid hardcoding sensitive information. After logging in to the SMTP server, the email is sent to the recipient.

Building the Agent

Let us build the agent based on the tools and functions defined above.

Step 1. Defining functions to call the models and tools.

from langgraph.prebuilt import ToolNode
from langgraph.graph import StateGraph, MessagesState, START, END

# Call the model, which returns a tool call based on the query.
def call_model(state: MessagesState):
    messages = state["messages"]
    response = gpt_with_tools.invoke(messages)
    return {"messages": [response]}
    
# If the last message from the LLM contains tool calls, route to "tools"; otherwise end the run.
def call_tools(state: MessagesState) -> Literal["tools", END]:
    messages = state["messages"]
    last_message = messages[-1]
    if last_message.tool_calls:
        return "tools"
    return END
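
To see what the routing function inspects, here is a plain-Python sketch with a stand-in message class. `FakeAIMessage`, `route`, and the argument values are hypothetical; the sketch mimics the way a LangChain `AIMessage` exposes tool calls as a list of dictionaries with `name`, `args`, and `id` keys, and `END_NODE` stands in for the `END` constant.

```python
from dataclasses import dataclass, field
from typing import List

END_NODE = "__end__"  # stand-in for the END constant imported from langgraph.graph


@dataclass
class FakeAIMessage:
    """Minimal stand-in for an LLM reply; not the real LangChain class."""
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def route(last_message: FakeAIMessage) -> str:
    # An empty tool_calls list means the LLM answered directly, so the run ends.
    return "tools" if last_message.tool_calls else END_NODE


with_tool = FakeAIMessage(tool_calls=[{
    "name": "get_news",
    "args": {"query": "Indian cricket team", "past_days": 30, "domains": ""},
    "id": "call_placeholder",
}])
direct_answer = FakeAIMessage(content="I can only help with news queries.")

print(route(with_tool), route(direct_answer))
```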

Step 2. Building the workflow graph. Now we can use all the defined functions to build the agent.

#create a tool node with function so that we can use this in the graph. 
get_news_tool = ToolNode([get_news])


workflow = StateGraph(MessagesState)

# We start the agent from the call_model function.
workflow.add_node("LLM", call_model)
workflow.add_edge(START, "LLM")

# Add the get_news_tool, which is called from the above LLM based on the query.
workflow.add_node("tools", get_news_tool)
workflow.add_conditional_edges("LLM", call_tools)

# then we connect to the score_news function from get_news function
workflow.add_node("score", score_news)
workflow.add_edge("tools", "score")

# then we connect to the send_email function from score_news function
workflow.add_node("mail", send_email)
workflow.add_edge("score", "mail")

# we can end with the agent after sending the mail
workflow.add_edge("mail", END)

Step 3. Compiling the graph.

agent = workflow.compile()
display(Image(agent.get_graph().draw_mermaid_png()))

Now we can call the agent with a query.

Let’s use a query that returns fewer articles so that we can print the output at each step of the agent.

query = "What's the news on the Indian cricket team in the past month?"

# this query will go to the START node.
inputs = {"messages": [("user", query)]}

async for chunk in agent.astream(inputs, stream_mode="values"):
    chunk["messages"][-1].pretty_print()
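
A top-level `async for` like the one above runs as-is in a Jupyter notebook, but a regular Python script needs it wrapped in a coroutine. A minimal sketch follows, with `fake_astream` as a hypothetical stand-in for `agent.astream` so that the example runs without the agent.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_astream(inputs: dict) -> AsyncIterator[dict]:
    """Stand-in for agent.astream: yields the growing state after each step."""
    messages: List[str] = list(inputs["messages"])
    for step in ("LLM", "tools", "score"):
        await asyncio.sleep(0)  # give control back to the event loop
        messages = messages + [f"output of {step}"]
        yield {"messages": messages}


async def run(inputs: dict) -> List[str]:
    """Collect the last message of every streamed chunk."""
    last_messages = []
    async for chunk in fake_astream(inputs):
        last_messages.append(chunk["messages"][-1])
    return last_messages


collected = asyncio.run(run({"messages": ["user query"]}))
print(collected)
```

With the real agent, the body of `run` would iterate over `agent.astream(inputs, stream_mode="values")` and call `pretty_print()` on each last message.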

The agent prints the latest message in the state after each step. If no articles are returned, we can change the query.

We start with the query. The LLM then calls the ‘get_news’ tool, and the tool returns all the articles. The ‘score_news’ function processes them and outputs a list of articles with scores. Finally, the ‘send_email’ function sends the email, though it adds no output to the state.

In this way, we can query the agent about any topic and get an email with curated news.

Conclusion

Building a newsletter agent with LangGraph and LLMs offers a powerful way to automate news curation and delivery. By combining real-time data, LLM-based scoring, and email delivery, this approach streamlines the creation of customized newsletters with relevant content.

Frequently Asked Questions

Q1. What is LangGraph, and how does it work?

A. LangGraph is a framework for building dynamic workflows that integrate large language models (LLMs) with custom logic. It allows developers to define workflows as graphs using States, Nodes, and Edges, where each Node represents a function or task, and Edges define the flow of data between these tasks.

Q2. What are the main components of LangGraph?

A. LangGraph consists of three core components: State, which holds data shared across the application; Nodes, which represent individual functions that read or modify the State; and Edges, which define the flow of data between Nodes. Conditional Edges allow for flexible, decision-based workflows.

Q3. Can LangGraph integrate external APIs and tools?

A. Yes, LangGraph can integrate external APIs and tools. You can define Nodes to handle specific tasks, such as making API calls or interacting with third-party services, and then use these Nodes within the workflow to create dynamic, real-time applications.

Q4. How does LangGraph handle conditional workflows?

A. LangGraph allows you to define conditional Edges, which use a function to determine the next step in the workflow. This feature makes it easy to handle complex, decision-based scenarios where the flow depends on specific conditions or user input.

Santhosh Reddy Dandavolu

I am working as an Associate Data Scientist at Analytics Vidhya, a platform dedicated to building the Data Science ecosystem. My interests lie in the fields of Natural Language Processing (NLP), Deep Learning, and AI Agents.
