Imagine yourself as a data professional tasked with building an efficient data pipeline that streamlines processes and delivers real-time insights. Sounds challenging, right? That's where Mage AI comes in. Unlike many tools that demand heavy setup and constant coding, Mage AI offers a clear, step-by-step setup and an intuitive drag-and-drop interface. In this article, I'll show you how to integrate Mage AI with PostgreSQL so that you can create your first data pipeline, and walk you through the steps that will make your data processing even better!
Learning Outcomes
Understand how to configure Mage AI for seamless integration with PostgreSQL.
Learn to upload raw data to PostgreSQL and create schemas using pgAdmin4.
Master the process of building and managing data pipelines in Mage AI.
Explore how to set up automated triggers and scheduling for data pipelines.
Gain insights into Mage AI’s advanced features like real-time processing and monitoring.
Mage AI is an open-source tool that simplifies building and managing growing data workflows. With its clean design and app-like interface, data engineers and analysts can create data pipelines with a few clicks, without writing boilerplate code. Importing, analyzing, and manipulating big data is much easier with Mage AI, which offers features such as a drag-and-drop pipeline editor, built-in data transformations, and compatibility with many data sources. This lets users spend their time on analytics instead of worrying about the underlying infrastructure. Mage AI also supports Python scripting for defining custom transformations, making it suitable for both technical and non-technical users.
Benefits of Using Mage AI with PostgreSQL
Let us look into the benefits of using Mage AI with PostgreSQL.
Streamlined Data Management: Mage AI simplifies data pipeline creation with its drag-and-drop interface, making it easy to load, transform, and export data from PostgreSQL without manual coding.
Enhanced Automation: Automate recurring data tasks, like ETL processes, by setting up triggers and scheduled pipelines, reducing the need for constant manual interventions.
Seamless Integration: Mage AI integrates smoothly with PostgreSQL, enabling users to manage large datasets efficiently and perform complex data operations within the same workflow.
Customizable Transformations: Leverage Python scripting in Mage AI to perform custom data transformations on PostgreSQL data, allowing flexibility for advanced data processing.
Scalable and Reliable: Mage AI efficiently manages pipelines, ensuring smooth handling of both small and large datasets, while PostgreSQL’s scalability supports business growth without performance bottlenecks.
User-Friendly: The intuitive interface makes it accessible to users with varying levels of technical expertise, enabling quicker learning and faster deployment of data solutions.
Setup Mage AI with Postgres to Build and Manage Your Data Pipeline
Setting up Mage AI with Postgres allows you to seamlessly build and manage powerful data pipelines, automating workflows and simplifying complex data tasks for efficient insights. Let us look into the steps required to set up Mage AI with Postgres.
Step 1: Preparing Your Postgres Database
Before diving into Mage AI, upload your raw data files to Postgres using pgAdmin4, and create the correct schema for each file. Here’s how to get started:
Upload Raw Files to Postgres via pgAdmin4
Open pgAdmin4 and connect to your Postgres server.
Create a new database or use an existing one.
Ensure that you add the correct schema for each raw data file.
Upload/Export your data files to the appropriate tables within this schema. The Python snippet below detects a file's encoding and generates a matching CREATE TABLE statement that you can run in pgAdmin4's Query Tool.
import pandas as pd
import chardet

# Open the file in binary mode and read a sample
with open("expensemaster.csv", 'rb') as file:
    sample = file.read(10000)  # Read first 10,000 bytes as a sample

# Detect encoding
detected = chardet.detect(sample)
print(detected['encoding'])

# Use the detected encoding to read the CSV
try:
    df = pd.read_csv("expensemaster.csv", encoding=detected['encoding'])
except UnicodeDecodeError:
    # If reading fails, try with a common encoding like UTF-8
    df = pd.read_csv("expensemaster.csv", encoding="utf-8")

# Infer data types
dtype_mapping = {
    'object': 'TEXT',
    'int64': 'BIGINT',
    'float64': 'DOUBLE PRECISION',
    'datetime64[ns]': 'TIMESTAMP',
    'bool': 'BOOLEAN'
}
column_definitions = ', '.join([f'"{col}" {dtype_mapping[str(df[col].dtype)]}' for col in df.columns])

# Generate the CREATE TABLE SQL
table_name = 'expensemaster'
create_table_sql = f'CREATE TABLE {table_name} ({column_definitions});'
print(create_table_sql)
Click refresh on “Tables” to see the newly created table.
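To load the CSV into the newly created table, you can use pgAdmin4's Import/Export dialog, or a psql \copy command along these lines (a sketch that assumes the CSV sits on the machine where psql runs and that the table was created with the generated statement):

\copy expensemaster FROM 'expensemaster.csv' WITH (FORMAT csv, HEADER true);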
Start the Postgres Service
Make sure the Postgres service is running. You can check this in pgAdmin4 or by using the psql terminal.
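For example, a quick connectivity check from the terminal (assuming the default postgres superuser and port 5432) is:

psql -U postgres -h localhost -p 5432 -c "SELECT version();"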
Step 2: Gathering Postgres Configuration Details
You’ll need specific details to configure Mage AI with Postgres. Here’s what you need and how to find it:
POSTGRES_DBNAME: The name of your Postgres database.
POSTGRES_SCHEMA: The schema where your data files are uploaded.
POSTGRES_USER: The username for your Postgres database.
POSTGRES_PASSWORD: The password for your Postgres database.
POSTGRES_HOST: The host IP address of your Postgres server.
POSTGRES_PORT: Usually 5432 for Postgres.
Step 3: Installing Mage AI Using Docker in VS Code
To install Mage AI, we will use Docker Extension in Visual Studio Code (VS Code). Ensure you have Docker Desktop and the Docker extension for VS Code installed.
Install Docker Desktop
Download and install Docker Desktop from the official Docker website, then start it.
Install the Docker Extension for VS Code:
Open VS Code and go to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side of the window or by pressing Ctrl+Shift+X.
Search for “Docker” and install the Docker extension by Microsoft.
Pull the Mage AI Docker Image
Open a terminal in VS Code and navigate to your project folder.
Run the following command to pull the latest Mage AI Docker image:
docker pull mageai/mageai:latest
Run the Mage AI Docker Image
Once the Mage AI image is pulled, go to the Docker tab in VS Code.
Find the Mage AI image and run it. This will create a new container.
Right-click on the newly created container and select “Open in Browser.”
The Mage AI interface should now load in your default web browser.
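If you prefer the terminal to the Docker extension UI, Mage's documented quick-start command looks roughly like this (demo_project is a placeholder project name, and Mage serves its UI on port 6789 by default):

docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai /app/run_app.sh mage start demo_project

Once the container is up, open http://localhost:6789 in your browser.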
Step 4: Configuring Mage AI to Connect with Postgres
Configure the database connection in io_config.yaml:
Navigate to the All Files section of your pipeline.
Locate and open the io_config.yaml file.
Add your Postgres connection details as follows
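The relevant section of io_config.yaml typically looks like the sketch below, using the POSTGRES_* values gathered in Step 2 under Mage's default profile; every value here is a placeholder to replace with your own details:

default:
  POSTGRES_DBNAME: your_database
  POSTGRES_SCHEMA: your_schema
  POSTGRES_USER: your_username
  POSTGRES_PASSWORD: your_password
  POSTGRES_HOST: 192.168.1.10   # host IP of your Postgres server
  POSTGRES_PORT: 5432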
Allow Mage AI to Access the Postgres Database
To allow Mage AI to connect to the database from your machine's IP address, you need to modify the pg_hba.conf file.
Locate the pg_hba.conf file at C:\Program Files\PostgreSQL\16\data.
Open the file and add a new row under the # IPv4 local connections section, as shown in Fig. 4.
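A typical entry looks like the line below; the ADDRESS should cover wherever the Mage container connects from (172.17.0.0/16 is Docker's default bridge network, so treat it as an assumption and adjust to your setup), and the METHOD should match your installation (md5 or scram-sha-256). Reload or restart the Postgres service after saving the file.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             172.17.0.0/16           md5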
Step 5: Creating Your First Data Pipeline
Now that Mage AI is configured to connect with Postgres, we can create our first data pipeline. We’ll start by setting up data loader blocks for each dataset and using the drag-and-drop feature to connect them in a flowchart.
Create Data Loader Blocks
For each dataset, create a separate data loader block.
In the Mage AI interface, drag and drop a data loader block onto the canvas for each dataset you need to load from Postgres.
Configure each data loader block with the appropriate connection details and the query that fetches the data from Postgres; a sketch of such a block follows this list.
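A data loader block for the expensemaster table could look roughly like this. It is close to the template Mage generates for a Postgres loader; the schema and table names are placeholders, and import paths may differ slightly between Mage versions:

from os import path
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

if 'data_loader' not in dir():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_expensemaster(*args, **kwargs):
    # Read the raw table uploaded in Step 1 (placeholder schema/table names)
    query = 'SELECT * FROM your_schema.expensemaster'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)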
Connect the Data Loader Blocks to the Transformer block
Use the drag-and-drop feature to connect the data loader blocks in the flowchart to the next transformer code block. This visual representation helps in understanding the data flow and ensuring all steps are connected correctly.
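A transformer block receives the upstream blocks' outputs as function arguments. Here is a minimal sketch; the column cleanup inside is purely illustrative and not from the original article:

if 'transformer' not in dir():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(data, *args, **kwargs):
    # 'data' is the output of the first upstream block; additional upstream
    # outputs arrive via *args. Example cleanup: normalize column names.
    df = data.copy()
    df.columns = [col.strip().lower().replace(' ', '_') for col in df.columns]
    return df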
Creating Data Exporter Blocks
In the Mage AI interface, after configuring your data loader and transformation blocks, add a data exporter block to the canvas.
Choose “Postgres” as the data destination under the Python exporter templates.
Provide the necessary connection details to your Postgres database. Write the code to export the transformed data back to the PostgreSQL database.
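A data exporter block along these lines writes the transformed frame back to Postgres; it mirrors Mage's Postgres exporter template, with the schema and destination table names as placeholders:

from os import path
from pandas import DataFrame
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

if 'data_exporter' not in dir():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_to_postgres(df: DataFrame, **kwargs) -> None:
    schema_name = 'your_schema'       # placeholder schema
    table_name = 'expense_summary'    # placeholder destination table
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        loader.export(
            df,
            schema_name,
            table_name,
            index=False,
            if_exists='replace',  # overwrite the table on each run
        )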
Step 6: Creating Triggers and Scheduling Pipelines
Mage AI offers the ability to create triggers for running your pipeline and scheduling it for regular execution. This ensures your data is always up-to-date without manual intervention.
Creating a Trigger
In Mage AI, you can set up triggers to run your pipeline based on specific events or conditions. For example, you can trigger a pipeline to run whenever new data is added to your Postgres database.
To create a trigger, navigate to the pipeline settings and configure the trigger conditions as needed.
Scheduling the Pipeline
Mage AI supports scheduling pipelines to run at regular intervals. This can be done through the scheduling settings in the Mage AI dashboard.
You can specify the frequency (daily, weekly, etc.) and the time for the pipeline to run.
Additional Features of Mage AI
Mage AI provides several powerful features to automate and enhance your data pipelines:
Integration with Multiple Data Sources: Mage AI connects to numerous kinds of data inputs, including databases, cloud storage, and APIs, enabling you to construct diverse and extensive data flows.
Advanced Transformation Capabilities: Because Mage AI is built on Python, you can implement custom transformations with the help of decorators, making it straightforward to express a wide range of data transformation logic.
Scalability: Mage AI is optimized for big-data throughput, so pipelines keep up as data volumes grow.
Monitoring and Alerts: Mage AI provides strong monitoring and alerting functionality, letting you track pipeline runs and receive notifications on failures.
User-Friendly Interface: The graphical layout of data pipelines means users do not have to write complicated code to manipulate and transform their data.
Together, these features make Mage AI a tool that automates your data workflows and infrastructure, so you do not need to spend much time on them.
Conclusion
Today, information is a valuable asset, making data management essential for organizations. This article provides clear guidance on configuring Mage AI with PostgreSQL, helping you build a robust data pipeline that not only streamlines multiple processes but also significantly boosts productivity. Using Mage AI alongside a robust database such as PostgreSQL enables users to handle and analyze data and make the right decisions in the shortest time possible. As organizations step up their data-driven efforts, tools such as Mage AI are poised to become a dominant way of managing data.
Frequently Asked Questions
Q1. What is Mage AI?
A. Mage AI is an open-source tool designed to simplify the process of building and managing data workflows. It provides a user-friendly interface and automation features that help data professionals create pipelines without extensive coding knowledge.
Q2. Why use PostgreSQL with Mage AI?
A. PostgreSQL is a powerful, open-source relational database management system known for its robustness and scalability. When paired with Mage AI, it allows users to efficiently store, retrieve, and manipulate large datasets, making it an ideal choice for data pipelines.
Q3. Do I need programming skills to use Mage AI?
A. While some familiarity with programming concepts can be helpful, Mage AI is designed to be user-friendly and accessible to users with varying levels of technical expertise. Many tasks can be accomplished through its intuitive interface.
Q4. Can I integrate other data sources with Mage AI?
A. Yes, Mage AI supports integration with various data sources, allowing users to build comprehensive data pipelines that pull in data from multiple platforms, enhancing the overall data ecosystem.
Q5. Is Mage AI free to use?
A. Mage AI is an open-source tool, which means it is free to use. However, users may incur costs associated with hosting, storage, and other related services, depending on their infrastructure choices.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Hi! I'm Adarsh, a Business Analytics graduate from ISB, currently deep into research and exploring new frontiers. I'm super passionate about data science, AI, and all the innovative ways they can transform industries. Whether it's building models, working on data pipelines, or diving into machine learning, I love experimenting with the latest tech. AI isn't just my interest, it's where I see the future heading, and I'm always excited to be a part of that journey!