Imagine yourself as a data professional tasked with building an efficient data pipeline that streamlines processes and delivers real-time insights. Sounds challenging, right? That's where Mage AI comes in. Unlike many tools that demand heavy setup and constant coding, Mage AI offers a clear, step-by-step setup and an intuitive drag-and-drop interface. In this article, I'll show you how to integrate Mage AI with PostgreSQL so that you can create your first data pipeline, and walk you through the steps that will make your data processing even better!
Learning Outcomes
Understand how to configure Mage AI for seamless integration with PostgreSQL.
Learn to upload raw data to PostgreSQL and create schemas using pgAdmin4.
Master the process of building and managing data pipelines in Mage AI.
Explore how to set up automated triggers and scheduling for data pipelines.
Gain insights into Mage AI’s advanced features like real-time processing and monitoring.
Mage AI is an open-source tool that simplifies building and managing growing data workflows. With its clean design and app-like interface, data engineers and analysts can create data pipelines with a few clicks, without writing boilerplate code. Importing, analyzing, and manipulating big data is much easier with Mage AI, which offers features such as a drag-and-drop pipeline editor, built-in data transformations, and compatibility with many data sources. This lets users spend their time on analytics instead of worrying about the underlying infrastructure. Mage AI also supports Python scripting for defining custom transformations, making it suitable for both technical and non-technical users.
Benefits of Using Mage AI with PostgreSQL
Let us look into the benefits of using Mage AI with PostgreSQL.
Streamlined Data Management: Mage AI simplifies data pipeline creation with its drag-and-drop interface, making it easy to load, transform, and export data from PostgreSQL without manual coding.
Enhanced Automation: Automate recurring data tasks, like ETL processes, by setting up triggers and scheduled pipelines, reducing the need for constant manual interventions.
Seamless Integration: Mage AI integrates smoothly with PostgreSQL, enabling users to manage large datasets efficiently and perform complex data operations within the same workflow.
Customizable Transformations: Leverage Python scripting in Mage AI to perform custom data transformations on PostgreSQL data, allowing flexibility for advanced data processing.
Scalable and Reliable: Mage AI efficiently manages pipelines, ensuring smooth handling of both small and large datasets, while PostgreSQL’s scalability supports business growth without performance bottlenecks.
User-Friendly: The intuitive interface makes it accessible to users with varying levels of technical expertise, enabling quicker learning and faster deployment of data solutions.
Setup Mage AI with Postgres to Build and Manage Your Data Pipeline
Setting up Mage AI with Postgres allows you to seamlessly build and manage powerful data pipelines, automating workflows and simplifying complex data tasks for efficient insights. Let us look into the steps required to set up Mage AI with Postgres.
Step 1: Preparing Your Postgres Database
Before diving into Mage AI, upload your raw data files to Postgres using pgAdmin4, and create the correct schema for each file. Here’s how to get started:
Upload Raw Files to Postgres via pgAdmin4
Open pgAdmin4 and connect to your Postgres server.
Create a new database or use an existing one.
Ensure that you add the correct schema for each raw data file.
Upload/Export your data files to the appropriate tables within this schema. The Python snippet below detects a file's encoding and generates a matching CREATE TABLE statement that you can run in pgAdmin4's Query Tool.
import pandas as pd
import chardet

# Open the file in binary mode and read a sample
with open("expensemaster.csv", 'rb') as file:
    sample = file.read(10000)  # Read first 10,000 bytes as a sample

# Detect encoding
detected = chardet.detect(sample)
print(detected['encoding'])

# Use the detected encoding to read the CSV
try:
    df = pd.read_csv("expensemaster.csv", encoding=detected['encoding'])
except UnicodeDecodeError:
    # If reading fails, try with a common encoding like UTF-8
    df = pd.read_csv("expensemaster.csv", encoding="utf-8")

# Infer data types
dtype_mapping = {
    'object': 'TEXT',
    'int64': 'BIGINT',
    'float64': 'DOUBLE PRECISION',
    'datetime64[ns]': 'TIMESTAMP',
    'bool': 'BOOLEAN'
}
column_definitions = ', '.join([f'"{col}" {dtype_mapping[str(df[col].dtype)]}' for col in df.columns])

# Generate the CREATE TABLE SQL
table_name = 'expensemaster'
create_table_sql = f'CREATE TABLE {table_name} ({column_definitions});'
print(create_table_sql)
Click refresh on “Tables” to see the newly created table.
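To load the CSV into the newly created table, you can use pgAdmin4's Import/Export dialog, or a psql \copy command along these lines (a sketch that assumes the CSV sits on the machine where psql runs and that the table was created with the generated statement):

\copy expensemaster FROM 'expensemaster.csv' WITH (FORMAT csv, HEADER true);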
Start the Postgres Service
Make sure the Postgres service is running. You can check this in pgAdmin4 or by using the psql terminal.
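For example, a quick connectivity check from the terminal (assuming the default postgres superuser and port 5432) is:

psql -U postgres -h localhost -p 5432 -c "SELECT version();"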
Step 2: Gathering Postgres Configuration Details
You’ll need specific details to configure Mage AI with Postgres. Here’s what you need and how to find it:
POSTGRES_DBNAME: The name of your Postgres database.
POSTGRES_SCHEMA: The schema where your data files are uploaded.
POSTGRES_USER: The username for your Postgres database.
POSTGRES_PASSWORD: The password for your Postgres database.
POSTGRES_HOST: The host IP address of your Postgres server.
POSTGRES_PORT: Usually 5432 for Postgres.
Step 3: Installing Mage AI Using Docker in VS Code
To install Mage AI, we will use Docker Extension in Visual Studio Code (VS Code). Ensure you have Docker Desktop and the Docker extension for VS Code installed.
Install Docker Desktop
Download and install Docker Desktop from the official Docker website, then start it.
Install the Docker Extension for VS Code:
Open VS Code and go to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side of the window or by pressing Ctrl+Shift+X.
Search for “Docker” and install the Docker extension by Microsoft.
Pull the Mage AI Docker Image
Open a terminal in VS Code and navigate to your project folder.
Run the following command to pull the latest Mage AI Docker image:
docker pull mageai/mageai:latest
Run the Mage AI Docker Image
Once the Mage AI image is pulled, go to the Docker tab in VS Code.
Find the Mage AI image and run it. This will create a new container.
Right-click on the newly created container and select “Open in Browser.”
The Mage AI interface should now load in your default web browser.
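If you prefer the terminal to the Docker extension UI, Mage's documented quick-start command looks roughly like this (demo_project is a placeholder project name, and Mage serves its UI on port 6789 by default):

docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai /app/run_app.sh mage start demo_project

Once the container is up, open http://localhost:6789 in your browser.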
Step 4: Configuring Mage AI to Connect with Postgres
Configure the database connection in io_config.yaml:
Navigate to the All Files section of your pipeline.
Locate and open the io_config.yaml file.
Add your Postgres connection details as follows
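The relevant section of io_config.yaml typically looks like the sketch below, using the POSTGRES_* values gathered in Step 2 under Mage's default profile; every value here is a placeholder to replace with your own details:

default:
  POSTGRES_DBNAME: your_database
  POSTGRES_SCHEMA: your_schema
  POSTGRES_USER: your_username
  POSTGRES_PASSWORD: your_password
  POSTGRES_HOST: 192.168.1.10   # host IP of your Postgres server
  POSTGRES_PORT: 5432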
Allow Mage AI to Access the Postgres Database
To allow Mage AI to connect to the database from your machine's IP address, you need to modify the pg_hba.conf file.
Locate the pg_hba.conf file at C:\Program Files\PostgreSQL\16\data.
Open the file and add a new row under the # IPv4 local connections section, as shown in Fig. 4.
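A typical entry looks like the line below; the ADDRESS should cover wherever the Mage container connects from (172.17.0.0/16 is Docker's default bridge network, so treat it as an assumption and adjust to your setup), and the METHOD should match your installation (md5 or scram-sha-256). Reload or restart the Postgres service after saving the file.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             172.17.0.0/16           md5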
Step 5: Creating Your First Data Pipeline
Now that Mage AI is configured to connect with Postgres, we can create our first data pipeline. We’ll start by setting up data loader blocks for each dataset and using the drag-and-drop feature to connect them in a flowchart.
Create Data Loader Blocks
For each dataset, create a separate data loader block.
In the Mage AI interface, drag and drop a data loader block onto the canvas for each dataset you need to load from Postgres.
Configure each data loader block with the appropriate connection details and the query that fetches the data from Postgres; a sketch of such a block follows this list.
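A data loader block for the expensemaster table could look roughly like this. It is close to the template Mage generates for a Postgres loader; the schema and table names are placeholders, and import paths may differ slightly between Mage versions:

from os import path
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

if 'data_loader' not in dir():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_expensemaster(*args, **kwargs):
    # Read the raw table uploaded in Step 1 (placeholder schema/table names)
    query = 'SELECT * FROM your_schema.expensemaster'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)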
Connect the Data Loader Blocks to the Transformer block
Use the drag-and-drop feature to connect the data loader blocks in the flowchart to the next transformer code block. This visual representation helps in understanding the data flow and ensuring all steps are connected correctly.
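A transformer block receives the upstream blocks' outputs as function arguments. Here is a minimal sketch; the column cleanup inside is purely illustrative and not from the original article:

if 'transformer' not in dir():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(data, *args, **kwargs):
    # 'data' is the output of the first upstream block; additional upstream
    # outputs arrive via *args. Example cleanup: normalize column names.
    df = data.copy()
    df.columns = [col.strip().lower().replace(' ', '_') for col in df.columns]
    return df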
Creating Data Exporter Blocks
In the Mage AI interface, after configuring your data loader and transformation blocks, add a data exporter block to the canvas.
Choose “Postgres” as the data destination under the Python exporter templates.
Provide the necessary connection details to your Postgres database. Write the code to export the transformed data back to the PostgreSQL database.
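A data exporter block along these lines writes the transformed frame back to Postgres; it mirrors Mage's Postgres exporter template, with the schema and destination table names as placeholders:

from os import path
from pandas import DataFrame
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

if 'data_exporter' not in dir():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_to_postgres(df: DataFrame, **kwargs) -> None:
    schema_name = 'your_schema'       # placeholder schema
    table_name = 'expense_summary'    # placeholder destination table
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        loader.export(
            df,
            schema_name,
            table_name,
            index=False,
            if_exists='replace',  # overwrite the table on each run
        )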
Step 6: Creating Triggers and Scheduling Pipelines
Mage AI offers the ability to create triggers for running your pipeline and scheduling it for regular execution. This ensures your data is always up-to-date without manual intervention.
Creating a Trigger
In Mage AI, you can set up triggers to run your pipeline based on specific events or conditions. For example, you can trigger a pipeline to run whenever new data is added to your Postgres database.
To create a trigger, navigate to the pipeline settings and configure the trigger conditions as needed.
Scheduling the Pipeline
Mage AI supports scheduling pipelines to run at regular intervals. This can be done through the scheduling settings in the Mage AI dashboard.
You can specify the frequency (daily, weekly, etc.) and the time for the pipeline to run.
Additional Features of Mage AI
Mage AI provides several powerful features to automate and enhance your data pipelines:
Integration with Multiple Data Sources: Mage AI connects to numerous kinds of data inputs, including databases, cloud storage, and APIs, enabling you to construct diverse and extensive data flows.
Advanced Transformation Capabilities: Because Mage AI is built on Python, you can implement custom transformations with the help of decorators, making it straightforward to express a wide range of data transformation logic.
Scalability: Mage AI is optimized for big-data throughput, so pipelines keep up as data volumes grow.
Monitoring and Alerts: Mage AI provides strong monitoring and alerting functionality, letting you track pipeline runs and receive notifications on failures.
User-Friendly Interface: The graphical layout of data pipelines means users do not have to write complicated code to manipulate and transform their data.
Together, these features make Mage AI a tool that automates your data workflows and infrastructure, so you do not need to spend much time on them.
Conclusion
Today, information is a valuable asset, making data management essential for organizations. This article provides clear guidance on configuring Mage AI with PostgreSQL, helping you build a robust data pipeline that not only streamlines multiple processes but also significantly boosts productivity. Using Mage AI alongside a robust database such as PostgreSQL enables users to handle and analyze data and make the right decisions in the shortest time possible. As organizations step up their data-driven efforts, tools such as Mage AI are poised to become a dominant way of managing data.
Frequently Asked Questions
Q1. What is Mage AI?
A. Mage AI is an open-source tool designed to simplify the process of building and managing data workflows. It provides a user-friendly interface and automation features that help data professionals create pipelines without extensive coding knowledge.
Q2. Why use PostgreSQL with Mage AI?
A. PostgreSQL is a powerful, open-source relational database management system known for its robustness and scalability. When paired with Mage AI, it allows users to efficiently store, retrieve, and manipulate large datasets, making it an ideal choice for data pipelines.
Q3. Do I need programming skills to use Mage AI?
A. While some familiarity with programming concepts can be helpful, Mage AI is designed to be user-friendly and accessible to users with varying levels of technical expertise. Many tasks can be accomplished through its intuitive interface.
Q4. Can I integrate other data sources with Mage AI?
A. Yes, Mage AI supports integration with various data sources, allowing users to build comprehensive data pipelines that pull in data from multiple platforms, enhancing the overall data ecosystem.
Q5. Is Mage AI free to use?
A. Mage AI is an open-source tool, which means it is free to use. However, users may incur costs associated with hosting, storage, and other related services, depending on their infrastructure choices.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Hi! I'm Adarsh, a Business Analytics graduate from ISB, currently deep into research and exploring new frontiers. I'm super passionate about data science, AI, and all the innovative ways they can transform industries. Whether it's building models, working on data pipelines, or diving into machine learning, I love experimenting with the latest tech. AI isn't just my interest, it's where I see the future heading, and I'm always excited to be a part of that journey!