Integrating Python in Power BI: Get the best of both worlds

Kaushikrch, Last Updated: 09 Nov, 2020


Overview

A demonstration of statistical analytics by integrating Python within Power BI
Share the findings using dashboards and reports

Introduction

Power BI is Microsoft’s proprietary product for performing business intelligence tasks. It can also run scripts written in statistical and general-purpose languages such as R and Python; Python support was added in 2018.

How does this help you? If you are a business intelligence (BI) professional who wants to perform certain data science tasks, you usually have to rely on the data science team. Likewise, a Python developer relies on the BI team to showcase their analysis in a presentable format, such as a dashboard. Power BI reduces this co-dependency because you can now run Python within the same integrated environment.


It is imperative for the new-age data scientist to understand the complete pipeline for solving complex business problems. Generally, this includes gathering, cleaning, exploring, and transforming data to make predictions about future events. Finally, the analysis is presented in a dashboard or a report. As a result, the traditional roles of the business intelligence professional and the predictive modeling expert are blurring.

This increases the need for a comprehensive tool that can perform all of the above tasks in the analytical pipeline. If that tool comes from the makers of Excel, all the better. This is why Power BI has become the new talk of the town for embedded analytics.

In this tutorial, we will explore how Power BI and Python work together and go hands-on with tasks that would be harder to achieve in either tool on its own.

Note: This tutorial assumes you have used Power BI and Python before. For a beginner-friendly introduction to Power BI, check out the blog:

10 Useful Data Analysis Expressions (DAX) Functions for Power BI Beginners

Set up the integrated environment
Import data using a Python script
Using Power Query to transform data
Using Python’s statistical capabilities within Power BI
Generating analytical reports

Set up the integrated environment

The first step is to get an integrated environment up and running. To do this, you need a Python distribution installed on your machine. For this purpose, I prefer the base distribution of Python. Although I use Anaconda for all my other coding tasks, integrating Anaconda with Power BI can be a complicated exercise.

After installation, you need to add a few Python packages. Power BI’s Python integration itself requires pandas (for data manipulation and analysis) and Matplotlib (for plotting); this tutorial also uses Seaborn (for plotting) and NumPy (for scientific computing).

You can install these packages with the pip command from your command-line tool:

pip install pandas
pip install matplotlib
pip install numpy
pip install seaborn
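Before pointing Power BI at this installation, it can help to confirm that the packages are importable from the same interpreter. The sketch below uses only the standard library’s `importlib.util.find_spec`; the helper name `find_missing` is hypothetical and not part of Power BI or pip.

```python
import importlib.util

# Packages used in this tutorial (these are their import names).
REQUIRED = ["pandas", "matplotlib", "numpy", "seaborn"]

def find_missing(packages):
    """Return the packages that the current interpreter cannot find."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages found.")
```

Run it with the same python executable you used for pip; if a package is reported missing, repeat the corresponding pip install.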

After installing these packages, we have to enable Python scripting in Power BI. Open Power BI Desktop and check whether it automatically detects the Python distribution installed on your machine: go to File -> Options and settings -> Options. Under Python scripting, you should see the home directory of your Python installation.

Integrating Python with Power BI: Enabling Python Scripting in Power BI
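If Power BI does not detect the installation, you can enter the home directory yourself. One way to find it, assuming a standard installation and that you run the interpreter where you installed the packages, is to ask Python for the folder that contains its own executable. The helper name `python_home` is hypothetical.

```python
import os
import sys

def python_home():
    """Folder containing the running interpreter.

    For a standard python.org installation on Windows, this is the folder
    that holds python.exe, which is what the home directory setting points to.
    """
    return os.path.dirname(sys.executable)

print(python_home())
```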

Import data using a Python script

Now, you can run a quick test to check whether Python works within the Power BI stack. To begin with, you can import a small dataset into Power BI using a Python script.

For this purpose, go to the Home ribbon, click Get Data, and select Other. This section lets you import data from a variety of sources, such as Web, Hadoop Distributed File System (HDFS), and Spark, as well as from R or Python scripts. Select Python script. Here, we will import the Churn Prediction dataset that is stored on my machine.

Integrating Python with Power BI: Get Data in Power BI

Click Connect. This opens a dialog where you can write the following Python script:

Integrating Python with Power BI: Python Script
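The script itself appears only in the screenshot. A minimal sketch of such a script follows, assuming the dataset is a CSV file; the path and file name are hypothetical placeholders for wherever the file lives on your machine. Power BI imports the pandas DataFrames the script creates and lists each one in the Navigator under its variable name, so naming it `churn` matches the next step. The helper `load_churn` is also hypothetical.

```python
import pandas as pd

# Hypothetical location of the dataset; replace with the real path on your machine.
CSV_PATH = r"C:\data\churn_prediction.csv"

def load_churn(source):
    """Read the churn dataset (a path or file-like object) into a pandas DataFrame."""
    return pd.read_csv(source)

# In the Power BI "Python script" dialog, uncomment the next line. Power BI lists
# every pandas DataFrame the script defines, so this one appears as "churn".
# churn = load_churn(CSV_PATH)
```

Whether you keep the helper or call `pd.read_csv` directly, what matters to Power BI is that a DataFrame variable exists when the script finishes.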

Click OK to open the Navigator, select the churn data, and click Load. You can then go to Data view to check that the data has been loaded. Now, you are ready to use Power Query to perform one-click data transformations.

Using Power Query to transform data

Those who have climbed the Python learning curve know that transforming data is a fairly straightforward activity, but it may not be as easy for someone just starting their data science journey.

With Power Query Editor, however, we can shape and transform data with a single click. Power BI also keeps a record of every operation in the data transformation pipeline that runs before any analysis. To demonstrate these transformation capabilities, we will walk through Power Query.

Once you have loaded the data in Power BI, click Transform Data on the Home tab to open the Query Editor.

Integrating Python with Power BI: Query Editor

The Query Editor gives you many options for cleaning, reshaping, and transforming data.

Integrating Python with Power BI: Power Query Window

We will convert the customer_nw_category variable into a text field, because its values represent the Customer Net Worth Category and should not be treated as a continuous variable.

To do this, we will select the column, go to Data Type, and change the data type to Text. Power Query records this step under the Applied Steps section. It is good practice to rename the step so it is easy to recall; we will rename it to “nw_cat Text”. Similarly, we will convert the churn column into a logical variable, where True represents 1 (churned) and False represents 0 (not churned), and rename that step to “churn – True/False”.
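For readers coming from Python, the two Power Query steps correspond roughly to the pandas operations below. This is an illustrative sketch, not what Power BI executes internally; it assumes customer_nw_category holds integer codes and churn holds 0/1 values, and the function name `apply_churn_steps` is hypothetical.

```python
import pandas as pd

def apply_churn_steps(df):
    """Mirror the "nw_cat Text" and "churn - True/False" steps on a copy of df."""
    out = df.copy()
    # "nw_cat Text": treat the net worth category as a label, not a number.
    out["customer_nw_category"] = out["customer_nw_category"].astype(str)
    # "churn - True/False": 1 -> True (churned), 0 -> False (not churned).
    # Any other value, including a missing one, becomes NaN so bad data stands out.
    out["churn"] = out["churn"].map({1: True, 0: False})
    return out
```

Keeping these steps in Power Query, as this tutorial does, has the advantage that each one is recorded under Applied Steps.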

Integrating Python with Power BI: Power Query Steps

After you have completed the transformation steps, click Close & Apply (in the top left corner) to apply these transformations to the data.

Using Python’s statistical capabilities within Power BI

Although Power BI has a comprehensive library of visualizations, creating a correlation matrix in it is not trivial. Yet a correlation matrix heatmap is an integral component of many data analysis reports.

In this section, we will demonstrate how to create a correlation matrix heatmap using pandas’ corr() method and the seaborn package. The heatmap will be displayed in the Report view in Power BI.

Head over to the Report view in Power BI and click the Python visual, denoted by the Py symbol, in the Visualizations pane. An empty Python visual appears on the report canvas, and a Python script editor opens at the bottom. In this way, Power BI lets you create visualizations with scripts.

Note that the Values field is currently empty.

To build the correlation heatmap, we will drag all the continuous variables into the Values field, namely: age, the average monthly balance columns, the current and previous month balance columns, the current and previous month transaction (credit and debit) columns, the number of dependents, and vintage (the time of association). This is an important step: Power BI passes only the fields in Values to the script, so any variable left out will not be part of the visualization.

As we add the variables to the Values field, Power BI automatically populates the Python script editor with the following code:

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:

# dataset = pandas.DataFrame(age, average_monthly_balance_prevQ, average_monthly_balance_prevQ2, current_balance, current_month_balance, current_month_credit, current_month_debit, dependents, previous_month_balance, previous_month_credit, previous_month_end_balance, previous_month_debit, vintage)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:
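Power BI runs this preamble for you: it builds `dataset` from the fields in Values and drops duplicated rows before your code runs. Note that deduplication also removes customers whose selected values happen to be identical, which can shift the correlations slightly. If you want to try the visual’s script outside Power BI, a rough stand-in for the preamble might look like the sketch below; `emulate_preamble` is a hypothetical helper, and the column order Power BI uses is not guaranteed to match.

```python
import pandas as pd

def emulate_preamble(df, fields):
    """Approximate the Python visual preamble: keep the Values fields, drop duplicate rows."""
    dataset = df[list(fields)]
    dataset = dataset.drop_duplicates()
    return dataset
```

For example, `dataset = emulate_preamble(churn, ["age", "dependents", "vintage"])` followed by the heatmap script gives a local approximation of the visual.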

We will write a short Python script that draws a heatmap of the correlation (Pearson coefficient) matrix using the seaborn package.

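# import the charting libraries matplotlib and seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# `dataset` is the pandas DataFrame created by the Power BI preamble above
# create the Pearson correlation matrix of the selected fields
corr = dataset.corr()
# draw a heatmap of the matrix; fixing vmin/vmax to the full [-1, 1] range
# keeps the color scale comparable when the report is later filtered
sns.heatmap(corr, cmap="YlGnBu", vmin=-1, vmax=1)
# show the plot so that Power BI renders it in the Python visual
plt.show()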

Finally, running this script with the Run script button produces the correlation matrix heatmap.

Correlation heatmap: all customers

Generating analytical reports

After analyzing the heatmap, we can conclude the following.

Across all customers:

age and the number of dependents show little or no correlation with the other variables
the average monthly balances of the last two quarters are moderately correlated with each other
the average monthly balance of the last quarter is highly correlated with the current month balance and the previous month balance
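Reading values off a color scale is approximate. To back up statements such as “highly correlated” with numbers, one option is to list the strongest pairs from the same matrix. The helper below is a hypothetical sketch, and the 0.7 cutoff is an arbitrary illustrative threshold, not one used in this article.

```python
import numpy as np
import pandas as pd

def strongest_pairs(corr, threshold=0.7):
    """List variable pairs whose absolute correlation is at least `threshold`."""
    # Keep only the upper triangle (excluding the diagonal) so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    strong = pairs[pairs.abs() >= threshold]
    # Sort by strength regardless of sign.
    return strong.sort_values(key=abs, ascending=False)
```

On a local copy of the data, `strongest_pairs(dataset.corr())` returns the qualifying pairs as a Series indexed by (variable, variable).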

We can also produce this heatmap separately for customers who have churned and for those who have not, and compare the two. To do so, we apply a filter of churn = True or False (the blue boxes in the screenshots) to observe the heatmap for each group of customers separately.

A different story emerges for these two types of customers. The chart below shows the picture for customers who have not churned. For them, the average monthly balances of the last two quarters are much more highly correlated with the current and previous months’ balances.

Correlation heatmap: customers who have not churned

For churned customers, by contrast, the average monthly balances of the last two quarters have only a low to moderate correlation with the current and previous months’ balances.

Correlation heatmap: churned customers
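The filtered comparison can also be reproduced numerically in pandas, which makes the difference between the two groups easier to quantify. A sketch, assuming a DataFrame with a True/False churn column (as produced by the Power Query step) and numeric columns for the variables of interest; the function name `correlation_by_group` is hypothetical.

```python
import pandas as pd

def correlation_by_group(df, columns, group_col="churn"):
    """Pearson correlation matrices for churned and retained customers, plus their difference."""
    is_churned = df[group_col].astype(bool)
    corr_churned = df.loc[is_churned, columns].corr()
    corr_retained = df.loc[~is_churned, columns].corr()
    # Positive entries: the correlation is higher among customers who did not churn.
    return corr_churned, corr_retained, corr_retained - corr_churned
```

Unlike the report filter, this returns both matrices at once, so the largest gaps can be sorted and inspected directly.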

This analysis shows how exploring the data can surface useful insights into the behavior of customers who churn, insights that can later inform a predictive model.

End Notes

In this article, we learned how to integrate Python with Power BI. We combined the reporting capabilities of Power BI with the analytical capabilities of Python to build an analytical report.

To conclude, this integrated environment puts more power in the hands of data scientists and business intelligence professionals, who can now capitalize on the strengths of both tools.


