In this tutorial, we will look at graph and network analysis tools and packages in Python that are widely used in data science today. The world is all about relations: almost every entity we see around us is related to others in some way. Modeling these real-world entities and their relationships as programming objects is a challenging task, and it calls for a specific kind of data structure.
Graphs are a natural choice for this problem. There are several tools and packages in Python for integrating graph data structures into our codebase. A common way to store graph data is a CSV file, which is convenient to load and manipulate.
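To make the CSV idea concrete, here is a minimal sketch using only the standard library. It assumes a hypothetical edge-list layout with `source`, `target`, and `weight` columns; the column names and sample rows are made up for illustration and do not come from any of the libraries discussed below.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV content: one directed edge per row (source, target, weight).
CSV_TEXT = """source,target,weight
Alice,Bob,1.0
Alice,Claire,2.5
Claire,Dennis,0.5
"""

def read_edge_list(handle):
    """Return {source: {target: weight}} built from a CSV edge list."""
    adjacency = defaultdict(dict)
    for row in csv.DictReader(handle):
        adjacency[row["source"]][row["target"]] = float(row["weight"])
        # Make sure nodes that only appear as targets are present too.
        adjacency.setdefault(row["target"], {})
    return dict(adjacency)

# io.StringIO stands in for an open file so the sketch is self-contained.
graph = read_edge_list(io.StringIO(CSV_TEXT))
print(graph)
```

With a real file, you would pass `open("edges.csv", newline="")` instead of the `io.StringIO` object. The resulting dictionary can then be handed to whichever graph library you choose, edge by edge.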
This article was published as a part of the Data Science Blogathon.
We will explore the following tools in this article: NetworKit, igraph, graph-tool, and NetworkX.
NetworKit, an open-source toolkit gaining traction for large-scale network analysis, aims to tackle the challenges posed by massive networks, often comprising hundreds of thousands to billions of edges. Its primary goal is to provide efficient graph algorithms tailored for such scales, many of which optimize for parallel execution on multicore processors.
These algorithms play a pivotal role in computing standard network analysis metrics, including degree sequences, clustering coefficients, and centrality measures. Notably, NetworKit stands out from alternatives like NetworkX by placing a strong emphasis on parallelism and scalability. Moreover, it serves as a valuable platform for algorithmic experimentation, facilitating the integration of newly developed algorithms stemming from cutting-edge research. Its utility extends beyond mere analysis; it also functions as a repository for datasets, fostering a comprehensive ecosystem for network analysis endeavors.
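As a rough illustration of what two of these metrics measure, the sketch below computes a degree sequence and a local clustering coefficient in plain Python on a tiny hand-made graph. It is not NetworKit's implementation (which is written in C++ and parallelized); the function names and the toy data are assumptions chosen for this example.

```python
from itertools import combinations

# Undirected toy graph stored as {node: set of neighbours} (illustrative data).
adjacency = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}

def degree_sequence(adj):
    """Degrees of all nodes, sorted from highest to lowest."""
    return sorted((len(neighbours) for neighbours in adj.values()), reverse=True)

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

print(degree_sequence(adjacency))
print({node: local_clustering(adjacency, node) for node in adjacency})
```

Nodes 0 and 1 sit in a triangle, so all of their neighbour pairs are connected, while node 2 has only one connected pair out of three. On graphs with millions of edges, loops like these become the bottleneck, which is the gap that parallel C++ toolkits aim to close.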
NetworKit is a Python module whose performance-critical algorithms are written in C++ and exposed to Python through the Cython toolchain. Many of these algorithms use OpenMP for shared-memory parallelism, which improves efficiency significantly. Python's interactive nature makes it easy to combine NetworKit with other data analysis tools, and the NetworKit core can also be built and used as a native library.
For installation
#To install networkit
pip3 install networkit
import networkit as nk
import matplotlib.pyplot as plt
#Create a directed, weighted graph object with 5 nodes (ids 0 to 4)
G = nk.Graph(5, directed=True, weighted=True)
#Add edges to the graph
G.addEdge(1, 3)
G.addEdge(2, 4)
G.addEdge(1, 2)
G.addEdge(3, 4)
G.addEdge(2, 3)
G.addEdge(4, 0)
#Set weights to edges
G.setWeight(1, 3, 2)
G.setWeight(2, 4, 3)
G.setWeight(3, 4, 4)
G.setWeight(4, 0, 5)
#Print a structural overview of the graph (nk.overview prints the summary itself)
nk.overview(G)
#For visualization (nk.viztasks.drawGraph relies on networkx and matplotlib being installed)
nk.viztasks.drawGraph(G)
plt.show()
Output
Many of NetworKit's functionalities are still under development. For further details, check the official documentation of NetworKit here.
igraph is a collection of graph-based network analysis tools focused on performance, portability, and ease of use. It is free and open source, written in C and C++, and can be used from different programming languages such as R, Python, Mathematica, and C/C++. Its versatility also extends to visualization, making it a comprehensive solution for various graph-related tasks.
For installation
#to install igraph
pip3 install igraph
#to install cairocffi for visualization
pip3 install cairocffi
from igraph import Graph, plot
g = Graph(directed=True)
#Adding the vertices
g.add_vertices(7)
#Adding the vertex properties
g.vs["name"] = ["Alice", "Bob", "Claire", "Dennis", "Esther", "Frank", "George"]
g.vs["age"] = [25, 31, 18, 47, 22, 23, 50]
g.vs["gender"] = ["f", "m", "f", "m", "f", "m", "m"]
#Set the edges
g.add_edges([(0,1), (0,2), (2,3), (3,4), (4,2), (2,5), (5,0), (6,3), (5,6)])
#Set the edge properties
g.es["is_formal"] = [False, False, True, True, True, False, True, False, False]
#Use the names as vertex labels
g.vs["label"] = g.vs["name"]
#Set different colors based on the gender
color_dict = {"m": "blue", "f": "pink"}
g.vs["color"] = [color_dict[gender] for gender in g.vs["gender"]]
#Lay out the graph with the Kamada-Kawai algorithm ("kk") and plot it
layout = g.layout("kk")
plot(g, layout=layout, bbox=(400, 400), margin=20)
Output
For further details check the official documentation of igraph here.
Graph-tool is a Python module for the manipulation and statistical analysis of graphs (a.k.a. networks). Unlike most other Python modules with similar capabilities, its core data structures and algorithms are implemented in C++, with a heavy reliance on the Boost Graph Library and substantial use of template metaprogramming. This gives it performance comparable to a pure C/C++ library, in terms of both memory usage and processing time. The graph-tool module provides a Graph class and a number of algorithms that operate on it.
For Installation
Python packages are usually very easy to install via pip. However, the case is different for graph-tool. This is because graph-tool is really a C++ library wrapped in Python, and it has a lot of C++ dependencies like Boost, CGAL, and expat, which aren't available through Python-only package management systems like pip.
Using Docker is the most hands-off and OS-agnostic way to install graph-tool. If you have Docker installed, all you have to do is run
docker pull tiagopeixoto/graph-tool
This will download a Docker image based on Arch GNU/Linux that includes graph-tool and can be used on any recent GNU/Linux distribution, as well as macOS and Windows. It also includes Matplotlib, pandas, IPython, and Jupyter, among other essential Python tools.
After fetching the image, you can start an interactive IPython shell in the container by running:
docker run -it -u user -w /home/user tiagopeixoto/graph-tool ipython
This starts a container that gives you a Python 3 environment with graph-tool already installed.
Sample Python code for generating a graph with vertices and edges using graph-tool
#For getting all sub modules of graph-tool
from graph_tool.all import *
#Initializing a graph object (By default, newly created graphs are always directed)
g = Graph()
#Initializing an undirected graph object
ug = Graph(directed=False)
#Converting an undirected graph to a directed one on the fly
ug.set_directed(True)
#Adding the vertices
v1 = g.add_vertex()
v2 = g.add_vertex()
#Adding the edges between vertices
e = g.add_edge(v1, v2)
#For visualizing the generated directed graph
graph_draw(g, vertex_text=g.vertex_index, output="sampleGraph.pdf")
Output
For further details check the official documentation of the graph tool here.
NetworkX is a Python package for creating, manipulating, and studying the structure, dynamics, and functions of complex networks. It is written in pure Python; at the time of writing, it required Python 3.8, 3.9, or 3.10. It's used to study large, complex networks that are represented as graphs with nodes and edges. With the NetworkX library, we can load and store networks, generate a variety of random and classic networks, study their structure, build network models, develop new network algorithms, and draw them for better understanding and analysis. Because it is written in Python, NetworkX runs significantly slower than the C/C++-based libraries on large graphs, but its ease of use and broad feature set make it a valuable tool for network analysis tasks.
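The NetworkX documentation describes its basic data structure as a "dictionary of dictionaries of dictionaries". The sketch below imitates that layout in plain Python, without importing NetworkX, to show why the library is easy to work with and also why it carries Python-level overhead. The `add_edge` helper here is a hypothetical stand-in written for this example, not the NetworkX method of the same name.

```python
# Plain-Python imitation of a directed, attributed graph: adj[u][v] -> attribute dict.
adj = {}

def add_edge(u, v, **attrs):
    """Insert a directed edge u -> v, creating both nodes if needed."""
    adj.setdefault(u, {})
    adj.setdefault(v, {})
    adj[u].setdefault(v, {}).update(attrs)

add_edge("Java", "Go", weight=0.5)
add_edge("Go", "C++", weight=0.5)
add_edge("Java", "C#", weight=0.75, value=20)

# Neighbour lookup and attribute access are ordinary dictionary operations.
successors = list(adj["Java"])
weighted_out_degree = sum(data["weight"] for data in adj["Java"].values())
print(successors, weighted_out_degree)
```

Every node, edge, and attribute in this layout is a regular Python object, so any hashable value can be a node and any key can be an edge attribute. The cost is that each lookup goes through the interpreter, unlike the compact arrays used by the C/C++-based libraries.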
Python integration
For Installation
pip3 install networkx
pip3 install pyvis # For visualization
Sample Python code for generating a directed graph with vertices and weighted edges using NetworkX
import networkx as nx
from pyvis.network import Network
#Create the pyvis network (directed=True so that edges are drawn as arrows)
net = Network(directed=True)
#Create directed graph object
Graph = nx.DiGraph()
#Create nodes
Graph.add_node("C++")
Graph.add_node("Python")
Graph.add_node("Java")
Graph.add_node("C#")
Graph.add_node("Go")
Graph.add_node("Julia")
#Add edges between nodes with a weight and an optional value (edges with a higher value are drawn thicker)
Graph.add_edge("C++", "C#", weight=.75, value=20)
Graph.add_edge("Python", "Julia", weight=.75, value=20)
Graph.add_edge("Java", "Go", weight=.5)
Graph.add_edge("Python", "C#", weight=.5)
Graph.add_edge("Go", "C++", weight=.5)
Graph.add_edge("Java", "Julia", weight=.5)
Graph.add_edge("Java", "C#", weight=.75, value=20)
#For visualizing the graph
net.from_nx(Graph)
net.show("graph.html")
Output
For further details check the official documentation of networkx here.
Graphs have long been an important component of NLP applications, such as syntax-based machine translation, knowledge-graph-based question answering, and abstract meaning representation for common-sense reasoning tasks. NetworkX, igraph, NetworKit, and graph-tool are just a few of the graph packages available to Python programmers. Each has its own pros and cons, but their interfaces for building and processing graph data structures are quite similar. In this article, we explored the basic usage of these four popular graph libraries, which data scientists commonly use today.
There are many more Python libraries for building and managing graphs, but the ones covered above are widely used today. NetworkX is written purely in Python, while the other three packages are written in C/C++ and offer Python APIs. igraph also has R and Mathematica bindings. The C/C++-based libraries place a greater emphasis on performance, and NetworKit in particular is designed for parallel execution on multicore processors. If you run into any problems while building the graph, or if you have any ideas/suggestions/comments, please share them in the space below.
Happy coding!! 😊
A. In Python, you can graph data using libraries like Matplotlib, Seaborn, or Plotly. First, load your data into a DataFrame using Pandas. Then, use the plotting functions from these libraries to create various types of graphs such as line plots, bar plots, and scatter plots. These libraries offer customization options for creating visually appealing graphs.
A. One alternative to NetworkX is NetworKit, which emphasizes parallelism and scalability, making it suitable for large-scale network analysis tasks. igraph and graph-tool are other options.
A. There is no single best way to visualize a network. Common options are dedicated graph visualization software and libraries such as Matplotlib, Plotly, or pyvis, depending on the complexity and customization requirements.
A. The choice between Gephi and Cytoscape depends on the specific requirements and preferences. Gephi stands out for its user-friendly interface and visualization capabilities, while Cytoscape provides advanced network analysis tools and integration with various data sources.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.