Let me share a story with you. Years back, when the data team of the International Consortium of Investigative Journalists (ICIJ) received the dump of data that we now know as the Panama Papers, making sense of it may well have looked like a futile endeavor. The sheer volume and complexity of the data must have seemed overwhelming to the knowledge workers involved! To quote ICIJ, the Panama Papers is a “giant leak of more than 11.5 million financial and legal records exposes a system that enables crime, corruption, and wrongdoing, hidden by secretive offshore companies.” The rogue offshore financial scandals that we have been reading about in the news since 2015 came to light because the ICIJ team leveraged the power of knowledge graphs (KGs), which gave context and connection to the data. The knowledge graph linked the complex, multi-year multimedia data so that investigators could walk through the connections and unravel some of the biggest scandals of the past.
Now that we are familiar with the potential of KGs, let’s see what is in it for us! By the end of this article, I promise to leave you with a detailed understanding of the concepts behind KGs and of why, where, and how we can use them. In short, this article covers what a knowledge graph is, how to implement one, and where it is applied.
Learning Objectives
This article’s learning goal is to make our data smarter using a technique called Knowledge Graphs. KGs score brownie points over ordinary graphs because they add an organizing principle on top of the data. The article also explores where knowledge graphs are found in the real world and helps you build one.
This article was published as a part of the Data Science Blogathon.
A knowledge graph is a structured representation of knowledge that uses a graph-based data model. It organizes information into nodes and edges, where nodes represent entities or concepts, and edges denote relationships between them. Knowledge graphs enable efficient storage, retrieval, and inference of interconnected knowledge, facilitating advanced search, analysis, and reasoning tasks.
Most of us know that graphs are a pleasant and flexible way of modeling data in computing, and that they support a wide range of algorithms in data science. What differentiates KGs from plain graphs is the application of an organizing principle that helps humans and software interpret the data quickly. So rather than repeatedly encoding intelligent behavior into applications, we encode it directly into the data once and for all! KGs are the result of decades of research into semantic computation, and with the advent of modern graph computation they can now be applied easily to real-world problems.
Thus we can say that KGs are structured knowledge bases that represent real-world entities and the relationships between them. Most KGs store this knowledge as triplets in Subject-Predicate-Object (SPO) format, which aligns with the Resource Description Framework (RDF) standard. The existence of a particular SPO triplet indicates that the subject and object entities are connected by a relationship of the given type. For example, consider the following piece of knowledge.
“Leonard Nimoy was an actor who played the character Spock in the science-fiction movie Star Trek.”
A sample knowledge graph for this sentence is shown in the figure below. Here the nodes represent entities, the edge labels represent types of relations, and the edges themselves represent existing relationships.
The SPO triplets that can be extracted from the given knowledge are shown below:
Now we understand the structure of KGs. Next, we will look into the organizing principles of KGs, which bring out their essence and differentiate them from typical graphs.
There are several ways to organize data in graphs, each with advantages and drawbacks. In this section, we will discuss each of these organizing principles in turn. We will start with plain simple graphs and explain how adding successive layers of organization makes the data smarter and more interpretable, thereby helping solve increasingly sophisticated problems.
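As a minimal sketch, the sentence about Leonard Nimoy can be broken into SPO triplets and held in a plain Python list. The predicate names used here (profession, played, character_in, genre) are illustrative assumptions rather than a fixed vocabulary, and the two helper functions are hypothetical.

```python
# Each fact is one (subject, predicate, object) triplet.
# The predicate names are illustrative assumptions, not a standard vocabulary.
triples = [
    ("Leonard Nimoy", "profession", "Actor"),
    ("Leonard Nimoy", "played", "Spock"),
    ("Spock", "character_in", "Star Trek"),
    ("Star Trek", "genre", "Science Fiction"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_of(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(objects_of("Leonard Nimoy", "played"))
print(subjects_of("character_in", "Star Trek"))
```

Because every fact has the same three-part shape, the same two lookups work for any entity or relation type added later.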
These are graphs that haven’t had any organizing principle applied to them. Still, they help solve everyday challenges, as they underpin some very important systems. Instead of the “organizing principles” being associated with the data, they are embedded in the programs and systems that consume the graph data.
A typical example is the sales data of an online store. The figure below shows a small portion of the sales and product catalog graph, showing the customers and their purchases in the form of a plain old graph.
Looking straight at the graph might not be intuitive. Still, once the knowledge that P nodes represent products, C nodes represent customers, and the connections between them represent purchases is encoded into the program, it becomes easy to answer questions such as which products a particular customer bought (and vice versa) or how popular a product is. There is no doubt that this kind of graph is helpful and represents the data compactly. But when a data scientist with no prior knowledge of the domain tries to dry run the code, either someone has to explain how to read the data, or they have to reverse engineer the code to work out how to interpret it. A better solution is to make the data smart by applying some organizing principles to the graph, which we will see in the following three subsections.
The first organizing principle that we will see is the property graph model. It is richer and far more organized: it supports labeled nodes, typed and directed relationships, and properties (key-value pairs) on both nodes and relationships. It can therefore give humans and machines some essential clues about the information it contains. This organizing style makes the graph self-descriptive to a certain level and is a clear step towards making the data smarter! Also, some preprocessing and visualization can be carried out without any domain knowledge just by leveraging the features of the property graph model.
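To make the point concrete, here is a small sketch of a plain old graph in Python. The node ids and edges are made-up sample data. Notice that the meaning of "C", "P", and an edge is written only in the code and its comments, never in the data itself.

```python
# Plain old graph: nodes are bare identifiers, edges are unlabeled pairs.
# The ids and edges are made-up sample data.
edges = [("C1", "P1"), ("C1", "P2"), ("C2", "P2"), ("C3", "P2"), ("C3", "P3")]

# The "organizing principle" lives here in the program, not in the data:
# ids starting with "C" are customers, ids starting with "P" are products,
# and an edge (c, p) means "customer c bought product p".


def products_bought_by(customer):
    """Products a given customer bought."""
    return sorted(p for c, p in edges if c == customer)


def customers_of(product):
    """Customers who bought a given product."""
    return sorted(c for c, p in edges if p == product)


def popularity():
    """Number of purchases per product."""
    counts = {}
    for _, product in edges:
        counts[product] = counts.get(product, 0) + 1
    return counts


print(products_bought_by("C1"))
print(customers_of("P2"))
print(popularity())
```

Anyone who reads only the `edges` list, without the comments, has no way of telling customers from products, which is exactly the weakness described above.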
The figure above shows an enriched view of sales and product catalogs, which include labels, properties, and named relationships.
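The property graph model can be sketched in a few lines of Python. This is only an illustration of the idea, not how a graph database stores data internally. The `Node` and `Relationship` classes, the `PURCHASED` relationship type, and all sample values are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set                      # e.g. {"Customer"} or {"Product"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str                       # id of the source node
    type: str                        # named relationship type
    end: str                         # id of the target node
    properties: dict = field(default_factory=dict)


# Hypothetical sample data.
nodes = {
    "c1": Node("c1", {"Customer"}, {"name": "Alice"}),
    "p1": Node("p1", {"Product"}, {"name": "Headphones", "price": 59.0}),
    "p2": Node("p2", {"Product"}, {"name": "Phone case", "price": 12.5}),
}
relationships = [
    Relationship("c1", "PURCHASED", "p1", {"date": "2023-01-15"}),
    Relationship("c1", "PURCHASED", "p2", {"date": "2023-02-02"}),
]


def neighbours(node_id, rel_type, target_label):
    """Follow outgoing relationships of one type to nodes carrying a label."""
    return [
        nodes[r.end]
        for r in relationships
        if r.start == node_id
        and r.type == rel_type
        and target_label in nodes[r.end].labels
    ]


purchased = [n.properties["name"] for n in neighbours("c1", "PURCHASED", "Product")]
print(purchased)
```

Here the labels, relationship type, and properties travel with the data, so the query no longer relies on a naming convention hidden in the program.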
In the above “organizing principle,” we saw that creating categories of nodes using labels is evidently useful. Still, on deeper thought, we realize that something is missing: the labels are not related to one another in any way!
Continuing our sales product catalog, a good way to proceed would be to enrich the product classification with some “higher order organizing principle”, so that even if the shopkeeper is out of stock with a particular product, he might be able to win a sale by suggesting some similar product. We need a taxonomy to support this kind of “x is a kind of y” reasoning.
Taxonomy is a classification scheme that organizes categories into a broader-narrower hierarchy. Items sharing similar properties are grouped in the same category, and the taxonomy relates each category to the others. In this kind of hierarchy, specific things like products sit at the bottom, and more general things like brands and product categories sit towards the top. The hierarchy in this type of “organizing principle” is constructed from category nodes connected by subcategory_of relationships, and the products are connected to the appropriate part of the taxonomy to classify them as ready for sale. The same is shown in the figure below.
The figure above shows the product catalog hierarchy layered on top of customer and sales data. This enables a more organized visualization of the data. An even more interesting fact is that we can provide multiple hierarchical organizations simultaneously to help provide even more insight into the data.
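The “x is a kind of y” reasoning that a taxonomy enables can be sketched as follows. The category names, product names, and helper functions are hypothetical. Only the subcategory_of relationship comes from the discussion above.

```python
# child category -> parent category (the subcategory_of relationship).
# All category and product names are hypothetical sample data.
subcategory_of = {
    "Headphones": "Audio",
    "Speakers": "Audio",
    "Audio": "Electronics",
    "Phones": "Electronics",
}

# product -> the category it is classified under.
in_category = {
    "Studio Headphones X": "Headphones",
    "Travel Headphones Y": "Headphones",
    "Bookshelf Speaker Z": "Speakers",
    "Phone Q": "Phones",
}


def ancestors(category):
    """Walk up the hierarchy from a category to the root."""
    chain = []
    while category in subcategory_of:
        category = subcategory_of[category]
        chain.append(category)
    return chain


def is_a_kind_of(product, category):
    """True if the product sits in `category` or in any of its subcategories."""
    own = in_category[product]
    return category == own or category in ancestors(own)


def similar_products(product):
    """Suggest substitutes: same category first, then widen one level at a time."""
    own = in_category[product]
    for level in [own] + ancestors(own):
        matches = sorted(
            p for p in in_category if p != product and is_a_kind_of(p, level)
        )
        if matches:
            return matches
    return []


print(is_a_kind_of("Studio Headphones X", "Electronics"))
print(similar_products("Bookshelf Speaker Z"))
```

If the speaker is out of stock, the search finds nothing else under Speakers, widens to Audio, and suggests the two headphones instead.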
Taxonomies organize data through subcategory_of relations; ontologies go further and allow us to define more complex relationships between categories, such as part_of, compatible_with, and depends_on. Thus, following an ontology, we can explore the categories not only vertically (hierarchically) but also horizontally, comparing across them. Besides this, ontologies can be built in a modular fashion, with layering used to keep them compact. Thus ontology helps make knowledge actionable. The figure below is an ontological representation showing the upgrade paths for products in a category.
Thus till now, we have seen different types of organizing principles of KGs. However, the organizing principle we choose should always be driven by its intended usage. It is advisable not to build rich and overcomplicated features into the organizing principle if no associated processes or agents would use them. Opting for an overly ambitious organizing principle is a common mistake, and a costly one in terms of resources and time.
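A rough sketch of ontology-style relations, including an upgrade path, might look like this in Python. The product names and the `upgrades_to` relation are assumptions made for illustration. The relations part_of, compatible_with, and depends_on are the ones named above.

```python
from collections import defaultdict

# (subject, relation, object) statements. All product names are hypothetical,
# and "upgrades_to" is an assumed relation name for the upgrade paths.
statements = [
    ("Basic Camera", "upgrades_to", "Pro Camera"),
    ("Pro Camera", "upgrades_to", "Studio Camera"),
    ("Zoom Lens", "compatible_with", "Pro Camera"),
    ("Zoom Lens", "compatible_with", "Studio Camera"),
    ("Battery Grip", "part_of", "Studio Camera"),
    ("Pro Camera", "depends_on", "Memory Card"),
]

# Index the statements by (subject, relation) for quick lookup.
index = defaultdict(list)
for s, r, o in statements:
    index[(s, r)].append(o)


def related(subject, relation):
    """Objects linked to `subject` by the given relation."""
    return list(index.get((subject, relation), []))


def upgrade_path(product):
    """Follow upgrades_to links until none remain; stops if a cycle appears."""
    path = []
    seen = {product}
    current = product
    while True:
        candidates = related(current, "upgrades_to")
        if not candidates or candidates[0] in seen:
            break
        current = candidates[0]
        seen.add(current)
        path.append(current)
    return path


print(upgrade_path("Basic Camera"))
print(related("Zoom Lens", "compatible_with"))
```

The horizontal relations (compatible_with, depends_on) sit alongside the vertical ones, so one structure can answer both "what is the next model up?" and "what works with this model?".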
Now that we have understood KGs and the different organizing principles, the next question is how to implement them. Implementing KGs typically involves the following steps:
The first step is collecting data from structured or unstructured sources: databases, text, or multimedia data such as images and videos.
The next step is to pre-process the data, removing irrelevant and redundant information so that it is in a format that can be readily used for building the KG.
The third step is to extract the entities and relationships from the data. Named Entity Recognition, relationship extraction, and object detection can achieve this.
Once the entities and relationships have been extracted, the next step is constructing the knowledge graphs. Graph databases like Neo4j or Titan can achieve this.
Then, follow it by populating the KG with extracted entities and relationships.
Once the KG has been constructed, it can be queried to retrieve useful information.
Finally, the KG should be regularly maintained, updated with new data, and monitored for errors.
It is worth noting that these steps are not strictly discrete and may vary depending on the specific use case and technology. Additionally, tools such as OpenAI’s GPT-3 and Google’s TensorFlow can help with several of the steps.
Now that we know how to build a KG, it is worth looking at where KGs are used.
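The steps above can be sketched end to end in plain Python. This is a toy under loud assumptions: the sentences are hard-coded, simple regular expressions stand in for real Named Entity Recognition and relationship extraction models, and an in-memory dictionary stands in for a graph database such as Neo4j.

```python
import re
from collections import defaultdict

# Step 1: collect raw data (here, a few hard-coded sentences).
raw = [
    "Leonard Nimoy played Spock.",
    "  Leonard Nimoy played Spock. ",
    "Spock appears in Star Trek.",
    "Unrelated noise without a known relation",
]

# Step 2: pre-process: trim whitespace and drop duplicates, keeping order.
cleaned = list(dict.fromkeys(s.strip() for s in raw))

# Step 3: extract entities and relationships.
# ASSUMPTION: toy regex patterns stand in for real NER / relation extraction.
PATTERNS = {
    "played": re.compile(r"^(.+?) played (.+?)\.$"),
    "appears_in": re.compile(r"^(.+?) appears in (.+?)\.$"),
}


def extract(sentence):
    """Return one (subject, relation, object) triplet, or None if nothing matches."""
    for relation, pattern in PATTERNS.items():
        match = pattern.match(sentence)
        if match:
            return (match.group(1), relation, match.group(2))
    return None


triples = [t for t in map(extract, cleaned) if t is not None]

# Steps 4 and 5: construct the graph and populate it.
# ASSUMPTION: a dict of adjacency lists stands in for a graph database.
graph = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))


# Step 6: query the graph.
def query(subject, relation):
    """Objects reachable from `subject` over one edge of type `relation`."""
    return [o for r, o in graph.get(subject, []) if r == relation]


print(triples)
print(query("Spock", "appears_in"))
```

The final step, maintenance, would amount to re-running the same collection, extraction, and population code as new data arrives and checking the extracted triplets for errors.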
While I hope I have satisfied your craving for knowledge for today, I want to leave you with some parting information. I want to introduce you to something called “Scene Graphs.” A scene graph (SG) differs from a KG in that it extracts SPO triplets from images and videos. Knowledge graphs represent real-world entities such as people, things, and concepts, together with their relationships, in a general-purpose structured format. Scene graphs, on the other hand, represent the objects, their attributes, and the relationships between objects (containment, proximity, actions, etc.) in images, videos, and 3D environments.
Thus today, we have looked closely at how to make our data smarter. The technique we used for this is Knowledge Graphs. To briefly summarize today’s read, the key takeaways from this article are:
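For a feel of the difference, a scene graph for a single image could be represented roughly as below. The objects, attributes, and relations are hand-written, hypothetical values. In practice they would come from an object detection and relation prediction model, which is not shown here.

```python
# A hand-written, hypothetical scene graph for one image.
# In practice these entries would be produced by a vision model.
scene = {
    "objects": {
        "man_1": {"class": "man", "attributes": ["standing"]},
        "horse_1": {"class": "horse", "attributes": ["brown"]},
        "hat_1": {"class": "hat", "attributes": ["black"]},
    },
    # (subject, predicate, object) triplets between detected objects.
    "relations": [
        ("man_1", "next_to", "horse_1"),
        ("man_1", "wearing", "hat_1"),
    ],
}


def describe(scene_graph):
    """Turn each relation into a short phrase that includes object attributes."""
    objects = scene_graph["objects"]

    def phrase(object_id):
        entry = objects[object_id]
        return " ".join(entry["attributes"] + [entry["class"]])

    return [
        f"{phrase(subj)} {pred} {phrase(obj)}"
        for subj, pred, obj in scene_graph["relations"]
    ]


for line in describe(scene):
    print(line)
```

The SPO shape is the same as in a KG, but the nodes are object instances in one particular image, each carrying visual attributes.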
A. Yes, knowledge graphs are closely related to Natural Language Processing (NLP). They can be used in NLP tasks such as information retrieval, question answering, entity linking, and semantic parsing. Knowledge graphs provide a structured representation of knowledge that can enhance the understanding and processing of natural language text.
A. The use of knowledge graphs is diverse and extensive. They facilitate organizing, integrating, and representing complex knowledge from various sources in a structured and interconnected manner. Knowledge graphs enable advanced search, recommendation systems, question answering, semantic analysis, entity linking, data integration, and machine learning tasks. They enhance data understanding, reasoning, and decision-making by providing a comprehensive and interconnected view of information.
Knowledge graphs are like flexible mind maps for data, good for connections. Relational databases are like filing cabinets, great for organized info. They can even work together!
Google Search uses a giant database called the Knowledge Graph to understand your searches and show you better results. Think of it as a super-powered dictionary for Google Search.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.