Developers Guide How to Build Knowledge Graph

The document is a comprehensive guide on building a knowledge graph, explaining its components, benefits, and practical applications. It covers the steps to create a knowledge graph using Neo4j, including setting up an account, creating a database instance, and designing a data model. The guide also discusses how to enrich the knowledge graph with structured and unstructured data, and provides use cases for various industries.

Uploaded by

yuv bindal
EBOOK

THE DEVELOPER’S GUIDE:

How to Build a
Knowledge Graph

Table of Contents

The Developer’s Guide: How to Build a Knowledge Graph
What Is a Knowledge Graph?
  Components of a Knowledge Graph
Build and Query Your Knowledge Graph
  Sign Up for a Neo4j Account
  Create a Graph Database Instance
  Create a Graph Data Model
  Load Data Into Your Knowledge Graph
    Load Structured Data
  Query Your Knowledge Graph
    MATCH Clause
    CREATE and MERGE Clauses
Next Steps
  Expand Your Knowledge Graph With Unstructured Data
  Load Unstructured Data
  Enrich Your Knowledge Graph Using Graph Algorithms
Use Cases and Design Patterns
  Supply Chain
  Entity Resolution
  GenAI
Concluding Thoughts and Further Learning



The Developer’s Guide: How to Build a Knowledge Graph

Our minds make sense of data by connecting different pieces of information to form a cohesive picture. Traditional database systems like MySQL and PostgreSQL store data in rigid boxes that don’t connect easily. This causes headaches for companies and the developers working with these systems. Examples include:

• Information silos block natural collaboration between teams.
• Complex join operations and foreign keys lead to poor runtime performance.
• Fixed schemas resist adaptation as business needs change.

The biggest problem with traditional database systems is that the intricate context around the data (how everything fits together) often gets lost. It’s like trying to understand a complex codebase without any documentation or understanding of the relationships between different modules. Over time, traditional databases become increasingly difficult to maintain and modify, which steadily erodes their business value.

A knowledge graph solves these problems. Rather than holding data static in rows and columns, a knowledge graph organizes information in its natural form: a web of interconnected entities. A flexible schema makes it simple to add new entities and relationships as they emerge. Patterns that get lost easily in a traditional database, such as similar purchasing behaviors or fraudulent transactions, are clearly visible in a knowledge graph.

This guide walks you through everything you need to know to build your first knowledge graph. You’ll learn core concepts and how to think about modeling data with relationships. Then, you’ll set up your own knowledge graph and start querying it to answer questions that you can’t answer in a “traditional database.”

What Is a Knowledge Graph?

A knowledge graph maps entities (objects, events, or concepts) and their relationships into an interconnected structure. This relationship-centric approach models real-world scenarios with precision while embedding domain-specific knowledge and business rules as organizing principles.

Used to integrate different types of information, a knowledge graph works well for use cases that pull data from multiple sources, including structured (traditional database entries) and unstructured (documents, social media posts) data. This unified view of a company’s knowledge is highly valuable, especially compared to traditional data design where information is fragmented, and relationships must be reconstructed through JOIN queries.

For example, a knowledge graph might represent patients, symptoms, diseases, etc., as depicted in the diagram below. A patient might have symptoms similar to patients with overlapping symptoms, diseases, or side effects. Prescriptions could also be linked to the pharmaceutical companies that make them or the doctors who often prescribe them for several different illnesses.

Figure 1. Knowledge graph example

Knowledge graphs surface hidden patterns through connections in data. For instance, a medication manufacturer depends on a supplier network for product components. A knowledge graph could reveal that several key suppliers are located in a hurricane-prone region, a risk that might go unnoticed in a traditional database.
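
To make the supplier example concrete, here is a minimal sketch of the kind of query that could surface that risk, written in Cypher, the query language introduced later in this guide. Everything in the schema is hypothetical: the labels (Manufacturer, Supplier, Region), the relationship types (SOURCES_FROM, LOCATED_IN), and the hurricaneProne property are assumptions made for illustration and are not part of any dataset used in this guide.

```cypher
// Hypothetical schema, for illustration only:
//   (:Manufacturer)-[:SOURCES_FROM]->(:Supplier)-[:LOCATED_IN]->(:Region)
// Find manufacturers with more than one supplier in the same hurricane-prone region.
MATCH (m:Manufacturer)-[:SOURCES_FROM]->(s:Supplier)-[:LOCATED_IN]->(r:Region)
WHERE r.hurricaneProne = true
WITH m, r, collect(s.name) AS suppliersAtRisk
WHERE size(suppliersAtRisk) > 1
RETURN m.name AS manufacturer, r.name AS region, suppliersAtRisk
ORDER BY size(suppliersAtRisk) DESC;
```

The path from manufacturer to region is stated directly in the pattern, whereas a relational design would typically need joins across sourcing, supplier, and location tables to answer the same question.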


Components of a Knowledge Graph

There are three major components of any knowledge graph: nodes, relationships, and organizing principles.

Nodes represent instances of specific entities, such as tangible objects (people, places, things), abstract concepts, or events. Nodes are the fundamental building blocks of a knowledge graph. You can have as many nodes as needed in a graph.

In the healthcare example, nodes represent individual patients and diseases:

Figure 2. Healthcare example: nodes

Labels identify nodes by role or type, serving as a classifier or tag that defines their function or purpose in your domain. They add semantic meaning to nodes, making the graph more intuitive to understand and query. When you specify a label in your query, it helps the graph database find the type of node you’re looking for.

Two labels from the healthcare example would be “Patient” and “Disease”:

Figure 3. Healthcare example: labels

Relationships, also known as connections, contain information about how nodes interact with or relate to one another. They add context and meaning to the data: a patient linked to a health condition with a “DIAGNOSED_WITH” relationship, or connected to a prescription to show what medications they are “TAKING,” for instance:

Figure 4. Healthcare example: relationships

Properties are attributes that provide information about nodes and relationships. They enrich the graph with detailed metadata and domain context.

In the healthcare example, the “Patient” node could have properties like name, date of birth, and contact information, while the “Disease” node could include properties like name and description:

Figure 5. Healthcare example: properties

Organizing principles bring business context to the graph by defining how entities, relationships, and properties are structured and used. They specify the types of nodes and relationships, establish hierarchies or categories, and guide interactions within the graph. This structure makes the data


easier to understand and enables more efficient querying, analysis, and inference across different levels of detail.

In the healthcare knowledge graph, diseases could be organized into categories (such as cardiovascular or respiratory diseases), while patients could be grouped by risk factors or age ranges. This structure enables analysis at various levels, from individual patient-disease relationships to broader population health trends.

Figure 6. Healthcare example: organizing principles

Build and Query Your Knowledge Graph

Now that you know the fundamentals, you can create your first knowledge graph using a Neo4j graph database. Though you could create a knowledge graph in another type of database, a property graph database like Neo4j is purpose-built for this task. A property graph database aligns naturally with the structure of a knowledge graph, making it the most intuitive option for implementation.

This guide teaches you to build a knowledge graph from start to finish. The example is retail transaction data with products, product categories, customers, and orders. You’ll learn how to design a knowledge graph, populate it with data, and query it using Cypher. As you move through this process, you’ll see how a knowledge graph allows you to answer multi-step questions, sometimes even answering two questions with a single streamlined query.

Once you feel confident querying your knowledge graph, you’ll have a chance to experiment with another layer: unstructured data. This is where the magic starts to happen in a knowledge graph: the ability to add new and different types of data and then query relationships across all the data.

Sign Up for a Neo4j Account

You’ll build your knowledge graph on the cloud-hosted, fully managed Neo4j AuraDB Graph Database. Neo4j stores data as nodes and relationships, supports the Cypher graph query language, and offers tools for data visualization, data science, and data connectors. Before using AuraDB, you’ll need an account.

If you already have a Neo4j account, you can log into the Aura Console and skip to the next step to create a database instance.

Follow these steps to create a Neo4j Aura account:

1. Navigate to the Neo4j Aura Console.

2. Click on Sign up below the login box.

Figure 7. Neo4j Aura signup screen

3. Type your email address into the input box and click Continue to set up the password and other necessary information. Alternatively, you can sign in using the Google or organization account option.

4. If prompted, agree to the Neo4j terms.

Next, you’ll create a graph database instance to hold your knowledge graph.

Create a Graph Database Instance

In this step, you’ll create the actual database instance to store your knowledge graph. If you


haven’t already, navigate to the Aura Console and log in. Then:

1. Click the Create instance button:

Figure 8. Neo4j Aura Create instance screen

2. You’ll see a list of instance types, with the Professional tier (center option) highlighted by default. AuraDB Free is a great way to start learning and exploring knowledge graphs. When you’re ready to move to production-quality, high-performance applications in the cloud, you can progress to AuraDB Professional. We’ll use the Free instance for our knowledge graph, so click the Select button at the bottom of the Free tier.

Figure 9. New AuraDB Free instance screen

3. A pop-up should appear with the credentials for your instance. Click Download and Continue to download the credentials file. (Important: You cannot access the password after this point.) Your database instance will take a few minutes to create. Once complete, you can move to the next section, where you’ll design a graph data model for importing data.

Create a Graph Data Model

Now that you have a database instance ready, you need to populate it with data. The Data Importer tool will help you design the structure of your knowledge graph by drawing entities and relationships to represent your domain of interest.

From the main Aura console, click the Import option in the left menu:

Figure 10. Data import screen

Click New data source in the middle:

Figure 11. Selecting New data source screen

Then choose the .CSV option (lower part of the pop-up):

Figure 12. Aura New data source screen


Data Importer may not automatically connect to your running instance, so in the upper left, if it says “No instance connected,” follow these steps:

1. Click on the drop-down next to No instance connected and click Connect to instance:

Figure 13. Aura No instance connected screen

2. Click Connect next to your instance:

Figure 14. Aura Connect to instance screen

3. In the credentials pop-up, type in the username and password for your instance. The downloaded credential file from earlier is helpful here. Click Connect.

You should be on the main Data Importer screen and see your connected instance in the upper left:

Figure 15. Aura Data Importer connected instance screen

You’ll create a simplified version of the Northwind Graph example, an ecommerce demonstration based on the popular Northwind sample dataset. The dataset is formatted as CSV files containing information about customers, orders, products, categories, and suppliers. Let’s create the model below in Data Importer:

Figure 16. Northwind graph data model
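
As a text companion to the diagram, the model built in the following steps can be summarized in Cypher pattern notation. The labels, relationship types, and ID properties are the ones this guide defines. The uniqueness constraints are an assumption added for illustration, since the guide itself defines the model through the Data Importer UI, and the constraint names are hypothetical. The syntax shown is the Neo4j 5 form.

```cypher
// Target model, written as patterns (labels and types come from this guide):
//   (:Customer)-[:PURCHASED]->(:Order)-[:ORDERS {quantity}]->(:Product)
//   (:Supplier)-[:SUPPLIES]->(:Product)-[:PART_OF]->(:Category)

// Assumed and optional: one uniqueness constraint per ID property.
CREATE CONSTRAINT customer_id IF NOT EXISTS
FOR (c:Customer) REQUIRE c.customerID IS UNIQUE;

CREATE CONSTRAINT order_id IF NOT EXISTS
FOR (o:Order) REQUIRE o.orderID IS UNIQUE;

CREATE CONSTRAINT product_id IF NOT EXISTS
FOR (p:Product) REQUIRE p.productID IS UNIQUE;

CREATE CONSTRAINT supplier_id IF NOT EXISTS
FOR (s:Supplier) REQUIRE s.supplierID IS UNIQUE;

CREATE CONSTRAINT category_id IF NOT EXISTS
FOR (cat:Category) REQUIRE cat.categoryID IS UNIQUE;
```

Constraints like these keep ID values unique per label, which also makes lookups by ID fast when relationships are wired up during a load.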


To start designing the model, click Add node label:

Figure 17. Initial add node label screen

Note: To minimize the Data source tab along the left side, click the Data sources icon in the upper left of the visualization pane:

Figure 18. Data sources icon screen

A circle should appear in the pane, along with a right tab containing definition metadata:

Figure 19. Definition screen

This will be the Customer node in the data model. Customer nodes will have three properties: customerID (string), companyName (string), and city (string). Follow these steps to define the node:

1. Type Customer as the label in the Name field. This label will identify the type of entity these nodes represent in the graph.

2. Click the + sign next to Properties to add node properties.

3. Edit the property by clicking the Property1 button under Properties, type customerID, and press Enter. To the right of the property name, select the appropriate data type from the drop-down (in this case, string).

Repeat this process for companyName and city. Your completed Customer node should look like this:

Figure 20. Customer node screen

Next, create another node type for Orders. Click the Add node label icon in the top-left corner of the sketch area:

Figure 21. Add node label screen

A new blank node will appear in your workspace. Label this new node type as “Order” and add the following properties: orderID (integer), orderDate (datetime), and shippedDate (datetime):


Figure 22. Order node label screen

Next, you’ll create a relationship between the “Customer” and “Order” node types to represent that a customer places an order. Click the Customer node to select it. Hover your mouse over the border of the node to see a green + button. Click and hold the mouse button, then drag the gray circle that appears to the “Order” node:

Figure 23. Create the relationship between “Customer” and “Order” screen

Figure 23 (cont). Create the relationship between “Customer” and “Order” screen

Release the button only when you’re over the “Order” node to create a new relationship (or else you’ll create a new blank node).

Notice that the relationship has a direction from “Customer” to “Order.” By drawing a line between the “Customer” and “Order” nodes, you’re modeling a customer’s purchase in the knowledge graph. The relationship explicitly defines how customers relate to orders.

Relationships need to have a type. In the relationship’s metadata pane on the right, name this relationship type “PURCHASED.”

This intuitive representation of data and relationships enables powerful querying capabilities because the model clearly defines not only that there is a relationship but also how the entities relate to one another. A PURCHASED relationship would help us understand customer order history and habits, but a CREATES relationship from “Customer” to “Order” could point to a shopping cart that hasn’t been purchased yet. We’ll discuss more ways to answer business questions with graph data later in this guide.

This intuitive representation is how we tend to model and think of data, even for other types of databases. The difference with a knowledge graph is that what you draw is exactly what you’re going to store and query in the database. With other database types, you’d have to take this intuitive model and figure out how to implement it within the technical limitations of that database (the transition from the conceptual data model to the physical data model). In a knowledge graph, the conceptual data model and physical data model are one and the same.

To finish the Northwind graph model, create a third node type called “Product” and add three properties to it: productID (integer), productName (string), and unitPrice (float):


Figure 25. Create “Product” node screen

Next, create a new relationship type by drawing a line from the “Order” node to the “Product” node. Name this new relationship type “ORDERS”:

Figure 26. Create a new relationship type by drawing a line from “Order” node to “Product” node screen

Remember that relationships can have properties, too. Add the property quantity (integer) to store the number of that product ordered.

To add suppliers to our model, create a fourth node type called “Supplier” and add three properties to it: supplierID (integer), companyName (string), and city (string). Create a relationship from “Supplier” to “Product” and name it “SUPPLIES”:

Figure 27. Create “Supplier” node type screen

Your knowledge graph model now has four node types and three relationship types, but it still lacks an organizing principle. In the Northwind example, your organizing principle could be a product hierarchy that streamlines product group searches. As another option, you could choose a process-based principle around the order fulfillment stages to optimize the supply chain and delivery network.

For this example, you’ll add a product hierarchy as the organizing principle of your graph.

Start by adding another node type in the data model. Name it “Category” and include the properties categoryID (integer) and categoryName (string):

Figure 28. Create “Category” node type screen

Next, create a new relationship type starting from the “Product” node and going to the “Category” node named “PART_OF”:


Figure 29. Create “PART_OF” relationship type screen

Now that your knowledge graph model has relevant relationships and an organizing principle, you’re ready to bring it to life with data.

Load Data Into Your Knowledge Graph

Loading data into your knowledge graph creates nodes and relationships about specific customers, orders, products, and categories with the model you defined.

For this example, you’ll import CSV data from the Northwind database.

Load Structured Data

Download the following CSV files from the Northwind GitHub repository:

• categories.csv
• customers.csv
• order-details.csv
• orders.csv
• products.csv
• suppliers.csv

In the Aura workspace, click the Data sources icon in the top-left corner of the sketch area to expand the Files menu:

Figure 30. Data sources screen

Choose the bottom of the two options (Drag & Drop and browse support CSV) for loading data into your knowledge graph and add the files you just downloaded.

Figure 31. Drag & Drop and browse support CSV selection screen
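
The Data Importer performs the load through the UI. For comparison, a hand-written equivalent for one node type and one relationship type might look like the sketch below, which uses Cypher’s LOAD CSV clause (MATCH and MERGE are covered in the query section later in this guide). The URLs are placeholders, not the real location of the Northwind files. The sketch also assumes that orders.csv carries both orderID and customerID columns, which this guide’s mapping of the PURCHASED relationship to orders.csv suggests.

```cypher
// Placeholder URLs: replace with wherever the CSV files are actually hosted.
// Create one Customer node per row, keyed on customerID.
LOAD CSV WITH HEADERS FROM 'https://example.com/northwind/customers.csv' AS row
MERGE (c:Customer {customerID: row.customerID})
SET c.companyName = row.companyName,
    c.city = row.city;

// Create Order nodes and connect each one to its existing Customer.
// LOAD CSV reads every field as a string, so orderID is converted to an integer.
LOAD CSV WITH HEADERS FROM 'https://example.com/northwind/orders.csv' AS row
MATCH (c:Customer {customerID: row.customerID})
MERGE (o:Order {orderID: toInteger(row.orderID)})
MERGE (c)-[:PURCHASED]->(o);
```

Using MERGE on the ID property alone, then setting the remaining properties, means the statements can be re-run without creating duplicate nodes.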


Here’s what you should see after uploading the files (properties collapsed):

Figure 32. File upload screen

After adding your data files, close the Data source menu by clicking the icon again.

Map the data from the files to the nodes and relationships in your graph model by clicking any node or relationship in the drawing and locating the Table > Name field in the Definition tab. Open the drop-down list and select the appropriate file:

• For the “Customer” node, select customers.csv
• For the “Order” node, choose orders.csv
• For the “Product” node, pick products.csv
• For the “Supplier” node, select suppliers.csv
• For the “Category” node, use categories.csv
• For the “PURCHASED” relationship, select orders.csv
• For the “ORDERS” relationship, pick order-details.csv
• For the “SUPPLIES” relationship, pick products.csv
• For the “PART_OF” relationship, choose products.csv

Next, scroll down to the Properties section. It shows a list of the properties you defined earlier in your graph model. Since you used the same naming convention as the GitHub repository, the property names in your graph model will match the field names in the CSV files, which simplifies the mapping process. Though you can map each property from your model to the CSV field manually using the drop-downs, a simpler option is to click Map from table just above the properties, choose which columns from the CSV files to map, and click Confirm:

Figure 33. Map from table screen

To map relationships, you’ll notice an additional section, Node ID mapping, with fields to map From and To nodes:

Figure 34. Node ID mapping screen

Complete mappings for each of your graph model’s nodes and relationships in any order you prefer. As you map, Aura Workspace places a green checkmark next to each fully mapped element, indicating that all fields for that node or relationship have been successfully populated:

Figure 35. Complete mappings screen

After completing the mapping process for all elements of your knowledge graph, you’re ready to populate the database.

Click the Run import button to load your data:

Figure 36. Run import screen

This action starts the import process. You’ll see a progress bar indicating the status of the import. Once complete, a pop-up window will display the import results. The window provides a quick overview of the import process outcome and lets you verify whether the data was successfully imported into your knowledge graph.

Figure 37. Import results screen

Click the X to close the pop-up window.

Now that the data is imported into the database, you can use queries to understand behaviors and patterns in the data.

Query Your Knowledge Graph

You will query your knowledge graph using the Cypher query language. Cypher is the most widely adopted implementation of the ISO Graph Query Language (GQL) standard designed specifically for graph databases. Cypher is a declarative language (like SQL), which means you write queries by specifying the results you want and not dictating how to get them. It offers several advantages over traditional query languages like SQL or SPARQL, including reduced code complexity, easier debugging, and intuitive representation of data patterns.

Cypher expresses graph patterns in a way that resembles how they’re drawn on a whiteboard. For instance, a statement like “customer orders product” can be represented in Cypher as:

(c:Customer)-[r:ORDERS]->(p:Product)

Figure 38. Cypher “customer orders product” diagram
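
The whiteboard statement is a simplification. In the model built in this guide, customers connect to products through orders, so the full path has two hops. A minimal sketch, using only the labels, relationship types, and properties defined in the modeling steps (the LIMIT value is arbitrary):

```cypher
// Customer -> Order -> Product, following the Northwind model from this guide.
MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[r:ORDERS]->(p:Product)
RETURN c.companyName AS customer,
       o.orderID     AS orderID,
       p.productName AS product,
       r.quantity    AS quantity
LIMIT 5;
```

Because this query returns individual property values instead of whole nodes and relationships, the results are rows of values, not a graph.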


The next sections cover the three most important Cypher clauses you’ll need to write queries and interact with your knowledge graph:

1. MATCH finds and returns the nodes or patterns specified.

2. CREATE adds new nodes or patterns specified to the graph.

3. MERGE executes a find-or-create operation, first checking if the pattern exists in the graph (MATCH), then either returns the existing pattern or creates the pattern if it doesn’t exist.

For a more comprehensive understanding of Cypher, the Cypher Fundamentals course on Neo4j GraphAcademy offers an in-depth overview of Cypher and provides hands-on exercises to reinforce your learning.

MATCH Clause

Cypher’s MATCH clause finds nodes or patterns in a graph database. Use MATCH when you want to find nodes and relationships in your graph, an essential part of data retrieval and analysis. It serves a similar purpose to the SELECT statement in SQL, allowing you to retrieve data based on specified criteria.

Explore your Northwind knowledge graph with the queries in the next paragraphs.

Navigate to Tools > Query in the left menu and enter the following Cypher code in the query box at the top right:

MATCH (p:Product)-[rel:PART_OF]->(c:Category {categoryName: "Beverages"})
RETURN p, rel, c;

This query finds all products that belong to the beverage category. It’s a common type of query in ecommerce systems where you want to look up products in a specific category. Let’s break down the query:

• MATCH (p:Product)-[rel:PART_OF]->(c:Category {categoryName: "Beverages"}) matches “Product” nodes (mapped to variable p) that have a “PART_OF” relationship to a “Category” node (variable c). The “Category” node is filtered to only match where categoryName is “Beverages.”
• RETURN p, rel, c specifies the data to be returned from the matched pattern: products (p), PART_OF relationships (rel), and categories (c):

Figure 39. Cypher `MATCH` beverages query

To execute the query, click the Play icon next to the query box.

Figure 40. Executing the query screen

Here’s the sample output:

Figure 41. Sample output screen

The query above displays a visualization of nodes and relationships because we returned entire nodes and relationships, but you can specify parts of the pattern and workspace could display text, tables, or other formats.

As an example, the “Ipoh Coffee” product has run out of stock, and you need to identify which orders need to be updated and customers contacted:


MATCH (c:Customer)-[r1:PURCHASED]->(o:Order)-[r2:ORDERS]->(p:Product {productName: "Ipoh Coffee"})
RETURN c.companyName, COUNT(o) AS orders, collect(o.orderID)
ORDER BY orders DESC;

First, the query searches for customers who purchased orders that contain the “Ipoh Coffee” product. The next line returns the customer’s company name, the count of orders impacted, and the affected order IDs in a list. The last line orders the results by the number of orders, sorting from highest to lowest (`DESC`, for descending order).

The output is as follows:

Figure 42. Out-of-stock product impacts

Next, suppose weather conditions are pushing produce yields above or below average. To see how that might affect your suppliers, customers, and inventory, you could run a query like the following:

MATCH (cust:Customer)-[r1:PURCHASED]->(o:Order)-[r2:ORDERS]->(p:Product)-[r3:PART_OF]->(c:Category {categoryName: "Produce"}),
(p)<-[r4:SUPPLIES]-(s:Supplier)
RETURN *;

This query finds customers who purchased orders containing products that are part of the “Produce” category, as well as the product suppliers, and returns all the data.

Figure 43. Produce graph network

The results show us that there are a few products in the “Produce” category (in purple), and each related supplier (in blue) is connected to a single product. This tells us that while we don’t rely on one supplier for multiple different products, we also do not have backup suppliers if the existing (and only) supplier of a product is impacted in some way.

So far, we’ve focused on how to query data that was previously loaded. But what if you want to create new data? You can use Cypher’s MERGE and CREATE clauses to add new data to the graph.

CREATE and MERGE Clauses

The CREATE clause adds new nodes, relationships, and properties to the graph. It always creates new data, even if identical data already exists. It’s similar to the INSERT statement in SQL.

The MERGE clause combines the functionality of MATCH and CREATE. It first attempts to find the specified pattern in the graph. If the pattern exists, it behaves like MATCH and returns the existing data. If the pattern doesn’t exist, it behaves like CREATE and saves the pattern.

When using these clauses, it’s important to understand that Cypher operates on entire patterns rather than individual elements. When you MATCH or MERGE, Cypher looks for the complete pattern specified. For example, if you MERGE a pattern and the nodes exist but the relationship does not, Cypher will create the entire pattern new, producing duplicate nodes.

To prevent data duplication, match and then merge individual parts of the pattern separately to ensure


that only the new elements get created.

Here’s an example where the product category “Grains/Cereals” already exists, but the product and relationship are new:

MERGE (p:Product {productID: 78, productName: "Organic Quinoa"})
MERGE (c:Category {categoryID: 9, categoryName: "Grains/Cereals"})
MERGE (p)-[r:PART_OF]->(c)
RETURN *;

Figure 44. MERGE pattern screen

The message at the bottom of the image confirms that the Cypher statement created one new node (quinoa product) and one new relationship.

MERGE clauses for each node and relationship do a find-or-create operation to ensure that you only add a new product, category, or “PART_OF” relationship when each does not already exist. You can run this statement multiple times without creating duplicates because the merges will find the data that already exists.

Next Steps

Now that you have your initial version of a knowledge graph, what can you do next? Remember that a knowledge graph, especially one built on a graph database with a flexible schema like Neo4j, can expand and grow to answer more questions and serve more business needs. Here are a few ideas for expanding the utility of your knowledge graph:

• Expand your knowledge graph with additional data sources
• Load unstructured data
• Enrich the knowledge graph using graph algorithms

Rather than walk through each of these approaches step by step, this guide will provide you with some suggestions for how you can explore the next steps on your own.

Expand Your Knowledge Graph With Unstructured Data

You’ve built a knowledge graph with customer, product, and order information, with product categories as an organizing principle. You can broaden the types of questions the knowledge graph can answer by widening the scope of data it contains or by adding more organizing principles. You can load the data using the graphical data importer you already learned about or explore other approaches. At the time of this guide’s publication, AuraDB’s data importer can also connect directly to PostgreSQL, MySQL, and SQL Server databases, so you could load structured data directly from a relational database. Some ideas for other types of data and organizing principles include:

• Adding an organizing principle for the customers to help you answer questions about different customer segments. You could include location, industry, revenue, or other principles depending on the types of questions being asked by the business.
• Loading additional data about your customers (such as web clickstream activity), which would enable you to tie customers’ behavior to their purchases and use the knowledge graph to offer recommendations.
• Adding supply chain information for the products in the knowledge graph, which would enable you to use the knowledge graph to optimize the supply chain and mitigate the risk of disruption.

These ideas are, of course, just a starting point. You can follow the process outlined in this guide to come up with your own ideas.
• Expand your knowledge graph with additional
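
As one illustration of a location-based organizing principle for customers, the following sketch promotes a customer property into its own node. It assumes that Customer nodes carry a country property, and it introduces a hypothetical Country label and LOCATED_IN relationship type. None of these names are confirmed by this guide, so adjust them to the data model you actually built.

```cypher
// Assumption: Customer nodes have a `country` property.
// `Country` and `LOCATED_IN` are hypothetical names chosen for this sketch.
MATCH (c:Customer)
WHERE c.country IS NOT NULL
// Find-or-create one Country node per distinct value.
MERGE (k:Country {name: c.country})
// Find-or-create the relationship, so rerunning adds no duplicates.
MERGE (c)-[:LOCATED_IN]->(k)
RETURN k.name AS country, count(c) AS customers
ORDER BY customers DESC;
```

Because both steps use MERGE, the statement can be rerun after loading more customers without duplicating Country nodes or relationships.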


Load Unstructured Data


One of the things that a knowledge graph built with the Neo4j graph database can do is combine structured and unstructured or semi-structured data in a single knowledge graph. Integrating these types of data enables you to answer questions that wouldn’t otherwise be possible. GenAI use cases, in particular, can benefit from this capability by using vector embeddings and similarity searches to apply GraphRAG techniques for building applications that enable end users to interact with and ask questions of the knowledge graph using plain language.

You can use Neo4j’s LLM Knowledge Graph Builder to load unstructured data (such as PDFs) into your existing knowledge graph. To experiment with this approach, you can use the sample PDF invoices from the Northwind order purchases to get started. The tool uses an LLM to extract nodes and relationships from unstructured content and bridge the unstructured data (with vector representations) and structured data in a single knowledge graph.

If you want to use the LLM Knowledge Graph Builder with an existing knowledge graph, you can take a couple of steps to help the Builder integrate with your existing graph. First, the Builder uses a specific label, __Entity__ (with two underscores each at the beginning and end), and the property id to merge information into an existing graph; you can set this label and property by running the following Cypher:

MATCH (p:Product) SET p.id = p.productName, p:__Entity__

This query will match all existing nodes with the Product label, set the id property to the productName, and add an __Entity__ label. The Builder will use this label and property when it finds Product information in the unstructured data and use the existing Products instead of creating new ones. The other step is to provide the Builder with suggested labels and relationship types to use. Do this by clicking the Graph Enhancement button after you connect the Builder to your database and provide the suggested node labels and relationship types as shown in the image below:

Figure 45. Providing the Builder with suggested labels and relationship types screen

Now when you load the PDF invoices, the LLM Knowledge Graph Builder will connect the invoice information to the existing products to create a single integrated knowledge graph built from both structured and unstructured data.

Enrich Your Knowledge Graph Using Graph Algorithms

In addition to making the knowledge graph more useful with more data or organizing principles, you can also use graph algorithms to uncover more insight from your knowledge graph. Algorithms such as node similarity (to use similar customers’ behavior to recommend products) or pathfinding (to optimize supply chains) bring advanced capabilities that unlock deeper or previously hidden patterns in the data.

Use Cases and Design Patterns

The schema of a knowledge graph makes it straightforward to represent complex business relationships without extensive preplanning. You can incorporate additional information without having to make disruptive changes, just as we explored in the previous section.

A knowledge graph’s context-rich data structure also enhances the explainability of insights since a knowledge graph stores relationships between data and its sources. Most importantly, it produces more accurate and relevant insights than siloed data systems, as it combines data from multiple systems into a single view.
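
To illustrate how new information can be added to a knowledge graph without disruptive changes, the sketch below attaches a supplier to an existing product. The Supplier label, its supplierID and companyName properties, the SUPPLIES relationship type, and the sample values are all hypothetical; they are not part of the data model built in this guide.

```cypher
// Find the existing product; nothing about Product nodes has to change.
MATCH (p:Product {productID: 78})
// Find-or-create the supplier by its key only (hypothetical label and key),
// and set the remaining properties only when the node is first created.
MERGE (s:Supplier {supplierID: 1001})
  ON CREATE SET s.companyName = "Example Grains Co."
// Find-or-create the relationship, so rerunning adds no duplicates.
MERGE (s)-[:SUPPLIES]->(p)
RETURN s, p;
```

Merging on the key property alone and setting other properties with ON CREATE SET is one way to avoid creating a second node when a non-key value differs from what is already stored.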


Figure 46. Seven graphs of the enterprise

The next section explores a few use cases of knowledge graphs to illustrate these benefits in practice.

Supply Chain

Effective supply chain management requires understanding relationships between suppliers, distributors, warehouses, transportation logistics, raw materials, products, etc. A knowledge graph is a natural way to model and store this kind of information because the connections between different pieces of data are numerous, complex, and (often) constantly changing. Because of these characteristics, a knowledge graph provides a strong foundation for supply chain optimization, contingency planning, and risk management.

The benefits of using a knowledge graph for supply chain insights include:

• The impact of a supply chain disruption can be easily found by following the relationships downstream from the disruption.
• Graph algorithms, such as shortest path, can help to optimize delivery routes and sourcing strategies for time, cost, or other metrics.
• Graph queries can quickly identify choke points in a supply chain, which provides an opportunity to find alternative suppliers, transportation routes, etc. to mitigate the risk at that critical point in the network.

Supply chains work well as knowledge graphs because they consist of multiple complex stages, inputs, outputs, and connection points. Working with this data as a graph rather than in tables is much more intuitive and allows for better insights.

To learn more about using a knowledge graph for supply chain, check out the article series on graph data science for supply chains.

Entity Resolution

Entity resolution is the process of identifying whether multiple records are referencing the same real-world entity. In its simplest form, you can perform entity resolution with hand-crafted queries to compare key identifying attributes according to a company’s business rules. However, this approach takes a lot of effort to write code and a lot of time to run the comparisons.

With a knowledge graph, you can reduce both the development effort and the runtime of entity resolution. Storing data as a knowledge graph has several advantages over other approaches:

• Shared identifiers or attributes can be easily discovered by modeling them as separate nodes in the knowledge graph. Modeling in this way makes it clear when two entities share common information and are candidates for merging.
• Graph algorithms, such as weakly connected components, can segment the knowledge graph into separate communities, where there are no shared connections between the data. This helps reduce the number of comparisons needed in entity resolution because nodes in separate communities don’t need to be compared.
• Knowledge graphs speed up transitive comparisons, which are needed to identify when more than two digital entities represent the same real-world entity.

To learn more about using a knowledge graph for entity resolution, see “Graph Data Science Use Cases: Entity Resolution.”

GenAI

Despite LLMs’ impressive ability to produce contextually relevant outputs, they have significant weaknesses. They lack access to real-time data, and they can’t incorporate private or proprietary information not included in their training set. Furthermore, responses are unverified and don’t include the source(s) on which the LLM has based an answer. This can lead to outdated, incomplete, or incorrect responses in rapidly evolving fields or when dealing with company-specific knowledge.

Graph-based retrieval-augmented generation (GraphRAG) addresses these limitations by integrating knowledge graph data sources with LLMs. Using a knowledge graph as the data source produces better results than using a traditional database. A knowledge graph captures the context inherent in the data relationships and can provide a more complete and explainable answer than other types of data stores.
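
The retrieval step of a GraphRAG application can be sketched in Cypher as a vector similarity search followed by a graph traversal. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes a Neo4j version that provides the db.index.vector.queryNodes procedure, a vector index named chunk_embeddings over text chunks, Chunk and Document nodes connected by PART_OF and HAS_ENTITY relationships, and text and fileName properties. All of those names are hypothetical here, so check the actual labels, relationship types, and index name in your own graph.

```cypher
// $questionEmbedding is supplied by the application: the embedding of the
// user's question, produced by the same model used for the stored chunks.
// 'chunk_embeddings' is a hypothetical index name.
CALL db.index.vector.queryNodes('chunk_embeddings', 5, $questionEmbedding)
YIELD node AS chunk, score
// Follow relationships from each matching chunk to add graph context:
// the source document (for verifiability) and any linked entities.
OPTIONAL MATCH (chunk)-[:PART_OF]->(doc:Document)
OPTIONAL MATCH (chunk)-[:HAS_ENTITY]->(e:__Entity__)
RETURN chunk.text AS passage,
       score,
       doc.fileName AS source,
       collect(DISTINCT e.id) AS relatedEntities
ORDER BY score DESC;
```

The passages, their sources, and the related entities would then be passed to the LLM as context, which is what allows an answer to cite where its information came from.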

This approach results in more nuanced and accurate responses and verifiable source data, which is especially important for applications that require complex reasoning or where accuracy is critical, such as healthcare and finance.

For more information about GraphRAG techniques and implementation details, refer to “What Is GraphRAG.”


Concluding Thoughts and Further Learning

Because knowledge graphs represent information as an interconnected network of entities and relationships, they reflect the complex, context-dependent nature of real-world information. Structuring data in this way allows you to model reality with remarkable fidelity, capturing nuances across and within domains that siloed data structures often miss. You can highlight connections and insights that aren’t possible with traditional data structures. A flexible structure also makes it easy to integrate new data from various sources without disrupting existing relationships.

In this guide, you learned how to create a knowledge graph from scratch and how to obtain insights from it using the Cypher graph query language. You also learned about the role the knowledge graph plays in certain domains, like supply chains, entity resolution, and GenAI.

Here are some immediate steps you can take to build upon this foundational knowledge:

• Use your Neo4j instance to experiment with different data models and explore complex queries.
• Complete some of the free self-paced courses on graph concepts and techniques in GraphAcademy.
• Join the Neo4j Community to get support and insights from fellow graph developers and enthusiasts.

Acknowledgements
This guide was developed with contributions from technical subject matter experts who helped ensure accuracy
and clarity of the content. Special thanks to Jennifer Reif, John Stegeman, and Damaso Sanoja for their technical
expertise and contributions to this developer guide.

Get Started with Neo4j AuraDB
Neo4j uncovers hidden relationships and patterns
across billions of data connections deeply, easily,
and quickly, making graph databases an ideal choice
for building your first knowledge graph.

Build Now
