Introduction
This article is the last one in the series of blogs that I’ve written on GRAPH tables in SQLServer.
The previous two articles were published in Technet Wiki and can be checked from the below link.
Introduction to Graph tables
https://social.technet.microsoft.com/wiki/contents/articles/37993.t-sql-graph-based-tables-in-sql-2017.aspx
Graph table enhancements : Edge constraints
https://social.technet.microsoft.com/wiki/contents/articles/53162.graph-table-enhancements-edge-constraints-system-utility-functions.aspx
In this article, we shall see the latest enhancements that have been introduced in GRAPH tables in SQL 2019 version, i.e SHORTEST_PATH function.
SHORTEST_PATH allows us to traverse the GRAPH table using a given search condition and return the matching nodes based on it. As such this function is always used in conjunction with the MATCH function as an option inside. The search can be done repeatedly in a recursive manner until last level is reached or a predefined number of iterations is done as specified by an argument.
The traversed result shall include multiple nodes within the graph. Hence most often it would require the use of an aggregate function like STRING_AGG to present the result as a single column inside our SELECT query.
Illustration
For illustrating the functionality, lets consider the below sample data
-- Create a graph demo database
IF NOT EXISTS (SELECT * FROM sys.databases WHERE NAME = 'graphdemo')
CREATE DATABASE GraphDemo;
GO
USE GraphDemo;
GO
-- Create NODE tables
CREATE TABLE Person (
ID INTEGER PRIMARY KEY,
name VARCHAR(100)
) AS NODE;
CREATE TABLE Restaurant (
ID INTEGER NOT NULL,
name VARCHAR(100),
city VARCHAR(100)
) AS NODE;
CREATE TABLE City (
ID INTEGER PRIMARY KEY,
name VARCHAR(100),
stateName VARCHAR(100)
) AS NODE;
-- Create EDGE tables.
CREATE TABLE likes (rating INTEGER) AS EDGE;
CREATE TABLE friendOf AS EDGE;
CREATE TABLE livesIn AS EDGE;
CREATE TABLE locatedIn AS EDGE;
CREATE TABLE siblingOf AS EDGE;
-- Insert data into node tables. Inserting into a node table is same as inserting into a regular table
INSERT INTO Person (Id, name)
VALUES (1, 'John')
, (2, 'Mary')
, (3, 'Alice')
, (4, 'Jacob')
, (5, 'Julie');
INSERT INTO Person (Id, name)
VALUES (6,'Patrick'),
(7,'Janet'),
(8,'Divya'),
(9,'Marc'),
(10,'Peter')
INSERT INTO Person (Id, name)
VALUES (11,'Henry')
INSERT INTO Person (Id, name)
VALUES (12,'Micheal'),
(13,'Leone')
INSERT INTO Restaurant (Id, name, city)
VALUES (1, 'Taco Dell','Bellevue')
, (2, 'Ginger and Spice','Seattle')
, (3, 'Noodle Land', 'Redmond');
INSERT INTO City (Id, name, stateName)
VALUES (1,'Bellevue','wa')
, (2,'Seattle','wa')
, (3,'Redmond','wa');
-- Insert into edge table. While inserting into an edge table,
-- you need to provide the $node_id from $from_id and $to_id columns.
/* Insert which restaurants each person likes */
INSERT INTO likes
VALUES ((SELECT $node_id FROM Person WHERE ID = 1), (SELECT $node_id FROM Restaurant WHERE ID = 1), 9)
, ((SELECT $node_id FROM Person WHERE ID = 2), (SELECT $node_id FROM Restaurant WHERE ID = 2), 9)
, ((SELECT $node_id FROM Person WHERE ID = 3), (SELECT $node_id FROM Restaurant WHERE ID = 3), 9)
, ((SELECT $node_id FROM Person WHERE ID = 4), (SELECT $node_id FROM Restaurant WHERE ID = 3), 9)
, ((SELECT $node_id FROM Person WHERE ID = 5), (SELECT $node_id FROM Restaurant WHERE ID = 3), 9);
/* Associate in which city live each person*/
INSERT INTO livesIn
VALUES ((SELECT $node_id FROM Person WHERE ID = 1), (SELECT $node_id FROM City WHERE ID = 1))
, ((SELECT $node_id FROM Person WHERE ID = 2), (SELECT $node_id FROM City WHERE ID = 2))
, ((SELECT $node_id FROM Person WHERE ID = 3), (SELECT $node_id FROM City WHERE ID = 3))
, ((SELECT $node_id FROM Person WHERE ID = 4), (SELECT $node_id FROM City WHERE ID = 3))
, ((SELECT $node_id FROM Person WHERE ID = 5), (SELECT $node_id FROM City WHERE ID = 1));
/* Insert data where the restaurants are located */
INSERT INTO locatedIn
VALUES ((SELECT $node_id FROM Restaurant WHERE ID = 1), (SELECT $node_id FROM City WHERE ID =1))
, ((SELECT $node_id FROM Restaurant WHERE ID = 2), (SELECT $node_id FROM City WHERE ID =2))
, ((SELECT $node_id FROM Restaurant WHERE ID = 3), (SELECT $node_id FROM City WHERE ID =3));
/* Insert data into the friendOf edge */
INSERT INTO friendOf
VALUES ((SELECT $NODE_ID FROM Person WHERE ID = 1), (SELECT $NODE_ID FROM Person WHERE ID = 2))
, ((SELECT $NODE_ID FROM Person WHERE ID = 2), (SELECT $NODE_ID FROM Person WHERE ID = 3))
, ((SELECT $NODE_ID FROM Person WHERE ID = 3), (SELECT $NODE_ID FROM Person WHERE ID = 1))
, ((SELECT $NODE_ID FROM Person WHERE ID = 4), (SELECT $NODE_ID FROM Person WHERE ID = 2))
, ((SELECT $NODE_ID FROM Person WHERE ID = 5), (SELECT $NODE_ID FROM Person WHERE ID = 4));
, ((SELECT $NODE_ID FROM Person WHERE ID = 6), (SELECT $NODE_ID FROM Person WHERE ID = 3))
, ((SELECT $NODE_ID FROM Person WHERE ID = 7), (SELECT $NODE_ID FROM Person WHERE ID = 6))
, ((SELECT $NODE_ID FROM Person WHERE ID = 9), (SELECT $NODE_ID FROM Person WHERE ID = 7))
, ((SELECT $NODE_ID FROM Person WHERE ID = 10), (SELECT $NODE_ID FROM Person WHERE ID = 9))
, ((SELECT $NODE_ID FROM Person WHERE ID = 8), (SELECT $NODE_ID FROM Person WHERE ID = 5))
, ((SELECT $NODE_ID FROM Person WHERE ID = 11), (SELECT $NODE_ID FROM Person WHERE ID = 1))
, ((SELECT $NODE_ID FROM Person WHERE ID = 10), (SELECT $NODE_ID FROM Person WHERE ID = 11))
, ((SELECT $NODE_ID FROM Person WHERE ID = 7), (SELECT $NODE_ID FROM Person WHERE ID = 11))
, ((SELECT $NODE_ID FROM Person WHERE ID = 3), (SELECT $NODE_ID FROM Person WHERE ID = 13))
, ((SELECT $NODE_ID FROM Person WHERE ID = 11), (SELECT $NODE_ID FROM Person WHERE ID = 12))
, ((SELECT $NODE_ID FROM Person WHERE ID = 12), (SELECT $NODE_ID FROM Person WHERE ID = 13))
INSERT INTO siblingOf
VALUES
((SELECT $NODE_ID FROM Person WHERE ID = 11), (SELECT $NODE_ID FROM Person WHERE ID = 12)),
((SELECT $NODE_ID FROM Person WHERE ID = 12), (SELECT $NODE_ID FROM Person WHERE ID = 13))
The above data can be visualized into a graph like below
For simplicity only the major nodes inside the graph are included. We shall see how SHORTEST_PATH function can be applied in the above case for traversing through the friends as well as sibling nodes from a given node or by means of a search condition.
Simple Scenario – MATCH function
First lets try a simple scenario of getting every direct friend of John. Looking at the graph, we can see that for this requirement we need to traverse just a single level of node using the relationship friendof. Since this doesnt require any repeated or recursive logic, this can be directly achieved using a MATCH function as shown below
SELECT p1.name AS Person,p2.name AS friend
FROM Person AS p1, friendOf ,Person AS p2
WHERE MATCH(p1-(friendOf)->p2)
AND p1.name = 'John'
The result of the code will include the direct friends of John which is Mary as per the graph
Repetitive Traversal – SHORTEST_PATH function
Now lets go one step further and gets next level friends (friends of friends) for John. In this case we need to repeat the graph traversal to one more level.
This is made possible by the SHORTEST_PATH function which accepts an arbitrary length pattern as parameter and traverses the path based on it. Since we need only up to second level we can set to repeat the pattern up to 2 times. Also since the path includes multiple nodes we would require using string concatenation function like STRING_AGG to return the nodes as a delimited list. Also the edge table and node table which has to be traversed repeatedly should be included with FOR PATH clause to enable the recursive processing.
The modified query will look like below
SELECT p1.name AS Person,
STRING_AGG(p2.name, '->') WITHIN GROUP (GRAPH PATH) AS FriendPath,
LAST_VALUE(p2.name) WITHIN GROUP (GRAPH PATH) AS Friend
FROM Person AS p1, friendOf FOR PATH,Person FOR PATH AS p2
WHERE MATCH(SHORTEST_PATH(p1(-(friendOf)->p2){1,2}))
AND p1.name = 'John'
The result of the above query will be as below
Person FriendPath Friend
--------------------------
John Mary Mary
John Mary->Alice Alice
The results shows two paths one direct friend of John i.e Mary and next level will have Mary’s friends which is Alice. The FriendsPath gives entire path of nodes traversed and LAST_VALUE function is used to return the last friend node at the current traversed level.
Extending this to one more level, the query looks like this. An additional column based on COUNT function is also included to count the level being traversed each time
SELECT p1.name AS Person,
STRING_AGG(p2.name, '->') WITHIN GROUP (GRAPH PATH) AS FriendPath,
LAST_VALUE(p2.name) WITHIN GROUP (GRAPH PATH) AS Friend,
COUNT(p2.Name) WITHIN GROUP(GRAPH PATH) AS TraversedLevel
FROM Person AS p1, friendOf FOR PATH,Person FOR PATH AS p2
WHERE MATCH(SHORTEST_PATH(p1(-(friendOf)->p2){1,3}))
AND p1.name = 'John'
And the result looks like the below
Person FriendPath Friend TraversedLevel
-------------------------------------------
John Mary Mary 1
John Mary->Alice Alice 2
John Mary->Alice->John John 3
John Mary->Alice->Leone Leone 3
Till now we were considering scenarios involving a single relationship. Now lets consider a case where two relationships are involved.
The problem statement is as follows
Get the details of all people along with restaurants they liked and location of the restaurant, for those who are friends of the people who liked a restaurant located in Seattle.
For the above problem statement we would require three sets of relationships based on FriendOf, liked and LocatedIn
Since this doesnt require any recursive or repetitive logic, we dont need to use SHORTEST_PATH function in this case
The corresponding query can be given as below
SELECT p1.name AS LikedPerson,r2.name AS LikedRestaurant,c2.name as LocatedIn,p2.name as FriendOf,r1.name AS LikedRestaurant_Seattle,c1.name AS LocatedIn_Seattle
FROM Person AS p1, likes as l1, Restaurant AS r1,locatedIn AS ln1,City AS c1,friendOf,Person AS p2,Restaurant AS r2,likes as l2,City AS c2,locatedIn AS ln2
WHERE MATCH (c2<-(ln2)-r2<-(l2)-p1-(friendOf)->p2-(l1)->r1-(ln1)->c1)
AND c1.name = 'Seattle'
The result looks as below
LikedPerson LikedRestaurant LocatedIn FriendOf LikedRestaurant_Seattle LocatedIn_Seattle
--------------------------------------------------------------------------------------------------
John Taco Dell Bellevue Mary Ginger and Spice Seattle
Jacob Noodle Land Redmond Mary Ginger and Spice Seattle
The results shows that there are two people John and Jacob who are friends of Mary who liked the restaurant Ginger and Spice which is located in Seattle. The results also returns the restaurants liked by John and Jacob and gives their location as well.
Multiple Repetitive Traversal – SHORTEST_PATH with LAST_NODE
Now lets try a recursive (repeating) relationships example using SHORTEST_PATH function
Consider the scenario where we need to get details of all persons who are friends with person who is a sibling of Leone
The persons can be at any level in the friends network with the person who is in the sibling network of Leone. As such, this would require usage of SHORTEST_PATH function on both the relationships for multi level traversal.
Also for the above scenario we would require first traversing friends of person till the last level and then associate the last level node to other nodes using sibling relationship until we reach Leone. So its like one relationship recursively traversed till it reaches the end level and from the last node, start the traversal of the second relationship until we reach the intended node Leone. This is made possible by the use of LAST_NODE within the SHORTEST_PATH function call based on sibling relationship.
When SHORTEST_PATH sees the LAST_NODE, it takes the latest node of the last relationship traversed and starts from there by applying the current specified relationship (siblingOf in this case)
Based on the above approach, the query can be written as per below
SELECT *
FROM
(
SELECT
Person1.name AS PersonName,
STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends,
STRING_AGG(Person3.name, '->') WITHIN GROUP (GRAPH PATH) AS Siblings,
LAST_VALUE(Person2.name) WITHIN GROUP (GRAPH PATH) AS Sibling,
LAST_VALUE(Person3.name) WITHIN GROUP (GRAPH PATH) AS FinalPerson
FROM
Person AS Person1,
friendOf FOR PATH AS fo,
Person FOR PATH AS Person2,
Person FOR PATH AS Person3,
SiblingOf FOR PATH AS so
WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2)+) AND SHORTEST_PATH(LAST_NODE(Person2)(-(so)->Person3)+))
)t
WHERE FinalPerson = 'Leone'
The query will give you a result as below
PersonName Friends Siblings Sibling FinalPerson
---------------------------------------------------------
Janet Henry Micheal->Leone Henry Leone
Peter Henry Micheal->Leone Henry Leone
Marc Janet->Henry Micheal->Leone Henry Leone
So there are three persons returned in the resultset. Janet who is a friend of Henry will be returned as Henry is a sibling to Leone through relationship traversed through Micheal. Similarly Peter will also be returned who has a friendOf relationship with Henry. Another addition to the result would be Marc who has friendOf relationship with Henry through Janet.
Thus Janet, Peter and Marc are the persons who are friends with a person (Henry) who is a sibling of Leone.
Mixing Repetitive and Non-repetitive Relationships
Before concluding our findings, lets see one more example where we shall traverse the graph in opposite direction and use a non-repetitive relationship in addition to the repetitive ones
The scenario requires finding all siblings who has in their friends network a friend who has liked a restaurant which is situated at Bellevue
The above case involves traversing two repetitive relationships, sibilingOf and friendOf and also determining the liked restaurant based on the simple likes relationship from the LAST_NODE of the friends collection. Hence this shall be regarded as an example for hybrid traversal which includes recursive and simple paths interelated
Accordingly, the query can be written as below
SELECT *
FROM
(
SELECT
Person1.name AS PersonName,
STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Siblings,
STRING_AGG(Person3.name, '->') WITHIN GROUP (GRAPH PATH) AS FriendsOfSiblings,
LAST_VALUE(Person2.name) WITHIN GROUP (GRAPH PATH) AS Sibling,
LAST_VALUE(Person3.name) WITHIN GROUP (GRAPH PATH) AS FinalFriend,
r.Name as LikedRestaurant,
r.city as RestaurantCity
FROM
Person AS Person1,
friendOf FOR PATH AS fo,
Person FOR PATH AS Person2,
Person FOR PATH AS Person3,
SiblingOf FOR PATH AS so,
likes AS l,
Restaurant AS r
WHERE MATCH(SHORTEST_PATH(Person1(-(so)->Person2)+) AND SHORTEST_PATH(LAST_NODE(Person2)(<-(fo)-Person3)+) AND LAST_NODE(Person3)-(l)->r)
) AS r
WHERE RestaurantCity = 'Bellevue'
If you check the above query, it includes two recursive traversals of the graph using siblingOf and friendOf relationships and includes a simple traversal at the end based on likes relationship.
The result would be obtained as per the below
PersonName Siblings FriendsOfSiblings Sibling FinalFriend LikedRestaurant RestaurantCity
--------------------------------------------------------------------------------------------------
Micheal Leone Alice->Mary->John Leone John Taco Dell Bellevue
Henry Micheal->Leone Alice->Mary->John Leone John Taco Dell Bellevue
On analyzing the result we can see that there are two persons Micheal and Henry who has friends in the network of their siblings who has liked the restaurant in Bellevue city. In both of the cases, their sibling Leone has friend John who has liked Taco Dell restaurant in Bellevue which is what you will getting in your result.
Conclusion
As seen from the above illustrations, SHORTEST_PATH is a very useful addition to SQL Server GRAPH table functions which helps in repetitive or recursive traversal of the graph based on a defined relationship.