Cypher is the query language used to manage the Neo4j graph DB. Like other languages, it offers a set of keywords and constructs. This document explains them with simple and complex examples, all taken from real-life queries.
Ref: https://neo4j.com/developer/kb/understanding-how-merge-works/
MATCH ... RETURN is used for returning the matched result. Below are examples.
match (t:Tweet) return COUNT(t)
This query returns the count of Tweet nodes in the DB. Here Tweet is the label.
match (t:User) return t
This query returns all User nodes in the DB. Here User is the label.
match (u:User {screen_name:"ishafoundation"}) return u
This query returns the User whose screen_name is "ishafoundation" (Ref: below screenshot).
match (u:User {screen_name:'Manu'})-[:POSTS]->(t:Tweet) return max(t.id) AS max_id
It matches the user with the given screen_name (say Manu) and the Tweet nodes linked to that user through a POSTS relationship.
In other words, it gives the maximum tweet ID among the tweets posted by the user Manu.
match (u:User {screen_name:'dpkmr'})-[:POSTS]->(t:Tweet) return t.id
It gives the IDs of all tweets posted by the user dpkmr (Ref: below screenshot).
match (u:User {screen_name:'dpkmr'})<-[m:MENTIONS]-(t:Tweet) WHERE m.method="mention_search" return t AS id
It matches tweets in which the user dpkmr is mentioned, restricted to MENTIONS relationships whose method property is "mention_search" (Ref: below screenshot).
match (t:Tweet) where t.text CONTAINS 'शुभं करोति' return max(t.id) AS max_id
It matches all tweets whose text contains 'शुभं करोति' and returns the maximum tweet ID among them.
match(s:User {screen_name:"abc"})-[r:DM]->(u:User)
return r
It returns the DM relationships going from the user abc to other users.
match(u:User)-[:FOLLOWS]->(s:User {name:"abc"}) return u
It matches all users who follow the user abc.
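CONTAINS is case-sensitive. A minimal sketch of a case-insensitive variant of the text search above, assuming the search phrase is passed in as a parameter named $phrase (a hypothetical name, not part of the original queries):

```cypher
// $phrase is a hypothetical parameter supplied by the client
MATCH (t:Tweet)
WHERE toLower(t.text) CONTAINS toLower($phrase)
RETURN t.id AS id, t.text AS text
ORDER BY t.id DESC
LIMIT 10
```

Lower-casing both sides keeps the comparison symmetric; the LIMIT is only there to keep the output short.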
CREATE CONSTRAINT is used to create a constraint. Below are examples.
CREATE CONSTRAINT ON (t:Tweet) ASSERT t.id IS UNIQUE;
This creates a constraint on the label Tweet so that no two Tweet nodes can have the same id. If a user tries to create another Tweet node with an existing id, the write operation fails.
CREATE CONSTRAINT ON (u:User) ASSERT u.screen_name IS UNIQUE;
This creates a constraint on the label User so that no two User nodes can have the same screen_name. If a user tries to create another User node with an existing screen_name, the write operation fails.
To list the existing constraints, use CALL db.constraints or the :schema command in the Neo4j Browser.
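The ON ... ASSERT form and db.constraints shown above belong to older Neo4j releases. As a sketch, assuming a recent version (Neo4j 4.4 or 5.x), the same constraint and listing could be written as below; the constraint name tweet_id_unique is a made-up example.

```cypher
// Newer constraint syntax; tweet_id_unique is a hypothetical constraint name
CREATE CONSTRAINT tweet_id_unique IF NOT EXISTS
FOR (t:Tweet) REQUIRE t.id IS UNIQUE;

// List all constraints
SHOW CONSTRAINTS;
```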
With UNWIND, any list can be transformed back into individual rows.
MERGE matches a pattern or creates it if it does not exist.
SET updates or creates a property.
Below are examples
Example of UNWIND, MERGE, SET
UNWIND $users AS u
WITH u
MERGE (user:User {screen_name:u.screen_name})
SET user.name = u.name,
user.location = u.location,
user.followers = u.followers_count,
user.following = u.friends_count,
user.statuses = u.statuses_count,
user.url = u.url,
user.profile_image_url = u.profile_image_url
MERGE (mainUser:User {screen_name:$screen_name})
MERGE (mainUser)-[:FOLLOWS]->(user)
In the above example,
UNWIND converts the list of users in $users into rows, one per user (say Manu).
The first MERGE matches or creates a node with label User and the screen_name of that user (Manu); the SET that follows updates its properties.
The second MERGE matches or creates the User node whose screen_name is $screen_name (say deepak).
The third MERGE matches or creates a FOLLOWS relationship from the deepak node to the Manu node.
In summary, it creates a FOLLOWS relationship where deepak is following Manu.
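To try the pattern above without external parameters, the sketch below uses a literal one-element list in place of $users and a literal screen_name in place of $screen_name. Manu and deepak are the sample names from the explanation; note that running it writes to the database.

```cypher
// Literal stand-in for the $users parameter
UNWIND [{screen_name: 'Manu', name: 'Manu'}] AS u
MERGE (user:User {screen_name: u.screen_name})
SET user.name = u.name
// Literal stand-in for the $screen_name parameter
MERGE (mainUser:User {screen_name: 'deepak'})
MERGE (mainUser)-[:FOLLOWS]->(user)
RETURN mainUser.screen_name AS follower, user.screen_name AS followed
```

The RETURN line is only there to make the direction of the relationship visible.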
UNWIND $users AS u
WITH u
MERGE (user:User {screen_name:u.screen_name})
SET user.name = u.name,
user.location = u.location,
user.followers = u.followers_count,
user.following = u.friends_count,
user.statuses = u.statuses_count,
user.url = u.url,
user.profile_image_url = u.profile_image_url
MERGE (mainUser:User {screen_name:$screen_name})
MERGE (user)-[:FOLLOWS]->(mainUser)
The above example creates a FOLLOWS relationship where Manu is following deepak (the reverse of the previous example).
ORDER BY sorts the result
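FOREACH runs update clauses for each element of a list.
ON CREATE SET sets properties only when MERGE creates the node or relationship (not when it matches an existing one).
SPLIT splits a string into a list of strings.
REPLACE replaces all occurrences of a search string with a replacement string.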
Real life example for ORDER BY, FOREACH and ON CREATE
UNWIND $tweets AS t
WITH t
ORDER BY t.id // Sort by tweet ID
WITH t,
t.entities AS e, // Aliases for nested fields
t.user AS u,
t.retweeted_status AS retweet
MERGE (tweet:Tweet {id:t.id}) // Match or create a node for each tweet
SET tweet.id_str = t.id_str,
tweet.text = t.text,
tweet.created_at = t.created_at,
tweet.favorites = t.favorite_count
MERGE (user:User {screen_name:u.screen_name}) // Match or create a node for each user
SET user.name = u.name,
user.location = u.location,
user.followers = u.followers_count,
user.following = u.friends_count,
user.statuses = u.statuses_count,
user.profile_image_url = u.profile_image_url
MERGE (user)-[:POSTS]->(tweet) // Relationship: user posted the tweet
MERGE (source:Source {name:REPLACE(SPLIT(t.source, ">")[1], "</a", "")}) // Match or create the source node
MERGE (tweet)-[:USING]->(source) // Relationship: tweet is using the source
FOREACH (h IN e.hashtags |
MERGE (tag:Hashtag {name:LOWER(h.text)})
MERGE (tag)<-[:TAGS]-(tweet)
) // For each hashtag, match or create the tag node and link the tweet to it
FOREACH (u IN e.urls |
MERGE (url:Link {url:u.expanded_url})
MERGE (tweet)-[:CONTAINS]->(url)
) // For each URL, match or create the Link node and link the tweet to it
FOREACH (m IN e.user_mentions |
MERGE (mentioned:User {screen_name:m.screen_name})
ON CREATE SET mentioned.name = m.name
MERGE (tweet)-[:MENTIONS]->(mentioned)
) // For each user mention, match or create the mentioned User node and link the tweet to it
FOREACH (r IN [r IN [t.in_reply_to_status_id] WHERE r IS NOT NULL] |
MERGE (reply_tweet:Tweet {id:r})
MERGE (tweet)-[:REPLY_TO]->(reply_tweet)
) // If the tweet is a reply, match or create the replied-to Tweet node and link the tweet to it
FOREACH (retweet_id IN [x IN [retweet.id] WHERE x IS NOT NULL] |
MERGE (retweet_tweet:Tweet {id:retweet_id})
MERGE (tweet)-[:RETWEETS]->(retweet_tweet)
) // If the tweet is a retweet, match or create the retweeted Tweet node and link the tweet to it
In summary, for each tweet, taken in sorted order of ID, the above query links the tweet to the user who posted it, its source, the URLs it contains, the users it mentions, the tweet it replies to and the tweet it retweets. Each hashtag is linked to the tweets that use it.
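The source name in the query relies on SPLIT and REPLACE. A small read-only sketch of what that expression extracts, assuming t.source holds an HTML anchor of the kind shown (the sample string is an assumed value, not taken from real data):

```cypher
// Assumed sample value for t.source
WITH '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>' AS source
RETURN SPLIT(source, ">") AS parts,
       REPLACE(SPLIT(source, ">")[1], "</a", "") AS name
```

SPLIT cuts the string at every ">", the element at index 1 is the link text followed by "</a", and REPLACE strips that suffix, leaving the bare client name.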
ORDER BY example
UNWIND $tweets AS t
WITH t
ORDER BY t.id
WITH t,
t.entities AS e,
t.user AS u,
t.retweeted_status AS retweet
MERGE (tweet:Tweet {id:t.id}) // Match or create the Tweet node
SET tweet.id_str = t.id_str,
tweet.text = t.text,
tweet.created_at = t.created_at,
tweet.favorites = t.favorite_count
MERGE (user:User {screen_name:u.screen_name}) // Match or create the User node
SET user.name = u.name,
user.location = u.location,
user.followers = u.followers_count,
user.following = u.friends_count,
user.statuses = u.statuses_count,
user.profile_image_url = u.profile_image_url
MERGE (user)-[:POSTS]->(tweet) // User posted the tweet
MERGE (source:Source {name:t.source})
MERGE (tweet)-[:USING]->(source) // Tweet is using the source
FOREACH (h IN e.hashtags |
MERGE (tag:Hashtag {name:LOWER(h.text)})
MERGE (tag)<-[:TAGS]-(tweet)
) // Link the hashtags used in the tweet
FOREACH (u IN e.urls |
MERGE (url:Link {url:u.expanded_url})
MERGE (tweet)-[:CONTAINS]->(url)
) // Link the URLs in the tweet
FOREACH (m IN e.user_mentions |
MERGE (mentioned:User {screen_name:m.screen_name})
ON CREATE SET mentioned.name = m.name
MERGE (tweet)-[mts:MENTIONS]->(mentioned)
SET mts.method = 'mention_search'
) // Link the users mentioned in the tweet
FOREACH (r IN [r IN [t.in_reply_to_status_id] WHERE r IS NOT NULL] |
MERGE (reply_tweet:Tweet {id:r})
MERGE (tweet)-[:REPLY_TO]->(reply_tweet)
) // Link the replied-to tweet
FOREACH (retweet_id IN [x IN [retweet.id] WHERE x IS NOT NULL] |
MERGE (retweet_tweet:Tweet {id:retweet_id})
MERGE (tweet)-[:RETWEETS]->(retweet_tweet)
) // Link the retweeted tweet
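The reply and retweet steps use FOREACH over a filtered one-element list as a conditional: the list is empty when the value is null, so the body runs zero times. A read-only sketch of that filtering, with literal values standing in for t.in_reply_to_status_id:

```cypher
// null is filtered out, so FOREACH over the first list would do nothing;
// a real value survives, so FOREACH over the second list would run once
RETURN [x IN [null] WHERE x IS NOT NULL] AS when_missing,
       [x IN [42] WHERE x IS NOT NULL] AS when_present
```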
ORDER BY, ON CREATE
UNWIND $tweets AS t
WITH t
ORDER BY t.id
WITH t,
t.entities AS e,
t.user AS u,
t.retweeted_status AS retweet
MERGE (tweet:Tweet {id:t.id})
SET tweet.id_str = t.id_str,
tweet.text = t.text,
tweet.created_at = t.created_at,
tweet.favorites = t.favorite_count
MERGE (user:User {screen_name:u.screen_name})
SET user.name = u.name,
user.location = u.location,
user.followers = u.followers_count,
user.following = u.friends_count,
user.statuses = u.statuses_count,
user.profile_image_url = u.profile_image_url
MERGE (user)-[:POSTS]->(tweet)
MERGE (source:Source {name:t.source})
MERGE (tweet)-[:USING]->(source)
FOREACH (h IN e.hashtags |
MERGE (tag:Hashtag {name:LOWER(h.text)})
MERGE (tag)<-[:TAGS]-(tweet)
)
FOREACH (u IN e.urls |
MERGE (url:Link {url:u.expanded_url})
MERGE (tweet)-[:CONTAINS]->(url)
)
FOREACH (m IN e.user_mentions |
MERGE (mentioned:User {screen_name:m.screen_name})
ON CREATE SET mentioned.name = m.name
MERGE (tweet)-[:MENTIONS]->(mentioned)
)
FOREACH (r IN [r IN [t.in_reply_to_status_id] WHERE r IS NOT NULL] |
MERGE (reply_tweet:Tweet {id:r})
MERGE (tweet)-[:REPLY_TO]->(reply_tweet)
)
FOREACH (retweet_id IN [x IN [retweet.id] WHERE x IS NOT NULL] |
MERGE (retweet_tweet:Tweet {id:retweet_id})
MERGE (tweet)-[:RETWEETS]->(retweet_tweet)
)
COUNT counts the number of matching rows.
LIMIT limits the number of results.
<- gives the direction of a relationship: (m)<-[:KNOWS]-(n) is a relationship of type KNOWS from n to m.
Example of COUNT and LIMIT
MATCH (h:Hashtag)<-[:TAGS]-(t:Tweet)<-[:POSTS]-(u:User {screen_name:'dpkmr'}) WITH h, COUNT(h) AS Hashtags ORDER BY Hashtags DESC LIMIT 5 RETURN h.name AS tag_name, Hashtags
It returns the hashtags used by the user dpkmr in his tweets, together with the number of times each was used. The result is sorted by that count in descending order and limited to a maximum of 5 rows (Ref: below screenshot).
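The same COUNT, ORDER BY and LIMIT combination works for other rankings. As a sketch using the labels from this document, the query below lists the five users with the most posted tweets:

```cypher
MATCH (u:User)-[:POSTS]->(t:Tweet)
RETURN u.screen_name AS screen_name, count(t) AS tweets
ORDER BY tweets DESC
LIMIT 5
```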
DETACH DELETE deletes a node and all relationships connected to it. DELETE alone can delete a relationship, as below.
MATCH (n { name: 'Andy' })-[r:KNOWS]->() DELETE r
The above query deletes the KNOWS relationships going out from the node named Andy.
Ref: https://neo4j.com/docs/cypher-manual/current/clauses/delete/
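For comparison, a sketch of DETACH DELETE on the same sample node: it removes the node named Andy together with every relationship attached to it, whereas a plain DELETE on a node that still has relationships fails.

```cypher
MATCH (n {name: 'Andy'})
DETACH DELETE n
```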
MATCH(bucket:DMCheckBucket)WHERE NOT (bucket)-[:DMCHECKCLIENT]->()
WITH bucket LIMIT 1
MATCH(client:DMCheckClient {id:'1278883139272626183'})
MERGE(bucket)-[:DMCHECKCLIENT]->(client)
return bucket.id
The above command uses WHERE NOT with a pattern to find a bucket that has no outgoing DMCHECKCLIENT relationship, and then links that bucket to the given client.
Ref: https://stackoverflow.com/questions/10952332/return-node-if-relationship-is-not-present
REMOVE removes a property from a node.
MATCH(bucket:DMCheckBucket) WHERE NOT (bucket)-[:DMCHECKCLIENT]->() REMOVE bucket.dead_datetime
The above example removes the property named dead_datetime from the buckets that have no DMCHECKCLIENT relationship.
Ref: https://community.neo4j.com/t/rename-property-name/5873
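The reference above is about renaming a property, which Cypher does in two steps: copy the value to the new name, then REMOVE the old one. A minimal sketch, where dead_at is a hypothetical new property name:

```cypher
MATCH (b:DMCheckBucket)
WHERE b.dead_datetime IS NOT NULL
SET b.dead_at = b.dead_datetime   // dead_at is a hypothetical new name
REMOVE b.dead_datetime
```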
exists() checks the existence of a property.
MATCH(b:DMCheckBucket) where exists(b.dead_datetime) return b
The above example returns the nodes on which the property dead_datetime exists.
Ref: https://stackoverflow.com/questions/33676844/neo4jclient-how-to-check-if-property-exists
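The exists(property) form used above is deprecated in newer Neo4j releases and no longer accepted in 5.x; IS NOT NULL expresses the same check. A sketch of the equivalent query:

```cypher
MATCH (b:DMCheckBucket)
WHERE b.dead_datetime IS NOT NULL
RETURN b
```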
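DISTINCT returns unique values.
LABELS returns the labels of a node.
MATCH (n) RETURN distinct labels(n)
The above query returns all distinct labels, as one list of labels per distinct combination.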
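A related sketch: grouping by labels(n) shows how many nodes carry each label combination, which gives a quick overview of what the database contains.

```cypher
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC
```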
WITH passes the result of one query part on to the next, so it can be used to filter or limit the match result before continuing.
MATCH(bucket:DMCheckBucket)WHERE NOT (bucket)-[:DMCHECKCLIENT]->()
WITH bucket LIMIT 1
MATCH(client:DMCheckClient {id:'1278883139272626183'})
MERGE(bucket)-[:DMCHECKCLIENT]->(client)
return bucket.id
The above command finds the buckets that have no DMCHECKCLIENT relationship, limits the result to 1, and then adds the relationship to this one bucket.
Ref: https://neo4j.com/docs/cypher-manual/current/clauses/with/
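WITH is also the place to filter on an aggregate, since a WHERE directly after MATCH cannot see it. A sketch using this document's labels; the threshold 10 is an arbitrary example value:

```cypher
MATCH (u:User)-[:POSTS]->(t:Tweet)
WITH u, count(t) AS tweets
WHERE tweets > 10          // arbitrary example threshold
RETURN u.screen_name AS screen_name, tweets
ORDER BY tweets DESC
```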
We are experimenting with API-based info for the real tweet below.
Neo4j query
match(t:Tweet {id_str : "1244485563698348034"}) return t
Neo4j output
Textual
Graphical output
The graphical output also provides info on the associated nodes. In the example below, the highlighted node is the tweet under consideration.
The example below shows the DM users with respect to a given tweet.
The example below shows the DM users with respect to a given user.
The example below filters, for a given tweet, the retweeting users who are DM capable.
From 3.x onwards, there is no upper limit on the number of nodes and relationships (Ref: https://neo4j.com/blog/neo4j-3-0-massive-scale-developer-productivity/).
Earlier versions had a limit (Ref: https://dba.stackexchange.com/questions/186968/neo4j-community-edition-db-size-limit).
To check the size of the database on disk, run 'du -hc data/databases/' in a shell from the Neo4j home directory (Ref: https://neo4j.com/developer/kb/understanding-database-growth/).
To determine the version and edition of Neo4j, refer to https://neo4j.com/developer/kb/cypher-to-determine-version-and-edition-of-neo4j/
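The query described on that page can be sketched as follows; it uses the built-in dbms.components() procedure:

```cypher
CALL dbms.components() YIELD name, versions, edition
UNWIND versions AS version
RETURN name, version, edition
```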
For the py2neo Graph API, refer to https://py2neo.org/v4/database.html#py2neo.database.Graph
To rename a relationship type, create a relationship of the new type and then delete the old one.
Ref: https://community.neo4j.com/t/change-relationships-name/6473
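A minimal sketch of that approach, assuming the POSTS relationships of this document should become TWEETED (a hypothetical new type). Properties are copied before the old relationship is deleted:

```cypher
MATCH (u:User)-[old:POSTS]->(t:Tweet)
CREATE (u)-[r:TWEETED]->(t)   // TWEETED is a hypothetical new type
SET r = properties(old)       // copy the properties of the old relationship
DELETE old
```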
To set several properties at once from a map, use += with SET.
Ref: https://neo4j.com/developer/kb/understanding-how-merge-works/
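A sketch of += with SET: the map on the right is merged into the node's properties, so keys in the map are added or overwritten and all other properties stay untouched. The property values are made-up examples.

```cypher
MATCH (u:User {screen_name: 'dpkmr'})
SET u += {location: 'Bengaluru', url: 'https://example.com'}   // example values
RETURN u
```

By contrast, SET u = {...} would replace all properties of the node with the map.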
To select random nodes, use rand().
Ref: https://stackoverflow.com/questions/12510696/neo4j-is-there-a-way-how-to-select-random-nodes
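A sketch of that idea on the User label: each node gets a random sort key and the first five are kept. This scans all User nodes, so it suits small to medium data sets.

```cypher
MATCH (u:User)
WITH u, rand() AS r
ORDER BY r
LIMIT 5
RETURN u
```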
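The query below returns all outgoing relationships of a node, together with the node and its neighbours.
match(c:ClientForService {id:"1063283370"})-[r]->(b) RETURN r,c,b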
Ref: https://stackoverflow.com/questions/38423683/get-all-relationships-for-a-node-with-cypher
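The pattern above only follows outgoing relationships. As a sketch, dropping the arrow head matches both directions, and type(r) shows the relationship type:

```cypher
MATCH (c:ClientForService {id: "1063283370"})-[r]-(b)
RETURN type(r) AS relationship, startNode(r) = c AS outgoing, b
```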
https://neo4j.com/docs/cypher-refcard/current/
https://github.com/neo4j-contrib/twitter-neo4j/blob/master/docker/import_user.py
https://github.com/krdpk17/twitter-neo4j/blob/master/docker/import_user.py
https://py2neo.org/2.0/intro.html
https://stackoverflow.com/questions/28144751/whats-the-cypher-script-to-delete-a-node-by-id
https://stackoverflow.com/questions/32742751/what-is-the-difference-between-multiple-match-clauses-and-a-comma-in-a-cypher-qu
https://stackoverflow.com/questions/24094882/how-can-i-make-a-string-contain-filter-on-neo4j-cypher
https://neo4j.com/docs/cypher-manual/current/clauses/delete/
https://py2neo.org/v4/database.html#py2neo.database.Graph