Understanding Cypher queries using Neo4j and Twitter

Introduction

Neo4j graph databases are queried and managed with the Cypher language. Like other query languages, it offers a set of keywords and constructs. This document explains them with simple and complex examples, all taken from real-life queries.

Database schema visualisation

Best practices for patterns

Ref: https://neo4j.com/developer/kb/understanding-how-merge-works/

MATCH and RETURN

MATCH finds the parts of the graph that fit a pattern, and RETURN returns the matched results. Below are examples.

match (t:Tweet) return COUNT(t)

This query returns the number of Tweet nodes in the database. Here, Tweet is the label.

match (t:User) return t

This query returns all User nodes in the database (not their count). Here, User is the label.

match (u:User {screen_name:"ishafoundation"}) return u

This query returns the User node whose screen_name is "ishafoundation". Please refer to the screenshot below.

match (u:User {screen_name:'Manu'})-[:POSTS]->(t:Tweet) return max(t.id) AS max_id

Matches the user with the given screen_name (here, Manu) and every Tweet node connected to that user by an outgoing POSTS relationship.

In other words, it returns the highest tweet ID among the tweets posted by Manu.
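For readers who want to reason about the pattern outside Neo4j, here is a minimal pure-Python model of what the max_id query computes. It is a sketch, not the Neo4j or py2neo API: it assumes a toy graph stored as (start, type, end) relationship triples, with users keyed by screen_name and tweets by numeric id, and the names `rels` and `max_posted_tweet_id` are hypothetical.

```python
from typing import Optional

# Toy graph: one (start, type, end) triple per relationship.
# Users are keyed by screen_name and tweets by numeric id (assumed layout).
Rel = tuple[str, str, int]


def max_posted_tweet_id(rels: list[Rel], screen_name: str) -> Optional[int]:
    """Model of: MATCH (u:User {screen_name:$name})-[:POSTS]->(t:Tweet) RETURN max(t.id)."""
    # Pattern match: keep the end node of every POSTS relationship starting at this user
    ids = [end for start, rel_type, end in rels
           if start == screen_name and rel_type == "POSTS"]
    # max() over zero matched rows is null in Cypher; None plays that role here
    return max(ids) if ids else None


rels = [("Manu", "POSTS", 101), ("Manu", "POSTS", 205), ("dpkmr", "POSTS", 300)]
print(max_posted_tweet_id(rels, "Manu"))  # 205
```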

match (u:User {screen_name:'dpkmr'})-[:POSTS]->(t:Tweet) return t.id

It returns the IDs of all tweets posted by the user with screen_name dpkmr (Ref: screenshot below).

match (u:User {screen_name:'dpkmr'})<-[m:MENTIONS]-(t:Tweet) WHERE m.method="mention_search" return t AS id

It matches the tweets that mention the user dpkmr, keeping only MENTIONS relationships whose method property is "mention_search" (Ref: screenshot below).

match (t:Tweet) where t.text CONTAINS 'शुभं करोति' return max(t.id) AS max_id

It returns the highest ID among all tweets whose text contains 'शुभं करोति'.

match(s:User {screen_name:"abc"})-[r:DM]->(u:User) return r

It returns the DM relationships that go from the user with screen_name abc to other users.

match(u:User)-[:FOLLOWS]->(s:User {name:"abc"}) return u

Matches all users who follow the user named abc.

CONSTRAINT

CREATE CONSTRAINT

CREATE CONSTRAINT creates a schema constraint. Below are examples.

CREATE CONSTRAINT ON (t:Tweet) ASSERT t.id IS UNIQUE;

This creates a constraint on the label Tweet so that no two Tweet nodes can have the same id. If a write tries to create another Tweet node with an existing id, the write fails.

CREATE CONSTRAINT ON (u:User) ASSERT u.screen_name IS UNIQUE;

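This creates a constraint on the label User so that no two User nodes can have the same screen_name. If a write tries to create another User node with an existing screen_name, the write fails. (The ON ... ASSERT syntax above is from Neo4j 3.x/4.x; Neo4j 5 expects CREATE CONSTRAINT FOR (u:User) REQUIRE u.screen_name IS UNIQUE.)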
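The effect of a uniqueness constraint on writes can be sketched in plain Python. This is a toy model under stated assumptions, not how Neo4j is implemented: the hypothetical `ConstrainedLabel` class keeps a dictionary from the constrained value to the node (much like the index Neo4j creates to back a uniqueness constraint), rejects a CREATE with a duplicate value, and lets MERGE reuse the existing node.

```python
class UniqueConstraintViolation(Exception):
    """Raised when a write would give two nodes the same constrained value."""


class ConstrainedLabel:
    """Toy model of one label with a uniqueness constraint on one property.

    Hypothetical helper for illustration only; it is not a Neo4j API.
    """

    def __init__(self, key: str):
        self.key = key      # constrained property, e.g. "id" for Tweet
        self.nodes = []     # every node with this label
        self.by_key = {}    # value -> node, like the index backing the constraint

    def create(self, props: dict) -> dict:
        """Model of CREATE (n:Label {...}): fails on a duplicate value."""
        value = props.get(self.key)
        if value is not None and value in self.by_key:
            raise UniqueConstraintViolation(f"{self.key}={value!r} already exists")
        node = dict(props)
        self.nodes.append(node)
        if value is not None:   # nodes without the property are not constrained
            self.by_key[value] = node
        return node

    def merge(self, value) -> dict:
        """Model of MERGE (n:Label {key: value}): reuse the node if it exists."""
        node = self.by_key.get(value)
        return node if node is not None else self.create({self.key: value})


tweets = ConstrainedLabel("id")
tweets.create({"id": 1, "text": "hello"})
try:
    tweets.create({"id": 1, "text": "duplicate"})
except UniqueConstraintViolation as err:
    print("write rejected:", err)
```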

VIEW CONSTRAINT

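Use CALL db.constraints, or run :schema in Neo4j Browser. From Neo4j 4.2, SHOW CONSTRAINTS is also available, and Neo4j 5 removed db.constraints.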

UNWIND, MERGE, SET

- UNWIND transforms any list back into individual rows.
- MERGE matches a pattern, or creates it if it does not exist.
- SET updates or creates a property.

Below are examples

Example of UNWIND, MERGE, SET

UNWIND $users AS u
WITH u
MERGE (user:User {screen_name:u.screen_name})
SET user.name = u.name,
    user.location = u.location,
    user.followers = u.followers_count,
    user.following = u.friends_count,
    user.statuses = u.statuses_count,
    user.url = u.url,
    user.profile_image_url = u.profile_image_url
MERGE (mainUser:User {screen_name:$screen_name})
MERGE (mainUser)-[:FOLLOWS]->(user)

In the above example:

- UNWIND turns the $users list into one row per user.
- The first MERGE matches or creates the User node with that screen_name (say Manu), and SET updates its properties.
- The second MERGE matches or creates the User node for $screen_name (say deepak).
- The third MERGE creates a FOLLOWS relationship from deepak to Manu.

In summary, it creates a follow relationship where deepak is following Manu.
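The idempotent behaviour of UNWIND + MERGE + SET can be modelled with Python dictionaries. This minimal sketch assumes a toy in-memory graph (the `graph` layout and the `import_followed_users` helper are hypothetical, not py2neo calls) and mirrors only the name and location assignments from the SET clause. Running the same batch twice leaves exactly one node per screen_name and one FOLLOWS pair, which is the point of using MERGE rather than CREATE.

```python
def import_followed_users(graph: dict, users: list[dict], main_screen_name: str) -> dict:
    """Toy model of the UNWIND/MERGE/SET query above (hypothetical helper, not py2neo).

    graph["users"] maps screen_name -> property dict (one node per screen_name);
    graph["follows"] is a set of (follower, followed) screen_name pairs.
    """
    for u in users:                                       # UNWIND $users AS u
        name = u["screen_name"]
        # MERGE (user:User {screen_name:u.screen_name}): reuse or create the node
        user = graph["users"].setdefault(name, {"screen_name": name})
        for prop in ("name", "location"):                 # subset of the SET clause
            if u.get(prop) is None:
                user.pop(prop, None)                      # SET x = null removes the property
            else:
                user[prop] = u[prop]
        # MERGE (mainUser:User {screen_name:$screen_name})
        graph["users"].setdefault(main_screen_name, {"screen_name": main_screen_name})
        # MERGE (mainUser)-[:FOLLOWS]->(user): a set keeps at most one copy
        graph["follows"].add((main_screen_name, name))
    return graph


graph = {"users": {}, "follows": set()}
batch = [{"screen_name": "Manu", "name": "Manu"}]
import_followed_users(graph, batch, "deepak")
import_followed_users(graph, batch, "deepak")    # re-running the batch changes nothing
print(sorted(graph["users"]), graph["follows"])  # ['Manu', 'deepak'] {('deepak', 'Manu')}
```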

UNWIND $users AS u
WITH u
MERGE (user:User {screen_name:u.screen_name})
SET user.name = u.name,
    user.location = u.location,
    user.followers = u.followers_count,
    user.following = u.friends_count,
    user.statuses = u.statuses_count,
    user.url = u.url,
    user.profile_image_url = u.profile_image_url
MERGE (mainUser:User {screen_name:$screen_name})
MERGE (user)-[:FOLLOWS]->(mainUser)

In the above example, the last MERGE points the FOLLOWS relationship from the listed user to the main user, so it creates a follow relationship where Manu is following deepak (the reverse of the previous example).

ORDER BY, FOREACH, ON CREATE, REPLACE, SPLIT

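- ORDER BY sorts the result.
- FOREACH runs update clauses (such as MERGE or SET) once for each element of a list.
- ON CREATE SET sets properties only when MERGE creates a new node or relationship, not when it matches an existing one.
- SPLIT splits a string into a list of strings.
- REPLACE replaces all occurrences of a search string with a replacement string.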
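The SPLIT/REPLACE combination used on t.source in the next example can be tried outside Neo4j. This pure-Python sketch mirrors REPLACE(SPLIT(t.source, ">")[1], "</a", ""); the sample string is an assumed example of the HTML anchor that the Twitter API returns in a tweet's source field, and `source_name` is a hypothetical helper.

```python
from typing import Optional


def source_name(source: str) -> Optional[str]:
    """Mirror of REPLACE(SPLIT(t.source, ">")[1], "</a", "")."""
    parts = source.split(">")           # SPLIT(t.source, ">"); lists are 0-indexed in both
    if len(parts) < 2:
        return None                     # Cypher yields null for an out-of-range index
    return parts[1].replace("</a", "")  # REPLACE(..., "</a", "")


raw = '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'
print(source_name(raw))  # Twitter for Android
```

If a source value is not an anchor, the Cypher expression evaluates to null, and MERGE refuses a null property value, so such rows may need to be filtered before the MERGE.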

Real life example for ORDER BY, FOREACH and ON CREATE

UNWIND $tweets AS t
WITH t
ORDER BY t.id // sort by tweet ID
WITH t,
     t.entities AS e, // alias
     t.user AS u,
     t.retweeted_status AS retweet
MERGE (tweet:Tweet {id:t.id}) // create or update a node for each tweet
SET tweet.id_str = t.id_str,
    tweet.text = t.text,
    tweet.created_at = t.created_at,
    tweet.favorites = t.favorite_count
MERGE (user:User {screen_name:u.screen_name}) // create or update a node for each user
SET user.name = u.name,
    user.location = u.location,
    user.followers = u.followers_count,
    user.following = u.friends_count,
    user.statuses = u.statuses_count,
    user.profile_image_url = u.profile_image_url
MERGE (user)-[:POSTS]->(tweet) // relationship: user posted the tweet
MERGE (source:Source {name:REPLACE(SPLIT(t.source, ">")[1], "</a", "")}) // create or update the source node
MERGE (tweet)-[:USING]->(source) // relationship: tweet was posted using the source
FOREACH (h IN e.hashtags |
  MERGE (tag:Hashtag {name:LOWER(h.text)})
  MERGE (tag)<-[:TAGS]-(tweet)
) // for each hashtag, create or update the Hashtag node and link the tweet to it
FOREACH (u IN e.urls |
  MERGE (url:Link {url:u.expanded_url})
  MERGE (tweet)-[:CONTAINS]->(url)
) // for each URL, create or update the Link node and link the tweet to it
FOREACH (m IN e.user_mentions |
  MERGE (mentioned:User {screen_name:m.screen_name})
  ON CREATE SET mentioned.name = m.name
  MERGE (tweet)-[:MENTIONS]->(mentioned)
) // for each user mention, create or update the mentioned User node and link the tweet to it
FOREACH (r IN [r IN [t.in_reply_to_status_id] WHERE r IS NOT NULL] |
  MERGE (reply_tweet:Tweet {id:r})
  MERGE (tweet)-[:REPLY_TO]->(reply_tweet)
) // if the tweet is a reply, create or update the replied-to Tweet node and link to it
FOREACH (retweet_id IN [x IN [retweet.id] WHERE x IS NOT NULL] |
  MERGE (retweet_tweet:Tweet {id:retweet_id})
  MERGE (tweet)-[:RETWEETS]->(retweet_tweet)
) // if the tweet is a retweet, create or update the original Tweet node and link to it

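In summary, the above query does the following for each tweet, in order of tweet ID:

- links the tweet to the user who posted it
- links the tweet to its source
- for each hashtag, creates or updates the Hashtag node (with a lowercased name) and links the tweet to it
- links each URL in the tweet
- links each mentioned user
- links the tweet it replies to, if any
- links the retweeted tweet, if any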
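Cypher's FOREACH has no WHERE clause of its own, so the query above wraps a possibly-null value in a one-element list and filters out nulls; the loop body then runs once for a reply (or retweet) and zero times otherwise, which also keeps MERGE from ever seeing a null id. This pure-Python sketch models that idiom; `present` and `link_reply` are hypothetical helpers.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def present(value: Optional[T]) -> list[T]:
    """Mirror of [x IN [value] WHERE x IS NOT NULL]: a list with zero or one element."""
    return [x for x in [value] if x is not None]


def link_reply(edges: set, tweet_id: int, in_reply_to_status_id: Optional[int]) -> set:
    """Model of FOREACH (r IN [...] | MERGE (tweet)-[:REPLY_TO]->(reply_tweet))."""
    for r in present(in_reply_to_status_id):   # body runs once for a reply, else never
        edges.add((tweet_id, "REPLY_TO", r))   # adding to a set is idempotent, like MERGE
    return edges


edges: set = set()
link_reply(edges, 2, None)   # not a reply: nothing happens
link_reply(edges, 3, 2)      # reply to tweet 2
link_reply(edges, 3, 2)      # re-import: still one relationship
print(edges)  # {(3, 'REPLY_TO', 2)}
```

The filter tests for null, not truthiness, so an id of 0 would still be kept, as in Cypher.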

ORDER BY example

UNWIND $tweets AS t
WITH t
ORDER BY t.id
WITH t,
     t.entities AS e,
     t.user AS u,
     t.retweeted_status AS retweet
MERGE (tweet:Tweet {id:t.id}) // create or update the Tweet node
SET tweet.id_str = t.id_str,
    tweet.text = t.text,
    tweet.created_at = t.created_at,
    tweet.favorites = t.favorite_count
MERGE (user:User {screen_name:u.screen_name}) // create or update the User node
SET user.name = u.name,
    user.location = u.location,
    user.followers = u.followers_count,
    user.following = u.friends_count,
    user.statuses = u.statuses_count,
    user.profile_image_url = u.profile_image_url
MERGE (user)-[:POSTS]->(tweet) // user posted the tweet
MERGE (source:Source {name:t.source})
MERGE (tweet)-[:USING]->(source) // tweet was posted using the source
FOREACH (h IN e.hashtags |
  MERGE (tag:Hashtag {name:LOWER(h.text)})
  MERGE (tag)<-[:TAGS]-(tweet)
) // link the hashtags used in the tweet
FOREACH (u IN e.urls |
  MERGE (url:Link {url:u.expanded_url})
  MERGE (tweet)-[:CONTAINS]->(url)
) // link the URLs in the tweet
FOREACH (m IN e.user_mentions |
  MERGE (mentioned:User {screen_name:m.screen_name})
  ON CREATE SET mentioned.name = m.name
  MERGE (tweet)-[mts:MENTIONS]->(mentioned)
  SET mts.method = 'mention_search'
) // link the users mentioned in the tweet
FOREACH (r IN [r IN [t.in_reply_to_status_id] WHERE r IS NOT NULL] |
  MERGE (reply_tweet:Tweet {id:r})
  MERGE (tweet)-[:REPLY_TO]->(reply_tweet)
) // link the replied-to tweet
FOREACH (retweet_id IN [x IN [retweet.id] WHERE x IS NOT NULL] |
  MERGE (retweet_tweet:Tweet {id:retweet_id})
  MERGE (tweet)-[:RETWEETS]->(retweet_tweet)
) // link the retweeted tweet

ORDER BY, ON CREATE

UNWIND $tweets AS t
WITH t
ORDER BY t.id
WITH t,
     t.entities AS e,
     t.user AS u,
     t.retweeted_status AS retweet
MERGE (tweet:Tweet {id:t.id})
SET tweet.id_str = t.id_str,
    tweet.text = t.text,
    tweet.created_at = t.created_at,
    tweet.favorites = t.favorite_count
MERGE (user:User {screen_name:u.screen_name})
SET user.name = u.name,
    user.location = u.location,
    user.followers = u.followers_count,
    user.following = u.friends_count,
    user.statuses = u.statuses_count,
    user.profile_image_url = u.profile_image_url
MERGE (user)-[:POSTS]->(tweet)
MERGE (source:Source {name:t.source})
MERGE (tweet)-[:USING]->(source)
FOREACH (h IN e.hashtags |
  MERGE (tag:Hashtag {name:LOWER(h.text)})
  MERGE (tag)<-[:TAGS]-(tweet)
)
FOREACH (u IN e.urls |
  MERGE (url:Link {url:u.expanded_url})
  MERGE (tweet)-[:CONTAINS]->(url)
)
FOREACH (m IN e.user_mentions |
  MERGE (mentioned:User {screen_name:m.screen_name})
  ON CREATE SET mentioned.name = m.name
  MERGE (tweet)-[:MENTIONS]->(mentioned)
)
FOREACH (r IN [r IN [t.in_reply_to_status_id] WHERE r IS NOT NULL] |
  MERGE (reply_tweet:Tweet {id:r})
  MERGE (tweet)-[:REPLY_TO]->(reply_tweet)
)
FOREACH (retweet_id IN [x IN [retweet.id] WHERE x IS NOT NULL] |
  MERGE (retweet_tweet:Tweet {id:retweet_id})
  MERGE (tweet)-[:RETWEETS]->(retweet_tweet)
)

COUNT, <- and LIMIT

- COUNT counts the number of matching rows.
- LIMIT limits the number of results.
- <- gives the direction of a relationship: in (m)<-[:KNOWS]-(n), the KNOWS relationship goes from n to m.

Example of COUNT and LIMIT

MATCH (h:Hashtag)<-[:TAGS]-(t:Tweet)<-[:POSTS]-(u:User {screen_name:'dpkmr'}) WITH h, COUNT(h) AS Hashtags ORDER BY Hashtags DESC LIMIT 5 RETURN h.name AS tag_name, Hashtags

Returns each hashtag used in dpkmr's tweets together with the number of times it was used, sorted by that count in descending order and limited to the top 5 (Ref: screenshot below).
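The COUNT / ORDER BY / LIMIT pipeline in this query behaves like a grouped count followed by a top-N cut. The pure-Python sketch below assumes each matched (h)<-[:TAGS]-(t) path arrives as a (tweet_id, hashtag) pair; `top_hashtags` is a hypothetical helper. One difference to keep in mind: `Counter.most_common` breaks ties by first appearance, while Cypher does not promise any particular order among rows with equal counts.

```python
from collections import Counter


def top_hashtags(rows: list[tuple[int, str]], limit: int = 5) -> list[tuple[str, int]]:
    """Model of: WITH h, COUNT(h) AS Hashtags ORDER BY Hashtags DESC LIMIT 5.

    rows holds one (tweet_id, hashtag_name) pair per matched path (assumed input shape).
    """
    counts = Counter(tag for _tweet_id, tag in rows)  # group by h and COUNT(h)
    return counts.most_common(limit)                  # ORDER BY count DESC, then LIMIT


rows = [(1, "neo4j"), (1, "graph"), (2, "neo4j"), (3, "neo4j"), (3, "graph"), (4, "python")]
print(top_hashtags(rows, limit=2))  # [('neo4j', 3), ('graph', 2)]
```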

DETACH DELETE

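- DETACH DELETE deletes a node and all relationships connected to it.

For example, MATCH (n {name: 'Andy'}) DETACH DELETE n deletes Andy together with all of Andy's relationships. A plain DELETE on a node that still has relationships fails, but plain DELETE can remove relationships on their own:

MATCH (n {name: 'Andy'})-[r:KNOWS]->() DELETE r

The above query deletes only Andy's outgoing KNOWS relationships and leaves all nodes in place.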

Ref: https://neo4j.com/docs/cypher-manual/current/clauses/delete/
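The difference between DELETE and DETACH DELETE can be modelled with a toy graph. In this sketch (the `ToyGraph` class is hypothetical, not a Neo4j API), `delete` refuses to remove a node that still has relationships, mirroring the error a plain DELETE raises, while `detach_delete` first removes every relationship touching the node.

```python
class ToyGraph:
    """Toy model contrasting DELETE and DETACH DELETE (hypothetical, not a Neo4j API)."""

    def __init__(self):
        self.nodes = set()
        self.rels = set()   # (start, type, end) triples

    def _touching(self, node):
        """Every relationship that starts or ends at node."""
        return [r for r in self.rels if node in (r[0], r[2])]

    def delete(self, node):
        """MATCH (n ...) DELETE n: refused while the node still has relationships."""
        if self._touching(node):
            raise ValueError(f"cannot delete {node!r}: it still has relationships")
        self.nodes.discard(node)

    def detach_delete(self, node):
        """MATCH (n ...) DETACH DELETE n: drop its relationships, then the node."""
        self.rels.difference_update(self._touching(node))
        self.nodes.discard(node)


g = ToyGraph()
g.nodes |= {"Andy", "Bob"}
g.rels.add(("Andy", "KNOWS", "Bob"))
g.detach_delete("Andy")
print(g.nodes, g.rels)  # {'Bob'} set()
```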

WHERE

MATCH (bucket:DMCheckBucket) WHERE NOT (bucket)-[:DMCHECKCLIENT]->()
WITH bucket LIMIT 1
MATCH (client:DMCheckClient {id:'1278883139272626183'})
MERGE (bucket)-[:DMCHECKCLIENT]->(client)
RETURN bucket.id

In the above query, the condition WHERE NOT (bucket)-[:DMCHECKCLIENT]->() keeps only the buckets that have no outgoing DMCHECKCLIENT relationship.

Ref: https://stackoverflow.com/questions/10952332/return-node-if-relationship-is-not-present

REMOVE

REMOVE removes a property from a node.

MATCH(bucket:DMCheckBucket) WHERE NOT (bucket)-[:DMCHECKCLIENT]->() REMOVE bucket.dead_datetime

The above example removes the dead_datetime property from every DMCheckBucket node that has no outgoing DMCHECKCLIENT relationship.

Ref: https://community.neo4j.com/t/rename-property-name/5873

EXISTS

exists() checks whether a node has a given property.

MATCH(b:DMCheckBucket) where exists(b.dead_datetime) return b

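The above example returns every DMCheckBucket node that has a dead_datetime property. In Neo4j 5, exists() can no longer be used on properties; use WHERE b.dead_datetime IS NOT NULL instead.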

Ref: https://stackoverflow.com/questions/33676844/neo4jclient-how-to-check-if-property-exists
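REMOVE, SET to null, and exists() all act on the same notion of a stored property. This minimal sketch models a node's properties as a Python dictionary (the helper names are hypothetical); it relies on the fact that Neo4j does not store null property values, so setting a property to null removes it and an existence check is simply a key lookup.

```python
def set_property(props: dict, key: str, value) -> None:
    """SET n.key = value; Neo4j stores no nulls, so a null value removes the property."""
    if value is None:
        props.pop(key, None)
    else:
        props[key] = value


def remove_property(props: dict, key: str) -> None:
    """REMOVE n.key; removing a property that is not there is not an error."""
    props.pop(key, None)


def has_property(props: dict, key: str) -> bool:
    """exists(n.key), written n.key IS NOT NULL in newer versions."""
    return key in props


bucket = {"id": "b1", "dead_datetime": "2020-07-01T10:00:00"}  # hypothetical data
print(has_property(bucket, "dead_datetime"))  # True
remove_property(bucket, "dead_datetime")
print(has_property(bucket, "dead_datetime"))  # False
```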

DISTINCT, LABELS

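DISTINCT -> removes duplicate rows from the result

LABELS -> returns the list of labels of a node

MATCH (n) RETURN distinct labels(n)

This query returns all distinct labels used by nodes in the database (more precisely, every distinct list of labels).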
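Because labels(n) returns a list, DISTINCT in this query removes duplicate label lists rather than individual labels: a node with two labels contributes one two-element list. The small Python sketch below models that behaviour; `distinct_labels` and the Admin label in the sample data are hypothetical.

```python
def distinct_labels(nodes: list[list[str]]) -> list[list[str]]:
    """Model of MATCH (n) RETURN DISTINCT labels(n); each element is one node's label list."""
    result: list[list[str]] = []
    for labels in nodes:
        if labels not in result:   # DISTINCT compares whole lists, not single labels
            result.append(labels)
    return result


print(distinct_labels([["User"], ["Tweet"], ["User"], ["User", "Admin"]]))
# [['User'], ['Tweet'], ['User', 'Admin']]
```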

WITH

WITH -> passes the results of one part of a query to the next part, so they can be filtered, aggregated, or limited before the next clause runs

MATCH (bucket:DMCheckBucket) WHERE NOT (bucket)-[:DMCHECKCLIENT]->()
WITH bucket LIMIT 1
MATCH (client:DMCheckClient {id:'1278883139272626183'})
MERGE (bucket)-[:DMCHECKCLIENT]->(client)
RETURN bucket.id

The above query finds buckets that have no outgoing DMCHECKCLIENT relationship, uses WITH bucket LIMIT 1 to keep only one of them, and then links that bucket to the given DMCheckClient.

Ref: https://neo4j.com/docs/cypher-manual/current/clauses/with/
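The bucket-assignment query (filter with WHERE NOT, keep one row with WITH ... LIMIT 1, then MERGE) can be sketched as a plain function. This is a single-threaded toy model under stated assumptions: the `links` dictionary stands in for DMCHECKCLIENT relationships and `assign_free_bucket` is hypothetical. It does not model concurrent transactions, which might both pick the same free bucket in a real database.

```python
from typing import Optional


def assign_free_bucket(buckets: list[str], links: dict[str, str], client_id: str) -> Optional[str]:
    """Model of the bucket-assignment query (hypothetical helper).

    links maps bucket id -> client id, standing in for DMCHECKCLIENT relationships.
    """
    free = [b for b in buckets if b not in links]  # WHERE NOT (bucket)-[:DMCHECKCLIENT]->()
    if not free:
        return None              # no rows reach RETURN
    bucket = free[0]             # WITH bucket LIMIT 1 (Cypher does not promise which one)
    links[bucket] = client_id    # MERGE (bucket)-[:DMCHECKCLIENT]->(client)
    return bucket                # RETURN bucket.id


links = {"b1": "c0"}
print(assign_free_bucket(["b1", "b2", "b3"], links, "1278883139272626183"))  # b2
```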

Real use-case

Analysis of a specific Tweet

We explore the information imported from the Twitter API for the real tweet below.

Neo4j query

match(t:Tweet {id_str : "1244485563698348034"}) return t

Neo4j output

Textual output

Graphical output

The graphical output also shows the associated nodes. In the example below, the highlighted node is the tweet under consideration.

Analysis of DM users

The example below shows the DM users with respect to a given tweet.

The example below shows the DM users with respect to a given user.

The example below filters, for a given tweet, the retweeting users who are DM-capable.

Scale Limits

From Neo4j 3.x onwards, there is no upper limit on the number of nodes and relationships (Ref: https://neo4j.com/blog/neo4j-3-0-massive-scale-developer-productivity/).

Earlier versions had a limit (Ref: https://dba.stackexchange.com/questions/186968/neo4j-community-edition-db-size-limit).

Infra queries

DB size check

In a shell on the machine (or container) running Neo4j, run 'du -hc data/databases/' from the Neo4j home directory (Ref: https://neo4j.com/developer/kb/understanding-database-growth/).

Version check

Ref: https://neo4j.com/developer/kb/cypher-to-determine-version-and-edition-of-neo4j/

Custom port for Bolt and HTTP via py2neo

Ref: https://py2neo.org/v4/database.html#py2neo.database.Graph

FAQs

How to rename a relationship type

Create a relationship of the new type between the same nodes (copying the old relationship's properties), then delete the old relationship.

Ref: https://community.neo4j.com/t/change-relationships-name/6473

How to set multiple properties together

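Use +=. For example, SET n += $props adds or updates only the properties present in the map $props and leaves the node's other properties unchanged, whereas SET n = $props replaces all of the node's properties.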

Ref: https://neo4j.com/developer/kb/understanding-how-merge-works/
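The difference between SET n = $map and SET n += $map can be modelled with dictionaries. In this sketch (hypothetical helper names, not a Neo4j API), `set_all` replaces every property, while `set_merge` only touches the keys present in the map and, as in Cypher, treats a null value as a request to remove that property.

```python
def set_all(props: dict, new: dict) -> None:
    """SET n = $map: replace every property; null entries are simply not stored."""
    props.clear()
    props.update({k: v for k, v in new.items() if v is not None})


def set_merge(props: dict, new: dict) -> None:
    """SET n += $map: add or update only the keys in the map; a null value removes that key."""
    for k, v in new.items():
        if v is None:
            props.pop(k, None)
        else:
            props[k] = v


user = {"screen_name": "dpkmr", "name": "Old name", "location": "Somewhere"}  # hypothetical data
set_merge(user, {"name": "New name", "location": None})
print(user)  # {'screen_name': 'dpkmr', 'name': 'New name'}
```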

How to get random nodes on each execution

Use rand(), for example MATCH (t:Tweet) WITH t, rand() AS r ORDER BY r LIMIT 5 RETURN t.

Ref: https://stackoverflow.com/questions/12510696/neo4j-is-there-a-way-how-to-select-random-nodes
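The rand() recipe works by attaching a random number to each matched row, sorting on it, and keeping the first k rows. The pure-Python sketch below (the `random_nodes` helper is hypothetical) models WITH n, rand() AS r ORDER BY r LIMIT k RETURN n. Because every matched node is given a key and sorted, this approach may become slow when the match covers a very large number of nodes.

```python
import random
from typing import Optional


def random_nodes(nodes: list, k: int, rng: Optional[random.Random] = None) -> list:
    """Model of: WITH n, rand() AS r ORDER BY r LIMIT k RETURN n."""
    rng = rng or random.Random()
    keyed = [(rng.random(), n) for n in nodes]  # one rand() value per row
    keyed.sort(key=lambda pair: pair[0])        # ORDER BY r
    return [n for _r, n in keyed[:k]]           # LIMIT k, then RETURN n


print(random_nodes(["t1", "t2", "t3", "t4"], 2))  # a random pair, usually different on each run
```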

How to list all relationships associated to a node

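match(c:ClientForService {id:"1063283370"})-[r]->(b) RETURN r,c,b

This lists the node's outgoing relationships; use -[r]- instead of -[r]-> to include incoming ones as well.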

Ref: https://stackoverflow.com/questions/38423683/get-all-relationships-for-a-node-with-cypher

Reference

https://neo4j.com/docs/cypher-refcard/current/

https://github.com/neo4j-contrib/twitter-neo4j/blob/master/docker/import_user.py

https://github.com/krdpk17/twitter-neo4j/blob/master/docker/import_user.py

https://py2neo.org/2.0/intro.html

https://stackoverflow.com/questions/28144751/whats-the-cypher-script-to-delete-a-node-by-id

https://stackoverflow.com/questions/32742751/what-is-the-difference-between-multiple-match-clauses-and-a-comma-in-a-cypher-qu

https://stackoverflow.com/questions/24094882/how-can-i-make-a-string-contain-filter-on-neo4j-cypher
