In this blog post, we’re going to walk through designing a graph processing algorithm on top of Neo4j that discovers the influence and sentiment of tweets in your Twitter network.
The source code for this reference application is open source. You can find the GitHub project here. |
Graph Data Modeling
The first thing we’ll need to do is to design a data model for analyzing the sentiments and influences of users on Twitter. This example iterates from an earlier graph processing example described in another blog post. I recommend taking a look at that post to better understand the concepts I talk about in this one.
The diagram below is the graph data model that we will use to import, analyze, and query data from Twitter.
In the diagram above, the following relationships are described.
-
Users
follow otherusers
-
Users
createtweets
-
Tweets
containphrases
-
Phrases
are categorized intotopics
Twitter User Ranking
For this first blog post we’re going to focus on generating a rank of influential Twitter users in my social network that tells me which topics a user tweets about.
The screenshot above is from the results of a Neo4j cypher query. Here we find a list of Twitter users that were discovered using a crawling algorithm based on PageRank. This output is similar to the dashboard that was created in an earlier blog post, but adds in a top category, top phrase, and a sentiment score.