8th-9th June
I have been busy creating the entity relationships. You can see a screenshot of the data for Destiny Activities and the associated relationships. Already you can see it's not very clean: there are two entries for the same activity (inherited from the dubious website I ripped it from).
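Duplicates like these can be filtered out before anything is loaded by tracking every entry in a hash set. Below is a minimal sketch of that idea, not the actual script: the class and method names are made up for illustration, and a plain list stands in for the JSON array. It only catches exact string matches, so two entries that differ in case or whitespace would both survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: names are illustrative, not taken from the real scripts.
class EntityDeduper {

    // Returns the entries in their original order, dropping exact repeats.
    static List<String> dedupe(List<String> rawEntries) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String entry : rawEntries) {
            // Set.add returns false when the value is already present,
            // so each distinct entry is written out only once.
            if (seen.add(entry)) {
                unique.add(entry);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "{\"class\":\"activity\",\"name\":\"story\"}",
                "{\"class\":\"activity\",\"name\":\"strike\"}",
                "{\"class\":\"activity\",\"name\":\"story\"}");
        for (String entry : dedupe(raw)) {
            System.out.println(entry);
        }
    }
}
```

Using the return value of add saves the separate contains check, but the effect is the same as testing contains first and then adding.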
Another way to ensure we only have unique entity nodes is to issue a statement in Neo4j like so:
"CREATE CONSTRAINT ON (activity:Activity) ASSERT activity.isbn IS UNIQUE;"
If you try to create the same node twice, Neo4j will send back something like this: "Neo.ClientError.Schema.ConstraintViolation"
I am also tackling these duplicate entries by writing each entry into a hashset and, if it isn't already in there, writing it to a JSON array: if (!hs.contains(finalNodes.toString())) {
12th June
I finished off the entity JSON create scripts and now I'm back working on what I call the Neo4j utils library, and it's going really well! I built a class with a couple of functions which create nodes and relationships (using the REST APIs) and tested it end to end to make sure it all worked. As you can see below it's only two nodes, but it's great to see it working:
More progress: things are starting to get exciting. I just did a test load of all the medal entities. You can see below I haven't added the relationships yet, but in an hour or so (probably a day) I plan to have that running.
Some time later: I made some pretty good progress with the entities and the relationships. After 2 weeks of automating I have the Destiny medals and their relationships all stored in Neo4j. Below you can see a video where I ran a "MATCH ALL" type query to see all my hard work in action!
What you're seeing: "Medals" in pink, "Medal Types" in yellow and "Medal Weight" in red. All the lines joining the nodes are relationships.
It's fair to say I'm getting pretty amped about this. I need to spend some time mapping out what's next; I'm assuming that I'm going to be stepping into document classification using Reddit posts.
Go to my next post
It's been a weird week. I resigned from Vend, I have no bathroom (renovation) and it's pretty cold. Enough about my sob stories!
In my last post in this series I was talking about the API ingestion and thinking about how stupid I was to have multiple calls for the different attributes / relationships on a single node. Well, as I have mentioned before, this is my first time with Neo4j, and for some reason I skipped over the very first part of the Java API example, which shows you how to use the Cypher language in an API call. So erase most of my last post from your memory. Today (8th July) I have been running over in my brain how to create the entities, and it's fair to say I have been fairly consumed by the whole thing. Below is a screenshot of a RAW entity detailing 3 activities.
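To make the "Cypher in an API call" point concrete, here is a rough sketch of the request body that creates a node with its label and properties in one go. It assumes the Neo4j 2.x transactional Cypher endpoint (a POST to /db/data/transaction/commit) and the 2.x {props} parameter syntax; newer versions write parameters as $props. The class and method names are hypothetical, the JSON is assembled by hand only to keep the example dependency free, and the escaping covers quotes and backslashes but not control characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only builds the JSON body; sending it is left out.
class CypherPayload {

    // Escapes backslashes and double quotes so a value can sit inside a JSON string.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the body for a single CREATE statement. The label is concatenated
    // into the Cypher text because labels cannot be passed as parameters, so it
    // must come from trusted code, never from raw input.
    static String createNode(String label, Map<String, String> props) {
        StringBuilder propJson = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (propJson.length() > 0) {
                propJson.append(",");
            }
            propJson.append("\"").append(jsonEscape(e.getKey())).append("\":\"")
                    .append(jsonEscape(e.getValue())).append("\"");
        }
        String statement = "CREATE (n:" + label + " {props}) RETURN n";
        return "{\"statements\":[{\"statement\":\"" + jsonEscape(statement)
                + "\",\"parameters\":{\"props\":{" + propJson + "}}}]}";
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("name", "A guardian rises..");
        props.put("desc", "Rise in the light");
        System.out.println(createNode("Activity", props));
    }
}
```

One statement now carries the node, its label and its properties, which is exactly the round trip saving the separate node / label / property calls were missing.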
In a traditional database, if I was lazy, I would insert this into a table called destiny_activities, but because we are using Neo4j all that goes out the window.
If we look at the Neo4j best practices, you can see that the model is extremely normalized. Below is a screenshot of how the Neo guys would data model book reviews:
So, my preliminary data model based on the activity excel file above might look like:
This blog is a saving grace, as it forces me through an iterative design process. When you're a sole developer working on something, you generally just smash code until it works and the design aspect goes somewhat out the window.
The next question I need to answer is how to ingest the data into Neo4j. Because I create relationships to other nodes, I will need to create all the entity nodes first and then the relationships. My thought at the moment is to create a JSON structure that looks like this:
{
  "class": "activity",
  "name": "A guardian rises..",
  "desc": "Rise in the light",
  "relationships": [
    { "class": "activity_levels", "name": "activity_level_1" },
    { "class": "game_mode", "name": "story" },
    { "class": "destination", "name": "old_russia, earth" }
  ]
}
I would infer all of this information in a one-off script written in Java. Once that's done I would write a second one-off script that first runs through all the entities and creates them, and only then creates the relationships.
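The "nodes first, relationships second" ordering can be sketched as a small planner that turns entities into an ordered list of Cypher statements. This is only an illustration under a few assumptions: the Ref and Entity classes are hypothetical stand-ins for the JSON structure, RELATED_TO is a placeholder relationship type (the JSON doesn't name one), and values are inlined as string literals for readability where a real loader would more likely pass them as parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical planner: emits every node statement before any relationship statement.
class TwoPassLoader {

    // A node reference: "class" from the JSON is used as the label.
    static class Ref {
        final String label;
        final String name;
        Ref(String label, String name) { this.label = label; this.name = name; }
    }

    // An entity and the nodes it points at, mirroring the JSON structure.
    static class Entity {
        final Ref self;
        final List<Ref> relationships;
        Entity(Ref self, List<Ref> relationships) {
            this.self = self;
            this.relationships = relationships;
        }
    }

    // Wraps a value as a single-quoted Cypher string literal.
    static String literal(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    static String nodePattern(String var, Ref ref) {
        return "(" + var + ":" + ref.label + " {name: " + literal(ref.name) + "})";
    }

    static List<String> plan(List<Entity> entities) {
        // LinkedHashSet keeps insertion order and drops repeated statements.
        Set<String> nodeStatements = new LinkedHashSet<>();
        Set<String> relStatements = new LinkedHashSet<>();
        for (Entity e : entities) {
            nodeStatements.add("MERGE " + nodePattern("n", e.self));
            for (Ref target : e.relationships) {
                nodeStatements.add("MERGE " + nodePattern("n", target));
                relStatements.add("MATCH " + nodePattern("a", e.self) + ", "
                        + nodePattern("b", target) + " MERGE (a)-[:RELATED_TO]->(b)");
            }
        }
        // Phase 1: all nodes. Phase 2: all relationships.
        List<String> ordered = new ArrayList<>(nodeStatements);
        ordered.addAll(relStatements);
        return ordered;
    }

    public static void main(String[] args) {
        Entity activity = new Entity(new Ref("activity", "A guardian rises.."), Arrays.asList(
                new Ref("activity_levels", "activity_level_1"),
                new Ref("game_mode", "story"),
                new Ref("destination", "old_russia, earth")));
        for (String statement : plan(Arrays.asList(activity))) {
            System.out.println(statement);
        }
    }
}
```

MERGE is used instead of CREATE so that a node referenced by several entities, such as a shared game mode, ends up in the graph once.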
I'm being a little stubborn and trying not to use the CSV importer and to use only the APIs. My reason is that I want to get used to these interfaces, because when I'm done I want this all to run automatically with no hand holding.
Click here to see my next post
I'm busy digging into the Neo4j implementation in Java. There seem to be two ways of writing to / reading from the database: the first is pretty similar to connecting to any database, the other is using the REST APIs. When you fire up Neo4j you do get a nice and pretty interface/console to use; one thing I have learned is that the web console uses the REST APIs behind the scenes (makes sense).
I created a Neo4j DB externally on a server at home. Having worked with DBs for so long, I assumed that I could create a connection client and send along the usual "host/port/pass/user" combo. The setup is a little weird: if you have physical separation between the application and the database, it somewhat forces you to use the APIs, which in the long run isn't a bad thing if you think about the way most modern applications are architected these days. I'm going to have a go at writing information from the "Destiny Entities" I created in my last post through the APIs.
(Monday 29th June) It's been a day or so of fiddling with Neo4j and I have only just figured out how it all fits together. I was searching high and low in the REST API for how to make a new node with labels in one API call. I was assuming that it would be a single object containing all the relationships and labels; I had my relational DB hat on and it took me a while to understand how it worked (duh). This is all a discovery process for me, so there are still some things I'm going to get wrong. So I understand that you create a node, then apply labels, properties and all that good stuff in different API calls.. my brain was thinking it was all one.. when it is not! You know the old CREATE TABLE statement with PK/FK stuff all hanging off one statement; it's easy to get confused if you're old school. In saying that, it would be nice if it were one create API call and Neo did the decoupling behind the scenes, so I didn't have to make network-latent REST calls for one node, label and properties creation.
(Thursday / Friday 2 July) Firstly I manually loaded a couple of Destiny Entity nodes, learned a bunch and made a couple of mistakes (see below), but it's still pretty cool getting visual feedback right away. Overall things are going okey dokey! If you have read this blog series you'll know I had previously ripped Destiny entity information, and before I start parsing community post information I need to get the structure and relationships between the entities correct so I can map it all out.
On Thursday I spent the time making sense of the entity data and creating the relationships & classes in Java. It probably needs another day or so before it's ready to push 4000+ of these nodes into Neo4j. By the time it's finished the output will be classes with relationship information to other classes, all stored in JSON ready to consume in Neo4j. Hopefully next week I can post a visualisation of the entities all mapped out, stay posted.
Check out my next post
I update this menu list as I continue development:
I thought it would be cool to go on a journey and create something. I used to work in the computer game industry, where I would frequent studios, getting the lowdown on things before most and getting to play games months before any release day. One thing I found interesting is that the studios spend loads of time looking at social media to help make design decisions or pinpoint issues. There is a plethora of information publicly available on the WWW, and I'm super interested in free text and document classification. So, I'm going on a little bit of a mission.. create a rough platform that would help munge all this data in one place. Below are a few things that come to mind:
DATASETS
YOUTUBE
Probably starting with the data APIs. From my very thorough 2 minute read I can tell that there is some possibility to search for videos containing key words.
REDDIT
Same as YouTube, Reddit has a developer section for APIs. I could get all the latest or greatest community information from subreddits like r/callofduty or r/destiny.
TWITTER
Using keywords, I could create a call to the Twitter APIs for any hashtags for these games.
FORUMS
This is a trickier one, no APIs but hey... if it's on the net it should be easy-ish.
TOOLS n STUFF
LANGUAGE
Since leaving my last role (where I coded in Python for 4.5 years), I have forced myself to only use Java for all things backend. The first couple of months sucked, but when I went back to Python I found myself missing Java in pretty quick succession, so I have become a nipple-high tracksuit-wearing Java developer.... sue me. I will probably use some Angular to do stuff and attempt to make something pretty with Bootstrap.
DATA STORAGE
This is going to be an interesting one as I'm primarily working with text... I'm going to have a go at some graph databases, of which I seem to always get pointed to Neo4j. This will be my first time in Neo4j, so I will be a newbie and plenty of forgiveness is needed.
OTHER STUFF
I will use a bunch of text based data mining techniques like sentiment analysis for measuring things like subjectivity and objectivity, which is going to be interesting because the language gamers use to show delight or loathing is not typical of normal text. I will probably use jsoup as a parser for any sites that don't have any APIs. Before I start on this journey I have a few things to ask:
I plan to have a pretty crappy version up and running before the large Destiny release in September, so I can see if this even works. Hold tight! It's going to be a rough ride (mainly for me!).
Go to my next post to see more
Author
New Zealand big data nerd, facial hair sculptor and classic car fanatic. Owner of needles.io, freelance big data consultant, ex Activision.