In preparation for a post about doing graph analytics in Neo4j (paralleling SPARQLverse from this earlier post), I had to learn how to load text/CSV data into Neo4j. This post shows the steps I took to load nodes and then establish edges/relationships in the database.

My head hurt trying to find a simple example of loading the data I had used in my earlier example, but this was because I was new to the Cypher language. I got really hung up on previewing the data in the Neo4j visualiser: all my nodes showed only ID numbers, which confused me. I thought it wasn’t loading my name properties, when it was really just a visualisation setting (more on that another time). Anyway, enough distractions…

Graph Data File – Simple Graph Relations Example

I took my earlier sample data and dumbed it down to fit the normal paradigm of Neo4j: separate load files for nodes and for edges. I appreciated working with triples before, as I didn’t have to pre-load all the nodes first, but that’s also a story for another day.

First, the nodes file looked like the following. Note that I thought I had to add an ID column, though I didn’t end up using it after all:

id,name
1,Chong
2,Lashaun
3,Roberta
4,Elin
5,Tameka
6,Rosalie
7,Noella
8,Elim
9,Mae
10,Fernando
11,Alan
12,Katrina
13,Kaitlyn
14,Zackary
15,Nana
16,Lamonica
17,Meggan
18,Fermina
19,Genevieve
20,Manual
21,Jolie

The second file was simply a list of “source” and “target” names (the graph relations), where the first person has the second person as a friend. (We treat them as unidirectional in this example.)

source,target
Chong,Lashaun
Chong,Roberta
Chong,Elin
Chong,Tameka
Chong,Rosalie
Chong,Noella
Lashaun,Roberta
Lashaun,Elin
Lashaun,Tameka
Roberta,Elin
Roberta,Tameka
Roberta,Rosalie
Roberta,Noella
Elim,Tameka
Elim,Rosalie
Elim,Noella
Elim,Mae
Elim,Fernando
Elim,Alan
Tameka,Katrina
Tameka,Kaitlyn
Rosalie,Alan
Noella,Katrina
Mae,Kaitlyn
Mae,Zackary
Mae,Nana
Mae,Lamonica
Fernando,Meggan
Fernando,Fermina
Fernando,Genevieve
Fernando,Manual
Fernando,Jolie
Fernando,Chong
Alan,Lashaun
Alan,Roberta
Alan,Elin
Katrina,Tameka
Katrina,Rosalie
Kaitlyn,Noella
Zackary,Fernando
Zackary,Alan
Nana,Katrina
Nana,Kaitlyn
Nana,Zackary
Lamonica,Zackary
Lamonica,Fernando
Meggan,Zackary
Meggan,Fernando
Meggan,Jolie
Fermina,Fernando
Fermina,Jolie
Fermina,Chong
Genevieve,Lashaun
Genevieve,Roberta
Manual,Elin
Manual,Tameka
Jolie,Rosalie

Figure: Graph of first load test of Neo4j. Basic load example of a handful of relationships.

Loading CSV Relationships into Neo4j

To get the data into Neo4j I had to run two commands. But first I ran a sort of “delete all”, as I was doing lots of testing:

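// Remove up to 10,000 nodes per run, along with any relationships attached to them
MATCH (n)
WITH n LIMIT 10000
// Match relationships in either direction so no node is left with a dangling edge
OPTIONAL MATCH (n)-[r]-()
DELETE n,r;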
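
As a variation on the query above (not the one used for this post), here is a minimal sketch that assumes Neo4j 2.3 or later, where Cypher provides the DETACH DELETE clause. It removes each node together with its relationships, so the OPTIONAL MATCH step is not needed. On a large database you would probably still want to delete in batches.

```cypher
// Assumes Neo4j 2.3+; deletes every node and any relationships attached to it
MATCH (n)
DETACH DELETE n;
```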

Then load all the nodes, giving each one the Person label and grabbing only the name property from the CSV:

LOAD CSV WITH HEADERS FROM "file:///Users/tyler/graph/friends_nodes.csv" AS nodes
// One Person node per CSV row; the id column is ignored
CREATE (p:Person { name: nodes.name });
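
A quick sanity check at this point might look like the sketch below. It assumes the load ran once against an empty database, in which case the count should equal the 21 data rows in the nodes file. The second query lists a few name values, which confirms the property was actually stored even if the visualiser only shows ID numbers. The aliases people and name are just illustrative, and you may need to run each statement separately.

```cypher
// Count the Person nodes created by the load
MATCH (p:Person)
RETURN count(p) AS people;

// Peek at a few stored name properties
MATCH (p:Person)
RETURN p.name AS name
LIMIT 5;
```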

And finally, load the edges/relationships to link persons to persons via a HAS_FRIEND relationship:

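LOAD CSV WITH HEADERS FROM "file:///Users/tyler/graph/friends_edges.csv" AS edges
// Look up the two existing Person nodes by name
MATCH (a:Person { name: edges.source })
MATCH (b:Person { name: edges.target })
// Create a directed relationship from source to target
CREATE (a)-[:HAS_FRIEND]->(b);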

The resulting load will say something like:

Created 57 relationships, returned 0 rows in 46 ms
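
To double-check the result, something like the following sketch should work. It assumes the edge load ran once against a database holding only the Person nodes from the previous step, so the first count should match the 57 relationships reported above. The second query ranks people by their number of outgoing HAS_FRIEND relationships. The aliases friendships, person and friends are illustrative, and you may need to run each statement separately.

```cypher
// Count all HAS_FRIEND relationships between Person nodes
MATCH (:Person)-[r:HAS_FRIEND]->(:Person)
RETURN count(r) AS friendships;

// Rank people by number of outgoing HAS_FRIEND relationships
MATCH (a:Person)-[:HAS_FRIEND]->(b:Person)
RETURN a.name AS person, count(b) AS friends
ORDER BY friends DESC
LIMIT 5;
```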

More on exploring and analysing this in a future post.  Tweet it or comment if you are interested in more along this line.  Thanks for reading!

About Tyler Mitchell

Director Product Marketing @ OmniSci.com GPU-accelerate data analytics | Sr. Product Manager @ Couchbase.com - next generation Data Platform for System of Engagement! Former Eng. Director @Actian.com, author and technology writer in NoSQL, big data, graph analytics, geospatial and Internet of Things. Follow me @1tylermitchell or get my book from http://locatepress.com/.