Delete Duplicate Node By Index Using Neo4j Cypher Query

Wednesday, August 14, 2013

Follow the steps below to find and delete nodes that share a duplicate property value, looked up through an index, using Neo4j's web admin console.

Step 1

Select duplicate records by executing the following Cypher query in the Neo4j admin console.

START n=node:invoices("PO_NUMBER:(\"112233\")")
// Cypher query for collecting the IDs of indexed nodes containing duplicate properties
WITH n
ORDER BY id(n) DESC  // Descending order, so the most recent duplicate is the one selected for deletion
WITH n.Key? as DuplicateKey, COUNT(n) as ColCount, COLLECT(id(n)) as ColNode
WITH DuplicateKey, ColCount, ColNode, HEAD(ColNode) as DuplicateId
WHERE ColCount > 1 AND (DuplicateKey is not null) AND (DuplicateId is not null)
WITH DuplicateKey, ColCount, ColNode, DuplicateId 
ORDER BY DuplicateId 
RETURN DuplicateKey, ColCount, DuplicateId 
//RETURN COLLECT(DuplicateId) as CommaSeparatedListOfIds
//** Toggle comments for the return statements above to validate duplicate records 
//** Do not proceed to delete without validating
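
The grouping that the Step 1 query performs can be easier to follow outside Cypher. The following is only a sketch in Python of the same logic, not something Neo4j runs: the function name find_duplicates and the sample (id, key) pairs are hypothetical, and it assumes the nodes have already been fetched from the index.

```python
from collections import defaultdict


def find_duplicates(nodes):
    """nodes: iterable of (node_id, key) pairs; key may be None."""
    groups = defaultdict(list)
    # Mirror ORDER BY id(n) DESC: visit the highest (most recent) IDs first.
    for node_id, key in sorted(nodes, key=lambda pair: pair[0], reverse=True):
        if key is not None:  # mirrors "DuplicateKey is not null"
            groups[key].append(node_id)
    rows = [
        (key, len(ids), ids[0])  # ids[0] plays the role of HEAD(ColNode)
        for key, ids in groups.items()
        if len(ids) > 1  # mirrors "ColCount > 1"
    ]
    # Mirror the final ORDER BY DuplicateId.
    return sorted(rows, key=lambda row: row[2])


# Hypothetical sample data: key "A" appears three times, "B" once.
sample = [(10, "A"), (11, "B"), (12, "A"), (13, "A"), (14, None)]
print(find_duplicates(sample))  # [('A', 3, 13)]
```

Each row corresponds to DuplicateKey, ColCount, DuplicateId in the RETURN statement of the query.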

Step 2

Validate and copy duplicate record IDs from web admin console:

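Execute the Cypher query from Step 1 as written to confirm that duplicate records exist.
After validating the duplicate records, toggle the comments on the two RETURN statements and execute the query again so that it returns the duplicate IDs as a single comma-separated list (CommaSeparatedListOfIds).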

Step 3

Copy and paste CommaSeparatedListOfIds into the delete query below.

START n=node(1120038,1120039,1120040,1120042,1120044,1120048,1120049,1120050,1120053,1120067,1120068)
// Replace above with the IDs from CommaSeparatedListOfIds in the previous step
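MATCH n-[r]-()
DELETE r, n  // Delete each listed node together with its relationships; nodes with no relationships are not matched by this pattern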

** Execute the Cypher query above ONLY after replacing the example IDs in the START statement.
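
Pasting a long list of IDs by hand is easy to get wrong. As a hedged convenience, a small helper can build the START line from the IDs instead. This is a sketch in Python; build_start_clause is a hypothetical name, and it assumes the IDs are available as a list of integers (or integer strings) copied from the previous step.

```python
def build_start_clause(node_ids):
    """Return a legacy Cypher START line for the given node IDs."""
    # int() rejects anything that is not a whole number, such as stray text
    # picked up while copying from the console.
    ids = [int(i) for i in node_ids]
    if not ids:
        raise ValueError("no node IDs supplied; nothing to delete")
    return "START n=node({})".format(",".join(str(i) for i in ids))


print(build_start_clause([1120038, 1120039, 1120040]))
# START n=node(1120038,1120039,1120040)
```

Refusing an empty list keeps the helper from producing a START line with no IDs in it.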

Step 4

Validate that the delete transaction committed.

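Execute the Cypher query from Step 1 again to make sure that the transaction was committed: the deleted IDs should no longer appear. Because the query selects only one node per duplicated key, repeat Steps 1 to 3 for as long as it still returns rows.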

That's it! Comment below with questions or feedback.