Delete Duplicate Nodes by Index Using a Neo4j Cypher Query

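Follow the steps below to find and delete nodes that share a duplicated property value, looked up through a legacy index, in Neo4j's web admin console. The queries use the legacy Cypher 1.x syntax (START clauses, legacy index lookups, and the ? property operator).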

Step 1

Select duplicate records by executing the following Cypher query in the Neo4j admin console.

START n=node:invoices("PO_NUMBER:(\"112233\")")
// Cypher query for collecting the ids of indexed nodes containing duplicate properties
WITH n
ORDER BY id(n) DESC  // Descending, so HEAD(ColNode) below picks the most recent duplicate (highest id) for deletion
WITH n.Key? as DuplicateKey, COUNT(n) as ColCount, COLLECT(id(n)) as ColNode
WITH DuplicateKey, ColCount, ColNode, HEAD(ColNode) as DuplicateId
WHERE ColCount > 1 AND (DuplicateKey is not null) AND (DuplicateId is not null)
WITH DuplicateKey, ColCount, ColNode, DuplicateId 
ORDER BY DuplicateId 
RETURN DuplicateKey, ColCount, DuplicateId 
//RETURN COLLECT(DuplicateId) as CommaSeparatedListOfIds
//** Run with the first RETURN to review the duplicates, then swap the comments on the two RETURN lines to get the IDs as one list
//** Do not proceed to delete without validating
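To make the grouping logic of the Step 1 query easier to follow, here is a minimal Python model of it. This is not Neo4j code: it assumes a hypothetical in-memory list of (node_id, key) pairs standing in for the nodes returned by the index lookup, and the function name find_duplicate_ids is made up for illustration. It shows that the query returns only one ID per duplicated Key, the newest one.

```python
from collections import defaultdict


def find_duplicate_ids(nodes):
    """Model the Step 1 query on hypothetical (node_id, key) pairs.

    key may be None, mirroring n.Key? on nodes without a Key property.
    Returns (duplicate_key, count, duplicate_id) tuples sorted by duplicate_id.
    """
    groups = defaultdict(list)
    # ORDER BY id(n) DESC, then COLLECT(id(n)) grouped by the Key value
    for node_id, key in sorted(nodes, key=lambda pair: pair[0], reverse=True):
        groups[key].append(node_id)

    rows = []
    for key, ids in groups.items():
        # WHERE ColCount > 1 AND DuplicateKey IS NOT NULL
        if key is not None and len(ids) > 1:
            # HEAD(ColNode): first id in descending order, i.e. the newest node
            rows.append((key, len(ids), ids[0]))

    # ORDER BY DuplicateId
    rows.sort(key=lambda row: row[2])
    return rows


rows = find_duplicate_ids([(10, "A"), (11, "B"), (12, "A"), (13, "B"), (14, "C")])
print(rows)  # [('A', 2, 12), ('B', 2, 13)]
# Equivalent of the CommaSeparatedListOfIds return
print(",".join(str(dup_id) for _, _, dup_id in rows))  # 12,13
```

Because only the head of each collected list is returned, a Key with three or more copies yields a single ID per run.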

Step 2

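Validate and copy the duplicate record IDs from the web admin console:

Execute the Cypher query from Step 1 and confirm that the rows it returns really are duplicates.

After validating the duplicates, comment out the first RETURN line of the Step 1 query, uncomment the second one, and execute the query again. It now returns the same IDs as a single comma-separated list, CommaSeparatedListOfIds.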
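One extra check worth doing while validating is to confirm that the copied IDs will never remove the last remaining node for a Key. The Python sketch below is illustrative only: it assumes you have a hypothetical in-memory list of (node_id, key) pairs for the looked-up nodes, and the function name deletion_is_safe is made up for this example.

```python
from collections import Counter


def deletion_is_safe(nodes, ids_to_delete):
    """Return True if deleting ids_to_delete keeps at least one node per key.

    nodes: hypothetical (node_id, key) pairs for the nodes found in Step 1.
    """
    nodes = list(nodes)
    doomed = set(ids_to_delete)

    # Every ID slated for deletion must come from the looked-up nodes
    known_ids = {node_id for node_id, _ in nodes}
    if not doomed <= known_ids:
        return False

    # Count the nodes that survive for each key
    survivors = Counter(key for node_id, key in nodes if node_id not in doomed)
    deleted_keys = {key for node_id, key in nodes if node_id in doomed}
    return all(survivors[key] >= 1 for key in deleted_keys)


nodes = [(10, "A"), (12, "A"), (11, "B")]
print(deletion_is_safe(nodes, [12]))      # True: node 10 still carries key A
print(deletion_is_safe(nodes, [10, 12]))  # False: key A would disappear
```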

Step 3

Copy the CommaSeparatedListOfIds value from Step 2 and paste it into the START clause of the delete query below.

START n=node(1120038,1120039,1120040,1120042,1120044,1120048,1120049,1120050,1120053,1120067,1120068)
// Replace the example IDs above with CommaSeparatedListOfIds from Step 2
MATCH n-[r?]-()
// r? makes the relationship optional, so nodes without any relationships are deleted too
DELETE r, n

** Execute the Cypher query above ONLY after replacing the example IDs in the START statement.
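Pasting IDs by hand is where mistakes tend to creep in, so a small helper can render the delete query from a list of IDs and refuse obviously bad input. This is a sketch under stated assumptions: build_delete_query is a hypothetical name, it only produces query text in the legacy syntax used in this post (with an optional relationship r? so nodes that have no relationships are also matched), and you still paste and run the result in the admin console yourself.

```python
def build_delete_query(ids):
    """Render the Step 3 delete query (legacy Cypher 1.x) for the given node IDs."""
    ids = list(ids)
    if not ids:
        raise ValueError("no IDs to delete")

    # Accept only plain non-negative integers as node IDs
    for node_id in ids:
        if isinstance(node_id, bool) or not isinstance(node_id, int) or node_id < 0:
            raise ValueError(f"invalid node id: {node_id!r}")

    id_list = ",".join(str(node_id) for node_id in ids)
    return (
        f"START n=node({id_list})\n"
        "MATCH n-[r?]-()\n"
        "DELETE r, n"
    )


print(build_delete_query([12, 13]))
```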

Step 4

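Validate that the delete transaction committed.

Execute the Cypher query from Step 1 again. If the transaction committed, the IDs you deleted no longer appear. Because the query returns only one ID (the newest) per duplicated Key, a key that had more than two copies will still show up; repeat Steps 2 and 3 until the query returns no rows.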
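The repeat-until-clean loop can be modelled in Python to show how many passes a data set needs. As before, this is not Neo4j code: it assumes hypothetical in-memory (node_id, key) pairs, and both function names are made up for illustration. Each pass removes the newest node of every duplicated key, so a key with k copies needs k - 1 passes and the oldest node (lowest id) is the one kept.

```python
from collections import defaultdict


def newest_duplicate_ids(nodes):
    """One pass of the Step 1 logic: the highest id for each duplicated non-null key."""
    groups = defaultdict(list)
    for node_id, key in nodes:
        if key is not None:
            groups[key].append(node_id)
    return sorted(max(ids) for ids in groups.values() if len(ids) > 1)


def dedupe_until_clean(nodes):
    """Repeat detect-and-delete until the check finds nothing.

    Returns (remaining_nodes, passes).
    """
    nodes = list(nodes)
    passes = 0
    while True:
        doomed = set(newest_duplicate_ids(nodes))
        if not doomed:
            return nodes, passes
        nodes = [(node_id, key) for node_id, key in nodes if node_id not in doomed]
        passes += 1


remaining, passes = dedupe_until_clean([(10, "A"), (12, "A"), (14, "A"), (11, "B"), (13, "B")])
print(remaining, passes)  # [(10, 'A'), (11, 'B')] 2
```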

That's it! Comment below with questions or feedback.

Kenny Bastani
