I’m performance tuning an application, and have achieved huge speed-ups by batching records into thousands of JSON insert docs per transaction, instead of individual transactions on multiple threads. The nodes I am inserting must be unique by a value, and I have therefore made this value an upsert index.
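For context, here is a simplified sketch of the batching (Go-flavoured; `Record`, `batchSize` and `insertBatch` are placeholders standing in for my real types and client calls, not actual code):

```go
package main

import (
	"context"
	"fmt"
	"log"
)

// Record stands in for the JSON documents being inserted.
type Record struct {
	UniqueKey string `json:"uniqueKey"` // must be unique across all nodes
	Payload   string `json:"payload"`
}

const batchSize = 5000 // thousands of docs per transaction

// insertBatch stands in for marshalling the batch to a JSON array
// and committing it in a single transaction.
func insertBatch(ctx context.Context, batch []Record) error {
	return nil
}

// consume drains a stream of records, committing them in large batches.
// Several of these run in parallel, one per thread.
func consume(ctx context.Context, in <-chan Record) {
	batch := make([]Record, 0, batchSize)
	for rec := range in {
		batch = append(batch, rec)
		if len(batch) == batchSize {
			if err := insertBatch(ctx, batch); err != nil {
				log.Printf("batch failed: %v", err) // one duplicate fails the whole batch
			}
			batch = batch[:0]
		}
	}
	if len(batch) > 0 { // flush the remainder
		if err := insertBatch(ctx, batch); err != nil {
			log.Printf("final batch failed: %v", err)
		}
	}
}

func main() {
	in := make(chan Record)
	go func() {
		defer close(in)
		for i := 0; i < 12000; i++ {
			in <- Record{UniqueKey: fmt.Sprintf("k%d", i)}
		}
	}()
	consume(context.Background(), in)
}
```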
By implementing batches, however, I had to reduce parallelism in my app: the same unique value can occur simultaneously in separate batches on separate threads, which causes the entire batch to fail. Retrying is then expensive enough to make the whole thing slower than a single thread. (It is not possible to route records with the same value to the same thread before insert.)
Is there any strategy for parallel batch inserts that keeps the transactional guarantee of non-duplicate nodes, e.g. letting only the duplicated inserts fail while the rest of the batch commits successfully? Or any other way to batch-load data whilst keeping nodes unique by a value? I also cannot use the bulk loader, as this process is part of a high-volume streaming application.
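To illustrate what I mean: the only fallback I have come up with so far is bisecting a failed batch, so that only the genuinely duplicated records get isolated and dropped. A rough sketch, reusing the placeholder `insertBatch` from above (`isConflict` is likewise a placeholder for detecting the uniqueness violation in the client's error):

```go
// isConflict is a placeholder: it should recognise the uniqueness /
// transaction-conflict error returned by the client.
func isConflict(err error) bool {
	return true // for the sketch, treat every failure as a conflict
}

// insertWithSplit retries a conflicting batch by bisecting it, so only
// the records that actually collide end up as single-record transactions.
func insertWithSplit(ctx context.Context, batch []Record) error {
	err := insertBatch(ctx, batch)
	if err == nil || !isConflict(err) {
		return err
	}
	if len(batch) == 1 {
		// a genuine duplicate: drop it (or send it to a dead-letter queue)
		return nil
	}
	mid := len(batch) / 2
	if err := insertWithSplit(ctx, batch[:mid]); err != nil {
		return err
	}
	return insertWithSplit(ctx, batch[mid:])
}
```

But this still costs several extra round trips per duplicate, and with duplicates spread across threads it quickly approaches the single-threaded retry cost I am trying to avoid, hence the question.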
Thanks in advance!