Zotero: Eliminate imported duplicates

February 14, 2023

After exporting a “collection” in Zotero to send to an acquaintance, I imported the exported file to check that the archive was valid, choosing to “link” to the imported files rather than copy them into the Zotero data folder.

The import went smoothly and I deleted the collection again, but foolishly did not choose to delete the collection contents. This left me with a large number of duplicate items scattered around as “orphan” items with no parent collection.

At this point, I should have used the Unfiled Items pane to find those duplicates and permanently delete them. But I had not noticed the duplicates, so I did not, and over time Unfiled Items filled up with unrelated items (e.g. snapshots saved from the web extension without selecting a collection).

I then noticed the duplicated items and first tried the “merge” function in the Duplicate Items pane. This removed the duplicate items, but each merged item now contained duplicated notes, PDF attachments and website snapshots, with the duplicate attachments pointing to the data export folder.

After a little digging, it turns out that Zotero marks “linked” file attachments internally with a “link mode” of 2:

LINK_MODE_IMPORTED_FILE = 0;
LINK_MODE_IMPORTED_URL = 1;
LINK_MODE_LINKED_FILE = 2;
LINK_MODE_LINKED_URL = 3;
LINK_MODE_EMBEDDED_IMAGE = 4;
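
To make logged link modes easier to read, the numbers above can be wrapped in a small lookup. This is a standalone sketch: LINK_MODES and linkModeName are hypothetical names and not part of Zotero's API, and the values are simply copied from the list above.

```javascript
// Values copied from the list above; the names here are illustrative only.
const LINK_MODES = Object.freeze({
  0: 'LINK_MODE_IMPORTED_FILE',
  1: 'LINK_MODE_IMPORTED_URL',
  2: 'LINK_MODE_LINKED_FILE',
  3: 'LINK_MODE_LINKED_URL',
  4: 'LINK_MODE_EMBEDDED_IMAGE',
});

// Return the readable name for a numeric link mode, or a fallback label.
function linkModeName(mode) {
  return LINK_MODES[mode] ?? `unknown (${mode})`;
}

console.log(linkModeName(2)); // prints "LINK_MODE_LINKED_FILE"
```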

Using the Zotero JavaScript API, I cleaned up the duplicates.

Select the affected item, go to Tools -> Developer -> Run JavaScript, enter the code below, make sure Run as async function is checked (the code uses await), and click Run. Adjust FS_LOC to the start of the full path of the linked imported Zotero collection, written with forward slashes, e.g. C:/ for Windows.

const ZoteroPane = Zotero.getActiveZoteroPane();
// Only the first selected item is processed; drop .slice(0, 1) to process
// the whole selection.
const selectedItems = ZoteroPane.getSelectedItems().slice(0, 1);
const attachmentsToDelete = [];
const itemsWithDuplicateNotes = [];
const itemsWithDuplicateLinks = [];

// Start of the export folder path, written with forward slashes:
// 'C:/' for Windows or '/Users' for Mac
const FS_LOC = '/home';

for (let item of selectedItems) {
  if (!item.isRegularItem()) continue;

  let links = [];
  for (let id of item.getAttachments()) {
    let attachment = Zotero.Items.get(id);
    let url = attachment.getField('url');
    let linkMode = attachment.attachmentLinkMode;
    if (linkMode == Zotero.Attachments.LINK_MODE_LINKED_FILE) {
      // Remove "linked" attachments that point into the export folder
      // (backslashes are normalized so Windows paths match FS_LOC)
      let path = (attachment.attachmentPath || '').replace(/\\/g, '/');
      if (path.startsWith(FS_LOC)) {
        // Could also check if attachment with same name already exists
        attachmentsToDelete.push({ id: id, path: path });
        // eraseTx() deletes permanently, bypassing the trash
        await attachment.eraseTx();
      }
    }
    else if (linkMode == Zotero.Attachments.LINK_MODE_LINKED_URL && links.includes(url)) {
      // Remove duplicate linked URL attachments
      itemsWithDuplicateLinks.push({ id: id, url: url });
      await attachment.eraseTx();
    }
    links.push(url);
  }

  let noteContents = [];
  for (let id of item.getNotes()) {
    let note = Zotero.Items.get(id);
    let noteHTML = note.getNote();
    // Remove duplicate notes
    if (noteContents.includes(noteHTML)) {
      itemsWithDuplicateNotes.push({
        text: noteHTML,
        id: item.id,
        title: item.getDisplayTitle(),
      });
      await note.eraseTx();
    }
    noteContents.push(noteHTML);
  }
}

// Shown in the output pane after the run
return { attachmentsToDelete, itemsWithDuplicateNotes, itemsWithDuplicateLinks };
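
Because the script above deletes things, it can help to preview what it would remove first. The following is a standalone sketch that mirrors the same three checks on plain objects and runs outside Zotero. The names planCleanup and normalizePath and the input shapes are assumptions made for illustration; inside Zotero you would have to build these objects from the item's attachments and notes yourself. As an extra, it treats backslashes and forward slashes alike when comparing paths.

```javascript
// Link mode values as listed earlier in this post.
const LINKED_FILE = 2;
const LINKED_URL = 3;

// Treat backslashes and forward slashes alike so 'C:/' also matches 'C:\...'.
function normalizePath(p) {
  return String(p).replace(/\\/g, '/');
}

// attachments: [{ id, linkMode, path, url }], notes: [{ id, html }]
// Returns the IDs that would be removed, without deleting anything.
function planCleanup(attachments, notes, fsLoc) {
  const prefix = normalizePath(fsLoc);
  const seenUrls = new Set();
  const seenNotes = new Set();
  const plan = { linkedFiles: [], duplicateLinks: [], duplicateNotes: [] };

  for (const a of attachments) {
    if (a.linkMode === LINKED_FILE && normalizePath(a.path).startsWith(prefix)) {
      plan.linkedFiles.push(a.id);
    } else if (a.linkMode === LINKED_URL && seenUrls.has(a.url)) {
      plan.duplicateLinks.push(a.id);
    }
    // As in the script, every attachment's URL counts as "seen".
    seenUrls.add(a.url);
  }

  for (const n of notes) {
    if (seenNotes.has(n.html)) plan.duplicateNotes.push(n.id);
    seenNotes.add(n.html);
  }
  return plan;
}

// Made-up sample data for illustration.
const sampleAttachments = [
  { id: 1, linkMode: 0, path: 'storage:paper.pdf', url: '' },
  { id: 2, linkMode: 2, path: 'C:\\export\\files\\paper.pdf', url: '' },
  { id: 3, linkMode: 3, path: '', url: 'https://example.org/a' },
  { id: 4, linkMode: 3, path: '', url: 'https://example.org/a' },
];
const sampleNotes = [
  { id: 10, html: '<p>x</p>' },
  { id: 11, html: '<p>x</p>' },
];

console.log(planCleanup(sampleAttachments, sampleNotes, 'C:/'));
```

With the sample data, the plan lists attachment 2 as a linked file, attachment 4 as a duplicate link and note 11 as a duplicate note; the first occurrence of each duplicate is kept.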

For all methods of the Zotero.Item object, see Zotero’s source code for item.js.

See also: Zotero file relink.