DuoBolt

Core Concepts

This page walks through how DuoBolt detects duplicate files end-to-end — from directory discovery through size filtering, head+tail prehashing, full-content BLAKE3 hashing, and deterministic grouping. The same engine powers both flavours the product ships in:

  • DuoBolt Desktop — a modern visual application for scanning, reviewing, and cleaning duplicate files.
  • DuoBolt CLI — a command-line scanner for power-users who need maximum control, automation, and script integration.

Both tools share the same underlying principles: high-performance scanning, BLAKE3 hashing, deterministic grouping, and safe cleaning workflows (Trash or move-to-archive in Desktop; scan-only in CLI).


DuoBolt is designed around three core layers:

1. Scan Layer

Discovers candidate files through directory traversal.

Handles extension filtering, symlinks, size thresholds, and ignore rules.

2. Hashing Layer

Ultra-fast BLAKE3 hashing with two modes:

Single-stage (full hash) or Two-stage (head+tail prehash, then full hash).

3. Detection Layer

Groups files by size → prehash → BLAKE3 hash.

Because every match is confirmed by a full-content hash, grouping produces zero false positives, is fully deterministic, and scales linearly with input size.
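The two-stage idea in the Hashing Layer can be sketched in a few lines. This is a minimal illustration, not DuoBolt's actual implementation: `hashlib.sha256` stands in for BLAKE3 (which has no standard-library binding), and the sample size and function names are assumptions.

```python
import hashlib
from pathlib import Path

HEAD_TAIL_BYTES = 64 * 1024  # hypothetical sample size per end

def prehash(path: Path) -> bytes:
    """Hash the first and last chunks of a file as a cheap prefilter."""
    size = path.stat().st_size
    h = hashlib.sha256()  # stand-in for BLAKE3
    with path.open("rb") as f:
        h.update(f.read(HEAD_TAIL_BYTES))
        if size > 2 * HEAD_TAIL_BYTES:
            f.seek(-HEAD_TAIL_BYTES, 2)  # 2 = os.SEEK_END
            h.update(f.read(HEAD_TAIL_BYTES))
    return h.digest()

def full_hash(path: Path) -> bytes:
    """Full-content hash, streamed in fixed-size chunks."""
    h = hashlib.sha256()  # stand-in for BLAKE3
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()
```

Files whose prehashes differ cannot be identical, so only files that agree on size and prehash ever pay the cost of a full read.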


The desktop app provides a guided, visual workflow, beginning with a Configure step where you set:

  • included / excluded extensions
  • directory-extension exclusions
  • ignore rules
  • symlink behavior
  • advanced performance settings
  • minimum file size

All options affect which files are scanned, never the correctness of the results.

The desktop UI uses:

  • progress bars
  • animated circular charts
  • badges for result groups
  • scan statistics
  • counts for filtered files, ignored files, and hashed files

Everything updates deterministically based on scan progress.

DuoBolt Desktop presents results visually:

  • duplicate groups (at least 2 files)
  • singleton files (unique)
  • preview thumbnails
  • path inspection
  • Bulk Select rules (e.g., auto-select older or lowest-quality files)

Users can then safely remove selected files in one of two ways: delete to the system Trash/Recycle Bin for short-term reversibility, or Move to Archive to relocate them into a folder of their choice (local or external) with their directory structure preserved. Archives appear alongside deletions in Deletion History, where every removal is tracked with its original source paths, so files can be restored long after the OS Trash has been emptied.
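The structure-preserving part of Move to Archive could look roughly like this sketch, which mirrors each file's path relative to the scan root under the archive folder. The function name and signature are illustrative assumptions, not DuoBolt's API.

```python
import shutil
from pathlib import Path

def move_to_archive(file: Path, scan_root: Path, archive_root: Path) -> Path:
    """Relocate a duplicate into the archive, mirroring its layout under scan_root."""
    rel = file.resolve().relative_to(scan_root.resolve())
    dest = archive_root / rel
    dest.parent.mkdir(parents=True, exist_ok=True)  # recreate the directory structure
    shutil.move(str(file), str(dest))               # works across filesystems
    return dest
```

Recording the original path alongside `dest` is what makes later restoration possible, which is presumably what Deletion History tracks.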

On macOS, the desktop app also detects APFS clone groups — files that share storage at the filesystem level (e.g., from cp, Time Machine snapshots, or build caches) — and reports recoverable space accurately so duplicates that can’t actually free disk space don’t inflate the cleanup numbers.


The CLI provides maximum control and integrates perfectly with scripts, automation, and server workflows.

  • filters (include-ext / exclude-ext / min-size)
  • parallel hashing (--threads) — auto-selected optimal values when unspecified
  • symlink control (--follow-symlinks, --no-symlink-collapse)
  • output formats (txt, csv, json)
  • exit codes (0 = clean, 2 = duplicates found)

The CLI and Desktop app share the same detection engine, ensuring identical results.
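A script consuming the CLI's machine-readable output might pair a JSON payload with the documented exit codes (0 = clean, 2 = duplicates found). The JSON schema shown here is purely illustrative; DuoBolt's actual output format may differ.

```python
import json

def render_json(groups: list[list[str]]) -> tuple[str, int]:
    """Serialize duplicate groups and derive the documented exit code.

    Exit codes from the docs: 0 = clean, 2 = duplicates found.
    The payload schema is a hypothetical example, not DuoBolt's real one.
    """
    dupes = [g for g in groups if len(g) >= 2]
    payload = {"duplicate_groups": [{"files": g} for g in dupes]}
    return json.dumps(payload, indent=2), (2 if dupes else 0)
```

Distinguishing "clean" (0) from "duplicates found" (2) lets automation branch on the scan result without parsing any output at all.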


Both versions follow the same data-flow model:

  1. Discovery

    Walk the filesystem to find candidate files.

  2. Filtering

    Apply include/exclude rules based on your configuration.

  3. Prehashing (default)

    Head+tail hash prefilter; disable with --no-prehash if you need full hashing only.

  4. Hashing

    Full BLAKE3 hashing of file contents that pass the prefilter.

  5. Grouping

    Build duplicate groups based on hash matches.

  6. Output

    UI presentation (Desktop) or formatted results (CLI).
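The six steps above can be condensed into one sketch. This is a simplified model under stated assumptions: the prehash stage is omitted for brevity, `hashlib.sha256` stands in for BLAKE3, and whole files are read at once rather than streamed.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: Path, min_size: int = 1) -> list[list[Path]]:
    """Discovery -> filtering -> size grouping -> full hash -> grouping."""
    # 1-2. Discover candidate files and apply a minimum-size filter.
    by_size: defaultdict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = Path(dirpath) / name
            if p.is_file() and not p.is_symlink():
                if p.stat().st_size >= min_size:
                    by_size[p.stat().st_size].append(p)
    # 3-5. Only files sharing a size can be duplicates; confirm with a full hash.
    by_hash: defaultdict[bytes, list[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size can never be a duplicate
        for p in paths:
            by_hash[hashlib.sha256(p.read_bytes()).digest()].append(p)
    # 6. Keep only confirmed groups, sorted for deterministic output.
    return sorted(
        (sorted(g) for g in by_hash.values() if len(g) >= 2),
        key=lambda g: str(g[0]),
    )
```

Sorting both the members of each group and the groups themselves is what makes the output stable across runs, matching the determinism the engine promises.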


DuoBolt is optimized to handle huge datasets without blocking:

  • multithreaded hashing with auto-selected optimal thread counts (manual tuning optional)
  • streaming reads
  • optimized chunk sizes
  • BLAKE3 parallelism
  • default head+tail prehashing to keep large files fast
  • stable, repeatable grouping

Best Practices

  • Always review results before deleting files
  • Use Bulk select rules to auto-select duplicates based on criteria
  • Use Move to Archive when you want a reversible removal that doesn’t depend on the Trash’s time-to-live, or to relocate duplicates into a single audit folder
  • Be extra cautious with network volumes: files on NAS/SMB shares bypass the Trash and are permanently deleted
  • Exclude directory extensions like .app or .bundle on macOS
  • Use extension includes to drastically reduce scanning time
  • Keep the Two-Stage Hashing toggle on (default); turn it off only if you have a specific need
  • On macOS, keep Detect APFS clones on (default) so duplicates that share storage don’t show up as recoverable space
  • Preview thumbnails help identify visual duplicates quickly