This page walks through how DuoBolt detects duplicate files end-to-end, from directory discovery through size filtering, head+tail prehashing, full-content BLAKE3 hashing, and deterministic grouping. The same engine powers both flavours the product ships in: the Desktop app and the CLI.
Both tools share the same underlying principles: high-performance scanning, BLAKE3 hashing, deterministic grouping, and safe cleaning workflows (Trash or move-to-archive in Desktop; scan-only in CLI).
DuoBolt is designed around three core layers:
1. Scan Layer
Discovers candidate files through directory traversal.
Handles extension filtering, symlinks, size thresholds, and ignore rules.
2. Hashing Layer
Ultra-fast BLAKE3 hashing with two modes:
Single-stage (full hash) or Two-stage (head+tail prehash, then full hash).
3. Detection Layer
Groups files by size → prehash → BLAKE3 hash.
Zero false positives (files are grouped only on a full-content BLAKE3 match), deterministic results, and linear scaling with the amount of data scanned.
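The Detection Layer's size → prehash → full-hash funnel can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DuoBolt's implementation: `hashlib.blake2b` stands in for BLAKE3 because BLAKE3 is not in Python's standard library (the third-party `blake3` package could be swapped in), and the 64 KiB head/tail window and every function name are hypothetical.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # hypothetical head/tail window; DuoBolt's real value is not stated


def _digest():
    # Stand-in for BLAKE3: the stdlib has no BLAKE3, so BLAKE2b is used here.
    return hashlib.blake2b(digest_size=32)


def prehash(path, size):
    """Hash the first and last CHUNK bytes (the whole file if it is small)."""
    h = _digest()
    with open(path, "rb") as f:
        if size <= 2 * CHUNK:
            h.update(f.read())
        else:
            h.update(f.read(CHUNK))
            f.seek(size - CHUNK)
            h.update(f.read(CHUNK))
    return h.digest()


def full_hash(path):
    """Hash the entire file in 1 MiB blocks."""
    h = _digest()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.digest()


def _refine(groups, key_fn):
    """Split each candidate group by key_fn, keeping only buckets with 2+ files."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key_fn(path)].append(path)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def find_duplicates(paths, use_prehash=True):
    # Stage 1: files of different sizes can never be duplicates.
    sizes = {p: os.path.getsize(p) for p in paths}
    groups = _refine([list(paths)], sizes.__getitem__)
    # Stage 2 (optional): the cheap head+tail prehash prunes most size collisions.
    if use_prehash:
        groups = _refine(groups, lambda p: prehash(p, sizes[p]))
    # Stage 3: the full-content hash confirms the remaining candidates.
    groups = _refine(groups, full_hash)
    # Sort groups and members so output never depends on traversal order.
    return sorted(sorted(g) for g in groups)
```

Because each stage only splits existing candidate groups, turning the prehash off (`use_prehash=False`, analogous to `--no-prehash`) changes how much data is read, not which groups come out.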
The desktop app provides a guided, visual workflow, including:
Configure:
All options affect which files are scanned, never the correctness of the results.
The desktop UI uses:
Everything updates deterministically based on scan progress.
DuoBolt Desktop groups duplicates visually into:
Users can then safely remove selected files in either of two ways: delete to the system Trash/Recycle Bin for short-term reversibility, or Move to Archive to relocate them into a folder of their choice (local or external) with their directory structure preserved. Archives appear alongside deletions in Deletion History, where every removal is tracked with its original source paths so files can be restored long after the OS Trash is gone.
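The two guarantees of Move to Archive, preserving directory structure and recording each file's original source path for later restore, can be sketched as below. The function names, the JSON-lines history file, and its fields are hypothetical; DuoBolt's actual Deletion History format is not described on this page.

```python
import json
import shutil
import time
from pathlib import Path


def move_to_archive(src, scan_root, archive_root, history_file):
    """Move src under archive_root, mirroring its path relative to scan_root,
    and append a JSON-lines record so the move can be undone later."""
    src = Path(src).resolve()
    # relative_to raises ValueError if src is not inside scan_root.
    rel = src.relative_to(Path(scan_root).resolve())
    dest = Path(archive_root) / rel
    if dest.exists():
        raise FileExistsError(dest)  # never overwrite an earlier archived copy
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    record = {"action": "archive", "source": str(src),
              "archived_to": str(dest), "time": time.time()}
    with open(history_file, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return dest


def restore(record):
    """Move an archived file back to the original path stored in its record."""
    src, dest = Path(record["archived_to"]), Path(record["source"])
    if dest.exists():
        raise FileExistsError(dest)  # something new now occupies the old path
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```

Keeping the history outside the archive folder means a restore only needs the record, even if the archive lives on an external drive.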
On macOS, the desktop app also detects APFS clone groups — files that share storage at the filesystem level (e.g., from cp, Time Machine snapshots, or build caches) — and reports recoverable space accurately so duplicates that can’t actually free disk space don’t inflate the cleanup numbers.
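The accounting difference can be shown with a simplified model. This sketch assumes whole-file clones and that each file has already been mapped to a hypothetical `storage_id` identifying the physical data it uses (clones of one another share an ID); how such an ID is obtained on APFS is outside the sketch, and partially shared extents would need finer-grained accounting.

```python
def naive_bytes(group):
    """Space a naive count reports: every copy but one is assumed to be freeable."""
    if not group:
        return 0
    size = group[0][1]  # all members of a duplicate group have the same size
    return size * (len(group) - 1)


def recoverable_bytes(group):
    """Space actually freed when one file is kept and the rest are removed.

    group: list of (path, size, storage_id) tuples for one duplicate group.
    Only distinct physical copies cost disk space, so keeping one file frees
    every *other* storage_id, while extra clones of kept storage free nothing.
    """
    if not group:
        return 0
    size = group[0][1]
    distinct = {storage_id for _, _, storage_id in group}
    return size * (len(distinct) - 1)
```

For example, with three 100-byte duplicates where two are clones sharing storage, a naive count reports 200 bytes but only 100 bytes can actually be reclaimed.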
The CLI provides maximum control and integrates perfectly with scripts, automation, and server workflows.
Thread count (--threads): optimal values are auto-selected when unspecified
Symlink handling (--follow-symlinks, --no-symlink-collapse)
Output formats (txt, csv, json)
The CLI and Desktop app share the same detection engine, ensuring identical results.
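An auto-selected thread count maps naturally onto a worker pool. In this hypothetical sketch, `threads=None` falls back to `os.cpu_count()`; DuoBolt's actual selection heuristic is not documented here and may differ, and BLAKE2b again stands in for BLAKE3.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_file(path):
    # BLAKE2b stands in for BLAKE3, which is not in Python's standard library.
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def hash_all(paths, threads=None):
    """Hash files concurrently; threads=None falls back to the CPU count."""
    workers = threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the output is deterministic
        # regardless of which worker finishes first.
        return dict(zip(paths, pool.map(hash_file, paths)))
```

Threads help here because CPython's `hashlib` releases the GIL while hashing large buffers, so file reads and digest updates on different workers can overlap.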
Both versions follow the same data-flow model:
Discovery
Walk the filesystem to find candidate files.
Filtering
Apply include/exclude rules based on your configuration.
Prehashing (default)
Head+tail hash prefilter; disable with --no-prehash if you need full hashing only.
Hashing
Full BLAKE3 hashing of file contents that pass the prefilter.
Grouping
Build duplicate groups based on hash matches.
Output
UI presentation (Desktop) or formatted results (CLI).
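The Discovery and Filtering steps can be sketched with `os.walk`. Everything here is an assumption for illustration: the parameter names are hypothetical stand-ins for the extension, symlink, size-threshold, and ignore-rule options, there is no loop detection when symlinks are followed, and `--no-symlink-collapse` behaviour is not modelled.

```python
import os
import stat
from pathlib import Path


def discover(root, extensions=None, min_size=1, follow_symlinks=False, ignore_dirs=()):
    """Yield (path, size) for regular files under root that pass every filter.

    All parameter names are hypothetical stand-ins for the Scan Layer's
    extension, symlink, size-threshold, and ignore-rule options.
    """
    # Normalise extensions so "TXT", ".txt" and "txt" all match the same files.
    exts = {e.lower().lstrip(".") for e in extensions} if extensions else None
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        # Prune ignored directories in place so os.walk never descends into them;
        # sorting keeps the traversal order stable across runs.
        dirnames[:] = sorted(d for d in dirnames if d not in ignore_dirs)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not follow_symlinks and os.path.islink(path):
                continue  # skip file symlinks unless following is enabled
            if exts is not None and Path(name).suffix.lower().lstrip(".") not in exts:
                continue
            try:
                st = os.stat(path)  # follows the link when symlinks are allowed
            except OSError:
                continue  # broken link, permission error, or file removed mid-scan
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_size:
                yield path, st.st_size
```

The `(path, size)` pairs feed straight into size grouping, so the stat done during discovery is never repeated.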
DuoBolt is optimized to handle huge datasets without blocking:
.app or .bundle on macOS
Use --no-prehash only for specialized cases that require full hashing
Use the --quiet flag when scripting to suppress progress output
Check exit codes (0 = clean, 2 = duplicates found)
Set --threads based on your CPU cores for optimal performance
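For automation, the documented exit codes (0 = clean, 2 = duplicates found) can drive a wrapper script. In this sketch the command is passed in whole, so the binary name and arguments (for example a hypothetical `["duobolt", "--quiet", path]`) are left to the caller; treating every other exit code as an error is also an assumption.

```python
import subprocess

# Exit codes documented for the CLI; anything else is treated as a failure here.
EXIT_CLEAN = 0
EXIT_DUPLICATES = 2


def run_scan(cmd):
    """Run a scan command and classify its result by exit code.

    cmd is the full argument list; check the CLI's own help for the real
    binary name and flags.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == EXIT_CLEAN:
        return "clean", result.stdout
    if result.returncode == EXIT_DUPLICATES:
        return "duplicates", result.stdout
    raise RuntimeError(f"scan failed ({result.returncode}): {result.stderr.strip()}")
```

Distinguishing 2 from other non-zero codes matters in CI: finding duplicates is an expected outcome, while any other code signals that the scan itself went wrong.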