DuoBolt

Core Concepts

This page walks through how DuoBolt detects duplicate files end-to-end — from directory discovery through size filtering, head+tail prehashing, full-content BLAKE3 hashing, and deterministic grouping. The same engine powers both flavours the product ships in:

  • DuoBolt Desktop — a modern visual application for scanning, reviewing, and cleaning duplicate files.
  • DuoBolt CLI — a command-line scanner for power-users who need maximum control, automation, and script integration.

Both tools share the same underlying principles: high-performance scanning, BLAKE3 hashing, deterministic grouping, and safe cleaning workflows (Trash or move-to-archive in Desktop; scan-only in CLI).


DuoBolt is designed around three core layers:

1. Scan Layer

Discovers candidate files through directory traversal.

Handles extension filtering, symlinks, size thresholds, and ignore rules.

2. Hashing Layer

Ultra-fast BLAKE3 hashing with two modes:

Single-stage (full hash) or Two-stage (head+tail prehash, then full hash).

3. Detection Layer

Groups files by size → prehash → BLAKE3 hash.

Because every match is confirmed by a full-content hash, grouping produces zero false positives, is fully deterministic, and scales linearly with input size.
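The two-stage idea in the Hashing Layer can be sketched in a few lines. This is a minimal illustration, not DuoBolt's actual implementation: `hashlib.sha256` stands in for BLAKE3 (which has no standard-library binding), and the sample size and function names are assumptions.

```python
import hashlib
from pathlib import Path

HEAD_TAIL_BYTES = 64 * 1024  # hypothetical sample size per end

def prehash(path: Path) -> bytes:
    """Hash the first and last chunks of a file as a cheap prefilter."""
    size = path.stat().st_size
    h = hashlib.sha256()  # stand-in for BLAKE3
    with path.open("rb") as f:
        h.update(f.read(HEAD_TAIL_BYTES))
        if size > 2 * HEAD_TAIL_BYTES:
            f.seek(-HEAD_TAIL_BYTES, 2)  # 2 = os.SEEK_END
            h.update(f.read(HEAD_TAIL_BYTES))
    return h.digest()

def full_hash(path: Path) -> bytes:
    """Full-content hash, streamed in fixed-size chunks."""
    h = hashlib.sha256()  # stand-in for BLAKE3
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()
```

Files whose prehashes differ cannot be identical, so only files that agree on size and prehash ever pay the cost of a full read.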


The desktop app provides a guided, visual workflow, beginning with a Configure step where you set:

  • included / excluded extensions
  • directory-extension exclusions
  • ignore rules
  • symlink behavior
  • advanced performance settings
  • minimum file size

All options affect which files are scanned, never the correctness of the results.

The desktop UI uses:

  • progress bars
  • animated circular charts
  • badges for result groups
  • scan statistics
  • counts for filtered files, ignored files, and hashed files

Everything updates deterministically based on scan progress.

DuoBolt Desktop presents results visually:

  • duplicate groups (at least 2 files)
  • singleton files (unique)
  • preview thumbnails
  • path inspection
  • Bulk Select rules (e.g., auto-select older or lowest-quality files)

Users can then safely remove selected files in one of two ways: delete to the system Trash/Recycle Bin for short-term reversibility, or Move to Archive to relocate them into a folder of their choice (local or external) with their directory structure preserved. Archives appear alongside deletions in Deletion History, where every removal is tracked with its original source paths, so files can be restored long after the OS Trash has been emptied.
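The structure-preserving part of Move to Archive could look roughly like this sketch, which mirrors each file's path relative to the scan root under the archive folder. The function name and signature are illustrative assumptions, not DuoBolt's API.

```python
import shutil
from pathlib import Path

def move_to_archive(file: Path, scan_root: Path, archive_root: Path) -> Path:
    """Relocate a duplicate into the archive, mirroring its layout under scan_root."""
    rel = file.resolve().relative_to(scan_root.resolve())
    dest = archive_root / rel
    dest.parent.mkdir(parents=True, exist_ok=True)  # recreate the directory structure
    shutil.move(str(file), str(dest))               # works across filesystems
    return dest
```

Recording the original path alongside `dest` is what makes later restoration possible, which is presumably what Deletion History tracks.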

On macOS, the desktop app also detects APFS clone groups — files that share storage at the filesystem level (e.g., from cp, Time Machine snapshots, or build caches) — and reports recoverable space accurately so duplicates that can’t actually free disk space don’t inflate the cleanup numbers.


The CLI provides maximum control and integrates perfectly with scripts, automation, and server workflows.

  • filters (include-ext / exclude-ext / min-size)
  • parallel hashing (--threads) — auto-selected optimal values when unspecified
  • symlink control (--follow-symlinks, --no-symlink-collapse)
  • output formats (txt, csv, json)
  • exit codes (0 = clean, 2 = duplicates found)

The CLI and Desktop app share the same detection engine, ensuring identical results.
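A script consuming the CLI's machine-readable output might pair a JSON payload with the documented exit codes (0 = clean, 2 = duplicates found). The JSON schema shown here is purely illustrative; DuoBolt's actual output format may differ.

```python
import json

def render_json(groups: list[list[str]]) -> tuple[str, int]:
    """Serialize duplicate groups and derive the documented exit code.

    Exit codes from the docs: 0 = clean, 2 = duplicates found.
    The payload schema is a hypothetical example, not DuoBolt's real one.
    """
    dupes = [g for g in groups if len(g) >= 2]
    payload = {"duplicate_groups": [{"files": g} for g in dupes]}
    return json.dumps(payload, indent=2), (2 if dupes else 0)
```

Distinguishing "clean" (0) from "duplicates found" (2) lets automation branch on the scan result without parsing any output at all.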


Both versions follow the same data-flow model:

  1. Discovery

    Walk the filesystem to find candidate files.

  2. Filtering

    Apply include/exclude rules based on your configuration.

  3. Prehashing (default)

    Head+tail hash prefilter; disable with --no-prehash if you need full hashing only.

  4. Hashing

    Full BLAKE3 hashing of file contents that pass the prefilter.

  5. Grouping

    Build duplicate groups based on hash matches.

  6. Output

    UI presentation (Desktop) or formatted results (CLI).
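The six steps above can be condensed into one sketch. This is a simplified model under stated assumptions: the prehash stage is omitted for brevity, `hashlib.sha256` stands in for BLAKE3, and whole files are read at once rather than streamed.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: Path, min_size: int = 1) -> list[list[Path]]:
    """Discovery -> filtering -> size grouping -> full hash -> grouping."""
    # 1-2. Discover candidate files and apply a minimum-size filter.
    by_size: defaultdict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = Path(dirpath) / name
            if p.is_file() and not p.is_symlink():
                if p.stat().st_size >= min_size:
                    by_size[p.stat().st_size].append(p)
    # 3-5. Only files sharing a size can be duplicates; confirm with a full hash.
    by_hash: defaultdict[bytes, list[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size can never be a duplicate
        for p in paths:
            by_hash[hashlib.sha256(p.read_bytes()).digest()].append(p)
    # 6. Keep only confirmed groups, sorted for deterministic output.
    return sorted(
        (sorted(g) for g in by_hash.values() if len(g) >= 2),
        key=lambda g: str(g[0]),
    )
```

Sorting both the members of each group and the groups themselves is what makes the output stable across runs, matching the determinism the engine promises.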


DuoBolt is optimized to handle huge datasets without blocking:

  • multithreaded hashing with auto-selected optimal thread counts (manual tuning optional)
  • streaming reads
  • optimized chunk sizes
  • BLAKE3 parallelism
  • default head+tail prehashing to keep large files fast
  • stable, repeatable grouping

Best Practices

  • Always review results before deleting files
  • Use Bulk select rules to auto-select duplicates based on criteria
  • Use Move to Archive when you want a reversible removal that doesn’t depend on the Trash’s time-to-live, or to relocate duplicates into a single audit folder
  • Be extra cautious with network volumes: files on NAS/SMB shares bypass the Trash and are permanently deleted
  • Exclude directory extensions like .app or .bundle on macOS
  • Use extension includes to drastically reduce scanning time
  • Keep the Two-Stage Hashing toggle on (default); turn it off only if you have a specific need
  • On macOS, keep Detect APFS clones on (default) so duplicates that share storage don’t show up as recoverable space
  • Preview thumbnails help identify visual duplicates quickly