rudu

rudu is a high-performance, Rust-powered replacement for the traditional Unix du (disk usage) command. It is built to provide a safer, faster, and more extensible alternative for scanning and analyzing directory sizes, especially for large-scale or deep filesystem structures.

Why rudu?

While du has been a reliable tool for decades, it's single-threaded, limited in extensibility, and not always ideal for custom workflows or integration with modern systems. rudu takes advantage of Rust's memory safety and concurrency to provide a tool that is:

  • Fast — uses multithreading (rayon) to speed up directory traversal and size aggregation
  • Safe — memory-safe by design, no segfaults or undefined behavior
  • Extensible — easy to add new flags, filters, and output formats
  • Accurate — reports true disk usage (allocated blocks), not just file sizes
  • Memory-aware — configurable memory limits for resource-constrained environments

Features

Core Functionality

  • Recursive disk usage scanning
  • Parallelized file traversal with multithreading
  • Real disk usage calculation (st_blocks × 512)
  • Cross-platform compatibility (macOS, Linux, BSD)
  • Platform-specific memory monitoring
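
The "real disk usage" bullet refers to the allocated-block figure from stat(2), not the apparent file length. You can observe the same quantity from a shell with GNU coreutils stat (BSD/macOS stat takes different format flags, e.g. stat -f '%z %b'); this is an illustration of the calculation, not rudu's code:

```shell
# %s = apparent file size in bytes, %b = allocated 512-byte blocks (GNU stat).
f=$(mktemp)
printf 'hello' > "$f"
apparent=$(stat --format='%s' "$f")
allocated=$(( $(stat --format='%b' "$f") * 512 ))
echo "apparent=${apparent} bytes, allocated=${allocated} bytes"
rm -f "$f"
```

Sparse files make the difference obvious: truncate -s 1G sparse.img reports a 1 GiB apparent size but only a handful of allocated blocks.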

Memory Management (v1.4.0+)

  • --memory-limit MB — Set maximum memory usage
  • Graceful memory handling with automatic feature disabling
  • Early termination when memory limit exceeded
  • HPC cluster support for resource-constrained environments

Filtering and Display

  • --depth N — Limit output to N levels deep
  • --exclude PATTERN — Exclude patterns (e.g., .git, node_modules)
  • --show-files true|false — Toggle file display
  • Clear [DIR] and [FILE] labels

Output & Analysis

  • --sort size|name — Sort by size or name
  • --show-owner — Display ownership information
  • --show-inodes — Show file/directory counts
  • --output report.csv — Export to CSV

Performance Features

  • --threads N — Control CPU thread usage
  • Intelligent caching for faster subsequent runs
  • Incremental scanning (only rescans changed directories)
  • --profile — Detailed performance profiling
  • --no-cache, --cache-ttl — Cache management

Installation

Using Cargo

cargo install rudu

From Source

git clone https://github.com/greensh16/rudu.git
cd rudu
cargo build --release
cargo install --path .

Usage Examples

Basic Usage

# Scan current directory
rudu

# Scan specific directory
rudu /data

# With progress indicator
rudu /large/directory

Memory-Limited Scanning (HPC Clusters)

# Limit memory to 512MB
rudu /large/dataset --memory-limit 512

# Very memory-constrained (128MB)
rudu /project --memory-limit 128 --no-cache

# Combine with other options
rudu /data --memory-limit 256 --depth 3 --threads 2

Filtering and Analysis

# Top-level directories only
rudu /data --depth 1

# Exclude directories
rudu /project --exclude .git --exclude node_modules

# Sort by size and show owners
rudu /data --sort size --show-owner

Output Formats

# Export to CSV
rudu /data --output report.csv

# Combine options
rudu /project --depth 2 --sort size --exclude .git --output analysis.csv
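
A CSV export can be post-processed with standard Unix tools. The column layout below (path,size_bytes,owner) is illustrative, not rudu's documented schema; check the header row of your own report.csv and adjust the sort key accordingly:

```shell
# Build a tiny stand-in report so the pipeline below is runnable as-is.
cat > report.csv <<'EOF'
path,size_bytes,owner
/data/a,1024,alice
/data/b,4096,bob
/data/c,512,carol
EOF

# Largest entries first: skip the header, numeric-sort on column 2 descending.
tail -n +2 report.csv | sort -t, -k2,2 -rn | head -3
# first line: /data/b,4096,bob
rm -f report.csv
```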

Performance Tuning

# Use 4 threads
rudu /large/directory --threads 4

# Enable profiling
rudu /project --profile --threads 8

# Disable caching
rudu /data --no-cache

Memory Limiting for HPC

New in v1.4.0: rudu supports memory usage limits, making it suitable for High-Performance Computing environments where memory resources are strictly controlled.

Why Memory Limiting?

In HPC clusters, jobs are allocated specific amounts of memory. Exceeding limits can result in:

  • Job termination by the scheduler (SLURM, PBS, etc.)
  • Node instability affecting other users
  • Poor cluster performance

How It Works

  1. Real-time monitoring — Tracks RSS (Resident Set Size) memory usage
  2. Graceful degradation — At 95% of limit, disables memory-intensive features
  3. Early termination — Stops scan if limit exceeded, returns partial results
  4. Platform awareness — Automatically handles platforms without RSS support
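
On Linux, the RSS figure being tracked is the same one the kernel exposes in /proc. A quick way to inspect it for any process (here, the current shell); this sketch shows the data source, not rudu's internal monitor:

```shell
# VmRSS in /proc/<pid>/status is the resident set size, in kB.
rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/self/status)
echo "current RSS: $(( rss_kb / 1024 )) MB"
```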

HPC Usage Example

#!/bin/bash
#SBATCH --mem=1G
#SBATCH --job-name=rudu-scan
rudu /lustre/project --memory-limit 900 --no-cache --threads 4
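
Rather than hard-coding the limit, a job script can derive it from the scheduler's allocation with some headroom. SLURM exports SLURM_MEM_PER_NODE (in MB) inside a job; the 90% headroom factor here is an illustrative choice, and the command is echoed rather than run so the sketch is self-contained (drop the echo in a real script):

```shell
# Leave ~10% of the SLURM allocation free for the shell, I/O buffers, etc.
mem_mb=${SLURM_MEM_PER_NODE:-1024}   # falls back to 1024 MB outside a job
rudu_limit=$(( mem_mb * 90 / 100 ))
echo "rudu /lustre/project --memory-limit ${rudu_limit} --no-cache"
```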

Memory Behavior

Memory Usage       Behavior
< 95% of limit     Normal operation with all features
95-100% of limit   Disables caching, reduces allocations
> 100% of limit    Terminates early, returns partial results
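
The thresholds above reduce to simple integer arithmetic against the configured limit; the values in this sketch are illustrative:

```shell
limit_mb=512
rss_mb=490   # pretend current RSS
pct=$(( rss_mb * 100 / limit_mb ))
if   [ "$pct" -gt 100 ]; then echo "terminate early, return partial results"
elif [ "$pct" -ge 95  ]; then echo "disable caching, reduce allocations"
else                          echo "normal operation"
fi
# → disable caching, reduce allocations  (490/512 ≈ 95.7%)
```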

Performance

Benchmark Results

Directory Type     Files/Dirs   du Time   rudu Time   Speedup
Small (1K files)   ~1,000       0.010s    0.619s      0.02x*
Medium (/usr/bin)  ~1,400       0.017s    0.015s      1.13x
Large (project)    ~10,000      0.106s    0.052s      2.04x

*Note: For very small directories, rudu's startup overhead can make it slower than du. Performance benefits become apparent with larger directory structures.

When to Use

Use rudu for:

  • Large directory structures (>5,000 files)
  • Complex filtering requirements
  • CSV output for analysis
  • HPC clusters and memory-constrained environments
  • Repeated scans (caching benefits)

Use du for:

  • Very small directories (<1,000 files)
  • Simple, quick size checks
  • Legacy script compatibility

Contributing

Contributions are welcome! Please submit a Pull Request on GitHub.

Development Setup

git clone https://github.com/greensh16/rudu.git
cd rudu

# Run tests
cargo test

# Check formatting
cargo fmt --check

# Run linter
cargo clippy --all-targets -- -D warnings

# Build documentation
cargo doc --open

License

GNU General Public License v3.0 - see the LICENSE file for details.

Acknowledgments

  • Inspired by the classic Unix du command
  • Built with Rust
  • Uses Rayon for parallel processing
  • CLI powered by Clap
  • Progress indicators via Indicatif