# rudu
rudu is a high-performance, Rust-powered replacement for the traditional Unix `du` (disk usage) command. It provides a safer, faster, and more extensible alternative for scanning and analyzing directory sizes, especially for large-scale or deep filesystem structures.
## Why rudu?
While du has been a reliable tool for decades, it's single-threaded, limited in extensibility, and not always ideal for custom workflows or integration with modern systems.
rudu takes advantage of Rust's memory safety and concurrency to provide a tool that is:
- Fast — uses multithreading (rayon) to speed up directory traversal and size aggregation
- Safe — memory-safe by design, no segfaults or undefined behavior
- Extensible — easy to add new flags, filters, and output formats
- Accurate — reports true disk usage (allocated blocks), not just file sizes
- Memory-aware — configurable memory limits for resource-constrained environments
## Features

### Core Functionality
- Recursive disk usage scanning
- Parallelized file traversal with multithreading
- Real disk usage calculation (`st_blocks` × 512)
- Cross-platform compatibility (macOS, Linux, BSD)
- Platform-specific memory monitoring
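The `st_blocks` × 512 calculation can be sketched in plain Rust using the standard library's Unix metadata extension. This single-threaded version is for clarity only (rudu parallelizes traversal with rayon), and `disk_usage` is a hypothetical helper, not rudu's actual API:

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // st_blocks is Unix-only
use std::path::Path;

/// Recursively sum allocated bytes (st_blocks * 512), mirroring how
/// `du` measures real on-disk usage rather than apparent file size.
fn disk_usage(path: &Path) -> io::Result<u64> {
    let meta = fs::symlink_metadata(path)?; // don't follow symlinks
    let mut total = meta.blocks() * 512;
    if meta.is_dir() {
        for entry in fs::read_dir(path)? {
            total += disk_usage(&entry?.path())?;
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let bytes = disk_usage(Path::new("."))?;
    println!("{} bytes allocated", bytes);
    Ok(())
}
```

Note that allocated size can differ from apparent size in both directions: sparse files allocate fewer blocks than their length suggests, while small files still occupy at least one filesystem block.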
### Memory Management (v1.4.0+)

- `--memory-limit MB` — Set maximum memory usage
- Graceful memory handling with automatic feature disabling
- Early termination when memory limit exceeded
- HPC cluster support for resource-constrained environments
### Filtering and Display

- `--depth N` — Limit output to N levels deep
- `--exclude PATTERN` — Exclude patterns (e.g., `.git`, `node_modules`)
- `--show-files true|false` — Toggle file display
- Clear `[DIR]` and `[FILE]` labels
### Output & Analysis

- `--sort size|name` — Sort by size or name
- `--show-owner` — Display ownership information
- `--show-inodes` — Show file/directory counts
- `--output report.csv` — Export to CSV
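As a sketch of what `--output report.csv` might emit, here is a minimal CSV writer over scan results. The `Entry` type, the `write_csv` helper, and the two-column schema are assumptions for illustration, not rudu's actual record type or output format (the real CSV may include owner and inode columns):

```rust
use std::io::{self, Write};

// Hypothetical scan result; rudu's actual record type is internal.
struct Entry {
    path: String,
    bytes: u64,
}

// Write entries as CSV to any `Write` sink (file, stdout, buffer).
fn write_csv(entries: &[Entry], out: &mut impl Write) -> io::Result<()> {
    writeln!(out, "path,size_bytes")?;
    for e in entries {
        // A production writer would quote paths containing commas.
        writeln!(out, "{},{}", e.path, e.bytes)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let entries = vec![
        Entry { path: "src".into(), bytes: 40_960 },
        Entry { path: "target".into(), bytes: 1_048_576 },
    ];
    write_csv(&entries, &mut io::stdout())
}
```

Writing to a generic `Write` sink keeps the same code path usable for both `--output report.csv` and terminal output.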
### Performance Features

- `--threads N` — Control CPU thread usage
- Intelligent caching for faster subsequent runs
- Incremental scanning (only rescans changed directories)
- `--profile` — Detailed performance profiling
- `--no-cache`, `--cache-ttl` — Cache management
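Incremental scanning can be illustrated with a modification-time comparison: a directory is rescanned only if it is unseen or its mtime has changed. This is a sketch under assumptions; `needs_rescan` and the in-memory cache layout are hypothetical, not rudu's actual cache format:

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

// Hypothetical in-memory cache: directory path -> last observed mtime.
type MtimeCache = HashMap<String, SystemTime>;

// A directory needs rescanning if it was never seen or its mtime changed.
fn needs_rescan(cache: &MtimeCache, path: &str, mtime: SystemTime) -> bool {
    cache.get(path).map_or(true, |&seen| seen != mtime)
}

fn main() {
    let mut cache = MtimeCache::new();
    let t0 = SystemTime::UNIX_EPOCH + Duration::from_secs(1_700_000_000);

    assert!(needs_rescan(&cache, "/data", t0)); // never scanned
    cache.insert("/data".to_string(), t0);
    assert!(!needs_rescan(&cache, "/data", t0)); // unchanged, skip
    assert!(needs_rescan(&cache, "/data", t0 + Duration::from_secs(60))); // modified
    println!("cache logic ok");
}
```

A real cache would also persist to disk between runs and honor `--cache-ttl` to expire stale entries.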
## Installation

### Using Cargo

```bash
cargo install rudu
```

### From Source
```bash
git clone https://github.com/greensh16/rudu.git
cd rudu
cargo build --release
cargo install --path .
```

## Usage Examples

### Basic Usage
```bash
# Scan current directory
rudu

# Scan specific directory
rudu /data

# With progress indicator
rudu /large/directory
```

### Memory-Limited Scanning (HPC Clusters)
```bash
# Limit memory to 512MB
rudu /large/dataset --memory-limit 512

# Very memory-constrained (128MB)
rudu /project --memory-limit 128 --no-cache

# Combine with other options
rudu /data --memory-limit 256 --depth 3 --threads 2
```

### Filtering and Analysis
```bash
# Top-level directories only
rudu /data --depth 1

# Exclude directories
rudu /project --exclude .git --exclude node_modules

# Sort by size and show owners
rudu /data --sort size --show-owner
```

### Output Formats
```bash
# Export to CSV
rudu /data --output report.csv

# Combine options
rudu /project --depth 2 --sort size --exclude .git --output analysis.csv
```

### Performance Tuning
```bash
# Use 4 threads
rudu /large/directory --threads 4

# Enable profiling
rudu /project --profile --threads 8

# Disable caching
rudu /data --no-cache
```

## Memory Limiting for HPC
**New in v1.4.0:** rudu supports memory usage limits, making it suitable for High-Performance Computing environments where memory resources are strictly controlled.

### Why Memory Limiting?
In HPC clusters, jobs are allocated specific amounts of memory. Exceeding limits can result in:
- Job termination by the scheduler (SLURM, PBS, etc.)
- Node instability affecting other users
- Poor cluster performance
### How It Works
- Real-time monitoring — Tracks RSS (Resident Set Size) memory usage
- Graceful degradation — At 95% of limit, disables memory-intensive features
- Early termination — Stops scan if limit exceeded, returns partial results
- Platform awareness — Automatically handles platforms without RSS support
### HPC Usage Example

```bash
#!/bin/bash
#SBATCH --mem=1G
#SBATCH --job-name=rudu-scan

rudu /lustre/project --memory-limit 900 --no-cache --threads 4
```

### Memory Behavior
| Memory Usage | Behavior |
|---|---|
| < 95% limit | Normal operation with all features |
| 95-100% limit | Disables caching, reduces allocations |
| > 100% limit | Terminates early, returns partial results |
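The table above maps onto a simple threshold check. The sketch below assumes integer megabyte values and uses hypothetical names (`MemoryAction`, `memory_action`); rudu's internal monitor may be structured differently:

```rust
/// Hypothetical policy states mirroring the behavior table.
#[derive(Debug, PartialEq)]
enum MemoryAction {
    Normal,  // < 95% of limit: all features enabled
    Degrade, // 95-100% of limit: disable caching, reduce allocations
    Abort,   // > 100% of limit: stop early, return partial results
}

// Integer arithmetic avoids floating point: rss/limit >= 0.95
// is equivalent to rss * 100 >= limit * 95.
fn memory_action(rss_mb: u64, limit_mb: u64) -> MemoryAction {
    if rss_mb > limit_mb {
        MemoryAction::Abort
    } else if rss_mb * 100 >= limit_mb * 95 {
        MemoryAction::Degrade
    } else {
        MemoryAction::Normal
    }
}

fn main() {
    // Example thresholds against a 512 MB limit.
    for rss in [100, 490, 512, 600] {
        println!("{} MB / 512 MB limit -> {:?}", rss, memory_action(rss, 512));
    }
}
```

Checking the policy on every batch of directory entries, rather than every file, would keep the monitoring overhead negligible.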
## Performance

### Benchmark Results
| Directory Type | Files/Dirs | du Time | rudu Time | Speedup |
|---|---|---|---|---|
| Small (1K files) | ~1,000 | 0.010s | 0.619s | 0.02x* |
| Medium (/usr/bin) | ~1,400 | 0.017s | 0.015s | 1.13x |
| Large (project) | ~10,000 | 0.106s | 0.052s | 2.04x |
\*Note: For very small directories, rudu's startup overhead can make it slower than du. Performance benefits become apparent with larger directory structures.

## When to Use
Use rudu for:
- Large directory structures (>5,000 files)
- Complex filtering requirements
- CSV output for analysis
- HPC clusters and memory-constrained environments
- Repeated scans (caching benefits)
Use du for:
- Very small directories (<1,000 files)
- Simple, quick size checks
- Legacy script compatibility
## Contributing

Contributions are welcome! Please submit a Pull Request on GitHub.

### Development Setup
```bash
git clone https://github.com/greensh16/rudu.git
cd rudu

# Run tests
cargo test

# Check formatting
cargo fmt --check

# Run linter
cargo clippy --all-targets -- -D warnings

# Build documentation
cargo doc --open
```

## License
GNU General Public License v3.0 - see the LICENSE file for details.