rudu

rudu is a high-performance, Rust-powered replacement for the traditional Unix du (disk usage) command. It is built to provide a safer, faster, and more extensible alternative for scanning and analyzing directory sizes, especially for large-scale or deep filesystem structures.

Why rudu?

While du has been a reliable tool for decades, it's single-threaded, limited in extensibility, and not always ideal for custom workflows or integration with modern systems. rudu takes advantage of Rust's memory safety and concurrency to provide a tool that is:

  • Fast — uses multithreading (rayon) to speed up directory traversal and size aggregation
  • Safe — memory-safe by design, no segfaults or undefined behavior
  • Extensible — easy to add new flags, filters, and output formats
  • Accurate — reports true disk usage (allocated blocks), not just file sizes
  • Memory-aware — configurable memory limits for resource-constrained environments

Features

Core Functionality

  • Recursive disk usage scanning
  • Parallelized file traversal with multithreading
  • Real disk usage calculation (st_blocks × 512)
  • Cross-platform compatibility (macOS, Linux, BSD)
  • Platform-specific memory monitoring
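
The "real disk usage" bullet refers to the allocated-block figure from stat(2), not the apparent file length. You can observe the same quantity from a shell with GNU coreutils stat (BSD/macOS stat takes different format flags, e.g. stat -f '%z %b'); this is an illustration of the calculation, not rudu's code:

```shell
# %s = apparent file size in bytes, %b = allocated 512-byte blocks (GNU stat).
f=$(mktemp)
printf 'hello' > "$f"
apparent=$(stat --format='%s' "$f")
allocated=$(( $(stat --format='%b' "$f") * 512 ))
echo "apparent=${apparent} bytes, allocated=${allocated} bytes"
rm -f "$f"
```

Sparse files make the difference obvious: truncate -s 1G sparse.img reports a 1 GiB apparent size but only a handful of allocated blocks.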

Memory Management (v1.4.0+)

  • --memory-limit MB — Set maximum memory usage
  • Graceful memory handling with automatic feature disabling
  • Early termination when memory limit exceeded
  • HPC cluster support for resource-constrained environments

Filtering and Display

  • --depth N — Limit output to N levels deep
  • --exclude PATTERN — Exclude patterns (e.g., .git, node_modules)
  • --show-files true|false — Toggle file display
  • Clear [DIR] and [FILE] labels

Output & Analysis

  • --sort size|name — Sort by size or name
  • --show-owner — Display ownership information
  • --show-inodes — Show file/directory counts
  • --output report.csv — Export to CSV

Performance Features

  • --threads N — Control CPU thread usage
  • Intelligent caching for faster subsequent runs
  • Incremental scanning (only rescans changed directories)
  • --profile — Detailed performance profiling
  • --no-cache, --cache-ttl — Cache management

Installation

Using Cargo

cargo install rudu

From Source

git clone https://github.com/greensh16/rudu.git
cd rudu
cargo build --release
cargo install --path .

Usage Examples

Basic Usage

# Scan current directory
rudu

# Scan specific directory
rudu /data

# With progress indicator
rudu /large/directory

Memory-Limited Scanning (HPC Clusters)

# Limit memory to 512MB
rudu /large/dataset --memory-limit 512

# Very memory-constrained (128MB)
rudu /project --memory-limit 128 --no-cache

# Combine with other options
rudu /data --memory-limit 256 --depth 3 --threads 2

Filtering and Analysis

# Top-level directories only
rudu /data --depth 1

# Exclude directories
rudu /project --exclude .git --exclude node_modules

# Sort by size and show owners
rudu /data --sort size --show-owner

Output Formats

# Export to CSV
rudu /data --output report.csv

# Combine options
rudu /project --depth 2 --sort size --exclude .git --output analysis.csv
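
A CSV export can be post-processed with standard Unix tools. The column layout below (path,size_bytes,owner) is illustrative, not rudu's documented schema; check the header row of your own report.csv and adjust the sort key accordingly:

```shell
# Build a tiny stand-in report so the pipeline below is runnable as-is.
cat > report.csv <<'EOF'
path,size_bytes,owner
/data/a,1024,alice
/data/b,4096,bob
/data/c,512,carol
EOF

# Largest entries first: skip the header, numeric-sort on column 2 descending.
tail -n +2 report.csv | sort -t, -k2,2 -rn | head -3
# first line: /data/b,4096,bob
rm -f report.csv
```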

Performance Tuning

# Use 4 threads
rudu /large/directory --threads 4

# Enable profiling
rudu /project --profile --threads 8

# Disable caching
rudu /data --no-cache

Memory Limiting for HPC

New in v1.4.0: rudu supports memory usage limits, making it suitable for High-Performance Computing environments where memory resources are strictly controlled.

Why Memory Limiting?

In HPC clusters, jobs are allocated specific amounts of memory. Exceeding limits can result in:

  • Job termination by the scheduler (SLURM, PBS, etc.)
  • Node instability affecting other users
  • Poor cluster performance

How It Works

  1. Real-time monitoring — Tracks RSS (Resident Set Size) memory usage
  2. Graceful degradation — At 95% of limit, disables memory-intensive features
  3. Early termination — Stops scan if limit exceeded, returns partial results
  4. Platform awareness — Automatically handles platforms without RSS support
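
On Linux, the RSS figure being tracked is the same one the kernel exposes in /proc. A quick way to inspect it for any process (here, the current shell); this sketch shows the data source, not rudu's internal monitor:

```shell
# VmRSS in /proc/<pid>/status is the resident set size, in kB.
rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/self/status)
echo "current RSS: $(( rss_kb / 1024 )) MB"
```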

HPC Usage Example

#!/bin/bash
#SBATCH --mem=1G
#SBATCH --job-name=rudu-scan
rudu /lustre/project --memory-limit 900 --no-cache --threads 4
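
Rather than hard-coding the limit, a job script can derive it from the scheduler's allocation with some headroom. SLURM exports SLURM_MEM_PER_NODE (in MB) inside a job; the 90% headroom factor here is an illustrative choice, and the command is echoed rather than run so the sketch is self-contained (drop the echo in a real script):

```shell
# Leave ~10% of the SLURM allocation free for the shell, I/O buffers, etc.
mem_mb=${SLURM_MEM_PER_NODE:-1024}   # falls back to 1024 MB outside a job
rudu_limit=$(( mem_mb * 90 / 100 ))
echo "rudu /lustre/project --memory-limit ${rudu_limit} --no-cache"
```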

Memory Behavior

Memory Usage       Behavior
< 95% of limit     Normal operation with all features
95-100% of limit   Disables caching, reduces allocations
> 100% of limit    Terminates early, returns partial results
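
The thresholds above reduce to simple integer arithmetic against the configured limit; the values in this sketch are illustrative:

```shell
limit_mb=512
rss_mb=490   # pretend current RSS
pct=$(( rss_mb * 100 / limit_mb ))
if   [ "$pct" -gt 100 ]; then echo "terminate early, return partial results"
elif [ "$pct" -ge 95  ]; then echo "disable caching, reduce allocations"
else                          echo "normal operation"
fi
# → disable caching, reduce allocations  (490/512 ≈ 95.7%)
```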

Performance

Benchmark Results

Directory Type     Files/Dirs   du Time   rudu Time   Speedup
Small (1K files)   ~1,000       0.010s    0.619s      0.02x*
Medium (/usr/bin)  ~1,400       0.017s    0.015s      1.13x
Large (project)    ~10,000      0.106s    0.052s      2.04x

*Note: For very small directories, rudu's startup overhead can make it slower than du. Performance benefits become apparent with larger directory structures.

When to Use

Use rudu for:

  • Large directory structures (>5,000 files)
  • Complex filtering requirements
  • CSV output for analysis
  • HPC clusters and memory-constrained environments
  • Repeated scans (caching benefits)

Use du for:

  • Very small directories (<1,000 files)
  • Simple, quick size checks
  • Legacy script compatibility

Contributing

Contributions are welcome! Please submit a Pull Request on GitHub.

Development Setup

git clone https://github.com/greensh16/rudu.git
cd rudu

# Run tests
cargo test

# Check formatting
cargo fmt --check

# Run linter
cargo clippy --all-targets -- -D warnings

# Build documentation
cargo doc --open

License

GNU General Public License v3.0 - see the LICENSE file for details.

Acknowledgments

  • Inspired by the classic Unix du command
  • Built with Rust
  • Uses Rayon for parallel processing
  • CLI powered by Clap
  • Progress indicators via Indicatif