2 releases

0.1.1	Mar 6, 2026
0.1.0	Mar 5, 2026

MIT license

34KB
478 lines

nftw

nftw is a Rust crate for efficiently crawling a directory tree and applying an arbitrary function to each path encountered.

nftw can process files concurrently with the traversal using Nftw::par_map, which decouples walking the tree from processing the files. This helps with compute-intensive jobs on very large file systems (millions of files) where each file takes significant time to process: work starts as soon as the first paths are found, so you neither wait for the whole tree to be read nor pay the overhead of storing every path in a Vec<PathBuf>. As a result, the overall memory footprint of a traversal should generally be lower with nftw than with walkdir.
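To make the decoupling idea concrete, here is a minimal sketch that uses only the standard library. It is not nftw's implementation: the helper names `walk` and `total_size_streaming` and the channel capacity of 1024 are assumptions chosen for illustration. One thread walks the tree and sends paths through a bounded channel while the calling thread processes them as they arrive, so at most a fixed number of paths are buffered at any time.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::mpsc;
use std::thread;

/// Recursively send every regular file under `dir` down the channel.
/// Hypothetical helper for illustration; not part of the nftw API.
fn walk(dir: &Path, tx: &mpsc::SyncSender<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?; // does not follow symlinks
        if file_type.is_dir() {
            walk(&entry.path(), tx)?;
        } else if file_type.is_file() {
            // Stop quietly if the receiver has gone away.
            if tx.send(entry.path()).is_err() {
                return Ok(());
            }
        }
    }
    Ok(())
}

/// Sum file sizes under `root`, handling each path as soon as it is found.
pub fn total_size_streaming(root: &Path) -> io::Result<u64> {
    // Bounded channel: the walker blocks once 1024 paths are waiting,
    // which caps memory use regardless of tree size (assumed capacity).
    let (tx, rx) = mpsc::sync_channel::<PathBuf>(1024);
    let root = root.to_path_buf();

    // Producer: traverses the tree on its own thread.
    let producer = thread::spawn(move || walk(&root, &tx));

    // Consumer: processes paths while the traversal is still running.
    let mut total = 0u64;
    for path in rx {
        total += fs::metadata(&path)?.len();
    }

    producer.join().expect("traversal thread panicked")?;
    Ok(total)
}
```

Calling `total_size_streaming(Path::new("."))` returns the combined size of all regular files below the current directory without ever holding the full path list in memory.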

It also supports serial file processing, which is simpler and more efficient for smaller tasks, and it can be configured to filter paths and control traversal behavior.

If you just want to collect paths in a Vec<PathBuf>, you should probably use the walkdir crate, although this crate can do the same thing with the appropriate closure argument.

The name nftw refers to the POSIX C library function nftw(3), which provides similar capabilities.

Installation

Add this to your Cargo.toml:

[dependencies]
nftw = "0.1"

Usage

Serial mapping over paths

Apply a function to each file serially and collect results:

use nftw::Nftw;
use std::path::Path;

fn main() {
    let walker = Nftw::new(Path::new("."));
    let results = walker.map(|path| {
        let metadata = std::fs::metadata(path)?;
        Ok(metadata.len())
    });
    
    for result in results {
        match result {
            Ok(size) => println!("File size: {}", size),
            Err(e) => eprintln!("Error: {}", e),
        }
    }
}

Parallel processing with custom thread count

Use Rayon to process files in parallel with a specified number of threads:

use nftw::Nftw;
use std::path::Path;

fn main() {
    let mut walker = Nftw::new(Path::new("."));
    walker.threads(4);  // Use 4 worker threads
    
    let results = walker.par_map(|path| {
        let data = std::fs::read(path)?;
        Ok(data.len())
    });
    
    for result in results {
        match result {
            Ok(size) => println!("File size: {}", size),
            Err(e) => eprintln!("Error: {}", e),
        }
    }
}

Filtering options

Configure the walker with builder-pattern methods:

use nftw::Nftw;
use std::path::Path;

fn main() {
    let mut walker = Nftw::new(Path::new("."));
    walker
        .ignore_symlinks()   // Skip symbolic links
        .ignore_hidden()     // Skip hidden files (.*) and hidden directories
        .one_per_leaf()      // Only process one file per leaf directory
        .same_device()       // Don't cross mount points (Unix only)
        .threads(8);         // Use 8 threads
    
    let results = walker.par_map(|path| {
        // Process each path in parallel
        Ok(path.to_path_buf())
    });
}
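For readers curious about what filters like these check, the sketch below shows one plausible way to express three of them with the standard library alone. The function names `is_hidden`, `is_symlink`, and `on_same_device` are hypothetical and do not come from nftw; the crate's actual checks may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True if the final path component starts with a dot (Unix convention).
/// Paths such as "." or ".." have no file name and are not treated as hidden.
pub fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.starts_with('.'))
}

/// True if `path` itself is a symbolic link (the link is not followed).
pub fn is_symlink(path: &Path) -> io::Result<bool> {
    Ok(fs::symlink_metadata(path)?.file_type().is_symlink())
}

/// True if `path` lives on the same device as `root` (Unix only),
/// which is how a walker can avoid crossing mount points.
#[cfg(unix)]
pub fn on_same_device(root: &Path, path: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    Ok(fs::metadata(root)?.dev() == fs::metadata(path)?.dev())
}
```

A walker would typically evaluate checks like these on each directory entry before descending into it or handing it to the processing closure.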

Performance Considerations

- Use map() when you just need to collect paths or when the per-file processing cost is minimal.
- Use par_map() when processing each file is expensive (e.g., reading file contents or computing checksums).
- The parallel implementation includes dedicated threads for:
  - Directory traversal (1 thread)
  - Result collection (implicit)
  - Worker threads for processing (configurable via threads())
- The default thread count is based on the number of available CPU cores.
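The thread layout described above (one producer, several workers, one collector) can be sketched with plain standard-library threads and channels. This is an illustration only: nftw itself is described as using Rayon, and `default_workers` and `pool_map` are hypothetical names. The producer thread stands in for the directory traversal, and the calling thread acts as the implicit result collector.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Worker count to use when the caller does not specify one:
/// the number of available CPU cores, falling back to 1.
pub fn default_workers() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Apply `f` to every item using `workers` threads. Results arrive in
/// completion order, not input order.
pub fn pool_map<T, R, F>(items: Vec<T>, workers: usize, f: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    let (work_tx, work_rx) = mpsc::channel::<T>();
    let (result_tx, result_rx) = mpsc::channel::<R>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    let f = Arc::new(f);

    // Producer thread: stands in for the directory traversal.
    let producer = thread::spawn(move || {
        for item in items {
            if work_tx.send(item).is_err() {
                break;
            }
        }
    });

    // Worker threads: pull items from the shared queue and process them.
    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let result_tx = result_tx.clone();
            let f = Arc::clone(&f);
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while processing.
                let item = match work_rx.lock().unwrap().recv() {
                    Ok(item) => item,
                    Err(_) => break, // queue closed: no more work
                };
                if result_tx.send((*f)(item)).is_err() {
                    break;
                }
            })
        })
        .collect();
    drop(result_tx); // the collector stops once every worker has finished

    // Collector: the calling thread gathers results as they complete.
    let results: Vec<R> = result_rx.iter().collect();
    producer.join().unwrap();
    for handle in handles {
        handle.join().unwrap();
    }
    results
}
```

For example, `pool_map(paths, default_workers(), |p| std::fs::read(p).map(|d| d.len()))` would read each file on a worker thread and return one result per path.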

License

This project is licensed under the MIT License - see the LICENSE file for details.

Dependencies

~1–3MB
~58K SLoC