2 releases

| Version | Released |
|---|---|
| 0.1.1 | Mar 6, 2026 |
| 0.1.0 | Mar 5, 2026 |
nftw
nftw is a Rust crate to efficiently crawl a directory tree while mapping an arbitrary function to each path encountered.
nftw can process files asynchronously while the directory tree is still being read, using Nftw::par_map, which decouples tree traversal from file processing.
This helps with compute-intensive jobs on very large file systems (millions of files) where each file takes significant time to process: processing starts before all paths have been read, and you avoid the overhead of storing every path in a Vec<PathBuf>. The overall memory footprint of the traversal should generally be lower with nftw than with walkdir.
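The decoupling described above can be illustrated with the standard library alone. The following is a minimal sketch, not nftw's actual implementation: the helper names `walk` and `total_size` and the queue capacity of 64 are assumptions made for illustration. One thread walks the tree and pushes paths into a bounded channel while the calling thread consumes them, so processing begins before the traversal ends and only a bounded number of paths is queued at any time.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Recursively walk `dir`, sending each non-directory path as soon as it is found.
/// Unreadable directories and entries are skipped in this sketch.
fn walk(dir: &Path, tx: &SyncSender<PathBuf>) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return,
    };
    for entry in entries.flatten() {
        let path = entry.path();
        // file_type() does not follow symlinks, so linked directories are not descended.
        match entry.file_type() {
            Ok(ft) if ft.is_dir() => walk(&path, tx),
            Ok(_) => {
                // send() blocks while the queue is full; give up on this
                // directory if the receiving side has gone away.
                if tx.send(path).is_err() {
                    return;
                }
            }
            Err(_) => {}
        }
    }
}

/// Sum file sizes under `root` while the traversal is still running.
fn total_size(root: &Path) -> u64 {
    // Hypothetical bound: at most 64 paths wait in the queue at once.
    let (tx, rx) = sync_channel::<PathBuf>(64);
    let root = root.to_path_buf();
    // `tx` moves into the thread and is dropped when `walk` returns,
    // which ends the receive loop below.
    let producer = thread::spawn(move || walk(&root, &tx));

    let mut total = 0;
    for path in rx {
        if let Ok(meta) = fs::symlink_metadata(&path) {
            total += meta.len();
        }
    }
    producer.join().expect("traversal thread panicked");
    total
}

fn main() {
    println!("{} bytes", total_size(Path::new(".")));
}
```

Because the channel is bounded, the traversal thread pauses whenever the consumer falls behind, which is one way to keep memory use independent of the number of files.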
It also supports serial file processing, which is simpler and more efficient for smaller tasks, and it can be configured to filter paths and control traversal behavior.
If you just want to collect paths in a Vec<PathBuf>, you should probably use the walkdir crate, although this crate can do the same thing with the appropriate closure argument.
The name nftw is a reference to the Unix C standard library function nftw(3), which provides similar capabilities.
Installation
Add this to your Cargo.toml:
    [dependencies]
    nftw = "0.1"
Usage
Serial mapping over paths
Apply a function to each file serially and collect results:
    use nftw::Nftw;
    use std::path::Path;

    fn main() {
        let walker = Nftw::new(Path::new("."));
        let results = walker.map(|path| {
            let metadata = std::fs::metadata(path)?;
            Ok(metadata.len())
        });
        for result in results {
            match result {
                Ok(size) => println!("File size: {}", size),
                Err(e) => eprintln!("Error: {}", e),
            }
        }
    }
Parallel processing with custom thread count
Use Rayon to process files in parallel with a specified number of threads:
    use nftw::Nftw;
    use std::path::Path;

    fn main() {
        let mut walker = Nftw::new(Path::new("."));
        walker.threads(4); // Use 4 worker threads
        let results = walker.par_map(|path| {
            let data = std::fs::read(path)?;
            Ok(data.len())
        });
        for result in results {
            match result {
                Ok(size) => println!("File size: {}", size),
                Err(e) => eprintln!("Error: {}", e),
            }
        }
    }
Filtering options
Configure the walker with builder-pattern methods:
    use nftw::Nftw;
    use std::path::Path;

    fn main() {
        let mut walker = Nftw::new(Path::new("."));
        walker
            .ignore_symlinks() // Skip symbolic links
            .ignore_hidden()   // Skip hidden files (.*) and hidden directories
            .one_per_leaf()    // Only process one file per leaf directory
            .same_device()     // Don't cross mount points (Unix only)
            .threads(8);       // Use 8 threads
        let results = walker.par_map(|path| {
            // Process each path in parallel
            Ok(path.to_path_buf())
        });
    }
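To make the filters above more concrete, here is a rough standard-library sketch of the kind of checks such options usually correspond to. It is not taken from nftw's source: `is_hidden`, `is_symlink`, and `on_same_device` are hypothetical free functions, and the crate's real rules (for example how it treats non-UTF-8 names) may differ.

```rust
use std::io;
use std::path::Path;

/// True if the final path component starts with a dot.
/// Names that are not valid UTF-8 are treated as not hidden in this sketch.
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.starts_with('.'))
}

/// True if `path` itself is a symbolic link (the link is not followed).
fn is_symlink(path: &Path) -> bool {
    std::fs::symlink_metadata(path)
        .map(|meta| meta.file_type().is_symlink())
        .unwrap_or(false)
}

/// True if both paths resolve to the same device ID (Unix only).
#[cfg(unix)]
fn on_same_device(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    Ok(std::fs::metadata(a)?.dev() == std::fs::metadata(b)?.dev())
}

/// Fallback for other platforms: no device check is performed.
#[cfg(not(unix))]
fn on_same_device(_a: &Path, _b: &Path) -> io::Result<bool> {
    Ok(true)
}

fn main() {
    let root = Path::new(".");
    // List the current directory and report which entries would pass all three checks.
    for entry in std::fs::read_dir(root).into_iter().flatten().flatten() {
        let path = entry.path();
        let keep = !is_hidden(&path)
            && !is_symlink(&path)
            && on_same_device(root, &path).unwrap_or(false);
        println!("{} -> {}", path.display(), if keep { "keep" } else { "skip" });
    }
}
```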
Performance Considerations
- Use map() when you just need to collect paths or when the per-file processing cost is minimal
- Use par_map() when processing each file is expensive (e.g., reading file contents, computing checksums, etc.)
- The parallel implementation includes dedicated threads for:
  - Directory traversal (1 thread)
  - Result collection (implicit)
  - Worker threads for processing (configurable via threads())
- The default thread count is based on the number of CPU cores available
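The thread layout listed above can be sketched with standard-library channels. This is an illustrative approximation only: the README says the crate uses Rayon, whereas this sketch uses plain threads, and `par_process` and `default_threads` are hypothetical names. It assumes one producer thread standing in for the traversal, a configurable number of workers, and result collection on the calling thread, with the default worker count taken from `std::thread::available_parallelism`.

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc::channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// A default worker count derived from the available CPU parallelism.
fn default_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Apply `f` to every path from `paths` using `workers` threads.
/// Results arrive in completion order, not input order.
fn par_process<I, T, F>(paths: I, workers: usize, f: F) -> Vec<T>
where
    I: Iterator<Item = PathBuf> + Send + 'static,
    T: Send + 'static,
    F: Fn(&Path) -> T + Send + Sync + 'static,
{
    let (path_tx, path_rx) = channel::<PathBuf>();
    let (result_tx, result_rx) = channel::<T>();
    let path_rx = Arc::new(Mutex::new(path_rx));
    let f = Arc::new(f);

    // Stage 1: a single producer thread (stands in for the directory traversal).
    let producer = thread::spawn(move || {
        for path in paths {
            if path_tx.send(path).is_err() {
                break;
            }
        }
    });

    // Stage 2: worker threads pull paths from the shared queue.
    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let path_rx = Arc::clone(&path_rx);
            let result_tx = result_tx.clone();
            let f = Arc::clone(&f);
            thread::spawn(move || loop {
                // The lock is held only while waiting for the next path,
                // not while `f` runs.
                let next = path_rx.lock().unwrap().recv();
                match next {
                    Ok(path) => {
                        if result_tx.send(f(&path)).is_err() {
                            break;
                        }
                    }
                    Err(_) => break, // producer finished and queue is empty
                }
            })
        })
        .collect();
    // Drop the original sender so the result channel closes once all workers exit.
    drop(result_tx);

    // Stage 3: collect results on the calling thread.
    let results: Vec<T> = result_rx.into_iter().collect();
    producer.join().expect("producer thread panicked");
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    results
}

fn main() {
    // A flat listing of the current directory stands in for a real traversal.
    let paths: Vec<PathBuf> = std::fs::read_dir(".")
        .map(|rd| rd.flatten().map(|e| e.path()).collect())
        .unwrap_or_default();
    let lens = par_process(paths.into_iter(), default_threads(), |p| p.as_os_str().len());
    println!("processed {} paths on {} workers", lens.len(), default_threads());
}
```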
License
This project is licensed under the MIT License - see the LICENSE file for details.