Tags: dobestan/sd
feat: add line-by-line mode as default, stream without loading files into memory (chmln#328)

* refactor: remove pub(crate)

  This made no sense because we don't intend to ever release `sd` as a crate

* remove old feature
* address clippy lints and improve code quality
* refactor: split sd and sd-cli
* feat: add --line-by-line (-L) flag for line-by-line processing

  Add a new processing mode that handles input line by line instead of
  reading entire files into memory. This fixes several long-standing issues:

  - OOM on large files (O(line_size) memory instead of O(file_size))
  - stdin waits for EOF (output now flushed per line, enables streaming)
  - `^` matches a phantom empty line after a trailing `\n`
  - `\s+$` eats newlines because `\s` sees `\n` across line boundaries

  The implementation strips `\n` before passing each line to the replacer,
  then restores it, so the regex never sees newline characters. Files
  without trailing newlines are preserved as-is. In-place file modification
  uses the same atomic temp-file-and-rename pattern as the existing code path.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: make line-by-line the default, add --across (-A) for whole-file

  Line-by-line processing is now the default behavior. This provides better
  defaults for common use cases: lower memory usage, streaming stdin output,
  and predictable regex anchor behavior.

  For patterns that need to match across line boundaries (e.g. replacing
  `\n` or multi-line patterns), use the new --across / -A flag, which
  restores the previous whole-file behavior.

  Pre-validates all input files before modifying any, matching the
  atomicity guarantees of the mmap-based code path.
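The strip-then-restore idea above can be sketched as follows. This is a minimal illustration, not sd's actual code: `replace_lines` and its signature are hypothetical, and a plain string replacer stands in for the real regex replacer. Because each line's trailing `\n` is removed before the replacer runs and re-added afterwards, the pattern never sees a newline, and a file without a trailing newline stays that way.

```rust
use std::io::{BufRead, Write};

// Hypothetical sketch: line-by-line replacement that strips the trailing
// newline before the replacer runs, then restores it on output.
fn replace_lines<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    replace: impl Fn(&str) -> String,
) -> std::io::Result<()> {
    let mut line = String::new();
    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            break; // EOF
        }
        // Strip the newline before the replacer runs; remember if it was there.
        let had_newline = line.ends_with('\n');
        let body = line.strip_suffix('\n').unwrap_or(&line);
        output.write_all(replace(body).as_bytes())?;
        if had_newline {
            output.write_all(b"\n")?;
        }
        // Flushing per line is what makes stdin -> stdout streaming work.
        output.flush()?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Note: the input deliberately has no trailing newline.
    let input = std::io::Cursor::new("one fish\ntwo fish");
    let mut out = Vec::new();
    replace_lines(input, &mut out, |l| l.replace("fish", "bird"))?;
    assert_eq!(String::from_utf8(out).unwrap(), "one bird\ntwo bird");
    Ok(())
}
```

Memory use here is bounded by the longest line rather than the file size, which is the O(line_size) vs O(file_size) distinction the commit describes.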
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add line-by-line vs across benchmarks to README

  Add benchmark results comparing line-by-line (default) and across (-A)
  modes on a 1M line (~36 MB) test file:

  - Line-by-line is ~2-3x slower than across mode for throughput
  - Still faster than sed for regex replacements
  - Memory usage: 3 MB (line-by-line) vs 74 MB (across)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: optimize line-by-line mode with chunked reading

  Replace per-line read_until() calls with chunked reading (8 KB chunks)
  and a line buffer that spans chunk boundaries. This reduces syscall
  overhead and improves CPU cache locality.

  Benchmark results on a 1M line file (~36 MB):

  - Before: 357 ms (2.84x slower than across mode, slower than sed)
  - After: 106 ms (3.19x slower than across mode, 1.1x faster than sed)

  The trade-offs between modes:

  - Across mode: fastest (33 ms), uses more memory (~74 MB)
  - Line-by-line: now much faster (106 ms), bounded memory usage
  - Line-by-line still respects memory limits for streaming use cases

* fix build, tests, and lint regressions
* remove file-mapping code paths and dependency
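The chunked-reading optimization can be sketched like this. The names (`for_each_line`, `CHUNK_SIZE`) are mine, not sd's: the point is that each `read` syscall covers many lines, and a carry buffer holds any partial line that spans a chunk boundary, so memory stays bounded by the longest line.

```rust
use std::io::Read;

// Assumed chunk size matching the commit message (8 KB).
const CHUNK_SIZE: usize = 8192;

// Hypothetical sketch: iterate lines by reading fixed-size chunks and
// carrying partial lines across chunk boundaries.
fn for_each_line<R: Read>(mut input: R, mut f: impl FnMut(&[u8])) -> std::io::Result<()> {
    let mut chunk = [0u8; CHUNK_SIZE];
    let mut carry: Vec<u8> = Vec::new(); // partial line spanning chunks
    loop {
        let n = input.read(&mut chunk)?;
        if n == 0 {
            break; // EOF
        }
        let mut rest = &chunk[..n];
        // Emit every complete line in this chunk.
        while let Some(pos) = rest.iter().position(|&b| b == b'\n') {
            carry.extend_from_slice(&rest[..pos]);
            f(&carry); // line without its trailing `\n`
            carry.clear();
            rest = &rest[pos + 1..];
        }
        carry.extend_from_slice(rest); // stash the unfinished tail
    }
    if !carry.is_empty() {
        f(&carry); // final line had no trailing newline
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let mut count = 0;
    for_each_line(std::io::Cursor::new("a\nbb\nccc"), |_| count += 1)?;
    assert_eq!(count, 3);
    Ok(())
}
```

Compared to per-line `read_until`, this amortizes syscall cost across all the lines in a chunk, which is consistent with the 357 ms to 106 ms improvement reported above.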
Error on `$<num><non_num>` capture replacement names (chmln#258)

* Mostly working capture name validation
* Improve inputs for property tests
* Fix advancing when passing over escaped dollar signs
* Switch to inline snapshot captures
* Cleanup invalid capture error formatting code
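The class of bug being rejected here is a replacement like `$1abc`, which a regex engine parses as one (nonexistent) group named `1abc` rather than group `$1` followed by the literal `abc`. A rough sketch of such a validator, with hypothetical names and without sd's actual error formatting, including the escaped-`$$` advancing fix mentioned above:

```rust
// Hypothetical sketch: reject `$<num><non_num>` capture names in a
// replacement string, while skipping escaped `$$` correctly.
fn validate_captures(replacement: &str) -> Result<(), String> {
    let bytes = replacement.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'$' {
            // `$$` is an escaped literal dollar sign: advance past BOTH
            // characters (advancing only one was the bug fixed above).
            if bytes.get(i + 1) == Some(&b'$') {
                i += 2;
                continue;
            }
            let start = i + 1;
            let mut j = start;
            while j < bytes.len() && bytes[j].is_ascii_digit() {
                j += 1;
            }
            // Digits immediately followed by an identifier character would be
            // parsed as a single, unintended group name such as `1abc`.
            if j > start && j < bytes.len() && (bytes[j].is_ascii_alphabetic() || bytes[j] == b'_') {
                return Err(format!("invalid capture reference in `${}`", &replacement[start..]));
            }
            i = j.max(start);
        } else {
            i += 1;
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_captures("$1 and $2").is_ok());
    assert!(validate_captures("$1abc").is_err()); // digit run + letters: reject
    assert!(validate_captures("$$1abc").is_ok()); // escaped `$$`: literal text
}
```

The usual user-facing fix for a rejected replacement is braces, e.g. `${1}abc`, which this sketch leaves untouched since the brace ends the digit run.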