Skip to content

Refactor fix_trailing_whitespace#411

Merged
j178 merged 10 commits intoj178:masterfrom
tmdhhl:upgrade_fix_trailing_whitespace
Aug 12, 2025
Merged

Refactor fix_trailing_whitespace#411
j178 merged 10 commits intoj178:masterfrom
tmdhhl:upgrade_fix_trailing_whitespace

Conversation

@tmdhhl
Copy link
Contributor

@tmdhhl tmdhhl commented Aug 10, 2025

Closes #294
Closes #223

This PR refactors fix_trailing_whitespace.rs, improves the handling of trailing whitespace, especially for Markdown files where exactly two trailing spaces at the end of a line indicate a line break. Related to #294.

Main changes

  • Introduced a Chars type to encapsulate a custom list of characters to trim, resolving an issue where clap cannot correctly parse --chars= \t into Vec<char>.
  • Optimized the fix_file function logic:
    • Only preserve and normalize to exactly two trailing spaces on non-empty Markdown lines that already have two trailing spaces; it does not add missing trailing spaces.
    • Normalize lines with more than two trailing spaces to exactly two spaces.
    • Remove all trailing whitespace on blank lines or non-Markdown files, with no extra spaces preserved.
  • Added extensive unit tests covering:
    • Markdown lines with excessive or insufficient trailing spaces.
    • Forced Markdown mode.
    • Custom trim character sets.
    • Different line ending formats (including CRLF).
    • Files without trailing newline.
    • Wildcard Markdown extensions.
  • Only write back files if actual content changes occur, avoiding unnecessary writes.

About Chars type encapsulation and clap parsing issue

When using clap to parse command-line arguments, directly parsing a parameter like --chars=" \t" into a Vec<char> runs into issues because:

  • clap splits arguments by whitespace by default, causing strings containing spaces and tabs to be misparsed or treated as a single element.
  • For example, passing --chars=" \t" may not correctly convert into a character vector containing space and tab characters.

To solve this, we implemented a custom Chars type with the FromStr trait:

  • Chars wraps a Vec<char> internally and manually converts the input string into a vector of characters.
  • This approach allows correct parsing regardless of spaces, tabs, or other special characters included in the input.

This workaround bypasses clap's default parsing limitations and enables flexible handling of user-defined trim character sets.

Trailing two-space preservation rules and reference

  • Only lines that already have exactly two trailing spaces will preserve these spaces as Markdown line breaks.
  • Markdown lines without two trailing spaces will have their trailing whitespace removed normally, without automatically adding spaces.
  • This rule prevents unnecessary modifications and aligns with Markdown syntax standards.

This behavior is consistent with the original implementation in the open-source project pre-commit. For example:

@pytest.mark.parametrize(
    ('input_s', 'expected'),
    (
        ('foo \nbar \n', 'foo\nbar\n'),
        ('bar\t\nbaz\t\n', 'bar\nbaz\n'),
    ),
)
def test_fixes_trailing_whitespace(input_s, expected, tmpdir):
    path = tmpdir.join('file.md')
    path.write(input_s)
    assert main((str(path),)) == 1
    assert path.read() == expected

@j178 j178 added the performance Performance improvements label Aug 10, 2025
@j178 j178 requested a review from Copilot August 10, 2025 06:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the fix_trailing_whitespace.rs module to improve handling of trailing whitespace, particularly for Markdown files where two trailing spaces indicate a line break. The refactor resolves a clap parsing issue with custom character sets and optimizes the file processing logic.

Key changes:

  • Introduced a custom Chars type to work around clap's inability to parse --chars=" \t" correctly
  • Rewrote the file processing logic to use async I/O with temporary files and only write back when changes occur
  • Added comprehensive test coverage for various scenarios including CRLF handling, Unicode characters, and edge cases

@j178 j178 changed the title Refactor fix_trailing_whitespace Refactor fix_trailing_whitespace Aug 12, 2025
@j178 j178 merged commit c3cd21a into j178:master Aug 12, 2025
15 checks passed
@tmdhhl tmdhhl deleted the upgrade_fix_trailing_whitespace branch December 3, 2025 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid reading whole file into memory in fix_trailing_whitespace and fix_end_of_file hooks Add tests for fix-trailing-whitespace

2 participants