Implement detect-private-key as builtin hook #893

Merged
j178 merged 2 commits into j178:master from lmmx:detect-private-key on Oct 16, 2025

lmmx (Collaborator) commented Oct 15, 2025

Description

As discussed in #880 there is a priority to port more builtin hooks that are being used in projects with prek already, and detect-private-key is one of the more popular ones.

This hook prevents the accidental upload of private keys, so it's obviously very important that it's correct.

The Python hook is very simple. It has a blacklist of trigger strings:

BLACKLIST = [
    b'BEGIN RSA PRIVATE KEY',
    b'BEGIN DSA PRIVATE KEY',
    b'BEGIN EC PRIVATE KEY',
    b'BEGIN OPENSSH PRIVATE KEY',
    b'BEGIN PRIVATE KEY',
    b'PuTTY-User-Key-File-2',
    b'BEGIN SSH2 ENCRYPTED PRIVATE KEY',
    b'BEGIN PGP PRIVATE KEY BLOCK',
    b'BEGIN ENCRYPTED PRIVATE KEY',
    b'BEGIN OpenVPN Static key V1',
]

and then it iterates over the blacklist, checking for substring matches against the entire file content:

    for filename in args.filenames:
        with open(filename, 'rb') as f:
            content = f.read()
            if any(line in content for line in BLACKLIST):

Written as a list comprehension, the same test gives a mask over the blacklist, i.e. one bool per trigger string. For some content x that contains none of them:

>>> [line in x for line in BLACKLIST]
[False, False, False, False, False, False, False, False, False, False]

The expression line in content is a substring match (technically a sub-bytes match) that returns a bool.

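It triggers wherever the trigger string appears, whether at the start of the content, on a later line, or embedded in the middle of a line: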

>>> my_content = b"BEGIN RSA PRIVATE KEY\nHello\nworld"
>>> [line in my_content for line in BLACKLIST]
[True, False, False, False, False, False, False, False, False, False]
>>> my_content = b"Hello\nBEGIN RSA PRIVATE KEY\nworld"
>>> [line in my_content for line in BLACKLIST]
[True, False, False, False, False, False, False, False, False, False]
>>> my_content = b"helloBEGIN RSA PRIVATE KEYworld"
>>> [line in my_content for line in BLACKLIST]
[True, False, False, False, False, False, False, False, False, False]

The first entry is True, so any() returns True:

>>> any(line in my_content for line in BLACKLIST)
True
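
One detail worth spelling out: the hook passes a generator to any(), so unlike the list comprehensions above it stops at the first trigger string that matches instead of building the whole mask. A minimal sketch of that behaviour follows. first_match and count_checks are hypothetical helpers written for illustration and are not part of the hook.

```python
from typing import Optional

BLACKLIST = [
    b'BEGIN RSA PRIVATE KEY',
    b'BEGIN DSA PRIVATE KEY',
    b'BEGIN EC PRIVATE KEY',
    b'BEGIN OPENSSH PRIVATE KEY',
    b'BEGIN PRIVATE KEY',
    b'PuTTY-User-Key-File-2',
    b'BEGIN SSH2 ENCRYPTED PRIVATE KEY',
    b'BEGIN PGP PRIVATE KEY BLOCK',
    b'BEGIN ENCRYPTED PRIVATE KEY',
    b'BEGIN OpenVPN Static key V1',
]


def first_match(content: bytes) -> Optional[bytes]:
    """Return the first blacklist entry found in content, or None."""
    return next((pattern for pattern in BLACKLIST if pattern in content), None)


def count_checks(content: bytes) -> int:
    """Count how many blacklist entries any() tests before it stops."""
    checked = 0

    def probe():
        nonlocal checked
        for pattern in BLACKLIST:
            checked += 1
            yield pattern in content

    any(probe())  # any() stops consuming the generator at the first True
    return checked


# The RSA marker is the first blacklist entry, so only one test is needed.
print(count_checks(b"Hello\nBEGIN RSA PRIVATE KEY\nworld"))
# With no match, every entry has to be tested.
print(count_checks(b"no keys here"))
```

In the worst case (a clean file) every trigger string is searched for once, which is why the speed of the underlying substring search dominates the hook's runtime.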

Demo

On the Apache Airflow repo (which uses this hook, and uses prek), pre-commit runs the hook about 40% faster than prek does (0.08s vs 0.13s):

louis 🌟 ~/tmp/airflow $ pre-commit run -a --verbose
Detect if private key is added to the repository.........................Passed
- hook id: detect-private-key
- duration: 0.08s
louis 🌟 ~/tmp/airflow $ prek run -a --verbose
Detect if private key is added to the repository.........................Passed
- hook id: detect-private-key
- duration: 0.13s

With the new feature branch of prek, it initially ran in 0.47s (over 3x slower than the 0.13s above):

Detect if private key is added to the repository.........................Passed
- hook id: detect-private-key
- duration: 0.47s

I swapped to memchr::memmem::find (the memchr dependency is already present) and it sped up to 0.05s, which is faster than pre-commit's 0.08s (now roughly 40% faster) 🎉

Detect if private key is added to the repository.........................Passed
- hook id: detect-private-key
- duration: 0.05s

More importantly, if I write a new file (and git add it) to the airflow repo, it does indeed get detected:

louis 🌟 ~/tmp/airflow $ cat foo.md 
--- BEGIN RSA PRIVATE KEY ---
hello
louis 🌟 ~/tmp/airflow $ prek run -a --verbose
Detect if private key is added to the repository.........................Failed
- hook id: detect-private-key
- duration: 0.05s
- exit code: 1
  Private key found: foo.md
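
The same detection can be reproduced outside prek with a few lines of Python. This is a sketch built around the excerpt of the Python hook quoted earlier, not the hook's actual source: find_private_key_files and main are illustrative names, the exit-code handling is an assumption, and the message format is copied from the prek output above.

```python
import os
import tempfile

BLACKLIST = [
    b'BEGIN RSA PRIVATE KEY',
    b'BEGIN DSA PRIVATE KEY',
    b'BEGIN EC PRIVATE KEY',
    b'BEGIN OPENSSH PRIVATE KEY',
    b'BEGIN PRIVATE KEY',
    b'PuTTY-User-Key-File-2',
    b'BEGIN SSH2 ENCRYPTED PRIVATE KEY',
    b'BEGIN PGP PRIVATE KEY BLOCK',
    b'BEGIN ENCRYPTED PRIVATE KEY',
    b'BEGIN OpenVPN Static key V1',
]


def find_private_key_files(filenames):
    """Return the files whose raw bytes contain any blacklist entry."""
    flagged = []
    for filename in filenames:
        # Read as bytes so binary files and unusual encodings cannot fail to decode.
        with open(filename, 'rb') as f:
            content = f.read()
        if any(pattern in content for pattern in BLACKLIST):
            flagged.append(filename)
    return flagged


def main(filenames):
    """Print each flagged file and return 1 if any were found (assumed convention)."""
    flagged = find_private_key_files(filenames)
    for filename in flagged:
        print(f'Private key found: {filename}')
    return 1 if flagged else 0


# Recreate the foo.md test case from above in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.md')
    with open(path, 'wb') as f:
        f.write(b'--- BEGIN RSA PRIVATE KEY ---\nhello\n')
    exit_code = main([path])
```

Because the match is a plain substring test on the file's bytes, the surrounding dashes in foo.md do not matter.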

@lmmx lmmx force-pushed the detect-private-key branch from 119353b to 3626d91 Compare October 15, 2025 12:43
codecov bot commented Oct 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.80%. Comparing base (ad6fbf7) to head (5f4a6b3).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #893      +/-   ##
==========================================
+ Coverage   89.66%   89.80%   +0.13%     
==========================================
  Files          61       62       +1     
  Lines       11429    11552     +123     
==========================================
+ Hits        10248    10374     +126     
+ Misses       1181     1178       -3     

@lmmx lmmx force-pushed the detect-private-key branch 2 times, most recently from db7c873 to d91fc9f Compare October 15, 2025 13:05
@lmmx lmmx mentioned this pull request Oct 15, 2025
34 tasks
@j178 j178 changed the title feat(detect-private-key): implement builtin hook Implement detect-private-key as builtin hook Oct 15, 2025
@lmmx lmmx force-pushed the detect-private-key branch from d91fc9f to b40a83a Compare October 15, 2025 16:48
@j178 j178 merged commit 6df836b into j178:master Oct 16, 2025
18 checks passed
@lmmx lmmx deleted the detect-private-key branch October 16, 2025 07:55