-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add caching to formatter #8089
Add caching to formatter #8089
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
3c66449
to
04b546f
Compare
@@ -169,7 +203,7 @@ impl Cache { | |||
/// | |||
/// This returns `None` if `key` differs from the cached key or if the | |||
/// cache doesn't contain results for the file. | |||
pub(crate) fn get<T: CacheKey>(&self, path: &RelativePath, key: &T) -> Option<&FileCache> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think supporting different cache keys for a single cache instance is a good idea. It can lead to key collisions and is prone to mixing different keys (which has the unexpected result that a stored value can not be retrieved).
E.g. I accidentally passed the hashed u64
instead of the FileCacheKey
. Rust was happy because u64
implements CacheKey
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this should be generic on the instance, right? (Looks like it was made non-generic which is also fine with me.)
04b546f
to
bbacadc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me. Is it ready for review / approval?
Not yet |
718c016
to
cce999e
Compare
cce999e
to
7ee0223
Compare
PR Check ResultsEcosystem✅ ecosystem check detected no changes. |
/// Path to the cache directory. | ||
#[arg(long, env = "RUFF_CACHE_DIR", help_heading = "Miscellaneous")] | ||
pub cache_dir: Option<PathBuf>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would need to have different environment variables (RUFF_CHECK_CACHE_DIR
, RUFF_FORMAT_CACHE_DIR
) since one could potentially provide different path to the check
/ format
subcommand.
Or, could make this a global flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would wait until we have a specific request to store the caches in different locations. I also don't know how to support different cache locations when check
lints and formats.
The alternative would be to have two different caches (the implementation I had before this morning) where format
and lint
store their results in different files. Different jobs could then cache the specific sub-directories only. However, the overall design of the Cache
API felt more fragile and would probably require a more drastic refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
@@ -169,7 +203,7 @@ impl Cache { | |||
/// | |||
/// This returns `None` if `key` differs from the cached key or if the | |||
/// cache doesn't contain results for the file. | |||
pub(crate) fn get<T: CacheKey>(&self, path: &RelativePath, key: &T) -> Option<&FileCache> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this should be generic on the instance, right? (Looks like it was made non-generic which is also fine with me.)
let start = Instant::now(); | ||
let (mut results, mut errors): (Vec<_>, Vec<_>) = paths | ||
.into_par_iter() | ||
.par_iter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change, then cloning the path below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package_roots
borrows paths
until the end of the function. That means we can no longer use into_iter
. I don't expect this to be meaningful, considering that we only hit the clone
part when an error occurred.
) | ||
}) { | ||
Ok(inner) => inner.map(|result| FormatPathResult { | ||
path: resolved_file.into_path(), | ||
path: resolved_file.path().to_path_buf(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we keep the par_iter
, perhaps path.to_path_buf()
?
Did you profile what the time is spent on with a hot cache? 100ms sounds a lot to me for just checking modification timestamps. |
I did not. I assume it's the python file discovery as well as detecting the python packages, loading of the cache etc. Linting without changes takes 96ms. So it's about the same. I agree, that we could probably improve our caching overall. |
Summary
Implement caching for
ruff format
The formatter caching is relatively simple. All we need to track is the formatted files. This PR extends our
Cache
implementation to instead track the changes to the cache instead of the new "file cache" contents (similar to an event store). Meaning, the cache now stores achanges
collection instead ofnew_files
and aChange
either sets theformatted
flag or updates thelint
results.Test Plan
Manual testing:
The speedup from the benchmarking shows that caching is working.
Benchmarking
Hot formatter cache, no changes
Hot formatter and linter cache, no changes
slightly slower for projects with many diagnostics
No cache
Cold cache (slower)
Cold cache with changes (slower)
Hot cache with changes (e.g. when switching between branches)
Linting with caching