feat: Calculate latency for each batch instead of just using the average for all queries #433
Conversation
Sorry, what's the difference between batch latencies and non-batch latencies?
So when running without `--batch`, the time taken by each individual query is recorded. But while running in batches (i.e. with `--batch`), every entry in the "times" dataset is currently set to the same overall average. This PR makes the entries in the "times" dataset different and more precise: it calculates the time taken for each batch and then divides it by the number of queries in that batch. It's still not perfect, but it's really the best we can do. And as I mentioned previously, p99 and p95 will now make sense because the "times" values will actually differ.

Does this answer your question?
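The per-batch calculation described above can be sketched as follows. This is a hedged illustration, not the PR's actual diff: `algo.batch_query` and `run_batched_queries` are hypothetical names standing in for the runner's real API.

```python
import time

def run_batched_queries(algo, queries, batch_size):
    """Time each batch and assign that batch's per-query average to every
    query in it, instead of one global average for all queries.
    (Sketch only: `algo.batch_query` is a hypothetical batch API.)"""
    times = []
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        t0 = time.perf_counter()
        algo.batch_query(batch)
        elapsed = time.perf_counter() - t0
        # Per-query estimate for *this* batch, so entries differ across batches.
        times.extend([elapsed / len(batch)] * len(batch))
    return times
```

With this, queries in the same batch still share one value, but different batches produce different values, so tail percentiles are no longer degenerate.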
Not entirely following. I guess there's no obvious definition of what p90 means in batched mode. What are you suggesting is the right way to define it?
It's not the perfect way, but I think it's the best we can do when using batches, because we don't know about individual queries in batch mode. The smaller the batch size, the better the approximation, obviously. I tried a small experiment on replit.com to test this out: https://replit.com/@KShivendu/SecondaryLegalRelationalmodel#main.py Here are the results:
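An experiment along those lines can be sketched as below. This is a hedged reconstruction, not the linked replit code: the exponential latency distribution, the rate parameter, and the helper names are all assumptions made for illustration.

```python
import random

def batched_estimate(times, batch_size):
    """Replace each per-query time with its batch's per-query average,
    mimicking what batch mode can actually observe."""
    est = []
    for i in range(0, len(times), batch_size):
        batch = times[i:i + batch_size]
        est.extend([sum(batch) / len(batch)] * len(batch))
    return est

def p99(values):
    # Simple order-statistic percentile, good enough for a comparison.
    s = sorted(values)
    return s[int(0.99 * (len(s) - 1))]

random.seed(0)
# Simulated per-query latencies: a stand-in for the ground truth
# that batch mode cannot observe directly.
true_times = [random.expovariate(200) for _ in range(10_000)]

# Smaller batches should track the true p99 more closely.
for bs in (10, 100, 1000):
    err = abs(p99(batched_estimate(true_times, bs)) - p99(true_times))
```

The total (and hence the mean) is preserved for every batch size; only the spread is smoothed, and the smoothing grows with the batch size.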
@erikbern any thoughts on this? ^ |
@erikbern pinging in case it wasn't noticed :) ^ |
I didn't have batch sizes in mind when I worked on the batch mode; your proposed change makes sense to me @KShivendu. (Also seems to be a very minimal change.)
We should calculate latency for each batch instead of just using the average, because it will be more precise. At the moment, all "times" values are the same as the average, which really defeats the purpose of the p99 and p95 values.
Plus, I noticed a bug while running with `--batch --local`. All the algorithms have been refactored to have a `module.py`, but the runner wasn't updated accordingly, which makes it fail when running locally with batches. To keep the change minimal I just modified `definitions.py` instead of all the config files.

I've only done this for Qdrant right now but can do it for the others as well once this PR is approved and merged :)
Also, thanks for creating/maintaining this repo. Really great work!