[Text Generation] Optimize the slow update method in the KVCacheDecoder #1190

Merged
dbogunowicz merged 5 commits into main from feature/damian/optimize_decoder
Aug 24, 2023

dbogunowicz
Contributor

@dbogunowicz dbogunowicz commented Aug 17, 2023

As reported by @mgoin and confirmed by my own investigation, the update method in the KVCacheDecoder is very slow.
Profiling shows that this is caused by repeated calls to numpy.delete:

image

This Stack Overflow discussion:
https://stackoverflow.com/questions/30399534/shift-elements-in-a-numpy-array
suggests that slicing the arrays is an elegant and efficient replacement for numpy.delete. This PR makes that change.
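
For readers without the diff at hand, here is a minimal sketch of the general idea, not the actual KVCacheDecoder code. The function names, the cache layout `(batch, num_heads, sequence_length, head_dim)`, and the choice of dropping the oldest entry along axis 2 are all assumptions made for illustration.

```python
import numpy as np


def drop_first_with_delete(cache: np.ndarray, axis: int = 2) -> np.ndarray:
    """Baseline: remove the oldest entry along `axis` using numpy.delete.

    numpy.delete always allocates and fills a new array.
    """
    return np.delete(cache, 0, axis=axis)


def drop_first_with_slicing(cache: np.ndarray, axis: int = 2) -> np.ndarray:
    """Same values via basic slicing, which returns a view (no copy)."""
    index = [slice(None)] * cache.ndim
    index[axis] = slice(1, None)
    return cache[tuple(index)]


# Hypothetical cache layout: (batch, num_heads, sequence_length, head_dim)
cache = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
trimmed = drop_first_with_slicing(cache)
```

Because basic slicing returns a view on the original buffer, the result shares memory with `cache`. If a downstream consumer needs an independent or contiguous array, wrapping the result in `np.ascontiguousarray` would be one option, at the cost of a copy.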

Short benchmarking numbers:

image

@dbogunowicz dbogunowicz marked this pull request as ready for review August 18, 2023 12:52
Contributor

@bfineran bfineran left a comment

LGTM pending comment

@dbogunowicz dbogunowicz requested a review from bfineran August 22, 2023 10:30
@dbogunowicz dbogunowicz merged commit 1bd60d2 into main Aug 24, 2023
@dbogunowicz dbogunowicz deleted the feature/damian/optimize_decoder branch August 24, 2023 13:17
3 participants