Llama-3.1-8B-Instruct-4bit keeps looping at the end. #1059
Open
Description
I'm on mlx-lm v0.19.1.
Running the following command with the 4bit model triggers a bug: generation runs to the full 1000 max-tokens, repeating the last two paragraphs over and over.
mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --max-kv-size 33000 --max-tokens 1000 --temp 0.0 --top-p 0.9 --seed 1000 --prompt -<../text/portugal.txt;say done
Here is my full prompt, taken from a Wikipedia article.
Just to see what happens, I increased --max-kv-size to 34000 and --max-tokens to 2000, and it generated all 2000 tokens, still looping.
Running the exact same command with 8bit in place of 4bit generated the correct text (811 tokens) and stopped at the end without looping.
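To make the 4bit vs 8bit comparison easier to quantify, here is a rough sketch of a helper that checks whether saved output ends in a repeated block of paragraphs. It is not part of mlx-lm; the name `trailing_repeats` and the `max_block` parameter are made up for illustration. It assumes paragraphs are separated by newlines and only counts exact matches, so a final paragraph cut off by --max-tokens will not count as a repeat.

```python
def trailing_repeats(text: str, max_block: int = 10) -> tuple[int, int]:
    """Return (block_size, repeats) for the most-repeated trailing block.

    block_size is measured in paragraphs (non-empty lines). A result of
    (0, 1) means no repeated block was found at the end of the text.
    Hypothetical helper for illustration only.
    """
    paras = [p.strip() for p in text.split("\n") if p.strip()]
    best = (0, 1)
    for k in range(1, min(max_block, len(paras) // 2) + 1):
        block = paras[-k:]
        repeats = 1
        end = len(paras) - k
        # Walk backwards while the preceding k paragraphs equal the block.
        while end - k >= 0 and paras[end - k:end] == block:
            repeats += 1
            end -= k
        if repeats > best[1]:
            best = (k, repeats)
    return best


if __name__ == "__main__":
    looping = "Intro\nPara A\nPara B\nPara A\nPara B\nPara A\nPara B"
    print(trailing_repeats(looping))  # block of 2 paragraphs, seen 3 times
```

Running it on the captured 4bit output should report a block size of about 2 with many repeats if the loop is the one described above, while the 8bit output should return (0, 1).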