Llama-3.1-8B-Instruct-4bit keeps looping at the end. #1059

@chigkim

I'm on mlx-lm v0.19.1.

Running the following command with the 4bit model triggers a bug: generation runs to the full 1000 --max-tokens limit and repeats the last two paragraphs over and over.

mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --max-kv-size 33000 --max-tokens 1000 --temp 0.0 --top-p 0.9 --seed 1000 --prompt -<../text/portugal.txt;say done

Here is my full prompt, taken from a Wikipedia article.

Just to see what would happen, I increased --max-kv-size to 34k and --max-tokens to 2000. It generated all 2000 tokens and still looped.

Running the exact same command with 8bit in place of 4bit generated the correct text in 811 tokens and stopped at the end without looping.
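
For anyone trying to confirm the loop in saved output from the two runs, here is a minimal sketch of a tail-repetition check. The helper name `find_repeating_tail` is hypothetical and not part of mlx-lm, and it works on whitespace-split words as a rough stand-in for model tokens, so treat the numbers it reports as approximate.

```python
def find_repeating_tail(tokens, min_period=5, min_repeats=2):
    """Return (period, repeats) if the sequence ends with back-to-back
    copies of the same block, otherwise None.

    period  -- length of the repeated block (the smallest one found)
    repeats -- how many consecutive copies of that block end the sequence
    """
    n = len(tokens)
    # A block can only repeat min_repeats times if it fits that often in n.
    for period in range(min_period, n // min_repeats + 1):
        block = tokens[n - period:]
        repeats = 1
        end = n - period
        # Walk backwards while the preceding chunk matches the final block.
        while end - period >= 0 and tokens[end - period:end] == block:
            repeats += 1
            end -= period
        if repeats >= min_repeats:
            return period, repeats
    return None


if __name__ == "__main__":
    # Synthetic example: a short intro followed by a block repeated 3 times.
    looping = "intro text".split() + "the same five words again".split() * 3
    clean = "the quick brown fox jumps over the lazy dog".split()
    print(find_repeating_tail(looping))
    print(find_repeating_tail(clean))
```

To apply it to real output, redirect each run's generated text to a file and pass `open(path).read().split()` to the function. A result such as a large `period` with `repeats` of 2 or more would match the "last two paragraphs over and over" symptom, while `None` is what the 8bit run would be expected to give.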
