Llama-3.1-8B-Instruct-4bit keeps looping at the end. #1059

I'm on mlx-lm v0.19.1.

When I run the following command with the 4-bit model, generation never stops on its own: it uses the full 1000-token `--max-tokens` budget and repeats the last two paragraphs over and over.

```bash
mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
  --max-kv-size 33000 --max-tokens 1000 --temp 0.0 --top-p 0.9 --seed 1000 \
  --prompt - < ../text/portugal.txt; say done
```

Here is my [full prompt](https://pastebin.com/raw/U4fjjTuv), taken from a Wikipedia article.

To see what would happen, I increased `--max-kv-size` to 34k and `--max-tokens` to 2000. It then generated all 2000 tokens, still stuck in the loop.

Running the exact same command with `4bit` replaced by `8bit` in the model name produced the correct text: 811 tokens, ending normally without the loop.
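To make the 4-bit vs. 8-bit comparison less subjective, one option is to count paragraphs that appear more than once in each saved output. The helper below is a hypothetical sketch, not part of mlx-lm: it assumes each run's output was redirected to a file and treats blank-line-separated blocks as paragraphs (awk "paragraph mode").

```bash
# Hypothetical helper (not part of mlx-lm): prints how many distinct
# paragraphs occur more than once in a saved generation.
# RS="" puts awk in paragraph mode: each blank-line-separated block is one record.
count_repeated_paragraphs() {
  awk 'BEGIN { RS = "" }
       { seen[$0]++ }
       END {
         n = 0
         for (p in seen) if (seen[p] > 1) n++
         print n
       }' "$1"
}
```

For example, `count_repeated_paragraphs out-4bit.txt` (file name hypothetical). A result above 0 points to repeated blocks, though a paragraph that legitimately appears twice would also be counted.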