This repository contains hand-made quasi-bindings for some of the functions from llama.h, allowing them to be called from Racket via FFI.
It doesn't have ALL functions!
I made it for my other project. I'm planning to add a couple more functions/examples
to complete embeddings/full inference, but I am not planning to maintain it or keep it up to date in the near future.
Please feel free to do whatever you want with it, and let me know if you want to add something!
It assumes a .dylib is produced, so it has been tested on macOS only, but it should work on other platforms too. The only requirement would be to update the build section in the Makefile accordingly.
It was last tested with llama.cpp release b3756, but I suspect it will work with other versions as well.
Please note, I'm a noob and this repo might be incorrect. Use at your own risk :)
Part of the code was generated with the help of LLMs, using the repository's current state as context, so it should be possible to extend the code with the same approach.
Once you build the project, you can get rid of everything apart from:
- libllama.dylib - the main dependency, containing the compiled llama.cpp code
- llama.rkt - contains the bindings that let you call functions from the library above
You need the ffi/unsafe library for Racket (it is part of the standard Racket distribution).
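As a rough sketch of what such bindings look like, here is a minimal hand-rolled example using ffi/unsafe. This is an illustration, not the contents of llama.rkt; it assumes libllama.dylib can be found from the current directory and that the two declarations match llama.h as of release b3756:

```racket
#lang racket
;; Minimal FFI sketch (assumption: libllama.dylib is in the load path
;; or current directory, and the signatures match llama.h from b3756).
(require ffi/unsafe ffi/unsafe/define)

;; define-llama looks up each symbol in libllama.dylib
(define-ffi-definer define-llama (ffi-lib "libllama"))

;; void llama_backend_init(void);
(define-llama llama_backend_init (_fun -> _void))

;; const char * llama_print_system_info(void);
(define-llama llama_print_system_info (_fun -> _string))

(llama_backend_init)
(displayln (llama_print_system_info))
```

Each additional binding follows the same pattern: copy the C declaration from llama.h and translate its types into the corresponding `_fun` signature.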
The examples at the end show how you can incorporate it within your project.
- Initialize llama.cpp submodule:
make update
- Build libllama.dylib library:
make build
If you changed something in llama.cpp source code (or updated submodule again) and want to rebuild, use this:
make rebuild
rebuild can be executed multiple times. It basically clears the state and builds libllama again.
In any case, the library will be placed in the main directory as libllama.dylib.
- Download a llama.cpp-compatible model.
- Once the project is built, you can use it with llama.rkt. See the examples below.
Both examples below load the bindings from llama.rkt and expect model.gguf in the main directory (I use city96/t5-v1_1-xxl-encoder-gguf).
racket ./example-tokenizer.rkt
racket ./example-batch-initialization.rkt
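A script of your own would follow the same shape as the examples. The skeleton below is hypothetical: the binding names are taken from llama.h (release b3756) and are assumed to be exposed by llama.rkt, but check llama.rkt for the actual exports before using them:

```racket
#lang racket
;; Hypothetical skeleton -- the identifiers below are assumed to be
;; bound in llama.rkt; the real exported names may differ.
(require "llama.rkt")

(llama_backend_init)                       ; initialize the llama.cpp backend
(define model-params (llama_model_default_params))
(define model
  (llama_load_model_from_file "model.gguf" model-params))

;; ... tokenize text / build a batch here, as in the example scripts ...

(llama_free_model model)                   ; release the model
(llama_backend_free)                       ; shut the backend down
```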