Serverless LLM - Text Embeddings Inference

This is a proof-of-concept (PoC) in which I adapted the TEI (Text Embeddings Inference) framework to run as a serverless application on AWS Lambda. You can find more information about this project HERE.

Results

We conducted an experiment to evaluate the effectiveness of this PoC. The experiment consisted of the following steps:

  • The intfloat/multilingual-e5-small model was chosen as the embedding model.
  • We sent a 600-token text to the model and measured the time it took to return a result.
  • We took two measurements of this processing time: with a cold start and without one.
  • With these processing times in hand, we determined (1) the cost per million tokens processed (assuming 10% of executions incur a cold start and 90% do not) and (2) how many tokens can be processed per month for free under the AWS free tier. A worked example of the arithmetic follows the table below.
| Model    | Time w/ cold start | Time w/o cold start | Free M tokens/month | Cost/M tokens |
| -------- | ------------------ | ------------------- | ------------------- | ------------- |
| E5 small | 4 s                | 300 ms              | 17.8                | $0.03         |
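
To see where the figures come from: with 10% cold starts, the blended time per 600-token request is 0.1 × 4 s + 0.9 × 0.3 s = 0.67 s. One million tokens therefore takes about 1,000,000 / 600 ≈ 1,667 requests, or roughly 1,117 seconds of billed compute. Lambda bills per GB-second, so the dollar figure is that compute time multiplied by the function's configured memory (in GB) and the region's per-GB-second price; with the memory setting used in this experiment, that works out to about $0.03 per million tokens.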

Running on AWS

Assuming you have the AWS CLI configured on your machine, follow these steps to deploy:

Build Docker image

First, download this repo and build its Docker image, setting which model you want to use:
docker buildx build --build-arg MODEL_ID=<model_id> --platform linux/amd64 --tag <account_id>.dkr.ecr.<region>.amazonaws.com/<ecr_repo_name>:latest . 
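
For example, to build with the model used in the experiment above (the account ID, region, and repository name here are placeholders; substitute your own):

docker buildx build --build-arg MODEL_ID=intfloat/multilingual-e5-small --platform linux/amd64 --tag 123456789012.dkr.ecr.us-east-1.amazonaws.com/serverless-tei:latest .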

This command can take several minutes, since TEI is a Rust framework and everything must be compiled from source.

Push image to AWS

Log in to AWS ECR, create the image repository, and push the build:

aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account_id>.dkr.ecr.<region>.amazonaws.com  

aws ecr create-repository --repository-name <ecr_repo_name> --region <region>

docker push <account_id>.dkr.ecr.<region>.amazonaws.com/<ecr_repo_name>:latest  
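
To confirm the push succeeded, you can optionally list the images now in the repository:

aws ecr describe-images --repository-name <ecr_repo_name> --region <region>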
Create Lambda

Create the Lambda function and its execution role:

aws iam create-role --role-name lambda-basic-execution --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"Service": "lambda.amazonaws.com"},"Action": "sts:AssumeRole"}]}'   
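
The function will run without any additional policies attached, but if you want its logs in CloudWatch you can optionally attach AWS's standard basic-execution managed policy to the role:

aws iam attach-role-policy --role-name lambda-basic-execution --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole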
 
aws lambda create-function --region <region> --function-name tei_test --package-type Image --code ImageUri=<account_id>.dkr.ecr.<region>.amazonaws.com/<ecr_repo_name>:latest --role arn:aws:iam::<account_id>:role/lambda-basic-execution --environment "Variables={MODEL_ID=<model_id>}" --timeout <timeout> --memory-size <memory>
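
Once the function is created, you can test it with a direct invocation, using the same payload shape as the local example below (with AWS CLI v2, the --cli-binary-format flag is needed to send a raw JSON payload):

aws lambda invoke --region <region> --function-name tei_test --cli-binary-format raw-in-base64-out --payload '{"inputs":["First text", "Second text"]}' response.json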

Running locally

Build & Run

In one terminal, execute:

docker buildx build --build-arg MODEL_ID=<model_id> --platform linux/amd64 --tag serverless_tei_test .

docker run -e MODEL_ID=<model_id> --rm -p 9000:8080 --name serverless_tei_test serverless_tei_test
Calling the service

And in another terminal:
curl -X POST http://localhost:9000/2015-03-31/functions/function/invocations -H 'Content-Type: application/json' -d '{"inputs":["First text", "Second text"]}' | python3 -m json.tool
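
The URL path is the standard invocation endpoint of the Lambda runtime interface emulator. Assuming the function returns TEI's usual embedding response, a JSON array with one vector per input text, you can quickly check the output dimensions like this (a sketch; adjust the parsing if your response is wrapped differently):

curl -s -X POST http://localhost:9000/2015-03-31/functions/function/invocations -H 'Content-Type: application/json' -d '{"inputs":["First text", "Second text"]}' | python3 -c "import json,sys; v=json.load(sys.stdin); print(len(v), 'embeddings of dimension', len(v[0]))"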

Next steps

Let's hope Hugging Face implements this kind of feature in TEI itself. Or you can help me turn this PoC into a fully functional application; contributions are more than welcome.
