The open-source community around Large Language Models (LLMs) is evolving rapidly, with new models, backends, libraries, and tooling constantly emerging that make it possible to run LLMs locally or in self-hosted environments. SAP AI Core is a service in the SAP Business Technology Platform designed to handle the execution and operations of your AI assets in a standardized, scalable, and hyperscaler-agnostic way. This repository serves as a guide on how to bring popular open-source Large Language Models (such as LLaMa 3, Phi-3, Mistral, Mixtral, LLaVA, Gemma, etc.) and open-source text embedding models into SAP AI Core using widely adopted open-source LLM tools or backends, complementing SAP Generative AI Hub with self-hosted open-source LLMs:
- Ollama
- LocalAI
- llama.cpp
- vLLM
- Custom Inference Server with Hugging Face Transformers Library
- Infinity for open-source text embedding models from the Massive Text Embedding Benchmark (MTEB)
Please refer to the blog post about Bring Open-Source LLMs into SAP AI Core for details. Key motivations include:
- Data Protection & Privacy
- Security
- Cost-effectiveness
- Flexibility in the choice of LLMs, LLM backends, etc.
- Making open-source LLMs enterprise-ready
In principle, there are three essential parts for bringing an open-source LLM/LMM into SAP AI Core.
- Commercially viable open-source or open-weight models: e.g. Mistral, Mixtral, LLaVA, etc.
- Publicly accessible model hub: for instance, the Ollama Model Library tailored for Ollama, or Hugging Face as a general-purpose model repository.
- Inference server in SAP AI Core: You can bring your own code to implement an inference server, for example, the Custom Inference Server with Hugging Face Transformers Library. Alternatively, there are open-source, ready-to-use LLM inference servers that can be reused in SAP AI Core, such as Ollama, LocalAI, llama.cpp and vLLM, with minimal custom code: a custom Dockerfile and a configurable serving template adapted for SAP AI Core. Ollama is recommended for its simplicity and efficiency.
Ollama, LocalAI, llama.cpp and vLLM each offer a comprehensive solution for running Large Language Models (LLMs) locally or in self-hosted environments. Their full-stack capabilities include:
- Model Management: Dynamically pull or download LLMs from a model repository through an API at run-time (exclusive to Ollama and LocalAI; vLLM provides seamless integration with Hugging Face models)
- Running LLMs efficiently with GPU acceleration in SAP AI Core using open-source backends such as llama.cpp, vLLM, transformers, exllama, etc.
- Serving OpenAI-compatible chat completions and embedding APIs (a minimal sketch of these APIs follows this list)
- Easy deployment and setup without the need for custom code deployment in SAP AI Core
- Commercial viability: they are all released under the MIT or Apache 2.0 license
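To illustrate how uniform these APIs are, here is a minimal sketch of pulling a model and requesting a chat completion against Ollama's API. It assumes a local Ollama server on its default port 11434 and the `mistral` model as an example; in SAP AI Core, the same calls go through the deployment URL with auth headers instead, as shown in the notebooks later.

```python
# Minimal sketch: pull a model and run a chat completion against Ollama's API.
# Assumes a local Ollama server on its default port 11434; with SAP AI Core,
# the base URL would be the deployment URL plus auth headers (see notebooks).
import requests

OLLAMA_URL = "http://localhost:11434"

# 1. Pull a model dynamically at run-time through the model management API.
resp = requests.post(f"{OLLAMA_URL}/api/pull", json={"name": "mistral"})
resp.raise_for_status()

# 2. Request a completion via the OpenAI-compatible chat completions API.
resp = requests.post(
    f"{OLLAMA_URL}/v1/chat/completions",
    json={
        "model": "mistral",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```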
| | Ollama | LocalAI |
|---|---|---|
| Description | "Ollama: Get up and running with Llama 2, Mistral, Gemma, and other large language models." | "LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inference..." |
| Recommendation | Recommended for plain inference of LLMs/LMMs in SAP AI Core. See its AI capabilities below for details. | Recommended if speech recognition, speech generation and image generation are also required apart from LLMs/LMMs. |
| AI Capabilities | Text generation<br>Vision<br>Text Embedding | Text generation<br>Vision<br>Text Embedding<br>Speech to Text<br>Text to Speech<br>Image Generation |
| Installation & Setup | Easy installation and setup | Make sure to use the corresponding docker image, or build with the right variables, for GPU acceleration |
| GPU Acceleration | Automatically detects and applies the GPU | Supported; requires per-model configuration |
| Model Management | Easy built-in model management through CLI commands or APIs | Experimental model gallery; may require additional per-model configuration for GPU acceleration |
| Supported Backends | llama.cpp | Multi-backend support and backend-agnostic. llama.cpp is the default backend; extra backends such as vLLM, rwkv, Hugging Face transformers, bert, whisper.cpp etc. are also supported. Please check its model compatibility table for details |
| Supported Models | Built-in Model Library | Experimental Model Gallery |
| Model Switching | Seamless model switching with automatic memory management | Supported |
| APIs | Model Management API<br>OpenAI-compatible chat completions API<br>Embedding API | Model Management API<br>Text Generation API<br>OpenAI-compatible chat completions API<br>Embedding API |
| Model Customization | Supported | Supported |
| License | MIT | MIT |
| | llama.cpp | vLLM |
|---|---|---|
| Description | "The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud." | "A high-throughput and memory-efficient inference and serving engine for LLMs" |
| Recommendation | Recommended for private custom LLMs or fine-tuned models. | Recommended for private custom LLMs or fine-tuned models. |
| AI Capabilities | Text generation<br>Vision<br>Text Embedding | Text generation<br>Vision<br>Text Embedding |
| Deployment & Setup | Easy deployment via docker; many arguments to explore when starting the llama.cpp server | Easy deployment via docker; many engine arguments when starting vllm.entrypoints.openai.api_server |
| GPU Acceleration | Supported | Supported |
| Model Management | Not supported; an external tool (wget etc.) is needed to download models from Hugging Face | Seamless integration with popular Hugging Face models |
| Supported Quantization | 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization | GPTQ, AWQ, SqueezeLLM, FP8 KV Cache |
| Supported Models | https://github.com/ggerganov/llama.cpp > Supported models | Supported Models |
| Model Switching | Not supported; one deployment per model | Not supported; one deployment per model |
| APIs | OpenAI-compatible chat completions API<br>Embedding API | OpenAI-compatible chat completions API<br>Embedding API |
| License | MIT | Apache 2.0 |
In the following sections, we show how to bring open-source LLMs into SAP AI Core with Ollama, LocalAI, llama.cpp and vLLM.
The following software is required to serve an AI model in SAP AI Core. If SAP AI Core is new to you, please follow this tutorial to provision and set it up; the tutorial covers the list below.
Important: Please make sure your subaccount is entitled to the Standard Plan or Extended Plan of SAP AI Core, which requires a BTPEA or Pay-As-You-Go contract; refer to the pricing of SAP AI Core for details. Due to the restrictions of the Free Tier service plan, the open-source LLMs cannot be run with the Free Tier plan. Please refer to the official documentation about Resource Plans in SAP AI Core for details.
For the Free Tier service plan, only the Starter resource plan is available; specifying other plans will result in an error. For the Standard service plan, all resource plans are available. For more information, see Free Tier and Service Plans.
In this sample, SAP AI Launchpad is optional and only used to show and check results. All configurations, such as creating the resource group, docker registry secret, GitHub repository onboarding, application, configuration, and deployment, are automated through the SAP AI Core SDK. However, SAP AI Launchpad is still recommended as a more user-friendly graphical cockpit for administration tasks, especially if you are new to SAP AI Core.
Please skip if you have previously completed the initial configurations for your SAP AI Core.
Please skip if you have done it before.
Only take the steps to generate a GitHub personal access token, which will be used to onboard the GitHub repository into SAP AI Core afterwards.
Please skip if you have done it before.
Instructions can be found here, Steps 1 to 4.
We recommend creating an access token to be used in place of your password. Instructions on how to generate a token can be found here.
- Install Git by following the instructions here.
- Download and install Visual Studio Code by following the instructions here.
Fork this repository into your own GitHub account using this URL. Set your forked repository to private to prevent public access.
git clone <YOUR_FORKED_REPOSITORY_URL>
- Download and install Python 3 (>= 3.7) in your local environment from here or through another approach.
- Create a virtual environment and install the dependencies
# Create a virtual env and install the dependencies
cd btp-generative-ai-hub-use-cases/10-byom-oss-llm-ai-core
python3 -m venv oss-llm-env
source oss-llm-env/bin/activate
pip3 install -r byom-oss-llm-code/requirements.txt
Please follow and run the Jupyter notebook 00-init-config.ipynb to perform the initial configurations for the byom-oss-llm-ai-core application in SAP AI Core. To run the notebook, you can either open it in Visual Studio Code or start JupyterLab:
# Start the JupyterLab
jupyter lab
Please refer to this blog post about Bring Open-Source LLMs into SAP AI Core with Ollama for more details.
Please follow the jupyter notebooks below to deploy and test Ollama in SAP AI Core.
- 01-deployment.ipynb
- 02-ollama.ipynb for testing Ollama's Model Pulling API, Text Generation API, and OpenAI-like Chat Completion API in SAP AI Core through direct API calls (a minimal sketch of such a call follows this list).
- 02-ollama-sap-genai-hub-sdk.ipynb for testing Ollama's OpenAI-like Chat Completion API in SAP AI Core through the SAP Generative AI Hub SDK and LangChain.
- 03-ollama-function-call.ipynb for testing function calling with Ollama on the open-weight models Meta's Llama 3.1 or Mistral's Mistral v0.3.
- 03-ollama-llava.ipynb for testing Ollama's Text Generation API on the LLaVA model with vision capability in SAP AI Core through direct API calls.
- 04-cleanup.ipynb
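For orientation, here is a minimal sketch of such a direct API call against the Ollama deployment in SAP AI Core. The deployment URL, the resource group name, and the `/v1` route prefix are placeholders/assumptions for illustration; the notebooks above show the exact values used in this sample.

```python
# Minimal sketch of a direct API call to an Ollama deployment in SAP AI Core.
# <DEPLOYMENT_URL>, the resource group name, and the /v1 route prefix are
# placeholders/assumptions -- see 02-ollama.ipynb for the exact values.
import requests

token = "<ACCESS_TOKEN>"              # OAuth token from your AI Core service key
deployment_url = "<DEPLOYMENT_URL>"   # URL of the running deployment

headers = {
    "Authorization": f"Bearer {token}",
    "AI-Resource-Group": "oss-llm",   # assumed resource group name
    "Content-Type": "application/json",
}

# Pull the model first (Ollama's model management API), then chat with it.
requests.post(f"{deployment_url}/v1/api/pull",
              headers=headers, json={"name": "llama3.1"})

resp = requests.post(
    f"{deployment_url}/v1/chat/completions",
    headers=headers,
    json={"model": "llama3.1",
          "messages": [{"role": "user", "content": "Hello!"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
```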
Please follow the jupyter notebooks below to deploy and test LocalAI in SAP AI Core.
Please follow the jupyter notebooks below to deploy and test llama.cpp in SAP AI Core.
Please follow the jupyter notebooks below to deploy and test vLLM in SAP AI Core.
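Since LocalAI, llama.cpp and vLLM expose OpenAI-compatible APIs, such deployments can also be consumed with the standard `openai` Python client by overriding the base URL. Below is a minimal sketch; the deployment URL, model name, and resource group header value are placeholders/assumptions.

```python
# Minimal sketch: consume an OpenAI-compatible deployment (e.g. vLLM or
# llama.cpp in SAP AI Core) with the openai Python client (openai >= 1.x).
# base_url, model name, and resource group value are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="<DEPLOYMENT_URL>/v1",
    api_key="<ACCESS_TOKEN>",  # AI Core OAuth token sent as bearer token
    default_headers={"AI-Resource-Group": "oss-llm"},  # assumed group name
)

completion = client.chat.completions.create(
    model="mistral",  # hypothetical model name; depends on your deployment
    messages=[{"role": "user", "content": "Summarize SAP AI Core in one sentence."}],
)
print(completion.choices[0].message.content)
```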
Option 5: Bring open-source LLMs into SAP AI Core with a Custom Inference Server with Hugging Face Transformers Library
Please follow the jupyter notebooks below to deploy and test Custom Transformer Server in SAP AI Core.
- 01-deployment.ipynb for building the docker image and starting a deployment
- 02-transformer-direct-api-call.ipynb: Sample code to run inference on Microsoft's Phi-3-vision-128k-instruct served by the custom inference server with the Hugging Face Transformers library within SAP AI Core through direct API calls.
- 03-transformer-sap-genai-hub-sdk.ipynb: Sample code to run inference on Microsoft's Phi-3-vision-128k-instruct served by the custom inference server with the Hugging Face Transformers library within SAP AI Core through the SAP Generative AI Hub SDK and LangChain.
- 04-cleanup.ipynb
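For context, the core idea of such a custom inference server is small: wrap a Transformers pipeline in a web framework and expose an HTTP route that SAP AI Core can serve. The sketch below illustrates this with FastAPI and a deliberately tiny stand-in model (gpt2); the actual server in this repository serves Phi-3-vision and differs in model loading, routes, and configuration.

```python
# Illustrative-only sketch of a custom inference server built with the Hugging
# Face Transformers library. The real server in byom-oss-llm-code serves
# Phi-3-vision-128k-instruct; model, route, and schema here are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")  # tiny stand-in model

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/v1/generate")  # hypothetical route
def generate(req: GenerateRequest):
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": out[0]["generated_text"]}

# Run locally with: uvicorn server:app --host 0.0.0.0 --port 8080
```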
Copyright (c) 2024 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.