GPU Server
General Setup
The GPU server should be used exclusively for programs that use its GPU in some
way, such as AI workloads or data processing. The idea is to expose an API that the app
server calls whenever it needs something done on the GPU, passing any required data
to the GPU server in the request. This lets us run a variety of functions on the GPU
server while maintaining a single point of access.
After a lot of testing, Ollama just runs models better than anything else, and does a better
job of managing multiple models in VRAM than something like HuggingFace. So as much
as I hate to have an API for the API to use, that's how it's being done. Ollama runs in a
container on port 11434. The API for using the GPU server will also run in a Docker
container, listening on port 8080.
Docker will save its data to the RAID storage so large models don't shit up the OS
partition.
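As a sketch of that flow, the app server would send a request like the one below to the GPU API on port 8080, which in turn calls Ollama on 11434. The endpoint path, model name, and payload shape here are illustrative assumptions, not the actual API contract:

```shell
# Hypothetical payload from the app server to the GPU API (port 8080).
# Model name and endpoint path are placeholders, not the real contract.
payload='{"model": "llama3", "prompt": "Summarize this report."}'

# Validate the payload shape locally before sending.
echo "$payload" | python3 -m json.tool

# The actual call would look like this (commented out: needs the server up):
# curl -s http://gpu-server:8080/api/generate -d "$payload"
```

Whatever the final routes look like, keeping the payload self-describing (model plus inputs) is what lets one endpoint front several GPU functions.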
Detailed Setup
During development I've just used the ubuntu user for everything, but we will probably
want to make a compliance user for production.
RAID
Set up partitions on drives (during install)
- Create 100G ext4 partition for OS at /
- Create 2G ext4 partition for boot at /boot
- Create unformatted partition with the remaining space
- Create RAID10 array from unformatted partitions with default name md0
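For reference, if the array ever has to be (re)created after install instead of through the installer, the equivalent mdadm invocation would look roughly like this. The device names and four-drive count are assumptions about the hardware, and mdadm --create is destructive, so the sketch only prints the command:

```shell
# Assumed: four drives, with the third (unformatted) partition on each.
# mdadm --create wipes the member devices, so this only echoes the command.
cmd="sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3"
echo "$cmd"
# Build progress can be watched with: cat /proc/mdstat
```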
Format RAID array to ext4 with sudo mkfs.ext4 /dev/md0
Mount the RAID array with sudo mount /dev/md0 /home/<user>/data
Get UUID for RAID with blkid | grep md0
- If nothing is returned, reboot the server
Edit /etc/fstab to mount the RAID array on startup by adding UUID=<UUID of md0>
/home/<user>/data ext4 defaults 0 0
- After editing /etc/fstab, reload systemd with sudo systemctl daemon-reload so the
change is picked up without a reboot
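Putting that together, the added /etc/fstab line would look like the following, with the array mounted at the same point used in the mount step above. The UUID is a placeholder for the value blkid returned; the trailing 0 0 disables dump and boot-time fsck ordering:

```
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /home/<user>/data  ext4  defaults  0  0
```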
Nvidia AI Workbench
The entire purpose of installing this is to get the correct GPU drivers and supporting
tools like nvidia-container-toolkit (which lets Docker use the GPU) in a single
download. We won't use this at all in production.
Install Nvidia AI Workbench with this script
sudo mkdir -p $HOME/nvwb/.nvwb/bin && \
sudo curl -L [Link]cli/$(curl -L -s [Link]cli/LATEST)/nvwb-cli-$(uname)-$(uname -m) \
--output $HOME/nvwb/.nvwb/bin/nvwb-cli && \
sudo chmod +x $HOME/nvwb/.nvwb/bin/nvwb-cli && \
sudo -E $HOME/nvwb/.nvwb/bin/nvwb-cli install
Accept the terms and conditions
Choose to install Docker instead of Podman
Choose to install the GPU drivers
Reboot the system
Docker
Ollama has to download the models somewhere, and they tend to take up a lot of space,
so we have to change Docker's settings to save data in the large RAID partition.
Make a directory for docker in the large RAID directory with sudo mkdir
/home/<user>/data/docker
Edit /etc/docker/daemon.json and add the key "data-root":
"/home/<user>/data/docker" to the top-level dictionary
Restart Docker with sudo service docker restart