Subsections of How-tos

Easy Demo - Full Chat Python AI

  • You will need about 10 GB of free RAM
  • You will need about 15 GB of free disk space (on the C drive for Windows hosts) for Docker Compose

This is for Linux, macOS, or Windows hosts. Requirements: Docker Desktop, Python 3.11, and Git.

Linux Hosts:

There are Full_Auto installers compatible with some Linux distributions; feel free to use them, but note that they may not fully work. If you need to install something, please use the links at the top.

git clone https://github.com/lunamidori5/localai-lunademo.git

cd localai-lunademo

#Pick your type of Linux for the Full Auto scripts. If you already have Python, Docker, and Docker Compose installed, skip that chmod, but make sure you still chmod the Setup_Linux file.

chmod +x Full_Auto_setup_Debian.sh or chmod +x Full_Auto_setup_Ubutnu.sh

chmod +x Setup_Linux.sh

#Make sure to install CUDA on your host OS and in Docker if you plan on using a GPU

./<the setup file you wish to run>

Windows Hosts:

REM Make sure you have git, docker-desktop, and python 3.11 installed

git clone https://github.com/lunamidori5/localai-lunademo.git

cd localai-lunademo

call Setup.bat

MacOS Hosts:

  • I need some help working on a macOS setup file. If you are willing to help out, please contact Luna Midori on Discord or put in a PR on Luna Midori's GitHub.

Video How Tos

  • Ubuntu - COMING SOON
  • Debian - COMING SOON
  • Windows - COMING SOON
  • MacOS - PLANNED - NEED HELP

Enjoy LocalAI! (If you need help, contact Luna Midori on Discord.)

  • Running Setup.bat or Setup_Linux.sh from Git Bash on Windows does not work. (Somewhat fixed)
  • Running over SSH or other remote command-line apps may bug out, load slowly, or crash.

Easy Model Setup

Let's learn how to set up a model. For this How To we are going to use the Dolphin 2.2.1 Mistral 7B model.

To download the model to your models folder, run this command in a command line of your choice.

curl --location 'http://localhost:8080/models/apply' \
--header 'Content-Type: application/json' \
--data-raw '{
    "id": "TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q4_0.gguf"
}'
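
If you prefer Python over curl, here is a minimal sketch of the same request, assuming the requests package is installed (pip install requests) and LocalAI is listening on localhost:8080:

import requests

# Ask LocalAI to download and apply the model via the /models/apply endpoint.
response = requests.post(
    "http://localhost:8080/models/apply",
    json={"id": "TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q4_0.gguf"},
)
print(response.status_code, response.text)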

Each model needs at least 5 files. Without these files the model will run "raw", which means you cannot change any of the model's settings.

File 1 - The model's GGUF file
File 2 - The model's .yaml file
File 3 - The Chat API .tmpl file
File 4 - The Chat API helper .tmpl file
File 5 - The Completion API .tmpl file

So let's fix that! We are using the lunademo name for this How To, but you can name the files whatever you want! Let's make blank files to start with:

touch lunademo-chat.tmpl
touch lunademo-chat-block.tmpl
touch lunademo-completion.tmpl
touch lunademo.yaml

Now let's edit "lunademo-chat.tmpl". This is the template that "Chat" trained models use, adapted for LocalAI:

<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<|im_end|>

For the "lunademo-chat-block.tmpl", Looking at the huggingface repo, this model uses the <|im_start|>assistant tag for when the AI replys, so lets make sure to add that to this file. Do not add the user as we will be doing that in our yaml file!

{{.Input}}
<|im_start|>assistant
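
If you want to see what these two templates produce together, here is a small illustrative Python sketch (it is not part of LocalAI) that roughly mimics how the chat_message and chat templates combine into a ChatML-style prompt:

def render_chatml(messages):
    # lunademo-chat.tmpl: one <|im_start|>role ... <|im_end|> block per message
    blocks = [
        f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>"
        for m in messages
    ]
    # lunademo-chat-block.tmpl: the joined blocks ({{.Input}}) followed by the assistant tag
    return "\n".join(blocks) + "\n<|im_start|>assistant"

print(render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How are you?"},
]))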

Now in the "lunademo-completion.tmpl" file lets add this. (This is a hold over from OpenAI V0)

{{.Input}}

For the "lunademo.yaml" file. Lets set it up for your computer or hardware. (If you want to see advanced yaml configs - Link)

We are going to first set up the backend and context size.

backend: llama
context_size: 2000

This tells LocalAI how to load the model. Next we are going to add our settings. Let's add the model's name and the model's settings. The model's name: is what you will put into your request when sending an OpenAI request to LocalAI.

name: lunademo
parameters:
  model: dolphin-2.2.1-mistral-7b.Q4_0.gguf

Now that LocalAI knows what file to load with our request, let's add the template files to our model's yaml file.

template:
  chat: lunademo-chat-block
  chat_message: lunademo-chat
  completion: lunademo-completion

If you are running on GPU or want to tune the model, you can add settings like these (the higher gpu_layers is set, the more the GPU is used):

f16: true
gpu_layers: 4

These let you tune the model to your liking. But be warned, you must restart LocalAI after changing a yaml file:

docker compose restart

If you want to check your model's yaml, here is a full copy!

backend: llama
context_size: 2000
##Put settings right here for tuning! Before name but after backend!
name: lunademo
parameters:
  model: dolphin-2.2.1-mistral-7b.Q4_0.gguf
template:
  chat: lunademo-chat-block
  chat_message: lunademo-chat
  completion: lunademo-completion

Now that we have that set up, let's test it out by sending a request to LocalAI! A quick Python test is sketched below; see the Easy Request section for more ways to send requests.
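
This is a minimal sketch, assuming the openai Python package (version 1 or newer) is installed and LocalAI is listening on localhost:8080:

from openai import OpenAI

# Point the OpenAI client at LocalAI; the api_key just needs to be non-empty.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-xxx")

completion = client.chat.completions.create(
    model="lunademo",  # the name: we set in lunademo.yaml
    messages=[{"role": "user", "content": "How are you?"}],
)
print(completion.choices[0].message.content)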

----- Advanced Stuff -----

(Please do not run these steps if you have already done the setup above.) Alright, now that we have learned how to set up our own models, here is how to use the gallery to do a lot of this for us. This command will download and set up the model for us (mostly; we will always need to edit our yaml file to fit our computer / hardware):

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "id": "model-gallery@lunademo"
   }'  

This will set up the model, the model's yaml, and both template files (you will see it only did one, as the completion template is out of date and not supported by OpenAI; if you need one, just follow the steps from before to make one). If you would like to download a raw model using the gallery API, you can run the following command, but you will need to set up the other files needed to run the model yourself!

curl --location 'http://localhost:8080/models/apply' \
--header 'Content-Type: application/json' \
--data-raw '{
    "id": "NAME_OFF_HUGGINGFACE/REPO_NAME/MODENAME.gguf",
    "name": "REQUSTNAME"
}'

Easy Request - All

Curl Request

Curl Chat API -

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "lunademo",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'

This is for Python, OpenAI >= v1.

OpenAI Chat API Python -

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-xxx")

messages = [
{"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
{"role": "user", "content": "Hello How are you today LocalAI"}
]
completion = client.chat.completions.create(
  model="lunademo",
  messages=messages,
)

print(completion.choices[0].message)

See OpenAI API for more info!

This is for Python, OpenAI == 0.28.1.

OpenAI Chat API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.ChatCompletion.create(
  model="lunademo",
  messages=[
    {"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
    {"role": "user", "content": "How are you?"}
  ]
)

print(completion.choices[0].message.content)

OpenAI Completion API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.Completion.create(
  model="lunademo",
  prompt="function downloadFile(string url, string outputPath) ",
  max_tokens=256,
  temperature=0.5)

print(completion.choices[0].text)

Easy Setup - CPU Docker

  • You will need about 10 GB of free RAM
  • You will need about 15 GB of free disk space (on the C drive for Windows hosts) for Docker Compose

We are going to run LocalAI with docker compose for this setup.

Let's set up our folders for LocalAI:

mkdir "LocalAI"
cd LocalAI
mkdir "models"
mkdir "images"
mkdir -p "LocalAI"
cd LocalAI
mkdir -p "models"
mkdir -p "images"

At this point we want to set up our .env file; here is a copy for you to use if you wish. Make sure this is in the LocalAI folder.

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Define galleries.
## Models to install will be visible in `/models/available`
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## Default path for models
MODELS_PATH=/models

## Enable debug mode
# DEBUG=true

## Disables COMPEL (lets Stable Diffusion work; uncomment if you plan on using it)
# COMPEL=0

## Enable/Disable single backend (useful if only one GPU is available)
# SINGLE_ACTIVE_BACKEND=true

## Specify a build type. Available: cublas, openblas, clblas.
BUILD_TYPE=cublas

## Uncomment and set to true to enable rebuilding from source
# REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires REBUILD=true)
#
#GO_TAGS=tts

## Path where to store generated images
# IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT

# HUGGINGFACEHUB_API_TOKEN=Token here

Now that we have the .env set up, let's set up our docker-compose file. It will use a container from quay.io. Also note that this docker-compose file is for CPU only.

version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:v2.0.0
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]

Make sure to save that in the root of the LocalAI folder. Then let's spin up the Docker container. Run this in CMD or Bash:

docker compose up -d --pull always

Now we are going to let that set up. Once it is done, let's check to make sure our huggingface / localai galleries are working (wait until you see the screen below before doing this).

You should see:

┌───────────────────────────────────────────────────┐
│                   Fiber v2.42.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............. 1  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘

Once you see that, check the galleries by running:

curl http://localhost:8080/models/available

The output will be a long JSON list of every model available in the configured galleries.

Now that we have that set up, let's go set up a model!

Easy Setup - Embeddings

To install an embedding model, run the following command

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "id": "model-gallery@bert-embeddings"
   }'  

Now we need to make a bert.yaml in the models folder

backend: bert-embeddings
embeddings: true
name: text-embedding-ada-002
parameters:
  model: bert

Restart LocalAI after you change a yaml file (docker compose restart).

When you would like to request the model from the CLI, you can run:

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-ada-002"
  }'
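
The same request from Python is a minimal sketch like this, assuming the openai package (version 1 or newer) is installed and LocalAI is on localhost:8080:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-xxx")

embedding = client.embeddings.create(
    model="text-embedding-ada-002",  # the name: we set in bert.yaml
    input="The food was delicious and the waiter...",
)
# Each input gets back a vector of floats; print the first few values.
print(embedding.data[0].embedding[:5])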

See OpenAI Embedding for more info!

Easy Setup - GPU Docker

  • You will need about 10 GB of free RAM
  • You will need about 15 GB of free disk space (on the C drive for Windows hosts) for Docker Compose

We are going to run LocalAI with docker compose for this setup.

Let's set up our folders for LocalAI:

mkdir "LocalAI"
cd LocalAI
mkdir "models"
mkdir "images"
mkdir -p "LocalAI"
cd LocalAI
mkdir -p "models"
mkdir -p "images"

At this point we want to set up our .env file; here is a copy for you to use if you wish. Make sure this is in the LocalAI folder.

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Define galleries.
## Models to install will be visible in `/models/available`
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## Default path for models
MODELS_PATH=/models

## Enable debug mode
# DEBUG=true

## Disables COMPEL (lets Stable Diffusion work; uncomment if you plan on using it)
# COMPEL=0

## Enable/Disable single backend (useful if only one GPU is available)
# SINGLE_ACTIVE_BACKEND=true

## Specify a build type. Available: cublas, openblas, clblas.
BUILD_TYPE=cublas

## Uncomment and set to true to enable rebuilding from source
# REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires REBUILD=true)
#
#GO_TAGS=tts

## Path where to store generated images
# IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT

# HUGGINGFACEHUB_API_TOKEN=Token here

Now that we have the .env set up, let's set up our docker-compose file. It will use a container from quay.io. Also note that this docker-compose file is for CUDA only.

Please change the image to the one you need.

CUDA 11 images:

  • master-cublas-cuda11
  • master-cublas-cuda11-core
  • v2.0.0-cublas-cuda11
  • v2.0.0-cublas-cuda11-core
  • v2.0.0-cublas-cuda11-ffmpeg
  • v2.0.0-cublas-cuda11-ffmpeg-core

CUDA 12 images:

  • master-cublas-cuda12
  • master-cublas-cuda12-core
  • v2.0.0-cublas-cuda12
  • v2.0.0-cublas-cuda12-core
  • v2.0.0-cublas-cuda12-ffmpeg
  • v2.0.0-cublas-cuda12-ffmpeg-core

Core images are smaller images without pre-downloaded Python dependencies.

Here is the docker-compose file:

version: '3.6'

services:
  api:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    image: quay.io/go-skynet/local-ai:[CHANGEMETOIMAGENEEDED]
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]

Make sure to save that in the root of the LocalAI folder. Then let's spin up the Docker container. Run this in CMD or Bash:

docker compose up -d --pull always

Now we are going to let that set up. Once it is done, let's check to make sure our huggingface / localai galleries are working (wait until you see the screen below before doing this).

You should see:

┌───────────────────────────────────────────────────┐
│                   Fiber v2.42.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............. 1  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘

Once you see that, check the galleries by running:

curl http://localhost:8080/models/available

The output will be a long JSON list of every model available in the configured galleries.

Now that we have that set up, let's go set up a model!

Easy Setup - Stable Diffusion

Setting up a Stable Diffusion model is super easy. In your models folder make a file called stablediffusion.yaml, then edit that file with the following. (You can replace Linaqruf/animagine-xl with whatever SDXL model you would like.)

name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers

# Force CPU usage - set to true for GPU
f16: false
diffusers:
  pipeline_type: StableDiffusionXLPipeline
  cuda: false # Enable for GPU usage (CUDA)
  scheduler_type: dpm_2_a

If you are using Docker, you will need to run this in the LocalAI folder with the docker-compose.yaml file in it:

docker-compose down #windows
docker compose down #linux/mac

Then in your .env file uncomment this line.

COMPEL=0

After that we can bring the LocalAI container back up by running this in the LocalAI folder with the docker-compose.yaml file in it:

docker-compose up #windows
docker compose up #linux/mac

Then, to download and set up the model, just send in a normal OpenAI request! LocalAI will do the rest!

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "Two Boxes, 1blue, 1red",
  "size": "256x256"
}'
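
The same request from Python is a minimal sketch like this, assuming the openai package (version 1 or newer) is installed and LocalAI is on localhost:8080:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-xxx")

image = client.images.generate(
    prompt="Two Boxes, 1blue, 1red",
    size="256x256",
)
# The response mirrors OpenAI's images API; data[0] holds the generated image entry.
print(image.data[0])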