🦙 llama.cpp
llama.cpp is a popular port of Facebook’s LLaMA model in C/C++.
The ggml file format has been deprecated. If you are using ggml models and you are configuring your model with a YAML file, specify, use the llama-stable backend instead. If you are relying in automatic detection of the model, you should be fine. For gguf models, use the llama backend.
Features
The llama.cpp model supports the following features:
Setup
LocalAI supports llama.cpp models out of the box. You can use the llama.cpp model in the same way as any other model.
Manual setup
It is sufficient to copy the ggml or guf model files in the models folder. You can refer to the model in the model parameter in the API calls.
You can optionally create an associated YAML model config file to tune the model’s parameters or apply a template to the prompt.
Prompt templates are useful for models that are fine-tuned towards a specific prompt.
Automatic setup
LocalAI supports model galleries which are indexes of models. For instance, the huggingface gallery contains a large curated index of models from the huggingface model hub for ggml or gguf models.
For instance, if you have the galleries enabled, you can just start chatting with models in huggingface by running:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "TheBloke/WizardLM-13B-V1.2-GGML/wizardlm-13b-v1.2.ggmlv3.q2_K.bin",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.1
}'
LocalAI will automatically download and configure the model in the model directory.
Models can be also preloaded or downloaded on demand. To learn about model galleries, check out the model gallery documentation.
YAML configuration
To use the llama.cpp backend, specify llama as the backend in the YAML file:
name: llama
backend: llama
parameters:
# Relative to the models path
model: file.gguf.bin
In the example above we specify llama as the backend to restrict loading gguf models only.
For instance, to use the llama-stable backend for ggml models:
name: llama
backend: llama-stable
parameters:
# Relative to the models path
model: file.ggml.bin