# Model Configuration Guideline

Every application chart for an LLM should include a `modelConfig.yaml` file in its root directory. This file provides the essential information required to run the LLM.

Here's an example of what a `modelConfig.yaml` file might look like:
**modelConfig.yaml Example**

```yaml
source_url: https://huggingface.co/TheBloke/Yarn-Mistral-7B-128k-GGUF/resolve/main/yarn-mistral-7b-128k.Q4_K_M.gguf
id: yarnmistral7b
object: model
name: Yarn Mistral 7B Q4
version: '1.0'
description: Yarn Mistral 7B is a language model for long context and supports a 128k token context window.
format: gguf
settings:
  ctx_len: 4096
  prompt_template: |-
    {prompt}
parameters:
  temperature: 0.7
  top_p: 0.95
  stream: true
  max_tokens: 4096
  stop: []
  frequency_penalty: 0
  presence_penalty: 0
metadata:
  author: NousResearch, The Bloke
  tags:
    - 7B
    - Finetuned
  size: 4370000000
engine: nitro
```
### source_url

- Type: `string`

The model download source. It can be an external URL or a local filepath.
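The URL in the example above points at a remote Hugging Face file; a local file works the same way. A brief sketch (the local path below is hypothetical, for illustration only):

```yaml
# Remote source, as in the example above:
source_url: https://huggingface.co/TheBloke/Yarn-Mistral-7B-128k-GGUF/resolve/main/yarn-mistral-7b-128k.Q4_K_M.gguf

# Or a local filepath (hypothetical path):
# source_url: /models/yarn-mistral-7b-128k.Q4_K_M.gguf
```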
### id

- Type: `string`

The model identifier, which can be referenced in the API endpoints.
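Assuming the application serves an OpenAI-compatible completions endpoint, a request would select the model by this id. A sketch of the request body (shown as YAML for readability; on the wire it would be JSON, and the endpoint path is an assumption):

```yaml
# POST /v1/chat/completions (assumed OpenAI-compatible endpoint)
model: yarnmistral7b  # the id declared in modelConfig.yaml
messages:
  - role: user
    content: Summarize this document in one sentence.
```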
### object

- Type: `string`
- Default: `model`

The type of the object.
### name

- Type: `string`

Human-readable name that is used in the UI.
### version

- Type: `string`

The version of the model.
### description

- Type: `string`

The description of the model.
### format

- Type: `string`

The format of the model file (e.g., `gguf`).
### settings

The model settings.

Configuration example:

```yaml
settings:
  ctx_len: 4096
  prompt_template: |-
    {prompt}
```
#### ctx_len

- Type: `int`

The context length of the model.
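The example model supports a 128k-token context window (per its description), while the config loads it with 4096; presumably the value can be raised at the cost of more memory:

```yaml
settings:
  # 4096 is a conservative default; the example model supports up to
  # 128k tokens, but a larger window consumes more memory.
  ctx_len: 4096
```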
#### prompt_template

- Type: `string`

The prompt template of the model, which is used to generate the prompt part of the model input.
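The example template passes the user input through unchanged. For an instruction-tuned model, the template would typically wrap `{prompt}` in the markers the model was trained on. A hypothetical sketch (these markers are illustrative, not taken from this model's card):

```yaml
settings:
  prompt_template: |-
    ### Instruction:
    {prompt}
    ### Response:
```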
### parameters

Parameters of the model.

Configuration example:

```yaml
parameters:
  temperature: 0.7
  top_p: 0.95
  stream: true
  max_tokens: 4096
  stop: []
  frequency_penalty: 0
  presence_penalty: 0
```
#### temperature

- Type: `float`

The sampling temperature used when the model generates text; higher values produce more random output.
#### top_p

- Type: `float`

The top-p (nucleus) sampling parameter, which limits generation to the smallest set of tokens whose cumulative probability exceeds `top_p`.
#### stream

- Type: `bool`

Indicates whether the model generates text in a streaming manner.
#### max_tokens

- Type: `int`

The maximum number of tokens generated by the model.
#### stop

- Type: `array`

List of stop sequences; generation halts when any of them is produced.
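The example config leaves `stop` empty. A sketch with non-empty values (these particular sequences are illustrative, not requirements of this model):

```yaml
parameters:
  # Generation halts as soon as any listed sequence is emitted.
  stop:
    - "</s>"
    - "### Instruction:"
```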
#### frequency_penalty

- Type: `float`

Frequency penalty parameter, which penalizes tokens in proportion to how often they have already appeared, reducing repetition in the generated text.
#### presence_penalty

- Type: `float`

Presence penalty parameter, which penalizes tokens that have already appeared at all, encouraging the model to introduce new vocabulary.
### metadata

Metadata of the model.

Configuration example:

```yaml
metadata:
  author: NousResearch, The Bloke
  tags:
    - 7B
    - Finetuned
  size: 4370000000
```
#### author

- Type: `string`

The author name of the model.
#### tags

- Type: `array`

List of tags, used to describe the attributes or features of the model.
#### size

- Type: `int`

The size of the model file in bytes.
### engine

- Type: `string`

The inference engine used to run the model (e.g., `nitro`).