Basic Completion: Text-to-Text

As mentioned earlier, as developers we should not be stuck thinking of LLMs as “Chat” interfaces; instead, we should think of them conceptually as completion machines. While we will be sending the API a series of “Messages”, it helps to think of completions in this larger context.

To make a basic completion request with Preternatural, consider the following variables:

The LLM Client

Choosing the right LLM API provider (e.g. OpenAI, Anthropic, Gemini, Mistral, Groq etc) for your app or a specific use-case within your app requires a lot of experimentation. Each provider offers multiple models, which vary in pricing and accuracy of the result. For example, it may be possible to get the output you require from an older cheaper model simply by adding more specific instructions in the prompts.

This is why Preternatural is designed to easily switch between different LLM API providers and their models. Simply get the API key from the LLM API provider’s website and initiate the LLM client for interacting with their API as follows:

import AI
 
// OpenAI / GPT
import OpenAI
 
let client: OpenAI.Client = OpenAI.Client(apiKey: "YOUR_API_KEY")
 
// Anthropic / Claude
import Anthropic
 
let client: Anthropic.Client = Anthropic.Client(apiKey: "YOUR_API_KEY")
 
// Mistral
import Mistral
 
let client: Mistral.Client = Mistral.Client(apiKey: "YOUR_API_KEY")
 
// Groq
import Groq
 
let client: Groq.Client = Groq.Client(apiKey: "YOUR_API_KEY")
 
// ElevenLabs
import ElevenLabs
 
let client: ElevenLabs.Client = ElevenLabs.Client(apiKey: "YOUR_API_KEY")

Note that if you need to abstract out the LLM client (for example, if you want to allow your user to choose between providers) but still make the same completion call regardless of the client, simply initialize an instance of LLMRequestHandling with an LLM API provider of your choice:

// OpenAI / GPT
let client: any LLMRequestHandling = OpenAI.Client(apiKey: "YOUR_API_KEY")
// Anthropic / Claude
let client: any LLMRequestHandling = Anthropic.Client(apiKey: "YOUR_API_KEY")
// Mistral
let client: any LLMRequestHandling = Mistral.Client(apiKey: "YOUR_API_KEY")
// Groq
let client: any LLMRequestHandling = Groq.Client(apiKey: "YOUR_API_KEY")
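
For instance, a small factory that returns the user's chosen provider behind the common interface might look like this (a minimal sketch; the LLMProvider enum and makeClient function are illustrative, not part of Preternatural):

import AI
import OpenAI
import Anthropic
 
// Illustrative only: a user-facing provider choice.
enum LLMProvider {
    case openAI
    case anthropic
}
 
// Return the chosen provider's client behind the common interface.
func makeClient(for provider: LLMProvider, apiKey: String) -> any LLMRequestHandling {
    switch provider {
    case .openAI:
        return OpenAI.Client(apiKey: apiKey)
    case .anthropic:
        return Anthropic.Client(apiKey: apiKey)
    }
}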

Other LLM API providers may be supported in the future as the ecosystem continues to develop.

The LLM Model

For each LLM API provider, Preternatural supports multiple models:

// OpenAI GPT Models
let gpt_4o_Model: OpenAI.Model = .gpt_4o
let gpt_4_Model: OpenAI.Model = .gpt_4
let gpt_3_5_Model: OpenAI.Model = .gpt_3_5
let otherGPTModels: OpenAI.Model = .chat(.gpt_OTHER_MODEL_OPTIONS)
 
// Anthropic Models
let claudeHaikuModel: Anthropic.Model = .haiku
let claudeSonnetModel: Anthropic.Model = .sonnet
let claudeOpusModel: Anthropic.Model = .opus
 
// Mistral Models
let mistralTiny: Mistral.Model = .mistral_tiny
let mistralSmall: Mistral.Model = .mistral_small
let mistralMedium: Mistral.Model = .mistral_medium
 
// Groq Models
let gemma_7b: Groq.Model = .gemma_7b
let llama3_8b: Groq.Model = .llama3_8b
let llama3_70b: Groq.Model = .llama3_70b
let mixtral_8x7b: Groq.Model = .mixtral_8x7b
 
// ElevenLabs Models
let multilingualV2: ElevenLabs.Model = .MultilingualV2
let turboV2: ElevenLabs.Model = .TurboV2 // English
let multilingualV1: ElevenLabs.Model = .MultilingualV1
let englishV1: ElevenLabs.Model = .EnglishV1
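
Since models are plain values, switching between them is a one-line change. As an illustration (a hypothetical helper, not part of Preternatural), you could route simple queries to a cheaper model and harder ones to a stronger one:

// Hypothetical routing helper: a cheaper model for simple queries,
// a stronger (more expensive) model for complex ones.
func chooseModel(forSimpleQuery isSimple: Bool) -> OpenAI.Model {
    isSimple ? .gpt_3_5 : .gpt_4o
}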

The System Prompt (Optional)

Use the System Prompt to give the Assistant general instructions.

The OpenAI documentation says the following about the System Prompt:

Typically, a conversation is formatted with a system message first, followed by alternating user and assistant messages. The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. However note that the system message is optional and the model’s behavior without a system message is likely to be similar to using a generic message such as "You are a helpful assistant."

View the collection of leaked system prompts from popular AI-powered products on jujumilk3’s GitHub page to learn more about how to structure your own prompt.

You would specify the System Prompt as follows when using Preternatural:

let systemPrompt: PromptLiteral = "You are a highly knowledgeable and engaging assistant specializing in movie trivia. Your role is to provide accurate, concise, and interesting answers to questions about movies. This includes information about actors, directors, movie plots, release dates, box office statistics, awards, and any other film-related topics. Ensure that your responses are well-researched, informative, and enjoyable to read. Use a friendly and approachable tone, and feel free to add interesting facts or anecdotes where relevant to enhance the user's experience."
 
let systemPromptMessage = AbstractLLM.ChatMessage(
    role: .system,
    body: systemPrompt)

The User Prompt

Now that you’ve specified the general function of the AI Assistant in the System Prompt, you can use the User Prompt to ask the Assistant a specific question.

let userPrompt: PromptLiteral = "In the movie 'The Matrix' (1999), what color pill does Neo take to learn the truth about the Matrix?"
  
let userPromptMessage = AbstractLLM.ChatMessage(
    role: .user,
    body: userPrompt)

Parameters (Optional)

Parameters include Token Limit, Temperature or Top Probability Mass (Top-P), and Stop Sequences. While these are all optional, it is worth experimenting with them as you figure out the exact settings your app needs to get more accurate responses.

Token Limit

The token limit is the maximum number of tokens to generate, shared between the prompt and the completion. The exact limit varies by model. As most providers charge on a per-token basis, you may want to set a token limit to manage the cost per response.

// if not set, the default is the model's maximum token allowance
let maxTokenLimit: AbstractLLM.TokenLimit = .max
 
// set to a specific limit
let limit = 200
let fixedTokenLimit: AbstractLLM.TokenLimit = .fixed(limit)
 
// set the token limit in the Parameters object
let parameters = AbstractLLM.ChatCompletionParameters(
            tokenLimit: fixedTokenLimit,
            temperatureOrTopP: nil,
            stops: nil,
            functions: nil)

Temperature or Top Probability Mass (Top-P)

While our everyday use of ChatGPT and other AI-powered products may make it appear that large language models (LLMs) produce a single response to each input, the reality is different. LLMs can generate a variety of responses. They assign probabilities to different words (more specifically, “tokens”) and, depending on how they are configured, can favor either more conservative (high-probability) or less predictable (lower-probability) tokens in their responses.

For instance, a conservative approach is suitable if your app is designed for crafting professional business responses. On the other hand, if your app is intended for creative writing like poems, less predictable responses can foster creativity in the model.

Setting the Temperature or Top-P controls the randomness of the result.

Temperature

Temperature is typically a number between 0 and 2 (although some models only accept values up to 1.0). Lowering the temperature (closer to 0) results in less random completions; as the temperature approaches zero, the model becomes deterministic and repetitive. Higher values like 1.2 make the output more random.

let conservativeTemperature: AbstractLLM.TemperatureOrTopP = .temperature(0.8)
let creativeTemperature: AbstractLLM.TemperatureOrTopP = .temperature(1.2)
 
// set the temperature in the parameters
let parameters = AbstractLLM.ChatCompletionParameters(
            tokenLimit: nil,
            temperatureOrTopP: creativeTemperature,
            stops: nil,
            functions: nil)
 

Top Probability Mass (Top-P)

Top Probability Mass (Top-P) is a number between 0 and 1, with 1 as the default. It is an alternative to sampling with temperature, called nucleus sampling, in which the model considers only the tokens comprising the top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

let topPForMostLikelyResults: AbstractLLM.TemperatureOrTopP = .topProbabilityMass(0.1)
 
// set the Top P in the parameters
let parameters = AbstractLLM.ChatCompletionParameters(
            tokenLimit: nil,
            temperatureOrTopP: topPForMostLikelyResults,
            stops: nil,
            functions: nil)

Stop-Sequences

Stop-sequences in large language models (LLMs) are used to indicate to the model when to stop generating further text. These are particularly useful in scenarios where you want to limit the response to a specific length or until a certain character, word, or phrase appears.

For example, you may want the model to stop after generating a specific number of list items, or to stop at the end of a sentence (e.g., at a period). Stop sequences ensure that the generated text is relevant, concise, and meets the specific requirements of your application.

Check how many stop-sequences your LLM API provider allows. OpenAI, for example, only accepts up to four sequences. Note that the returned text will not contain the stop sequence.

let stopSequences = ["End of Chapter"]
 
// set the Stop Sequences in the parameters
let parameters = AbstractLLM.ChatCompletionParameters(
            tokenLimit: nil,
            temperatureOrTopP: nil,
            stops: stopSequences,
            functions: nil)
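
For instance (an illustrative prompt, not from the Preternatural docs), you might instruct the model to emit the marker itself so that generation halts there:

// Illustrative: the prompt asks the model to emit the marker, and the
// stop sequence halts generation there (the marker is not returned).
let storyPrompt: PromptLiteral = "Write the first chapter of a short mystery story. End it with the line 'End of Chapter'."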

Completion

Here’s how to make a full completion call with Preternatural:

import OpenAI
 
let client: OpenAI.Client = OpenAI.Client(apiKey: "YOUR_API_KEY")
 
let systemPrompt: PromptLiteral = "You are a highly knowledgeable and engaging assistant specializing in movie trivia. Your role is to provide accurate, concise, and interesting answers to questions about movies. This includes information about actors, directors, movie plots, release dates, box office statistics, awards, and any other film-related topics. Ensure that your responses are well-researched, informative, and enjoyable to read. Use a friendly and approachable tone, and feel free to add interesting facts or anecdotes where relevant to enhance the user's experience."
 
let userPrompt: PromptLiteral  = "In the movie 'The Matrix' (1999), what color pill does Neo take to learn the truth about the Matrix?"
 
let messages: [AbstractLLM.ChatMessage] = [
        .system(systemPrompt),
        .user(userPrompt)
  ]
 
let parameters = AbstractLLM.ChatCompletionParameters(
    tokenLimit: .fixed(200),
    temperatureOrTopP: .temperature(1.0),
    stops: ["END OF CHAPTER"],
    functions: nil)
 
let model: OpenAI.Model = .gpt_4o
 
do {
    let result: String = try await client.complete(
        messages,
        parameters: parameters,
        model: model,
        as: .string)

    print(result)
} catch {
    print(error)
}
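
Because the same completion call works regardless of the client (see the LLMRequestHandling note above), swapping providers is, in a sketch, just a matter of changing the client and the model value:

// Sketch: the same call with the client abstracted behind LLMRequestHandling;
// only the model value is provider-specific.
let abstractClient: any LLMRequestHandling = Anthropic.Client(apiKey: "YOUR_API_KEY")

let claudeResult: String = try await abstractClient.complete(
    messages,
    parameters: parameters,
    model: Anthropic.Model.opus,
    as: .string)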

Using Preternatural to develop AI-powered applications simplifies the process of experimenting with various LLM API providers and models, allowing you to iterate and launch fast.
