# Structured Output (JSON)

LoRAX can enforce that responses consist only of valid JSON and adhere to a provided JSON schema.
## Background: Structured Generation
LoRAX enforces adherence to a schema through a process known as structured generation (also called constrained decoding). Unlike guess-and-check validation methods, structured generation manipulates the next token likelihoods (logits) to enforce adherence to a schema at the token level. During each forward pass of inference, LLMs produce a probability distribution over their vocabulary of tokens. The token that is actually generated is selected by sampling from this distribution.
Suppose you've tasked an LLM with generating some valid JSON, and so far the LLM has produced the text `{ "name"`. When considering the next token to output, it's clear that tokens like `A` or `<` will not result in valid JSON. Structured generation prevents the LLM from selecting an invalid token by modifying the probability distribution and setting the likelihood of invalid tokens to `-infinity`. In this way, we can guarantee that, at each step, only tokens that will produce valid JSON can be selected.
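The masking step described above can be sketched in a few lines. The vocabulary, logits, and allowed set below are made up purely for illustration; a real model's vocabulary has tens of thousands of tokens:

```python
import math

# Toy vocabulary and (made-up) logits from one forward pass.
logits = {'"': 2.0, "A": 3.5, "<": 1.0, ":": 0.5, "}": 0.1}

# Suppose the schema says only '"' or '}' may come next.
allowed = {'"', "}"}

# Set the likelihood of every invalid token to -infinity.
masked = {tok: (lg if tok in allowed else -math.inf) for tok, lg in logits.items()}

# Softmax over the masked logits: invalid tokens get probability exactly 0,
# so sampling can never select them.
total = sum(math.exp(lg) for lg in masked.values())
probs = {tok: math.exp(lg) / total for tok, lg in masked.items()}

print(probs["A"])                 # 0.0 -- an invalid token can never be sampled
print(max(probs, key=probs.get))  # the most likely *valid* token
```

Note that `A` had the highest raw logit, but after masking it can never be sampled; the distribution is renormalized over the valid tokens only.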
## Caveats

- Structured generation does not guarantee the quality of generated text, only its form. Structured generation may force the LLM to output valid JSON, but it can't ensure that the content of the JSON is desirable or accurate.
- Even with structured generation enabled, LLM output may not be fully valid JSON if `max_new_tokens` is too low, as this could result in necessary tokens (e.g., a closing `}`) being cut off.
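The second caveat is easy to reproduce with the standard library alone. The string below is a hypothetical generation that ran out of tokens before the closing brace:

```python
import json

# Hypothetical output cut off by a too-small max_new_tokens:
# the closing brace was never generated.
truncated = '{"name": "Aldric", "age": 32, "armor": "plate", "strength": 9'

try:
    json.loads(truncated)
except json.JSONDecodeError as err:
    print(f"Invalid JSON: {err.msg}")
```

In practice, set `max_new_tokens` generously enough for the largest object your schema can describe, and treat a `JSONDecodeError` as a signal to retry or raise the limit.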
## Structured Generation with Outlines

Outlines is an open-source library supporting various ways of specifying and enforcing structured generation rules on LLM outputs.

LoRAX uses Outlines to support structured generation following a user-provided JSON schema. The JSON schema is converted into a regular expression, and then into a finite-state machine (FSM). At each decoding step, LoRAX uses this FSM to determine the set of valid next tokens and sets the likelihood of invalid tokens to `-infinity`.
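As a toy illustration (not Outlines' actual implementation, which operates on model tokens rather than single characters), here is a hand-rolled FSM that recognizes the strings `plate` or `leather` — like the `Armor` enum in the examples below — and reports which characters are valid from each state:

```python
# Character-level FSM for the language {"plate", "leather"}.
# States are integers; state 9 is the accepting state.
transitions = {
    0: {"p": 1, "l": 5},
    1: {"l": 2}, 2: {"a": 3}, 3: {"t": 4}, 4: {"e": 9},
    5: {"e": 6}, 6: {"a": 7}, 7: {"t": 8}, 8: {"h": 10},
    10: {"e": 11}, 11: {"r": 9},
    9: {},  # accepting: nothing more may follow
}

def valid_next(state: int) -> set[str]:
    """The set of characters the FSM allows from this state."""
    return set(transitions[state])

print(valid_next(0))  # {'p', 'l'} -- only these can start a match

state = 0
for ch in "plate":
    assert ch in valid_next(state)
    state = transitions[state][ch]
print(state)  # 9 -- accepting state
```

At decoding time, every token that would step this machine into a dead end gets its logit set to `-infinity`, so generation can only follow paths through the FSM.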
## Example: Python client
This example follows the JSON-structured generation example in the Outlines quickstart.
We assume that you have already deployed LoRAX using a suitable base model and installed the LoRAX Python Client. Alternatively, see below for an example of structured generation using an OpenAI client.
```python
import json
from enum import Enum

from lorax import Client
from pydantic import BaseModel, constr


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    strength: int


client = Client("http://127.0.0.1:8080")

prompt = "Generate a new character for my awesome game: name, age (between 1 and 99), armor and strength. "

response = client.generate(prompt, response_format={
    "type": "json_object",
    "schema": Character.model_json_schema(),
})

my_character = json.loads(response.generated_text)
print(my_character)
```
You can also specify the JSON schema directly rather than using Pydantic:
```python
schema = {
    "$defs": {
        "Armor": {
            "enum": ["leather", "chainmail", "plate"],
            "title": "Armor",
            "type": "string"
        }
    },
    "properties": {
        "name": {"maxLength": 10, "title": "Name", "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "armor": {"$ref": "#/$defs/Armor"},
        "strength": {"title": "Strength", "type": "integer"}
    },
    "required": ["name", "age", "armor", "strength"],
    "title": "Character",
    "type": "object"
}
```
## Example: OpenAI-compatible API

Structured generation of JSON following a schema is supported via the `response_format` parameter.
> **Note:** Currently a schema is required. This differs from the existing OpenAI JSON mode, in which no schema is supported.
```python
import json
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel, constr


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    strength: int


client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:8080/v1",
)

resp = client.chat.completions.create(
    model="",  # optional: specify an adapter ID here
    messages=[
        {
            "role": "user",
            "content": "Generate a new character for my awesome game: name, age (between 1 and 99), armor and strength. ",
        },
    ],
    max_tokens=100,
    response_format={
        "type": "json_object",
        "schema": Character.model_json_schema(),
    },
)

my_character = json.loads(resp.choices[0].message.content)
print(my_character)
```