The ChatGPT and GPT-4 models are language models that are optimized for conversational interfaces. The models behave differently than the older GPT-3 models. Previous models were text-in and text-out, meaning they accepted a prompt string and returned a completion to append to the prompt. The ChatGPT and GPT-4 models, however, are conversation-in and message-out. The models expect input formatted in a specific chat-like transcript format, and return a completion that represents a model-written message in the chat. While this format was designed specifically for multi-turn conversations, you'll find it can also work well for non-chat scenarios.
In Azure OpenAI, there are two different options for interacting with these types of models:
- Chat completion API.
- Completion API with Chat Markup Language (ChatML).
The Chat Completion API is a new dedicated API for interacting with the ChatGPT and GPT-4 models. This API is the preferred method for accessing these models. It is also the only way to access the new GPT-4 models.
ChatML uses the same completion API that you use for other models like text-davinci-002, but it requires a unique token-based prompt format known as Chat Markup Language (ChatML). This provides lower-level access than the dedicated Chat Completion API, but also requires additional input validation, only supports ChatGPT models (gpt-35-turbo), and the underlying format is more likely to change over time.
This article walks you through getting started with the new ChatGPT and GPT-4 models. It's important to use the techniques described here to get the best results. If you attempt to interact with the models the same way you did with the older model series, the models will often be verbose and provide less useful responses.
Working with the ChatGPT and GPT-4 models
The following code snippet shows the most basic way to use the ChatGPT and GPT-4 models with the Chat Completion API. If this is your first time using these models programmatically, we recommend starting with our ChatGPT quickstart.
GPT-4 models are currently only available by request. Existing Azure OpenAI customers can request access by filling out this form.
```python
import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)

print(response)
print(response['choices'][0]['message']['content'])
```
Output
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The founders of Microsoft are Bill Gates and Paul Allen. They both founded the company in 1975.",
        "role": "assistant"
      }
    }
  ],
  "created": 1679014551,
  "id": "chatcmpl-6usfn2yyjkbmESe3G4jaQR6bsScO1",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 86,
    "prompt_tokens": 37,
    "total_tokens": 123
  }
}
```
Note
The following parameters aren't available with the new ChatGPT and GPT-4 models: logprobs, best_of, and echo. If you set any of these parameters, you'll get an error.
Every response includes a finish_reason. The possible values for finish_reason are:
- stop: The API returned complete model output.
- length: Incomplete model output due to the max_tokens parameter or token limit.
- content_filter: Omitted content due to a flag from our content filters.
- null: API response still in progress or incomplete.
Consider setting max_tokens to a slightly higher value than normal, such as 300 or 500. This ensures that the model doesn't stop generating text before it reaches the end of the message.
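As a sketch of how you might act on these values, the following hypothetical helper maps each finish_reason to a suggested next step. The helper name and action strings are illustrative only, not part of the OpenAI SDK:

```python
# Hypothetical helper: maps a choice's finish_reason to a suggested action.
# "stop" means the model completed its message; "length" means the output was
# cut off, so you may want to retry with a higher max_tokens (e.g. 300 or 500).
def suggest_action(finish_reason):
    actions = {
        "stop": "done",
        "length": "retry with a larger max_tokens",
        "content_filter": "revise the prompt; content was filtered",
        None: "response still in progress",
    }
    return actions.get(finish_reason, "unknown finish_reason")

# A response choice shaped like the API output shown above.
choice = {"finish_reason": "length", "message": {"content": "truncated..."}}
print(suggest_action(choice["finish_reason"]))  # retry with a larger max_tokens
```

In a real application you would read `finish_reason` from `response['choices'][0]['finish_reason']` and decide whether to resend the request.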
Model versioning
Note

gpt-35-turbo is equivalent to the gpt-3.5-turbo model from OpenAI.

Unlike previous GPT-3 and GPT-3.5 models, the gpt-35-turbo model, as well as the gpt-4 and gpt-4-32k models, will continue to be updated. When creating a deployment of these models, you'll also need to specify a model version.

Currently, only version 0301 is available for ChatGPT and 0314 for GPT-4 models. We'll continue to make updated versions available in the future. You can find model deprecation times on our models page.
Working with the chat completion API
OpenAI trained the ChatGPT and GPT-4 models to accept input formatted as a conversation. The messages parameter takes an array of dictionaries with a conversation organized by role.
The format of a basic chat completion is as follows:
```
{"role": "system", "content": "Provide some context and/or instructions to the model"},
{"role": "user", "content": "The user's messages go here"}
```
A conversation with a sample response followed by a question would look like this:
```
{"role": "system", "content": "Provide some context and/or instructions to the model."},
{"role": "user", "content": "Example question goes here."},
{"role": "assistant", "content": "Example answer goes here."},
{"role": "user", "content": "First question/message for the model to actually respond to."}
```
System role
The system role, also known as the system message, is included at the beginning of the array. This message provides the initial instructions to the model. You can provide various information in the system role, including:
- A brief description of the assistant
- Personality traits of the assistant
- Instructions or rules you would like the assistant to follow
- Data or information needed for the model, such as relevant questions from an FAQ
You can customize the system role for your use case or just include basic instructions. The system role/message is optional, but it's recommended to at least include a basic one to get the best results.
Messages
After the system role, you can include a series of messages between the user and the assistant.
```
{"role": "user", "content": "What is thermodynamics?"}
```
To trigger a response from the model, you should end with a user message indicating that it's the assistant's turn to respond. You can also include a series of example messages between the user and the assistant as a way to do few-shot learning.
Message prompt examples
The following section shows examples of different prompt styles that you can use with the ChatGPT and GPT-4 models. These examples are just a starting point, and you can experiment with different prompts to tailor the behavior to your own use cases.
Basic example
If you want the ChatGPT model to behave similarly to chat.openai.com, you can use a basic system message like "Assistant is a large language model trained by OpenAI."
```
{"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
{"role": "user", "content": "Who were the founders of Microsoft?"}
```
Example with instructions
For some scenarios, you may want to give additional instructions to the model to define guardrails for what the model is able to do.
```
{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions. Instructions: - Only answer questions related to taxes. - If you're unsure of an answer, you can say \"I don't know\" or \"I'm not sure\" and recommend users go to the IRS website for more information."},
{"role": "user", "content": "When are my taxes due?"}
```
Using data for grounding
You can also include relevant data or information in the system message to give the model extra context for the conversation. If you only need to include a small amount of information, you can hard code it in the system message. If you have a large amount of data that the model needs to be aware of, you can use embeddings or a product like Azure Cognitive Search to retrieve the most relevant information at query time.
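To make the retrieval step concrete, here is a deliberately naive sketch that picks the most relevant FAQ entry by keyword overlap and inlines it into the system message. A real system would use embeddings or Azure Cognitive Search instead; the function name, FAQ contents, and scoring here are purely illustrative:

```python
# Illustrative grounding sketch (NOT a real retrieval system): score each FAQ
# document by word overlap with the question and inline the best match into
# the system message. Swap this for embeddings / Azure Cognitive Search in
# production.
faq = [
    "Azure OpenAI Service provides REST API access to OpenAI's language models.",
    "Azure OpenAI co-develops the APIs with OpenAI.",
]

def build_system_message(question, documents, top_n=1):
    # Rank documents by the number of words they share with the question.
    scored = sorted(
        documents,
        key=lambda doc: len(set(question.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    context = "\n- ".join(scored[:top_n])
    return {"role": "system", "content": f"Answer using only this context:\n- {context}"}

msg = build_system_message("What is Azure OpenAI Service?", faq)
print(msg["content"])
```

The resulting dictionary is then placed at the start of the messages array, exactly like the hard-coded system messages shown in this section.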
```
{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say \"I don't know\". Context: - Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series. - Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other. - At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases and incorporating Microsoft's principles for responsible AI use."},
{"role": "user", "content": "What is Azure OpenAI Service?"}
```
Few-shot learning with chat completion
You can also give the model a few-shot examples. The approach for few-shot learning has changed slightly because of the new prompt format. You can now include a series of messages between the user and the assistant in the prompt as few-shot examples. These examples can be used to seed answers to common questions to prime the model or teach particular behaviors to the model.
This is only one example of how you can use few-shot learning with ChatGPT and GPT-4. You can experiment with different approaches to see what works best for your use case.
```
{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions."},
{"role": "user", "content": "When do I need to file my taxes by?"},
{"role": "assistant", "content": "In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to-file."},
{"role": "user", "content": "How can I check the status of my tax refund?"},
{"role": "assistant", "content": "You can check the status of your tax refund by visiting https://www.irs.gov/refunds"}
```
Using chat completion for non-chat scenarios
The Chat Completion API is designed to work with multi-turn conversations, but it works well for non-chat scenarios as well.
For example, you can use the following messages for an entity extraction scenario:
```
{"role": "system", "content": "You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a JSON object. Here's an example of your output format: { \"name\": \"\", \"company\": \"\", \"phone_number\": \"\" }"},
{"role": "user", "content": "Hello. My name is Robert Smith. I'm calling from Contoso Insurance, Delaware. My colleague mentioned that you are interested in learning about our comprehensive benefits policy. Could you give me a call back at (555) 346-9322 when you get a chance so we can go over the benefits?"}
```
Creating a basic conversation loop
The examples so far have shown you the basic mechanics of interacting with the Chat Completion API. This example shows you how to create a conversational loop that does the following:
- Continuously takes console input and properly formats it as part of the messages array as user role content.
- Outputs responses that are printed to the console and formatted and added to the messages array as assistant role content.
This means that every time a new question is asked, a running transcript of the conversation so far is sent along with the last question. Since the model has no memory, you must submit an updated transcript with each new question or the model will lose the context of previous questions and answers.
```python
import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

conversation = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input()
    conversation.append({"role": "user", "content": user_input})

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
        messages=conversation
    )

    conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
    print("\n" + response['choices'][0]['message']['content'] + "\n")
```
When you run the above code, you get a blank console window. Enter your first question in the window and then press enter. Once the answer is returned, you can repeat the process and keep asking questions.
Conversation management
The previous example runs until you hit the model's token limit. With each question asked, and answer received, the messages array grows in size. The token limit for gpt-35-turbo is 4096 tokens, whereas the token limits for gpt-4 and gpt-4-32k are 8192 and 32768 respectively. These limits include the token count from both the messages array sent and the model response. The number of tokens in the messages array combined with the value of the max_tokens parameter must stay under these limits or you'll receive an error.
It's your responsibility to ensure the prompt and completion fall within the token limit. This means that for longer conversations, you need to keep track of the token count and only send the model a prompt that falls within the limit.
The following code sample shows a simple chat loop example with a technique for handling the 4096 token limit using OpenAI's tiktoken library.
The code requires tiktoken 0.3.0. If you have an older version, run pip install tiktoken --upgrade.
```python
import tiktoken
import openai
import os

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
token_limit = 4096
conversation = []
conversation.append(system_message)

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <|start|>assistant
    return num_tokens

while True:
    user_input = input("")
    conversation.append({"role": "user", "content": user_input})
    conv_history_tokens = num_tokens_from_messages(conversation)

    # Remove the oldest non-system messages until the prompt fits under the limit.
    while conv_history_tokens + max_response_tokens >= token_limit:
        del conversation[1]
        conv_history_tokens = num_tokens_from_messages(conversation)

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
        messages=conversation,
        temperature=.7,
        max_tokens=max_response_tokens,
    )

    conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
    print("\n" + response['choices'][0]['message']['content'] + "\n")
```
In this example, once the token count is reached, the oldest messages in the conversation transcript are removed. del is used instead of pop() for efficiency, and we start at index 1 so as to always preserve the system message and only remove user/assistant messages. Over time, this method of managing the conversation can cause the conversation quality to degrade, as the model will gradually lose the context of the earlier portions of the conversation.
An alternative approach is to limit the conversation duration to the max token length or a certain number of turns. Once the max token limit is reached, and the model would lose context if the conversation were allowed to continue, you can prompt the user that they need to begin a new conversation, and clear the messages array to start a brand new conversation with the full token limit available.
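The turn-based alternative can be sketched with a small helper that resets the conversation once a turn cap is hit. The helper name and the MAX_TURNS value are illustrative choices, not part of any SDK:

```python
MAX_TURNS = 10  # illustrative cap on user/assistant exchanges

def reset_if_too_long(conversation, system_message, max_turns=MAX_TURNS):
    """Return a fresh conversation (system message only) once the turn cap is hit."""
    # Each user/assistant exchange adds two entries after the system message.
    turns = (len(conversation) - 1) // 2
    if turns >= max_turns:
        # In a real app you would also notify the user that a new
        # conversation is starting before clearing the history.
        return [system_message]
    return conversation

system_message = {"role": "system", "content": "You are a helpful assistant."}
conversation = [system_message] + [
    {"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}
] * 10

conversation = reset_if_too_long(conversation, system_message)
print(len(conversation))  # only the system message remains
```

You would call such a helper at the top of the conversation loop, before appending the next user message.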
The token counting portion of the code shown previously is a simplified version of one of the OpenAI cookbook examples.
Next steps
- Learn more about Azure OpenAI.
- Get started with the ChatGPT model with the ChatGPT quickstart.
- For more examples, check out the Azure OpenAI Samples GitHub repository
Working with ChatGPT models
Important
Using the GPT-35-Turbo models with the completions endpoint remains in preview. Because of the potential for changes to the underlying ChatML syntax, we strongly recommend using the Chat Completion API/endpoint. The Chat Completion API is the recommended method of interacting with the ChatGPT (gpt-35-turbo) models. The Chat Completion API is also the only way to access the GPT-4 models.
The following code snippet shows the most basic way to use the ChatGPT models with ChatML. If this is your first time using these models programmatically, we recommend starting with our ChatGPT quickstart.
```python
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://{your-resource-name}.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
    engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT model.
    prompt="<|im_start|>system\nAssistant is a large language model trained by OpenAI.\n<|im_end|>\n<|im_start|>user\nWho were the founders of Microsoft?\n<|im_end|>\n<|im_start|>assistant\n",
    temperature=0,
    max_tokens=500,
    top_p=0.5,
    stop=["<|im_end|>"]
)

print(response['choices'][0]['text'])
```
Note
The following parameters aren't available with the gpt-35-turbo model: logprobs, best_of, and echo. If you set any of these parameters, you'll get an error.
The <|im_end|> token indicates the end of a message. We recommend including the <|im_end|> token as a stop sequence to ensure that the model stops generating text when it reaches the end of the message. You can read more about the special tokens in the Chat Markup Language (ChatML) section.
Consider setting max_tokens to a slightly higher value than normal, such as 300 or 500. This ensures that the model doesn't stop generating text before it reaches the end of the message.
Model versioning
Note

gpt-35-turbo is equivalent to the gpt-3.5-turbo model from OpenAI.

Unlike previous GPT-3 and GPT-3.5 models, the gpt-35-turbo model, as well as the gpt-4 and gpt-4-32k models, will continue to be updated. When creating a deployment of these models, you'll also need to specify a model version.

Currently, only version 0301 is available for ChatGPT. We'll continue to make updated versions available in the future. You can find model deprecation times on our models page.
Working with Chat Markup Language (ChatML)
Note
OpenAI continues to improve ChatGPT, and the chat markup used with the models will continue to evolve in the future. We will keep this document updated with the latest information.
OpenAI trained ChatGPT on special tokens that delineate the different parts of the prompt. The prompt starts with a system message that is used to prime the model, followed by a series of messages between the user and the assistant.
The format of a basic ChatML prompt is as follows:
```
<|im_start|>system
Provide some context and/or instructions to the model.
<|im_end|>
<|im_start|>user
The user's message goes here
<|im_end|>
<|im_start|>assistant
```
System message
The system message is included at the beginning of the prompt between the <|im_start|>system and <|im_end|> tokens. This message provides the initial instructions to the model. You can provide various information in the system message, including:
- A brief description of the assistant
- Personality traits of the assistant
- Instructions or rules you would like the assistant to follow
- Data or information needed for the model, such as relevant questions from an FAQ
You can customize the system message for your use case or just include a basic system message. The system message is optional, but it's recommended to at least include a basic one to get the best results.
Messages
After the system message, you can include a series of messages between the user and the assistant. Each message should begin with the <|im_start|> token followed by the role (user or assistant) and end with the <|im_end|> token.
```
<|im_start|>user
What is thermodynamics?
<|im_end|>
```
To trigger a response from the model, the prompt should end with the <|im_start|>assistant token, indicating that it's the assistant's turn to respond. You can also include messages between the user and the assistant in the prompt as a way to do few-shot learning.
Prompt examples
The following section shows examples of different prompt styles that you can use with the ChatGPT and GPT-4 models. These examples are just a starting point, and you can experiment with different prompts to tailor the behavior to your own use cases.
Basic example
If you want the ChatGPT and GPT-4 models to behave similarly to chat.openai.com, you can use a basic system message like "Assistant is a large language model trained by OpenAI."
```
<|im_start|>system
Assistant is a large language model trained by OpenAI.
<|im_end|>
<|im_start|>user
Who were the founders of Microsoft?
<|im_end|>
<|im_start|>assistant
```
Example with instructions
For some scenarios, you may want to give additional instructions to the model to define guardrails for what the model is able to do.
```
<|im_start|>system
Assistant is an intelligent chatbot designed to help users answer their tax related questions.
Instructions:
- Only answer questions related to taxes.
- If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information.
<|im_end|>
<|im_start|>user
When are my taxes due?
<|im_end|>
<|im_start|>assistant
```
Using data for grounding
You can also include relevant data or information in the system message to give the model extra context for the conversation. If you only need to include a small amount of information, you can hard code it in the system message. If you have a large amount of data that the model needs to be aware of, you can use embeddings or a product like Azure Cognitive Search to retrieve the most relevant information at query time.
```
<|im_start|>system
Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say "I don't know".

Context:
- Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series.
- Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
- At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases and incorporating Microsoft's principles for responsible AI use.
<|im_end|>
<|im_start|>user
What is Azure OpenAI Service?
<|im_end|>
<|im_start|>assistant
```
Few-shot learning with ChatML
You can also give the model a few-shot examples. The approach for few-shot learning has changed slightly because of the new prompt format. You can now include a series of messages between the user and the assistant in the prompt as few-shot examples. These examples can be used to seed answers to common questions to prime the model or teach particular behaviors to the model.
This is only one example of how you can use few-shot learning with ChatGPT. You can experiment with different approaches to see what works best for your use case.
```
<|im_start|>system
Assistant is an intelligent chatbot designed to help users answer their tax related questions.
<|im_end|>
<|im_start|>user
When do I need to file my taxes by?
<|im_end|>
<|im_start|>assistant
In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to-file
<|im_end|>
<|im_start|>user
How can I check the status of my tax refund?
<|im_end|>
<|im_start|>assistant
You can check the status of your tax refund by visiting https://www.irs.gov/refunds
<|im_end|>
```
Using chat markup for non-chat scenarios
ChatML is designed to make multi-turn conversations easier to manage, but it works well for non-chat scenarios as well.
For example, you can use the following prompt for an entity extraction scenario:
```
<|im_start|>system
You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a JSON object. Here's an example of your output format:
{
   "name": "",
   "company": "",
   "phone_number": ""
}
<|im_end|>
<|im_start|>user
Hello. My name is Robert Smith. I'm calling from Contoso Insurance, Delaware. My colleague mentioned that you are interested in learning about our comprehensive benefits policy. Could you give me a call back at (555) 346-9322 when you get a chance so we can go over the benefits?
<|im_end|>
<|im_start|>assistant
```
Preventing unsafe user inputs
It's important to add restrictions in your app to ensure safe use of chat markup.
We recommend that you prevent end users from being able to include special tokens in their input, such as <|im_start|> and <|im_end|>. We also recommend that you include additional validation to ensure the prompts you're sending to the model are well formed and follow the Chat Markup Language format as described in this document.
You can also provide instructions in the system message to guide the model on how to respond to certain types of user inputs. For example, you can instruct the model to only reply to messages about a certain subject. You can also reinforce this behavior with few-shot examples.
Conversation management
The token limit for gpt-35-turbo is 4096 tokens. This limit includes the token count from both the prompt and completion. The number of tokens in the prompt combined with the value of the max_tokens parameter must stay under 4096 or you'll receive an error.
It's your responsibility to ensure the prompt and completion fall within the token limit. This means that for longer conversations, you need to keep track of the token count and only send the model a prompt that falls within the token limit.
The following code sample shows a simple example of how you can keep track of separate messages in the conversation.
```python
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://{your-resource-name}.openai.azure.com/"  # This corresponds to your Azure OpenAI resource's endpoint value.
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")

# defining a function to create the prompt from the system message and the messages
def create_prompt(system_message, messages):
    prompt = system_message
    for message in messages:
        prompt += f"\n<|im_start|>{message['sender']}\n{message['text']}\n<|im_end|>"
    prompt += "\n<|im_start|>assistant\n"
    return prompt

# defining the user input and the system message
user_input = ""  # your user input goes here
system_message = f"<|im_start|>system\n{''}\n<|im_end|>"

# creating a list of messages to track the conversation
messages = [{"sender": "user", "text": user_input}]

response = openai.Completion.create(
    engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT model.
    prompt=create_prompt(system_message, messages),
    temperature=0.5,
    max_tokens=250,
    top_p=0.9,
    frequency_penalty=0,
    presence_penalty=0,
    stop=['<|im_end|>']
)

messages.append({"sender": "assistant", "text": response['choices'][0]['text']})
print(response['choices'][0]['text'])
```
Staying under the token limit
The easiest way to stay under the token limit is to delete the oldest messages in the conversation when you reach the token limit.
You can choose to always include as many tokens as possible while staying under the limit, or you can always include a set number of past messages as long as those messages stay within the limit. It is important to note that longer messages take longer to generate a response and incur a higher cost than shorter messages.
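The "set number of past messages" strategy can be sketched as a small helper that always keeps the system message plus the N most recent messages. The function name and the default of 6 are illustrative choices; you would still want to verify the trimmed prompt's token count before sending it:

```python
# Illustrative trimming strategy: keep the system message plus the N most
# recent user/assistant messages, regardless of token count.
def keep_recent_messages(conversation, n_recent=6):
    system = conversation[:1]               # always preserve the system message
    recent = conversation[1:][-n_recent:]   # last n user/assistant messages
    return system + recent

conversation = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    conversation.append({"role": "user", "content": f"question {i}"})
    conversation.append({"role": "assistant", "content": f"answer {i}"})

trimmed = keep_recent_messages(conversation)
print(len(trimmed))  # 1 system message + 6 recent messages = 7
```

Because individual messages vary in length, pair this with a token count estimate (such as the tiktoken approach below) to confirm the trimmed conversation actually fits under the limit.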
You can estimate the number of tokens in a string using the tiktoken Python library as shown below.
```python
import tiktoken

cl100k_base = tiktoken.get_encoding("cl100k_base")

enc = tiktoken.Encoding(
    name="gpt-35-turbo",
    pat_str=cl100k_base._pat_str,
    mergeable_ranks=cl100k_base._mergeable_ranks,
    special_tokens={
        **cl100k_base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265
    }
)

tokens = enc.encode(
    "<|im_start|>user\nHello<|im_end|><|im_start|>assistant",
    allowed_special={"<|im_start|>", "<|im_end|>"}
)

assert len(tokens) == 7
assert tokens == [100264, 882, 198, 9906, 100265, 100264, 78191]
```
Next steps
- Learn more about Azure OpenAI.
- Get started with the ChatGPT model with the ChatGPT quickstart.
- For more examples, check out the Azure OpenAI Samples GitHub repository