Daiyi Yang
June 3, 2024
•
4 minutes
Introducing built-in Chain-of-Thought (CoT) support in empower functions models, the empower functions models could output their thought process along with the response.
We’re excited to share that we have launched an update to the family of Empower functions models, now featuring built-in Chain-of-Thought (CoT) support, the empower functions models could output their thought process along with the response. This can be easily toggled with a parameter in the request. This feature is available on both the Empower platform endpoints (doc) and our open-source model family (doc). Also we have the CoT enabled in our live demo so the assistant will display its thought process when the “thinking mode” is enabled.
Chain-of-Thought (CoT) is a prompting technique that enhances complex reasoning in AI models by breaking down the reasoning process into intermediate steps. This method allows models to handle tasks that require multi-step thinking by explicitly generating and following a thought process before arriving at a final response. By doing so, CoT improves the accuracy and transparency of the model's outputs, because of the nature of causal inference of LLMs.
In function-calling use cases, CoT is typically utilized to analyze the intent of the user input to determine whether it’s appropriate to trigger functions or continue the conversation as usual. If it’s suitable to trigger functions, the model identifies the most appropriate function(s) to invoke. It checks if any required parameters are missing and cannot be inferred from the conversation context. Based on this analysis, the model triggers the functions or asks the user for follow-up information.
Below is a quick example of prompt used for the model to do CoT for function calling and a sample model response on the thought process:
Prompt:
To respond to the user's request, use relevant tools if available. Follow these steps:
Thinking response:
The user asked for the weather in San Francisco. The relevant tool to use is "get_current_weather," which requires the "location" parameter. The user provided the location directly as "San Francisco," so all required parameters are present, leading to the tool call with the argument "location" set to "San Francisco."
While it’s typical to implement CoT at the prompt level, this approach has two main drawbacks:
To address these drawbacks, we decided to enable CoT at the model level. Empower functions models have been trained with built-in CoT capability that can be enabled with a special prompt (less than 10 tokens in the internal system prompt). When CoT is enabled, Empower functions models will respond with their thought process within tags before the actual response (which will be a set of function calls or regular conversations). This approach provides the model with a full “thought process” before deciding whether to trigger any functions and which function(s) to trigger. We have fully supported streaming with CoT. Additionally, the model can function without CoT if the special prompt is not added.
Because this is model level feature, all you need to do is just add an “include_thinking” parameter when hitting the chat-completion API as the curl example below:
curl -XPOST 'https://app.empower.dev/api/v1/chat/completions' \-H 'Authorization: Bearer API_KEY' \-H 'Content-Type: application/json' \-d '
{
"model": "empower-functions-medium",
"include_thinking": true,
"messages": [
{
"role": "user",
"content": "How'\''s the weather in Paris and Tokyo?"
}
],
"temperature": 0.0,
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g., San Francisco, CA"
}
},
"required": [
"location"
]
}
}
}
]
}'
Output:
{
"id": "",
"created": 1717306979,
"model": "empower-functions-medium",
"choices": [
{
"message": {
"role": "assistant",
"content": "<thinking>The user asked for the current weather in Paris and Tokyo. The relevant tool to use is \"get_current_weather,\" which requires the \"location\" parameter. The user provided both locations directly, so the tool calls were made with \"Paris\" and \"Tokyo\" as the location values.</thinking>",
"tool_calls": [
{
"id": "",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Paris\"}"
}
},
{
"id": "",
"index": 1,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Tokyo\"}"
}
}
]
},
"index": 0,
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 189,
"completion_tokens": 149,
"total_tokens": 338
},
"object": "chat.completion"
}
Full code examples can be found in this doc. And please refer to our github document for using this feature in our open-source model family.
Deploy and serve your first fine-tuned LLM in 1 minute for free!