Using the direct AI REST API instead of an SDK
When working with large language models (LLMs) like ChatGPT and Gemini, developers often turn to Software Development Kits (SDKs) for streamlined integration. SDKs are convenient, but there are compelling reasons to use the REST API directly, especially when starting a new implementation such as an editor plugin or an SDK for a language that doesn't have one yet. This blog post explores the advantages of that approach.
ChatGPT and Gemini SDKs
SDKs simplify the process of interacting with LLMs. They provide pre-built functions and handle low-level details, making it easier to send prompts and receive responses. Let's look at examples for ChatGPT and Gemini:
ChatGPT SDK
The official OpenAI SDK for Node.js is available at https://github.com/openai/openai-node. Here's how to use it:
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env['OPENAI_API_KEY'] });

async function main() {
  const chatCompletion = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: 'gpt-3.5-turbo',
  });
  // The reply text lives in the first choice of the response.
  console.log(chatCompletion.choices[0].message.content);
}

main();
Gemini SDK
For Gemini, you can find the SDK at https://github.com/google-gemini/generative-ai-js. Here's a usage example:
const fs = require("fs");
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

async function main() {
  const prompt = "Does this look store-bought or homemade?";
  // Inline image data must be base64-encoded.
  const image = {
    inlineData: {
      data: Buffer.from(fs.readFileSync("cookie.png")).toString("base64"),
      mimeType: "image/png",
    },
  };
  const result = await model.generateContent([prompt, image]);
  console.log(result.response.text());
}

main();
SDKs abstract away the complexities of API requests and responses: you don't have to construct the request body and headers by hand or parse the raw response. That convenience comes at the cost of flexibility and control, which can be limiting in certain scenarios.
ChatGPT and Gemini REST APIs
Directly using the REST API gives you more control over the interaction with the LLMs. You send HTTP requests to specific endpoints and handle the responses. Here's how to do it for ChatGPT and Gemini:
ChatGPT REST API
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
Gemini REST API
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{
"parts":[{"text": "Give me python code to sort a list."}]
}]
}'
While the REST API offers more control, it requires careful attention to detail. An incorrectly formatted request results in an error, and you're responsible for parsing every response yourself, including error handling. The reply text also lives at different paths: choices[0].message.content in an OpenAI response versus candidates[0].content.parts[0].text in a Gemini response.
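To make that concrete, here is a minimal Node.js sketch of the same ChatGPT call without any SDK. It assumes Node 18+ (for the built-in fetch) and an OPENAI_API_KEY environment variable; the ask helper is just an illustrative name, not part of any library:

// Minimal sketch: calling the chat completions endpoint without an SDK.
// Assumes Node 18+ (built-in fetch) and OPENAI_API_KEY in the environment.
async function ask(prompt) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  if (!res.ok) {
    // Failures come back as JSON with an "error" object.
    const err = await res.json();
    throw new Error(`API error ${res.status}: ${err.error.message}`);
  }
  const data = await res.json();
  // The reply text lives in the first choice.
  return data.choices[0].message.content;
}

ask('Say this is a test!').then(console.log).catch(console.error);

Every piece of this, the URL, the headers, the body shape, and the response shape, is now your code's responsibility. That is exactly the trade-off: more work, but nothing hidden.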
Use the Direct REST API for New Implementations
When building a new tool or integration, especially for a niche use case or a new programming language, using the direct REST API can be advantageous. It allows for greater flexibility and control over the interaction with the LLM. You can tailor the requests and responses to your specific needs without being constrained by the limitations of an SDK.
Example: Neovim Plugin
In the code-ai.nvim plugin, the REST API is used directly to interact with ChatGPT and Gemini. This approach provides the flexibility needed to integrate LLM functionality seamlessly into the Neovim editor. The excerpts below show the request function for each provider:
local curl = require('plenary.curl')
-- ...
-- much more code here
-- ...
-- Gemini: POST the prompt to the generateContent endpoint.
function query.ask(instruction, prompt, opts, api_key)
  query.log("entered gemini query.ask")
  local api_host = 'https://generativelanguage.googleapis.com'
  local path = '/v1beta/models/gemini-1.5-pro-latest:generateContent'
  curl.post(api_host .. path,
    {
      headers = {
        ['Content-type'] = 'application/json',
        ['x-goog-api-key'] = api_key
      },
      body = vim.fn.json_encode(
        {
          system_instruction = {parts = {text = instruction}},
          contents = (function()
            local contents = {}
            table.insert(contents, {role = 'user', parts = {{text = prompt}}})
            return contents
          end)()
        }),
      -- Hand the response to the shared callback on Neovim's main loop.
      callback = function(res)
        vim.schedule(function() query.askCallback(res, opts) end)
      end
    })
end
-- ...
-- much more code here
-- ...
-- OpenAI: POST the conversation to the chat completions endpoint.
function query.ask(instruction, prompt, opts, api_key)
  local api_host = 'https://api.openai.com'
  local path = '/v1/chat/completions'
  curl.post(api_host .. path,
    {
      headers = {
        ['Content-type'] = 'application/json',
        ['Authorization'] = 'Bearer ' .. api_key
      },
      body = vim.fn.json_encode(
        {
          model = 'gpt-4-turbo',
          -- The instruction and the prompt become two chat messages.
          messages = (function()
            local messages = {}
            table.insert(messages, {role = 'system', content = instruction})
            table.insert(messages, {role = 'user', content = prompt})
            return messages
          end)()
        }
      ),
      -- Hand the response to the shared callback on Neovim's main loop.
      callback = function(res)
        vim.schedule(function() query.askCallback(res, opts) end)
      end
    })
end
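The query.askCallback referenced in both functions is defined elsewhere in the plugin. As a rough, hypothetical sketch (not the plugin's actual code), such a callback decodes the JSON body and pulls the reply text out of whichever shape came back; opts.handler here is an invented stand-in for however the caller wants to consume the text:

-- Hypothetical sketch of a response callback; the plugin's real
-- query.askCallback is more involved.
function query.askCallback(res, opts)
  local body = vim.fn.json_decode(res.body)
  local text
  if body.candidates then
    -- Gemini: candidates[1].content.parts[1].text (Lua tables are 1-indexed)
    text = body.candidates[1].content.parts[1].text
  elseif body.choices then
    -- OpenAI: choices[1].message.content
    text = body.choices[1].message.content
  else
    text = 'unexpected response: ' .. res.body
  end
  opts.handler(text) -- invented for this sketch: deliver the text to the caller
end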
Conclusion
While SDKs offer a convenient way to interact with LLMs like ChatGPT and Gemini, using the direct REST API provides greater flexibility and control, which is particularly beneficial when developing new integrations or targeting niche use cases. By understanding the trade-offs, developers can make informed decisions about the best approach for their specific projects.