It wasn’t that long ago (end of May 2024, at Build) that GPT-4o was released. In the era of AI everything evolves fast, and our applications can already use GPT-4o from Azure OpenAI Service. And that’s not all: GPT-4o mini was announced for testing in the AI Playground at the end of July, and now, just a few weeks later, you can already deploy the GPT-4o mini base model for your own use. This means you can use GPT-4o mini through its API in your own application. The regions where this is available are limited today (East US and Sweden Central for standard & global standard deployments), but you can expect the list to grow quite soon.
You can also test (early access preview) the latest version of GPT-4o (2024-08-06) in the AI Studio Playground. What’s new in this release is that GPT-4o is smarter (enhanced ability to support complex structured outputs) and the maximum output token count has been increased from 4k to 16k. When testing the model in the early access Playground, keep in mind that it is currently limited to 10 requests per minute and you don’t have API access to it yet. For API access, deploy the 2024-05-13 model version of GPT-4o.
If you want to try it out, go to the Playground with this link.
Why is GPT-4o mini a big thing?
Basically, it is the model you should start using instead of GPT-3.5 Turbo. GPT-4o mini is smarter, faster, and cheaper, and it has a larger context window (128k tokens), which is roughly 80,000 words in English. Look at the current pricing:
That is a quite impressive improvement in price. If you are still using plain GPT-4, I suggest you switch to GPT-4o or GPT-4o mini as soon as possible, provided the models meet your needs. As always, make sure all features & feature combinations you need are tested before flipping the new model into existing systems. If something doesn’t work yet with the 4o versions, consider GPT-4 Turbo. Compared to GPT-4 Turbo, GPT-4o brings big improvements in multilingual capabilities.
I also want to highlight two features from Microsoft’s announcement.
Enhanced Vision Input: Leverage the power of GPT-4o mini to process images and videos, enabling applications such as visual recognition, scene understanding, and multimedia content analysis.
Comprehensive Text Output: Generate detailed and contextually accurate text outputs from visual inputs, making it easier to create reports, summaries, and detailed analyses.
The o in GPT-4o stands for omni, which means these models are multimodal and understand both text and images as input. There isn’t support for video input yet, and they don’t generate images or videos. But I want to emphasize that they don’t do that yet. We have already seen demos of those capabilities in action (at Build 2024), but they aren’t publicly available. Yet.🤞
On top of all this, GPT-4o mini is in public preview for continuous fine-tuning, so it is possible to create your own specialized version of the model.
I tested switching from GPT-4o to GPT-4o mini while using a few features, and it had no issues. So if you have already updated to GPT-4o, the step to GPT-4o mini should be straightforward.
What did I test with GPT-4o and GPT-4o mini? Tools (functions) and vision. What is cool about the vision capability is that (just like GPT-4 Turbo with Vision) these models don’t require Azure Vision services; it is all built into the model itself.
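To give an idea of what a tools (function calling) request looks like, here is a minimal sketch using the openai Python SDK. The endpoint, API key, deployment name and the get_order_status function are placeholders made up for illustration, not something from my actual test.

# Minimal tool-calling sketch against Azure OpenAI with the openai Python SDK.
# The endpoint, API key, deployment name and the example tool are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://youraoaiservice.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",  # hypothetical tool for illustration
            "description": "Look up the delivery status of an order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Order identifier"}
                },
                "required": ["order_id"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # your deployment name in Azure OpenAI
    messages=[{"role": "user", "content": "Where is order 12345?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decides to use the tool, the request arrives as tool_calls.
print(response.choices[0].message.tool_calls)

If the model asks for the tool, you run the function yourself and send the result back in a follow-up message, just as with earlier GPT models.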
The latest GA API version is 2024-06-01 at the moment, and 2024-07-01-preview is also available. The call URI is the same as before, for example https://youraoaiservice.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-06-01
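As a rough sketch of calling that endpoint directly over HTTP (the resource name, key and prompt are placeholders; the vision message shown next goes into the same request body):

# Rough sketch of a raw HTTP call to the deployment URI above.
# The resource name, API key and prompt are placeholders.
import requests

url = (
    "https://youraoaiservice.openai.azure.com/openai/deployments/"
    "gpt-4o-mini/chat/completions?api-version=2024-06-01"
)
headers = {"api-key": "<your-api-key>", "Content-Type": "application/json"}
body = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "max_tokens": 100,
}

response = requests.post(url, headers=headers, json=body, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])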
Using gpt-4o-mini with vision was easy to test with a quick sample:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this picture:"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png",
            "detail": "high"
          }
        }
      ]
    }
  ],
  "max_tokens": 1000,
  "stream": false
}
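For comparison, the same vision request through the openai Python SDK would look roughly like this; the endpoint and key are placeholders:

# Sketch: the same image-URL request via the openai Python SDK (AzureOpenAI).
# The endpoint and API key are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://youraoaiservice.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this picture:"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png",
                        "detail": "high",
                    },
                },
            ],
        },
    ],
    max_tokens=1000,
)

print(response.choices[0].message.content)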
Then I did a somewhat more complex test with both GPT-4o and GPT-4o mini, embedding the image into the call with base64 encoding and asking for a specific JSON output. The prompt I used was: "Inspect received product and find out what it is (product name with brand and type), condition (is it damaged, missing parts, approved) and description that contains information what is damaged, missing, noteworthy and also information what might have happened during delivery. Create a JSON output that includes ProductName, Quantity, Condition and Description"
"type": "image_url",
"image_url": { "url": "data:image/png;base64,<base64encodedimage>" }
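Building that data URL is a one-liner with base64; a small sketch, where the file name is a placeholder:

# Sketch: encode a local image into a base64 data URL for the image_url field.
# The file name is a placeholder.
import base64

with open("received-product.png", "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("utf-8")

image_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{encoded}"},
}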
GPT-4o mini result:
"message": {
  "content": "```json\n{\n  \"ProductName\": \"Xbox Series X 1 TB SSD\",\n  \"Quantity\": 1,\n  \"Condition\": \"Approved\",\n  \"Description\": \"The product is in good condition with no visible damage. All parts appear to be included. The box shows minimal signs of wear, suggesting it was handled carefully during delivery. There are no noteworthy issues.\"\n}\n```",
  "role": "assistant"
}
GPT-4o result:
"content": "```json\n{\n  \"ProductName\": \"Xbox Series X\",\n  \"Brand\": \"Microsoft\",\n  \"Type\": \"Gaming Console\",\n  \"Quantity\": 1,\n  \"Condition\": \"Approved\",\n  \"Description\": \"The product is an Xbox Series X gaming console by Microsoft. The packaging appears to be in good condition with no visible signs of damage or tampering. There are no missing parts indicated from the packaging. The box is intact and there are no noticeable dents, tears, or other damage that would suggest mishandling during delivery.\"\n}\n```",
As can be seen, they do have slight differences, but as we know, the results are rarely identical. GPT-4o added more properties than I originally requested, and it didn’t include the 1 TB SSD version information. Is that critical? It depends on your needs – I wouldn’t rely on the models to discover exact product names; instead, the result would be used to retrieve the product name from product lists. To help with that, the prompt could include more properties the models need to extract from the picture. GPT-4o also provided a longer description.
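Because the models wrap the JSON in a markdown code fence (as seen above), the response content needs a small clean-up step before it can be parsed. A minimal sketch:

# Sketch: strip a markdown code fence from the model output and parse the JSON.
import json

def parse_fenced_json(content: str) -> dict:
    text = content.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the opening fence line (``` or ```json) and the closing fence.
        if lines[-1].strip().startswith("```"):
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        text = "\n".join(lines)
    return json.loads(text)

sample = "```json\n{\"ProductName\": \"Xbox Series X\", \"Condition\": \"Approved\"}\n```"
report = parse_fenced_json(sample)
print(report["ProductName"], report["Condition"])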
I also tested GPT-4o mini with a picture of my (very poor) handwriting. It performed at the same level as GPT-4 Turbo with Vision did. There is one catch row in my “grocery list” handwriting picture. The prompt I used was really simple: describe and summarize this image, please.
What the last line says is gardening equipment. Just like GPT-4 Turbo with Vision, GPT-4o mini interpreted that row as gambling equipment. Occasionally the models get this right, but overall they return an incorrect result for that row quite often.
When testing this with GPT-4o, it immediately returned the right result for all rows, correctly understanding it as gardening equipment. I ran the test four times, and it produced the right interpretation each time. Now, that makes the full GPT-4o model the winner! If there is a need for accurate image understanding that should cope with less-than-ideal images, I would choose the full GPT-4o.
I also tried GPT-4o image understanding with a Finnish handwritten list that has even worse handwriting than the English note. It did cause issues for the model, so if the plan is to use this to analyze handwritten feedback in languages other than English, test it very well with a lot of material.
But it was not bad for the mini model! Considering its price and speed, it is good to think about which model would be more useful in your scenarios.
Is GPT-4o or GPT-4o mini better for you?
There isn’t a clear answer for this one – it depends on your needs. If you need higher accuracy in image understanding and more “smartness” from the model, then GPT-4o will probably be the better choice. When analyzing longer texts, drawing conclusions and so forth, GPT-4o (as the big brother) should provide you with better responses. If you need faster responses and expect higher volumes, then start testing with GPT-4o mini.
I would try both models in various cases to see if GPT-4o mini is smart enough. This is due to speed and price – and you can also consider that it uses less energy, as it is smaller (and thus more efficient) than GPT-4o. Switching between models can be as easy as changing the URL and the key, if you have both models deployed.