DeepSeek R1 has been on everyone’s radar recently. Last night I heard that Microsoft had released it in Azure AI Foundry. Today I’ve been testing it: deploying it, trying some prompts, and noticing just how heavily it filters certain topics. This was not a surprise by any means. With the official announcement that DeepSeek R1 is now available at no cost on Azure AI Foundry (at least for the moment), it felt like the perfect opportunity to test it and see how it stacks up against another big player: OpenAI o1. Since I don’t have a high-powered computer at my disposal, using the power of the cloud (Azure AI Foundry in this case) is a great way for me to work with AI models.
- Deploying DeepSeek R1: smooth and simple
- Initial tests and the politically touchy Tiananmen Square question
- A brief comparison to OpenAI o1
- Attempting a co-authored blog post (this one)
- DeepSeek R1 excels at step-by-step reasoning
- Deeper insights from Microsoft’s official overview
- Why you want to consider DeepSeek R1
- Looking ahead
- Wrapping up
- How this article was done
First off, the deployment was surprisingly easy. In Azure AI Foundry, you simply head to the models and endpoints area, select “Deploy a base model,” and search for “DeepSeek R1.” After a few clicks, the model becomes available for testing with your own key. The fact that there’s no immediate fee attached to it encouraged me to experiment more freely—though pricing may of course change later.
Don’t confuse the Azure content filter with the guardrails built into the model. This filtering protects us both from prompt engineering that tries to make the model do something it is not supposed to do, and from model responses that may be offensive, and so on. Remember: this filtering is designed for business and enterprise use.
I had heard from others that DeepSeek R1 can be highly cautious with certain politically sensitive questions, so I started by asking: “what happened at tiananmen square?” It immediately refused to answer, returning: “I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.” No matter how I reworded the question, DeepSeek R1 wouldn’t budge. That was enough to confirm that it does, indeed, have stringent guardrails for some topics.
I had a chat with my friend and colleague Tatu Seppälä, who told me that if you run DeepSeek R1 locally on your own hardware, you can see its thinking process. That gave me the idea of simply telling the model to turn a -think parameter on, which might let me see how it processes the question behind the scenes. Sure enough, it did the trick. I saw lines like: “Okay, the user is asking about what happened at Tiananmen Square. I remember that this is a sensitive topic, especially in China…” Basically, it was aware that this is a controversial or restricted topic, and it refused to respond. While that’s interesting from a developer’s perspective, it also shows that the chain-of-thought can expose more text than you might want visible in a production setting.
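If you do surface R1’s raw output in an application, you’ll probably want to separate that reasoning from the final answer before showing anything to end users. Here is a minimal sketch in Python, assuming the model wraps its reasoning in <think>…</think> tags (the helper name is my own):

```python
import re

# The reasoning sits between <think> and </think>; everything after
# the closing tag is the user-facing answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from raw model output."""
    match = THINK_RE.search(raw)
    if not match:
        # No reasoning block found: treat the whole output as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

raw = "<think>Okay, the user is asking about X. I should be careful…</think>Here is my answer."
reasoning, answer = split_reasoning(raw)
# reasoning holds the hidden chain-of-thought; answer is what you display
```

Logging the reasoning while only displaying the answer gives you the developer-side insight without exposing the rawer text to users.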
As a second test, I continued the conversation and asked about Taiwan, another politically sensitive subject in some contexts. This time DeepSeek R1 offered a more balanced answer, acknowledging both perspectives on Taiwan’s status. Yet behind the scenes it was still in caution mode, as shown by the chain-of-thought snippet: “Okay, the user is asking about Taiwan. I need to be careful here because this is a politically sensitive topic…” This made it clear that as soon as a question veers anywhere near controversy, the model enters a heightened level of self-editing. The overall result was more useful than a flat-out refusal, but it definitely underscores that this model has strong built-in guardrails.
As I work with OpenAI’s models on Azure quite a lot, I decided to compare DeepSeek R1 to OpenAI o1, as both are conveniently available on Azure AI Foundry. The practical differences between these two models are around the context window and the potential output length. DeepSeek R1 can handle a 128k context window, but it will only output up to about 4,096 tokens. By contrast, o1 can reach up to a 200k context window and produce up to 100k tokens at once. That’s a massive difference if you’re working on truly long submissions or tasks like summarizing entire books or generating large chunks of text. But keep in mind that using o1 is not free.
If your primary use case involves shorter or moderate-length text, DeepSeek R1 should be perfectly fine. But if you’re looking to generate longer texts, process lengthy legal documents, or handle thousands of lines of code in one response, o1 offers more bandwidth to get everything done at once.
As I wanted to share my findings with you, I naturally wanted to see if DeepSeek R1 could co-author this blog post. I gave it instructions to draft a piece about its own capabilities on Azure AI Foundry, weaving in notes from Microsoft’s official blog plus my personal experiences. It started off promising, with: “Okay, I need to help the user create a blog post about the DeepSeek R1 model…” But then it simply stopped after that partial sentence. No follow-up prompt or retry managed to coax more text out of it. Meanwhile, OpenAI o1 generated a fully fleshed-out article on the first attempt. Add a prompt or two and you can get quite a good draft out of o1.
From a blogging or general writing-assistant standpoint, that kind of abrupt stop may be an issue with DeepSeek R1. OpenAI o1 wasn’t perfect either, but it is far better than other models at this.
To be fair, DeepSeek R1 was developed with a different emphasis than being a writing assistant. According to the Azure AI Foundry description: “DeepSeek-R1 excels at reasoning tasks using a step-by-step training process, such as language, scientific reasoning, and coding tasks.” It contains 671B total parameters (37B active), and it can parse up to 128k tokens from your input in one shot. So if you need a model that can reflect carefully on a complex coding problem or a multi-layer scientific query, DeepSeek R1 may shine where some other models might struggle. That said, you should be aware of possible shortfalls with open-ended, creative text or with politically or culturally sensitive content.
Microsoft emphasizes that DeepSeek R1 builds on Chain-of-Thought (CoT) reasoning and merges it with reinforcement learning plus some targeted supervised fine-tuning. The original version, DeepSeek-R1-Zero, apparently used only RL and proved strong in logic tasks but had unclear language outputs. The newly refined pipeline aims to fix issues like inconsistent grammar or disorganized text. To learn more, read Microsoft’s blog article about DeepSeek R1’s availability on Azure AI Foundry (and GitHub), and check the information shown when you deploy R1 in Azure AI Foundry.
Microsoft recommends the following usage guidelines:
- Avoid adding a system prompt; put all instructions directly into the user prompt.
- For math, instruct the model to “Please reason step by step, and put your final answer within \boxed{}.”
- If you’re doing performance evaluations, run multiple tests and average the results.
- Pay attention to chain-of-thought content (the <think> tags) if you’re showing it to end users, as it might be more raw or contain “more harmful” text.
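The first two guidelines are easy to encode in your own code. Here is a minimal sketch, assuming the chat-style messages format that Azure AI Foundry inference endpoints accept (the helper and constant names are my own, not from Microsoft’s documentation):

```python
# Hypothetical helper reflecting the guidance above: no system prompt,
# everything in the user message, plus the recommended math phrasing.
MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."

def build_messages(question: str, math: bool = False) -> list[dict]:
    content = f"{question}\n{MATH_INSTRUCTION}" if math else question
    # A single user message; deliberately no {"role": "system"} entry.
    return [{"role": "user", "content": content}]

messages = build_messages("What is 17 * 24?", math=True)
```

You would pass the resulting list as the messages payload when calling your deployed R1 endpoint, the same way you would with other chat models on Azure AI Foundry.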
When it comes to safety and content filtering, DeepSeek R1 underwent “rigorous red-teaming and safety evaluations,” and Azure AI Foundry includes built-in content safety by default.
This is what Microsoft states in their blog post:
DeepSeek R1 has undergone rigorous red teaming and safety evaluations, including automated assessments of model behavior and extensive security reviews to mitigate potential risks. With Azure AI Content Safety, built-in content filtering is available by default, with opt-out options for flexibility. Additionally, the Safety Evaluation System allows customers to efficiently test their applications before deployment. These safeguards help Azure AI Foundry provide a secure, compliant, and responsible environment for enterprises to confidently deploy AI solutions.
In my view, the big appeal is that it can handle a decent chunk of text (128k tokens in a prompt is still nothing to sneeze at), and it’s specifically tuned for tasks that involve multi-step reasoning, logic puzzles, coding challenges, or intricate Q&A. Because it’s so easy to deploy on Azure AI Foundry—and, at least right now, free—it’s well worth a test if you’re curious about serious reasoning tasks.
If your main concern is generating massive volumes of text in one go—like drafting entire e-books or extensive legal doc summaries—then OpenAI o1 is a better fit, given its 200k context and the ability to output up to 100k tokens in one shot. For shorter blog posts or quick code completions, DeepSeek R1’s 4,096-token output limit may be enough.
Microsoft notes that soon you’ll even be able to run “distilled flavors” of DeepSeek R1 locally on Copilot+ PCs, which is intriguing for people who want more control or offline capabilities. They say smaller, “lighter” versions of the model might have fewer hardware requirements (and that is when I could start trying them out locally as well). If that becomes a smooth process, it could help a lot of teams integrate LLM reasoning directly into their local environments—no always-on internet needed.
Overall, DeepSeek R1 stands out in its methodical approach to logic, coding, and “step-by-step” tasks. Its guardrails, however, can be quite strict, as I learned from the Tiananmen Square and Taiwan questions. Keep in mind: those are extreme examples that I knew would hit the wall. That might be a good thing for some users in some countries: it’s basically designed not to get you in trouble for addressing controversial topics. But if you are European, like me, and want a more open conversation or creative brainstorming with fewer refusals, you might find it limiting.
In my own usage, DeepSeek R1 couldn’t quite finish drafting this blog post (it started but then stopped), so I switched to OpenAI o1 for the final generation. Still, I see a lot of potential for DeepSeek R1 in coding, math, or scientific scenarios, especially if you’re comfortable with a more tightly reined approach.
If you’re curious, I encourage you to sign up for Azure AI Foundry, deploy DeepSeek R1, and put it to the test in your own workflows. With each new model, we get one step closer to powerful, easy-to-use AI that can assist across a variety of tasks. Enjoy experimenting!
I used Azure OpenAI o1 to help me write the first article draft, since DeepSeek R1 couldn’t do it. I created quite a long prompt with my insights, thoughts, tests, and background information (yes, the prompt was long and contained a lot of information), and after a follow-up prompt I got the draft. I tried to minimize my edits this time, but as there were quite a few not-so-accurate sentences, I removed some text, added some, and rewrote it here and there. I could have gone further with prompting and tuned the result more, or broken this into smaller pieces, since when the context is more limited the result is usually much better. I do encourage you to test out what AI can do for you, but keep in mind that you need to check the result for errors. As there will be errors.