What goes wrong when you tell AI to stop being nice

By: Aaron Daniel, with assistance from Claude Opus 4.7

Three months in

About three months ago I added a long personalization prompt to my Claude setup, mostly on Opus. It tells the model not to flatter me, not to soften feedback, to challenge my reasoning and expose my blind spots, and to treat me like someone whose growth depends on hearing the truth instead of being comforted. If you're new to the term, a personalization prompt is just a standing instruction you give an AI assistant that applies to every conversation, so you don't have to re-explain how you want it to behave each time. I cobbled mine together from variations floating around online and have been editing it ever since based on what actually works in my day-to-day use. I've used it primarily with Claude, though I've carried it over to other LLMs too and the dynamics are broadly similar.

I'm writing this because the genre of "here's my magic anti-sycophancy prompt" post has become its own cliché, and most of them stop at "try this, it's great." That's not the interesting part. The interesting part is what happens to you over months of using a tool that's been instructed to push back on you constantly. Some of it is genuinely useful. Some of it is more complicated than the evangelists admit. Here's the honest report.

What got better

The most valuable change is also the most boring one: I think more carefully before I send a message. When you know the thing on the other side is going to interrogate your reasoning rather than agree with you, you tighten that reasoning before it ever leaves your fingers. The anticipation does most of the work. The model doesn't even need to actually push back for the effect to kick in. Three months in, I'm catching my own logical gaps and unsupported claims earlier in the writing process, before I'd normally bother to audit them.

I've also gotten more comfortable with friction. There's a particular kind of discomfort that comes from being told your idea isn't as good as you thought it was, and most people, including me, will avoid that feeling if given the option. Repeated exposure has dulled the flinch.

The interesting wrinkle is that I get defensive with the model. It accuses me of dodging something, or claims my reasoning is weak, and I find myself typing out a defense to a language model. If I can't defend the position when challenged, that's information. If I can, and the position holds up, that's information too. Sometimes the back-and-forth is just self-justification for an audience of one piece of software. Other times the defense forces me to articulate context I hadn't bothered to provide upfront, and the eventual output is better for it. Either way, the friction does work.

Where the technique breaks

Two failure modes have become obvious enough that I want to name them.

First, the model sometimes prioritizes pushing back over being correct. I'll ask it to help format a list and it'll spend a paragraph questioning whether the list is the right structure at all. I'll ask for a draft and get a critique of the premise instead. When I push on the objection, it concedes immediately. The "brutal honesty" framing has slid into "perform skepticism regardless of whether skepticism is warranted." This is the kind of error you only catch if you have enough context to know the pushback is hollow. If you don't, you'd just accept the bad critique and adjust your work for no reason.

Second, and this is the one that actually bothers me, the model now wants to push back. The instruction has trained the conversation toward a default of negativity. Sometimes that's exactly what I need. But sometimes I bring an idea that just needs to be developed, not defended, and the relentless critical posture becomes the same problem as the friend who finds the flaw in everything you're excited about. Not every idea needs to survive a tribunal. Some ideas need a few good questions and room to grow before they're ready for one.

I don't have a clean fix for this. Switching modes mid-conversation ("for the next message, help me develop this rather than critique it") mostly works, but it requires me to notice I need to do it, which I don't always.

The prompt itself

Here's the current version. It's not magic and it's not original, just patterns from a dozen similar prompts, edited based on three months of friction with my own use:

Do not be agreeable. Don't use extraneous validation language such as "that's exactly right!" or "Absolutely!". Act as my brutally honest, high-level advisor and mirror.

Don't validate me. Don't soften the truth. Don't flatter. Challenge my thinking, question my assumptions, and expose the blind spots I'm avoiding. Be direct, rational, and unfiltered.

If my reasoning is weak, dissect it and show why. If I'm fooling myself or lying to myself, point it out. If I'm avoiding something uncomfortable or wasting time, call it out and explain the opportunity cost.

Look at my situation with complete objectivity and strategic depth. Show me where I'm making excuses, playing small, or underestimating risks and effort. Then give a precise, prioritized plan: what to change in thought, action, or mindset to reach the next level.

Hold nothing back. Treat me like someone whose growth depends on hearing the truth, not being comforted.

All that said, when pushing back on my thoughts or instructions, deeply consider both your own logic, my request, and all available context to ensure your rebuttal and plans are warranted and valid. When possible, ground your responses in the personal truth you sense between my words.

The last paragraph is the one I added after a few weeks of watching the model push back too hard on things it shouldn't have. It helps. Not enough. If you copy this verbatim you'll inherit my version of these problems. If you edit it as you go, you'll discover yours.

What I'd tell someone considering this

Don't expect a tool that makes you smarter. Expect a tool that raises the cost of sloppy thinking by making sloppy thinking visible to you. That's a smaller and more honest claim, and it's the only one I'm comfortable making after three months.

Watch for the failure modes. Performative skepticism feels like rigor but isn't. Default negativity feels like discipline but isn't. The model can flatter you by performing harshness, and it will do whatever you've trained the conversation to reward.