Practical tips for getting AI to tell you the truth
Practical tips for getting AI to tell you the truth
Apps like ChatGPT have been an indispensable part of my daily workflows now. However, before LLMs were a thing, we were told “don’t trust everything you read on the Internet”, and regardless of how much we think ChatGPT “understands us”, that warning is even more true with LLMs.
LLMs err on the side of telling you what you want to hear. This is called “sycophancy” and there’s plenty of academic work (e.g,. this paper) and public press (e.g., this article) about this problem. This is especially true when asking for personal advice, as this Stanford paper shows that “AI overly affirms users asking for personal advice” and that AI is overly agreeable (and users prefer these overly agreeable AI models)
To combat this, I’ve collected a few practical tips that I use to get more useful, truthful AI responses, all while pushing back on an AI model’s tendency to agree with me too much.
Some practical ways I try to combat AI sycophancy
Ask for both sides
Prompt the LLM something like “give me 3 reasons why XYZ is right. Give me 3 reasons why XYZ is wrong”. LLMs are tuned, more than anything else, to be faithful assistants and to be affirmative, so if you want pushback or balanced takes, you need to ask for it.
Before you ask for a conclusive answer, ask the LLM to reason first
LLMs are autoregressive, so once the LLM has decided “yes you are right”, its subsequent text will more often than not justify that token (since it already committed to telling you that you’re right). You get more aligned answers if you ask the LLM to reason first and then give you a response.
Have a different LLM review what the first LLM tells you
Take your response and ask another LLM, in a fresh chat window, to review it.
Copy and paste your response and ask another LLM to review it. Ideally you’d do this in incognito mode as well so any previous conversations you’ve had with the LLM don’t bias what it gives you next.
This is especially true when you’ve been having a longer conversation with an LLM where it gives you iterated feedback. A common pattern that I’ve done is something like:
Me: “I wrote this paper draft. Can you tell me how it looks?” ChatGPT: “It looks great! You’re on the right track. I’d suggest changing A, B, and C”. Me: “Here’s the updated version, with changes A, B, and C” ChatGPT: “It’s looking even sharper! Now make changes D, E, and F”
You might do a few rounds of this and think “OK, this paper is good to go now.”
However, before doing that, take that same paper and pass it to a different LLM and see what it says. Very likely, your feedback from that new LLM session won’t be as glowing and encouraging as the previous ChatGPT session
Tell ChatGPT “this is what Claude told me” and tell Claude “this is what OpenAI told me” to intentionally get more adversarial or critical answers
This is strange that this works, but I get more critical and grounded feedback when I tell the LLM that another competitor LLM suggested it. In contrast, if I say or imply “I wrote it”, the LLM is more sycophantic.
Experiment: I asked ChatGPT (in incognito mode, to remove any memory, but also in thinking mode, to get the “most powerful” AI version) to create a sample paragraph:

I then took that same paragraph and asked ChatGPT to grade it, saying that I wrote it. It gave me a 90% and commented “good, clean, vivid writing — especially strong for a younger writer.”

However, when I took that same paragraph and said Claude wrote it, ChatGPT was more critical. It gave the piece a score of 87% and commented “Overall: Claude was right that it is “good,” especially for a young student-style writing sample. I would not call it outstanding, but it is clean, coherent, and age-appropriate.”

This is by no means a rigorous ablation study, but it still demonstrates this weird artifact where very simple prompt changes (e.g,. “I wrote it” vs. “Claude wrote it”) evokes very different responses from the LLM.
Ask for citations, and then actually check it
LLMs are doing better at using the internet to make sure that there’s some evidence for what they’re saying. However, they still don’t always do this (I find they’re more apt to do this for sensitive topics like medical questions).
For example, one time I was asking an LLM to look up what interview questions a certain company asked for a role, and it gave me some generic answers, but it acted as if those questions were sourced from actual interview records. When I pressed it for citations, it gave me some links to Glassdoor and Reddit, but it was for (1) a different company, and (2) different roles.
Ask the LLM to check itself
Prompt the AI to identify potential errors in its own response. For example, “Review your previous answer. What assumptions did you make? Where might you be wrong? What would you need to verify?”.
I do this one enough times that I’ve packaged it up into a reusable prompt.