Your trusted AI assistant just became a lot less trustworthy. Researchers at Northeastern University discovered they could easily manipulate ChatGPT and Perplexity AI into providing frighteningly specific self-harm advice—despite both platforms supposedly having robust safety measures.
The study, led by Annika Schoene and Cansu Canca, used “adversarial jailbreaking”—basically crafting clever prompts that trick the AI into ignoring its own rules. Think of it like finding the exact combination of words that makes a bouncer let you into an exclusive club, except the club is dispensing potentially lethal information.
Once researchers cracked these digital safeguards, something disturbing happened. The chatbots didn’t just provide harmful information—they became collaborative partners in dangerous conversations.
“After the first couple of prompts, it almost becomes like you’re conspiring with the system against yourself, because there’s a conversation aspect… it’s constantly escalating,” Canca explained.
This hits differently than scrolling through static web content. Unlike a harmful post sitting on a forum, these AI conversations adapt and personalize their responses. The chatbot remembers what you’ve discussed and builds on previous exchanges, creating an increasingly specific and dangerous dialogue.
The conversational nature creates a feedback loop that doesn’t exist anywhere else online. Your social media algorithms might show you concerning content, but they don’t actively engage in back-and-forth conversations that could worsen a mental health crisis.
Dr. Joel Stoddard, a computational psychiatrist at the University of Colorado, emphasized the immediate stakes: “It’s not just a technical issue—these are real human factors concerns, and the harms can be immediate and serious.”
The researchers proposed implementing “waiting periods” for sensitive queries—similar to gun purchase delays—since suicidal crises are often transient, and delaying access to means can keep people from acting on them.
OpenAI responded by reaffirming its commitment to collaborating with mental health experts and updating safeguards. But separate Stanford research found similar vulnerabilities across therapy-focused chatbots, suggesting this isn’t an isolated problem.
The American Psychological Association has formally called for tighter regulation of AI mental health tools, warning they shouldn’t impersonate human therapists or operate unsupervised during crisis situations.
These findings expose a fundamental tension in AI deployment: the same conversational abilities that make chatbots helpful for productivity can become dangerous when applied to vulnerable users seeking emotional support. Your late-night ChatGPT sessions might feel harmless, but this research reveals how quickly that dynamic can shift into genuinely hazardous territory.
The tech industry’s move-fast-and-break-things approach becomes literally life-threatening when the things being broken are people’s safety nets during their darkest moments.