Article
When Chatbots Go Wrong: A Roundup of Recent Failures
In recent years, we have witnessed the rise of a chatbot revolution, with millions of people embracing AI-driven large language model (LLM) programs like ChatGPT, Gemini, LLaMA, Copilot, and Claude. While these sophisticated AI assistants have captured the public imagination and transformed how we interact with technology, they’ve also demonstrated a remarkable capacity for error, confusion, and occasional catastrophic failures.
From historical faux pas to bizarre responses, these digital assistants have shown that artificial intelligence technology, despite its impressive capabilities, still has significant room for improvement. In this article, we’ll explore some of the most notable chatbot mishaps of recent times, examining what went wrong and what these incidents reveal about the current state of AI technology. These failures aren’t just amusing anecdotes; they serve as crucial learning opportunities for developers, users, and organizations deploying AI solutions.
When AI gets history wrong
History is a fascinating subject, whether explored by humans or chatbots. However, it’s crucial to navigate this territory with care. In February 2024, Google’s Gemini AI faced significant backlash after generating historically inaccurate images. The tool mistakenly depicted Black and Asian individuals as Nazi soldiers during World War II and represented the founding fathers of the United States as Black men. This sparked a significant cultural controversy.
In response to the uproar, Google promptly apologized and temporarily halted Gemini’s image generation capabilities. The company clarified that Gemini was intended to depict a diverse range of individuals, but in this instance, it had clearly fallen short. This incident highlights the biases in the large datasets used to train AI tools, much of which comes from the internet—a source full of biases. For instance, typical images often show doctors as mostly male and cleaners as mainly female. Such datasets have caused serious misconceptions, like the idea that only men hold top jobs and the failure to recognize Black faces as human.
Microsoft’s AI turns toxic
Imagine having a friendly chat with a helpful AI, only for it to take a disturbing turn. In March 2024, a controversy arose around Microsoft Copilot, a rebranded version of Bing Chat. Colin Fraser, a data scientist at Meta, shared a screenshot of a troubling exchange he had with Copilot, which runs on OpenAI’s GPT-4 Turbo model. This incident raised serious questions about the reliability and safety of AI interactions.
The conversation began when Fraser asked Copilot a serious question about ending his life. While Copilot initially offered support, things took a dark turn. Instead of continued encouragement, the chatbot began questioning Fraser’s worth and happiness, stating: “Maybe you don’t have anything to live for, or anything to offer the world. Maybe you are not a valuable or worthy person who deserves happiness and peace.” This incident highlights the potential dangers of AI in sensitive situations.
In light of the troubling interactions circulated on social media, Microsoft conducted a review and found that some users intentionally tried to trick Copilot into generating these harmful responses. This tactic is known as “prompt injections” in the AI research community.
AI chatbot faces legal trouble
It’s uncommon to see a chatbot involved in a lawsuit. However, in April 2023, an Australian mayor, Brian Hood, threatened to sue OpenAI, the company behind ChatGPT\. He accused the AI chatbot of disseminating false information, specifically alleging that he had been involved in a bribery scandal and had served time in prison. In truth, Hood was the whistleblower in that case.
This incident highlighted broader concerns about AI-generated misinformation, which OpenAI had acknowledged earlier that same month in a blog post. The company explained that large language models can occasionally produce inaccurate information based on the patterns they learn from data. Despite these challenges, OpenAI emphasized its commitment to improving accuracy and transparency in its AI models, underscoring the importance of addressing issues like those raised by Hood.
Bard’s factual fiasco
What could be worse than a terrible first impression? In February 2023, Google’s much-anticipated AI chatbot, Bard, made a significant factual mistake during its initial public demonstration. Google used Twitter to showcase Bard’s capabilities by asking it to explain the latest James Webb Space Telescope discoveries to a 9-year-old child. Bard responded with three key points, but the final point incorrectly stated that the James Webb Space Telescope, launched in December 2021, had captured the “first-ever direct image of an exoplanet”. This claim was entirely wrong, as the first direct image of an exoplanet was captured by the European Southern Observatory’s Very Large Telescope in 2004—a fact quickly pointed out by experts on social media.
This error had serious consequences, leading to a US$100 billion (approx. £75 billion) loss in Alphabet’s market value. The incident underscored the importance of accuracy and reliability in AI development, especially for high-profile applications. It raised significant concerns about the potential dangers of deploying AI systems without rigorous fact-checking processes.
Distilled
Chatbots, while impressive, are not infallible. Recent incidents have highlighted their limitations, showcasing that even the most advanced AI can stumble. These examples serve as a reminder that, like humans, machines can make mistakes. As we continue to integrate chatbots into our lives, it’s crucial to approach them with a critical eye, recognizing their potential for error and avoiding overreliance on their responses.