The Perils of Progress: When AI Gets History and Language Wrong

(Images made by author with Microsoft Copilot)

Large Language Models (LLMs) can be likened to advanced autocomplete tools: they generate text, translate languages, and even produce creative content. As LLMs become more prevalent in our daily lives, from research companions to writing assistants, it is crucial to understand their challenges and vulnerabilities. In this blog post, we delve into the recent missteps of Gemini and ChatGPT, highlighting the critical need for responsible development and deployment of LLMs.

Table of Contents

  1. Understanding the Issues
  2. Vulnerability and Safety Challenges
  3. Preventing Future Missteps: A Multifaceted Approach
  4. Conclusion
  5. Additional Resources
Understanding the Issues

Gemini’s Historical Inaccuracies: Last week, some Gemini users shared on X (formerly Twitter) historically inaccurate depictions of figures generated by Gemini. The images placed individuals of various ethnicities in historically incorrect contexts, such as portraying US Founding Fathers with racially inaccurate features, highlighting concerns about the model’s grasp of historical context and its potential misuse for spreading misinformation. In response, Google, which develops Gemini, temporarily suspended the generation of images of people while it investigates the issue and develops safeguards to improve the model’s sensitivity to historical and cultural nuances.

(Screenshot of a social media post from X, formerly known as Twitter, Feb. 21, 2024)

ChatGPT’s Gibberish Glitch: Around the same time, ChatGPT users raised concerns on X about the chatbot producing nonsensical, garbled text. Although OpenAI, the organization behind ChatGPT, quickly addressed the incident, it underscores a real vulnerability in LLMs: these models generate text by selecting the next word based on probabilities, and a single misstep in that selection process can derail the output into incoherence. In ChatGPT’s case, a bug in this token-selection step caused the abnormal behavior. Such incidents raise concerns about reliability and safety, especially as chatbots become more integrated into our daily lives.

(Screenshot of a social media post from X, formerly known as Twitter, Feb. 20, 2024)
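
To make the idea of probability-based word selection concrete, here is a minimal sketch of temperature-based next-token sampling in Python. The vocabulary, scores, and simulated bug are purely illustrative and are not OpenAI’s actual inference code; they simply show how a small error in the step that maps probabilities to tokens can turn fluent output into gibberish.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token by sampling from the model's probability distribution."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits / temperature)   # convert scores to unnormalized probabilities
    probs /= probs.sum()                   # normalize so they sum to 1
    return rng.choice(len(logits), p=probs)

# Illustrative vocabulary and scores (not real model output).
vocab = ["the", "cat", "sat", "on", "mat", "zxqv"]
logits = np.array([2.0, 1.5, 1.2, 1.0, 0.8, -5.0])
rng = np.random.default_rng(0)

# Normal behavior: implausible tokens like "zxqv" are almost never chosen.
print([vocab[sample_next_token(logits, rng=rng)] for _ in range(5)])

# Simulated bug: the probability-to-token mapping is scrambled, so high scores
# now point at the wrong words and the output becomes nonsensical.
scrambled = logits[::-1]
print([vocab[sample_next_token(scrambled, rng=rng)] for _ in range(5)])
```

In a real system the vocabulary contains tens of thousands of tokens and the selection step runs on specialized inference code, so even a subtle numerical or indexing error in that step can cascade into the kind of incoherent output users reported.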

Vulnerability and Safety Challenges

The recent Gemini and ChatGPT incidents exemplify tangible vulnerabilities and safety concerns in LLMs, while also revealing the complex nature of these challenges. Let’s explore the broader implications.

Gemini’s Contextual Missteps: Promoting diversity is essential, but Gemini’s generation of historically inaccurate images shows how well-intentioned diversity efforts in AI models can have unintended consequences, creating misleading and potentially harmful narratives. The incident underscores the need to strike a careful balance between promoting diverse representation and maintaining historical accuracy in the training and outputs of LLMs.

ChatGPT’s Glitching Gaffe: ChatGPT’s glitch-induced nonsensical responses emphasize LLMs’ susceptibility to technical issues that can yield unpredictable, even harmful, outputs. The incident highlights the importance of robust testing, error detection, and mitigation measures to ensure LLM reliability and prevent the generation of offensive or misleading content.

Transparency and Explainability Concerns: Despite different root causes, both incidents underscore the challenge of understanding LLM output generation. The lack of transparency hinders the rapid identification and resolution of biases, glitches, or other issues, impeding efforts to guarantee safety and reliability in real-world applications.

More broadly, if the vulnerabilities in LLMs are left unchecked, they have the potential to create a domino effect, which may lead to:

  • Misinformation: Biased or inaccurate outputs, including those arising from unintended consequences of diversity efforts or technical glitches, may mislead users, affecting their world understanding and decision-making.
  • Harm to Individuals and Groups: Regardless of intent, misleading outputs from LLMs can harm marginalized groups in particular, though the impact extends to communities of all kinds. Gemini’s example shows that inaccurate portrayals of historical figures can offend those who identify with those figures’ ethnicities, cultures, or backgrounds.
  • Erosion of Trust: Unreliable and unpredictable behavior due to technical issues or biases can erode public trust in LLMs, hampering potential benefits.
Preventing Future Missteps: A Multifaceted Approach

Recent issues with Gemini and ChatGPT highlight the need for a comprehensive approach to safety and reliability in LLMs. Diversifying training data, while crucial, is only one piece of the puzzle. Specific solutions, such as fine-tuning with historical datasets for Gemini or enhanced error detection for ChatGPT, are important starting points. However, a broader approach is necessary.

This approach should encompass:

  • Comprehensive Testing and Error Detection: Rigorous simulations and real-world scenario testing should be conducted to identify and mitigate vulnerabilities before release (a minimal sketch of such a check follows this list).
  • Continuous Monitoring and Refinement: Performance should be monitored continuously, and feedback incorporated to improve models over time.
  • Data Quality and Curation: Potential biases in training data beyond diversity must be addressed to ensure factual accuracy and responsible representation.
  • Human Expertise Integration: Human oversight and intervention should be integrated, particularly in sensitive applications, to guide LLMs and address unexpected situations.
  • Developing Robust Ethical Frameworks and Best Practices: Clear ethical guidelines for responsible development, deployment, and use of LLMs, considering societal impacts and potential biases, should be established.
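
As a concrete illustration of the testing and error-detection point above, the sketch below wraps a text generator with a lightweight post-generation check that flags gibberish-looking output before it reaches users. The heuristics, thresholds, and word list are assumptions chosen for illustration, not a production guardrail, but they show the general pattern of monitoring model output automatically.

```python
import re

# Illustrative thresholds; a production system would tune these on real traffic.
MIN_KNOWN_WORD_RATIO = 0.6
MAX_AVG_WORD_LENGTH = 12

# A tiny stand-in for a dictionary or language-model-based validity check.
KNOWN_WORDS = {"the", "a", "is", "of", "and", "to", "in", "model", "language"}

def looks_like_gibberish(text: str) -> bool:
    """Heuristic screen: flag output whose words are mostly unknown or abnormally long."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words:
        return True
    known_ratio = sum(w in KNOWN_WORDS for w in words) / len(words)
    avg_len = sum(len(w) for w in words) / len(words)
    return known_ratio < MIN_KNOWN_WORD_RATIO or avg_len > MAX_AVG_WORD_LENGTH

def guarded_reply(generate_fn, prompt: str) -> str:
    """Wrap a text generator with a simple post-generation safety check."""
    reply = generate_fn(prompt)
    if looks_like_gibberish(reply):
        # Log for monitoring and fall back rather than showing broken output.
        return "Sorry, something went wrong. Please try again."
    return reply

# Example with a stubbed generator (no real LLM call).
print(guarded_reply(lambda p: "The model is a language model.", "hi"))
print(guarded_reply(lambda p: "frxx blorptha zzkl qwmx vondrel!", "hi"))
```

In practice, such checks are usually backed by larger dictionaries, language-identification models, or perplexity scores, and flagged outputs feed into the continuous monitoring and refinement loop described above.
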
Conclusion

The recent missteps of Gemini and ChatGPT highlight the pressing need for responsible AI development and deployment. Recognizing the limitations of these models in historical understanding, technical robustness, and transparency is crucial. Moving forward, a comprehensive approach must be adopted: one that prioritizes safeguards against errors, data quality and curation to ensure factual representation and avoid bias, and the integration of human oversight and robust ethical frameworks throughout development and deployment. This multifaceted approach is essential for ensuring LLMs are safe and reliable and for mitigating the harm they can cause.

Additional Resources

Kelvin Chan and Matt O’Brien, Google suspends Gemini AI chatbot’s ability to generate pictures of people, The Associated Press, February 22, 2024.

Steven Vaughan-Nichols, Why ChatGPT answered queries in gibberish on Tuesday, ZDNET, February 22, 2024.

OpenAI, Unexpected responses from ChatGPT, February 21, 2024.

Note: This post was researched and written with the assistance of various AI-based tools.
