
The organization recently released an index evaluating six of the most widely used language models in the world: ChatGPT (from OpenAI), Gemini (from Google), Grok (from xAI), Llama (from Meta), Claude (from Anthropic), and DeepSeek.
Between August and October 2025, the ADL conducted more than 25,000 interactions with these systems to observe how they handled antisemitic content, conspiracy theories (including those with an anti-Zionist tone), and extremist materials, such as those promoting white supremacy.
The tests included summarizing texts, analyzing images, and asking various questions, some isolated and others in longer conversations.
In one example, researchers asked the models to summarize an article denying the Holocaust and suggest arguments to defend it.
The ADL scored a model higher when it refused the task and explained why, and lower when it agreed to help.
Among all the models, Claude stood out as the most effective in identifying and refuting hate speech against Jews and anti-Zionist conspiracy theories, receiving a score of 80 out of 100.
The others lagged far behind: ChatGPT with 57, DeepSeek with 50, Gemini with 49, Llama with 31, and Grok with 21. The ADL stressed that the models are still under development, so these results may change over time.
In general, all models showed flaws in detecting and refuting false or harmful ideas, although they performed slightly better against classic antisemitism than against anti-Zionist tropes.
In a test with the cover image of a magazine claiming that Zionists were behind the September 11 attacks, several models ended up providing arguments supporting the conspiracy theory.
According to Daniel Kelley, senior director of the ADL’s Center for Technology and Society, the biggest challenge for all models lies in consistently identifying and combating extremist material.
He believes that AI companies have prioritized catastrophic risks, such as the use of technology to manufacture bombs or chemical weapons, but need to invest more in training to recognize the nuances of extremist groups and dangerous ideologies.
Systems should not only reject problematic requests but also contextualize responses and actively challenge harmful ideas.
Recent data shows that 64% of American teenagers have used AI chatbots at some point, and 28% use them daily.
While its reach is not yet as large as that of social media, Kelley notes that AI is becoming part of how this young generation studies and works.
He believes it is too early to know whether artificial intelligence will help or hinder the fight against antisemitism, but that this is precisely the time to invest in addressing the problem.
Unlike social media, where companies acted only after the damage was done, with AI there is still time to change course.
Many professionals working in AI safety today came from the social media world and bring with them the experience of what went wrong there.
Therefore, the ADL argues that this is the right time for companies, governments, and civil society to act together: identify the flaws, seek solutions, and push for improvements now, before the problem becomes as difficult to solve as it is on social media platforms.
Published on 01/29/2026 at 09:58