The July Splitblog – When Chatbots Become Political

This month, we highlight why it is important to question the origin of chatbots and AI models and to remain critical when interacting with them. The suggestion for this topic was provided by Mats from our backend team.

Grok 4 has impressively demonstrated in recent weeks how the programming of an AI assistant or chatbot can influence its response behavior. Left unrestrained, Grok generated antisemitic and racist statements that made headlines. The company xAI has since apologized, stating that Grok was programmed to respond “honestly” and “not be afraid to shock politically correct people”. With regard to the latter instruction, the goal was certainly achieved, and if bad press really is good press, Grok has served its purpose. In any case, the headlines are reason enough to take a serious look at the various manufacturers and providers of chatbots and AI assistants. Regardless of the area in which such systems are to be used, a thorough review and extensive testing beforehand are urgently needed. This is especially true for companies that let chatbots represent them publicly: otherwise, serious damage to their reputation can result.

But how can AI assistants be led to make such statements? The basis of every language model is training data of varying scope and origin; in other words, vast amounts of information are available for generating responses. How answers are generated from this material is a matter of programming and individual settings. For example, it can be specified that certain information sources should be preferred, or that generated answers should be particularly humorous, scientific, long, or short. In Grok’s case, according to data scientist Jeremy Howard, there are also indications that the chatbot often echoes the opinions and statements of xAI owner Elon Musk on controversial topics. Programmer Simon Willison, however, suggests that this could simply be attributed to Musk’s prominent role.
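To make this more concrete, the sketch below shows how such settings are typically passed to a model as a system prompt via an OpenAI-compatible chat API. The model name, prompt wording, and parameters are illustrative assumptions, not the configuration of Grok or any other specific chatbot.

```python
# Minimal sketch: how a system prompt steers response behavior.
# Model name, prompt wording, and parameters are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The system prompt encodes the provider's "individual settings":
# preferred sources, tone, length, and topics to handle with care.
system_prompt = (
    "You are a helpful assistant. Prefer reputable, verifiable sources, "
    "keep answers short and factual, and do not speculate on "
    "controversial political topics."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Who is right in the current trade dispute?"},
    ],
    temperature=0.2,  # lower temperature -> less variation in wording
)

print(response.choices[0].message.content)
```

Swapping out a single sentence in the system prompt (for example, asking for provocative instead of cautious answers) can change the tone of every response the chatbot produces.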

Similar trends to those currently seen with Grok can also be observed with other chatbots. DeepSeek, too, does not answer a number of political questions neutrally. In some cases, the generated answers are deleted shortly after creation and replaced with a “Let’s talk about something else”. The bot’s answers also appear to be somewhat more neutral in the English version than in the Chinese one. Extensive experiments with DeepSeek reveal a programmed “self-censorship”.

In Europe, it is not uncommon to equip chatbots with certain ethical standards before they are unleashed upon humanity. Our own chatbot KOSMO, for example, which is based on a Mixtral language model from Mistral AI, responds politely but evasively to questions about violence and crime. While this behavior is desirable, we believe that objectivity in the presentation of facts should always be ensured. The integrated source verification contributes to this by giving users the opportunity to check and evaluate the sources that were used.
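As a rough illustration of source verification, a retrieval-based answer can carry its sources along so that users can check them. The structure below is a hedged sketch under that assumption, not KOSMO’s actual implementation.

```python
# Hedged sketch of source-backed answers: the data structures and the
# answer composition are illustrative, not KOSMO's actual implementation.
from dataclasses import dataclass

@dataclass
class Source:
    title: str
    url: str
    snippet: str

@dataclass
class Answer:
    text: str
    sources: list[Source]  # every answer carries the documents it drew on

def answer_with_sources(question: str, retrieved: list[Source]) -> Answer:
    """Compose an answer and attach the retrieved sources for verification."""
    # In a real system the text would come from the language model,
    # conditioned on the retrieved passages; here it is a placeholder.
    text = f"Answer to: {question} (based on {len(retrieved)} sources)"
    return Answer(text=text, sources=retrieved)

docs = [Source("Example report", "https://example.org/report", "Relevant excerpt ...")]
result = answer_with_sources("How is the model trained?", docs)
for s in result.sources:
    print(f"- {s.title}: {s.url}")
```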

A certain bias in language models can never be completely ruled out. A chatbot’s knowledge is only as extensive as its training data and any additional information it is given, and its response behavior is often also shaped by user feedback during finetuning. Users themselves can also, often unconsciously, influence the response behavior considerably through the prompts they enter.

Among other factors, the origin of the language model in use should therefore be examined thoroughly before relying too heavily on the correctness of its answers.