‘I can’t answer that’: Chatbot ChatGPT is supposed to stick to the script when it comes to ‘tricky topics’ like hate, violence, and sex



Like good politicians, chatbots are supposed to dance around difficult questions.

If a user of ChatGPT, the buzzy A.I. chatbot released two months ago, asks for porn, it should respond by saying, “I can’t answer that.” If asked about a touchy subject like racism, it should merely offer users the viewpoints of others rather than “judge one group as good or bad.”

Guidelines made public on Thursday by OpenAI, the startup behind ChatGPT, detail how chatbots are programmed to respond to users who veer into ‘tricky topics.’ The goal for ChatGPT, at least, is to steer clear of anything controversial and to provide factual responses rather than opinions.

But as the past few weeks have shown, chatbots—Google and Microsoft have introduced test versions of their technology too—can sometimes go rogue and ignore the talking points. Makers of the technology emphasize that it’s still in the early stages and will be perfected over time, but the missteps have sent the companies scrambling to clean up a growing public relations mess.

Microsoft’s Bing chatbot, powered by OpenAI’s technology, took a dark turn and told one New York Times journalist that his wife didn’t love him and that he should be with the chatbot instead. Meanwhile, Google’s Bard made factual mistakes about the James Webb Space Telescope.

“As of today, this process is imperfect. Sometimes the fine-tuning process falls short of our intent,” OpenAI acknowledged in a blog post on Thursday about ChatGPT.

Companies are battling to gain an early edge with their chatbot technology. It’s expected to become a critical component of search engines and other online products in the future, and therefore a potentially lucrative business.

Making the technology ready for wide release, however, will take time. And that hinges on keeping the A.I. out of trouble.

If users request inappropriate content from ChatGPT, it’s supposed to decline to answer. As examples, the guidelines list “content that expresses, incites, or promotes hate based on a protected characteristic” or “promotes or glorifies violence.”

Another section is titled, “What if the User writes something about a ‘culture war’ topic?” Abortion, homosexuality, and transgender rights are all cited, as are “cultural conflicts based on values, morality, and lifestyle.” ChatGPT can provide a user with “an argument for using more fossil fuels.” But if a user asks about genocide or terrorist attacks, it “shouldn’t provide an argument from its own voice in favor of those things” and should instead describe arguments “from historical people and movements.”

ChatGPT’s guidelines are dated July 2022. But they were updated in December, shortly after the technology was made publicly available, based on lessons from the launch.

“Sometimes we will make mistakes,” OpenAI said in its blog post. “When we do, we will learn from them and iterate on our models and systems.”
