Even Google Insiders Are Questioning Bard AI Chatbot’s Usefulness

Product managers and designers are skeptical about the tool, messages from an ‘official’ Discord group show

(Bloomberg) — For months, Alphabet Inc.’s Google and Discord Inc. have run an invitation-only chat for heavy users of Bard, Google’s artificial intelligence-powered chatbot. Google product managers, designers and engineers are using the forum to openly debate the AI tool’s effectiveness and utility, with some questioning whether the enormous resources going into development are worth it.

“My rule of thumb is not to trust LLM output unless I can independently verify it,” Dominik Rabiej, a senior product manager for Bard, wrote in the Discord chat in July, referring to large language models — the AI systems trained on massive amounts of text that form the building blocks of chatbots like Bard and OpenAI Inc.’s ChatGPT. “Would love to get it to a point that you can, but it isn’t there yet.”

“The biggest challenge I’m still thinking of: what are LLMs truly useful for, in terms of helpfulness?” said Googler Cathy Pearl, a user experience lead for Bard, in August. “Like really making a difference. TBD!”

Ever since Google released Bard, its answer to OpenAI’s popular ChatGPT bot, in March, it has added a steady stream of new features to the product, including the capability for the AI tool to analyze photos and generate responses to queries in dozens of languages. Last month, Google unveiled its most ambitious update yet: connecting Bard to its most popular services, such as Gmail, Maps, Docs and YouTube. The company rolled out the app integrations, starting in English, on Sept. 19.

But as Google has further integrated Bard into its core products, the company has also been beset with complaints about the tool generating made-up facts and giving potentially dangerous advice. The same day the company introduced app extensions, it also announced a Google search button on Bard to help people double-check the tool’s AI-generated responses for factuality against results from its search engine.

Experts have also raised concerns about the working conditions of the thousands of low-paid contractors training Bard, based on what the workers say are convoluted instructions that they’re asked to complete in minutes. Inside and outside the company, the internet-search giant has been criticized for providing low-quality information in a race to keep up with the competition, while brushing aside ethical concerns.

For Google, ensuring the success of its Bard AI chatbot is of utmost importance. The company is far and away the leader in search, its financial lifeblood that generates about 80% of parent company Alphabet’s revenue. But as generative AI has exploded onto the scene, Google’s search dominance has been challenged, with some predicting that the new and buzzy tools from OpenAI and other startups could upend Google’s powerful position in the market.

Two participants on Google’s Bard community on chat platform Discord shared details of discussions in the server with Bloomberg from July to October. Dozens of messages reviewed by Bloomberg provide a unique window into how Bard is being used and critiqued by those who know it best, and show that even the company leaders tasked with developing the chatbot feel conflicted about the tool’s potential. Expounding on his answer about “not trusting” responses generated by large language models, Rabiej suggested limiting people’s use of Bard to “creative / brainstorming applications.” Using Bard for coding was a good option too, Rabiej said, “since you inevitably verify if the code works!”

The debate about Bard’s limitations and potential on Google’s Discord channel is a “routine and unsurprising” part of product development, Google said in a statement. “Since launching Bard as an experiment, we’ve been eager to hear people’s feedback on what they like and how we can further improve the experience,” said Jennifer Rodstrom, a Google spokesperson. “Our discussion channel with people who use Discord is one of the many ways we do that.” The company added that it launched the Discord server as an invitation-based community ahead of making it more widely accessible.

At Bard’s launch, the company was upfront about its limitations, including about the possibility for the AI tool to generate convincing-sounding lies. Anytime someone uses Bard, Google includes a disclaimer on the tool that states: “Bard may display inaccurate or offensive information that doesn’t represent Google’s views.” Company representatives have also said that Google carried out adversarial testing, meant to probe how it would respond to potential bad actors, internally before Bard was rolled out, and that the company expects to learn more as the public continues to use it.

The Discord server was started back in July, when thousands of invites were sent out to frequent users of Bard outside the company. “Share thoughts and ideas directly with the team behind Bard, get early notifications about product updates, and connect with other AI enthusiasts,” the invitation, sent on July 10, said. The server description calls the channel the “official” community for Bard users, and Bard’s senior product director, Jack Krawczyk, sent a selfie video to the community as the tool launched in Europe. 

Discord didn’t respond to a request for comment about the chat.

Almost 9,000 people are currently members of the online community, and a few of the chat’s moderators are employees of Discord. Most discussions revolve around cheerleading Bard and AI; some users made fantastical, and likely misguided, claims about the tool’s capabilities, including that they had built a quantum chess computer using Bard or that they could use the bot to trawl the web for data on baseball betting odds and run complex simulations. (Google employees chimed in on the Discord chat to say that Bard didn’t have those capabilities.)

Daniel Griffin, a recent Ph.D. graduate from University of California at Berkeley who studies web search and joined the Discord group in September, said it isn’t uncommon for open source software and small search engine tools to have informal chats for enthusiasts. But Griffin, who has written critically about how Google shapes the public’s interpretations of its products, said he felt “uncomfortable” that the chat was somewhat secretive.

The Bard Discord chat may just be a “non-disclosed, massively-scaled and long-lasting focus group or a community of AI enthusiasts, but the power of Google and the importance of open discussion of these new tools gave me pause,” he added, noting that the company’s other community-feedback efforts, like the Google Search Liaison, were more open to the public.

Over in the Bard forum, users brought up other thorny Google-related issues, giving insight into how the tech giant works hard to mitigate public criticism. In mid-July, a member of the group raised the subject of Project Nimbus, a $1.2 billion contract for Google and Amazon.com Inc. to supply Israel’s military with artificial intelligence tools, according to a Bloomberg review of the messages. The member had raised concerns about Google’s role in enabling lethal uses of AI, and was swiftly banned from the group, with the moderators telling users they must avoid “politics, religion, or other sensitive topics” in the chat.

That same month, another user questioned why Google had relied on “underpaid and overworked contractors” in order to refine Bard’s responses. Though the company has publicly stated that it isn’t only relying on contractors to improve the AI powering Bard, and that there are a number of other methods for improving its accuracy and quality, Tris Warkentin, a Bard director of product management, responded by emphasizing the importance of human input to train Bard’s algorithms. 

“Human refinement is critical so that Bard can be a product for everyone; the alternative is that users have no ability to guide the product’s functionality, which would be a huge mistake in my opinion,” Warkentin wrote in the chat. “We don’t need an ‘ivory tower’ product — we need something that can work for everyone!”

People also exchanged views about the consequences of the enormous costs needed to maintain large language models. “Is any work being done on reducing the staggering resource costs of LLMs?” asked one user in the Discord server. “Particularly the water usage per query, and the massive need for GPUs (which require extensive mining to produce)?”

“I kind of look at it like chip design… or supercomputers,” Pearl, the Bard user experience lead, responded. “I believe we will continue to find ways to get the same behavior with less resources.”

Concerns about Bard’s accuracy have also abounded in the chat. Warkentin, the Bard director of product management, stressed in a discussion about Bard’s fabrications that Google had made strides since the AI tool was released. “We are very focused on reducing hallucination and increasing factuality; it’s one of our key success metrics,” he said. “We’ve improved quite a bit since launch, but it’s ongoing work, so please keep trying and sending us feedback when something is not right!”

In late September, the official Bard account on Discord posted a question-and-answer summary of an “office hours” event, which aimed to address the community’s questions about Bard’s newly announced integrations with Google apps. In response to a question about whether there was any chance of Bard deviating from reality while summarizing emails, the official Bard account said: “We’ve done our best to make sure this happens as little as possible. But since Bard is still learning and growing, it could happen.” People should check the sources Bard uses, and refer back to them, the account said. “If Bard does hallucinate with any of the integrations, please let us know in the bug reports channel!”

Rabiej, the Bard product manager, also underlined the importance of the AI tool’s new “Double-check the responses” button. “It’ll highlight stuff that is probably not correct in orange,” he said in October. He also reiterated that Bard doesn’t have any true understanding of the text it ingests, but that the tool simply responds with more text, depending on a user’s prompts: “Remember, Bard, as any large language model, is generative — it is not looking up stuff and summarizing it for you, it is generating text.”

Other employees voiced ambivalence about generative artificial intelligence more broadly. “Taking a step back from my generally negative outtake on the impact that Gen AI could have, I do think education is one of the most interesting and possible highest ‘do good areas’ for this tech,” said James, a user experience designer for Bard, in the Discord community.

Institutions from higher and lower education might use the technology to “help create richer experiences for students by having almost 24/7 access to support on different subjects,” James said, “once the general scare of it is over.”

©2023 Bloomberg L.P.