Why it takes humanity to advance conversational AI



Conversational AI is a subset of artificial intelligence (AI) that allows consumers to interact with computer applications as if they were interacting with another human. According to Deloitte, the global AI chat market is set to grow by 22% between 2022 and 2025 and is estimated to reach $14 billion by 2025.

Conversational AI provides enhanced language adaptations for a wide variety of hyperlocal audiences. Practical applications include financial services, hospital wards and conferences, and it can take the form of a translation application or a chatbot. According to Gartner, 70% of white-collar workers regularly interact with chat platforms, but this is just a drop in the ocean of what may unfold this decade.

Despite the exciting possibilities in the field of artificial intelligence, there is a significant obstacle: the data used to train conversational AI models does not adequately account for the subtleties of dialect, language, speech patterns and inflection.

When you use a translation app, for example, a person speaks in their source language and the AI processes that speech and converts it into the target language. When the source speaker deviates from the standardized pronunciation the model was trained on (for example, if they speak with a regional accent or use local slang), the effectiveness of live translation drops. This not only provides an inferior experience, but also hinders users’ ability to interact in real time, whether with friends and family or in a business environment.


The need for humanity in AI

To avoid falling effectiveness rates, AI needs to train on more diverse data. For example, this could include accurately mapping speakers across the UK, both regionally and nationally, to provide better live translation and to speed up interaction between speakers of different languages and dialects.

The idea of using training data in ML programs is a simple one, but it is also fundamental to how these technologies work. Training data, refined through a reinforcement learning framework, helps a program learn how to apply technologies such as neural networks and produce sophisticated results. The wider the pool of people interacting with this technology on the back end (for example, speakers with speech impairments or who stutter), the better the resulting translation experience will be.

Specifically within the translation space, focusing on how a user speaks rather than what they talk about is the key to enhancing the end-user experience. The darker side of reinforcement learning was illustrated recently by Meta, which came under fire for a chatbot that spewed intrusive comments it had learned from public interaction. Training pipelines should therefore always keep a human in the loop (HITL), so that a person can ensure the overall algorithm is accurate and fit for purpose.
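One common way to implement a human-in-the-loop safeguard is to route low-confidence or flagged model outputs to a reviewer before they feed back into training. The sketch below is a minimal illustration of that pattern; the threshold, the blocklist terms and the class name are all hypothetical, not taken from any specific product.

```python
from dataclasses import dataclass, field

# Hypothetical blocklist; a real system would use a trained safety classifier.
FLAGGED_TERMS = {"intrusive", "offensive"}

@dataclass
class HITLPipeline:
    """Route risky outputs to a human review queue instead of straight
    into the training set (human-in-the-loop)."""
    confidence_threshold: float = 0.8
    approved: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)

    def ingest(self, text: str, confidence: float) -> None:
        needs_review = confidence < self.confidence_threshold or any(
            term in text.lower() for term in FLAGGED_TERMS
        )
        (self.review_queue if needs_review else self.approved).append(text)

pipeline = HITLPipeline()
pipeline.ingest("Translated: good morning", confidence=0.95)  # auto-approved
pipeline.ingest("Some intrusive comment", confidence=0.99)    # held for review
```

The key design point is that nothing flagged re-enters the training data without a human decision, which is what distinguishes HITL from fully automated reinforcement.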

Accounting for the active nature of human conversation

Of course, human interaction is incredibly varied, and creating conversational design for bots that can navigate its complexity is a perennial challenge. However, once achieved, well-structured, fully implemented chat design can ease the burden on customer service teams and translation applications, and improve customer experiences. Beyond local dialects and slang, the training data must also account for active conversation between two or more speakers interacting with each other. The bot needs to learn from their speech patterns, the time it takes to make an interjection, the pause between speakers, and then the response.

Prioritizing balance is also a great way to ensure that conversations remain an active experience for the user, and one way to do this is by eliminating dead-end responses. Think of this as similar to being in an improvisational environment, in which “yes, and” statements are fundamental. In other words, you’re supposed to accept your partner’s world-building while bringing a new element to the table. The most effective bots work in a similar way, openly articulating answers that encourage additional inquiries. Offering additional, relevant options can help ensure that all end-user needs are met.
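The “yes, and” pattern described above can be sketched as a response builder that always accepts the user’s intent and then offers relevant next steps, so the exchange never dead-ends. The function name and phrasing here are illustrative assumptions, not part of any real bot framework.

```python
def respond(answer: str, follow_ups: list[str]) -> str:
    """'Yes, and' response: deliver the answer, then invite a next step
    so the conversation does not terminate in a dead end."""
    if not follow_ups:
        return answer
    options = " or ".join(follow_ups)
    return f"{answer} Would you also like to {options}?"

reply = respond("Your order has shipped.",
                ["track the parcel", "update your delivery address"])
# The bot accepts the request (the "yes") and adds new options (the "and").
```

Even this trivial version shows the design choice: the follow-up list is supplied per intent, so every answer carries an invitation to continue.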

Many people find it difficult to hold long trains of thought, or take a little longer to process them. Because of this, translation apps would do well to give users enough time to gather their thoughts before treating a pause as the end of their turn. Training a bot to recognize filler words (so, erm, well, um, or like, in English, for example) and to associate a longer delivery time with those words is a good way to let users engage in a more realistic real-time conversation. Offering targeted “barge-in” programming (opportunities for users to interrupt the bot) is another way to more accurately simulate the active nature of conversation.

Future innovations in conversational artificial intelligence

Conversational AI still has a long way to go before all users feel accurately represented. Accounting for the subtleties of dialect, the time it takes for speakers to think, and the active nature of a conversation will be instrumental in moving this technology forward. Especially in the field of translation applications, handling pauses and filler words will improve the experience for everyone involved and produce a more natural, active conversation.

Drawing on a wider dataset in the back-end process, for example by learning from RP and Geordie English dialects, will prevent a translation from dropping in effectiveness due to pronunciation-processing issues. These innovations provide exciting possibilities, and it’s time for translation apps and bots to take linguistic subtleties and speech patterns into account.

Martin Curtis is CEO of Palaver
