
8. The ChatBot said WHAT?!?! - Artificial Intelligence Fails That Every Insurance Executive Should Know About

Written by Brendan McLoughlin | May 30, 2024 12:35:31 PM

Brendan McLoughlin, President of e123, is participating in an executive education course at the Massachusetts Institute of Technology on Artificial Intelligence (AI) and its implications for business strategy. This is the eighth in a series of blog posts where he shares the insights he is gaining and how they apply to health insurance distribution.

Artificial intelligence (AI) has the potential to dramatically disrupt health insurance distribution as we know it. But to harness AI as a competitive advantage, insurance executives must understand where the technology is well suited to tackle business problems and where it is likely to struggle. New examples of AI failures come to light every day, and for insurance executives seeking to implement AI in their distribution operations, knowing the weaknesses of the technology is just as important as understanding its power.

The infamous Air Canada case

In 2022, Air Canada’s website chatbot gave a customer very clear and precise information about the airline’s bereavement policy, stating: “If you need to travel immediately or have already traveled and would like to submit your ticket for a reduced bereavement rate, kindly do so within 90 days of the date your ticket was issued by completing our Ticket Refund Application form.” Well done, chatbot! The information provided was timely, thorough and understandable.

It was also 100% wrong.

When the customer later submitted a refund request, Air Canada denied it, despite the customer having a screenshot of the chatbot conversation. The airline pointed to its published bereavement policy, hosted on the very same website as the chatbot, which stated that post-travel refunds would not be honored. Air Canada ended up before British Columbia’s Civil Resolution Tribunal, which ruled that the airline was liable for information supplied by its chatbot, even when that information contradicted policies posted elsewhere on the site.

While the actual damages in this case were small, just a few hundred dollars for the refund, the implications for health insurance marketers are significant. Imagine the damage to a carrier, both financially and reputationally, if their chatbot erroneously told customers that a policy would cover 100% of cancer treatments. Customer experience is a vital strategic differentiator for all insurance companies, and the emotional and psychological damage created by such a mistake would far outweigh any cost advantages of implementing AI.

Don’t blame the chatbot

The poor chatbot did everything right - everything it was taught to do - and its failure demonstrates a challenge as old as computing: “garbage in, garbage out.” The chatbot’s answers were powered by a machine learning (ML) model trained to produce the appropriate answers to a huge variety of customer questions. But at no point does ML actually understand the question or the answer. Think of it like training a dog. It is easy (depending on the breed, of course) to train a dog to sit when you tell it to “sit”. But it is just as easy to train a dog to sit when you command it to “stand”. If the training is not conducted properly, ML models will return the wrong answer while being highly confident that their answer is right.
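For readers who want to see this failure mode in miniature, here is a hypothetical sketch in Python using scikit-learn. The questions, answers, and labels are invented for illustration - this is not Air Canada’s actual system - but it shows how a model trained on mislabeled examples returns the wrong answer with complete confidence.

```python
# "Garbage in, garbage out" with a toy intent classifier.
# Hypothetical questions and labels -- not Air Canada's actual system.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Training data in which every bereavement question is mapped to the
# WRONG answer (the real policy forbids post-travel refunds).
questions = [
    "can I get a bereavement refund after my trip",
    "how do I claim a bereavement fare after travelling",
    "is a retroactive bereavement discount available",
    "what is your checked bag allowance",
    "how many bags can I bring",
]
labels = [
    "refund_allowed_within_90_days",  # mislabeled
    "refund_allowed_within_90_days",  # mislabeled
    "refund_allowed_within_90_days",  # mislabeled
    "two_checked_bags",
    "two_checked_bags",
]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(questions, labels)

query = "can I apply for a bereavement refund after I have already flown"
answer = model.predict([query])[0]
confidence = model.predict_proba([query]).max()
print(f"answer={answer}  confidence={confidence:.0%}")
# The model is confident -- and confidently wrong -- because it faithfully
# learned the mapping it was given, like the dog trained to sit on "stand".
```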

AI can’t even catch COVID

One interesting case study of the challenges of training ML came from the COVID-19 pandemic. As the pandemic reached its peak, researchers scrambled to build ML algorithms to assess chest scans, hoping to identify early COVID cases that were likely to become severe. It seemed like a perfect use of ML, with lots of training data available and a very clear outcome to predict - either the patient ends up on a ventilator or not. MIT Technology Review reported on reviews of hundreds of these tools, which found that none of them helped and that many may have made outcomes worse.

The reviewers’ assessment was that these promising tools failed at the training stage. The algorithms were trained on a vast array of third-party data - chest scans from around the world. Problems arose because much of this data was duplicated, with different sources repeating the same scans and thus over-emphasizing certain factors during training. A telling example: most of the chest scans of severe cases were taken with the patient lying down, while a much larger portion of the mild-case patients were scanned sitting up. With such a strong correlation, the algorithms “learned” that the position of the spinal cord and rib cage were dominant indicators of whether a COVID case would become severe - utter nonsense to you or me, but rock-solid logic to the model.
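Here is a deliberately simplified sketch, with entirely synthetic data, of how a confound like scan position can dominate training. It is not a reconstruction of any of the reviewed COVID tools - just an illustration of the shortcut-learning pattern they fell into.

```python
# Synthetic illustration of a confounded training set: severity is almost
# perfectly correlated with whether the patient was scanned lying down, so
# the model "learns" the patient's position rather than anything clinical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

severe = rng.integers(0, 2, n)                       # outcome: 0 = mild, 1 = severe
lying_down = (severe == 1) | (rng.random(n) < 0.05)  # confound: severe cases scanned supine
lung_signal = severe + rng.normal(0, 2.0, n)         # weak, noisy clinical signal

X_train = np.column_stack([lying_down.astype(float), lung_signal])
model = LogisticRegression().fit(X_train, severe)
print("learned weights [position, lung_signal]:", model.coef_.round(2))

# At a hospital that scans everyone sitting up, the shortcut disappears
# and performance collapses toward a coin flip.
severe_new = rng.integers(0, 2, n)
X_new = np.column_stack([np.zeros(n), severe_new + rng.normal(0, 2.0, n)])
print("accuracy at the new site:", (model.predict(X_new) == severe_new).mean().round(2))
```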

One can easily imagine this kind of issue arising in L&H insurance. Let’s say, for example, we are training an ML algorithm to recommend the best policy for a new customer. In the training phase, the algorithm might be influenced by the fact that, on the whole, customers who don’t submit many claims (presumably those in good health) have higher satisfaction and renewal rates. In that case, the model could steer customers toward policies associated with low claims activity rather than adequate coverage, simply because fewer claims correlate with higher satisfaction and greater renewal in the historical data. Clearly that is not the objective and, in addition to unhappy customers, this kind of AI failure could cause tremendous damage to the carrier’s reputation and bottom line.
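A toy sketch of that misaligned objective, with invented plans and numbers (no real carrier data), might look like this: a recommender that optimizes predicted renewal ends up pushing the skimpiest plan even when the customer clearly needs more coverage.

```python
# Toy policy recommender trained on the wrong objective. All plans and
# numbers are invented for illustration; this is not a real carrier's model.
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    expected_claims_per_year: float  # richer plans get used more
    covers_specialist_care: bool

PLANS = [
    Plan("BareBones", 0.3, False),
    Plan("Standard", 1.2, True),
    Plan("Comprehensive", 2.5, True),
]

def predicted_renewal(plan: Plan) -> float:
    """Stand-in for an ML model that learned 'fewer claims => renewal'."""
    return max(0.0, 0.95 - 0.2 * plan.expected_claims_per_year)

def recommend(needs_specialist_care: bool) -> Plan:
    # Optimizing the proxy metric ignores whether the plan meets the need.
    return max(PLANS, key=predicted_renewal)

choice = recommend(needs_specialist_care=True)
print(choice.name, "- covers specialist care:", choice.covers_specialist_care)
# Prints "BareBones - covers specialist care: False": great on the proxy
# metric, wrong for the customer.
```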

“I do not think that word means what you think it means”

Another area where AI can struggle is sentiment analysis, and these AI failures can range from comical to significant. Sentiment analysis is the subset of AI that tries to assess the emotional tone behind a statement, which helps determine the attitudes, opinions, and emotions expressed within an online mention. Sentiment analysis is commonly used in applications such as customer feedback, social media monitoring, and market research, where it helps businesses classify comments as positive, negative or neutral.

While this evolving field within AI has tremendous promise for business applications, it is also particularly susceptible to the limitations of AI. Sentiment analysis was used in the early days of automated trading, and that is where one suspected failure surfaced. In 2011, a journalist noticed that over the previous three years, every time actress Anne Hathaway was in the news for something positive - a movie premiere, hosting the Oscars - Berkshire Hathaway stock would see outsized daily gains. He hypothesized that automated trading algorithms were picking up the news articles and using sentiment analysis to determine that people were saying positive things about “Hathaway”, concluding that Berkshire Hathaway stock was a “buy”. Interestingly, this theory can neither be proven nor disproven, because even the designers of such algorithms often cannot explain why their models reach the conclusions they do.
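For the technically curious, here is a toy illustration of how that kind of mix-up could happen. The word lists, headline, and ticker mapping are all invented, and real trading systems are far more sophisticated - but naive keyword matching plus simple sentiment scoring is enough to attribute celebrity buzz to an unrelated stock.

```python
# Toy news-sentiment "trader" that confuses Anne Hathaway with Berkshire
# Hathaway. Lexicon, ticker map, and headline are invented for illustration.
POSITIVE_WORDS = {"dazzles", "praise", "stellar", "triumph", "acclaimed"}
NEGATIVE_WORDS = {"flop", "slump", "lawsuit", "scandal", "downgrade"}

# Naive entity matching: any mention of "hathaway" maps to the ticker.
TICKER_KEYWORDS = {"hathaway": "BRK.A", "berkshire": "BRK.A"}

def sentiment(headline: str) -> int:
    words = set(headline.lower().split())
    return len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)

def tickers_mentioned(headline: str) -> set[str]:
    return {TICKER_KEYWORDS[w] for w in headline.lower().split() if w in TICKER_KEYWORDS}

headline = "Anne Hathaway dazzles as Oscars host and critics praise stellar performance"
for ticker in tickers_mentioned(headline):
    score = sentiment(headline)
    signal = "BUY" if score > 0 else "SELL" if score < 0 else "HOLD"
    print(ticker, signal, f"(sentiment score {score})")
# Output: BRK.A BUY (sentiment score 3) -- positive celebrity news
# misattributed to an unrelated stock.
```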

Another, more obvious example of AI struggling with sentiment analysis comes from WestJet’s customer feedback system, which is powered by sentiment-driven responses. A seemingly satisfied customer left the following comment in the system:

“Shout out to the crew member on my flight today who helped me take care of a plant cutting by breaking out her own duct tape and helping me make a protective case to get a new succulent home.”

To this comment, the AI-powered system inexplicably responded, “We take these comments very seriously. If you’re having these thoughts, please reach out to the Association for Suicide Prevention,” and provided the phone number and website for a suicide prevention hotline. The interaction went viral and, while harmless on its own, got people talking about WestJet for all the wrong reasons.

The importance of executive leadership

AI is getting more sophisticated every day, and data scientists are working tirelessly to address the shortcomings inherent in the technology. But there is broad agreement that the current state of AI is far from perfect, and business implementations of AI will continue to be susceptible to the limitations of the technology and of the humans who build it.

So as insurance business leaders, it is our responsibility to help minimize or compensate for the areas where AI solutions are likely to fail. One simple way to start is to ensure that AI is not implemented in a technology bubble, but rather is executed as part of a well-thought-out strategic plan.

Conclusion

While AI holds tremendous potential to transform the life and health insurance distribution landscape, it is imperative for executives to understand its limitations as well as its capabilities. The examples of AI failures, from chatbots providing incorrect information to sentiment analysis misinterpretations, underscore the importance of cautious and well-informed implementation. By integrating strategic business goals with AI projects, involving diverse teams, starting with small, focused initiatives, and ensuring human oversight, insurance companies can harness AI’s power effectively while mitigating risks. As the technology continues to evolve, those who approach AI with a balanced perspective of its strengths and weaknesses will be best positioned to achieve competitive advantages and drive innovation in the industry.

Want to learn more about the future of AI in insurance distribution? Get in touch here. For prior posts in this series, click here.