3 ways the best product teams solve AI feedback problems
Sara Remsen
Jul 25, 2024
The promise of AI, for so many users and companies, is the idea of continuous improvement. It's not enough to simply launch a customer chatbot or write an AI-generated email; the most valuable AI models learn from user behavior and human feedback so they can nimbly meet our needs.
With recent strides in foundation large language models (LLMs) like OpenAI's GPT and Anthropic's Claude, which can automate complex workflows and generate near-infinite content, this dream of customization and flexibility feels even closer.
It's an exciting vision, but there are some big stumbling blocks even the most nimble product teams encounter on the path to that goal: collecting and applying human feedback isn't as simple as you might think, and it's important to build a healthy foundation for your product from the get-go.
We’ve talked to hundreds of product teams and gone through this process firsthand ourselves, and we often see teams approach the process by “eyeballing it” or doing “vibe checks.” Honestly, this is fine to get a product out the door, but the process starts to break down as volume scales and pressure to drive business growth with AI investment ramps up.
The best AI teams have a few approaches to collect, organize, and process human feedback to create the data flywheel necessary for continuous AI improvement. These teams invest in:
Collecting explicit feedback in addition to implicit feedback
Including the data context when collecting feedback
Recognizing that human feedback needs to be reviewed to unlock AI improvement
An example of OpenAI's explicit thumbs up / down feedback
1. Collecting explicit feedback in addition to implicit feedback
As its name implies, generative AI features create something for the end user: an email, an image, a search result, code, etc. When teams build these capabilities into their products, they often show up as AI-generated suggestions that the user can then accept or reject.
Good product teams track this implicit feedback: the higher the rates of “acceptance” in this user flow, the better the model performs. However, the best AI product teams know that explicit feedback is equally important.
What can go wrong: Implicit feedback metrics can mask real user behavior. For example, a user may accept an AI-generated email and then go back and make multiple edits. Basing model improvements solely on this metric means building on many false positives.
What to do instead: Implicit feedback is important, but it's critical to track explicit feedback as well. A simple thumbs up / thumbs down with a comment box gives users an avenue to flag real issues. If the acceptance rate for your AI-generated emails is high but the explicit thumbs-up ratio is low, it becomes much clearer where you may want to update your model to improve the experience for customers.
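As a rough sketch (the event fields and function names here are hypothetical, not any particular vendor's API), tracking both signals side by side might look something like this:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical event schema: one record per AI suggestion shown to a user.
@dataclass
class SuggestionEvent:
    suggestion_id: str
    generated_text: str
    accepted: bool | None = None       # implicit: did the user accept the suggestion?
    edited_after_accept: bool = False  # implicit: did they rewrite it afterwards?
    thumbs: str | None = None          # explicit: "up" or "down"
    comment: str | None = None         # explicit: free text from the comment box
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def summarize(events: list[SuggestionEvent]) -> dict:
    """Compare the implicit acceptance rate with the explicit thumbs-up ratio."""
    accepted = [e for e in events if e.accepted]
    rated = [e for e in events if e.thumbs is not None]
    return {
        "acceptance_rate": len(accepted) / len(events) if events else 0.0,
        "thumbs_up_ratio": sum(e.thumbs == "up" for e in rated) / len(rated) if rated else 0.0,
        "edits_after_accept": sum(e.edited_after_accept for e in accepted),
    }
```

A large gap between acceptance_rate and thumbs_up_ratio (or a pile of post-acceptance edits) is exactly the false-positive signal that acceptance rates alone would hide.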
Pendo's feedback component, which helps users provide feedback on screens, not on AI-generated data
2. Including data context with the feedback
Any feedback is better than no feedback, but most off-the-shelf tools are missing the most important part of AI user feedback: the data context.
What can go wrong: Traditional bug-reporting and feedback tools, like Pendo or Userbit, were designed for users to provide feedback on screens. They flag problems like "I tried to check out but I got this error," and it's easy for the product team to recreate the error by going to the same page.
In the age of AI, we need feedback tools designed to capture feedback on the AI-generated data that's specific to each user. If a user leaves the feedback "the action items in this email aren't right," the product team needs to see what those action items actually were in order to diagnose and fix the problem.
What to do instead: When collecting user feedback, make sure it's tracked alongside the data the user actually saw. This happens automatically in Melodi, but you can also link context and feedback yourself if you are logging your AI responses on another platform. When the feedback sits in the context of the data the user was assessing, everyone on the product team can diagnose and fix the problem much faster.
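If you're wiring this up yourself, a minimal sketch might look like the following; the response_id handoff is the key idea, while the store and field names are illustrative assumptions:

```python
import json
import uuid

# Hypothetical in-memory log keyed by response ID; in practice this would be
# the analytics or observability store where you already log LLM calls.
RESPONSE_LOG: dict[str, dict] = {}

def log_ai_response(user_prompt: str, model_output: str, model: str) -> str:
    """Store the exact data the user saw and return an ID to attach feedback to."""
    response_id = str(uuid.uuid4())
    RESPONSE_LOG[response_id] = {
        "user_prompt": user_prompt,
        "model_output": model_output,  # e.g. the generated email with its action items
        "model": model,
    }
    return response_id

def log_feedback(response_id: str, thumbs: str, comment: str | None = None) -> dict:
    """Attach explicit feedback to the response the user was actually looking at."""
    record = {
        "response_id": response_id,
        "thumbs": thumbs,
        "comment": comment,
        "context": RESPONSE_LOG.get(response_id),
    }
    print(json.dumps(record, indent=2))  # stand-in for writing to your feedback store
    return record

# Usage: the front end passes response_id back when the user clicks thumbs down.
rid = log_ai_response("Summarize this thread with action items", "1. Ship the release ...", "gpt-4o")
log_feedback(rid, "down", "the action items in this email aren't right")
```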
Melodi's issue dashboard that organizes human feedback into resolvable issues
3. Recognizing that human feedback needs to be reviewed to unlock AI improvement
So now you have all of this great feedback — your customers clearly want better action items, and aren't sold on the summary. At this point, it should be easy to use that feedback to improve the model itself, right? Sadly, no.
Teams get tripped up all the time by the assumption that an AI model can be automatically improved with feedback. The reality is that human feedback is noisy, and it needs to be processed to find the signal that will unlock improvement.
What can go wrong: Collecting the feedback is just the first step in being able to diagnose the problem. But in order to actually fix the model, the feedback itself needs to be organized, sorted, and classified into a format that the AI model can then ingest programmatically. Without this second step, teams are often left making optimistic changes to their model and hoping that they fix the problem without breaking something else. This is often called “eyeballing it” or doing “vibe checks.”
What to do instead: In order to programmatically use the feedback to improve the model, teams need to filter the feedback by specific intent or task, then classify the issue, and then manually correct the LLM-generated data. This manual classification and correction step is a painful truth for building better AI products, but it unlocks few-shot prompting and automated regression tests for that particular issue to make sure it never happens again.
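As a sketch of what that processing step can feed into (the ReviewedFeedback fields and helper names below are illustrative assumptions, not a prescribed schema), each corrected example can be reused both as a few-shot example and as a regression test case:

```python
from dataclasses import dataclass

# Hypothetical structure for one piece of reviewed feedback: the raw complaint
# plus the labels and correction a reviewer added during triage.
@dataclass
class ReviewedFeedback:
    intent: str            # e.g. "summarize_email"
    issue: str             # e.g. "action_items_wrong"
    model_input: str       # what the model was asked to do
    model_output: str      # what it actually produced
    corrected_output: str  # what it should have produced

def few_shot_examples(reviewed: list[ReviewedFeedback], intent: str, issue: str, k: int = 3) -> str:
    """Turn corrections for one intent/issue into few-shot examples to prepend to the prompt."""
    picked = [r for r in reviewed if r.intent == intent and r.issue == issue][:k]
    return "\n\n".join(
        f"Input:\n{r.model_input}\nGood output:\n{r.corrected_output}" for r in picked
    )

def regression_cases(reviewed: list[ReviewedFeedback], intent: str) -> list[tuple[str, str]]:
    """Each correction becomes an (input, expected) pair to re-run on every prompt or model change."""
    return [(r.model_input, r.corrected_output) for r in reviewed if r.intent == intent]
```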
What does this mean for you?
Building great AI products isn't just about collecting feedback: it's about collecting the right feedback and integrating it thoughtfully into your workflow.
There are a number of ways to apply these lessons to your own business, and that's why we launched Melodi, a product designed specifically to help product teams become high-performing AI teams. We know it can be a challenge to drive real business growth with AI investment, and we wanted to make that process easier and faster.
Melodi streamlines the process of integrating customer feedback to unlock continuous improvement for your AI models. It grades AI performance by customer intent, identifies issues through real-time monitoring, provides diagnostic tools, and automatically evaluates new model changes against custom and standard LLM metrics.
If you collect explicit feedback, collect context with that feedback, and build feedback processing into your workflow, you’ll be able to build an AI-powered product that provides real value for your customers. And that will bring us all one step closer to genuinely useful, profitable generative AI implementation.
If you're interested in learning more about how Melodi can turn human feedback into AI improvements, send us a note at info@melodi.fyi.