Watermelon Guide
Step up your automation game with AI
Harness the power of AI and unlock your chatbot’s full potential.
How to create a good dataset
Building your own AI-powered digital assistant starts with creating a good dataset. If you want your chatbot to actually understand your customers, you need to teach your chatbot which questions your customers ask and how they ask them.
A good place to start is a list of frequently asked questions. You may already have one on your website, but it’s also wise to check with your customer service team whether people often ask them any other questions.
Once you have your list, you’ll probably see quite a bit of overlap between the questions. After all, people can ask the same question in many different ways, and different questions can share the same or a similar intention. Group these similar questions together, because we’ll need the groups in the next step.
What makes an AI chatbot such a powerful tool is that it can understand hundreds of different ways someone might phrase a question. To give your chatbot the right context to learn from, we are going to teach it the frequently asked questions plus a few variations on each one. The groups of similar questions you created in the previous step are the start of your dataset.
A good dataset contains, for each intention, a question your customers might ask and at least two variations of that question. The variants are other ways your customers might phrase the same question. An example of a good dataset:
Question: Can I make an appointment with you?
Variant 1: I would like to plan an appointment.
Variant 2: How do I schedule a meeting?
In this example, the customer wants to make an appointment with a business. As you can see, the three sentences convey the same intention in very different ways. What makes this dataset good is that each variant uses different words and a different sentence length. The key phrase that conveys the intention here is ‘make an appointment’. For this phrase we have used synonyms to give the chatbot more context: instead of only ‘make an appointment’, we also use ‘schedule a meeting’ and ‘plan an appointment’.
From these synonyms, the chatbot can learn that the different words convey the same intention and use that knowledge to recognise other combinations of words on its own. Then, when someone types a variation of the question that isn’t in your dataset, the chatbot can still recognise the intention and link it to the correct response.
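To make the idea of a question group concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory format (this guide does not describe how Watermelon stores or imports datasets): each group becomes an intent with a name and a response, and every phrasing in the group becomes one labelled example that points to the same intent. The names `Intent`, `training_examples` and `make_appointment`, and the response text, are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """One group of questions sharing the same intention (hypothetical format)."""
    name: str
    question: str
    variants: list[str] = field(default_factory=list)
    response: str = ""

    def phrasings(self) -> list[str]:
        # The original question followed by all of its variants.
        return [self.question, *self.variants]


def training_examples(intents: list[Intent]) -> list[tuple[str, str]]:
    """Flatten intents into (phrasing, intent name) pairs, one pair per phrasing."""
    return [(text, intent.name) for intent in intents for text in intent.phrasings()]


appointment = Intent(
    name="make_appointment",
    question="Can I make an appointment with you?",
    variants=[
        "I would like to plan an appointment.",
        "How do I schedule a meeting?",
    ],
    response="You can book an appointment on our website.",  # hypothetical reply
)

for text, label in training_examples([appointment]):
    print(f"{label}: {text}")
```

Because all three phrasings carry the same label, they all lead to the same response, which is what lets the chatbot treat them as one intention.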
When creating your dataset, we advise you to think of at least two variants for each question. This gives your chatbot enough context to analyse and learn from. If you can think of more variations, your chatbot will find it easier to expand its knowledge. Ideally, each question in your dataset has four or five different variations.
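If you keep your dataset in a spreadsheet or file, you can check it against this rule of thumb before uploading it. The sketch below is a hypothetical helper, not part of any Watermelon tool. It assumes a simple dictionary that maps each question to its list of variants, and it counts only non-blank variants that differ from each other (ignoring case and surrounding spaces), since an exact duplicate adds no new phrasing.

```python
def rate_variants(variants: list[str]) -> str:
    """Rate a question's variants against the guide's rule of thumb (hypothetical helper)."""
    # Count distinct, non-blank variants; exact duplicates add no new phrasing.
    unique = {v.strip().lower() for v in variants if v.strip()}
    count = len(unique)
    if count < 2:
        return "too few"  # below the recommended minimum of two variants
    if count < 4:
        return "enough"   # meets the minimum of two
    return "ideal"        # four or more variations


# Hypothetical dataset: each question maps to its list of variants.
dataset = {
    "Can I make an appointment with you?": [
        "I would like to plan an appointment.",
        "How do I schedule a meeting?",
    ],
    "What are your opening hours?": [
        "When are you open?",
    ],
}

for question, variants in dataset.items():
    print(f"{rate_variants(variants):>8}  {question}")
```

In this sketch the appointment question is rated "enough", while the opening-hours question is flagged as "too few" because it has only one variant.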