Preparing your content for uploading
Preparing your content before you upload it to your AI can make training faster and more effective. Here is some guidance, along with our top tips.
Understanding prompts and completions
First up, if you haven't already, take a look at our guidance on prompts and completions so you understand the training data you are creating.
Finding your content
Your knowledge exists in many places. Here are some places to look:
- Written content: books, articles, blog posts, website pages
- Social content: LinkedIn posts, Twitter threads, forum responses
- Speaking content: keynotes, YouTube videos, podcast interviews
- Teaching content: course materials, speaking notes, frameworks
- Client communications: emails, WhatsApp messages, FAQ responses
Selecting your best content
Your content needs to be clear and informative to generate good training data.
Useful content includes:
- FAQ documents
- Blog posts and articles
- Course materials with structured content
- Copy for your website about you and your work
Content to avoid uploading without transforming it first includes:
- Unedited podcast or video transcripts
- Text with long, complex sentences
- Data in tables or unusual formats
- PowerPoint files with just bullet points
Topics to cover
Start with the basics - Begin by uploading information about:
- Your background and experience
- What you offer clients
- Your coaching philosophy
- How you typically work with clients
Then add your specialist topics - Create training data for each main topic you cover.
For example, if you're a business coach, you might cover your approach to leadership development and how you help with team building.
Include specific scenarios - Think about particular situations your clients often face:
- Common client challenges
- Typical questions in your field
- Situations where clients need support
Add general information too - Think about the general questions clients might ask your AI, such as:
- Who you are
- Where you are located
- What your AI is trained on
Top tip: Your AI is smart about matching similar questions. If you train it to answer "How can I be more productive?", it can also handle "What's the best way to get more done?" or "I'm struggling to manage my time".
Preparing your content
Some documents will be ready to upload straight away, such as blog posts and FAQs.
Some may need organising into an ideal format first:
- Simplify: Save any documents with complex formatting or lots of images as plain text.
- Clarify: Check the copy makes sense for your audience, and remove any unnecessary jargon.
- Split: Break long documents, such as books, into chapters or sections.
- Convert: Turn video and audio content into text transcripts.
- Save: Save/download/export in one of the required file formats (see below).
- Transform: Some types of content, such as transcripts, are best transformed into prompts (questions/statements) and completions (answers) before uploading. See below for how.
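If your book or long document is already saved as plain text, the Split step can be sketched in a few lines of Python. This is only an illustration: it assumes each chapter starts with a line such as "Chapter 1: ...", and the function name and heading pattern are made up for the example, so adjust them to match your own document.

```python
import re

def split_into_sections(text, heading_pattern=r"^Chapter \d+.*$"):
    """Split plain text into (heading, body) pairs.

    Assumption: a new section starts at any line matching heading_pattern.
    Text before the first heading is kept under "Front matter".
    Sections with an empty body are skipped.
    """
    sections = []
    heading = "Front matter"
    body_lines = []
    for line in text.splitlines():
        if re.match(heading_pattern, line.strip()):
            # Close the previous section before starting a new one.
            if any(l.strip() for l in body_lines):
                sections.append((heading, "\n".join(body_lines).strip()))
            heading = line.strip()
            body_lines = []
        else:
            body_lines.append(line)
    # Close the final section.
    if any(l.strip() for l in body_lines):
        sections.append((heading, "\n".join(body_lines).strip()))
    return sections

# Made-up sample text, purely for illustration.
book = """Intro text.
Chapter 1: Listening
Listen first.
Chapter 2: Goals
Set clear goals."""

sections = split_into_sections(book)
for heading, body in sections:
    print(heading, "->", body)
```

Each pair can then be saved as its own file, so that every upload covers a single chapter or section.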
File formats
You can upload .pdf, .docx and .txt files in knowledge upload.
You can also upload .csv files of prompts & completions in training data.
Top tip: Using other file types such as Google Docs? No problem. Nearly all programs let you download or export files in other formats. Download your file in .pdf, .docx or .txt format, ready for uploading.
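If you have a folder full of files to sort through, a short script can tell you where each one belongs. The sketch below is a minimal illustration based only on the file types listed above. The function name and the "convert first" label are hypothetical and not part of the product.

```python
from pathlib import Path

# File types taken from the list above.
KNOWLEDGE_EXTENSIONS = {".pdf", ".docx", ".txt"}
TRAINING_DATA_EXTENSIONS = {".csv"}

def upload_destination(filename):
    """Return where a file can be uploaded, judged by its extension alone."""
    extension = Path(filename).suffix.lower()
    if extension in KNOWLEDGE_EXTENSIONS:
        return "knowledge upload"
    if extension in TRAINING_DATA_EXTENSIONS:
        return "training data"
    # Anything else needs exporting to a supported format first.
    return "convert first"

# Made-up file names, purely for illustration.
for name in ["My Book.PDF", "faq.docx", "prompts.csv", "slides.pptx"]:
    print(name, "->", upload_destination(name))
```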
Using AI to help transform content into training data
For content that isn't yet in an ideal format for uploading, you can ask ChatGPT, Claude or another large language model to turn it into suitable prompts (questions/statements in the voice of your audience) and completions (answers in your voice) with just a little prompting.
Here are some instructions you can use with different types of content:
"Here’s a transcript of one of my YouTube videos. Turn it into a set of questions and answers as if I (the coach) were advising someone (the coachee). Phrase the questions in a way that my target audience of [describe audience] might write them. Display these in a two-column table where column 1 is named "prompt" (the question), column 2 is named "completion" (the answer).
<insert transcript>"
"Here’s a transcript of one of my podcast episodes. Turn it into a set of questions and answers as if I (the podcast host) were advising my audience. Phrase the questions in a way that my target audience of [describe audience] might write them. Display these in a two-column table where column 1 is named "prompt" (the question), column 2 is named "completion" (the answer).
<insert transcript>"
"Here’s a transcript of an interview where I [your name] was interviewed by [interviewer name]. Turn it into a set of questions and answers as if I [your name] were advising my audience [describe target audience]. Display these in a two-column table where column 1 is named "prompt" (the question), column 2 is named "completion" (the answer).
<insert transcript>"
"Here’s a transcript of a talk I gave. Turn it into a set of questions and answers as if I (the coach) were advising someone (the coachee). Display these in a two-column table where column 1 is named "prompt" (the question), column 2 is named "completion" (the answer).
<insert transcript>"
"Here’s an anonymised transcript of a client coaching session between me (the coach) and my client. Turn it into a set of questions and answers I can use for training the AI coach version of me. Display these in a two-column table where column 1 is named "prompt" (the client's question/statement), and column 2 is named "completion" (the response from me, the coach). Do not include any personal, identifying or sensitive information at all. Do not reference the transcript or coaching session. Remove greetings, filler words, rambling and small talk - only include the meaningful coaching moments that would be helpful for future client interactions.
<insert transcript>"
"Here are some emails I wrote. Turn them into a set of questions and answers as if I (the coach) were advising someone (the coachee). Display these in a two-column table where column 1 is named "prompt" (the question), column 2 is named "completion" (the answer).
<insert content>"
"Here’s some text from my LinkedIn profile. Turn it into a set of questions and answers as if I (the author) were answering questions about myself. Display these in a two-column table where column 1 is named "prompt" (the question), column 2 is named "completion" (the answer).
<insert content>"
"Here’s some text from my social media posts. Turn them into a set of questions and answers as if I (the author) were answering questions from my audience. Display these in a two-column table where column 1 is named "prompt" (the question), column 2 is named "completion" (the answer).
<insert content>"
"Here’s a copy of my frameworks and methods. Turn this into a set of questions and answers where the questions are what my audience [describe target audience] might ask if they want to learn my specific systems and steps, and the answers are explanations from me, the coach. Include questions about different aspects and make sure my answers enable my audience to understand and apply the methods. Display these in a two-column table where column 1 is named ‘prompt’ (the question) and column 2 is named ‘completion’ (the answer).
<insert content>"
"Here is a list of resources I would like my AI to refer clients to when appropriate. My target audience is [describe target audience]. Turn this list into a set of questions and answers where the questions are phrases a client might naturally say when looking for help or recommendations, and the answers are short, helpful responses from me - the coach - that signpost clients to the relevant resource. For example: “For help with [topic], you could look at [resource] at: [link].” Display the results in a two-column table where column 1 is titled 'prompt' (the question or statement), column 2 is titled 'completion' (the answer).
It may be appropriate to include multiple prompts and completions for the same resource to reflect the different situations when this could be relevant to mention.
<insert content>"
Important! Before turning transcripts, communications, or any other documents into training data, remove anything that could identify an individual or organisation, including names, contact details and other personal information.
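As a first pass, some contact details can be stripped automatically before you review the text by hand. The Python sketch below is a rough illustration only: the two patterns are simplified assumptions that catch common email addresses and phone numbers, they may occasionally match other long runs of digits, and they will not catch names, organisations or other identifying details. Always read through the result yourself.

```python
import re

# Simplified patterns (assumptions): not a complete definition of
# every valid email address or phone number format.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_contact_details(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[email removed]", text)
    text = PHONE_RE.sub("[phone removed]", text)
    return text

# Made-up sample text, purely for illustration.
sample = "Email jane.doe@example.com or call +44 20 7946 0958."
print(redact_contact_details(sample))
```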
You could also prompt the AI to add labels to the table. Labels can help you when reviewing your training data, but are not essential as they are not used by your AI.
Optional addition for labels:
"Now add a third column to the table titled 'labels' and give each row a suitable one or two-word label that summarises the theme of the question and answer pair. If multiple labels apply, separate them with commas."
Further optional instruction:
"Here's a menu of labels you can use: [Add label list]"
Uploading a CSV file of prompts and completions
Once you've transformed content into a table of prompts and completions, you can upload a CSV file of these straight to your training data - Instructions here.
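ChatGPT, Claude and similar tools often return tables as text with "|" between the columns. If you cannot export the table directly, a small script can turn it into CSV. The sketch below is an illustration under assumptions: it expects a simple pipe-separated table with a header row, it will not cope with cells that themselves contain "|", and the function name is made up. The completion in the sample is invented for the example.

```python
import csv
import io

def markdown_table_to_csv(table_text):
    """Convert a simple pipe-separated table into CSV text."""
    rows = []
    for line in table_text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row that sits under the header (e.g. | --- | --- |).
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    output = io.StringIO()
    # The csv module adds quotes around cells that contain commas.
    csv.writer(output, lineterminator="\n").writerows(rows)
    return output.getvalue()

# Sample table with an invented completion, purely for illustration.
table = """| prompt | completion |
| --- | --- |
| How can I be more productive? | Start by choosing one priority, then protect time for it. |"""

csv_text = markdown_table_to_csv(table)
print(csv_text)
```

Save the result in a file ending in .csv, then open it to check that every row has a prompt and a completion before uploading.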
Further guidance
Watch the following video for more guidance on optimising your training data.