How to choose prompts for AI visibility monitoring

Prompts for AI visibility monitoring should not be collected at random. Otherwise, even a good tool produces noise: the model mentions the brand in one answer and omits it in another, but the dataset says little about demand or competitive gaps.

The goal is to choose ChatGPT tracking prompts that represent real decision scenarios. This guide explains how to build an AI search query set, track ChatGPT recommendations, and separate useful GEO keyword research from noisy phrases.

Two common mistakes at the start

Two mistakes are most common when a team first builds a list:

Brand queries. Teams check "what is [brand name]", "[brand name] services", "[brand] reviews". ChatGPT, as expected, returns information about the brand. The report looks "green" but shows nothing useful, because users rarely ask such questions before choosing a contractor.
Overly general questions. "What is marketing", "how does advertising work", "why CRM". The models give a reference-style answer without naming brands, so no competitive gap is visible.

A working pool balances queries that actually lead the user to a choice with queries in which competitive dynamics are visible.

[Illustration: Kozak filters noisy queries and keeps a useful pool with real choice intent]

Start with demand scenarios, not a list of phrases

Before writing specific wording, it is useful to describe the situations in which a person actually turns to AI for a recommendation. The question is not "which keywords do we want to monitor" but "when does a user really ask a model for a recommendation".

The three main groups of scenarios are:

Scenario Type	What the User Is Looking for	Examples
Category	Type of business, product or service	"contextual advertising agency in Kyiv", "CRM for small business", "laser surgery clinic"
Comparative	The difference between the options	"HubSpot vs Salesforce", "which is better: Pixel or Samsung", "iHerb or Rozetka"
Problem-oriented	A way to solve a problem	"how to automate customer accounting", "how to reduce cost per lead", "how to prepare for knee surgery"

Separately, there is a fourth group: clarifying queries, which narrow the category by budget, region, audience, or format ("CRM up to $20 per month", "pediatric dentist Lviv", "PPC specialist for B2B SaaS"). They are added to the basic three.

Principle: JTBD first, then phrases

JTBD (Jobs To Be Done) works well for AI monitoring. Instead of "collecting keywords with a tool", we first record the jobs the user is trying to get done, and only then translate them into queries.

Example for a CRM service:

Jobs: "choose a CRM that suits a sales team of 8", "understand whether migrating from Sheets is possible without pain", "find a CRM that connects to [tool]", "understand how much implementation really costs".
Queries: "CRM for a sales team of up to 10 people", "how to switch from Excel/Google Sheets to CRM", "CRM with Telegram integration", "how much does it cost to implement CRM in Ukraine".

Prompts drawn from real customer conversations usually produce a more relevant view than keyword-planner exports alone.
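The job-to-query mapping from the CRM example can be kept as plain data, so that every monitored query stays traceable to the job it came from. The following is a minimal sketch in Python; the `Job` record and the `flatten_pool` helper are hypothetical names used for illustration and are not part of any tool mentioned in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One user job (JTBD) and the queries derived from it."""
    description: str
    queries: list[str] = field(default_factory=list)


def flatten_pool(jobs: list[Job]) -> list[dict]:
    """Return one row per unique query, keeping a reference to its job."""
    rows = []
    seen = set()
    for job in jobs:
        for query in job.queries:
            key = query.strip().lower()
            if key in seen:  # skip duplicates that came from different jobs
                continue
            seen.add(key)
            rows.append({"query": query.strip(), "job": job.description})
    return rows


# Jobs and queries paraphrased from the CRM example in this section.
crm_jobs = [
    Job("choose a CRM for a sales team of 8",
        ["CRM for a sales team of up to 10 people"]),
    Job("understand whether migrating from Sheets is painless",
        ["how to switch from Excel/Google Sheets to CRM"]),
    Job("understand how much implementation really costs",
        ["how much does it cost to implement CRM in Ukraine"]),
]

pool = flatten_pool(crm_jobs)
```

Keeping the job next to each query makes later reviews easier: when a query turns out to be noise, it is clear which job still needs a better wording.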

[Illustration: Kozak turns JTBD, context, and user jobs into monitoring queries]

How many queries to start with

A realistic benchmark for the first iteration is 40–80 queries. With fewer, the sample is too small to show repeatability. With more, quality is hard to maintain without automation, and analysis takes longer.

Niche size	Recommended starting pool
Local business, 1 category	30-40 queries
Medium agency / SaaS, 2-4 service lines	60-80 queries
Large brand with multiple categories	100-150 queries, grouped by segment

For the second iteration, the pool is usually expanded 1.5–2 times, after removing the queries that turned out to be noise.

4x4 Framework for the First Pool

If there is no time to work through JTBD, a simple grid is enough to start:

4 category-level queries;
4 comparative queries;
4 problem-oriented queries;
4 clarifying queries for key segments/products.

The result is 16 queries. After the first run through ChatGPT, Gemini, Claude and Perplexity, that is enough to see:

where the brand appears consistently;
in which topics competitors dominate;
what sources are repeated between models;
which pages of the site AI pulls up most often.

After the first iteration, expand the grid to 6x6 or 8x8 — adding language variations (for example, English-language queries), regions, and segments.
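The grid is easy to check mechanically before a run. The sketch below counts queries per scenario type and reports how many are still missing; the function name `check_grid` and the short type labels are assumptions made for this example.

```python
from collections import Counter

# Short labels for the four scenario types used in this guide.
SCENARIO_TYPES = ("category", "comparative", "problem", "clarifying")


def check_grid(pool: list[tuple[str, str]], per_type: int = 4) -> dict[str, int]:
    """Return how many queries each scenario type still lacks (0 = full)."""
    counts = Counter(scenario for scenario, _ in pool)
    unknown = set(counts) - set(SCENARIO_TYPES)
    if unknown:
        raise ValueError(f"unknown scenario types: {sorted(unknown)}")
    return {s: max(per_type - counts[s], 0) for s in SCENARIO_TYPES}


# A partially filled pool: one query per type, taken from earlier examples.
starter_pool = [
    ("category", "CRM for small business"),
    ("comparative", "HubSpot vs Salesforce"),
    ("problem", "how to automate customer accounting"),
    ("clarifying", "CRM up to $20 per month"),
]

missing = check_grid(starter_pool)  # three queries short in every type
```

For a 6x6 or 8x8 grid, the same check works with `per_type=6` or `per_type=8`.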

Where to get query ideas

Not from your own head. Reliable sources of ideas:

Source	What we take from it
Google Search Console	Queries for which the site already receives impressions, especially those with high impressions but low CTR
Sales team calls	How customers actually phrase their tasks, which is almost never in keyword phrases
Support tickets	Clarifying questions that arise after the choice is made; some of them fit the problem-oriented pool
Reddit, Quora, specialized forums	Living language, real questions, pros-and-cons context
AnswerThePublic, AlsoAsked	Structured question variants around one topic
Reddit threads + Google "People Also Ask"	A check of how questions sound in real dialogues
Competitor content	Comparison page titles and FAQ sections of competitors
Suggestions from ChatGPT/Gemini itself	The output of the query "what questions do people ask before choosing X", filtered manually

A combination of 3-4 sources gives enough material not to "invent" queries.

[Illustration: Kozak gathers query ideas from GSC, sales, support, and forums into one prompt pool]

What does a noise query look like?

Not every wording is useful for monitoring. A noise query usually has the following features:

there is no clear choice intent ("what is marketing");
it is not clear why the brand should appear in the answer;
the model responds with an explanation, without recommendations;
the query is not tied to topics in which the business actually competes for demand;
it is too broad ("business in Ukraine", "technology").

Informational queries should not be removed completely. They are useful as a second layer and show where your explanatory content is weak. But they do not work as the basis for monitoring.
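Some of the noise features listed above can be pre-screened automatically before a manual review. The following is a rough heuristic sketch, not a reliable classifier: the opener and choice-marker lists are assumptions chosen for English queries, and broad multi-word phrases such as "business in Ukraine" will pass through it and still need a human decision.

```python
# Openers typical of reference-style questions (an assumed, English-only list).
REFERENCE_OPENERS = ("what is", "what are", "how does", "how do", "why ")
# Words that usually signal a choice between options (also an assumption).
CHOICE_MARKERS = ("best", " vs ", " or ", " for ")


def noise_reasons(query: str, brand_terms: tuple[str, ...] = ()) -> list[str]:
    """Return the reasons a query looks like noise; an empty list means keep."""
    q = query.strip().lower()
    reasons = []
    has_choice = any(marker in q for marker in CHOICE_MARKERS)
    if q.startswith(REFERENCE_OPENERS) and not has_choice:
        reasons.append("reference-style question without choice intent")
    if len(q.split()) < 2:
        reasons.append("too broad: single word")
    if any(term.lower() in q for term in brand_terms):
        reasons.append("brand query")
    return reasons


candidates = ["what is marketing", "how to automate customer accounting", "technology"]
core = [q for q in candidates if not noise_reasons(q)]
second_layer = [q for q in candidates if noise_reasons(q)]
```

Flagged queries go to a second layer rather than being deleted, which matches the point that informational queries remain useful outside the core pool.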

How to adapt queries to different models

ChatGPT, Gemini, Claude, and Perplexity give different answers to the same query. This is normal: they have different citation logic, different access to the web, and different training data. A few rules:

Same basic wording for all models. Without this, comparison is impossible.
Record the date and mode (with or without web search). This is part of the primary record.
Localize the language. For a Ukrainian brand, queries in Ukrainian; for a global product, a separate English-language pool.
Localize the context. "Dentist" and "dentist Kyiv" are different queries with different competitive pictures.
Do not edit on the go. If a mention does not appear in Gemini, that is data, not a reason to "rewrite the query to make it work". Change the wording only in a new iteration.
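These rules translate naturally into a fixed record that is saved for every run. The sketch below shows one possible shape; the `RunRecord` fields and the `comparable` helper are hypothetical, and the date in the example is made up for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)  # frozen: a saved record is never edited afterwards
class RunRecord:
    query: str        # identical wording across all models
    model: str        # e.g. "ChatGPT", "Gemini", "Claude", "Perplexity"
    run_date: date
    web_search: bool  # mode: with or without web search
    language: str     # e.g. "uk" or "en"
    iteration: int    # wording changes only when this number increases
    answer: str = ""


def comparable(a: RunRecord, b: RunRecord) -> bool:
    """Two records can be compared only if wording, language, and mode match."""
    return (a.query, a.language, a.web_search) == (b.query, b.language, b.web_search)


chatgpt_run = RunRecord("CRM for small business", "ChatGPT", date(2025, 1, 10), True, "en", 1)
gemini_run = RunRecord("CRM for small business", "Gemini", date(2025, 1, 10), True, "en", 1)

row = asdict(chatgpt_run)  # plain dict, ready to be written to a file or table
same_basis = comparable(chatgpt_run, gemini_run)
```

Making the record immutable is a small guard against editing on the go: a changed wording has to become a new record in a new iteration.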

How often to review the prompt pool

Monitoring is not "set it up and forget it". A realistic rhythm:

Once a month: revise the base pool, remove noise, add new wording.
Once a quarter: full review. Add or remove scenarios, check the relevance of segments, update the reference group of competitors.
Reactively: after a new product launch, entry into a new geography, the appearance of a strong competitor, or a major model update.

It is better not to touch the pool between these points, otherwise the measurement periods cannot be compared with each other.

How to Move a Query List into a Production System

A good pool alone does not yet deliver value. It becomes useful when a clear sequence follows:

Run the queries through 3-4 models and record the baseline.
Put together a picture: where the brand appears, where it is missing, and who appears next to it.
Divide sources into four categories (own, editorial, catalogs, competitor).
Formulate hypotheses for SEO, content, and PR.
After 4-6 weeks, repeat the measurement and compare share of voice.

If the pool is built correctly, the report stops being a list of answers and becomes a working map of the market.
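The baseline and the re-run can be compared with a few lines of code. The sketch below assumes one simple definition of share of voice: a brand's mentions divided by all mentions of tracked brands across the collected answers. This guide does not fix a formula, so treat the definition as an assumption. The answer texts and the brand name `OurBrand` are invented for illustration, and the substring matching is deliberately naive.

```python
def share_of_voice(answers: list[str], brands: list[str]) -> dict[str, float]:
    """Share of each brand among all tracked-brand mentions (one per answer)."""
    counts = {brand: 0 for brand in brands}
    for answer in answers:
        text = answer.lower()
        for brand in brands:
            if brand.lower() in text:  # naive case-insensitive substring match
                counts[brand] += 1
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in brands}


brands = ["HubSpot", "Salesforce", "OurBrand"]  # "OurBrand" is a placeholder

# Invented answers standing in for the baseline run and the re-run.
baseline = share_of_voice(
    ["Try HubSpot or Salesforce.", "Salesforce is a common choice."], brands
)
rerun = share_of_voice(
    ["OurBrand and Salesforce both fit.", "HubSpot or OurBrand."], brands
)

# Positive values mean the brand gained share between the two measurements.
delta = {b: rerun[b] - baseline[b] for b in brands}
```

The comparison is only meaningful when both measurements use the same queries, models, and mode.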

[Illustration: Kozak moves a prompt through baseline, sources, hypotheses, and a re-run]

Frequently Asked Questions

How many times to run the same request? A practical minimum is two or three repetitions per model on different days. Because models are stochastic, one run can produce an isolated result.
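Repeated runs can be summarized as a mention rate per query and model. The following sketch uses hypothetical helper names and simple substring matching; the labels and the default minimum of two runs are assumptions that follow the practical minimum mentioned above.

```python
def mention_rate(answers: list[str], brand: str) -> float:
    """Share of repeated runs in which the brand is mentioned at least once."""
    if not answers:
        raise ValueError("at least one run is required")
    hits = sum(brand.lower() in answer.lower() for answer in answers)
    return hits / len(answers)


def classify(answers: list[str], brand: str, min_runs: int = 2) -> str:
    """Label a result as 'too few runs', 'consistent', 'absent', or 'unstable'."""
    if len(answers) < min_runs:
        return "too few runs"
    rate = mention_rate(answers, brand)
    if rate == 1.0:
        return "consistent"
    if rate == 0.0:
        return "absent"
    return "unstable"


# Three invented answers to the same query on different days.
runs = ["Try OurBrand.", "Salesforce is popular.", "OurBrand fits small teams."]
label = classify(runs, "OurBrand")
```

An "unstable" label is itself a finding: the brand is on the edge of the model's recommendations for that query.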

Can I monitor "full context" queries (long prompts)? Yes, as a separate research pool, but not as the main one. Users rarely write long prompts; most queries are short.

Do I need to translate queries for English-speaking models? If your audience is Ukrainian-speaking, the basis is Ukrainian. The English pool makes sense if you work for foreign markets or want to compare how the model behaves in another language.

What if the query gives different answers in two models? Write down both, do not choose the "right" one. The discrepancy is also data: it shows that the brand has different visibility in different AI systems.

Should I add competitive queries ("why X is worse than Y")? Yes, but be careful: the model may answer evasively. The best format is a neutral comparison ("what to choose, X or Y, for...").

How we do it in VYDAI

Prompt selection determines whether monitoring produces evidence or noise. VYDAI therefore includes a dedicated onboarding step for the base prompt pool, with category-level, comparison, problem-oriented, and follow-up scenarios suggested from the niche and competitor set. The system then runs those prompts through ChatGPT, Gemini, Claude, and Perplexity while recording the date, model, and mode, so measurement periods remain comparable.

If you want to see how it works in your category, you can create an account or view the demo. You decide which queries go into the final pool; we will show where there is noise and where there is real demand.