Where does ChatGPT get its information, and which websites does it cite? Teams often focus on the answer text, but the cited sources are usually more actionable. They show which pages, media outlets, directories, or discussions supplied the visible evidence.
Looking only at wording can lead to a vague conclusion such as "we need more content." Source analysis shows whether the gap is on your website, in third-party coverage, or in technical discovery. Below we explain how to check ChatGPT sources, Perplexity citations, and Google AI Overviews sources and turn the findings into an action plan.
When source analysis applies
The first thing to understand is that not every AI response has sources that can be analyzed.
| Model mode | Are sources shown? | What to do |
|---|---|---|
| Answer with web search (ChatGPT Search, Perplexity, Gemini, Google AI Overviews, or Claude with web search) | Usually includes links or citations | Primary material for source analysis |
| Generative response without web search | No, there are no citations | Do not reconstruct sources manually. Either repeat the query in search mode or analyze the organic search results separately |
| API call without tools | No | Same: without web grounding, sources are not available |
The rule is simple: analyze only what the model has explicitly shown. If the interface shows no links, do not reconstruct them "logically"; that is a fast route to false conclusions.
Why Sources Are More Important Than the Answer Itself
Teams often get stuck on the wording: "we were not named — we need more content." This is a superficial conclusion. The deeper layer is in the sources, and it gives several levels of information at once:
- What the model relies on. Which page types it treats as sufficiently authoritative in your niche.
- Who shapes the context. Which media, directories, and rankings have become the model's "reference books".
- Where your brand stands. If a domain is strong but your brand is absent from it, that is a specific task, not abstract "PR".
- Which format works. Comparisons, FAQs, curated lists, and case studies that recur in citations deserve closer analysis.
- What works for competitors. Instead of a general "they are stronger", you see the specific sources that earn them mentions.
What Types of Sources Does AI Cite Most Often
No model publishes a complete list of "favorite domains", but public SEO industry research (Ahrefs, Semrush, BrightEdge, Authoritas, SparkToro) shows several patterns that recurred consistently in 2024-2025. The table below summarizes them.
| Type of source | Why models favor it | Examples |
|---|---|---|
| Major encyclopedias and reference works | High credibility, clear structure, fact-checked content | Wikipedia, specialized encyclopedias |
| User discussions | Natural language, real questions, and the context behind pros and cons | Reddit, Quora, and specialist forums |
| Independent rankings and roundups | A ready-made list of options that the model can easily carry over into an answer | "Top 10" lists, industry rankings, GoodFirms, Clutch, G2, Capterra |
| Industry media | Expert context, freshness, links to specific brands | In the local context: AIN, MC.today, dev.ua, Forbes Ukraine, Mind |
| Official brand websites | Verifiable facts, specific figures, and product details | Service pages, FAQs, and documentation |
| YouTube and podcasts with transcripts | Reviews, comparisons, demos | Review channels, CEO interviews |
| Review platforms | First-hand user feedback | Trustpilot, Google Reviews, OtzyvUA, and specialist platforms |
An equally important layer is formal partnerships that directly affect citations:
- Google and Reddit announced a data-access partnership in February 2024, and some of Google's AI features actively use Reddit content (Reddit press release, February 2024).
- OpenAI has licensing agreements with a number of publishers: News Corp, Axel Springer, Le Monde, Vox Media, The Atlantic, Time, Financial Times. Content from these publishers may be used in ChatGPT and Search responses (OpenAI Blog).
- Bing/Copilot explicitly recommends keeping content open to crawlers and using structured data (Bing Webmaster Guidelines).
There are no guarantees here. But the chances of seeing these sources in the answers are higher, so the analysis should almost always start by checking whether the brand is represented in these categories.
If you want to look at a live example of such a map, see the study on how ChatGPT recommends smartphone brands in Ukraine. It clearly shows how the answers mix big media, retailers, official brand pages, and user discussions.
Where to look for sources in different models
The interfaces differ, so here is a short guide to where to find the data for analysis:
| Model | Where to find sources |
|---|---|
| ChatGPT with web search | Link icon next to paragraph + "Sources" block at the end of the answer |
| Perplexity | Numbered citations in the text + Sources panel at the top and on the right |
| Google AI Overviews | Source cards to the right of or below the answer |
| Gemini in Google Search | "Show sources" button / link icons |
| Claude with web search | Inline citations with links, plus a sources block |
| Bing Copilot | In-text numbering [1], [2]... + link bar |
Record more than the URL: the check date, exact prompt, model and mode, and where the citation appears in the answer.
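The fields listed above can be kept in a small record type so that every observation is stored the same way. This is a minimal sketch: the class name, field names, and sample values are illustrative assumptions, not a required schema.

```python
from dataclasses import dataclass, asdict
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class CitationRecord:
    """One cited URL observed in one AI answer (hypothetical field names)."""
    checked_on: date   # date the check was run
    prompt: str        # exact prompt text
    model: str         # e.g. "ChatGPT"
    mode: str          # e.g. "web search"
    url: str           # cited URL as shown in the interface
    position: str      # where the citation appears in the answer

    @property
    def domain(self) -> str:
        """Host part of the URL, lowercased, without a leading 'www.'."""
        host = urlparse(self.url).netloc.lower()
        return host[4:] if host.startswith("www.") else host


# Sample values are made up for illustration.
record = CitationRecord(
    checked_on=date(2025, 1, 15),
    prompt="best CRM for small business",
    model="ChatGPT",
    mode="web search",
    url="https://www.example.com/crm-comparison",
    position="Sources block",
)
row = asdict(record)  # plain dict, ready to export to a table
```

Storing the domain as a derived value rather than a separate field keeps it consistent with the URL when records are grouped later.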
How to Build a Source Map
A single URL almost never explains the big picture. Therefore, in the first step, we assemble the map, and only then go into details.
It is convenient to divide the sources into four categories:
- Own — pages of your site.
- Editorial external — media, reviews, articles, ratings.
- Catalogs and aggregators — Clutch, G2, specialized directories, marketplaces.
- Competitor sources - competitor pages that the model cites directly.
The distribution immediately shows where you already have a foothold and where the gap is. If 80% of answers cite editorial sources that do not mention your brand, that is a clear task for PR, not a case of "we need more articles on the site".
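The four-category split can be sketched as a small classifier that turns a list of cited URLs into category shares. The domain lists below are hypothetical placeholders, and treating every unlisted domain as editorial is a simplifying assumption you would refine by hand.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain lists; replace with your brand, competitors, and directories.
OWN_DOMAINS = {"ourbrand.example"}
COMPETITOR_DOMAINS = {"rival.example"}
DIRECTORY_DOMAINS = {"clutch.co", "g2.com"}


def domain_of(url: str) -> str:
    """Lowercased host without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def in_group(domain: str, group: set[str]) -> bool:
    """True if the domain equals a listed domain or is a subdomain of one."""
    return any(domain == base or domain.endswith("." + base) for base in group)


def categorize(url: str) -> str:
    domain = domain_of(url)
    if in_group(domain, OWN_DOMAINS):
        return "own"
    if in_group(domain, COMPETITOR_DOMAINS):
        return "competitor"
    if in_group(domain, DIRECTORY_DOMAINS):
        return "directory"
    # Assumption: anything not listed above counts as editorial external.
    return "editorial"


def category_shares(urls: list[str]) -> dict[str, float]:
    """Share of cited URLs per category; empty input gives an empty dict."""
    counts = Counter(categorize(u) for u in urls)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


cited_urls = [
    "https://ourbrand.example/pricing",
    "https://www.g2.com/products/x",
    "https://news.example.org/top-10",
    "https://rival.example/faq",
    "https://media.example.net/review",
]
shares = category_shares(cited_urls)
```

A high editorial share only becomes a PR task once you also check whether those editorial pages mention your brand, which this sketch does not do.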
What to look for in each source
Once the map is assembled, each key source should be checked according to five parameters.
1. Domain recurrence
The same domain in multiple responses signals that the model already treats it as a pillar source in the niche. Check:
- which domains are repeated most often;
- which query types they appear in (category-level, comparison, or problem-oriented);
- whether it is editorial media, catalogs, or websites of the brands themselves.
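The recurrence check above can be sketched as a count of distinct answers per domain, together with the query types in which each domain appears. The input shape (answer id, query type, domain) and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def domain_recurrence(observations):
    """Rank domains by the number of distinct answers that cite them.

    observations: iterable of (answer_id, query_type, domain) tuples.
    Returns a list of (domain, answer_count, sorted_query_types).
    """
    answers = defaultdict(set)
    query_types = defaultdict(set)
    for answer_id, query_type, domain in observations:
        answers[domain].add(answer_id)      # a domain cited twice in one answer counts once
        query_types[domain].add(query_type)
    ranked = sorted(answers, key=lambda d: (-len(answers[d]), d))
    return [(d, len(answers[d]), sorted(query_types[d])) for d in ranked]


# Made-up observations for illustration.
observations = [
    ("a1", "category", "reddit.com"),
    ("a1", "category", "g2.com"),
    ("a2", "comparison", "reddit.com"),
    ("a3", "problem", "reddit.com"),
    ("a3", "problem", "forbes.ua"),
]
ranking = domain_recurrence(observations)
```

Counting distinct answers rather than raw citations avoids inflating a domain that is linked several times inside a single response.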
2. Page type
It is the specific page, not the domain, that determines why a source was cited. The pages that most often appear in answers are:
- review articles ("what is it", "how to choose");
- comparisons ("X vs Y", "Top-N...");
- category pages;
- FAQ;
- service or product cards;
- case studies and research;
- catalogs and ratings.
The takeaway is concrete: which format you should add to your editorial plan.
3. The role of the source in the response
Not all citations play the same role. Divide sources into two groups:
- Formative — without them, the answer would most likely be different. These are the sources on which the recommendation itself is based.
- Confirming — the model uses them as an additional support, but the recommendation has already been formed.
Formative sources usually yield the insights. Confirming sources are useful as a second layer.
4. Presence of the brand at the source
A strong domain in the answer is only half the story. The second half is whether your brand is there and in what context:
- whether the brand is present at all;
- whether it is a direct recommendation or a passing mention;
- whether the mention is strong (with justification) or incidental (one line in a list);
- whether it is clear why the brand was included in the material.
5. Recurring pattern in competitors
If two or three competitors consistently appear in the same type of sources, this is not an accident but a pattern. You need to understand it and decide how to respond: replicate it, approach from a different angle, or build up your own group of reference sources.
Quick filter: is this source worth pursuing?
Not every URL deserves an hour of analysis. To avoid wasting time, ask three questions:
- Does this domain appear in multiple answers (rather than one)?
- Does a competitor appear here, or do you appear in a strong context?
- Do you see a specific action for SEO, content, or PR here?
If the answer to all three is "no", the source is secondary. If at least one answer is "yes", add it to the working pool.
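The three-question filter reduces to a single "any yes" check. The sketch below uses hypothetical field names for the three answers; filling them in still requires a human look at each source.

```python
from dataclasses import dataclass


@dataclass
class SourceCheck:
    """Answers to the three filter questions for one source (hypothetical names)."""
    domain: str
    appears_in_multiple_answers: bool
    competitor_or_strong_own_mention: bool
    suggests_specific_action: bool


def belongs_in_working_pool(check: SourceCheck) -> bool:
    """A source is kept if at least one of the three answers is 'yes'."""
    return any((
        check.appears_in_multiple_answers,
        check.competitor_or_strong_own_mention,
        check.suggests_specific_action,
    ))


# Made-up examples for illustration.
checks = [
    SourceCheck("reddit.com", True, False, True),
    SourceCheck("random-blog.example", False, False, False),
]
working_pool = [c.domain for c in checks if belongs_in_working_pool(c)]
```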
How to Turn Source Analysis into Action
The most common mistake is to look at the sources, agree that "it's interesting" and do nothing. To avoid this, it is useful to have a ready-made "signal → action" table.
| What we see in the sources | Action |
|---|---|
| AI cites industry media where we are absent | PR publications, expert columns, interviews, and comments for journalists |
| AI often uses Reddit or specialist forums | Audit relevant discussions, contribute expert brand responses, and run AMAs where appropriate |
| AI cites ratings and selections (Clutch, G2, "Top 10") | Apply to relevant listings, update profiles, publish case studies, and collect customer reviews |
| AI cites competitor pages with a strong comparison | Create a more useful comparison with tables, selection criteria, and case studies |
| AI cites a competitor's own pages (FAQs, guides, documentation) | Cover the same information need with useful guides, FAQs, and a glossary on your site |
| AI cites a directory where our profile is weak | Update the description, add case studies, and ask customers for detailed reviews |
| AI does not cite any sources and responds without grounding | Work on the training layer: PR, Wikipedia, and stable external mentions over a 6-12 month horizon |
A table like this eliminates the "it's interesting, but it's not clear what to do about it" moment.
Common errors in source analysis
Mistakes we often see in teams:
- Drawing conclusions from one answer. The model is non-deterministic, so the same query can return a different set of sources. Run at least 3-5 repetitions.
- Analyzing only one model. ChatGPT, Gemini, Claude, and Perplexity have different citation logic. The picture is reliable only where their results overlap.
- Confusing ranking with citation. A page in Google's top 3 will not necessarily be cited by AI Overviews. The reverse also happens: pages from the second or third page of search results sometimes appear in answers.
- Looking only at your own domain. The value is in the map: where you are, where competitors are, and where the gaps are.
- Analyzing once and for all. The models' source pool changes. A realistic review cadence is once every 4-6 weeks.
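The first two points in the list above (repeat the query, compare models) can be sketched as two set operations: the share of runs in which a domain appears, and the domains shared by all models. The 0.6 threshold and the sample data are assumptions, not recommended values.

```python
from collections import Counter


def stable_domains(runs, min_share=0.6):
    """Domains that appear in at least `min_share` of repeated runs.

    runs: list of sets of domains, one set per repetition of the same prompt.
    Returns {domain: share_of_runs}. The default threshold is an assumption.
    """
    if not runs:
        return {}
    counts = Counter(domain for run in runs for domain in set(run))
    shares = {domain: n / len(runs) for domain, n in counts.items()}
    return {domain: s for domain, s in shares.items() if s >= min_share}


def shared_across_models(by_model):
    """Domains cited by every model. by_model: {model_name: set_of_domains}."""
    domain_sets = [set(domains) for domains in by_model.values()]
    return set.intersection(*domain_sets) if domain_sets else set()


# Made-up data: three repetitions of one prompt, then one set per model.
runs = [
    {"reddit.com", "g2.com"},
    {"reddit.com", "forbes.ua"},
    {"reddit.com", "g2.com"},
]
stable = stable_domains(runs)

by_model = {
    "ChatGPT": {"reddit.com", "g2.com"},
    "Perplexity": {"reddit.com"},
    "Gemini": {"reddit.com", "forbes.ua"},
}
common = shared_across_models(by_model)
```

Domains that survive both checks are the ones least likely to be artifacts of a single run or a single model.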
Frequently Asked Questions
Why do different model responses have different sources for the same query? This is expected. Models are stochastic, and web search depends on the current result set. Analyze recurring patterns rather than one screenshot.
Should I re-optimize the page for "AI citation"? There is no special trick. The basics work: clear structure, descriptive headings, answers to specific questions, relevance, structured data (Schema.org), and accessibility for crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot).
Can you block models from citing your site? Yes, through robots.txt rules for the corresponding user agents. But the same mechanism also keeps you out of the answers. For most businesses, it is more profitable to stay open.
What if AI cites outdated brand information? Check which URL it comes from. If it is your site, refresh the page and wait for reindexing. If it is an external source, contact the editorial team and request a factual correction.
How many sources can be analyzed manually? Roughly 50-80 URLs per iteration. Beyond that, without automated monitoring, you lose track of recurrence and of changes over time.
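To verify the robots.txt point from the questions above, you can test a robots.txt file against the AI user agents with Python's standard `urllib.robotparser`. This is a sketch: the sample rules and URL are made up, and the result reflects how the standard-library parser reads the file, which may differ in edge cases from how a particular crawler interprets it.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the article.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return {user_agent: allowed} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

access = ai_crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
```

Running the same check on your real robots.txt before an audit shows whether a missing citation could simply be the result of a blocked crawler.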
What else to read
To analyze the sources in the context of a complete diagnosis:
- How to Understand Why AI Recommends Competitors
- Which pages of the site are most often included in AI responses
- How to Turn an AI Visibility Report into a Plan for SEO, Content, and PR
- What Is AI Visibility and Why SEO Alone Is No Longer Enough for Businesses
How to collect sources systematically
Manual analysis works for a dozen prompts and two models. Beyond that, recording dates, storing screenshots, exporting tables, rerunning checks, and grouping domains becomes difficult. A month later, the team may have hundreds of URLs but no coherent view of the pattern.
VYDAI does this part automatically: it runs your queries through ChatGPT, Gemini, Claude, and Perplexity, saves all cited URLs, groups domains, and shows the recurrence rate and the competitors that appear alongside you. As a result, instead of a spreadsheet of links, you get a source map that you can take straight to the SEO, content, and PR teams.
If you want to see what such a map looks like for your topics, you can create an account or view the demo. What you do with the map next is up to you; we will be there to walk you through the decision logic.