SEO – Content Gap Analysis on Massive Sites


Let’s do some SEO.

Recently we were asked to provide a content gap analysis for a website with over 1.8 million pages indexed in Google.

I know what you’re thinking.

Holy SH*T.

Try exporting keyword rankings, top URLs, parent topics, etc. out of Ahrefs with an Agency plan.

Good luck.

That wasn’t going to work.

In the enterprise field, simple requests like…

“Would you provide a content gap?  We’re out of ideas.”

turn into hours of rabbit-holing through different scenarios.

Check out this sweet diagram I made for visual reference.

Because of this, we’re constantly testing new variables, strategies, techniques, and variations of our current processes.

Things become… ridiculously efficient.

It’s a beautiful thing. My heart flutters just thinking about it, which is why I wanted to share this with you all.

Enterprise-Level Difficulties

The number one issue you’re going to face in an enterprise-level environment is how difficult data interpretation becomes at scale.

Some things won’t necessarily be difficult as a concept, like a simple content-gap analysis.

For a 1,000-page blog, I could simply open Ahrefs, plug in some competing domains, and run the content gap there.

Ahrefs would tell me everything that these competitors are ranking for that my client’s site isn’t.

Done.

It would take me 15-20 minutes tops, but that doesn’t always work.

We’re going to run into issues like search intent, where Google just isn’t associating many of the recommended keywords with topics we’ve already covered.

Even IF we’ve already covered them under another similar topic.

For example, these 3 search queries yield 14/30 similar results, meaning over 50% of the results are not shared between them despite how closely related the intent might be.

how to paint a golf cart
how to paint a golf cart body
how to paint a golf cart roof

All three of these queries share 6 out of the 7 words of the searched phrase.

That’s 85.71% similar.

Strange to think how a ~15% difference in a long-tail phrase could be enough to show 50% different results from Google, right?
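If you want to check that 85.71% figure yourself, here is a minimal Python sketch. It assumes "similarity" means the words shared by every query divided by the word count of the longest query, which is my reading of the math above. The function name shared_word_ratio is made up for illustration.

```python
def shared_word_ratio(queries):
    """Share of words common to every query, relative to the longest query."""
    # Assumption: compare unique lowercase words, split on whitespace.
    word_sets = [set(q.lower().split()) for q in queries]
    shared = set.intersection(*word_sets)
    longest = max(len(words) for words in word_sets)
    return len(shared) / longest


queries = [
    "how to paint a golf cart",
    "how to paint a golf cart body",
    "how to paint a golf cart roof",
]

ratio = shared_word_ratio(queries)
print(f"{ratio:.2%}")  # 6 shared words out of 7
```

This only measures how alike the phrases are. How alike the search results are is a separate question that you still have to check in Google.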

This same theory applies at a mass scale, which is what makes content gaps very difficult at an enterprise level.

So… how do we do this?

Enterprise-Level Content Gap Analysis

Step 1 – The Index Page

As the first step in every new project, we typically like to create an index.

An index page is almost identical to a project brief, but much simpler and with less description.

For example, if we were to run this enterprise-level content gap on a well-known golf website, our index page might look like this…

Column A would represent the brand name.

Column B would represent clustered groups of topics like “Questions”, “Industry Terms”, “Pro Golfers”, “College Golfing” etc.

Lastly, Column C would represent a link to the additional tab where the magic happens.
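If you would rather build the index from a script than by hand, the three columns can be sketched in Python like this. Everything in it is a placeholder: the brand name, the tab names, and the column headers are assumptions for illustration, and only the topic clusters come from the examples above.

```python
import csv
import io

# Hypothetical index rows: brand and tab names are placeholders.
index_rows = [
    {"Brand": "Example Golf Site", "Topic Cluster": "Questions", "Tab": "Questions tab"},
    {"Brand": "Example Golf Site", "Topic Cluster": "Industry Terms", "Tab": "Industry Terms tab"},
    {"Brand": "Example Golf Site", "Topic Cluster": "Pro Golfers", "Tab": "Pro Golfers tab"},
    {"Brand": "Example Golf Site", "Topic Cluster": "College Golfing", "Tab": "College Golfing tab"},
]


def write_index(rows, handle):
    """Write the index as CSV: column A brand, B topic cluster, C tab."""
    writer = csv.DictWriter(handle, fieldnames=["Brand", "Topic Cluster", "Tab"])
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_index(index_rows, buffer)
print(buffer.getvalue())
```

Swap the in-memory buffer for a real file opened with newline="" and you can import the result straight into a spreadsheet.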

Let’s start with B2, which is a simple “how” based modifier to our head query “golf.”

P.S. if you don’t follow any specific terminology or acronyms, just leave me a comment below and I’ll clear it up.

Step 2 – Research

In step 2, we’re going to open Ahrefs and run through some research.

You know, let our brain wander a bit.

Heading over to Keywords Explorer, we’re just going to enter our industry term “golf.”

If we select “phrase match” we’re looking at over 3.8 million keywords.

Don’t panic, we’re going to break this down quickly.

After choosing questions asked, we’re down to ~217,000 results.

This number is still unreasonable. We need to identify 100 potential topics out of these 217,000 keywords without worrying about suggesting duplicate topics that Ahrefs or Google haven’t caught.

That’s the key with enterprise: go further and further down the rabbit hole.  Filter after filter, you’ll find what you’re looking for.

The process looks something like this.

The second row here, “Ahrefs Questions Asked,” is enough for most sites. The 3rd and 4th rows represent the additional steps needed for enterprise.

The dark purple color marks the path we chose, and you can easily see there are many other variations that would probably yield decent results.

Our goal is to identify variations, choose the best one, and then test later for efficiency gains.

Step 3

Apply the next filter.

For this filter, we’re going to apply the word “how,” which is part of the “Questions” topic cluster on the index page we laid out earlier.

After including this keyword, we’re down to 104,345 keywords.

Still too many.

How about adding in an additional filter to only show keywords with ~50+ search volume?

Now we’re talking.

Down to 1,036 keywords.

Perfect for an export.

Instead of volume, you may also add an additional modifier like “cart,” since we know “cart” and “golf” commonly appear together.

Or “hole,” which yields 917 results.  We’re saving “hole” for later, as this would fall under our “Industry Terms” scrape list.
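If you already have a big keyword export on disk, the same filter stack can be roughly reproduced outside of Ahrefs. The sketch below is only an approximation under my own assumptions: each row is a dict with "keyword" and "volume" keys (hypothetical names, not the real export headers), terms are matched as whole words, and the sample rows and volumes are made up for illustration.

```python
def filter_keywords(rows, head_term, modifier, min_volume=50):
    """Keep rows containing both terms as whole words with enough volume."""
    kept = []
    for row in rows:
        words = row["keyword"].lower().split()
        if head_term in words and modifier in words and row["volume"] >= min_volume:
            kept.append(row)
    return kept


# Made-up sample rows, not real Ahrefs data.
rows = [
    {"keyword": "how to paint a golf cart", "volume": 150},
    {"keyword": "how to hold a golf club", "volume": 40},
    {"keyword": "golf cart batteries", "volume": 900},
    {"keyword": "show golf scores", "volume": 300},
]

shortlist = filter_keywords(rows, head_term="golf", modifier="how")
print([row["keyword"] for row in shortlist])
```

To try the “cart” or “hole” variations, change the modifier argument and set min_volume=0 to drop the volume floor.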

Step 4

So we’ve made it this far, what now?

Easy. We repeat the same process, but this time we pull our target client’s existing keyword rankings.

Apply the filters and consolidate the lists.

Now you have two lists to compare against each other.

List 1) Your filtered keyword list using “how”, 50+ search volume, and contains a question.

List 2) Your filtered keyword rankings for the client site containing “how.”

Step 5

Step 5 is simple.

Apply a conditional formatting rule for duplicates that highlights the matching cells in red.

Now, any keyword containing “how” that isn’t highlighted is unique and has not been covered by the client according to Google.

See the below photo for the example.

Rinse and repeat for any variable you may think of that might fall within a topic cluster on the index page.
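The duplicate highlight is really just a set difference, so if the lists get too big for a spreadsheet you could sketch the same comparison in Python. This is not the spreadsheet method itself, only an equivalent under my assumptions: keywords match after lowercasing and collapsing extra spaces, and the sample keywords are made up. The names normalize and content_gap are hypothetical.

```python
def normalize(keyword):
    """Lowercase and collapse whitespace so near-identical cells match."""
    return " ".join(keyword.lower().split())


def content_gap(research_keywords, client_keywords):
    """Return research keywords the client does not already rank for."""
    covered = {normalize(k) for k in client_keywords}
    seen = set()
    gap = []
    for keyword in research_keywords:
        key = normalize(keyword)
        if key not in covered and key not in seen:
            seen.add(key)
            gap.append(keyword)
    return gap


# Made-up sample lists for illustration only.
research = ["how to paint a golf cart", "How to hold a golf club", "how to clean golf clubs"]
client = ["how to hold a golf club"]

gap = content_gap(research, client)
print(gap)
```

Like the red highlight, this only catches exact phrase matches. Closely related phrases with the same intent still need a human eye.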

With about 1 hour of thinking and 20 minutes of action, I was able to come up with over 100 questions & phrases that might turn into high-ranking blog pages, pending client approval.

Summary

Think first, act second.

You’ll be surprised at how much time you’ll save, and the results speak for themselves.

And again, this is just one variation of this technique.  I implore you to explore options out of your own creativity, and don’t be afraid to share.  There’s more than enough to go around.