Breaking Down the Google Search Documentation Leak

Author: Neil Patel | Co Founder of NP Digital & Owner of Ubersuggest

It’s all anyone in SEO is talking about right now.

A leak of more than 2,500 pages of internal Google API documentation has exposed thousands of potential Google algorithm ranking factors and blown the lid off Google’s most closely guarded secrets.

While we still don’t know how much weight Google gives to each factor, this information is invaluable for SEOs and has several implications for the strategies my team and I will build going forward.

So, what does the leaked Google document tell us?

You can find the files online if you’re willing to read them all, but my team and I have done the hard work for you.

Below, you’ll find our summary of the Google search leak, the key insights you need to know, and the subsequent actions you should take. 

Key Takeaways

  • The Google leaked document is a collection of 2,596 GitHub modules with 14,014 attributes.
  • Erfan Azimi, the founder of EA Eagle Digital, first shared the API documents with SparkToro’s Rand Fishkin, who roped in iPullRank CEO and SEO expert Michael King to analyze them. 
  • Contrary to Google’s previous statements, the documentation reveals that the search giant does use Chrome user data in rankings.
  • Various forms of click data, such as badClicks, goodClicks, lastLongestClicks, and unsquashedClicks, influence rankings.
  • Freshness, how recently a page was uploaded or updated, also plays a role in rankings according to the leaks.
  • The documentation suggests that PageRank and link diversity remain important factors in Google’s ranking algorithm, although Google has indicated that some of this information might be outdated.
  • The documentation suggests Google can identify authors and treat them as entities in the system.
    • Google stores authors associated with a document as text, and they look to determine if an entity on a page is also the author of the page. This shows us there is some comprehensive measurement of authors.
  • Google has unique ranking factors for several types of websites, including sites in the “Your Money or Your Life” (YMYL) niche, local websites, and product review sites. 

What Happened?

The internal Google API documents were first leaked on Sunday, May 5th, 2024, by an anonymous source who shared them with Rand Fishkin, founder and CEO of SparkToro.

Rand vetted the documents by speaking with three Googlers. One didn’t want to comment. But the other two agreed the leak looked genuine and appeared to have happened accidentally when the documents were pushed to a public code repository. They were subsequently captured by an automated documentation service. 

Satisfied that they were genuine, Rand roped in iPullRank CEO and SEO expert Mike King to analyze the documents and draw out key insights.

The pair subsequently shared their thoughts (and the documents themselves) in two separate blog posts published on May 27, 2024. 

Rand’s post: An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them  

Mike’s post: The Google Algo Leak: Actionable Insights, Best New Practices, and Killer Tactics

In the days that followed, the original source of the leaked documents came forward: Erfan Azimi, founder of EA Eagle Digital.

Google’s Response

For several days, Google was silent. 

Google refused to confirm the leak’s legitimacy — a strategy that won’t surprise anyone familiar with the search giant’s comms strategy. 

That changed on May 29th, when Google finally confirmed the legitimacy of the leak through several prominent media publications, including Search Engine Land and The Verge. 

Google provided the following statement:

“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”

Insights From the Google Search Document Leak

Okay. Let’s cut to the chase and give you what you came here for. Below, I’ll cover what my team and I think are the most critical insights from the Google leaked documents.

Chrome User Data Is Used in Rankings

Google has been adamant since 2012 that it doesn’t use data from Google Chrome to rank websites. 

But the Google API documentation leak tells a different story. Rand, Mike, and others interpret attributes like “Chrome_trans_clicks” and “ChromeInTotal” to mean that Google uses click data from Chrome to assess the quality and popularity of your website. 

Google may also use these attributes to determine your most important or popular links when deciding which pages to include in Sitelinks, the additional links that appear below some search results. 

A Google search for Neil Patel.

What does that mean for your site?

It means a great user experience is indispensable. If Google is using click and browsing data from Chrome, you need to make sure users can navigate your site easily, find what they want, and stay on your site because of the quality of your content. The goal is to increase user engagement time while decreasing bounce rate.

Click Data and User Engagement Matter

Another thing Google has seemingly denied on several previous occasions is the use of SERP click-through data to guide ranking efforts.

That, too, appears to be factually inaccurate. Google seems to rely heavily on click data.

Those following Google’s antitrust trial with the DOJ are already aware of NavBoost and Glue, two ranking systems that use clicks to improve or reduce search rankings.

If testimony under oath wasn’t enough for you, the Google search engineering documentation leak confirms the presence of these two factors. It also shows the existence of several click-related modules in the documentation, including “goodClicks,” “badClicks,” and “lastLongestClicks.”

A Google Search Engineering Documentation Leak.

This suggests Google can filter the clicks they don’t want (badClicks) and include the ones they do (goodClicks). It also looks like they measure the length of clicks. No, that’s not how long someone holds down the left-click button. It’s whether users stay on the page once they click or return to the search result because they weren’t happy with the result. 
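
To make the idea of "good," "bad," and "long" clicks concrete, here is a minimal Python sketch. It is only an illustration: the leak names the attributes but does not say how Google computes them, so the `Click` record, the 10-second threshold, and the `summarize_clicks` function are all assumptions of mine.

```python
from dataclasses import dataclass

# Hypothetical threshold: the leak names the attributes, not how they are computed.
SHORT_CLICK_SECONDS = 10.0


@dataclass
class Click:
    url: str
    dwell_seconds: float    # time spent on the page after clicking the result
    returned_to_serp: bool  # True if the user went back to the search results


def summarize_clicks(clicks):
    """Tally illustrative good/bad clicks and find the longest click."""
    good = bad = 0
    for click in clicks:
        # A quick bounce back to the results page is treated as a "bad" click here.
        if click.returned_to_serp and click.dwell_seconds < SHORT_CLICK_SECONDS:
            bad += 1
        else:
            good += 1
    longest = max(clicks, key=lambda c: c.dwell_seconds, default=None)
    return {
        "good": good,
        "bad": bad,
        "longest_click_url": longest.url if longest else None,
    }
```

The takeaway is the shape of the signal, not the numbers: a click followed by a fast return to the results page counts against you, while a click where the user stays counts for you.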

So, what does that mean for you?

If users aren’t happy with your content when they click through from Google, your content won’t perform as well on Google.

The solution is to create high-quality content that engages users and matches search intent. 


I recommend monitoring and improving metrics like click-through rates and time on page. If you see either metric falling or other engagement metrics like bounce rate increasing, then find a way to offer more value on that page or make it more relevant to searchers. 
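
If you export clicks and impressions per page from your analytics tool, a short script can flag the pages that need attention. The sketch below assumes you have two periods of data as plain dictionaries mapping a URL to `(clicks, impressions)`. The function names and the 20% relative-drop threshold are my own choices, not anything taken from the leak.

```python
def click_through_rate(clicks, impressions):
    """CTR as a fraction; 0.0 when there were no impressions."""
    return clicks / impressions if impressions else 0.0


def pages_with_falling_ctr(previous, current, min_drop=0.2):
    """Return URLs whose CTR fell by at least `min_drop` (relative) between periods.

    `previous` and `current` map URL -> (clicks, impressions).
    """
    flagged = []
    for url, (prev_clicks, prev_impressions) in previous.items():
        if url not in current:
            continue  # no data for this page in the newer period
        before = click_through_rate(prev_clicks, prev_impressions)
        after = click_through_rate(*current[url])
        if before > 0 and (before - after) / before >= min_drop:
            flagged.append(url)
    return sorted(flagged)
```

For example, a page that went from 50 clicks to 30 clicks on the same 1,000 impressions would be flagged, while a page that slipped from 30 to 29 would not.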

PageRank

One of the most interesting things about the leaked Google document is the continued use of PageRank. 

Yes, the patents related to PageRank may have expired in 2019, but the documentation suggests that PageRank remains an important factor in Google’s ranking algorithm. 

To be clear, Google has indicated some of the information in the leak is outdated, and this may be one example. But it’s interesting to look at the attributes that mention it.  

PageRank_NS (NS stands for Nearest Seed) is based on the theory that “Seed Sites” are a group of websites and pages that Google deems authoritative and trustworthy. The PageRank-NearestSeeds attribute suggests that links directly from, or close to, a seed site may carry more PageRank.
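
One way to picture "close to a seed site" is as the number of link hops between a trusted seed and your page. The sketch below measures that with a breadth-first search over a toy link graph. This is my illustration of the concept, not Google's implementation, and the `distance_to_nearest_seed` function and its inputs are hypothetical.

```python
from collections import deque


def distance_to_nearest_seed(links, seeds):
    """Breadth-first search outward from seed pages along outgoing links.

    `links` maps page -> list of pages it links to.
    Returns page -> minimum number of link hops from any seed.
    Pages that no seed can reach are left out of the result.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance
```

Under this reading, a link straight from a seed site puts you one hop away, and a link from a page the seed links to puts you two hops away.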

PageRank attributes in Google.

PageRank was previously determined by the quantity and quality of links pointing to the given page. Not all links were created equal though. If an authoritative website linked to your website, this was a higher vote of confidence and provided more link juice than a link from a non-authoritative website. In other words, high-quality backlinks are an integral part of your overall SEO strategy. 
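
If you have never seen how the original PageRank works, here is a minimal Python sketch of the classic, publicly documented version using power iteration. It is not what Google runs today, and the 0.85 damping factor and 50 iterations are conventional textbook values I have assumed.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    `links` maps page -> list of pages it links to.
    Returns page -> score, with all scores summing to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly across the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its score evenly across all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

Notice that a page's score depends on the scores of the pages linking to it, which is why a link from an authoritative site passes more value than one from an unknown site.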

The Google leak also sheds light on the importance of links and what constitutes a good backlink profile. 

Link quality, diversity, and freshness seem to matter most. The documents suggest Google wants to see a range of different sources linking to pages and uses the reputation of referring sites to judge quality. 
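
A quick way to check link diversity on your own site is to count how many distinct domains your backlinks come from. The sketch below assumes you have exported a list of backlink URLs from your SEO tool. The `referring_domain_summary` function and its output fields are hypothetical names of mine, and treating "www." as the same host is a simplification.

```python
from collections import Counter
from urllib.parse import urlparse


def _host(url):
    """Lowercased hostname with a leading 'www.' stripped."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def referring_domain_summary(backlink_urls):
    """Summarize how spread out a backlink profile is across referring hosts."""
    hosts = Counter(_host(url) for url in backlink_urls)
    total = sum(hosts.values())
    top_share = max(hosts.values()) / total if total else 0.0
    return {
        "links": total,
        "domains": len(hosts),
        "top_domain_share": top_share,  # fraction of links from the single biggest host
    }
```

If most of your links come from one or two hosts, the top-domain share will be high, which is a hint that your profile is less diverse than the raw link count suggests.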

The documents also appear to show that low-quality (spammy) links will hurt your site.

My recommendation?

To enhance your site’s authority, continue to prioritize earning diverse, high-quality backlinks, ideally from fresh content. More importantly, though, remain adaptable to changes in Google’s ranking signals as more information becomes available. 

The Role of Authorship

Authorship is one of the more overlooked components of E-E-A-T, but it’s the only one covered in detail in the leak. 

The API documentation suggests Google can identify authors and treat them as entities within its system. 

API documentation on Google.

Specifically, Google stores authors associated with a document as text and looks to determine if an entity on a page is also the author of the page. This shows us there is some comprehensive measurement of authors.

In other words, you can improve your rankings by getting a reputation as a known and trusted author in your niche. 


You can get outside help here, though. Leveraging content creators with a strong online author profile may help with rankings. So, create detailed author bios that reference credentials and link to social media profiles. Make sure all previous posts each author has written appear on these bio pages, too. 

Special Treatment for YMYL Websites

We’ve known for a while that Google treats Your Money or Your Life (YMYL) content differently. Google first introduced the concept in its Search Quality Evaluator Guidelines (SQEG) and rolled out an update in 2018 specifically targeting these sites.

If you aren’t already aware, YMYL sites are ones that impact a person’s future happiness, health, financial stability, or safety.

YMYL attributes on Google.

The documentation shows Google has classifiers that create scores for YMYL content, notably health and news sites.

We don’t know how Google uses these classifiers or to what extent they affect your rankings. Still, it’s clear that if your site falls into a YMYL category, then you need to tailor your SEO strategies to meet the criteria outlined in the Search Quality Evaluator Guidelines. Specifically, that means building trust with users by creating credible and accurate content from an expert’s point of view. 

FAQs

What are Twiddlers?

Twiddlers are ranking functions that can alter initial rankings and demote content for several reasons, including user dissatisfaction signals, an exact match domain, and poor-quality links.

What learnings about authorship came out of the Google documentation leak?

The leaked Google documents show that Google stores information on authors and checks whether an entity mentioned on a page is also that page’s author. This means you could improve rankings by improving the quality of your thought leadership or recruiting well-known authors in your space.

What information about backlinks came from the Google documentation leak?

Link quality, diversity, freshness, and relevance all matter. Google wants to see links from a diverse range of high-authority websites. It may ignore links from sites that aren’t relevant and it penalizes sites with spammy links.

Why is this documentation leak such a big deal in the SEO world?

SEOs will scour the leak for insights they can use to improve their rankings, while thought leaders and publications will use it to prove that Google has lied to them in the past.

Conclusion

Nothing has shaken the SEO industry like this Google leaked document. Never before have we had a real look inside the internal workings of the Google algorithm.

It’s exciting, and there’s plenty we can learn to improve our SEO strategies. But there are still loads of things we don’t know, like which parts are outdated and how each element is weighted.

So, while you should absolutely use the insights to improve your rankings, don’t forgo some of the core principles that will probably always matter, like user experience and quality link building.

Neil Patel

About the author:

Co Founder of NP Digital & Owner of Ubersuggest

He is the co-founder of NP Digital. The Wall Street Journal calls him a top influencer on the web, Forbes says he is one of the top 10 marketers, and Entrepreneur Magazine says he created one of the 100 most brilliant companies. Neil is a New York Times bestselling author and was recognized as a top 100 entrepreneur under the age of 30 by President Obama and a top 100 entrepreneur under the age of 35 by the United Nations.

source: https://neilpatel.com/blog/google-leaked-search-document/