Unveiling Google’s Secret Influence on SEO and Data Collection: A Story of Power and Deception – Part 1 of 3

In the ever-evolving world of technology, Google has long stood as a dominant force, shaping how we search, shop, and consume information. But behind the scenes, the tech giant has been engaging in practices that raise significant questions about its transparency, ethical conduct, and influence. This article unveils how Google has leveraged its monopoly, particularly in the SEO industry, and how it uses its products to gather vast amounts of data – often without user consent. Now, the US Department of Justice has announced that it may seek to break up different parts of Google, including separating Chrome and the Google Play Store from Google Search.

Much of the testimony in the DOJ trial has been behind closed doors, but this result certainly raises the question: ‘What are we not being told about what Google has been doing?’ This is the first article in a three-part series designed to explore a new concept of how Google may be getting some of its crawling and ranking data, and what the implications might be. This first article will give high-level assumptions about the theory, the second will delve into the details of the theory, and the third will discuss the potential implications and next steps.

Last week I published a video that outlined a new understanding of Google’s Mobile-First Indexing process – which only recently completed, after more than six years – along with my best estimate of how Google’s crawling, rendering and indexing systems work. In addition to describing this new model, I presented the circumstantial evidence and convenient coincidences that led me to my conclusion. What is most shocking to me is that much of the evidence was ‘hiding in plain sight’ in the form of Google employee quotes, official statements, announcements, documentation and the Terms and Conditions of Google’s various products.


Phase II of Google’s Mobile-First Indexing is just Chrome

I’ll point out here that the visual theme of the presentation was UFOs, and this was quite intentional. I used this theme to preempt the detractors who I knew would call this new model of understanding a crazy conspiracy theory. I am quite used to this type of feedback, because it has been a mainstay of my career, which has been marked by a variety of ideas and theories that were originally deemed crazy, impossible or at least ‘of no consequence,’ but that have all been proven out over time.

I was the first in the SEO space to focus on mobile SEO, when leaders in the industry said that it didn’t exist and would never be important. Before it even had a name, I was talking about using style sheets to format one page to work on both mobile and desktop – a concept now known as responsive design. I was also early in talking about the importance of optimizing mobile apps for search as a larger SEO brand strategy; I was very early in talking about the importance of entities for SEO and the first to spot what I called ‘Fraggles’ but what Google later announced as ‘Passages.’ In all of these cases, my ideas were called into question or sometimes even ridiculed – often by the same people who are still my detractors today. Happily, they have been wrong each time.

This article series will add detail to the concepts presented in the video and discuss some of the finer points that could not be covered there. It will also address some of the concerns and pushback on the concepts presented; while most people received the ideas with enthusiastic curiosity, some felt that there was not enough evidence to support the new concept. As much as possible, those concerns will be addressed here.

It must be noted, though, that at least for the purposes of this discussion, we won’t be able to take Google at their word – especially in the justification or explanation of their data collection behaviors. In recent years, Google has proven themselves to be disingenuous in their communication – even at the highest levels, in court with the US Department of Justice – so we can’t expect that their communication to the rest of the world has been any more accurate or honest. Google often justifies much of what they do with concepts like ‘page performance,’ ‘load time,’ ‘user experience’ and ‘search quality,’ but I contend that these goals are likely only part of the reason Google makes many of their decisions. The data that they collect can be shared across all of their platforms and has many uses – not just the ones that Google openly specifies.

The last thing that I want to highlight before jumping into the details is that ‘proof of abuse’ by Google is not even necessary in this discussion. What is more important is the ‘proof of potential for abuse,’ which is currently going somewhat unchecked because of a lack of skilled oversight and regulation. While the DOJ is evaluating some abuses, it often seems to focus on the most superficial and simple ones, avoiding the more technical discussions. The unfortunate truth is that Google/Alphabet and the companies under it control so much data, access and information around the world that even the *potential* for the deceptive practices I describe here should be enough to warrant concern, push-back and further investigation.

Google Has Proven Themselves Untrustworthy Before the Most Consequential Audiences

Court testimonies and legal investigations have further exposed Google’s business and legal tactics, including actively hiding and destroying evidence in ongoing lawsuits. Judge Donato described Google’s deceptive practices in court as an egregious violation, and called Google’s behavior in the trial a “frontal assault on the fair administration of justice,” underscoring the gravity of the issue.

This raises a fundamental question: why should we trust what Google tells us? The DOJ’s findings reveal significant patterns of deception, not just in evidence preservation but also in secretive courtroom testimony, where Google has consistently shaded the truth to cover up its real practices. Why do we in the SEO community think that Google would be any more forthcoming with digital marketers – who have historically tried, and succeeded, in manipulating their algorithms to benefit the websites of our choice, and who have consistently made their job harder with any knowledge and understanding that we gain?

The truth is, Google is not an accurate narrator, especially when it is communicating with digital marketers and the SEO community. For years, they have used us as an unpaid army to spread their wants and needs for specific types of website formatting to our clients, with the implied promise of a ranking reward. In reality, we were making code easier to crawl, index and process into topic models for Google’s AI systems. There may be nothing wrong with that, except when the result is a Google AI system that uses our better-formatted website data to replace links to websites with unlinked and uncited answers – answers that have only been made possible by the well-formatted data that Google crawled and processed.

While complaining about Google rankings has been common for years, the complaints have never been so widespread or so serious as they have been since the launch of the first Helpful Content Update, which removed a large number of small publishers from the Google index entirely. The abuse does not end there, though: there are accounts of sites that were de-indexed in the Helpful Content Update but were still quoted, nearly verbatim, in a Google AI Overview without any link or attribution. In other cases, sites are still ranking, but their content is almost outright copied and shown in an AI Overview without attribution, out-ranking the original content. In still other cases, sites that were hit by the Helpful Content Update will not show up in AI Overviews that specifically mention their brand name; instead, related sites that mention them are shown. Actions like this go beyond abuse to outright theft, but small publishers have almost no recourse. A small business that has recently lost a main source of revenue has little chance going up against Google and their well-trained teams of attorneys in court, and Google knows this.

Research and court cases have shown that Google intentionally gives preferential rankings to their own properties – especially when they are well monetized, like YouTube, or provide deep user data, like Maps and now the partially Google-owned Reddit. It is with this new understanding of Google’s moral turpitude that I began reevaluating the basic understanding that most SEOs have about the way Google crawls, renders, indexes and ranks web content. This time, I focused on what may have been left unsaid, especially when it would create privacy concerns for users or provide a potential benefit for Google’s advertising and/or AI models.

The Hidden Role of Clicks and Engagement in Google’s Algorithm

Evidence from the Department of Justice (DOJ) case against Google, along with other sources, has shed light on practices that many in the SEO industry have suspected but could not prove until now – that Google has been leveraging Chrome click and engagement data as a core part of a three-part model in their algorithm. Chrome has the largest market share of any browser worldwide, capturing about 65% of the market across mobile and desktop. For years, Google has downplayed or outright denied the role of clicks and user engagement as ranking factors in its algorithm. Google refers to this as a “proxy of engagement,” yet they’ve gone to great lengths to obscure this information from both the public and the SEO industry.

The recent Google leak, analyzed by prominent SEO experts like Mike King, Dejan Petrovic and Rand Fishkin, further proved this out, highlighting specific click-based ranking systems like ‘NavBoost.’ Interestingly, some of these signals have been visible for years in Chrome’s histogram tracking. In Histograms (chrome://histograms), Chrome visibly tracks every click, page load, scroll depth, and even when auto-filled forms and credit card fields are used – in both regular and Incognito browsing modes. This data, gathered from users on both the mobile and desktop versions of Chrome without explicit consent, is presumably being fed back into Google’s algorithms to optimize search rankings, advertising models and other machine learning systems.

https://youtu.be/txNT1S28U3M?si=NV4ZHQP6oLM9LncN&t=206

Some have pushed back, suggesting that Google uses the histogram data only for debugging, or that it may not use the data at all, but this seems naive at best. We can see Core Web Vitals – Google’s page loading and performance metrics – being actively tracked in histograms, and even credit card auto-fills in Incognito mode. It seems reasonable that if Google is tracking some of this data, they are likely tracking all of it; and even if it is not all used in the ranking algorithm, it could be used for other things, like conversion tracking for ads, MUM journey modeling, cohort modeling, or UX behavior and conversion modeling – likely similar to what is offered in the User Flow testing shown below.
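To make concrete just how observable these signals are, here is a minimal TypeScript sketch – my own illustration, not anything taken from Chrome’s source – of the kinds of click, scroll-depth, and Core Web Vitals signals that any browser (or even an ordinary page script) can record using standard web APIs. What Chrome’s internal histograms actually capture and report is, of course, only fully known to Google; this just shows how cheaply such data can be gathered.

```typescript
// Illustrative only: the kinds of engagement signals a browser can observe
// locally with standard web APIs. Not Google's code.

type EngagementSummary = {
  clicks: number;
  maxScrollDepthPct: number;
  largestContentfulPaintMs?: number;
};

const summary: EngagementSummary = { clicks: 0, maxScrollDepthPct: 0 };

// Count clicks anywhere on the page.
document.addEventListener('click', () => {
  summary.clicks += 1;
});

// Track the deepest scroll position as a percentage of the page height.
document.addEventListener('scroll', () => {
  const scrollable = document.documentElement.scrollHeight - window.innerHeight;
  if (scrollable > 0) {
    const pct = Math.round((window.scrollY / scrollable) * 100);
    summary.maxScrollDepthPct = Math.max(summary.maxScrollDepthPct, pct);
  }
});

// Record Largest Contentful Paint, one of the Core Web Vitals.
new PerformanceObserver((list) => {
  const entries = list.getEntries();
  const last = entries[entries.length - 1];
  if (last) {
    summary.largestContentfulPaintMs = Math.round(last.startTime);
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });

// A real collection system would report this somewhere; here we just log it
// when the user leaves the tab.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    console.log('engagement summary', summary);
  }
});
```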

Mobile-First Indexing: More Than Just a Crawling Change

The core of the video presents a new theory about how Google’s Mobile-First Indexing actually works. The introduction of Mobile-First Indexing in 2016 marked a pivotal shift in how Google crawled and indexed the web, and Google pre-announced the transition a full year in advance. The shift went beyond the requirement not to block CSS and JavaScript files from bots, which was part of the previous ‘Mobile-Friendly Update’ from 2015; it stressed the importance of having pages that were well formatted for mobile browsers, ideally using Responsive Design, in which one URL serves all devices and style sheets (CSS) adapt the layout to both mobile and desktop screens. Mobile-First Indexing began rolling out in earnest in 2018.

Historically, SEOs had understood crawling, indexing and ranking as three separate processes that Google used in its evaluation of websites. With the launch of Mobile-First Indexing, Google introduced ‘rendering’ to the process. Rendering is the process of executing JavaScript and applying CSS styling to actually create the visual representation of the page.
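To illustrate the difference, here is a short TypeScript sketch using the open-source Puppeteer library – purely as an illustration of the concept, since Google’s own rendering pipeline runs on headless Chromium but is not public. It fetches the raw HTML of a page (what a simple crawler sees) and then the rendered DOM after JavaScript has executed; on script-heavy sites the two can differ dramatically, which is exactly why a separate rendering phase exists.

```typescript
// Conceptual sketch of 'crawling' vs 'rendering', not a reproduction of Googlebot.
import puppeteer from 'puppeteer';

async function crawlAndRender(url: string) {
  // "Crawling": fetch the raw HTML exactly as the server sends it.
  const response = await fetch(url);
  const rawHtml = await response.text();

  // "Rendering": load the page in headless Chromium, execute JavaScript,
  // apply CSS, and read back the resulting DOM.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const renderedHtml = await page.content();
  await browser.close();

  // On JavaScript-heavy sites, the rendered DOM can contain content, links,
  // and markup that do not exist in the raw HTML at all.
  console.log(`raw HTML: ${rawHtml.length} chars, rendered DOM: ${renderedHtml.length} chars`);
}

crawlAndRender('https://example.com').catch(console.error);
```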

The problem was, Google botched the launch, confusing and conflating the crawling, rendering and indexing processes at different points along the way. They called it Mobile-First Indexing, which made it seem like the change was about how content was indexed – essentially, how it was organized and stored in Google’s database. But they described it as a switch from a desktop-based crawler to a mobile-based crawler, which made it seem like it was about crawling. And then, after the launch, all Google talked about in relation to Mobile-First Indexing was rendering, and how their systems now needed a second phase of crawling to do the rendering.

With this major release, Google also announced that they would stop crawling with an older version of Chrome browser emulation (Chrome 41) in the bot; instead, they would crawl with what they called the ‘Evergreen Bot,’ which would stay up to date with the live version of Chrome within a day or two of each release. While SEOs still militantly assert that crawling, rendering, indexing and ranking are all distinct processes, it really seemed to me that crawling, rendering and indexing had been combined, to some degree, into one two-part process. Nevertheless, SEOs believed that this new process for Mobile-First Indexing was mostly about switching to a mobile crawler and two-phase rendering.

What I noticed after the launch of Mobile-First Indexing was that Google was able to make more small algorithm updates in a shorter period of time, and that Google suddenly had a better understanding of topics – what we call ‘entities’ in SEO. This observation was strong enough that I re-cast Mobile-First Indexing as Entity-First Indexing, since Google had actually been using a smartphone bot as their primary crawler for years by the time Mobile-First Indexing launched, and since the impact of the update seemed less about mobile formatting and more about topic understanding. In a talk at MozCon 2018, I called it ‘A Whole New Google.’

My suspicions grew even stronger when I found a quote from Ben Gomes, a long-time Google Search architect and one of the main people credited with building the Mobile-First Indexing system. In a 2018 interview with Fast Company’s Harry McCracken, Gomes framed Google’s challenge as “taking [the PageRank algorithm] from one machine to a whole bunch of machines, and they weren’t very good machines at the time.” This seems to be talking explicitly about a distributed system – which will be discussed at more length in the next article in this series.

Looking back now, it seems more obvious why Google kept the Evergreen Bot in lockstep with Chrome releases and suddenly understood topics and entities better: because Google was using our local installations of Chrome for the second phase of crawling – that is, rendering – at least to some degree; and likely also using our local compute power to pre-process website data and organize it into topic models for entity understanding – what Google calls the Topic Layer. It is this theory that changes our understanding of how Google really works.
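To be clear about what this theory would mean in practice, here is a purely hypothetical TypeScript sketch. Nothing in it is taken from Chrome’s actual code, and every name in it is my own invention; it only illustrates what a ‘local render assist’ could conceptually look like: the browser renders a page it was going to render anyway, then summarizes the result – extracted text, links and candidate topic terms – into a compact report, rather than sending the full page anywhere.

```typescript
// Purely hypothetical illustration of the theory described above.
// All names are invented; this is not taken from Chrome's source.

interface RenderSummary {
  url: string;
  title: string;
  textLength: number;
  outboundLinks: string[];
  // Crude stand-in for topic-model pre-processing: the most frequent
  // non-trivial words on the page, as candidate entities/topics.
  topTerms: string[];
}

function summarizeRenderedPage(doc: Document): RenderSummary {
  const text = doc.body?.innerText ?? '';
  const links = Array.from(doc.querySelectorAll('a[href]'))
    .map((a) => (a as HTMLAnchorElement).href);

  // Count word frequencies as a naive proxy for topic extraction.
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().split(/\W+/)) {
    if (word.length > 4) {
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  const topTerms = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, 10)
    .map(([word]) => word);

  return {
    url: doc.location?.href ?? '',
    title: doc.title,
    textLength: text.length,
    outboundLinks: links,
    topTerms,
  };
}

// In the hypothetical scenario, a compact summary like this – not the full
// page – is what would be reported back, which is what would make such a
// scheme cheap in bandwidth and hard for users to notice.
console.log(summarizeRenderedPage(document));
```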

I hope this article has started to show you why Google would want – or indeed need – to use users’ Google Chrome instances to aid their crawling efforts. In the next installment, I’ll explore more concepts around local JavaScript execution and the makeup of the Chrome application’s contents on Windows devices. In the final article in this series, I will highlight the implications, because at the very least, I think Google has questions to answer.
