The Entity & Language Series: Entity-First Indexing with Mobile-First Crawling (1 of 5)


By: Cindy Krum 

Mobile-First Indexing has been getting a lot of attention recently, but in my mind, most of it misses the point. Talking about Mobile-First Indexing only in terms of the different user-agent seems like a gross oversimplification. It is very unlikely that Google would need more than two years just to change the user-agent and viewport of the crawler – they have had both a desktop and a mobile crawler since 2013 (or earlier, if you count the WAP crawler), and Google has changed the user-agent and viewport of the primary crawler before, multiple times, with minimal fanfare. Sure, Google is now using a different crawler for finding content to index, but my best SEO instincts say that Mobile-First Indexing is about much more than the different primary user-agent.

From what I can see, Google’s change to Mobile-First Indexing is much more about entity classification and translation than it is about a different user-agent and viewport size for the bot. I believe this so much that I have started calling Mobile-First Indexing ‘Entity-First Indexing’. That name is much more accurate and descriptive of the challenges and changes that SEOs are about to face with Mobile-First/Entity-First Indexing. This article will focus on what the change to Entity-First Indexing means, the plain-sight signals that ‘Entity-First Indexing’ is already underway, and how the change will impact SEO in the future.

This is the first in an article series that will dive much deeper into how Google understands languages and entities, how they use them in indexing and algorithms and why that is important for SEO. It will review what entities are and how they interact with language and keywords. Then it will speculate on how organizing their index based on entities might benefit Google, how they might have accomplished it during the switch to Mobile-First Indexing and how device context might be used in the future to help surface the right content within an entity. It wraps up with a discussion of what can go wrong with indexing based on entities, and what Google has said on the topic of Mobile-First Indexing.

The next article in this series will focus on the tools that Google used to break down the languages of the web and classify all the sites into entities. Subsequent articles will focus on research we completed that shows how entity indexing works in different linguistic contexts, based on the different Google APIs that are used, and how those impact Google’s Entity Understanding. Finally, the last article in the series will focus on how individual phone settings and search conditions, like GPS location, can impact query results, even when the query does not have a local intent the way a query for a local business might.

Entity Understanding & Understanding Entities

Historically, Google’s reliance on links and keywords as the primary means of surfacing content in a search has obscured the idea that the world has some larger order, hierarchy or organizing principle than language – but it does: it has entities! Entities are ideas or concepts that are universal and exist outside of language. As Dave Davies describes, in an excellent article about one of Google’s patents on entity relationships, “an entity is not simply a person, place or thing but also its characteristics. These characteristics are connected by relationships. If you read a [Google] patent, the entities are referred to as ‘nodes,’ and the relationships as ‘edges.’”

With that in mind, Entity Understanding is a process by which Google strives to understand and organize the relationships between different ‘nodes’ and ‘edges’ – or, more plainly, different thoughts, concepts, ideas and things, and their modifying descriptors. Google organizes them into a hierarchy of relationships that is roughly what we all know as the Google Knowledge Graph. It is somewhat related to Semantic Understanding, but Semantic Understanding is based on language; this is one step before language – more conceptual, more universal, and language-agnostic.

Entities can be described by keywords, but they can also be described by pictures, sounds, smells, feelings and concepts. (Think about the sound of a train station – it brings up a somewhat universal concept for anyone who hears it, without needing a keyword.) A unified index based on entity concepts eliminates the need for Google to sort through the immense morass of changing keywords in all the languages of the world; instead, Google can align its index around these unifying concepts (entities), and then stem out from there into different languages as necessary.

The value of entities can be a bit hard to understand, but from the perspective of efficiency in search, the concept can’t be overstated. The internet has altered the way many of us think about knowledge, making it seem like knowledge might be infinite and unending, but from the pragmatic and practical perspective of a search engine, this is not exactly true. While the potential of knowledge MAY be infinite, the number of ideas that we can describe, or that are regularly searched or discussed, is somewhat limited. In fact, it used to fit in an encyclopedia, or at least into a library. For much of history, libraries indexed all of the knowledge they had available, and most carried more information than any one human could peruse in a lifetime. It is with this limitation in mind that we must approach ‘entities’ from the perspective of a search engine.

From a search engine perspective, it is important to understand that domains can be entities, but often have larger entities like ‘brands’ above them in an entity hierarchy. Indexing based on entities is what will allow Google to group all of a brand’s international websites as one entity, and switch in the appropriate one for the searcher, based on their individual country and language, as John Mueller describes in his recent Reddit AMA:

“You don’t need rel-alternate-hreflang. However, it can be really useful on international websites, especially where you have multiple countries with the same language. It doesn’t change rankings, but helps to get the “right” URL swapped in for the user. If it’s just a matter of multiple languages, we can often guess the language of the query and the better-fitting pages within your site. Eg, if you search for “blue shoes” we’ll take the English page (it’s pretty obvious), and for “blaue schuhe” we can get the German one. However, if someone searches for your brand, then the language of the query isn’t quite clear. Similarly, if you have pages in the same language for different countries, then hreflang can help us there.”

Notice how he talks about the brand as a whole, despite the fact that there might be different brand ccTLD domains or URLs in the hreflang. Before Entity-First Indexing, the right international version of the website was determined more by algorithmic factors, including links, because the websites were not grouped together under the brand and evaluated together as an entity. This concept is illustrated below in the first inverted pyramid. Historically, getting the correct ccTLD version of a site to rank in different countries was a constant struggle (even with Search Console settings to help) that this change will hopefully solve.
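As a concrete illustration, a set of hreflang annotations on an international brand site might look like this (the domains and paths here are hypothetical; each country/language version must reference all of the others reciprocally):

```html
<!-- Placed in the <head> of every country/language version of the page -->
<link rel="alternate" hreflang="en-us" href="https://www.example.com/shoes/" />
<link rel="alternate" hreflang="en-gb" href="https://www.example.co.uk/shoes/" />
<link rel="alternate" hreflang="de-de" href="https://www.example.de/schuhe/" />
<!-- x-default names the fallback version for unmatched languages/locations -->
<link rel="alternate" hreflang="x-default" href="https://www.example.com/shoes/" />
```

As John describes in the quote above, these tags don’t change rankings; they help Google swap the “right” URL in for the user.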

For more topical queries that are less focused on a brand, the entity relationships may be looser, and include top resources on the topic, like blogs, books and accredited experts. These groupings could focus on domains, but depending on the strength of engagement with other content, such as a popular podcast on a niche topic, the domain could be less prominently displayed or expressed in the entity ranking, as illustrated below.

The Relationship Between Entities, Languages & Keywords
Remember, when SEO and search were all about keywords, they were language-specific tasks. Entities are different, because they are universal concepts that keywords in any language merely describe. This means that entity-based search is more efficient: the search engine can query more content faster (all languages at once) to find the best information. The algorithm can cut through the noise and nuance of language, spelling and keywords, and use entities and context to surface the appropriate type of response for the specific query. Though entities are language-agnostic, language is critical for informing Google’s Entity Understanding. It is this process that probably made the transition to Mobile-First Indexing so slow; the entire web had to be classified and re-indexed as entities, which is no small task.
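A toy sketch may help make the efficiency argument concrete. In a keyword index, each language’s terms are separate keys; in an entity index, many surface forms in many languages resolve to one language-agnostic ID. The entity ID and mappings below are invented purely for illustration, not Google’s actual data structures:

```python
# Hypothetical illustration: many keywords in many languages, one entity.
# In a keyword-based index, "blue shoes" and "blaue schuhe" are unrelated keys;
# in an entity-based index, both resolve to the same concept node.

KEYWORD_TO_ENTITY = {
    "blue shoes": "/m/blue_shoes",        # invented entity ID
    "blaue schuhe": "/m/blue_shoes",      # German surface form, same entity
    "chaussures bleues": "/m/blue_shoes", # French surface form, same entity
}

# One set of documents stored against the entity, not per-keyword.
ENTITY_INDEX = {
    "/m/blue_shoes": [
        "https://www.example.com/shoes/",
        "https://www.example.de/schuhe/",
    ],
}

def entity_lookup(query: str) -> list:
    """Resolve a query in any language to its entity, then fetch documents once."""
    entity_id = KEYWORD_TO_ENTITY.get(query.lower())
    return ENTITY_INDEX.get(entity_id, []) if entity_id else []
```

In this sketch, `entity_lookup("Blaue Schuhe")` and `entity_lookup("blue shoes")` hit the same entity node, so the index only has to be maintained once, with language stemmed out at the edges.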

NOTE: While many SEOs agree that the hreflang protocol was established to help train Google’s machine-learning algorithms to build and refine its translation APIs, we believe it was ALSO used, more holistically, to develop Google’s Entity and Contextual understanding of the web, because it allowed Google to quickly compare the same textual content, in the same context, across many languages all at once.

(Did anyone wonder why so many of the questions that John Mueller responded to in the Reddit AMA were about hreflang? Probably because it is so important for Google’s ability to index domains based on entities, and then switch the correct version of the content in based on language and location signals. Together with Schema, hreflang tagging is like Google’s Rosetta Stone for the internet and Entity Understanding. This is also why Mobile-First Indexing was rolled out globally, instead of US-first, one country at a time, like the last major change to indexing, Caffeine, was. It is by design that Entity-First Indexing can’t be rolled out one country at a time.)
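To make the ‘Rosetta Stone’ idea concrete, Schema.org markup explicitly ties a page to an entity, and its `sameAs` property points at known identity hubs that anchor the entity in the Knowledge Graph. A minimal, hypothetical JSON-LD example (the brand name and IDs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Brand",
  "url": "https://www.example.com/",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example_Brand",
    "https://twitter.com/examplebrand"
  ]
}
```

Markup like this, combined with hreflang, lets Google connect a brand’s many domains, profiles and translations back to one entity.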

If you think about it, language is fluid; it is changing every day, as new slang is added and new words come in and out of vogue. This is even seen in the nuances of pronunciation and spelling, and it happens not only in English but in every language. It even happens in subversive ways, with images and icons, (as any teen who has sent dirty text messages with a standard set of emojis can tell you.) But rapid changes to language can also be empowering and political, as you can see in the tweet below, about the #MeToo movement in China, which has been suppressed by certain groups in mainstream communication.

Google does care about communication, and has actually enabled even more emoji to work in Chrome recently – potentially to help enable empowering political movements, but also simply because its focus on PWAs means that more and more chat and communication apps will be leveraging browser code for core functionality. The shift to enable emoji could also hint at what Google anticipates needing to index: as chat apps and social networks transition to crawlable PWAs, instead of keeping content locked away in native apps where it is much harder to crawl and index, the amount of public communication in crawler-accessible browsers could grow exponentially.

What Does It Mean to Index on Entities & Why Would Google Do it?

To be clear, entity understanding has existed at Google for a long time, but it has not been core to indexing; it has been a modifier. I believe that the shift to Mobile-First Indexing is a reorganization of the index based on entity understanding – roughly, a shift from organizing the index around the Link Graph to organizing it around the Knowledge Graph. Continuing to organize and surface content based on the Link Graph is simply not scalable for Google’s long-term understanding of information and the web, and it is definitely questionable in terms of the development of AI and multi-dimensional search responses that go beyond the browser.

For years, Google has been trying to distance itself from the false economy it created based on the relative value of links from one page to another, but it has not been able to do so, because links were core to the system – they were part of how content was discovered, prioritized and organized. As Dave Davies says, “The idea that we can push our rankings forward through entity associations, and not just links, is incredibly powerful and versatile. Links have tried to serve this function and have done a great job, but there are a LOT of advantages for Google to move toward the entity model for weighting as well as a variety of other internal needs.” While neither Dave nor I is recommending that you abandon linking as a strategy, we all know that moving beyond links is something Google has been actively advocating for years.

Constantly crawling and indexing content based on something as easy to manipulate as the Link Graph, and as fluid as language, is hard, resource-intensive and inefficient for Google – and it would only grow more inefficient over time, as the amount of information on the web continues to grow. It is also limiting in terms of machine learning and artificial intelligence, because it allows the country- and language-specific algorithms to evolve separately, which John Mueller specifically said in his Reddit AMA that they don’t want. Separate algorithms would limit the potential growth of Google’s AI, and ensure that larger, more populous country and language combinations remained much more advanced, while smaller groups continued to lag and remained ripe for abuse by spammers. Finally, and most crucially for Google’s long-term goals, Google would not be able to benefit from the multiplier effect that ‘aggregation of ALL the information’ could have on the volume of machine learning and artificial intelligence training data that could be processed by its systems – if only it could get around the problem of language. And this is why entities are so powerful!

Just a Guess – How I Imagine Entity Indexing Works

With all that in mind, here is my vision of how Mobile-First Indexing works, or will eventually work, with entity indexing. Possible problems you may have experienced related to the new indexing process (which may have started around March 7th) are noted in parentheses next to the proposed step that I believe may be causing the problem:

  1. Content is crawled for Mobile-First Indexing (Most of the content has already been crawled and re-indexed. You have been notified in Search Console, but Mobile-First Indexing probably began at least 3 months before the notification so that the New Search Console could begin building up the data and comparing it to old Search Console data to validate it before the notification was sent.)
    • The User-Agent is: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    • The viewport is: 410x730px (or probably 411x731px), with an emulated DPR (devicePixelRatio) of 2.625
      Please Note – This may be variable and will probably change as new phones come out. This is why building in Responsive Design is the safest bet.
  2. Entire domains are re-indexed in the new Mobile-First Indexing process. This happened one domain at a time, rather than one page at a time. The bot only follows links that are accessible to it as a smartphone, so only content and code that is present for the mobile user-agent is re-indexed with the Mobile-First Indexing process, (whatever that may be). It may still be crawled periodically by the desktop crawler. This detail has not been made clear in the Google communication.
    • Old web content and desktop-only content that was historically in the index but can’t be found by the mobile crawler will remain in Google’s index, but will not receive any value that is associated with being evaluated in the Mobile-First Indexing process.
    • In addition to being evaluated for the content and potentially for mobile rendering, domains are evaluated for entity understanding using on-page text, metadata, Schema and other signals.
      • The domain itself is considered an entity. It has relationships to other domain and non-domain entities. Pages on the domain are indexed along with the domain-entity, rather than the larger entity concept.  (Entity Clustering, Re-indexing XML Sitemaps)
      • Links from the Link Graph are aggregated and attributed to all alternate versions of the page equally, including mobile and desktop versions of the page, as well as hreflang-translated versions of the page. The same is true of Schema markup – especially if it is on the English or x-default version of the page. Google still uses local information as a ranking signal, but the signals may change in relationship to the larger entity.
        • Links continue to impact rankings, though they are less critical for indexing. The current Link Graph is probably noted so that the impact of aggregation can be rolled out slowly, over time in the algorithmic rankings. We can assume that links will remain part of the algorithm for a long time, or potentially even forever, until Google has vetted and tested many other suitable replacements. The most likely replacement will probably be some type of Real User Metric (RUM) similar to what we are seeing Google do with Page Speed as Tom Anthony brilliantly describes, but this may be some time off.
      • Pages (URLs on the domain) become interchangeable entities of the domain, which can be switched in as necessary depending on the search language, query, context and potentially, the physical location of the searcher. International versions of a page now share most or all ranking signals. (Weird international results)
      • Google’s understanding of structural content like subdomains and their association with specific countries as well as XML sitemap locations may be reset, and may need to be re-established in New Search Console. (Previously properly indexed XML sitemap files re-appearing in SERPs)
  3. Google’s newly organized index is based on an entity hierarchy, grouped roughly according to the Knowledge Graph, instead of however it was organized historically (we assume based somehow on the Link Graph). This provides an efficiency benefit for Google, but is not intended to *directly* impact rankings – indexing and ranking are two different parts of the process for a search engine. It may, however, make it easier for Google to de-emphasize links as part of the algorithm in the future. Remember, Knowledge Graph entities, Topic Carousels, App Packs, Map Packs, Utilities and other elements that so often surface at the top of SERPs now do so without any links at all. The indexing establishes associations through an entity’s relationships to other entity concepts, and these associations can be loose or strong. These relationships are informed by their relative placement in the Knowledge Graph (proximity), but are also probably fed by historical information from the Link Graph. (FYI: Danny from Google specified that it is not a new index or a different index; it is the same index. We are simply speculating that this one index has been reorganized.)
  4. The entity hierarchy includes both domain and non-domain entities. Google will use machine learning to inform, build out and fine-tune its understanding of all entities over time.
    • Non-domained Entities: Entities without a domain, like ideas, concepts or things in the Knowledge Graph are given a Google URL that represents their location in the index (or Knowledge Graph).
      • Indexed content like apps, maps, videos, audio and personal content deep linked on a personal phone also fall into this category. (EX: app deep links or system deep links, like ones for contacts – The contacts utility is essentially just an app.) Remember that more and more content that people eagerly consume is not ON websites, even if it is purchased from websites – though this may change with the rise of PWAs.
      • These non-domain entities are indexed along with existing websites in the hierarchy.
      • Temporary Google URLs are given to non-domain entities. The URL is not necessarily meant to build up traditional ranking signals, but instead, the URL is simply an encoded locator, so that the item can be found in the index. Once un-encoded, a unique ID allows the entity to be related to other content in the index, and surfaced in a search result whenever the top-level entity is the most appropriate result.
Follow-Up Discussion from Conferences Last Year: It seems like the idea that URLs are optional might be an overstatement. Google still needs URLs to index content; they just don’t have to be unique, optimized, static or on a domain that an SEO optimizes. Google is creating Dynamic Link URLs for loads of types of content – especially when the content might qualify as an entity – and just putting them on different Google short links. If you have certain kinds of content that you want indexed but it doesn’t have a URL, Google will essentially just give it one. Examples include locations such as businesses, but also locations that don’t have specific addresses, like cities, countries and regions. Google is also giving URLs to Google Actions/Assistant Apps, and to information that appears to be indexed as part of an instant app, such as movies in the Google Play Movies & TV app. Types of music, bands, musicians, musical instruments, actors, painters, cartoon characters – really anything that might have an entry in an incredibly comprehensive encyclopedia is getting a Google short link.
      • Domain Entities: These are simply websites, which have historically been Google’s crawling and indexing focus. They are entities that already have their own domains, and don’t need to be given temporary URLs from Google.
        • Entities can be parts of other entities, so just because a website is a domain entity on its own, that does not preclude it from being a part of a larger concept, like the Florence & the Machine official website URL which is included as part of the official Google entity.
        • Larger entities like ‘brands’ may be related to domains but sit above the domains in the entity hierarchy. International brands could have many domains, and so the international brand is an entity, and the domains that are a part of it are also entities. Similarly, there could be concepts that are entities, that are smaller than domains, lower in the hierarchy.
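The machine-readable entity IDs described in the steps above can actually be observed through Google’s public Knowledge Graph Search API. Here is a minimal sketch of querying it; you would need your own API key, the network fetch is left to the caller, and the sample response below is abridged and illustrative (the `@id` value is a made-up placeholder, not a real entity ID):

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def kg_search_url(query: str, api_key: str, limit: int = 3) -> str:
    """Build a Knowledge Graph Search API request URL for a query."""
    return KG_ENDPOINT + "?" + urlencode({"query": query, "key": api_key, "limit": limit})

def top_entity_ids(response: dict) -> list:
    """Pull the machine-readable entity IDs out of a parsed JSON response."""
    return [item["result"]["@id"] for item in response.get("itemListElement", [])]

# Abridged, illustrative response shape (the real response carries more fields):
sample_response = {
    "itemListElement": [
        {"result": {"@id": "kg:/m/0xyz123", "name": "Monty Python"}, "resultScore": 900.0}
    ]
}
```

You could fetch the built URL with `urllib.request.urlopen()` and feed the parsed JSON to `top_entity_ids()`; the IDs that come back are the kind of non-domain entity locators discussed above.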

5. Search rankings and entity relationships will be fed, reinforced or put up for re-evaluation using automated machine learning processes that are based on the user-behavior and engagement with the SERPs over time, especially when Google perceives a potential gap in their understanding.

    • At launch, the big entity concepts will be strong for head-term searches, but the long-tail results will be weaker and Google can fall back on traditional web SERPs and the content that has yet to be migrated to Mobile-First Indexing whenever they want. Google will use machine learning and AI to localize and improve more niche results. (Weird long-tail results, Unrecognized entities)
    • In the short term, newly perceived relationships will only lead to a temporary change in rankings, but in the long term, with enough signals, sustained changes in entity relationships could trigger a re-crawl of the domain so that the content can be re-evaluated by the Mobile-First Indexing process for additional Entity Understanding and categorization.

6. New types of assets that can rank will be indexed based on entity understanding, rather than the presence or absence of a website.

[Note from the author: I am not a systems architect, database manager, sys-admin or even a developer. I am just a lowly SEO trying to make sense of what the smart people at Google do. Please forgive me for any poorly worded technical descriptions or missteps, and let me know if you have corrections or alternate theories. I would love to hear them!]

Does the Crawler Render Now or Later?

The other major change that might be part of the Mobile-First Indexing process is that indexing and ranking now seem less tightly tied to rendering. This is surprising, since Google has historically focused so much on mobile UX as a dimension of feedback to webmasters. But that feedback has also always been in the context of Google’s PageSpeed Insights tool, which, as Tom Anthony describes, is now fed by Real User Metrics (RUM), rather than by data synthesized during an on-demand page render, as the tool previously did.

Most SEOs have been focused on how the change to Mobile-First Indexing will impact crawling of their content, which is important because it happens before indexing. Whatever is not crawled is not indexed – or at least, that is how it worked before. But if the Mobile-First Indexing process has changed something about when and how the bot renders the page, this could be substantial. Is it possible that once Google knows about a domain, it is just waiting on RUM rendering data to be collected and compiled from real-user rendering sources for some of the data?

This is all still very unclear, but some SEOs have reported that content that was previously penalized because of interstitials is now ranking again, which was previously not allowed. John Mueller also recently specified that Google can index CSS grid layouts, even though Google’s rendering engine, Chrome 41, does not support them. This does not seem to be a one-off thing either – where Google used to be limited to indexing what it could render without changing tabs, Google now says it can index everything on all tabs, as long as no on-click events are required to fetch content from the server. In potentially related news, John also says that parameters no longer hinder URL rankings or need to be managed in Search Console – something Google has been saying for a while, but which so far has never really been 100% true. In a recent Google Hangout, it was explained that parameters are now just considered signals for crawling, rather than rules; it is possible that they signal Google to use a different type of rendering engine after the content is indexed. This is something we would love for John to expand on in future discussions.

Rendering is the most time- and resource-intensive part of crawling, but recently Google has not seemed worried about developers building their progressive web apps (PWAs) as single-page apps (SPAs). If unique URLs on a domain are just attributed to the domain entity anyway (or if links are less important for indexing overall), perhaps the entity as a whole can be rendered and evaluated later, with crawlers looking for deep links, long parameterized URLs, JavaScript requests for content from the server, or regular web URLs from internal links. If rendering doesn’t matter, or if different bots can crawl the entity as needed, maybe Google will just lift whatever text it can, and try again with different bots later, as needed.

What Can Go Wrong When You Index on Entities?

As noted above, many SEOs have noticed weird anomalies in the SERPs since the major update in March. Many of these anomalies seem much more related to indexing than to ranking – changes in how an entire query or domain is perceived, strong domain-clustering, changes to AMP, Answer Boxes and Knowledge Graph inclusions, changes in Schema inclusions, and problems with local and international content and sitemaps. My supposition here is that some content, signals and indexing instructions may have been lost during the Entity-First Indexing process, but there are other things that can go wrong too.

From what we can tell, Google is still doing a great job responding to head-term queries, surfacing Knowledge Graph entities as it has for a while. The problems only seem to come in for long-tail queries, where most SEOs focus. This is where Google’s Entity Understanding may be more vague, or the relationships between different entities may be more complex.

The switch to Entity-First Indexing will certainly create instances where Google misunderstands context or makes wrong assumptions about when and where something is relevant to a searcher. Hopefully, this all gets sorted out quickly, or rolled back until it is fixed. The fact that Google has announced it will stop showing Google Instant results, where it used to include keyword-level entity disambiguation, may be a sign that Google is worried the feature would expose too much of the inner workings of the system, at least in the short term. The instant results do still include simple definitions and occasionally a link to a Wikipedia result, but that is it for now. Interestingly, the old style of Google Instant results does still appear to be supported in the Google Assistant app, as shown below, but this could be temporary:

It is important to understand that Google’s Entity Understanding appears to be keyed off of the English definitions of words in most cases, so there will be instances when the English concept of something is at odds with the rest of the world’s concept of the same thing – like with pharmacies, as described in Article 4. Other examples might be the US reversal of the sports names ‘soccer’ and ‘football’, or disambiguation of the word ‘cricket’ where it is a popular sport rather than just a chirping bug – both quite strong and widely understood concepts that are regionally very different. In these cases, it is hard to know what to do, other than find a way to let Google know that they have made a mistake.

Is Now Really the Time for Entities?

The biggest and most jarring change since the March update came when Google temporarily replaced the normal response to queries about the time with a single-entry answer, as shown below on the right.

This type of result only lasted a few days, and you can see why in the image below – Google was over-classifying too many queries as ‘time queries’, and this was causing problems; a query for a brand of scotch was being misunderstood as a time query. Google tried to perceive the intent of the query, but failed miserably – possibly because there were not enough entities included in the Knowledge Graph or Google’s index, possibly because they were not taking enough context into account, or, most likely, a bit of both. This will be a big risk in the early days of Entity-First Indexing. For brands, missing classification or mis-classification is the biggest risk. I have been told that Time Magazine and the New York Times experienced similar problems during this test.

Context is King

With all this change, it is important to remember that Google’s mission is not limited to surfacing information that is available on a user-facing domain. Google does not consider itself a utility whose only job is to surface website content, and you shouldn’t either! Surfacing content on the web and surfacing websites are different things. Google’s goal is to surface the most useful information to the searcher, and sometimes that will depend on the context in which they are searching. Google wants to serve its users, and the best information for its users may be a video, a song, a TV show, a deep link in an app, a web utility, or an answer from the Knowledge Graph.

Context allows Google to disambiguate multiple versions of a single entity, to know which one is the most relevant to the user at the time of their search. To better understand how indexing a complex entity might work, let’s look at the example of Monty Python. Among other things, Monty Python is in fact a domain, but it is also the name of a comedy group, the name of a series of comedy skits and compilations on video, a YouTube channel, and part of the name of multiple albums of recorded comedy. When someone searches for the keyword ‘Monty Python’, how could Google know which one of those things they are looking for? They really couldn’t, unless they knew more about the context of the search. If the user is searching on a computer, they could want any of those things, but if they are searching in a car, or on a Google Home device, or on something else without a screen, they are most likely looking for something with just audio – not video. If they are searching on a TV, they are more likely looking for video. If they are searching on a computer or a phone, there is a chance they are looking for information, but if they are searching on a TV, the likelihood that they want to read information is low – they probably just want to watch a video.
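A toy sketch of that disambiguation logic: score each facet of one entity against the device the search came from, and surface the highest-scoring facet. The facets and device weights below are invented to illustrate the idea, and are in no way Google’s actual signals or values:

```python
# Hypothetical context-based disambiguation for one entity ("Monty Python").
# Each facet of the entity carries a weight per device context; the weights
# here are made up for illustration only.
ENTITY_FACETS = {
    "official website": {"desktop": 3, "phone": 2, "tv": 0, "speaker": 0},
    "youtube channel":  {"desktop": 2, "phone": 2, "tv": 3, "speaker": 0},
    "comedy albums":    {"desktop": 1, "phone": 2, "tv": 0, "speaker": 3},
}

def best_facet(device: str) -> str:
    """Pick the facet of the entity most likely wanted on this device."""
    return max(ENTITY_FACETS, key=lambda facet: ENTITY_FACETS[facet].get(device, 0))
```

With these invented weights, a screenless speaker resolves the same query to audio (`comedy albums`), a TV resolves it to video (`youtube channel`), and a desktop resolves it to the website – the same keyword, three different answers, driven entirely by context.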

Contextual signals are particularly important for delivering a great experience to mobile users. Google has been open about this, such as in this “Think With Google” article published in 2017 about advertising to mobile users, where Google says, “When we reach someone on mobile…there are loads of context signals, like time and location…To really break through with mobile users, it’s important to take full advantage of their context.”

When indexing is based only on keywords, qualifiers like ‘watch’, ‘listen’, ‘video’, ‘audio’, ‘play’, ‘clip’, and ‘episode’ are necessary. When indexing is based on entities, the understanding of the search query is more natural, driven by context. With context standing in for those extra keywords, queries become simpler, more basic, and more natural. Indexing on entities allows Google to surface the right content based not only on the keyword query but also on the context of the device the search starts from, like a TV, a Google Home, a phone, a web-enabled car system, or something else! We get closer to natural language.

The problem that SEO’s have is that we have focused on the broadest, most context-free devices first – our computers. This makes it hard to conceive of how strong a signal context could be in determining which part of an entity is most relevant in a particular search. But start to think about all the new devices that are getting Google Assistant, and how Google will work on those devices.

Someone searching on a Google Home or Android Auto might not be able to use a website at all. They will be much more interested in audio. Someone searching on a TV is also probably more interested in videos and apps than they are in websites. SEO’s who limit their understanding of their job to optimizing website experiences will limit their success. Just because Google crawls and indexes the web, does not mean that they are limited to websites, and SEO’s should not be either.

Discussion with Google

This change to the time queries has since been rolled back, but when it happened, I tweeted that it was a clear indication of Mobile-First Indexing. Danny Sullivan, a long-time friend, search personality, SEO expert, and Google’s Search Liaison, explained that it had nothing to do with Mobile-First Indexing, which I found confusing. I realize now that my tweet didn’t convey my more robust belief that Mobile-First Indexing is all about Entity Understanding, but suffice it to say that Google officially conceives of these two concepts as separate. Perhaps they are two separate projects, but I find it impossible to believe that they are totally unrelated. To me, it seems self-evident that the goal of any change towards Mobile-First [anything], especially if it was meant to support voice search, would improve Entity Understanding. But in his response, Danny seemed to assert that Mobile-First Indexing has absolutely nothing to do with Entity Understanding.

Danny gave an analogy that I love, about Mobile-First Indexing being like removing old paper books from a library and replacing them with the same content in an e-book format. This analogy was offered to prove the point that there is only one index, not separate mobile and desktop indexes, which Danny emphasized as a very important point. It also happens to perfectly illustrate the efficiency of entity-based indexing – I love it! An e-book would not need to keep multiple paper copies of translated versions of the text; it could potentially be translated on the fly – the same way we describe language-agnostic entity understanding here and in Article 4 of this series. It is overwhelmingly disappointing that Google is not willing to talk about this part of the change to Mobile-First Indexing, and that Danny is willing to give the analogy but not willing to discuss the full depth of the explanation at this point.

The only problem is that the library analogy is at odds with the explanation being given by John Mueller of the Webmaster team: that this is just a change to the user-agent. If the only thing that changes is the user-agent, how do we get an e-book from the same crawler that previously only gave us paper books? Unfortunately, after the library analogy, the conversation got derailed (as it has before with other Google representatives) into a focus on the number of indexes Google uses to organize content. The ‘one index vs. multiple indexes’ point can be a bit confusing, because some Google representatives repeatedly explained or implied that there was an old ‘desktop-oriented’ index (the one we have been using historically) and a new ‘Mobile-First’ index that content was migrating to.

There is a lot to be confused about, starting with the change in messaging from when Google was telling us about sites “being moved into the Mobile-First Index one domain at a time” to the “same index, different crawler” line that is now the official, go-to talking point for Google representatives. That position allows Google to say that desktop content will be maintained even if it is not accessible to the mobile crawler, which makes the discussion of the new crawler almost irrelevant! If desktop content will see no negative effect from the change, why bother making any accommodations for it at all? Ultimately, this ‘one index’ mantra is a nuanced point that really doesn’t matter, and I think it is a bit of a red herring. The same index can have different partitions, formatting, organization, virtual partitions, or any number of designations that make it function like one or two indexes as necessary. It is also true that one index can exist and simply be reorganized one domain at a time, without duplication. The net result for users and SEO’s does not change.

Conclusion

Google has made a big investment in voice search and Google Assistant, and recently doubled down on AI by promoting two new heads of search with extensive backgrounds in machine learning and artificial intelligence. All of these things should be taken as a sign of change in the lives and job descriptions of SEO’s. As more and more devices become web-enabled, and fewer and fewer of the best results for users are websites, the context for search is getting much broader.

New strategies will include adding audio versions of text-only content, adding video and voice-interactive versions of content, and getting all of these assets indexed and associated correctly with the main entity. They will also include optimizing non-website entities, like Knowledge Graph relationships, to ensure that main entities are correctly correlated with the domain and all of its assets. They will include monitoring translation and entity understanding, to make sure that all the interactions are happening correctly for users around the world, and they will include monitoring feedback like reviews, which Google will use more and more to automate the sorting and filtering of content for voice navigation. They will also no doubt include technical recommendations like the use of Schema and JSON-LD to mark up content, a transition to Responsive Design or AMP-only design, and a transition to PWAs and PWAMP.
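As a minimal sketch of the kind of Schema.org JSON-LD markup mentioned above – tying a domain and its non-website assets (video channels, social profiles) back to one named entity – consider the snippet below. The brand name and URLs are placeholders for illustration, not a real site's data:

```python
import json

def organization_jsonld(name: str, url: str, same_as: list) -> str:
    """Build Schema.org Organization markup in JSON-LD that associates a
    domain and its other assets with a single entity. The 'sameAs' links
    point search engines at other profiles of the same entity, which can
    help consolidate them under one Knowledge Graph node."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)

# Placeholder brand and URLs, for illustration only.
markup = organization_jsonld(
    "Example Brand",
    "https://www.example.com",
    [
        "https://www.youtube.com/@examplebrand",
        "https://twitter.com/examplebrand",
    ],
)
print(markup)  # paste the output into a <script type="application/ld+json"> tag
```

The design choice worth noting is `sameAs`: it is the Schema.org property that explicitly declares “these other URLs are the same entity,” which is exactly the kind of entity-to-asset association this article argues will matter more under Entity-First Indexing.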

This has been the first of a five-part article series on Google’s new Entity-First Indexing (what everyone else calls Mobile-First Indexing) and how it is related to language. Future articles in this series will provide deeper information about the relationship between language, location, and entity understanding, and how these things can impact search results. The next article in the series will focus on the tools that Google has made available to marketers that we think offer a good view into how their language and entity understanding works, and the following three articles will walk through specific research we have done in this area to validate and explain our theories. This will include one article about the Google language APIs, one about how language impacts Entity Understanding, and one about how personalization impacts Google’s Entity Understanding and changes search results for individual users. The final article in the series will focus on how individual phone language settings and physical location can change search results for individuals, making it even harder for SEO’s to predict what a search result will look like and how their content will rank in different search scenarios.

 

TL;DR: (Article Summary)

1. Google’s Mobile-First Indexing is much more about entity classification and translation than it is about a different user agent (mobile) and viewport size for the bot.

2. Entities are universal or unifying concepts that keywords in any language can describe, which implies that they are language-agnostic. They are not only people, places, or things but also characteristics. These characteristics are connected by relationships and can be described by keywords, pictures, sounds, smells, feelings, and concepts.

3. Entity-based search is more efficient because the search engine can query more content faster (all languages at once) to find the best information. Since language is always changing, it makes sense to organize information this way.

4. This shift in how Google organizes information is about basing the index on entity understanding, which means transitioning from the link graph to the Knowledge Graph.

5. Contextual signals (such as which type of device the user is querying from) will play a massive role in Google’s rankings in the future; these signals are critical for disambiguating a searcher’s intent when it comes to entities.

6. Google’s intent is to surface the most useful information to the searcher, whether the best answer is a video, a song, a TV show, a deep link in an app, a web utility, or an answer from the Knowledge Graph.

7. As the context for search gets broader (i.e., more devices become web-enabled and fewer of the best results are websites), Google must transition to an entity-first indexing method to keep up with technology and the contexts in which users are searching.