The Entity & Language Series: Frameworks, Translation, Natural Language & APIs (2 of 5)
By: Cindy Krum
All of a sudden, we are seeing translation happen everywhere in the Google ecosystem: Google Maps is now available in 39 new languages, Google Translate rolled out new interface with faster access to ‘Conversation Mode,’ Google Translate on Android switched to server-side functionality that allows it to be invoked and work within any app or website on the phone, as shown in the image on the right, where Google Translate has been invoked to translate a tweet in the Twitter native app. Google clearly has reached a new level in their voice and language capabilities!
We are also seeing Google find lots of new uses for their Cloud Natural Language API; Google also just launched ‘Google Talk to Books’ which uses it to allow you to ask questions that it will answer from knowledge it has gained from crawling and processing/understanding the contents of over 100,000 books. They also just launched a new word association game called Semantris, which has two modes that both allow players to work against time to guess at on-screen word relationships to advance ever-increasing special hurdles caused, as more and more words are added to the screen.
And the list goes on. We are also seeing some of this play out in search. Image search results are now clearly pulling from an international index, with captions that have been translated to fit the query-language. Map results include translated entity understanding for some major generic queries, like ‘grocery store’ and ‘ATM’, and they also auto-translate user-reviews for local businesses in to the searcher’s language.
The timing of all of these changes is not a coincidence. It is all a natural side-effect of the recent launch of Mobile-First Indexing, as we call it Entity-First Indexing. This is the second article in a multi-part series about entities, language and their relationship to Google’s shift to Mobile-First/Entity-First Indexing. The previous article provided fundamental background knowledge about the concepts of entity search, entity indexing and what it might mean in the context of Google. This article will focus on the tools that we believe Google used to classify all the content on the web as entities, and organize them based on their relationships, and launch the new indexing methodology. Then we will speculate about what that might mean for SEO. The next three articles in this series will focus research and findings that we completed to show evidence of these theories in play internationally, as they are related to the functioning of the different translation APIs, how those impact Entity Understanding and how personalization plays into Google’s Entity Understanding and search results.
- Fuchsia & Why Entities are So Important to Google
- How Device Context & Google Play Fit In
- Are Local Businesses Already Being Treated Like Apps?
- Global Search Algorithms are Better than Local
- Two Parts of an Entity Classification Engine
- How Marketers Can Start Getting Value from the Tools
- Essential Entity Elements – Critical Requirements for Correct Classification of an Entity
Fuchsia & Why Entities are So Important to Google
To understand how language and entities fit into the larger picture at Google, you have to be able to see beyond just search and SEO. Google cares a lot about AI, to the point that they just made two people who previously worked on the AI team, the heads of the Search Team. At Google, they also care a lot about the voice search – so much that Google Assistant has already shipped in more than 400 million devices around the world. Finally, Google care about reaching what they call The Next Billion Users – people living outside of North America and Western Europe, who have historically not been on the cutting edge of technology, but are now getting online and becoming active and engaged members of the online community. All of these goals may be brought together with a new software that Google is working, currently under the code name Fuchsia.
Fuchsia is a combination of a browser and an OS. What is the most important about it from a search perspective, is that it works based almost entirely on entities and feeds. The documentation and specifications are still very thin, but if this is the direction that Google is headed, then we can be sure that search of some sort will be tightly integrated in the software. As SEO’s, what we need to remember, is that search is not just web, and this is something that Google now really seems to be taking seriously. Google knows that there are lots of different types of content that people want to surface and interact with on their phones and other devices, and not all of it can be found on websites. This is why entities are so important. They allow web and non-web content to be grouped together, and surface when they are the most appropriate for the context, or surface together, if the context is not clear, to let the user decide. This is where Google Play comes in.
Even now, if you are not sold on the idea of Fuchsia ever impacting your marketing strategy, it is worth looking at the Chrome Feed, which is a default part of all Android phones and part of the Google App on iOS. This customization feed, sometimes called ‘Articles for You’ is almost entirely entity based and according to NiemanLab and TechCrunch, traffic from this source increased 2,100% in 2017. Users get to select the specific topics that they want to ‘follow’ and the feed updates based on those topics, but also shows carousels of related topics, as shown below. Users can click on the triple-dot menu at any time to update the customization of their feed. If you don’t think this is a powerful way of getting news, realize that people can’t search for a news story until they at least have an idea of the topic or keywords that they want to search for – they have to be aware of if to put in a query. You can also think of how Twitter and Facebook work – both in feeds that you customize based on who you are friends with or follow – but most of us wish we could customize those feeds more. Google is hoping to be able to get us there in their own offering!
How Device Context & Google Play Fit In
Once Google launched app indexing, most SEO’s probably thought that Google’s ultimate goal was to integrate apps into normal search results. For awhile, it probably was, but deep linking and app indexing proved to be so problematic and complex for so many companies that it fell off of most people’s radar’s and Google changed course.
Either your app and website had exact web parity, all the time, and it was easy, or you didn’t and it was much more complicated. The problems generally stemmed from large sites with different CMS running the back-ends of their web and app platforms-sometimes even different between Android and iOS. This made mapping between all three systems to establish and maintain the required web-parity between the apps and the website a nightmare. Beyond that, whenever anything moved in the app or the website, content had to be moved everywhere else, to mirror the change. We think that this was one of the many good reasons that Google started advocating PWAs so strongly – it got them out of having to sort out the problems with deep linking and app indexing.
PWA’s allowed one set of code handled app and web interaction, which was brilliant, but what a lot of SEO’s missed, was the announcement that PWA’s were being added to Google Play, Google’s app store. PWAs are essentially ‘websites that took all the right vitamins’ according to Alex Russell from Google, so them being added to Google Play was a big deal! We have suspected it for a long time, but with the addition of PWA’s (and Google Instant Apps) to Google Play, it is finally clear that apps are not being integrated into traditional web search at Google, like most SEOs suspected, BUT, traditional web search is being integrated into Google Play – or at least using the Google Play framework. This fits perfectly into the concept of Entity-First Indexing, because Google Play already uses a cross-device, context-aware, entity-style classification hierarchy for their search!
Google Play can also handle the multi-media, cross-device content that Google probably wants to surface more in Mobile-First/Entity-First Indexing, including games, apps, music, movies, TV, etc, as shown below in the Google Play & Monty Python Google Play Search examples. All that content is already integrated, populated and ranking in Google Play. It is also set up well for entity classification, since things are already broken down based on basic classifications, like if they are apps, games, movies, TV shows or books. Within each of those categories, there are sub-categories with related sub-categories. There are also main entities, like developer accounts, or artists, from which multiple apps or albums and/or songs can be surfaced, and these also have relationships already built in – to other genres of related content, so this is all great for Entity Understanding.
Google Play is already set up to include device-context in it’s search algorithm, so that it only surfaces apps and content that can be downloaded or played on the device that is searching. It is also set up to allow different media types in a SERP. As discussed in the first article in this series, context is incredibly important to Google right now because it is critical for the disambiguation of a searcher’s intent when it comes to entities.
Googles additional focus on context could also make the addition of videos and GIFs to Google Image Search seem potentially more logical, if contextual is considered. Perhaps this is now just a contextual grouping of visually oriented content, which would make it easier to interact with on devices like a TV, where you might use voice search or assisted search, casting or sharing your screen from a phone or laptop to the larger screen so that the viewing experience can be shared. Bill Slawski explains that many of Google’s recent patents focus on “user’s needs” and “context”….One of those was about Context Vectors, which Google told us [sic] involved the us of context terms from knowledge bases, to help identify the meaning of terms that might have more than one meaning” We think that the ‘knowledge base’ that Google is referring to in this patent documentation is actually Google Knowledge and similar data repositories that may have since been merged into the Knowledge Graph. The current status of Google Image Search could just be a middle-term result, that will change more, as more classification and UX positioning is added to the front-end side of the search interface.
From a linguistic perspective, Google Play was also a great candidate to use as a new indexing framework. For all the categories of content, but especially for apps, the categories that are available in the store stay the same in every language, though they are translated. More importantly though, metadata that app developers or ASOs submit to describe their apps in the store is auto-translated in all languages, so that your app can be surfaced for appropriate keyword searches in any language. So Google Play is already set up for a basic entity understanding, with all the hreflang information and hierarchical structure already in place.
Are Local Businesses Already Being Treated Like Apps?
If you are not focused on Local SEO, you might not be aware of the massive number of changes that have launched for GoogleMyBusiness (GMB) listings in the past couple of weeks, in the time since the March 17th update. In general, small business owners have recently been given a lot more control of how their small business look in the Google Knowledge Graph listings. This includes: the ability to add and edit a business description that shows at the top of the listing, the ability to actively edit the menu of services that the business offers, and more.
Before March 17, Google had also quietly been testing Google Posts, which allowed small businesses to use their GMB accounts to publish calls to action, and allow searchers to take actions directly from the GMB – Knowledge Graph panel, including booking appointments and reservations. It is essentially a micro-blogging platform that lets business owners make direct updates to their business listing whenever they want, and this is a big deal. Joel Headley and Miriam Ellis do a great job of covering it on the Moz Blog.
All of this makes it seem very much like Google is empathizing with, and trying to fix one of the biggest pains of small businesses – maintaining their websites. This is another aspect of the Google Play store that fits well in the model we believe Google is going for, is that proven entity owners, such as app developers, are able to edit their app listings at will, to help market them and optimize them for search. If Google can empower small business owners to build out their GMB listings, and keep them current, then it will save them a lot of time and money, and many of them would be just as happy, or happier with that solution then having the burden and cost of maintaining a website.
From Google’s perspective, they just want to have the best and most accurate data that they can, as quickly and efficiently as they can. Google knows that small businesses often struggle to communicate business changes to web development teams in real time, and budget constraints may keep them from making changes as often as they would like. By empowering the business owners to control the listing directly, and even allowing them to set up calls to action and send push-notifications, Google is really creating a win-win situation for many small businesses. There are some obvious SEO questions about how easy or hard it will be to optimize GMB listings in the complete absence of a website, but this is an area to watch. Google is likely using off-line engagement data, and travel radiuses to inform how widely a business’s ranking radius should be, and how relevant it is for various queries, so we could be in all-new territory here, in terms of optimization and success metrics are concerned.
Global Search Algorithms are Better than Local
The websites that Google currently ranks in search results are translated by the website creators or their staff, but this is not necessarily true of the other entities that are ranked, for instance Knowledge Graph results, and related concepts that are linked there, like apps, videos and music. In these, Google is often using their own tools to translate content for presentation in search results (as they do aggressively with Android apps) or actively deciding that translation is not necessary, as is common with most media. They do this translation with basic translation APIs and Natural Language APIs and sometimes, potentially human assistance.
Without a language agnostic, unifying principle, organizing, sorting and surfacing all the information in the world will just get more and more unwieldy for Google over time. This is why, in our best guess, Google is not translating the entire web – they are just doing rough translations for the sake of entity classification. From there, they are ranking existing translations in search results, and then their language APIs makes it possible to translate other untranslated content with APIs, on an as-needed basis, which may become more important as voice search grows in adoption. For Google, it is actually easier to unify their index on a singular set of language agnostic entities, than it is to crawl and index all of the concepts in all of the languages, without the unifying, organizing principles of entities.
This synthesis of information necessary for entity classification may actually create more benefit than is immediately apparent to most SEOs; most SEOs assume that there is an appropriate keyword for everything, but in reality, language translation is often not symmetrical or absolute. We have probably all heard that Eskimos have more than 50 words for ‘snow’. These 50 words are not all exact translations but have slight variations in meaning which often do not directly translate in other languages. Similarly, you may have been exposed to the now-trendy Danish concept of ‘Hygge,’ which is a warm, soft homey feeling that one can create, which usually includes snacks and candle light, but again, there is no a direct translation for this word in English. If we required direct translation for classification, much of the richer and more detailed and nuanced meaning would be lost. This could also include loss of larger data concepts that are valuable across international borders, as postulated in the example below:
EX: If I am a Danish climate researcher, and we develop a method for measuring a the carbon footprint of a community, we create a new keyword to describe this new ‘collective community carbon footprint measurement’ concept, and the keyword is, ‘voresfodspor.’ This word exists only in Danish, but the concept is easily described in other languages. We don’t want the data and our research to be lost just because the keyword does not universally translate, so we need to tie it to a larger entity – ‘climate change,’ ‘climate measurement, ‘carbon measurement,’ ‘community measurement.’ Entity understanding is not perfect translation, but it is great for making sure that concepts don’t get lost or ignored. It is great for allowing further refinement by humans or by machine learning and AI down the road.
We know that the nature and content of languages in the world changes over time, much more quickly than the nature and number of entities (described at length in the previous article). Keying Google’s index off of a singular list of entities, in this case, based in English, makes surfacing content on the ever-growing web faster than it would be if entities had to be coded into the hierarchy of all languages individually. This is perhaps why in John Mueller’s recent AMA, John clearly said that Google wants to get away from having language and country-specific search algorithms. According to John, “For the most part, we try not to have separate algorithms per country or language. It doesn’t scale if we have to do that. It makes much more sense to spend a bit more time on making something that works across the whole of the web. That doesn’t mean that you don’t see local differences, but often that’s just a reflection of the local content which we see.”
MarketFinder Tool is an Entity Classification Engine
In discussing Entity-First Indexing and the process by which Google may have approached it, we think it is useful to look at the tools that they have released recently, incase they can give us insights into what Google’s tech teams have been focusing on. The assumption here is that Google often seems to release cut-down versions of internal tools and technologies, once they are ready to start helping marketers take advantage of the new options that Google has been focusing on in the background. The best example here is the Page Speed Insights tool, that came out after the PageSpeedy server utility became available and the internal Google Page Speed Team had been working on helping speed up Chrome, and helping webmasters speed up their own web pages for a couple years.
In the past couple months, along with the many other translation and language-oriented new releases, Google has launched the MarketFinder and promoted it to their advertising and AdWords clients (Big thanks to Bill Hunt, one of the most notable PPC experts in the industry, for pointing this out to me!) In this tool, you can input a URL and it will quickly will tell you what advertising categories it believes are most appropriate for the URL, as you can see below in the www.Chewy.com example; from there, it will tell you what markets and languages show the most potential for marketing and advertising success in these topics, depending on if you sell products on the site. From there it gives you detailed information about each of the markets where it suggests you should advertise, including a country profile, economic profile, search and advertising information, online profile, purchase behavior and logistics for the country.
What is important to understand about the tool is that it is not telling you the value of the keyword but the value of the keyword concept – or the entity based on the automatic categorization of the site. The keyword and its related concepts, translated in to all the relevant languages, in all the countries where people might be searching for this topic or related topics. It is ALMOST like Google published a lite version of their ‘Entity Classification Engine’ and made available for PPC marketers to help them find the best markets for their advertising efforts – regardless of language, currency and other ideas that are often tied to countries, currencies and languages, but are less tied to entities.
The other thing that is interesting about the tool, which could be a coincidence, or could be related to Mobile-First Indexing and Entity classification, is that it does not allow you to evaluate pages – only domains – but it evaluates domains very quickly. It is almost as if it is pulling the classification of each domain from an existing entity database – like Google already has all of the domains classified by what entities they are most closely related to. This part is still unclear, but interesting from an SEO perspective. If it is telling us exactly how a domain has been classified, we can verify that we agree with the classification, or potentially do things to try to alter the classification in future crawls.
Cloud Native Language API Tool
The next somewhat newly released tool, and what many of the newest translation technology has been based on is the Google Cloud Natural API, which uses natural language technologies to help reveal the meaning of texts and how Google breaks it down into different linguistic structures to understand it. According to Google, the API uses the same Machine Learning technology that Google relies on for Google Search and Google Assistant. When you visit the API documentation, you can interact with the API directly, even without a project integration, by dropping text into the text box half way down the page. The first things that it does is to classify the submitted text based on it’s understanding of it, as entities! The tab is even called the ‘entities’ tab in the tool. (Those who doubt the importance of entities, probably also don’t realize how hard it must have been to develop this technology for all languages around the world – The level of commitment to developing and honing a tool like this is quite impressive!)
As you can see in the example below, with text taken from the MobileMoxie home page, our Toolset is somewhat correctly identified as a consumer good, though it might be better described as a ‘SaaS marketing service.’ A lot of keywords that the Cloud Natural Language API should be able to identify are identified as ‘other’ which might mean that it needs more context. It is also interesting that many of the words in the submission are totally dropped out and not evaluated at all. This probably means that these words are not impacting our entity classification at all, or at least not very much – because they did not add significant uniqueness or clarification to the text. What is interesting here, is that many of these words are classic marketing terminology, so it is possible that they are only being ignored BECAUSE something in the text has been identified as a Consumer Product.
For SEO’s, this tool might be a great way to evaluate new page copy, before it goes live, to determine how it might impact the evaluation and entity classification of a domain. If it turns out that a domain has been mis-classified, this tool might be the best option for quick guidance about how to change on-page text for a more accurate entity classification.
NOTE: Changing the capitalization on ‘MobileMoxie Toolset’ did change that classification from ‘Consumer Product’ to ‘Other’ but that did not change the number of words in the sentence that were evaluated, nor did removing the mention of the Toolset from the sentence all together.
Beyond just entity classification, another way the API reveals meaning is by determining Salience and Sentiment scores for an entity. According to Google, “Salience shows the importance or centrality of an entity to the entire document text.” In this tool, sentiment can probably only be evaluated based on what is submitted in the text box, using a score from 0 to 1, with zero representing low salience and 1 representing high salience, but in any real algorithm, we are guessing that salience is measured as a relationship with multiple metrics including the relationship to the page, to the entire domain and possibly to the larger entity as a whole, if there is one.
Sentiment isn’t defined, but it is generally agreed to be the positivity or negativity associated with a particular concept and in this, Google provides a score from -1.0 which is very negative, to 1.0, which is very positive. The magnitude of this score is described as the strength of the sentiment (probably in the context of the page or potentially on a more granular sentence level,) regardless of the score.
The next part of the tool is a separate Sentiment Analysis section which is a bit hard to understand because it has new numbers and scoring, different from what was used numbers in the Entities section of the tool. There are three sets of Sentiment and Magnitude scores. They are not labeled, so it is not entirely clear why there are three or what each of the three scores is associated with. Since only one of the Entities warranted a score of anything but 0, it is hard to know where the scores of .3 to .9 are coming from here, but a legend explains that 1- to -0.25 is red, presumably bad, -0.25 [sic] to 0.25 is yellow, presumably neutral, and 0.25 [sic] to 1 is green, presumably positive. Since this is different from the scoring used for Sentiment on the Entities tab, it is a bit hard to tell. It seems that Google offers more details about Sentiment Analysis Values in separate documentation but until the feedback from this tool is more clear it will probably not be too useful for SEO.
The next tab in this tool is very interesting – it is the Syntax evaluation. It basically breaks the sentences down, and shows how it understands each piece of it as a part of language. Using this in conjunction with the information on the Entity tab will allow you to understand how Google believes searchers are able to interact with Entities on your site.
After that is the shortest, but in my mind, most important information – the Categories. This takes whatever you have put into the tool and assigns it a Category tab, essentially telling you what part of Google’s Knowledge Graph the information that you submitted would be classified as. A full list of the categories that you can be classified as can be found here: https://cloud.google.com/natural-language/docs/categories
Two Parts of an Entity Classification Engine
While the value of these two tools to marketers might be hard to understand, their value and what they represent to Google is huge. We believe that these two tools together make up parts of what made it possible for Google to switch from the old method of indexing to the Entity-First Indexing. They are basically both Entity Classification Engines that use the same core, internationally translated entity hierarchy to either show how language and entity classification is done, in the case of the natural language API or show the financial results of entity classification for a businesses marketing plan in international markets, in the case of the market finder. It is basically the upstream and downstream impacts of entity classification!
How Marketers Can Start Getting Value from the Tools
The value of these new Google tools for digital marketers is still evolving but here are some steps SEOs can take to start better understanding and using them for thinking about entities in the context of their SEO efforts:
- Make sure Google is categorizing your domain content correctly. Use the toolset to make sure that Google is classifying the most important pages on your site, like your homepage, as expected, since inaccurate classification could negatively impact your SEO. Google will struggle to display your page in the search results to the right people at the right time if Google has an incorrect and/or incomplete understanding of the page’s content. The MarketFinder tool can be used to determine how Google might be evaluating the domain as a whole, and the Cloud Natural Language API can be used to evaluate content on a page by page or tab by tab basis. If Google is classifying your site in an unexpected way, investigate which keywords on the page might be contributing to this misclassification.
- Read Google’s Natural Language API documentation about Sentiment Analysis. As described earlier in this article, the Sentiment section in the Natural Language API is not labeled clearly, so it will likely be challenging for most SEOs to use it in its current form. Google has separate documentation with more details about Sentiment Analysis that is worth checking out because it offers a bit more context, but more clarity from Google about Sentiment would be ideal. We’ll be keeping an eye open for documentation updates from Google that may help fill in the gaps.
- Learn as much as you can about “Entities” in the context of search. Entities can be a tough concept to understand, but we recommend keeping it top-of-mind. As Google moves into a new era that is focused much more on voice and cross-device interaction, entities will grow in importance, and it will be challenging to get the full value out of the Google tools without that foundational knowledge. Here are some great resources that will help you build that knowledge: the previous article in this series about “Entity-First Indexing,” this excellent article by Dave Davies about one of Google’s patents on entity relationships, this great patent breakdown by Bill Slawski, and Google’s official documentation about Analyzing Entities using the Google Natural Language API.
- Understand alternate theories about Mobile-First Indexing. MobileMoxie recently published a four-part series investigating various changes in search results and other aspects of the Google ecosystem that seem related to the switch to Mobile-First Indexing, but have not been elucidated by Google. Most SEO’s and Google representatives are focusing on tactical changes and evaluations that need to be done on a website, but it is also important to not lose sight of the larger picture, and what Google’s larger, long term goals are, to understand how these changes fit into that mix. This will help you relate entities, entity search and entity indexing to your company’s larger strategy more readily.
Essential Entity Elements – Critical Requirements for Correct Classification of an Entity
Over time, Google will find new things that need to be classified as entities, or old things will need to be re-classified as different kinds of entities. SEO’s will 1) need to know what type of entity that they would want to be classified as, and then 2) need to know what are the critical requirements that Google needs to find to classify something as a specific type of entity.
To do this, SEO’s will need to determine what Google considers to be the essential elements for similar entities that are correctly classified and ranking well in their top relevant searches. The various types of entities that Google recognizes and highlights in Knowledge Graph panels will have unifying elements that will change from entity to entity but will be the same for similar groups of entities or types of content. For instance, local businesses have had the same requirements for a long time, generally abbreviated as NAP – Name, Address and Phone number. This could be built out to include a logo and an image of the business. In other cases, like for a movie, most movie Knowledge Graph entries have a name, cast list, run time, age rating, release date, promo art and a video trailer. If your business is not classified as a particular kind of entity, and would like to be, then this will be and important step to take.
In the long-run, this model could be difficult for publishers and companies that are not original content creators, but this is probably by design. Websites that use an ‘aggregation and monetization’ model, or that survive primarily on ad revenue will struggle more; this is Google’s model, and they don’t appreciate the competition, and also, it hurts their user’s experience when they search! Google wants to raise the bar for quality content and limit the impact that low quality contributors have on the search ecosystem. By focusing more on entities, they also focus more on original, authoritative content, so this is easily a net-positive result for them. In the short term, it could even decrease the amount of urgency around Google’s effort to provide more safety and security for searchers, and minimizing the negative impact of ads, popups, malware and other nefarious online risks.
While many SEO’s, designers and developers will see moves in this direction as a huge threat, small business owners and users will probably see it as a huge benefit. Perhaps it will make the barrier to entry on the web high enough that nefarious actors will look elsewhere for spam and easy-money opportunities, and the web will become a more reliable, high-quality experience, on all devices. We can only hope. In the meantime, don’t get caught up on old SEO techniques and miss what is at the top of all of your actual search results – Knowledge Graph and entities.
This is the second article in a five part series about entities and language, and their relationship to the change to Mobile-First Indexing – what we are calling Entity-First Indexing. This article focused on the tools that Google used to classify the web, and reindex everything in an entity hierarchy. The next three articles will focus on our international research, and how the various translation APIs impact search results and Entity Understanding around the world, and how personal settings impact Google’s Entity Understanding and search results on an individual basis.
The Entity & Language Series: Entity-First Indexing with Mobile-First Crawling (1 of 5)
By: Cindy Krum
Mobile-First Indexing has been getting a lot of attention recently, but in my mind, most of it misses the point. Talking about Mobile-First Indexing only in terms of the different user-agent seems like a gross oversimplification. It is very unlikely that Google would need more than two years just to change the user-agent and viewport of the crawler – They have had both a desktop and mobile crawler since 2013 (or earlier if you count the WAP crawler), and Google has changed the user-agent and view-port of the primary crawler before, multiple times with minimal fanfare. Sure, Google is now using a different crawler for finding content to index, but my best SEO instincts say that Mobile-First Indexing is about much more than the different primary user-agent.
From what I can see, Google’s change to Mobile-First Indexing is much more about an entity classification and translation than it is about a different user-agent and viewport size for the bot. I believe this so much that I have started calling Mobile-First Indexing ‘Entity-First Indexing’. It is much more accurate and descriptive of the challenges and changes that SEO’s are about to face with Mobile-First/Entity-First Indexing. This article will focus on what the change to Entity-First Indexing means, plain-sight signals that ‘Entity-First Indexing’ is already underway, and how the change will impact SEO in the future.
This is the first in an article series that will dive much deeper into how Google understands languages and entities, how they use them in indexing and algorithms and why that is important for SEO. It will review what entities are and how they interact with language and keywords. Then it will speculate on how organizing their index based on entities might benefit Google, how they might have accomplished it during the switch to Mobile-First Indexing and how device context might be used in the future to help surface the right content within an entity. It wraps up with a discussion of what can go wrong with indexing based on entities, and what Google has said on the topic of Mobile-First Indexing.
The next article in this series will focus on the tools that Google used to break down the languages of the web and classify all the sites into entities, and then subsequent articles will focus on research that we completed that show how entity indexing works in different linguistic contexts, based on the different Google APIs that are used, and how those impact Google’s Entity Understanding. Finally, the last article in the series will focus on how individual phone settings and search conditions like GPS location can impact query results, even when the query does not have a local intent, like a query for a local business might.
- Entity Understanding & Understanding Entities
- The Relationship Between Entities, Languages & Keywords
- What Does It Mean to Index on Entities & Why Would Google Do it?
- Just a Guess – How I Imagine Entity Indexing Works
- Does the Crawler Render Now or Later?
- What Can Go Wrong When You Index on Entities?
- Is Now Really the Time for Entities?
- Context is King
- Discussion with Google
Entity Understanding & Understanding Entities
Historically, Google’s reliance on links and keywords as the primary means of surfacing content in a search has eschewed the idea that the world had some larger order, hierarchy or organizing principle than language, but it does — it has entities! Entities are ideas or concepts that are universal and exist outside of language. As Dave Davies describes, in an excellent article about one of Google’s patents on entity relationships, “an entity is not simply a person, place or thing but also its characteristics. These characteristics are connected by relationships. If you read a [Google] patent, the entities are referred to as ‘nodes,’ and the relationships as ‘edges.’”
With that in mind, Entity Understanding is a process by which Google strives to understand and organize the relationships between different ‘nodes’ and ‘edges’ – or more readily, different thoughts, concepts, ideas and things and their modifying descriptors. They organize them into a hierarchy of relationships that is roughly what we all know as the Google Knowledge Graph. It is somewhat related to Semantic Understanding, but Semantic Understanding is based on language, and this is one step preceding the language, to be more conceptual, and universal; it is language agnostic.
Entities can be described by keywords, but can also be described by pictures, sounds, smells, feelings and concepts; (Think about the sound of a train station – it brings up a somewhat universal concept for anyone who might hear it, without needing a keyword.) A unified index that is based on entity concepts, eliminates the need for Google to sort through the immense morass of changing languages and keywords in all the languages in the world; instead, they can align their index based on these unifying concepts (entities), and then stem out from there in different languages as necessary.
The value of entities can be a bit hard to understand, but from the perspective of efficiency in search, the concept can’t be overstated. The internet has altered the way many of us think about knowledge, to make it seem like knowledge might be infinite, imperceivable and unending, but from the pragmatic and practical perspective of a search engine, this is not exactly true. While the potential of knowledge MAY be infinite, the number of ideas that we can describe, or that are regularly searched or discussed is somewhat limited. In fact, it used to fit in an encyclopedia, or at least into a library. For many years in history, libraries indexed all of the knowledge that they had available, and most carried more information than any one human could peruse in a lifetime. It is with this limitation that we must approach trying to understand ‘entities’ from the perspective of a search engine.
From a search engine perspective, it is important to understand that domains can be entities, but often have larger entities like ‘brands’ above them in an entity hierarchy. Indexing based on entities is what will allow Google to group all of a brand’s international websites as one entity, and switch in the appropriate one for the searcher, based on their individual country and language, as John Mueller describes in his recent Reddit AMA:
“You don’t need rel-alternate-hreflang. However, it can be really useful on international websites, especially where you have multiple countries with the same language. It doesn’t change rankings, but helps to get the “right” URL swapped in for the user. If it’s just a matter of multiple languages, we can often guess the language of the query and the better-fitting pages within your site. Eg, if you search for “blue shoes” we’ll take the English page (it’s pretty obvious), and for “blaue schuhe” we can get the German one. However, if someone searches for your brand, then the language of the query isn’t quite clear. Similarly, if you have pages in the same language for different countries, then hreflang can help us there.”
Notice how he talks about the brand as a whole, despite the fact that there might be different brand ccTLD domains or urls in the hreflang. Before Entity-First Indexing, the right international version of the website would have been more determined by algorithmic factors including links, because the websites were not grouped together under the brand and evaluated together as an entity. This concept is illustrated below in the first inverted pyramid. Historically, getting the correct ccTLD version of a site to rank in different countries was a constant struggle, (even with Search Console settings to help,) that this will hopefully solve.
For more topical queries, that are less focused on a brand, the entity relationships may be more loose, and include top resources on the topic, like blogs, books and accredited experts. These groupings could focus on domains, but depending on the strength of engagement with other content, such as popular podcast on a niche topic, the domain could be less prominently displayed or expressed in the entity ranking, illustrated below.
The Relationship Between Entities, Languages & Keywords
Remember, when SEO and search all about keywords, it is a language-specific task. Entities are different because they are universal concepts that keywords in any language can only describe. This means that entity-based search is more efficient, because the search engine can query more content faster (all languages at once), to find the best information. The algorithm can cut through the noise and nuance of language, spelling and keywords, and use entities and context to surface the appropriate type of response for the specific query. Though entities are language-agnostic, language is critical for informing Google’s Entity Understanding. It is this process that probably made the transition to Mobile-First Indexing so slow; the entire web had to be classified and re-indexed as entities, which is no-small task.
|NOTE: While many SEO’s agree that the hreflang protocol was established to help train Google’s machine learning algorithms to build and refine their translation API’s, we believe it was ALSO used, more holistically, to develop its Entity and Contextual understanding of the web, because it allowed Google to quickly compare the same textual content, in the same context, across many languages all at once.
(Did anyone wonder why so many of the questions that John Mueller responded to in the Reddit AMA were about hreflang? Probably, because it is so important for Google’s ability to index domains based on entities, then switch the correct version of the content in based on language and location signals. Together with Schema, hreflang tagging is like Google’s Rosetta Stone for the internet and Entity Understanding. This is also why Mobile-First Indexing was rolled out globally, instead of US-first, one country at a time, like the last major change to indexing, Caffeine was. It is by design that Entity-First Indexing can’t be rolled out one country at a time.)
If you think about it, language is fluid; it is changing every day, as new slang is added and new words come in and out of vogue. This is even seen in the nuances of pronunciation and spelling, and it happens not only in English but in every language. It even happens in subversive ways, with images and icons, (as any teen who has sent dirty text messages with a standard set of emojis can tell you.) But rapid changes to language can also be empowering and political, as you can see in the tweet below, about the #MeToo movement in China, which has been suppressed by certain groups in mainstream communication.
Google does care about communication, and has actually enabled even more emoji to work in Chrome recently, potentially to help enable empowering political movements, but also simply because their focus on PWAs means that more and more chat and communication apps will be leveraging browser code for core functionality. This shift to enable emojis could also hint at the potential that Google is concerned about trying to index, as many chat apps and social networks transition to crawlable PWAs, instead of having content locked away, much harder to crawl and index in native apps; the level of public communication in crawler-accessible browsers could grow exponentially.
What Does It Mean to Index on Entities & Why Would Google Do it?
To be clear, entity understanding has existed at Google for a long time, but it has not been core to indexing; It has been a modifier. I believe that the shift to Mobile-First Indexing is a reorganization of the index – based on entity understanding; roughly, a shift from organizing the index based on the Link Graph to organizing it based on the Knowledge Graph. Continuing to organize and surface content based on the Link Graph is just not scalable for Google’s long-term understanding of information and the web and it is definitely questionable in terms of the development of AI and multi-dimensional search responses that go beyond the browser.
For years Google has been trying to distance themselves from the false economy that they created, based on the relative value of links from one page to another, but they have not been able to do it because it was core to the system – it was part of how content was discovered, prioritized and organized. As Dave Davies says, “The idea that we can push our rankings forward through entity associations, and not just links, is incredibly powerful and versatile. Links have tried to serve this function and have done a great job, but there are a LOT of advantages for Google to move toward the entity model for weighting as well as a variety of other internal needs.” While neither Dave nor I are recommending you abandon linking as a strategy, we all know that it is something Google has been actively advocating for years.
Constantly crawling and indexing content based on something as easy to manipulate as the Link Graph and as fluid as language is hard, resource intensive, and inefficient for Google; And it would only grow more inefficient over time, as the amount of information on the web continues to grow. It is also limiting in terms of machine learning and artificial intelligence, because it allows the country and language-specific algorithms to evolve separately, which John Mueller specifically said in his Reddit AMA that they don’t want to do. Separate algorithms would limit potential growth of Google’s AI and ensure that larger, more populous country and language combinations remained much more advanced, while other smaller groups continued to lag and be ripe for abuse by spammers. Finally, most crucially for Google’s long term goals, Google would not be able to benefit from the multiplier effect that ‘aggregation of ALL the information’ could have for the volume of machine learning and artificial intelligence training data that could be processed by their systems, if only they could they could get around the problem of language … And this is why entities are so powerful!
Just a Guess – How I Imagine Entity Indexing Works
With all that in mind, here is my vision of how Mobile-First Indexing works, or will eventually work, with entity indexing. Possible problems you may have experienced related to the new indexing process (which may have started around March 7th) are noted in parentheses next to the proposed step that I believe may be causing the problem:
- Content is crawled for Mobile-First Indexing (Most of the content has already been crawled and re-indexed. You have been notified in Search Console, but Mobile-First Indexing probably began at least 3 months before the notification so that the New Search Console could begin building up the data and comparing it to old Search Console data to validate it before the notification was sent.)
- The User-Agent is: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- The Aspect Ratio is: 410x730px (or probably 411x731px), with an emulated DPR (devicePixelResolution) of 2.625
Please Note – This may be variable and will probably change as new phones come out. This is why building in Responsive Design is the safest bet.
- Entire domains are re-indexed in the new Mobile-First Indexing process. This happened one domain at a time, rather than one page at a time. The bot only follows links that are accessible to it as a smartphone, so only content and code that is present for the mobile user-agent is re-indexed with the Mobile-First Indexing process, (whatever that may be). It may still be crawled periodically by the desktop crawler. This detail has not been made clear in the Google communication.
- Old web content and desktop-only content that that was historically in the index but can’t be found by the mobile crawler but will remain in Google’s index, but will not receive any value that is associated with being evaluated in the Mobile-First Indexing process.
- In addition to being evaluated for the content and potentially for mobile rendering, domains are evaluated for entity understanding using on-page text, metadata, Schema and other signals.
- The domain itself is considered an entity. It has relationships to other domain and non-domain entities. Pages on the domain are indexed along with the domain-entity, rather than the larger entity concept. (Entity Clustering, Re-indexing XML Sitemaps)
- Links from the Link Graph are aggregated attributed to all alternate versions of the page equally, including mobile and desktop versions of the page as well as hreflang translated versions of the page. The same is true of Schema markup – especially if it is on the English or X-default version of the page. Google still uses local information as a ranking signal, but the signals may change in relationship to the larger entity.
- Links continue to impact rankings, though they are less critical for indexing. The current Link Graph is probably noted so that the impact of aggregation can be rolled out slowly, over time in the algorithmic rankings. We can assume that links will remain part of the algorithm for a long time, or potentially even forever, until Google has vetted and tested many other suitable replacements. The most likely replacement will probably be some type of Real User Metric (RUM) similar to what we are seeing Google do with Page Speed as Tom Anthony brilliantly describes, but this may be some time off.
- Pages (URLs on the domain) become interchangeable entities of the domain, which can be switched in as necessary depending on the search language, query, context and potentially, the physical location of the searcher. International versions of a page now share most or all ranking signals. (Weird international results)
- Google’s understanding of structural content like subdomains and their association with specific countries as well as XML sitemap locations may be reset, and may need to be re-established in New Search Console. (Previously properly indexed XML sitemap files re-appearing in SERPs)
- Google’s newly organized index is based on entity hierarchy grouped roughly according to the Knowledge Graph, instead of however it has been organized historically, (we assume based somehow on the Link Graph.) This provides an efficiency benefit for Google but is not intended to *directly* impact rankings – Indexing and ranking are two different parts of the process for a search engine. It may however, make it easier for Google to de-emphasize links as part of the algorithm in the future. Remember, Knowledge Graph entities, Topic Carousels, App Packs, Map Packs, Utilities and other elements that so often surface at the top of SERPs now, do so without any links at all.The indexing establishes associations through an entity’s relationships to other entity concepts, and these associations can be loose or strong. These relationships are informed by their relative placement in the Knowledge Graph (proximity), but also probably fed by the historical information from the Link Graph. (FYI: Danny from Google specified that it is not a new index or a different index; It is the same index. We are simply speculating that this one index has been reorganized.)
- The entity hierarchy includes both domain and non-domain entities. Google will use their machine learning to inform, build-out and fine-tune the entity understanding of all entities over time
- Non-domained Entities: Entities without a domain, like ideas, concepts or things in the Knowledge Graph are given a Google URL that represents their location in the index (or Knowledge Graph).
- Indexed content like apps, maps, videos, audio and personal content deep linked on a personal phone also fall into this category. (EX: app deep links or system deep links, like ones for contacts – The contacts utility is essentially just an app.) Remember that more and more content that people eagerly consume is not ON websites, even if it is purchased from websites – though this may change with the rise of PWAs.
- These non-domain entities are indexed along with existing websites in the hierarchy.
- Temporary Google URLs are given to non-domain entities. The URL is not necessarily meant to build up traditional ranking signals, but instead, the URL is simply an encoded locator, so that the item can be found in the index. Once un-encoded, a unique ID allows the entity to be related to other content in the index, and surfaced in a search result whenever the top-level entity is the most appropriate result.
- Non-domained Entities: Entities without a domain, like ideas, concepts or things in the Knowledge Graph are given a Google URL that represents their location in the index (or Knowledge Graph).
|Follow-Up Discussion from Conferences Last Year: It seems like the idea that URL’s are optional might be an overstatement. Google still needs URLs to index content, they just don’t have to be unique, optimized, static or on a domain that an SEO optimizes. Google is creating Dynamic Link URLs for loads of types of content – especially when the content might qualify as an entity, and just putting it on different Google short links. If you have certain kinds of content that you want indexed but it doesn’t have a URL, Google will essentially just give it one. Examples include locations such as businesses, but also locations that don’t have specific addresses like cities, countries and regions. They are also giving URLs to Google Actions/Assistant Apps, and information that appears to be indexed as part of an instant app, such as movies in the Google Play Movies & TV app. Types of music, bands, musicians, musical instruments, actors, painters, cartoon characters – really anything that might have an entry in a an incredibly comprehensive encyclopedia is getting a Google Short link.|
- Examples of Non-domain Entities:
Sample Google Entities
Sports Team Athletes Public Figures News & Politics Soccer
Hobbies Ideas Lifestyle Food Geographic Feature Painting
TV Movies Actors Musicians Music Video Archer
- Examples of Non-domain Entities:
- Domain Entities: These are simply websites, which have historically been Google’s crawling and indexing focus. They are entities that already have their own domains, and don’t need to be given temporary URLs from Google.
- Entities can be parts of other entities, so just because a website is a domain entity on its own, that does not preclude it from being a part of a larger concept, like the Florence & the Machine official website URL which is included as part of the official Google entity.
- Larger entities like ‘brands’ may be related to domains but sit above the domains in the entity hierarchy. International brands could have many domains, and so the international brand is an entity, and the domains that are a part of it are also entities. Similarly, there could be concepts that are entities, that are smaller than domains, lower in the hierarchy.
- Domain Entities: These are simply websites, which have historically been Google’s crawling and indexing focus. They are entities that already have their own domains, and don’t need to be given temporary URLs from Google.
5. Search rankings and entity relationships will be fed, reinforced or put up for re-evaluation using automated machine learning processes that are based on the user-behavior and engagement with the SERPs over time, especially when Google perceives a potential gap in their understanding.
- At launch, the big entity concepts will be strong for head-term searches, but the long-tail results will be weaker and Google can fall back on traditional web SERPs and the content that has yet to be migrated to Mobile-First Indexing whenever they want. Google will use machine learning and AI to localize and improve more niche results. (Weird long-tail results, Unrecognized entities)
- In the short term, newly perceived relationships will only lead to a temporary change in rankings, but in the long term, with enough signals, sustained changes in entity relationships could trigger a re-crawl of the domain so that the content can be re-evaluated by the Mobile-First Indexing process for additional Entity Understanding and categorization.
6. New types of assets that can rank will be indexed based on entity understanding, rather than the presence or absence of a website.
[Note from the author: I am not a systems architect, database manager, sys-admin or even a developer. I am just a lowly SEO trying to make sense of what the smart people at Google do. Please forgive me for any poorly worded technical descriptions or missteps, and let me know if you have corrections or alternate theories. I would love to hear them!]
Does the Crawler Render Now or Later?
The other major change that might be part of the Mobile-First Indexing process is that indexing and ranking now seem less tightly tied to rendering. This is surprising, since Google has historically focused so much on mobile UX as a dimension of feedback to webmasters. But feedback has also always been in the context of Google’s PageSpeed Insights tool, which as Tom Anthony describes, is now fed by Real User Metrics (RUM) rather than data that it synthesizes during an on-demand page render, as the tool previously did.
Most SEO’s have been focused on how the change to Mobile-First Indexing will impact crawling of their content, which is important because it happens before indexing. Whatever is not crawled, is not indexed, or at least that is how it worked before. But if the Mobile-First Indexing process has changed something about when and how the bot renders the page, this could be substantial. Is it possible that once Google knows about a domain, it is just waiting on RUM rendering data to be collected and compiled from real-user rendering sources for some of the data?
This is all still very unclear, but some SEO’s have reported that content that was previously penalized because of interstitials is now ranking again, which was previously not allowed. John Mueller also recently specified that Google could index CSS grid layouts even though Google’s rendering engine, Chrome 41 does not support them. This does not seem to be a one-off thing either – Where Google used to be limited to indexing what it could render without changing tabs, now Google says it can index everything on all tabs, as long no on-click events are required to fetch content from the server. In potentially related news, John also says that parameters no longer hinder URL rankings or need to be managed in Search Console – something that Google has been saying for awhile, but so far, has never really been 100% true, but in a recent Google Hangout, it was explained that they are now just considered signals for crawling, rather than rules; it is possible that they signal Google to use a different type of rendering engine, after the content is indexed – this is something that we would love for John to expand on in future discussions.
What Can Go Wrong When You Index on Entities?
As noted above, many SEO’s have noticed weird anomalies in the SERPs since the major update in March. Many of these anomalies seem much more related to indexing rather than ranking – Changes in how an entire query or domain is perceived, strong domain-clustering, changes to AMP, Answer Boxes and Knowledge Graph inclusions, changes in schema inclusions and problems with local and international content and sitemaps. My supposition here is some content, signals and indexing instructions may have been lost during the Entity-First Indexing process, but there are other things that can go wrong too.
From what we can tell, Google is still doing a great job responding to head-term queries, surfacing Knowledge Graph Entities like they have been for awhile. The problems only seem to come in for long-tail queries, where most SEOs focus. This is where Google’s Entity Understanding may be more vague or the relationships between different entities may be more complex.
The switch to Entity-First Indexing will certainly create instances where Google misunderstands context or makes assumptions that are wrong about when and where something is relevant to a searcher. Hopefully, this all gets sorted out quickly or rolled back until it is fixed. The fact that Google has announced that they will stop showing Google Instant results, where they used to include the keyword level entity disambiguation, may be a sign that they are worried it would expose too much of the inner workings of the system, at least in the short term. But they do still appear to include simple definitions and occasionally a link to a Wikipedia result in the instant results now, but that is it for now. It is interesting though that the old style of Google Instant results do still appear to be supported in the Google Assistant App, as shown below, but this could be temporary:
It is important to understand that Google’s Entity Understanding appears to be keyed off of the English definitions of words in most cases, so this means that there will be instances when the English concept of something is broken compared to the rest of the world’s concept of the same thing, like with pharmacies, as described in Article 4. Other examples might be the US reversal of the sports games ‘soccer’ and ‘football’ or disambiguation of the word ‘cricket’ where it is a popular sport instead of just a chirping bug – both quite strong and widely understood concepts that are regionally very different. In these cases, it is hard to know what to do, other than find a way to let Google know that they have made a mistake.
Is Now Really the Time for Entities?
The biggest and most jarring change that has happened since the March update, was when temporarily Google replaced the normal response to queries about the time, with a single-entry answer, as shown below on the right.
This type of result only lasted a few days, and you can see why in the image below – Google was over-classifying too many queries as ‘time queries’ and this was causing problems; A query for a brand of scotch was being misunderstood as a time query.Google tried to perceive the intent of the query, but failed miserably, possibly because there were not enough entities included in the Knowledge Graph or Google’s index, possibly because they were not taking enough context into account or most likely, a bit of both. This will be a big risk in the early days of Entity-First Indexing. For brands, missing classification or mis-classification is the biggest risk. I have been told that Time Magazine and the New York Times experienced similar problems during this test.
Context is King
With all this change, it is important to remember that Google’s mission is not only limited to surfacing information that is available on a user-facing domain. Google does not consider itself a utility that’s only job is to surface website content, and you shouldn’t either! Surfacing content on the web and surfacing websites are different. Google’s goal is to surface the most useful information to the searcher, and sometimes that will depend on the context that they are searching in. Google wants to serve their users, and the best information for their users may be a video, a song, a TV show, a deep link in an app, a web utility, or an answer from the Knowledge Graph.
Context allows Google to disambiguate multiple versions of a single entity, to know which one is the most relevant to the user at the time of their search. To better understand a complex entity and its indexing and how that might work, let’s look at the example of Monty Python. Among other things, Monty Python is in fact a domain, but it also the name of a comedy group, the name of a series comedy skits and compilations on video, a YouTube Channel, and part of the name of multiple albums of recorded comedy. When someone searches for the keyword ‘Monty Python’ how could Google know which one of those things they are looking for? They really couldn’t unless they knew more about the context of the search. If the user is searching on a computer, they could want any of those things, but if they are searching in a car or on a Google Home device, or something else without a screen, they are most likely looking for something with just audio – not videos. If they are searching on a TV, they are more likely looking for video. If they are searching on a computer or a phone, there is a chance they are looking for information, but if they are searching on a TV, the likelihood that they want to read information is low-they probably want to just watch a video.
Contextual signals are particularly important for delivering a great experience to mobile users. Google has been open about this, such as in this “Think With Google” article published in 2017 about advertising to mobile users, where Google says, “When we reach someone on mobile…there are loads of context signals, like time and location…To really break through with mobile users, it’s important to take full advantage of their context.”
When we index based on only keywords – keywords like ‘watch’ ‘listen’ ‘video’ ‘audio’ ‘play’ ‘clip’ ‘episode’ are necessary. When you index based on entity, the understanding of the search query is more natural, based on context. With context instead of additional keywords, queries become more simple, basic and natural. Indexing on entities allows Google to surface the right content based not only on the keyword query but also the context of the device that they start the search from, like a TV, a Google Home, a phone, a web-enabled car system or something else! We get closer to natural language.
The problem that SEO’s have is that we have focused on the broadest and context-free devices first – our computers. This makes it hard to conceive of how strong a signal context could be in determining which part of an entity is most relevant in a particular search but start to think about all the new devices that are getting Google Assistant and how Google will work on those devices.
Someone searching on a Google Home or Android Auto might not be able to use a website at all. They will be much more interested in audio. Someone searching on a TV is also probably more interested in videos and apps than they are in websites. SEO’s who limit their understanding of their job to optimizing website experiences will limit their success. Just because Google crawls and indexes the web, does not mean that they are limited to websites, and SEO’s should not be either.
Discussion with Google
This change to the time queries has since been rolled back, but when it happened, I tweeted that this was a clear indication of Mobile-First Indexing. Danny Sullivan, a long-time friend, search personality, SEO expert, and Google’s Search Liaison explained that it had nothing to do with Mobile-First Indexing, which I found confusing. I realize now that my tweet didn’t convey my more robust belief that Mobile-First Indexing is all about Entity Understanding, but we can suffice to say that Google officially conceives of these two concepts as separate. Perhaps they are two separate projects, but I find it impossible to believe that they are totally unrelated. To me, it seems self-evident that the goal of any change towards Mobile-First [anything], especially if it was meant to support voice-search, would improve Entity Understanding. But in his response, Danny seemed to assert that Mobile-First Indexing has absolutely nothing to do with Entity Understanding.
Danny gave an analogy that I love, about Mobile-First Indexing being like removing old paper books from a library and replacing them with the same thing in an e-book format. This analogy was provided to prove the point that there is only one index, not a separate mobile and a desktop index which Danny emphasized as a very important point. This seems perfectly aligned to illustrate the efficiency of entity-based indexing – I love it! An eBook would not need to keep multiple paper copies of translated versions of the text, but could potentially be translated on the fly – the same way we describe language agnostic entity understanding here and in Article 4 of the Mobile-First Indexing series. It is overwhelmingly disappointing that Google is not willing to talk about this part of the change to Mobile-First Indexing, and that Danny is willing to give the analogy but not willing to discuss the full depth of the explanation at this point.
The only problem is that library analogy is at odds with the explanation that is being given from John Mueller from the Webmaster team, that it is just about a change to the user-agent. If the only thing that changes is the user-agent, how do we get an eBook from the same crawler that previously only gave us paper books? Unfortunately, after the library analogy, the conversation got derailed (as it has before with other Google representatives) to focus on the number of indexes Google was using to organize content. The ‘one index vs. multiple indexes’ point is a point that can be a bit confusing because some Google representatives repeatedly explained and implied that there was an old ‘desktop-oriented’ index (that we have been using historically) and a new ‘Mobile-First’ index that content was migrating too.
There is a lot to be confused about, starting with the change in messaging from when Google was telling us about sites “being moved into the Mobile First Index one domain at a time;” to the “same index, different crawler,” line that is now the official, go-to talking point on this topic, for Google representatives. The position allows Google to say that desktop content will be maintained even if it is not accessible by the mobile crawler, which makes the discussion of the new crawler…almost irrelevant! If desktop content will see no negative effect from the change, why bother making any accommodations for it at all? But ultimately, this ‘one index’ mantra is a nuanced point that really doesn’t matter and I think it is a bit of a red herring. The same index can have different partitions, formatting or organization, virtual partitions or any number of designations that make it function like one or two indexes as necessary. It is also true that one index can exist and simply be re-organized one domain at a time, without duplication. The net result for users and SEO’s does not change.
Google has made a big investment in voice search and Google Assistant and recently doubled down in AI by promoting two new heads of search as people with extensive backgrounds in machine learning and artificial intelligence. All of these things should be taken as a sign of change in the lives and job descriptions of SEO’s. As more and more devices become web-enabled, and fewer and fewer of the best results for users are websites, the context for search is getting much broader.
New strategies will include adding audio versions of text-only content, adding video and voice-interactive versions of content, and getting all these assets indexed and associated correctly with the main entity. They will also include optimizing non-website entities, like Knowledge Graph relationships to ensure that main entities are correctly correlated with the domain and all of its assets. They will include monitoring the translation and entity understanding, to make sure that all the interactions are happening correctly for users around the world, and they will include monitoring feedback like reviews, which Google will be using more and more to automate the sorting and filtering of content for voice navigation. They will also no-doubt include technical recommendations like use of Schema and JSON-LD to mark up content, transition to Responsive Design and AMP only design, transition to PWA and PWAMP.
This has been the first of a five part article series on Google’s new Entity-First Indexing (what everyone else calls Mobile-First Indexing) and how it is related to language. Future articles in this series will provide deeper information about the relationship between language, location and entity understanding, and how these things can impact search results. The next article in the series will focus on the tools that Google has made available to marketers, that we think offer a good view into how their language and entity understanding works and the following three articles will walk through specific research we have done in this area to validate and explain our theories. This will include one article about the Google language APIs, one about how language impacts Entity Understanding and one about how personalization impacts Google’s Entity Understanding, and changes search results for individual users. The final article in the series will focus on how individual phone language settings and physical location can change search results for individuals, making it even harder for SEO’s to predict what a search result will look like, and how their content will rank in different search scenarios.