What the Heck are Fraggles?

What the Heck are Fraggles in SEO?

If you have seen me speak or tweet in recent months, you may have heard me mention #Fraggles. If you grew up in the late 1970s or early 1980s, you will probably remember the Jim Henson TV show called Fraggle Rock. It was a puppet show, in the same vein as Sesame Street, and the main characters were fun-loving, musical creatures called Fraggles that lived harmoniously underground in Fraggle Rock. Now, when I reference #Fraggles, I am referring to an SEO asset that I named after the creatures in Fraggle Rock.

In SEO, Fraggles are a combination of ‘Fragments’ and ‘Handles’ that rank in Google search results. When clicked, Fraggles scroll directly to a specific section of an article, forum or webpage. What is important about this, when it happens, is that multiple answer fragments on a page are being indexed and ranked separately in Google, as part of a larger result – usually in a carousel of potential answers, like this one, shown below.

I tend to believe that when a page starts ranking with Fraggles, it indicates that Google has indexed the various Fraggles into their Knowledge Graph as authoritative information that they could potentially use as an Answer or Featured Snippet. So far, they are almost always in the format of a quick fact or an answer to a question. My best guess is that Fraggles allow Google to build out the Knowledge Graph with small pieces of information from a page, without requiring them to deal with the rest of the content on the page that they don’t need, or that they need to add to different parts of the Knowledge Graph. In this way, Google appears to be fitting the web into the Knowledge Graph, rather than the other way around; we have described this larger process as Entity-First Indexing. I believe that top performing Fraggles will become Answers and Featured Snippets that can rank visually or be spoken in a voice search result or even translated, as part of the Knowledge Graph.

Fraggles in SEO - Fragment + Handle = Fraggle

The behavior of a Fraggle is different from the behavior of a regular link because of the scrolling, as shown below from StackOverflow. Each of the answers from the carousel links to the same page, without a # included in the URL of the link. The URLs passed to the address bar for each of the Fraggles in the carousel are always the same: https://stackoverflow.com/questions/20080577/cant-access-cloudfront-and-fastly-files-web-sites-not-loading. This is not to say that Fraggles can’t pass JumpLinks, just that they are not required.

Fraggles in Action - Fraggles Scroll Directly to the Content, With or Without JumpLinks in the Code

We first started seeing Fraggles in May of 2017 and have been talking about them and researching them ever since. Fraggles that scroll without a JumpLink passed in the address bar do exist, but they can be hard to find. Sometimes when you find Fraggles without JumpLinks, the scrolling behavior doesn’t work as expected, likely because of a Google bug or something related to Google’s testing. Some Fraggles from StackOverflow do pass handles in the address bar, but this example (query for ‘fastly css problem’) does not.

If you prefer further evidence that Fraggles without handles can exist, Google just announced the ability for users to share links to a specific word in a web page, illustrated below. This is the exact behavior that we have been describing for Fraggles, except that Google creates the link for Fraggles, instead of a user. (Perhaps making and sharing targetText= links will augment Google’s link economy!) Ultimately though, for now, JumpLinks do seem to be the easiest way to get Google to start ranking Fraggles.
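To make the targetText= idea concrete, here is a minimal sketch of how such a link could be assembled. The helper name build_target_text_link and the example.com URL are hypothetical, and the fragment syntax is only assumed from the announcement described above; Chrome later moved to a #:~:text= form, so the prefix is left as a parameter rather than hard-coded.

```python
from urllib.parse import quote, urldefrag


def build_target_text_link(page_url: str, text: str, prefix: str = "targetText=") -> str:
    """Return page_url with a fragment that asks the browser to scroll to `text`.

    `prefix` is an assumption taken from the announcement discussed in the article;
    pass prefix=":~:text=" to try the later text-fragment syntax instead.
    """
    base, _old_fragment = urldefrag(page_url)  # drop any existing #fragment
    return f"{base}#{prefix}{quote(text, safe='')}"


# Hypothetical page and phrase, for illustration only.
link = build_target_text_link("https://example.com/article#intro", "Fragment + Handle")
print(link)
```

Note that the handle lives entirely after the #, so the page URL itself never changes. That is the same property that lets several Fraggles in one carousel point at a single URL.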

Fraggles - Similar to New Google Announcement about targetText= Links

Fraggles occur in more than just traditional web copy, and this is important. Below, you can see examples of video Fraggles – where Google is ranking not just a video, but a video that is specifically time-constrained by Google, to highlight the exact part of the video that answers the question that was submitted:

Video Fraggles - Time Constrained Video Results that Answer Specific Questions

We have yet to see Fraggles of audio content in search results, but expect to see them soon, since Google has so publicly doubled down on driving podcast content, and also on parsing and auto-captioning audio content, likely to make it more easily searchable. We are seeing more and more instances where Google appears to be lifting content from the web and presenting it directly in the SERP as an Answer or part of the Knowledge Graph; this is probably Google’s most likely use-case for Fraggles.

More Details About Fraggles

Fraggles are a big deal. The scrolling behavior may imply that Google is now indexing pieces of a page separately, with different weighting and signals for the various pieces, then saving the unique location of the content on the page, so that it can be surfaced easily from a search result. This makes a lot of sense if you think about all the things that Google really wants to do a better job of indexing and ranking: Progressive Web Apps (PWAs), Single Page Web Apps (SPAs), Native Apps, Databases, Podcasts & Videos. These assets, along with any website that uses complex JavaScript or AJAX, will benefit greatly from this kind of indexing, because there is a much greater risk that their content doesn’t have unique URLs. Instead, they often have a few URLs with lots of content that may be about many topics.

AMP Featured Snippets Scroll Directly to the Content - Just Like Fraggles

Previously, Google struggled to index and rank content like this simply because the different assets lacked separate URLs. Now, with Fraggles, the assets that Google appears to be indexing are similar to what we would call a ‘fragment’ in a native app. Since there are no URLs or pages in native apps, App Indexing works with URIs and fragments. In this context, a fragment is a thin slice of the app experience, like a page; but since apps are more of a continuous, fluid experience, it is hard to call it a page, or know when any given ‘page’ begins and ends. The fragment is the piece of visible content on the page, at that specific point in the experience, and this is the kind of ‘fragment’ that I think about when I am looking at Fraggles.

The predecessor to Fraggles is not JumpLinks; it is Answers and Featured Snippets. They are similar, because they are short bits of text that Google finds on the page, like ‘fragments’ that Google caches and gives special treatment. The difference between an Answer and a Fraggle is primarily just the size and display of the text, and Google’s ability to navigate directly to the content once the link has been clicked. You can see that this may soon change with a new Fraggle-like process for AMP Featured Snippets, illustrated below. The process provides scrolling and highlighting for Featured Snippet content on AMP pages, to allow the user to see the Featured Snippet in context on the page. Both Fraggles & AMP Featured Snippets now scroll directly to the content location of the fragment when the link from the SERP is clicked.

In the long run, it seems like Google may be making the effort to add invisible ‘Jump Links’ to everyone’s websites, SPAs, PWAs, AMP pages or PWAMPs (or even possibly SPA-PWAMPs 😛 ). This will make it easier for them to surface and read the parts of your website that are relevant to users, when answering their questions out loud in a voice search. I tend to believe that Google will soon be pushing SEOs to use something like the Gutenberg framework from WordPress, to help facilitate crawling and indexing of website content. This will help them identify and catalog the innate relationships of the pieces of content on a site to each other, much as internal and external links were once used.


Things that Look Like Fraggles:

There are two kinds of results that are very similar to Fraggles, but don’t fit the definition perfectly because they either don’t include the scrolling behavior or because they only include a jump-link without a full answer fragment. These are Answer Carousels and Jump Links, both described further below. While these could eventually become Fraggles, if Google adds scrolling behavior or lifts the answer fragment, they are not 100% Fraggle.

Answer Carousels/Carousel Featured Snippet:

Rob Bucci from Stat wrote about Answer Carousels. Answer Carousels are important, because they link to OTHER ANSWERS in the Knowledge Graph, rather than other sections of a page. An example is included below, for the query, ‘how to reheat pizza,’ but this one shows the result over time, possibly showing how Google has fine-tuned the result, likely based on engagement and clicks. Last month, ‘taco’ was included as one of the options in the Answer Carousel as a topic that was related to ‘how to reheat pizza’, but this month, it is no longer included. Instead, we see two presentations of the Answer Carousels: one with rounded buttons, and one with expansion boxes, very similar to a ‘People Also Ask’ box.

Answer or Featured Snippet Carousels Link from One Knowledge Graph Entry to Another

An additional example is included below. In this example, as in the one above, it appears possible Google is using the rounded buttons when the intent of the query is more broad or vague, but expansion boxes when it is more clear or specific; this is just an observation though – it could just be a random UX test.

Answer Carousels May Turn to Expansion Boxes when the Query Intent is Clear

JumpLinks/ Bookmarks/ Anchor-Links:

Google has long struggled with ‘bookmarks’/‘named anchors’/‘jump-links’ as an aspect of web indexing. In the early days, Google indexed URLs that included a # (an indication of a named anchor) as separate pages. Then, they decided that this was inefficient because many bookmarks on the same page were ranking separately for the same query, so they stopped indexing any information in a URL after the #. In fact, before the canonical tag was launched, some SEOs included a # in their URLs before any tracking code or affiliate codes, to ensure that the codes would not get indexed.
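The effect of ignoring everything after the # can be sketched in a few lines. This is only an illustration of the idea, not a description of how Google's crawler works; the example.com URLs and the helper name canonical_without_fragment are made up for the example.

```python
from urllib.parse import urldefrag


def canonical_without_fragment(url: str) -> str:
    """Drop the #fragment so that all bookmarks on a page collapse to one URL."""
    return urldefrag(url).url


# Hypothetical URLs: two named anchors on one page, the bare page,
# and an affiliate code hidden behind a # as in the old SEO trick.
urls = [
    "https://example.com/guide#section-1",
    "https://example.com/guide#section-2",
    "https://example.com/guide",
    "https://example.com/product#aff=123",
]

unique_pages = sorted({canonical_without_fragment(u) for u in urls})
print(unique_pages)
```

Three of the four URLs collapse into a single page, and the affiliate code disappears from the indexed URL, which is why the # trick worked before the canonical tag existed.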

As Richard Baxter points out in his article about How to get Fraggles, this changed again in 2009 when Google began allowing for the indexing of bookmarks and named anchors as ‘Jump Links’, but these have never featured that prominently in the SERP until now. A good example of a desktop SERP with a lot of JumpLinks is included below:

Fraggles and JumpLinks are not the same thing. JumpLinks seem to help generate Fraggles, but Fraggles can occur without them.

The formatting of JumpLinks has changed a lot recently, so it is probably a good idea to give a deeper explanation here of what is and is not a JumpLink. When a horizontal list of rounded buttons occurs outside of an Answer in position-zero, the buttons can be Site Links, Related Page Links or JumpLinks to different portions of a single page. Diagrams below show how similar these different kinds of results can be, with nearly the same styling, but significantly different in terms of the links, where they point and how they are triggered. This makes them hard to classify, but they are only Fraggles if they are pulling in a fragment of content, and scrolling directly to the content on the page with a handle included in the code, or applied by Google when the link is clicked.

Site Links Can be Presented Vertically or Horizontally, but these are not JumpLinks or Fraggles

Related Page Links to Different Pages vs Jump Links all to One Page

While Fraggles often include JumpLinks, and adding JumpLinks seems to be a good way to trigger Fraggle indexing, they are still different. JumpLinks do not lift a formatted answer from the page, and thus are likely not being considered for a promotion to a Featured Snippet or Answer from Google.


Conclusion:

Google’s recent release of the Indexing API, along with their inclusion of DataSet Markup are both indicators of Google’s intent to rank and display information or content in a SERP without relying on URLs. (This is something that I have been talking about for more than two years now!) Now, Google appears to be indexing multiple pieces of content on a page separately, caching the most important information as a Fraggle, so that it can be shown in a carousel of other Fraggles, or ultimately, possibly so that it can be promoted quickly to an Answer in position-zero.

From our perspective, SEO has become about optimizing all of these types of assets. If the company that you are optimizing for only has a website, then SEOs are left to optimize for Answers, or strive in other ways for incorporation into the Knowledge Graph. Now Fraggle optimization can also be included as part of the strategy.

In the short term, it seems like the best way to get Fraggles is to optimize for Featured Snippets, but also include JumpLinks and H2s or H3s in the code. It might also be useful to consistently code the optimized elements with the same heading tags and DIV structure, and have a great Schema implementation. In cases where these elements are built with JavaScript, including AJAX, the information may only be indexed after the 2nd pass of the crawler, for JavaScript rendering. These are the strategies that we think will help, but for these to work, the page also needs to already have strong signals for authority, ranking reasonably well for the queries that you are targeting for Fraggles.
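One way to check the JumpLink and heading advice on your own pages is a small audit script. The sketch below is an assumption about what is worth checking, not a tool Google provides: it lists in-page links whose target id is missing, and counts H2/H3 headings that have no id and therefore cannot be reached by a JumpLink. The class name and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class JumpLinkAudit(HTMLParser):
    """Collect in-page jump links and the element ids they could scroll to."""

    def __init__(self):
        super().__init__()
        self.targets = set()          # id attributes found on any element
        self.jump_links = []          # href values of the form '#something'
        self.headings_without_id = 0  # h2/h3 headings a JumpLink cannot target

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.targets.add(attr["id"])
        elif tag in ("h2", "h3"):
            self.headings_without_id += 1
        href = attr.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.jump_links.append(href[1:])

    def broken_links(self):
        """Jump links that point at an id that does not exist on the page."""
        return [h for h in self.jump_links if h not in self.targets]


# Hypothetical page fragment: the second heading is missing its id.
html = """
<a href="#reheat-oven">Oven</a> <a href="#reheat-skillet">Skillet</a>
<h2 id="reheat-oven">Reheating pizza in the oven</h2><p>...</p>
<h2>Reheating pizza in a skillet</h2><p>...</p>
"""

audit = JumpLinkAudit()
audit.feed(html)
print(audit.broken_links())
print(audit.headings_without_id)
```

This only parses static HTML. Content that is built with JavaScript would need to be rendered first, which mirrors the second-pass caveat above.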

In the long run, we also think that the best way to optimize for Fraggles is by using Speakable Schema which is meant to help Google with voice search results. This especially includes marking up FAQ, Q&A and How To content, since these formats are most easily parsed and passed into a useful voice search result. Beyond this, it will probably help to add jump links to the different, speakable aspects of a page.
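As a rough sketch of what such markup could look like, the snippet below generates schema.org FAQPage JSON-LD with a speakable section. The types and properties (FAQPage, Question, Answer, SpeakableSpecification, cssSelector) are from schema.org; the helper name faq_jsonld, the sample question and the CSS selector are hypothetical, and whether this markup actually leads to Fraggles remains our speculation rather than anything Google has confirmed.

```python
import json


def faq_jsonld(questions, speakable_selectors):
    """Build schema.org FAQPage markup with a speakable section.

    questions: iterable of (question, answer) string pairs.
    speakable_selectors: CSS selectors for the parts of the page worth reading aloud.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(speakable_selectors),
        },
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in questions
        ],
    }


# Hypothetical content; the selector matches a JumpLink target on the page.
markup = faq_jsonld(
    [("What is a Fraggle?", "A fragment of a page that ranks with a handle that scrolls to it.")],
    ["#what-is-a-fraggle"],
)
print(json.dumps(markup, indent=2))
```

The printed JSON would go inside a script tag of type application/ld+json on the page. Pointing the speakable selector at the same id that a JumpLink targets keeps the two strategies aligned.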

Answers and Rich Snippets are already impacting SEO in very significant ways. The recent research from Stat shows that the occurrence of Featured Snippets is up 6x. Beyond that, the Jumpshot data written up by Rand Fishkin indicates that 63% of mobile users never click on a result, presumably because they get exactly what they need from the search result, and 34% of desktop searches have no clicks. All of these statistics represent trends that we expect to continue, and potentially pick up, as Google increases its focus on voice search and the Google Assistant.

The mobile statistics are especially important, as mobile searches continue to grow beyond 50% of all searches; but that growth is slowing and desktop searches are essentially flat. Google will likely use the Knowledge Graph, Fraggles and other Hosted Inclusions to keep people on Google properties, to ensure that they can continue monetizing the behavior as much as possible. With that in mind, it is becoming more and more critical to do mobile SERP testing, with real phones other than your own, that pass real location and user-agent data to Google. You have to know what is really ranking and generating engagement in the SERP, and you can’t just blindly keep doing SEO the same way you always have.

Beyond the change to traditional, mobile SERPs, Google expects dramatic increases in voice search, but so far, we have only seen Google respond with information from Google utilities, media and feeds, like the Calculator, the Weather & Music; or if they are specifically requested, Actions directed by an Assistant App. When those things are not available, Google responds with content that has been scraped from websites, including Answers or information from the Knowledge Graph, and I think Fraggles will play an important part here!

Webinar: Mobile-First Indexing: What Got You There, Won’t Get You Here

At MobileMoxie, we believe that Google’s launch of Mobile-First Indexing enabled them to change the organization of their index, which in turn is changing the look and feel of most search results, especially on mobile. This presentation focuses on the very important data that is not represented in most SEO analytics, but that can help explain why traffic on certain keywords might drop, even if nothing has changed in your site or your SEO strategy. It is not always an algorithm shift, per se, but is often because Google is testing new ‘Position-Zero’ and ‘Hosted Inclusion’ style results, or because Google may have gotten more aggressive with the PPC results for your industry. This presentation focuses on how things like App Packs, Map Packs, People Also Ask, Answers, Knowledge Graph and other types of Google-hosted results are often taking clicks out of your analytics accounts, making them less measurable, even if they could be a good thing for your brand or company bottom line.

There are video and audio versions of the presentation that you can access below. Soon, you will also be able to access this webinar as a podcast, part of our MobileMoxie podcast series, M4: MobileMoxie Marketing Musings – SEO & ASO Webinars. We are working on this now, and you should be able to find it in both iTunes and Google Play soon. Thanks to everyone who attended this presentation live in Denver and Philadelphia! Please note that we have updated the promo code to work for people who did not attend the conference. Instead of the promo code in the presentation, use the one in the image below. It is ‘WEBINAR001’. The ‘001’ in the promo code is numeric: ‘zero, zero, one’.

 

We have included images of some of the slides below, to help you get a sense for what is in this presentation. If you scroll over the images, each one has a caption, to help you understand what is going on.

MobileMoxie Mobile SEO Tools Promo Code
Mobile SEO Analytics Data is Incomplete - Except for GSC, it is Only Telling you about Traffic that Clicks. Ignores Hosted Inclusions & Position Zero Results in the SERP.
Sometimes SEO Data Makes No Sense! Rankings Improve, but Total Clicks Drop Down to Zero. WHY!
MobileMoxie SERP Test Allows You to Upload a CSV of Addresses to test with, so you can see real search results from all over the world & improve your Mobile SEO Strategy Accordingly
The MobileMoxie Service Area Tool Shows Real Mobile Search Results Tested at different Intervals of distance away from the center.
Real, localized Mobile SERPs Allow Us to tailor SEO Strategy based on Location - Focusing on Where Zero-Click Results are Most Common or where Competition is Strongest.
Google is trying so hard to build-out the Knowledge Graph that they have things that don't exist - Chili's does not have a Bloomin Onion, they have the Awesome Blossom. Also, look at the other Chili's menu items in the related content at the bottom. Menus are apparently part of the Knowledge Graph now.
Mobile Search Results change from place to place, and also from iOS to Android.
Knowing when Ads, App Packs & Map Packs are in a SERP helps you know why traffic may be down.

 

The Entity & Language Series: Query Language, Phone Language & Physical Location (5 of 5)

NOTE: Please use these links to catch up on the previous posts in the series: Article 1 / Article 2 / Article 3 / Article 4 / Article 5

By: Denica Nyagolova & Cindy Krum

In the past few months, we have seen more and more evidence that Google’s Mobile-First Indexing is not just a change of the primary crawler, but a major shift in Google’s strategy for organizing information and processing queries. The importance of the relationship between languages and entities in Mobile-First Indexing, or as we call it at MobileMoxie, ‘Entity-First Indexing,’ cannot be overstated.

Language can be more important for some topics or search queries than others. For example, language now plays a minimal role in Image Search results because Google is translating the title tags of all the relevant images, and presumably ranking the images based on those translations and the other normal signals, rather than basing the rankings on the un-translated text. As discussed in Article 4 of this series, the intent of an Image search is visual, so the language of the textual content surrounding the image is inconsequential. This is the essence of the relationship between languages and entities in SEO: removing language as a primary signal for understanding a query and ranking the results.

Our research indicates that Google is trying to build-out its understanding and proficiency in ranking non-English queries by fine-tuning its Entity Understanding of the web, and this is core to the change. This new strategy of categorizing and re-indexing information is based on a Knowledge Graph-centered index that uses information from mobile devices to help Google serve the best results for the individual user. With this research, we aim to provide a deep understanding of Google’s new method for determining the best individualized search results using engagement patterns, physical location and phone specifications, rather than simply relying on information from the browser.

While traditional organic (blue-link) rankings have been in flux recently, Mobile-First Indexing focuses much more on making it possible for Google to rank more rich content in multiple position-zero entries, above the traditional organic (blue-link) results. Google is trying harder to surface answers to the queries being submitted directly in the SERP, rather than just returning a list of websites, and this strategy fits well with Google’s major focus on the Assistant and voice-only/eyes-free search. Since Google hosts the Knowledge Graph and many of the other assets that are shown in position-zero, they can translate them as needed, based on their understanding of user-intent. This organization of search results is the simplest illustration of the relationship between entities and language in the future of SEO, especially when the settings that impact language are explored in detail.

This is the last article in MobileMoxie’s five-part Entity & Language Series. The first four articles discussed what Entity-First Indexing is, Google’s strategy for re-indexing content in Mobile-First Indexing, how Google’s language APIs work in search and what linguistic factors impact Google’s algorithmic Query Understanding. This article will combine the learnings from all of the previous articles, and add in information about when and how Google is changing search results based on various language signals sent from the phone, search settings and users’ Google accounts; it will outline which signals Google uses to detect and serve the correct language in searches from both Chrome and the Google App. It will then provide a detailed explanation of how translation in Knowledge Graph and other Google-hosted results may play out in other SEO scenarios, especially as Google gets better at indexing and surfacing content that is less reliant on URLs.

Sometimes, it can be tricky for Google to determine what the right language of a query result actually is, especially when users search in languages that do not match their normal search patterns. Historically, Google used the searcher’s physical location, and the Google ccTLD where they started the search, to determine the language of the results. This was especially important before Google launched their own browser (Chrome) or allowed people to create individual Google accounts. Now that the adoption of Chrome and Google accounts is near ubiquitous, Google sometimes also seems to rely on a user’s Google account settings, browser settings and search settings to help determine the appropriate language for search results. Since the change to Mobile-First Indexing, Google is relying more and more on the physical (GPS) location of the searcher and the searcher’s language settings on their phone or in their Google account, to direct the search results that are returned. This is a distinct shift towards the greater personalization in search results that Google has long been striving for.

Previous to the launch of Mobile-First Indexing, Google used different local algorithms for different countries and sometimes even redirected users to the country-specific top level domain names (ccTLD) before a search could be submitted. The local algorithm was determined by the ccTLD, so if a user in the US began a search in google.co.in, then they would have seen search results for India. In countries where multiple languages were spoken, Google would try to recognize the language of the first search query, and then ask if the user wanted to continue searching in that language, or search in a different local language that it suggested.

This changed just before the launch of Mobile-First Indexing. Since October 2017, search results have been determined by the phone geo location. Google is still redirecting to the local ccTLD in some territories, but the localization of the results is determined by the location of the phone, not by the ccTLD extension. Now it appears that in many queries, Google is prioritizing the phone language settings over the physical location of the searcher, the Google ccTLD and even the Google account settings.

Recently, Google clarified that localization plays a more significant role in search than personalization does. In a tweet, Google Search Liaison Danny Sullivan pointed out that they “do not personalize search results based on demographic profiles nor create such profiles for use in Google Search”. Danny also recommends comparing personal results and Incognito results to review SERP differences. According to Danny, the main reasons for results to differ are “location, language settings, platform & the dynamic nature of search”. Overall, localization, language and platform (OS and/or app) seem to be the most useful ways for Google to make results more relevant, so this is what we set out to test.

Historically, when Google used keyword matching to conduct a search, the language of the results would almost always match the language of the query. Over time, we have seen keyword matching results skew towards the user’s preferred language, even when it does not match the query. We believe that this is changing due to Google’s expanding ability to access users’ language preferences from their mobile devices, especially on Android. Android and iOS signals vary, but it seems that both send signals about the language preferences set on the phone. In the grid below, we have presented the different signals for both operating systems. It appears that Google does not count this as ‘personalization,’ but instead as ‘localization.’

Summary of the Language Signals We Tested:
Signal                            Android   iOS
Query Language                    yes       yes
Location Detecting                yes       yes
Phone Language Settings           yes       yes
Google Cloud Language Settings    yes       yes
Search Settings                   yes       yes
iCloud Language Settings          no        yes

Google Search in the Browser vs. The Google App

Before going any further, it’s also important to differentiate Google Chrome and the Google App. These are two very similar utilities where searches can be submitted to Google, but they are known to give slightly different results. Google Chrome is Google’s internet browser, the default on all Android phones, which allows users to change or specify the ccTLD version of Google that they are searching from and also allows savvy users to specify a Search Language from the browser. In Chrome, searchers generally begin on the default Chrome ‘start screen’ or navigate to Google to submit a search. Conversely, the Google App is a native app for the Google Search Engine. It does not allow users to change the ccTLD or the Search Language settings. This app is included on most modern Android phones, accessible by swiping all the way to the left-most screen. It is not included by default on iOS devices, and must be downloaded from the App Store. The other most relevant difference between the two utilities is that the Google App requires users to be logged into their Google Account, and there is no Incognito mode. So unlike with Chrome, in the Google App, Google can rely on having access to the Google Account language. More language settings from the device are passed to the app (Chrome or the Google App) when it is downloaded and installed, but this is only true when the user is searching from Chrome or the Google App, not from a browser search in Safari.

Recently, Google search results in Chrome and other browsers have become more and more like the experience in the Google App. Google has already specified that Chrome 69 will essentially require users to be logged in at all times. For a while now, the search bar that is included by default on the main screen of Android devices has led to results in the Google App, rather than the browser. With this in mind, we can speculate that for both iOS and Android, Google intends to rely heavily on the Phone Language Settings or the Google Cloud Account Language Settings and only use the Google Maps Geolocation API and the Native Language API as additional reference points for Entity Understanding in search, but not for determining the language of the results. To learn more about how these two APIs are already impacting search results, please view the third article in this series, Translation and Language APIs Impact on Search (3 of 5).

As described, there are a variety of signals that Google can use to determine the linguistic intent of a search. These include information and preferences that are saved in the cloud accounts hosted by Google and Apple, and the settings on the phone, which Google has access to whenever one of their apps (Chrome or the Google App) is added to the phone. Our testing indicates that Google can access the following language signals from all modern devices: Query Language, Location Detection, Phone Language Settings, Search Settings and Google Cloud Language Settings; but these have different impacts on results in the iOS and Android versions of Chrome and the Google App. When it comes to iOS, the iCloud Settings can bring an additional level of complexity, which will be discussed later in this article. The following sections will outline how Google detects language settings in both Android and iOS search scenarios, and how the settings impact the search results:

Android

To determine which language settings were the most important to Google in an Android environment, we designed a test where all of the possible language preference settings were different languages; this would help us find out what the strongest signals were. We tested results in both Google Chrome and the Google App. Since we conducted our testing from the USA, we avoided using English, the default language of the US, in any of these settings. This guaranteed that if we saw English results, the signal came from location detection, not from the other settings.

We started our testing by switching a Pixel 3 phone’s settings to Spanish, and changing the Google Cloud Account settings associated with the phone to Bulgarian. We also changed the Search Settings in Google Chrome to Japanese. (NOTE: The ability to change the Language in ‘Search Settings’ is available only in Google Chrome; the option to change it is not present in the Google App.) Within Search Settings, users are given the option to change both the ‘Language in Google Products’ and the ‘Language of Search Results’, but the ‘Language in Results’ setting always matches the ‘Language in Google Products’ setting. When you add additional languages to the ‘Language in Search Results’ setting, Google still prioritizes the main (first) language on this list. To ensure the accuracy and quality of the test, we needed to change the ‘Google Products Language’ to Japanese and remove the additional language (English) from both settings; this would guarantee that the only signals sent by the Search Language settings were Japanese. For more information on how to change the language settings on your Android phone click here.

Even though we changed the ‘Language in Google Products’ in Chrome, the language setting in the Google Account was not impacted – it remained set to Bulgarian, but began returning a note that said, “Some products are not using Bulgarian” with the option to “CHANGE ALL” (Example images below are in English for clarity).

Language Settings that Impact Mobile Search & SEO

In the grid below you can see the language diversity in the testing. We conducted the test in the USA, which sends a signal for the default language, English. We tested a Greek query (Φύλακες του γαλαξία, Guardians of the Galaxy) on a phone with Spanish settings, linked to a Google Cloud account in Bulgarian, with Chrome Search Settings in Japanese.

Android
Phone Setting | iCloud Settings | Google Cloud Account | Search Settings | Location | Query Language
Spanish | NA | Bulgarian | Japanese | USA/English | Greek

We did the same test, using the same settings three different ways: in the Google App, Google Chrome logged in and Google Chrome Incognito. The results are outlined in a grid and summarized below:

Language Settings | Google App | Chrome Logged In | Chrome Incognito
Results & Relevant Language Setting | Spanish (Phone Settings) | Japanese (Search Settings) | USA, English (Default for ccTLD)
Notes | As illustrated below, the Google App returns search results in the Phone Language, which led us to believe that Google prioritizes the Phone Language as a top signal. This makes sense, because it is a language that users selected themselves and likely understand fluently. | As you can see below, when logged into Chrome, the search results were returned in Japanese, the Search Language we used in the Chrome Search Settings. We also did the same test with the Search Settings Language not specified, and in this case, the search results matched the Google Cloud Account Language Settings (Bulgarian). This shows that the Google Search Settings in Chrome can override the Google Cloud Account Language Settings. | This is the option where Google has the least information about the user, so language assumptions are hard for Google to make. As you can see illustrated below, when logged out, searching in Google Chrome Incognito mode, search results came back in English. This was either based on our physical location in the US or, more likely, because English is the default language of the ‘.com’ version of Google where the search was submitted.

NOTE: For Chrome Incognito mode, the SERP included this note: “Tip: Search for English results only. You can specify your search language in Preferences” which links to Search Settings in Google Incognito mode. Apparently, if it is not specifically set, the search language for an Incognito search seems to match the default language of the Google ccTLD, but they appear to be testing different options. In the past few weeks, we have sometimes seen Google Chrome Incognito searches returning results in the Phone Settings Language, the same as the Google App.

iOS

In iOS, iCloud language settings add another level of complexity, so we needed to add an additional language to the test. The language diversity in the iOS testing was the same as in the Android testing; however, we specified German for the iCloud Settings. Again, we conducted the tests while physically located in the USA, using Google.com, so that would explain any English results. Again, we tested the same Greek query (Φύλακες του γαλαξία, Guardians of the Galaxy). Since Chrome is not the default browser on iOS devices, we also tested results in Safari. The various languages and corresponding phone settings for this test are outlined below:

iPhone
Phone Setting | iCloud Settings | Google Cloud Account | Search Settings | Location | Query Language
Spanish | German | Bulgarian | Japanese | USA/English | Greek

For more information on how to change the language settings on your iOS phone click here.

As you can see below, in iOS the language selected in the phone settings had priority over the iCloud Settings and all other settings, unless the user had altered their Google Search Settings from the browser.

Language Settings | Google App | Chrome Logged In | Chrome Incognito
Results & Relevant Language Setting | Spanish (Phone Settings) | Japanese (Search Settings) | Spanish (Phone Settings)
Notes | Similar to Android, the Google App used the phone language as the primary setting for determining the language of results. | When testing logged in to the Google account from Chrome and Safari, both results were returned in Japanese, the Search Language we set with Google from the browser. If we did not specify the Search Settings Language, results matched the Phone Settings by default. | Both Chrome and Safari returned results in Spanish, the phone language.

The biggest difference between Android and iOS is in logged-in Chrome. On an iPhone, if users do not specifically change the Google Search Language Setting (from the link at the bottom of the SERP), Chrome and Safari will match the Phone Language (Spanish). So in iOS the phone settings are a stronger signal than the Google Cloud Account, but the phone settings can be overridden by the Search Language that is set in Google from the browser (Japanese).

When it comes to iOS, we also needed to test in Safari, both logged in and logged out in Incognito mode. The results matched the Chrome results, as you can see below:

Language Settings | Safari Logged In | Safari Incognito
Language of the Results | Japanese (Search Settings) | Spanish (Phone Settings)
Notes | When testing logged in to the Google account from Safari, results were returned in Japanese, the Search Language we set with Google from the browser, just like in Chrome. Similarly, when we did not specify the Search Settings Language, results matched the Phone Settings by default. | Both Chrome and Safari returned results in Spanish, the phone language.

Summary of Results:

After the testing, we see obvious patterns in how Google chooses the language of the results, which is good, because after testing many times over many months, we also noticed that the impact of some of the settings was in flux. Luckily, there was a pattern there too! Whichever changes appeared in Android were followed, a couple of weeks or up to a month later, by the same changes in iOS. The grid below represents the final results for all tests. Overall, in the Google App, Google returns results matching the language of the phone, since it is a language the user speaks and chose when setting up the phone. In Chrome, Google returns results in the Search Language, but when the Search Language is not set, Google uses different signals to determine the best language for the results, depending on whether the user is on an Android or iOS phone and whether they are searching in Incognito/logged-out or logged-in mode.

Scenario | Android | iOS
In Chrome | Search Settings Language; matching Google Cloud Language if Search Settings are not specified | Search Settings Language; matching Phone Language if Search Settings are not specified
In Chrome Incognito | Search Settings Language; matching Location or ccTLD Default Language if not specified | Search Settings Language; matching Phone Language if Search Settings are not specified
In Google App | Phone Settings Language | Phone Settings Language
In Google App Incognito | N/A | N/A
In Safari | N/A | Search Settings Language; matching Phone Language if Search Settings are not specified
In Safari Incognito | N/A | Search Settings Language; matching Phone Language if Search Settings are not specified

It seems that the Google Search Settings are the strongest signal on both iOS and Android devices – likely because this has to be actively set by the searcher. If the searcher is using the Google App, or if the language in the browser-based Google Search Settings is left to default with no language specified (which we imagine is quite common), Google uses the Phone Language Settings to determine the right language for iOS results. In the same scenario for Android, the Google Cloud Language setting may be used by the browser, the Phone Language Settings will be used by the Google App, and the default language of the Google ccTLD or the physical GPS location will be used in Incognito. Google never uses the iCloud Language Setting or the Query Language to determine the language of the query results, and they rarely use the Google Cloud Account settings.
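The priority order summarized above can be sketched as a small decision function. This is only a model of what we observed in these tests, not Google's actual logic; the function name, parameter names and surface labels are hypothetical, and the final fallback for a logged-in Android browser with no account language is our own assumption.

```python
from typing import Optional


def observed_result_language(
    platform: str,                      # "android" or "ios"
    surface: str,                       # "google_app", "browser" or "browser_incognito"
    phone_language: str,
    search_settings_language: Optional[str] = None,
    google_account_language: Optional[str] = None,
    cctld_default_language: str = "English",
) -> str:
    """Return the Knowledge Graph result language seen in our tests."""
    # The Google App followed the phone language on both platforms.
    if surface == "google_app":
        return phone_language
    # In a browser, an explicitly chosen Search Settings language won.
    if search_settings_language:
        return search_settings_language
    # Fallbacks when Search Settings were left unspecified.
    if platform == "ios":
        return phone_language
    if surface == "browser_incognito":
        return cctld_default_language
    # Assumption: fall back to the ccTLD default if no account language exists.
    return google_account_language or cctld_default_language
```

For example, calling it with the Android test settings (Spanish phone, Bulgarian Google account, no Search Settings language, logged-in browser) yields Bulgarian, while the same call for iOS yields Spanish.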

Here is the kicker: All of that research is just representative of the Knowledge Graph – not the rest of the results. All the traditional organic results (blue links) below the Knowledge Graph were in Greek, the language of the query. Google was determining the language of the organic results differently, and in a much less sophisticated way, based on simple keyword matching! As you can imagine, we take this as a strong signal that position-zero style results, especially Knowledge Graph results, are critical for Mobile-First Indexing. Google is putting much more effort into getting the language right in these results, compared to traditional organic rankings.

What we found is that position-zero results are translated to optimize for personalized language settings, but regular organic blue-link results are not. You can see this again in a comparison of the book query ‘War and Peace’ below. Both queries are from a phone that is physically located in Bulgaria, with Bulgarian phone settings. Only the Knowledge Graph and Book Carousel are translated to Bulgarian to match the Phone Language Settings. The rest of the results are in English. This is particularly interesting because Bulgarian is not one of the languages of priority, included in the Cloud Natural Language API:

With similar thoughts in mind, we compared the search results of the same query in English between a phone in Bulgaria, searching from the Bulgarian Google ccTLD, Google.bg, with Bulgarian Phone Language Settings, and a phone in the USA, searching from Google.com, with Bulgarian Phone Language Settings. In this one, you can see below that all of the results are in English, with only minor differences between the two ccTLDs – likely caused by differences in trademarked brands like Pillsbury being popular and available in the USA but not Bulgaria. But you can see that Phone Language Settings appear to have little or no impact on search results that don’t include Knowledge Graph, regardless of what version of the ccTLD is used to submit the search:

Phone Language Settings appear to have a minor impact on search results that don't include Knowledge Graph.

Entity Understanding and Translation Process

Overall, entities provide Google a better and deeper understanding of topics because they give Google the ability to easily develop connections and relationships between different topics (entities). Deeper understanding of an Entity and its relationships, in turn, gives Google the opportunity to potentially serve information about the Entity in any language (with live translation from the Google language APIs if necessary), since the language now has only a supportive role for the query – like a modifier. Whatever Entity Understanding and Entity Relationships Google learns in one language can automatically be translated to other languages, especially in Google-hosted, position-zero results like the Knowledge Graph.

Google has been working actively on their Cloud Datastore of entities. Each entity has a numerical Entity ID, which is associated with the Knowledge Graph (remember from the movie Contact, math is the universal language!). The Knowledge Graph Search API lets you find entities in the Google Knowledge Graph by search, or simply by the Entity ID. The API uses standard schema.org types and is compliant with the JSON-LD specification. It also lets you find relationships of one entity to another, that are understood in the Knowledge Graph. Some of the types of entities found in the Knowledge Graph include: Book, BookSeries, EducationalOrganization, Event, GovernmentOrganization, LocalBusiness, Movie, MovieSeries, MusicAlbum, MusicGroup, MusicRecording, Organization, Periodical, Person, Place, SportsTeam, TVEpisode, TVSeries, VideoGame, VideoGameSeries and even WebSites.
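To illustrate looking up the same Entity ID in different languages, here is a minimal sketch that builds a Knowledge Graph Search API request and reads entity names out of the JSON-LD response. The endpoint and the `ids`, `languages`, `limit` and `key` parameters come from the public API; the helper names, the placeholder entity ID and the placeholder API key are assumptions for illustration. The sketch only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_entity_lookup_url(entity_id: str, language: str, api_key: str) -> str:
    """Build a lookup URL for one entity ID in one language."""
    params = {
        "ids": entity_id,        # look up by Entity ID instead of a text query
        "languages": language,   # language code for the returned names/descriptions
        "limit": 1,
        "key": api_key,          # placeholder; a real key is required to call the API
    }
    return f"{KG_ENDPOINT}?{urlencode(params)}"


def entity_names(response: dict) -> list:
    """Pull the entity names out of a parsed JSON-LD ItemList response."""
    return [
        item["result"].get("name")
        for item in response.get("itemListElement", [])
    ]


# The same (placeholder) entity ID requested in two languages.
for lang in ("en", "bg"):
    print(build_entity_lookup_url("/m/0dl567", lang, "YOUR_API_KEY"))
```

Because only the `languages` value changes between the two URLs, the entity being requested stays the same, which is the point of a language-independent Entity ID.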

We believe that now, differentiating ‘the content that Google hosts’ from ‘the content it does not’ is becoming critical for SEO; it is an indication of what content Google can translate and serve in a SERP in different languages.

As shown in the example below, each Entity is associated with an Entity ID, so that it can be located and understood in many languages; Google will return results associated with this Knowledge Graph and may translate the language of the Knowledge Graph results according to the searcher’s needs.

Google Using Numeric Entity IDs for Knowledge Graph - Using Math as a Universal Language - Entity IDs Allow One Set of Knowledge Graph Mapping to Be Translated in All Languages.
Image: Google (modified by MobileMoxie)

With all that, it is critical to make it as easy as possible for Google to recognize and access business entities and their relationships. This might make it possible for Google to lift and translate extra content into its position-zero results, which could be very strategic, especially when it comes to international content. Currently, the Google Translation API uses Neural Machine Translation (NMT) to understand queries, but relies on English as the ‘hub-language’ for the translation of languages that are not included in the Cloud Natural Language API (inclusion in that API being an indication that a language is already understood with native proficiency), as diagrammed in Article 4 of this series.

The top of the SERP is no longer dominated by websites, so understanding Google’s new SERP structure and position-zero results is critical for SEO. With Google’s new Entity-based understanding, the language of the entity and content does not matter as much anymore – at least in some languages, and for some queries. Content can be clustered in the index based on the entity understanding, without being omitted because it is in the wrong language. Now, the availability of good Knowledge Graph information is based on the presence or absence of the information in the Knowledge Graph at all, in any language, and that information can be translated to suit the user’s needs based on the language signals they are sending to Google. Of course, HREFlang tagging can make this process even easier because it allows Google to index information to the Knowledge Graph based on the English or default version of the content and use the tagging and language APIs to organize the non-English content without relying so heavily on crawling it and understanding each translation independently.

This ability to translate on the fly could be expanded to other kinds of informational queries soon. Recently, Google has begun to rank pieces of web page content (on-page content fragments) independently of each other, linking to them with inserted handles or bookmarks, especially when there are multiple possible answers to a query located on one page. We call these Fraggles – a combination of the words ‘fragment’ and ‘handle’. As shown below, jump-links in search results scroll the user directly to the relevant content, even though there is no handle or bookmark included for it in the code. Clicking on the jump-link not only opens the page but also scrolls to the part of the page where that specific content is located.

NOTE: It appears that heading tags, Schema and linked content IDs may be helping drive this type of result. By marking content with heading tags (shown below), we can help Google to locate the unit of content they need and serve it to a user as a Fraggle. We think Google may also be using CSS classes and XPath locations to create these – especially when those things are also associated with Schema.

Fraggles / Jump Links from a SERP

This is a clear illustration that Google is changing the way they crawl and index the content on pages, and we speculate that in the long term, it may be for the benefit of providing quick voice answers to queries; further, Google may begin to translate results in Fraggles, for voice and visual search, showing the results in position-zero like the Knowledge Graph. This is also important as more and more PWAs and especially SPAs can begin to rank Fraggle content, even without the content being on different URLs.
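As a way to audit which sections of a page already expose a heading with an ID that a jump-link could target, here is a minimal sketch using Python's standard `html.parser`. It only illustrates the markup idea in the note above; it is not a description of how Google builds Fraggles, and the class name, function name and example URL are hypothetical.

```python
from html.parser import HTMLParser


class HeadingHandleParser(HTMLParser):
    """Collect (id, text) pairs for headings that carry an id attribute."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.handles = []
        self._open_id = None   # id of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._open_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._open_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._open_id is not None:
            self.handles.append((self._open_id, "".join(self._text).strip()))
            self._open_id = None


def jump_links(page_url: str, html: str) -> list:
    """Return one fragment URL per heading that has an id."""
    parser = HeadingHandleParser()
    parser.feed(html)
    return [f"{page_url}#{handle}" for handle, _ in parser.handles]
```

Headings without an ID are skipped, so the output doubles as a quick list of sections that have no explicit handle in the code.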

Query Ambiguity and Query Language Impact in Search Results 

It is important to understand that the research above was just focused on Knowledge Graph (though we suspect it may soon impact other position-zero results). The language of a query seems to be easier for Google to determine once the idea/concept has been indexed in the Knowledge Graph. Knowledge Graph association may make the intent easy to detect – especially for movies, personalities, images, etc. We call this ‘direct intent’ because Google has a direct understanding of the query based on a Knowledge Graph entry. But during our testing we also noticed that for queries without ‘direct intent,’ when the intent of a search query may be ambiguous or broad, Google returned results differently than it would have if a Knowledge Graph entry was present. With this in mind, we wanted to take our research just a tiny step further, to find out what elements and settings impact the results for more ambiguous queries.

The research started with a search for ‘Red shoes’, which easily could be a search for a product, image or an old movie. In queries like this, with ‘broad intent,’ the Query Language complicates the search; Google appeared to prioritize different Google Assets, such as Images, Knowledge Graphs and Shopping results, based on the query language more than the other language settings. The example below shows a test in the Google App for a search query ‘Red Shoes’ in Bulgarian, Greek and Spanish. The search was conducted from the USA, on a phone with English settings. As we expected, Google returned the Google Assets in English, the phone language, but what was surprising was that Google returned different Google Assets, depending on the query language, potentially indicating that the Knowledge Graph may associate different assets with different degrees of intent, depending on the Query language. Again, as with the organic results below the Knowledge Graph in our previous tests, the blue links that Google returned below the Google Assets were generally in the language of the query.

Phone with English Settings in the USA | Greek Query ‘κόκκινα παπούτσια’ / ‘Red Shoes’
Phone with English Settings in the USA | Bulgarian Query ‘червени обувки’ / ‘Red Shoes’
Phone with English Settings in the USA | Spanish Query ‘zapatos rojos’ / ‘Red Shoes’


To verify our understanding of the process, we repeated a test from a previous article in the series – testing search results in the Google App for various translations of the famous Mexican movie ‘Y Tu Mamá También’. Our goal was to assess the impact of a potential ‘broad intent’ query in search results. For this one, the intent could be considered ‘direct’ in Spanish, because there is a Mexican movie with this exact title. In other languages, the intent could be considered more ‘broad’: Spanish (‘Y Tu Mamá También’ [original language]), English (‘And Your Mother Too’), Greek (‘Θέλω και τη μαμά σου’), and Bulgarian (‘И Твойта Майка Също’). The query intent is most broad in Bulgarian, since it is also the title of a well-known Bulgarian hip-hop song that can be found on YouTube. We anticipated that the song title, without translation, would outrank the movie, with a translation, but that both would rank. (Since Bulgarian is a language that is not part of the Natural Language API, inclusion of a disambiguation prompt was not expected.)

As shown below, all of the search results, except the Bulgarian test, are returned in the Phone Language (English) with Knowledge Graph results for the Mexican movie. In the Bulgarian test, the Knowledge Graph result is about the Bulgarian hip-hop song, not the Mexican movie. While it was a little unexpected, it was not too surprising – the song is very popular, so it makes sense that Google would have indexed the song result for user relevance (most likely, after noticing that more users tend to click on the YouTube video when searching in Bulgarian). This led us to believe that Google also uses the Query Language, not only the location, to determine the results they serve to users. This is not always personalization; it can also be localization and intent driven by language choice. It seems that even though they may understand queries similarly in a variety of languages, they are probably using machine learning and click-data segmented by query language to determine what ranks, regardless of the Phone Language or the location of the searcher.

Conclusion:

After our testing, we believe that there are a variety of elements that can impact the language of a search result, and that these things change between regular organic results and Knowledge Graph or other position-zero Google Assets. This is great for ‘direct intent’ queries, for which Google can easily make an association with something in the Knowledge Graph, but adds complexity for ‘broad intent’ queries for which the intent is less clear. In these cases, Google may vary the Google Assets that are shown in position-zero when the same query is submitted in different languages, but all other settings are held constant.

With all this, we can speculate that Entity Understanding in the Mobile-First Index is meant to help Google learn more, faster, to cross-populate information in different languages, but also to help Google get better at understanding search intent in all languages. This is why they would unify the various languages with universal Entity IDs, but still alter results based on click-data that is specific to a Query Language. This system also makes it easier to ensure that the position-zero results, which could potentially be read as voice results, are in the correct language for the user. Google is constantly working to improve user experience by delivering accurate, relevant information, as quickly as possible, but the growth of voice search makes it even more vital that the results are also in the correct language. Being able to do this accurately for users, where and when they need it, is the goal of the Mobile-First Index. This goal makes Entity Understanding critical for Google’s fine-tuning of Mobile-First Indexing because Entity Understanding is what gives Google the ability to efficiently and accurately process loads of information from around the world, in a sustainable way.

The Entity & Language Series: Translation and Language APIs Impact on Query Understanding & Entity Understanding (4 of 5)

NOTE: Please use these links to catch up on the previous posts in the series: Article 1 / Article 2 / Article 3 / Article 4 / Article 5

By: Denica Nyagolova &  Cindy Krum

At MobileMoxie, we believe that Mobile-First Indexing represents a fundamental shift in how Google organizes information and processes queries. Further, we speculate that the shift to Mobile-First Indexing took Google more time than they originally expected, partially because they did not anticipate the level of complexity that this shift would have, when different languages and locations were taken into account, to make the change viable world-wide.

This is the fourth article in a five-part series about the relationship between Entity Understanding and language in search. The first three articles in this series focused on what Entity-First Indexing is, how Google managed to re-index the content of the web this way, and how it makes things like language detection more important in search. This article continues the focus on practical research that MobileMoxie completed, testing international searches for Entity Understanding to outline the impact of language and localization in the query results. It provides deeper insights about how linguistic factors impact Google’s algorithmic Query Understanding; specifically, our research in this article focuses on how and when Google’s advanced Cloud Natural Language API is used and what elements of the query may trigger different types of linguistic understanding. The final article in this series will focus on new ways that Google is adapting search results for each user individually, based on the individual’s language settings and the software they are using to submit their search (EX: if the search began in the Google App or in Google from a mobile browser – two slightly different experiences).

The findings of our research will be especially important to online businesses who appeal to audiences around the world; For them, knowing what search results actually look like for their potential users’ queries, beyond just basic numeric rankings, can be critical for the business success. This is particularly urgent because incorrect Entity Understanding in Google’s new infrastructure might change or limit what searchers can actually see, especially on mobile, when they are traveling or when they are using voice search. Business owners may have to work harder to manage and optimize online marketing channels and SEO, particularly for non-English markets and languages that represent a statistically small portion of overall Google searches.


Word Order Impact on Query Understanding

The order of the words submitted in a search query has not always changed Google’s search results. It only became a factor when they added Phrase Based keyword matching to their English-based algorithm. Now, when word-order is changed, the meaning and intent of the query changes, based on the standard role of word-order in English phrases. In a US-based search for ‘red stoplight’ and ‘stoplight red’, both searches generated image results at the top of the main SERP. In English, the word in position ‘one’ usually describes characteristics of the word in position ‘two’ – generally, it is an adjective then a noun.

The example shown below illustrates the impact that word-order can have on Google’s results. In the query for ‘red stoplight’ in the screenshot on the left, the words are in the standard order – ‘red’ describes ‘stoplight’. When the order is reversed, to ‘stoplight red,’ as it is in the query on the right, it becomes a query for the color ‘red’ that is as bright as a ‘stoplight’; here, ‘stoplight’ becomes the adjective and ‘red’ becomes the noun (a concept).

Provided by the MobileMoxie Search Simulator

The impact of word order of a query is easy to spot in searches in the US, because it is where Entity Understanding and natural language understanding are the most advanced; but word-order impacts the Query Understanding for any language in which Google has an intermediate level of translation capability. 

Language Priority Impact on Translation & Understanding

Next, we wanted to see how Google’s apparent language prioritization impacts their ability to translate, understand and potentially achieve Entity Understanding to generate appropriate Knowledge Graph results. We started by testing the Google Translate API with different languages to see when and how it was more proficient at translation. Since both elements of the Translate API, the Phrase-Based Machine Translation and the Neural Machine Translation, use Machine Learning, they tend to provide more sophisticated translation results for more prolific languages, where they have more ongoing feedback and data to train the system. For languages that have fewer searches and/or a lower priority in Google’s eyes, the Machine Learning part of the translation is less proficient.

The chart below breaks down the relationship of the languages we tested with the Translate API, the Cloud Natural Language API and Google’s general level of priority in addressing translation in the language. As you can see below, English, Spanish and Hindi are listed as languages of priority, whereas Bulgarian is listed as a low priority language. All tested languages are supported by the Google Translation API, and only Spanish and English are included in the Natural Language API. Even though Hindi search results have not always been a strong point for Google, it is listed as a language of priority because of Google’s ‘Next One Billion Users’ goal, which we discussed in detail in our previous article: Understanding the Basics of International ASO (1/4).

Google's Current Language Understanding

Language | Language of Priority (Y/N) | Natural Language API (Y/N)
English | Yes | Yes
Spanish | Yes | Yes
Hindi | Yes | No
Bulgarian | No | No

Knowing this, we submitted test queries in Spanish, Hindi, and Bulgarian. First, we wanted to find out if Google was using the Phrase-Based Machine Translation (PBMT) & Neural Machine Translation (NMT) models, rather than just translating individual words. Our research focused on testing variations of well-known, language-specific idioms. Since idioms are culturally specific and their meaning does not directly correlate with any word-by-word translation, they would never accurately convey meaning without some level of Query Understanding. Similarly, Google generally has the ability to achieve a basic Query Understanding for phrases and idioms using the Machine Learning aspects of the Translate API.

With this in mind, we figured the translations would give us a clear idea of whether Google understood the words in the context of each other or only individually, one at a time. Experience told us that Google would do a better job with this type of advanced Query Understanding in Google’s top priority languages. Since the Google Translate Utility isolates the Google Translate API from the Cloud Natural Language API (which are both generally incorporated when you submit a query), the variation between something submitted in Google Translate and something submitted to Google as a search query would be telling. Results taken from this isolated instance of the Translate API clarify how the two APIs are used together to generate the Query Understanding or Entity Understanding in a search result.

We tested two idioms: ‘Break a leg’ and ‘Put yourself in my shoes’. These are good idioms to test because they both say something that no one would really mean literally. Without the locally and linguistically understood meaning of the idiom, the literal meaning of the phrase is confusing, useless or even potentially harmful, (especially with the suggestion that someone ‘break a leg!’) In the same vein, when we say ‘put yourself in my shoes,’ we are not actually requesting anyone wear our shoes, so again, the meaning is lost without deeper linguistic understanding.

With the idiom ‘break a leg,’ Google gets it wrong in both Spanish and Bulgarian, but gets it right in Hindi. We think that this might be an indication of the level to which Machine Learning and user engagement factor into the Phrase-Based Machine Translation (PBMT) and Neural Machine Translation (NMT) models from The Translate API. While many people in the world speak Spanish, there are more that speak Hindi. The quantity of users could cause more proficient translation and understanding in Hindi to be available more quickly, simply due to the rate at which feedback is returned and processed for Machine Learning. This would be even more true if the idiom ‘break a leg’ were more commonly queried, translated or searched for in Hindi than in Spanish. (Unfortunately, the query volume for these phrases is not high enough for Google Trends to measure and compare the data, so we don’t know this to be true; this is just one potential theory that could explain the data.)

The chart below shows how Google did at translating the idioms:


 
Google Translate results for the English idiom ‘Break a leg!’ (Meaning: ‘Good Luck!’):

  • Spanish Translation (Leg: pierna; Luck: suerte)
    • Google Translate result: ¡Rompe una pierna!
    • Google got it WRONG: we saw a direct translation about injury to a leg.
  • Bulgarian Translation (Leg: крак or krak; Luck: късмет or kŭsmet)
    • Google Translate result: Пречупете крака or Prechupete kraka
    • Google got it WRONG: we saw a direct translation of an injury, like a threat.
  • Hindi Translation (Leg: पैर or pair; Luck: भाग्य or bhaagy)
    • Google Translate result: भाग्य तुम्हारे साथ हो! or bhaagy tumhaare saath ho!
    • Google got it RIGHT: understanding that we are wishing the person ‘luck’ and not any kind of injury.
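A rough way to score results like the ones in the chart above is to check whether the translated phrase still contains the literal word for ‘leg’ or has switched to the word for ‘luck’. The sketch below is illustrative only: the helper name `classify_translation` and the simple substring check are assumptions made for this example, not part of any Google tool, and the sample strings are copied from the chart.

```python
def classify_translation(translated, literal_word, idiomatic_word):
    """Label a translation by which marker word it contains.

    This is a simple substring heuristic (an assumption for illustration);
    a real evaluation would need review by a native speaker.
    """
    text = translated.lower()
    if idiomatic_word.lower() in text:
        return "idiomatic"
    if literal_word.lower() in text:
        return "literal"
    return "unclear"


# (translation returned, word for 'leg', word for 'luck'), taken from the chart.
RESULTS = {
    "Spanish": ("¡Rompe una pierna!", "pierna", "suerte"),
    "Bulgarian": ("Пречупете крака", "крак", "късмет"),
    "Hindi": ("भाग्य तुम्हारे साथ हो!", "पैर", "भाग्य"),
}

for language, (translated, leg, luck) in RESULTS.items():
    print(language, classify_translation(translated, leg, luck))
```

With the chart’s data, the Spanish and Bulgarian results still carry the word for ‘leg’, while the Hindi result carries the word for ‘luck’, which matches the WRONG/RIGHT labels above.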

The idiom “Put yourself in his shoes” in Spanish is “Ponte en su lugar,” which literally translates to ‘Put yourself in his place.’ The same phrase in Bulgarian is “Постави се на негово място,” which also literally translates to ‘Put yourself in his place.’ As shown below, Google Translate translated the idiom from English to Spanish correctly but got it wrong in Bulgarian. This is what we expected, and it is likely due to the smaller amount of Machine Learning data available in Bulgarian.

To be sure of the validity of the test, we took it a step further and also tested different ownership variations of these idioms, changing the phrase ‘Put yourself in MY shoes’ to ‘Put yourself in HIS shoes’. The goal here was to see how deep The Translate API’s mastery of the idiom really went, based on its ability to recognize and correctly interpret both forms (my/his) of the idiom. In this test, the results were the same; Spanish translated correctly but Bulgarian did not, as you can see in the chart below:

 

Google Translate results for the idiom and its ownership variation:

  • English
    • Idiom: Put yourself in MY SHOES (Meaning: Put yourself in MY SITUATION)
    • Ownership variation: Put yourself in HIS SHOES (Meaning: Put yourself in HIS SITUATION)
  • Spanish Translation (Shoes: zapatos; Place: lugar)
    • Idiom: Ponte en mi lugar. Google got it RIGHT, translating ‘shoes’ to ‘place’ or ‘lugar’.
    • Ownership variation: Ponte en mi lugar. Google got it RIGHT, translating ‘shoes’ to ‘place’ or ‘lugar’.
  • Hindi Translation (Shoes: जूते or joote; Place: जगह or jagah)
    • Idiom: अपने आप को मेरी जगह रखो. Google got it RIGHT, translating ‘shoes’ to ‘place’.
    • Ownership variation: खाद अपने जूते में दाल दिया. Google got it WRONG; they did NOT translate ‘shoes’ to ‘place’.
  • Bulgarian Translation (Shoes: обувки or obuvki; Place: място or myasto)
    • Idiom: Постави се в моите обувки. Google got it WRONG; they did NOT translate ‘shoes’ to ‘place’ or ‘situation’.
    • Ownership variation: Постави се в неговите обувки. Google got it WRONG; they did NOT translate ‘shoes’ to ‘place’.

If you are using the Google Translate utility to complete a similar test, be sure to either select ‘English’ or let it default to ‘English-detected.’ Sometimes the tool will translate English words to other languages, even if a different language is selected in the ‘from’ field. This is because many languages incorporate English words seamlessly into their vernacular. If the wrong language is selected, you will most likely only get the direct dictionary translation of the terms and not the enhanced capabilities required for idioms. This is probably because there is not enough Machine Learning data for Phrase-Based Machine Translation of English words coming from regions where the non-English languages are spoken. This tells us that Machine Learning for the translation aspect of the API is regionally segmented.

EX: With Korean selected as the ‘from’ language, Google translated ‘Put yourself in my shoes’ incorrectly as ‘Ponte en mis zapatos,’ misunderstanding the idiom, whereas when English is selected, it understood the meaning of the idiom, and translated it correctly to ‘Ponte en mi lugar,’ (Put yourself in my place).

Provided by the MobileMoxie Search Simulator

For languages with low priority, The Cloud Natural Language API is not used at all, so idioms are rarely translated successfully. When they are, it is because of Machine Learning through the PBMT and NMT models, which are both highly dependent on the volume of in-language user interaction for the particular idiom. Since The Translate API does not provide meaning about the query on its own, variations on idioms are even more rarely translated successfully. We believe that Google rarely understands idioms without support from the Natural Language API, and only understands variations on idioms when that API is in play or when query volume is high enough that Phrase-Based Machine Translation and Neural Machine Translation may also come into play.

The grid below outlines the predicted levels of Query Understanding you can expect in different languages (with the different level of API support that Google provides each language), according to the research:


  • High Priority Language (EX: Spanish/Hindi)
    • Google Translate API: YES; Natural Language API: YES
      • Translate Idioms: Yes
      • Translate Variations of Idioms: Yes
    • Google Translate API: YES; Natural Language API: NO
      • Translate Idioms: Yes
      • Translate Variations of Idioms: No
  • Low Priority Language (EX: Bulgarian)
    • Google Translate API: YES; Natural Language API: NO
      • Translate Idioms: No
      • Translate Variations of Idioms: No
    • Google Translate API: NO; Natural Language API: NO
      • Translate Idioms: N/A
      • Translate Variations of Idioms: N/A

Translation & Entity Understanding

Since Google can adapt to changes in word order by recognizing phrases, and can recognize and even translate idioms, we wanted to see when and how Google’s Query Understanding changes over to Entity Understanding, especially when language and translation are in play. Internationally, the impact of word order is stronger and more meaningful for languages where The Cloud Natural Language API can be used to evaluate the word order natively without translation, so we made sure to incorporate this into our testing too. To show the varying impact that word order has on search results and Entity Understanding in non-English searches, we set up the following test:

  • We picked two languages that had different language API profiles in Google:
    • Spanish – Available in The Cloud Natural Language API and The Translate API – a high priority language
    • Bulgarian – Available only in The Translate API, but not in The Cloud Natural Language API – a low priority language
  • In each language, we picked two queries to test. The two queries had to convey an entirely different meaning and intent when the word-order was changed. They also had to represent a known Entity when written one way, but not the other:
    • Spanish – ‘Amores Perros,’ which translates to ‘Love’s a Bitch’ in English, is a Mexican movie. When the word order of the movie title is reversed to ‘Perros Amores,’ it translates to ‘sweet/lovable dogs’ because ‘amores’ is the adjective describing the noun ‘dogs’ or ‘perros’.
    • Bulgarian – ‘С Деца на Море’ is an old Bulgarian movie, and the title translates in English to ‘With Kids at the Seaside.’ When you change the order of the movie title, it becomes ‘На Море с Деца,’ which translates in English to ‘At the Seaside with Kids,’ and reflects a search for a generic vacation plan or concept.

Before testing, we speculated that Google would detect the meaning of the Spanish queries in all cases, because Spanish is available in The Cloud Natural Language API, but that Google would struggle in Bulgarian, because it is not supported by The Cloud Natural Language API. We were wrong. You can see in the image below that Google serves the movie Knowledge Graph in both versions of the Spanish query. This shows that Google is either not detecting the different meanings in the query when the word order is reversed, or is unintentionally ‘algorithmically overfitting’ the query to the entity. The algorithm may assume that any instance of the words together in a query is a reference to the movie, even when reversing the word order changes the query meaning entirely.

Provided by the MobileMoxie Search Simulator

It is important to note that this mistake does not happen when the words ‘sweet dogs’ and ‘lovable dogs’ are submitted in English. When submitted in English, the query that triggers a movie result in Spanish triggers a Knowledge Graph result for the Quiet Riot song ‘Love’s a Bitch,’ with disambiguation at the top for the Spanish movie entity and the song ‘Love is a Bitch,’ by Two Feet. This confirms that, at least in languages that are supported by The Cloud Natural Language API, Entity Understanding is associated with either the language or the region of the searcher. (Our test searches were submitted from regionally appropriate locations that would naturally be associated with the language of the search, not all submitted from our location in the US. This aspect of testing will grow in importance as Google’s algorithms are able to rapidly evolve, now that Mobile-First Indexing has launched.)

In Bulgarian, Google gets it right, which is a bit unexpected. The Bulgarian movie clearly exists in Google’s Knowledge Graph, so there is Entity Understanding, but the Knowledge Graph result is only triggered when the movie title is written correctly. When the word order is changed, Google does not ‘overfit’ as it did in Spanish, but instead correctly shows search results that are related to vacation planning, including a high number of paid results. There is no Knowledge Graph result for the movie.

Provided by the MobileMoxie Search Simulator

This test indicates, at least in this comparison, that Entity Understanding, or what might more accurately be described as Entity Matching, from The Cloud Natural Language API is somehow available in Bulgarian even though Bulgarian is not supported by The Cloud Natural Language API. We believe that this may only be possible because when Google detects that a query is being submitted in a language that is not supported by The Cloud Natural Language API, it translates the query to English BEFORE using the API for Entity Matching.

It seems that if there is no exact match for the English translation of a query in the Entity Matching functions of The Cloud Natural Language API, Google simply serves local results for the query without Knowledge Graph entities, as shown on the right. You can see that the results in the second Bulgarian query (about a vacation plan) are only in Bulgarian, and they seem to be based on simple keyword matching. It appears that Entity Understanding only comes into play for foreign queries that are not part of The Cloud Natural Language API if there is an exact match to an existing entity in English. If no entity is available that exactly matches the English translation of the query, the algorithm reverts to keyword matching and phrase-based matching to generate search results.

The Cloud Natural Language API is required for Google’s Entity Understanding, even when (or especially when) comprehensive linguistic understanding is incomplete. Its combination with the Google Translate API makes it possible for the correct Entity Understanding (Knowledge Graph results) to surface for a query, despite Google not having a full Natural Language understanding of the query.

Provided by the MobileMoxie Search Simulator

The important finding here is that The Cloud Natural Language API is used in all queries; it is used natively for the languages that it supports, and in these cases, can access some degree of Entity Understanding in a ‘broad keyword-match’ approach but risks overfitting the query; for languages that it does not support, it translates the queries to English, and only uses ‘Exact Entity Matching’ to trigger potential Knowledge Graph results. We have included a diagram below to help visualize the process. In any given search query, we can speculate that languages that are supported by The Cloud Natural Language API would follow the path of the yellow line, whereas the process for languages which are not currently supported by The Cloud Natural Language API would follow the path of the blue line:
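The same two paths can also be sketched in code. This is a toy model under stated assumptions, not Google’s implementation: the dictionaries stand in for the Knowledge Graph and for a translation step, the function name `resolve` is hypothetical, and the ‘broad’ match is approximated by ignoring word order.

```python
# Toy data only: these sets and dictionaries are stand-ins, not Google data.
NLP_SUPPORTED = {"es"}  # languages handled natively (assumed for the sketch)

MOVIE_ES = "Amores Perros"
MOVIE_BG = "С Деца на Море"
REVERSED_BG = "На Море с Деца"

# Entities known natively, keyed by language and then by lowercased title.
NATIVE_ENTITIES = {"es": {MOVIE_ES.lower(): "Amores Perros (movie)"}}

# Entities known in English, keyed by lowercased English title.
ENGLISH_ENTITIES = {"with kids at the seaside": "With Kids at the Seaside (movie)"}

# Stand-in for a translation step; a real system would call a translator.
TOY_TRANSLATIONS = {
    MOVIE_BG.lower(): "with kids at the seaside",
    REVERSED_BG.lower(): "at the seaside with kids",
}


def resolve(query, language):
    """Return (result_type, entity) for a query in the given language."""
    normalized = query.lower().strip()
    if language in NLP_SUPPORTED:
        # Yellow path: broad match that ignores word order, so it can overfit.
        query_words = set(normalized.split())
        for title, entity in NATIVE_ENTITIES.get(language, {}).items():
            if set(title.split()) == query_words:
                return ("knowledge_graph", entity)
        return ("keyword_results", None)
    # Blue path: translate to English, then require an exact entity match.
    english = TOY_TRANSLATIONS.get(normalized, normalized)
    if english in ENGLISH_ENTITIES:
        return ("knowledge_graph", ENGLISH_ENTITIES[english])
    return ("keyword_results", None)


for query, language in [(MOVIE_ES, "es"), ("Perros Amores", "es"),
                        (MOVIE_BG, "bg"), (REVERSED_BG, "bg")]:
    print(language, query, resolve(query, language))
```

In this sketch both Spanish word orders land on the movie entity (the overfitting behavior), while only the exact Bulgarian title does; the reversed Bulgarian query falls back to keyword results.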

Manual Translation Feedback

It is important to understand that Google’s linguistic APIs are also being supplemented with user feedback that is accepted within the Google Translate interface. It is admittedly hard to account for the impact that this process could have in our testing, but it is also interesting to think about how it could be used to benefit SEO. Problems in Entity Understanding and Knowledge Graph search rankings for queries may be caused by a mistranslation of the query back to English, especially for languages that are not supported by The Cloud Natural Language API. The manual translation feedback option may give SEOs strategies to either generate or suppress Knowledge Graph results, especially for languages not supported by The Cloud Natural Language API.

It may be possible to impact Google’s translations by submitting better English translations of the query to Google using the Google Translate feedback utility (both the app and the web version offer this feature). Volume may be a factor if you attempt to correct something that you perceive as a mistake. We don’t know how Google processes translation feedback; it could be passed through the normal Machine Learning protocols or it could be managed separately, since it is likely received in lower volume. Regardless, manual translation feedback will probably have a stronger impact where there is a lower volume of data in the machine learning.

Similarly, SEOs working in non-English languages who want to trigger or be incorporated into a Knowledge Graph result may be able to reverse into better Entity Understanding and stronger Knowledge Graph search results by choosing to optimize for keywords that already translate correctly into English. SEOs who strategically want to avoid triggering or being incorporated into a Knowledge Graph result should use The Cloud Natural Language API tool to ensure that they avoid optimizing for keywords that translate correctly to the English keywords that trigger the Knowledge Graph results.

Recommended strategy, based on whether the SEO wants Knowledge Graph results and whether The Cloud Natural Language API is available for the language:

  • SEO Wants Knowledge Graph Results: Yes
    • Cloud Natural Language API Available: Yes. Ensure keywords are understood natively in The Cloud Natural Language API. If they are not, target keywords that DO trigger the correct Entity Understanding in The Cloud Natural Language API and trigger Knowledge Graph results in searches.
    • Cloud Natural Language API Available: No. If in-language keywords DO NOT translate correctly to English words that trigger the appropriate Entity Understanding in The Cloud Natural Language API or Knowledge Graph results in searches, submit manual translation suggestions that help update the translation to match the current Entity Understanding in The Cloud Natural Language API. Target and optimize for words that DO trigger the correct Entity Understanding in The Cloud Natural Language API or Knowledge Graph results in searches, so that your on-page SEO can help drive traffic for those keywords while you are waiting for the manual translation suggestions to take effect.
  • SEO Wants Knowledge Graph Results: No
    • Cloud Natural Language API Available: Yes. Target and optimize for variations of your keywords that are NOT understood natively in The Cloud Natural Language API, and thus won't trigger Entity Understanding in The Cloud Natural Language API or Knowledge Graph results in searches.
    • Cloud Natural Language API Available: No. Target and optimize for in-language keywords that do not translate correctly to English words that trigger the appropriate Entity Understanding or Knowledge Graph results in searches. Submit manual translation suggestions that distance your in-language keywords from the English Entity Understanding in The Cloud Natural Language API and from Knowledge Graph entities in search results.

With this approach, it is important to realize that the status of a language could change. Google is working quickly to build out The Cloud Natural Language API, incorporating more languages as quickly as they can, presumably because it is so important for surfacing correct Knowledge Graph and entity results. In a Google I/O session this year, Scott Huffman, Google’s Vice President of Engineering for the Google Assistant, announced that by the end of the year Google Assistant will be available in 30 languages. This may indicate that they are about to include many more languages in The Cloud Natural Language API, so this could be an even more important strategy soon.

Conclusion

When considering expanding your business internationally, knowing your audience and their online behavioral patterns is critical for your success. Google’s main goal, in this new way of processing and serving information, is to serve the most relevant results by detecting intent through deep linguistic understanding of the search query. When Entity Understanding and search intent are at the center of Google’s indexing of the web, and language and keyword matching are becoming secondary, it represents a fundamental shift in how content will be surfaced moving forward. We feel strongly that in the future of SEO, Entity Understanding will be critical for fine-tuning a business’s ability to leverage Google’s Mobile-First Indexing across a variety of devices, in countries and languages around the world.

In this article, we discussed how and when Google’s two primary Translation APIs play into their ability to leverage Query Understanding and perceive Entity Understanding. Our research indicates that Google seems to have found a somewhat successful way to work around the complexity of language by supporting their machine learning using these two APIs to help with linguistic detection, translation, semantic meaning and Entity Understanding.

It is critical for companies to actively manage their mobile, international SEO strategy, to help align online SEO efforts around the world. Being able to reach users with the right information where and when they need it is the goal of any business and seems to be the core goal of Mobile-First Indexing. The growing potential for cross-device search makes the process of surfacing information even more complex, but Google’s Entity Understanding can help. Previous articles in this series focused on what Entity Understanding is, how Google may have re-indexed the web based on Entity Understanding, and how Entity-First Indexing makes things like language detection and translation more important. In the final article in this series we will outline more of our research related to the impact that query language patterns, physical location and users’ language preferences have on search results and Google’s Entity Understanding.

 

The Entity & Language Series: Translation and Language APIs Impact on Search (3 of 5)

NOTE: Please use these links to catch up on the previous posts in the series: Article 1 / Article 2 / Article 3 / Article 4 / Article 5

By: Denica Nyagolova & Cindy Krum

Google’s Mobile-First Index, or as we call it at MobileMoxie – the Entity-First Index, is changing the way Google organizes information, especially in terms of language. In the past, Google used language as a primary defining element, to determine the correct search results for a query, potentially even to determine which part of the index or which algorithm they should use to process the query. Now, with Entity-First Indexing, we believe that Entity Understanding and search context have become more important for Google’s ranking and evaluation of top results, and with that, the language of the query has become more of a secondary factor, like a strong modifier. With the launch of Mobile-First Indexing, we believe that Google is switching from being organized in the previous methodology (likely based on the Link Graph) to being organized based on Entity Understanding derived from their Knowledge Graph, which is to some degree, language-agnostic. We are already seeing signs of this in search results now.

Google’s increased focus on Entity Understanding and decreased focus on linguistic keyword matching seem to play a huge role in determining the search results that they serve. Google’s recent change away from relying on country-specific ccTLDs for passing language and country information, and the apparent internationalization of many of their other properties, also feed this theory. The timing of these changes made us think that their handling of language and their Entity Understanding in searches might be tightly related to the shift to Mobile-First Indexing. We also couldn’t help but notice the high level of focus on HREFLang discussion in John Mueller’s Spring AMA on Reddit, when one would expect much more of a focus on mobile-focused topics that might be related to Mobile-First Indexing, such as PWAs, JavaScript, Deep Linking, App Indexing, Google Actions or Instant Apps. Casually seeing evidence of this increased focus on language in our own search results motivated us to seek a deeper awareness of how Entity Understanding has begun to impact international search and SEO.

This is the third article in a five-part series about the topic of Entity Understanding and language in search. The first two articles in this series focused on the concept of Entity-First Indexing, how it manages to be language agnostic, and the tools we believe Google used to accomplish the transition from indexing based on the Link Graph to indexing based on the Knowledge Graph. This article and the next one focus on practical research that MobileMoxie completed to illustrate these points; testing international searches to show differences related to Google’s language detection and recognition in Search.  

The final article in this series will provide further research to show new ways that Google is adapting search results for each user individually, based on the language settings on the phone they are searching from and the origin of the search (EX: if the search began in the Google App or in Google from a mobile browser, which are two slightly different experiences).


Query Matching & Query Understanding

Originally, Google’s search algorithms and technology focused on English, because they were written by people who spoke predominantly English. Now Google’s ability to understand and respond accurately in many languages is critical to their continued success around the world. The ‘search’ part of Google’s algorithm initially focused on keyword matching, just like the other search engines at the time. They crawled and indexed the text of the web, and surfaced sites that included a match of the keyword that was submitted in the query. They sorted the matching web pages based on the frequency of the matching keyword occurrence on the page, and other signals, including inbound links and anchor text – but eventually giving special weight to keyword matching anchor text.

The important thing to know here is that none of this actually represented anything more than ‘query matching’ without any real ‘understanding.’ The difference between these two concepts really comes into play when the basic ‘sentiment’ of the search and the relationship between multiple words in the query are reflected in the results. At first, when early search engines were doing basic ‘query matching,’ they relied heavily on something called ‘boolean’ search modifiers or attributes to explicitly express the sentiment and the relationship between the words in the query. These modifiers included ‘AND,’ ‘OR,’ and ‘NOT,’ and were added before the keyword that they described, to allow searchers to string together multiple concepts. A typical boolean query might look like this: “hotel OR motel AND denver.”

From a very early point with Google, the ‘AND’ was implied and did not need to be written. When options with both keywords connected by the ‘AND’ were not in the index, Google defaulted to the ‘OR’ attribute. ‘NOT’ wasn’t supported, but the results were still overall much better. This evolution by Google might represent the very most primitive level of ‘understanding’ in a query, but nothing like what we are used to today.
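The implied ‘AND’ with an ‘OR’ fallback described above can be illustrated with a tiny inverted index. This is a minimal sketch, assuming a three-document toy corpus and hypothetical helper names (`build_index`, `search`); it shows pure ‘query matching’ with no understanding of meaning.

```python
# Toy corpus: document id -> text (made up for illustration).
DOCS = {
    1: "cheap hotel in denver",
    2: "motel near denver airport",
    3: "hotel reviews for boston",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(query, index):
    """Implied AND across all terms; fall back to OR when AND finds nothing."""
    terms = query.lower().split()
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    matches_all = set.intersection(*postings)
    if matches_all:
        return matches_all
    return set.union(*postings)


INDEX = build_index(DOCS)
print(search("hotel denver", INDEX))   # documents containing both words
print(search("motel boston", INDEX))   # no document has both, so OR is used
```

Nothing here looks at word order or meaning; the engine only checks which documents contain the submitted words.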

At the time, many search engines also relied on keyword popularity, as a ranking factor, so websites with popular keywords naturally ranked better for all queries, even if the topics of the keywords were minimally related. When SEO’s began spamming the web by including the most popular keywords over and over again on their pages, Google developed a capacity to understand synonyms, which broadened the primitive ‘understanding’ a bit more. This ‘synonym matching’ minimized the value of exact-match keyword spamming, and made it more valuable to include a variety of different keywords and their synonyms. When this system also started to be gamed, Google began adding and relying more heavily on other, external quality signals to help determine which pages were the best match for the query. Eventually, over time, Google was able to start using these external signals to further inform their understanding of the relationships – not only between the keywords in the query, but also the relationships between the website and the other similar websites on the web.

It was not until about 2005 that Google began building out the foundations of their Translation API and began to understand words in different languages as approximate synonyms. Now Google’s linguistic and translation ability and understanding in search are quite advanced.

How Voice Search Changes Query Understanding & Helps Get us to Entity Understanding

What many SEOs seem to be missing is the way that voice search changes these relationships, and how keywords work differently now, not only since the Hummingbird update, which focused on semantic and voice-driven queries, but also since the launch of Mobile-First Indexing. Over time, Hummingbird brought with it a lot of changes, especially related to local searches, and made Google Map Pack rankings more accurate. Now, when SEOs see their voice query directly translated to text and submitted to Google as a simple text query, they think nothing has changed, but we believe this assumption is incorrect. While the user interface has remained constant, the back-end understanding of the query and the keyword relationships has changed.

Voice queries tend to be longer than written queries, and they tend to include more inconsequential words. This makes it harder for Google to determine the most important part of a query, and return the related results without getting distracted by the extra keywords. This is where a more comprehensive Keyword Understanding becomes critical and Google is forced to rely more heavily on phrase-based matching and machine learning for keywords. The main technology that makes that easier is phrase-based indexing, as described by Bill Slawski, as well as ontology-based Query Understanding, which he describes perfectly here and here.

The other thing that makes surfacing the right content in long, complex voice searches easier and more likely for Google to get right is Entity Understanding. Entity Understanding is different from Keyword Understanding because it allows Google to identify specific ideas or concepts that might be especially important in a long, complex query. As outlined in the two previous articles in this series, entities are ideas or concepts that are universal, and thus, are language agnostic. Luckily for Google, in a voice query, they often represent the core thrust of the query. Entity-based search results allow Google to provide a somewhat accurate result, that can be further parsed or disambiguated, even when the nuance of the query is not entirely clear.

EX: If you ask Google, “Who was the drummer for The Cranberries in 1999?” it can visually surface the Entity result for The Cranberries (Knowledge Graph), which includes pictures and information about the band members, their instruments and hopefully the years they were in the band. In the same search, when it is submitted as a voice-only query, Google can respond conversationally, to get you to the information that you are trying to find by saying “Ok, I understand that you want to know about the band The Cranberries – is that right? Did you want to know something about the drummer?”

The important thing here is the change in the background of Google’s functionality, focusing on Entity Understanding first, whenever possible, because it allows for ongoing verification and disambiguation of the query, easily in both a visual and a voice-only context. If Google had misunderstood the query, and responded with “Ok, you want to know about cranberries, (the food) – is that right?” that would surface a different entity (Knowledge Graph result) entirely. The searcher would stop Google before too much computational power had been used to match all the keywords in the query, and before Google delved deeper to find an answer that might not exist because they were looking at the wrong entity (since foods do not have drummers, they may have ended up looking for ‘a drum of cranberries’).
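The ‘entity first, then confirm’ idea can be sketched as a lookup that picks the most specific known entity in the query and asks the searcher to confirm it before doing any deeper work. This is a minimal sketch: the two-entry entity table, the longest-match rule, the function names and the fallback wording are all assumptions for illustration.

```python
# Hypothetical entity table: lowercased name -> entity label.
ENTITIES = {
    "the cranberries": "The Cranberries (band)",
    "cranberries": "cranberries (food)",
}


def find_entity(query):
    """Return the entity for the longest known name found in the query."""
    text = query.lower()
    matches = [name for name in ENTITIES if name in text]
    if not matches:
        return None
    return ENTITIES[max(matches, key=len)]


def confirmation_prompt(query):
    """Build a conversational check on the entity before answering in depth."""
    entity = find_entity(query)
    if entity is None:
        return "Sorry, can you rephrase that?"
    return f"Ok, you want to know about {entity} - is that right?"


print(confirmation_prompt("Who was the drummer for The Cranberries in 1999?"))
print(confirmation_prompt("How do I cook cranberries?"))
```

Confirming the entity first is cheap, and a wrong guess (the food instead of the band) can be corrected by the searcher before the rest of the query is processed.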

This kind of back-end processing makes the responses quicker and easier for Google to surface, often without feeling like there was a loss of nuance or accuracy, and we believe that this is a crucial part of the changes that came with Mobile-First Indexing.

Mobile-First Indexing also changed how Google approaches international markets, language detection and country relevant searches. Based on our previous understanding of how Google used and understood languages, we anticipated the process diagrammed below, so the testing that we will describe in this article series followed a similar path: 



Google Linguistic Intent Determination Process, with the corresponding MobileMoxie testing for each step:

  • Detect Query Language
    • Alphabet Testing
    • ‘From’ Testing in Google Translate
    • Phone Location Testing (Described in the next article in the series)
  • Determine Query Understanding (when possible)
    • Word-Order Testing
    • Idiom & Idiom Variation Testing
  • Determine Entity Understanding (when possible)
    • International Movie & Song Identification Testing

Google Translation API vs. Google Cloud Natural Language API

Entity Understanding is important for voice search and Mobile-First/Entity-First Indexing, but potentially hard to replicate in different languages. Historically, to achieve proficiency in non-English based searches, Google focused heavily on developing language and translation APIs. More recently, they have focused on using machine learning to fine-tune natural language, conversation and contextual language understanding in top languages around the world for eventual use in many of their new products and technologies. To this end, Google has developed a variety of APIs and other utilities to help them work natively in the most prevalent languages, and translate accurately and comprehensively when they can’t do that. These various technologies have different strengths and weaknesses that can be leveraged together or separately, as needed to make the most of each potential translation scenario. To understand the changes in how language now factors into search and SEO, you must first have a general awareness of the two main APIs that Google uses to process language: The Google Translation API and The Google Cloud Natural Language API:

Google Translate API

The Google Translate API has been around for a long time and is the most widespread and straightforward of the language APIs, so it is the one that most people are familiar with. It powers Google Translate and provides translation between many languages. In the early days, it focused on dictionary-style keyword matching and translation. Now it is much more sophisticated, and uses both state-of-the-art Neural Machine Translation and Phrase-Based Machine Translation. These two technologies allow words to be translated together in groups, or as phrases, to provide more cohesive meaning. (Google understands keyword searches as phrases, so it makes sense that they would also want to translate words together as a phrase.) This translation model was introduced into Google’s Translation API a couple of years ago and many competitive translation APIs do not have anything similar.

Before the Google Translate API can begin a query or a translation, it has to detect the language of the query, or the ‘from’ language for the translation. In the past Google was only able to recognize the language of a query when it was written in the official alphabet of the language, or based on the physical location of the searcher, but now the Google Translate API can identify languages most of the time without this, even if they are not submitted from a place where that language is spoken or in the official alphabet of the language. This can be complicated because in some languages, one word that is spoken exactly the same way can be written in many different ways.

Once the query or the ‘from’ language in a translation has been detected, the Google Translation API then attempts to process a translation request using the Neural Machine Translation (NMT) model because this is their most robust translation technology. (You can find a list of all the languages supported by the NMT Model here.) According to Google, the NMT model is available for all languages ‘to English,’ with the exception of Belarusian, Kyrgyz, Latin, Myanmar, and Sundanese. Since it uses machine learning, the phrase matching in each language will continue to improve over time, the more the Translate API is used.

If the NMT model does not support translation in the requested language pair, the Translate API falls back to Google’s Phrase-Based Machine Translation (PBMT) model, which can translate phrases proficiently, but tends to be less accurate than the NMT model, presumably because it is less driven by Machine Learning. This model works more like keyword and synonym matching models, but focuses only on multi-word groupings. (You can find a list of all the languages supported by the PBMT Model here.)
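The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the decision logic as we understand it, not Google’s implementation: the function name `pick_translation_model`, the placeholder language codes and the final ‘dictionary’ tier (taken from our own summary of the API, not from Google documentation) are all assumptions.

```python
def pick_translation_model(source_lang, nmt_languages, pbmt_languages):
    """Return the first model tier that supports the language (hypothetical helper)."""
    if source_lang in nmt_languages:
        return "NMT"          # preferred: Neural Machine Translation
    if source_lang in pbmt_languages:
        return "PBMT"         # fallback: Phrase-Based Machine Translation
    return "dictionary"       # assumed last resort: word-by-word matching


# Placeholder language sets, made up for the example; the real supported
# lists are published by Google and change over time.
nmt = {"lang_a"}
pbmt = {"lang_a", "lang_b"}

for lang in ("lang_a", "lang_b", "lang_c"):
    print(lang, "->", pick_translation_model(lang, nmt, pbmt))
```

The point of the sketch is simply that a language is served by the most capable tier that supports it, and drops to a simpler tier only when it has to.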

Google Cloud Natural Language API

Google uses the Translate API to detect language and find keyword matches, but it uses the Natural Language API (officially called Google Cloud Natural Language API) to detect meaning. This newer API relies heavily on Machine Learning and AI, but is NOT meant for translation per se; instead, when a query is submitted in one of the 10 languages that it supports, it simply determines the meaning of text natively, using context and linguistic structure, without using translation.

This API was discussed in the previous article in this series, from the perspective of an SEO tool, because it detects and infers the contextual meaning of words and entities in a sentence or a paragraph. The API can do this because it has a deep understanding of grammar and linguistic structure in the languages that it supports. It does the same thing for individual queries, using its linguistic understanding, sentiment analysis and Entity Understanding along with engagement data from previous searches to achieve understanding natively, in the original language of the query. It is this API that allows Google to detect ‘search intent’ from queries, including detecting when keyword order changes the intent of the query, and thus, should change the search results for that particular query.

Knowing the basic functioning of these two APIs should help you understand how we approached our research. Based on what we found, an estimated summary of Google’s capacity for language translation and keyword understanding is outlined below. As you can see, Google’s translation capabilities include basic ‘word-by-word’ translation, ‘phrase-based’ translation and ‘advanced conversational language understanding’ which incorporates the broader context of the words to achieve clearer meaning. Each of these levels of translation and understanding relies on one or more of Google’s language APIs, and falls within one of a few options:


Individual Keyword Matching
• Google API: Google Translation API
• Level of language understanding: Basic, Word-by-Word
• Google utilities leveraging the API: Google Search, Google Translate, Google Play
• Languages: Language Detection & Dictionary Translation. Only for Belarusian, Kyrgyz, Latin, Myanmar, Sundanese. No Phrase-Based Machine Translation (PBMT) or Neural Machine Translation (NMT)
• Supporting technology/data: Language Detection, Dictionary Translation

Keyword Phrase Matching
• Google API: Google Translation API
• Level of language understanding: Intermediate, Phrase Based
• Google utilities leveraging the API: Google Search, Google Translate
• Languages: Top 26 languages in the world. (Specifically does not include: Belarusian, Kyrgyz, Latin, Myanmar, Sundanese)
• Supporting technology/data: Language Detection, Phrase-Based Machine Translation (PBMT), Neural Machine Translation (NMT)

Entity Understanding
• Google API: Cloud Natural Language API
• Level of language understanding: Advanced, Natural, Conversational Language
• Google utilities leveraging the API: Google Search, Google Home & Google Assistant, Google Actions
• Languages: Top 10 languages in the world. Chinese - Simplified and Traditional, English, French, German, Italian, Japanese, Korean, Portuguese, Spanish
• Supporting technology/data: Entities Database

Language/Grammar Understanding
• Google API: Cloud Natural Language API
• Level of language understanding: Advanced, Natural, Conversational Language
• Google utilities leveraging the API: Google Search, Google Home & Google Assistant, Google Actions
• Languages: Top 10 languages in the world. Chinese - Simplified and Traditional, English, French, German, Italian, Japanese, Korean, Portuguese, Spanish
• Supporting technology/data: Engagement Data*, Sentiment Analysis, Sentence Structure, Context Analysis

*Data provided by user feedback

Alphabet Selection & Its Impact on Language Detection

The first part of understanding the meaning of a query is detecting what language it is in, so we wanted to find out when the Translation API’s involvement in this part of a search starts and ends. Historically, since Google used keyword matching to surface web content that corresponded with the keyword in the query, the spelling and how the word was written impacted what results would be included; language detection was easier and Google could use physical location and the language-specific alphabets to determine the language of a query.

At the time, Google also used IP detection and redirection to guess what country the searcher was in, and get them to the corresponding, country-specific version of Google where they could submit their query; then, Google actually had language-specific algorithms that were triggered by the Google ccTLD where the query originated. At this time, it was common for international searchers to have a computer keyboard that featured letters in the official written language, so it was reasonable to assume that they would be searching in the official alphabet of the language. Initially, this combination of signals was reasonably accurate, and the alphabet that a searcher used when submitting a query helped validate the classification and refine the results.

Conversely, when phones first got on the internet, and began searching, they had physical, rather than digital keyboards, and native-alphabet keyboards were not always available in every language. Many people adjusted to typing in their language with the Latin alphabet. Optimizing for local searches in the Latin alphabet was a good SEO strategy at that time, because it expanded the market for your products – especially for mobile searches. On the other hand, some international businesses strove to localize mobile content with the ‘correct’ native alphabet, and not use the Latin character options for writing, but from an SEO perspective, relying only on native writing, and not including both native and Latin, was especially limiting for mobile search.

With all this in mind, we wanted to find out if Google uses the choice of alphabet as part of the search context and if changing the keyboard or characters used to submit a search query would change the search results.

In the grid below we have included simple alphabet variation in two languages – Spanish and Hindi – to illustrate this concept. Spanish relies predominantly on the Latin alphabet for written communication so it does not have much variation with specialized alphabets. Occasionally accent marks are included in proper written communication, but no new characters are added. Hindi, on the other hand, uses a completely different alphabet called Devanagari, which is necessary for proper written communication. Most speakers have adapted to writing Hindi words with Latin characters on mobile, using phonetic spelling of the words. The two alphabets, Latin and Devanagari, are rarely used together.

Language Test: Spanish
• Native Alphabet: Latin with accent marks
• English Word: And Your Mother Too
• Local Translation Variations: Y Tu Mamá También / Y Tu Mama Tambien
• Content Type: TV and Movie

Language Test: Hindi
• Native Alphabet: Devanagari
• English Word: Will You Marry Me?
• Local Translation Variations: मुझसे शादी करोगी / Mujhse Shaadi Karogi
• Content Type: Music/TV and Movie

From a search engine and language detection perspective, these two scenarios are in stark contrast to each other; when searchers write in Hindi, using the Devanagari alphabet, they rarely mix in Latin characters, but when Spanish speakers search, they casually include or omit proper accenting above Latin characters.
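To make the accent variation concrete, here is a minimal Python sketch that treats accented and unaccented spellings as the same string. It is our own illustration using the standard `unicodedata` module, not a description of how Google handles accents; the helper names `strip_accents` and `same_ignoring_accents` are made up for the example.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks after Unicode decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_ignoring_accents(a, b):
    """Compare two queries while ignoring accents and letter case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(strip_accents("Y Tu Mamá También"))
print(same_ignoring_accents("Y Tu Mamá También", "y tu mama tambien"))
```

This kind of folding only suits scripts where accents are optional decoration. It would also turn ‘ñ’ into ‘n’, which can change the meaning of a Spanish word, and it would strip marks that Devanagari text needs, so a real system would have to be far more selective.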

In the example searches below, in both Spanish and Hindi, you can see this concept at play. Google Instant, Google’s utility for anticipating and suggesting queries as a user types, based on query volume and machine learning, still manages to provide autocomplete suggestions for keywords in all different character variations, interchangeably, including the native alphabet and Latin options, despite the original query being written with only Latin characters. In general, it seems like Google orders the original query, including the alphabet choice, at the top of the Instant suggestions. This is especially true when a related Entity is available, but they do not do it to the exclusion of non-Latin query suggestions.

The language of a query is easy to detect when it is written in an alphabet that is specific to the language, like Devanagari, but harder when the query is written in Latin and other shared alphabets. Accent marks in the Spanish query may help Google determine when a query is in Spanish, but not much because they are only on some letters. The meaning of the word is not substantially changed if they are missing, so searchers include or omit them casually, making them less helpful in language detection.
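The difference between a language-specific alphabet and a shared one can be sketched with Python’s standard `unicodedata` module, which exposes the Unicode name of each character. This is a rough illustration under our own assumptions (the `scripts_in` helper and the short `SCRIPTS` list are hypothetical), and it detects scripts, not languages.

```python
import unicodedata
from collections import Counter

# Only the scripts discussed in this article; everything else counts as OTHER.
SCRIPTS = ("LATIN", "CYRILLIC", "GREEK", "DEVANAGARI")


def scripts_in(query):
    """Count how many characters of the query belong to each script."""
    counts = Counter()
    for ch in query:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "")  # e.g. 'DEVANAGARI LETTER MA'
        script = next((s for s in SCRIPTS if name.startswith(s)), "OTHER")
        counts[script] += 1
    return counts


for query in ("मुझसे शादी करोगी", "Mujhse Shaadi Karogi", "Y Tu Mamá También"):
    print(query, dict(scripts_in(query)))
```

A Devanagari query narrows the candidate languages immediately, while the two Latin-alphabet queries look identical at the script level, even though one is Hindi and the other is Spanish. That is the gap that location, engagement data and Entity Understanding would have to fill.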

NOTE: In this test, it is relevant to note that movies and songs are potentially even more variable than many informational queries that are not simply about local businesses. The words used in media titles often break norms of language and contextual understanding, and thus would be hard to associate with a proper meaning without some level of Entity Understanding. In traditional writing, titles of movies, books and songs are often offset with italics or punctuation, but this is not possible in a search query.

With this in mind, you can understand how disambiguation of complex queries and entities becomes more important in voice search – especially with a query like ‘I want to watch American Pie.’ Without Entity Understanding, the query might be processed as a request for a recipe. Now, with Mobile-First Indexing, action words like ‘watch’ and ‘play’ are probably being used to help inform the contextual understanding of the query, and trigger specific media oriented results and Entities, even when media titles lend themselves to confusion. Words like ‘I want’ are probably being dropped out of the query entirely.

Alphabet Selection & Google’s Query Understanding

Things get more complex in language detection when Latin characters are mixed with additional characters – the Cyrillic and Greek alphabets both do this. The Cyrillic alphabet used in Bulgarian, for example, contains 30 letters, some of which also appear in the Latin alphabet and some of which are unique to Cyrillic. In Bulgarian (one of about 13 significant world languages that use the Cyrillic alphabet), native speakers often use a number OR a 2 or 3-letter combination of Latin characters to replace the Cyrillic characters that are missing from the normal Latin keyboard, rather than hunting for the correct Cyrillic character in extra menus.

EX: ‘ш’ in Cyrillic can also be written as the numeral ‘6’ or the letter combination ‘sh’; similarly, the Cyrillic character ‘ч’ can be written as the numeral ‘4’ or the letter combination ‘ch’. As you can see in the chart below, analogous query and searching patterns are also used with the Greek language and alphabet.

The number of potential variations in one keyword can be very high, especially considering that the different character replacements can be mixed in one word – using Latin letter combinations, numerals and Cyrillic characters in the same word. Though that scenario might be rare, it means that the number of ways to write one keyword becomes quite high, which initially made Google’s language-detection efforts in these languages much harder. Over time Google got better at recognizing these spelling and alphabet variations and determining when they represented the same words (probably by using a system similar to ‘stemming’ from the English algorithms).
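To show how quickly the variations multiply, here is a small Python sketch that enumerates the Latin spellings of a Bulgarian word. It is an illustration under our own assumptions: the `SUBSTITUTES` and `PLAIN` tables cover only the letters needed for this article’s example and are not a complete transliteration scheme, and the `latin_variants` helper is hypothetical.

```python
from itertools import product

# Cyrillic letters that searchers replace in more than one way
# (only the substitutions mentioned in this article).
SUBSTITUTES = {"ш": ("sh", "6"), "ч": ("ch", "4"), "ц": ("c", "ts")}

# One-to-one replacements, limited to the letters in the example words.
PLAIN = {"а": "a", "в": "v", "е": "e", "и": "i",
         "н": "n", "п": "p", "р": "r", "т": "t"}


def latin_variants(word):
    """Return every Latin/numeral spelling the tables above allow."""
    options = []
    for ch in word.lower():
        if ch in SUBSTITUTES:
            options.append(SUBSTITUTES[ch])
        else:
            # Unmapped characters pass through unchanged.
            options.append((PLAIN.get(ch, ch),))
    return {"".join(parts) for parts in product(*options)}


print(sorted(latin_variants("Шапчица")))
print(sorted(latin_variants("Червената")))
```

With three ambiguous letters, the single word ‘Шапчица’ already has eight Latin spellings, before counting the Cyrillic original or spellings that mix both alphabets.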

To illustrate the concept, top possible variations for how sample keyword phrases can be written using different alphabets are outlined below:

Language Test: Bulgarian
• Native Alphabet: Cyrillic
• English Word: Little Red Riding Hood/(Little Red Hood)
• Local Translation Variations: Червената Шапчица / Chervenata Shapchica / 4ervenata 6ap4ica / Chervenata Shapchitsa / 4ervenata 6ap4itsa
• Content Type: Fairy Tale

Language Test: Greek
• Native Alphabet: Greek
• English Word: Chicken Soup with Vegetables
• Local Translation Variations: Κοτόσουπα με λαχανικά / Κοτοσουπα με λαχανικα / Kotósoupa me lachanika / Kotosoupa me laxanika
• Content Type: Recipe

Since Bulgarian and Greek are less prevalent languages, Google spent less effort in detecting and translating these languages. We wanted to check if Google would accurately detect all of these languages with all of the potential character combinations. We picked these languages because language detection is so difficult in them, due to the large number of possible alphabets and letter substitutes in play.

As you can see, this is a lot to expect Google to interpret and understand, but they do. These variations seem to be understood as ‘near-exact synonyms’. We can tell this because Google Instant shows us a mix of all variations of the keyword, with different alphabets and letter replacements included in their Instant suggestions, as shown in the example searches below.  The example searches below indicate that Google detects language in Google Search by using the Translation API, even in cases when the language and alphabet of the query do not match the official alphabets of the language. Google seems to prioritise local results based on the choice of query alphabet, but this is probably a product of engagement volume with the particular alphabets, rather than intentional design.

From an SEO perspective, it is very important to realize that even though the different character variations are understood as synonyms in Google Instant, Google returns slightly different search results for each of the variations, depending on how the final query is submitted. Even though the Google Instant autocomplete suggests keywords in all different character variations interchangeably, results appear to correlate with search volume and engagement, but also probably include a small amount of keyword understanding from the Translate API. Also, according to Danny Sullivan, Google’s Search Liaison, these predictions are ‘also related to your location and previous searches’.

NOTE: Computers only pass Google their location using a general IP address, but cell phones can pass the more specific and robust GPS coordinates. It is possible that Google’s machine learning algorithms may have unknowingly been skewed; on one side, towards more localization in the search result for queries that use a less localized alphabet (Latin), and on the other side, towards less localization in the search query results that use the more localized Cyrillic alphabet.

If you are interested in testing mobile search results with specific location, language and alphabet choice, the MobileMoxie SERP Test works for queries in all languages and locations around the world (down to the post-code level of specificity – not just ‘city’ or ‘country’ specific queries!). It also provides more than 20 mobile device comparison pairs.

Language Detection’s Impact on Entity Understanding  

Next, to see how variations in alphabets impact Google’s Entity Understanding, we tested a query for a Spanish movie, ‘Y Tu Mamá También’, which translates into English as ‘And Your Mother Too’. This is a good test because it is also the name of a Bulgarian hip-hop song, written like this in Bulgarian: ‘И Твойта Майка Също’. The query could be referencing two separate entities. One is known broadly around the world in Spanish. The other is more country-specific: when it is searched for in Bulgaria, it should be associated with a different entity (the hip-hop song) that is well known locally in Bulgaria, but not well known around the world.

When the query is submitted in English and Bulgarian/Cyrillic, the results include the Knowledge Graph result for the Spanish movie, as you can see in screenshots below. However, once we switch and begin typing the name of the movie in Bulgarian, using the Latin alphabet, as shown in screenshot 3 on the bottom, Google Instant shows suggestions related to the Spanish movie AND the Bulgarian song. It also begins to offer Google Instant predictions in the Cyrillic alphabet, suggesting that the Latin formatting of the query triggers a larger Entity Understanding that encompasses more queries, including ones that are submitted using the other alphabets. Once the Bulgarian query is submitted using Latin characters, the results focus exclusively on the local Bulgarian hip-hop song, and do not include the Spanish movie. There is also a Bulgarian movie with the same name. It seems that the Bulgarian movie signals are not yet strong enough to trigger a Knowledge Graph result, but Google clearly understands that the Spanish movie may be an irrelevant result for this query when the Bulgarian language is detected.

This experiment shows that Google not only detects the language even when the query is not in the native alphabet (Cyrillic), but also suggests query options with alphabetically, linguistically and geographically diverse search results when it is unclear about the searcher’s intent. Google clearly does use alphabets and language as contextual signals, but uses Google Instant for on-the-fly disambiguation. Prioritization of some results over others may not be an intentional organizing principle to discount the official local or native alphabets, but instead, a result triggered by the higher levels of engagement data that are associated with Latin-alphabet queries.

NOTE: Again, if you are interested in testing your brand’s Entity Understanding in different countries and searches around the world, the MobileMoxie SERP Test is great for this. It provides real search results for mobile queries on a large number of popular phones and works for queries in all languages, alphabets and locations around the world.

Conclusion

Sometimes, the work that Google has to do to surface good results around the world is not appreciated, but Entity-First Indexing forces it to the front of our minds. This is especially true for SEOs who focus on international audiences and will have to anticipate how and when Google will relate different queries to specific Entities. In this article, we discussed the various APIs that Google uses to detect and understand languages, and provide search results, even in cases where a large variety of characters could be used interchangeably to mean the same thing.

Our research clearly shows that Google follows a process for language and query identification that is critical to determining how the query will be understood, and what results will surface as a result of it. When queries are submitted to Google in languages that it has a deep, native understanding of, through the Natural Language API, Google will return better results. When queries are submitted in languages that Google detects are not supported by the Natural Language API, the algorithm and the results it produces are less accurate and more prone to problems, because they rely more on keyword matching than keyword understanding.

This was the third article in a five-part series about the relationship between Google’s shift to Mobile-First Indexing (Entity-First Indexing) and language. The first two articles in the series focused on how and why Google needed to make this shift. This article focused on practical research that shows how language impacts queries in Google’s algorithm. The next article will continue to outline our research, first focusing on how and why the order of keywords in a query matters, and impacts Google’s ability to understand the query, depending on the language. Then it will delve into more complex linguistic understanding, like Google’s ability to interpret and parse different idioms and location-specific Entities. The final article in the series will talk about how individual personalization, like the languages that are set up on the phone and the physical location of the searcher play into queries, even when there is not a location or mapping intent detected, and how these are related to Entity Understanding and Entity-First Indexing.

 

The Entity & Language Series: Frameworks, Translation, Natural Language & APIs (2 of 5)

NOTE: Please use these links to catch up on the previous posts in the series: Article 1 / Article 2 / Article 3 / Article 4 / Article 5

By: Cindy Krum 

All of a sudden, we are seeing translation happen everywhere in the Google ecosystem: Google Maps is now available in 39 new languages; Google Translate rolled out a new interface with faster access to ‘Conversation Mode’; and Google Translate on Android switched to server-side functionality that allows it to be invoked and work within any app or website on the phone, as shown in the image on the right, where Google Translate has been invoked to translate a tweet in the Twitter native app. Google clearly has reached a new level in their voice and language capabilities!

We are also seeing Google find lots of new uses for their Cloud Natural Language API. Google just launched ‘Google Talk to Books’, which uses it to let you ask questions that it will answer from knowledge it has gained from crawling and processing/understanding the contents of over 100,000 books. They also just launched a new word association game called Semantris, which has two modes that both have players work against time to guess at on-screen word relationships, with ever-increasing hurdles as more and more words are added to the screen.

And the list goes on. We are also seeing some of this play out in search. Image search results are now clearly pulling from an international index, with captions that have been translated to fit the query language. Map results include translated entity understanding for some major generic queries, like ‘grocery store’ and ‘ATM’, and they also auto-translate user reviews for local businesses into the searcher’s language.

The timing of all of these changes is not a coincidence. It is all a natural side-effect of the recent launch of Mobile-First Indexing or, as we call it, Entity-First Indexing. This is the second article in a multi-part series about entities, language and their relationship to Google’s shift to Mobile-First/Entity-First Indexing. The previous article provided fundamental background knowledge about the concepts of entity search, entity indexing and what it might mean in the context of Google. This article will focus on the tools that we believe Google used to classify all the content on the web as entities, organize them based on their relationships, and launch the new indexing methodology. Then we will speculate about what that might mean for SEO. The next three articles in this series will focus on research and findings that we completed to show evidence of these theories in play internationally, as they are related to the functioning of the different translation APIs, how those impact Entity Understanding and how personalization plays into Google’s Entity Understanding and search results.

Fuchsia & Why Entities are So Important to Google

To understand how language and entities fit into the larger picture at Google, you have to be able to see beyond just search and SEO. Google cares a lot about AI, to the point that they just made two people who previously worked on the AI team the heads of the Search Team. At Google, they also care a lot about voice search – so much that Google Assistant has already shipped in more than 400 million devices around the world. Finally, Google cares about reaching what they call The Next Billion Users – people living outside of North America and Western Europe, who have historically not been on the cutting edge of technology, but are now getting online and becoming active and engaged members of the online community. All of these goals may be brought together with new software that Google is working on, currently under the code name Fuchsia.

Fuchsia is a combination of a browser and an OS. What is most important about it, from a search perspective, is that it works based almost entirely on entities and feeds. The documentation and specifications are still very thin, but if this is the direction that Google is headed, then we can be sure that search of some sort will be tightly integrated in the software. As SEOs, what we need to remember is that search is not just web, and this is something that Google now really seems to be taking seriously. Google knows that there are lots of different types of content that people want to surface and interact with on their phones and other devices, and not all of it can be found on websites. This is why entities are so important. They allow web and non-web content to be grouped together, and surface when they are the most appropriate for the context, or surface together, if the context is not clear, to let the user decide. This is where Google Play comes in.

Even now, if you are not sold on the idea of Fuchsia ever impacting your marketing strategy, it is worth looking at the Chrome Feed, which is a default part of all Android phones and part of the Google App on iOS. This customization feed, sometimes called ‘Articles for You’, is almost entirely entity based and, according to NiemanLab and TechCrunch, traffic from this source increased 2,100% in 2017. Users get to select the specific topics that they want to ‘follow’ and the feed updates based on those topics, but also shows carousels of related topics, as shown below. Users can click on the triple-dot menu at any time to update the customization of their feed. If you don’t think this is a powerful way of getting news, realize that people can’t search for a news story until they at least have an idea of the topic or keywords that they want to search for – they have to be aware of it to put in a query. You can also think of how Twitter and Facebook work – both in feeds that you customize based on who you are friends with or follow – but most of us wish we could customize those feeds more. Google is hoping to be able to get us there in their own offering!

How Device Context & Google Play Fit In

Once Google launched app indexing, most SEOs probably thought that Google’s ultimate goal was to integrate apps into normal search results. For a while, it probably was, but deep linking and app indexing proved to be so problematic and complex for so many companies that it fell off of most people’s radars and Google changed course.

Either your app and website had exact web parity, all the time, and it was easy, or you didn’t and it was much more complicated. The problems generally stemmed from large sites with different CMSs running the back-ends of their web and app platforms, sometimes even different between Android and iOS. This made mapping between all three systems, to establish and maintain the required web parity between the apps and the website, a nightmare. Beyond that, whenever anything moved in the app or the website, content had to be moved everywhere else, to mirror the change. We think that this was one of the many good reasons that Google started advocating PWAs so strongly – it got them out of having to sort out the problems with deep linking and app indexing.

PWAs allowed one set of code to handle app and web interaction, which was brilliant, but what a lot of SEOs missed was the announcement that PWAs were being added to Google Play, Google’s app store. PWAs are essentially ‘websites that took all the right vitamins’ according to Alex Russell from Google, so their being added to Google Play was a big deal! We have suspected it for a long time, but with the addition of PWAs (and Google Instant Apps) to Google Play, it is finally clear that apps are not being integrated into traditional web search at Google, like most SEOs suspected, BUT traditional web search is being integrated into Google Play – or at least using the Google Play framework. This fits perfectly into the concept of Entity-First Indexing, because Google Play already uses a cross-device, context-aware, entity-style classification hierarchy for their search!

 

Google Play can also handle the multi-media, cross-device content that Google probably wants to surface more in Mobile-First/Entity-First Indexing, including games, apps, music, movies, TV, etc, as shown below in the Google Play & Monty Python Google Play Search examples. All that content is already integrated, populated and ranking in Google Play. It is also set up well for entity classification, since things are already broken down based on basic classifications, like if they are apps, games, movies, TV shows or books. Within each of those categories, there are sub-categories with related sub-categories. There are also main entities, like developer accounts, or artists, from which multiple apps or albums and/or songs can be surfaced, and these also have relationships already built in – to other genres of related content, so this is all great for Entity Understanding.

Google Play is already set up to include device context in its search algorithm, so that it only surfaces apps and content that can be downloaded or played on the device that is searching. It is also set up to allow different media types in a SERP. As discussed in the first article in this series, context is incredibly important to Google right now because it is critical for the disambiguation of a searcher’s intent when it comes to entities.

Google’s additional focus on context could also make the addition of videos and GIFs to Google Image Search seem potentially more logical, if context is considered. Perhaps this is now just a contextual grouping of visually oriented content, which would make it easier to interact with on devices like a TV, where you might use voice search or assisted search, casting or sharing your screen from a phone or laptop to the larger screen so that the viewing experience can be shared. Bill Slawski explains that many of Google’s recent patents focus on “user’s needs” and “context”….One of those was about Context Vectors, which Google told us [sic] involved the us of context terms from knowledge bases, to help identify the meaning of terms that might have more than one meaning”. We think that the ‘knowledge base’ that Google is referring to in this patent documentation is actually Google Knowledge and similar data repositories that may have since been merged into the Knowledge Graph. The current status of Google Image Search could just be a middle-term result that will change more, as more classification and UX positioning is added to the front-end side of the search interface.

From a linguistic perspective, Google Play was also a great candidate to use as a new indexing framework. For all the categories of content, but especially for apps, the categories that are available in the store stay the same in every language, though they are translated. More importantly though, metadata that app developers or ASOs submit to describe their apps in the store is auto-translated in all languages, so that your app can be surfaced for appropriate keyword searches in any language. So Google Play is already set up for a basic entity understanding, with all the hreflang information and hierarchical structure already in place.

Are Local Businesses Already Being Treated Like Apps?

If you are not focused on Local SEO, you might not be aware of the massive number of changes that have launched for Google My Business (GMB) listings in the past couple of weeks, since the March 17th update. In general, small business owners have recently been given a lot more control of how their small businesses look in the Google Knowledge Graph listings. This includes: the ability to add and edit a business description that shows at the top of the listing, the ability to actively edit the menu of services that the business offers, and more.

Before March 17, Google had also quietly been testing Google Posts, which allowed small businesses to use their GMB accounts to publish calls to action, and allow searchers to take actions directly from the GMB – Knowledge Graph panel, including booking appointments and reservations. It is essentially a micro-blogging platform that lets business owners make direct updates to their business listing whenever they want, and this is a big deal. Joel Headley and Miriam Ellis do a great job of covering it on the Moz Blog.

All of this makes it seem very much like Google is empathizing with, and trying to fix, one of the biggest pains of small businesses – maintaining their websites. Another aspect of the Google Play store that fits well in the model we believe Google is going for is that proven entity owners, such as app developers, are able to edit their app listings at will, to help market them and optimize them for search. If Google can empower small business owners to build out their GMB listings and keep them current, then it will save them a lot of time and money, and many of them would be just as happy, or happier, with that solution than with the burden and cost of maintaining a website.

From Google’s perspective, they just want to have the best and most accurate data that they can, as quickly and efficiently as they can. Google knows that small businesses often struggle to communicate business changes to web development teams in real time, and budget constraints may keep them from making changes as often as they would like. By empowering the business owners to control the listing directly, and even allowing them to set up calls to action and send push-notifications, Google is really creating a win-win situation for many small businesses. There are some obvious SEO questions about how easy or hard it will be to optimize GMB listings in the complete absence of a website, but this is an area to watch. Google is likely using off-line engagement data and travel radiuses to inform how wide a business’s ranking radius should be, and how relevant it is for various queries, so we could be in all-new territory here, as far as optimization and success metrics are concerned.

Global Search Algorithms are Better than Local

The websites that Google currently ranks in search results are translated by the website creators or their staff, but this is not necessarily true of the other entities that are ranked, for instance Knowledge Graph results, and related concepts that are linked there, like apps, videos and music. In these, Google is often using their own tools to translate content for presentation in search results (as they do aggressively with Android apps) or actively deciding that translation is not necessary, as is common with most media. They do this translation with basic translation APIs and Natural Language APIs and sometimes, potentially human assistance.

Without a language agnostic, unifying principle, organizing, sorting and surfacing all the information in the world will just get more and more unwieldy for Google over time. This is why, in our best guess, Google is not translating the entire web – they are just doing rough translations for the sake of entity classification. From there, they are ranking existing translations in search results, and their language APIs make it possible to translate other untranslated content on an as-needed basis, which may become more important as voice search grows in adoption. For Google, it is actually easier to unify their index on a singular set of language agnostic entities than it is to crawl and index all of the concepts in all of the languages, without the unifying, organizing principles of entities.

This synthesis of information necessary for entity classification may actually create more benefit than is immediately apparent to most SEOs; most SEOs assume that there is an appropriate keyword for everything, but in reality, language translation is often not symmetrical or absolute. We have probably all heard that Eskimos have more than 50 words for ‘snow’. These 50 words are not all exact translations but have slight variations in meaning which often do not directly translate into other languages. Similarly, you may have been exposed to the now-trendy Danish concept of ‘Hygge,’ which is a warm, soft, homey feeling that one can create, which usually includes snacks and candle light, but again, there is no direct translation for this word in English. If we required direct translation for classification, much of the richer and more detailed and nuanced meaning would be lost. This could also include loss of larger data concepts that are valuable across international borders, as postulated in the example below:

EX: If I am a Danish climate researcher, and we develop a method for measuring the carbon footprint of a community, we create a new keyword to describe this new ‘collective community carbon footprint measurement’ concept, and the keyword is ‘voresfodspor.’ This word exists only in Danish, but the concept is easily described in other languages. We don’t want the data and our research to be lost just because the keyword does not universally translate, so we need to tie it to larger entities – ‘climate change,’ ‘climate measurement,’ ‘carbon measurement,’ ‘community measurement.’ Entity understanding is not perfect translation, but it is great for making sure that concepts don’t get lost or ignored. It is great for allowing further refinement by humans or by machine learning and AI down the road.
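The idea in the example above can be sketched in a few lines of Python. This is only an illustration of tying language-specific keywords to shared entities, not a description of how Google actually does it: the entity IDs, the `KEYWORD_TO_ENTITIES` table and both helper functions are hypothetical, and ‘voresfodspor’ is the invented keyword from the example.

```python
# Hypothetical lookup table: language-specific keywords point at
# language-agnostic entity IDs (the IDs are invented for illustration).
KEYWORD_TO_ENTITIES = {
    ("da", "voresfodspor"): {"climate_change", "carbon_measurement", "community_measurement"},
    ("en", "carbon footprint"): {"climate_change", "carbon_measurement"},
    ("de", "klimawandel"): {"climate_change"},
}


def entities_for(language, keyword):
    """Return the entity IDs a keyword is tied to, or an empty set."""
    return KEYWORD_TO_ENTITIES.get((language, keyword.lower()), set())


def related_keywords(language, keyword):
    """Find keywords in any language that share at least one entity."""
    wanted = entities_for(language, keyword)
    return sorted(
        key for key, entities in KEYWORD_TO_ENTITIES.items()
        if entities & wanted and key != (language, keyword.lower())
    )


print(related_keywords("da", "voresfodspor"))
```

Even though the Danish keyword has no direct translation, it is still reachable from the English and German keywords in this toy table, because they all point at the same underlying entities.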

We know that the nature and content of languages in the world changes over time, much more quickly than the nature and number of entities (described at length in the previous article). Keying Google’s index off of a singular list of entities, in this case, based in English, makes surfacing content on the ever-growing web faster than it would be if entities had to be coded into the hierarchy of all languages individually. This is perhaps why in John Mueller’s recent AMA, John clearly said that Google wants to get away from having language and country-specific search algorithms. According to John, “For the most part, we try not to have separate algorithms per country or language. It doesn’t scale if we have to do that. It makes much more sense to spend a bit more time on making something that works across the whole of the web. That doesn’t mean that you don’t see local differences, but often that’s just a reflection of the local content which we see.”  

MarketFinder Tool is an Entity Classification Engine

In discussing Entity-First Indexing and the process by which Google may have approached it, we think it is useful to look at the tools that they have released recently, in case they can give us insights into what Google’s tech teams have been focusing on. The assumption here is that Google often seems to release cut-down versions of internal tools and technologies, once they are ready to start helping marketers take advantage of the new options that Google has been focusing on in the background. The best example here is the Page Speed Insights tool, which came out after the PageSpeedy server utility became available and after the internal Google Page Speed Team had been working on helping speed up Chrome, and helping webmasters speed up their own web pages, for a couple of years.

In the past couple of months, along with the many other translation and language-oriented new releases, Google has launched MarketFinder and promoted it to their advertising and AdWords clients. (Big thanks to Bill Hunt, one of the most notable PPC experts in the industry, for pointing this out to me!) In this tool, you can input a URL and it will quickly tell you what advertising categories it believes are most appropriate for the URL, as you can see below in the www.Chewy.com example; from there, it will tell you what markets and languages show the most potential for marketing and advertising success in these topics, depending on whether you sell products on the site. It then gives you detailed information about each of the markets where it suggests you should advertise, including a country profile, economic profile, search and advertising information, online profile, purchase behavior and logistics for the country.

What is important to understand about the tool is that it is not telling you the value of the keyword but the value of the keyword concept – or the entity, based on the automatic categorization of the site: the keyword and its related concepts, translated into all the relevant languages, in all the countries where people might be searching for this topic or related topics. It is ALMOST like Google published a lite version of their ‘Entity Classification Engine’ and made it available for PPC marketers to help them find the best markets for their advertising efforts – regardless of language, currency and other ideas that are often tied to countries, but are less tied to entities.

The other thing that is interesting about the tool, which could be a coincidence, or could be related to Mobile-First Indexing and Entity classification, is that it does not allow you to evaluate pages – only domains – but it evaluates domains very quickly. It is almost as if it is pulling the classification of each domain from an existing entity database – like Google already has all of the domains classified by what entities they are most closely related to. This part is still unclear, but interesting from an SEO perspective. If it is telling us exactly how a domain has been classified, we can verify that we agree with the classification, or potentially do things to try to alter the classification in future crawls.

Cloud Natural Language API Tool

The next somewhat newly released tool, and what much of the newest translation technology has been based on, is the Google Cloud Natural Language API, which uses natural language technologies to help reveal the meaning of texts and shows how Google breaks text down into different linguistic structures to understand it. According to Google, the API uses the same Machine Learning technology that Google relies on for Google Search and Google Assistant. When you visit the API documentation, you can interact with the API directly, even without a project integration, by dropping text into the text box half way down the page. The first thing that it does is classify the submitted text, based on its understanding of it, as entities! The tab is even called the ‘entities’ tab in the tool. (Those who doubt the importance of entities probably also don’t realize how hard it must have been to develop this technology for all languages around the world – the level of commitment to developing and honing a tool like this is quite impressive!)

As you can see in the example below, with text taken from the MobileMoxie home page, our Toolset is somewhat correctly identified as a consumer good, though it might be better described as a ‘SaaS marketing service.’ A lot of keywords that the Cloud Natural Language API should be able to identify are identified as ‘other’ which might mean that it needs more context. It is also interesting that many of the words in the submission are totally dropped out and not evaluated at all. This probably means that these words are not impacting our entity classification at all, or at least not very much – because they did not add significant uniqueness or clarification to the text. What is interesting here, is that many of these words are classic marketing terminology, so it is possible that they are only being ignored BECAUSE something in the text has been identified as a Consumer Product.

For SEO’s, this tool might be a great way to evaluate new page copy, before it goes live, to determine how it might impact the evaluation and entity classification of a domain. If it turns out that a domain has been mis-classified, this tool might be the best option for quick guidance about how to change on-page text for a more accurate entity classification.

NOTE: Changing the capitalization on ‘MobileMoxie Toolset’ did change that classification from ‘Consumer Product’ to ‘Other’ but that did not change the number of words in the sentence that were evaluated, nor did removing the mention of the Toolset from the sentence altogether.

Beyond just entity classification, another way the API reveals meaning is by determining Salience and Sentiment scores for an entity. According to Google, “Salience shows the importance or centrality of an entity to the entire document text.” In this tool, salience can probably only be evaluated based on what is submitted in the text box, using a score from 0 to 1, with 0 representing low salience and 1 representing high salience, but in any real algorithm, we are guessing that salience is measured as a relationship with multiple metrics including the relationship to the page, to the entire domain and possibly to the larger entity as a whole, if there is one.

Sentiment isn’t defined, but it is generally agreed to be the positivity or negativity associated with a particular concept and in this, Google provides a score from -1.0 which is very negative, to 1.0, which is very positive. The magnitude of this score is described as the strength of the sentiment (probably in the context of the page or potentially on a more granular sentence level,) regardless of the score.
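To make the salience and sentiment scores described above more concrete, here is a minimal Python sketch that ranks entities by salience. It does not call the API: `sample_response` is a hand-written dictionary whose field names follow our understanding of the API’s JSON output (worth checking against Google’s documentation), and every value in it is invented rather than taken from the tool.

```python
# Hand-written sample shaped like an entity-sentiment response; the
# values are invented and are not real output from the tool.
sample_response = {
    "entities": [
        {"name": "marketers", "type": "PERSON",
         "salience": 0.21, "sentiment": {"score": 0.0, "magnitude": 0.0}},
        {"name": "MobileMoxie Toolset", "type": "CONSUMER_GOOD",
         "salience": 0.62, "sentiment": {"score": 0.4, "magnitude": 0.9}},
        {"name": "search results", "type": "OTHER",
         "salience": 0.17, "sentiment": {"score": -0.1, "magnitude": 0.1}},
    ]
}


def summarize_entities(response):
    """Return (name, type, salience, score, magnitude) rows, most salient first."""
    rows = [
        (e["name"], e["type"], e["salience"],
         e["sentiment"]["score"], e["sentiment"]["magnitude"])
        for e in response["entities"]
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, kind, salience, score, magnitude in summarize_entities(sample_response):
    print(f"{name:22} {kind:14} salience={salience:.2f} "
          f"score={score:+.1f} magnitude={magnitude:.1f}")
```

Sorting by salience puts the entity that is most central to the submitted text at the top, which is usually the first thing to check when you want to know what a piece of copy is ‘about’ from the tool’s point of view.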

The next part of the tool is a separate Sentiment Analysis section, which is a bit hard to understand because it has new numbers and scoring, different from the numbers used in the Entities section of the tool. There are three sets of Sentiment and Magnitude scores. They are not labeled, so it is not entirely clear why there are three or what each of the three scores is associated with. Since only one of the Entities warranted a score of anything but 0, it is hard to know where the scores of .3 to .9 are coming from here, but a legend explains that -1 to -0.25 is red, presumably bad, -0.25 [sic] to 0.25 is yellow, presumably neutral, and 0.25 [sic] to 1 is green, presumably positive. Since this is different from the scoring used for Sentiment on the Entities tab, it is a bit hard to interpret. It seems that Google offers more details about Sentiment Analysis Values in separate documentation, but until the feedback from this tool is clearer, it will probably not be too useful for SEO.
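The legend described above can be written as a small Python function. This is a sketch of our reading of the legend, not Google’s code: the function name `sentiment_band` is hypothetical, and because the legend’s ranges share their endpoints, the decision to treat exactly -0.25 and 0.25 as yellow is an assumption.

```python
def sentiment_band(score):
    """Map a sentiment score in [-1.0, 1.0] to the legend's color bands.

    The legend's ranges share their endpoints, so which band owns exactly
    -0.25 and 0.25 is an assumption here: both are treated as yellow.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment scores are expected between -1.0 and 1.0")
    if score < -0.25:
        return "red"
    if score <= 0.25:
        return "yellow"
    return "green"


for value in (-0.8, -0.25, 0.0, 0.3, 0.9):
    print(value, sentiment_band(value))
```

Under this reading, the unlabeled scores of .3 to .9 that the tool shows would all fall in the green band.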

The next tab in this tool is very interesting – it is the Syntax evaluation. It basically breaks the sentences down, and shows how it understands each piece of them as a part of language. Using this in conjunction with the information on the Entity tab will allow you to understand how Google believes searchers are able to interact with Entities on your site.

After that is the shortest, but in my mind, most important information – the Categories tab. This takes whatever you have put into the tool and assigns it a Category, essentially telling you what part of Google’s Knowledge Graph the information that you submitted would be classified as. A full list of the categories that you can be classified as can be found here: https://cloud.google.com/natural-language/docs/categories

Two Parts of an Entity Classification Engine

While the value of these two tools to marketers might be hard to understand, their value and what they represent to Google is huge. We believe that these two tools together make up parts of what made it possible for Google to switch from the old method of indexing to Entity-First Indexing. They are basically both Entity Classification Engines that use the same core, internationally translated entity hierarchy to either show how language and entity classification is done, in the case of the Natural Language API, or show the financial results of entity classification for a business’s marketing plan in international markets, in the case of MarketFinder. It is basically the upstream and downstream impacts of entity classification!

How Marketers Can Start Getting Value from the Tools

The value of these new Google tools for digital marketers is still evolving but here are some steps SEOs can take to start better understanding and using them for thinking about entities in the context of their SEO efforts:

  • Make sure Google is categorizing your domain content correctly. Use the toolset to make sure that Google is classifying the most important pages on your site, like your homepage, as expected, since inaccurate classification could negatively impact your SEO. Google will struggle to display your page in the search results to the right people at the right time if Google has an incorrect and/or incomplete understanding of the page’s content. The MarketFinder tool can be used to determine how Google might be evaluating the domain as a whole, and the Cloud Natural Language API can be used to evaluate content on a page by page or tab by tab basis. If Google is classifying your site in an unexpected way, investigate which keywords on the page might be contributing to this misclassification.
  • Read Google’s Natural Language API documentation about Sentiment Analysis. As described earlier in this article, the Sentiment section in the Natural Language API is not labeled clearly, so it will likely be challenging for most SEOs to use it in its current form. Google has separate documentation with more details about Sentiment Analysis that is worth checking out because it offers a bit more context, but more clarity from Google about Sentiment would be ideal. We’ll be keeping an eye open for documentation updates from Google that may help fill in the gaps.
  • Learn as much as you can about “Entities” in the context of search. Entities can be a tough concept to understand, but we recommend keeping it top-of-mind. As Google moves into a new era that is focused much more on voice and cross-device interaction, entities will grow in importance, and it will be challenging to get the full value out of the Google tools without that foundational knowledge. Here are some great resources that will help you build that knowledge: the previous article in this series about “Entity-First Indexing,” this excellent article by Dave Davies about one of Google’s patents on entity relationships, this great patent breakdown by Bill Slawski, and Google’s official documentation about Analyzing Entities using the Google Natural Language API.
  • Understand alternate theories about Mobile-First Indexing. MobileMoxie recently published a four-part series investigating various changes in search results and other aspects of the Google ecosystem that seem related to the switch to Mobile-First Indexing, but have not been elucidated by Google. Most SEO’s and Google representatives are focusing on tactical changes and evaluations that need to be done on a website, but it is also important to not lose sight of the larger picture, and what Google’s larger, long term goals are, to understand how these changes fit into that mix. This will help you relate entities, entity search and entity indexing to your company’s larger strategy more readily.

Essential Entity Elements – Critical Requirements for Correct Classification of an Entity

Over time, Google will find new things that need to be classified as entities, or old things will need to be re-classified as different kinds of entities. SEO’s will 1) need to know what type of entity that they would want to be classified as, and then 2) need to know what are the critical requirements that Google needs to find to classify something as a specific type of entity.

To do this, SEO’s will need to determine what Google considers to be the essential elements for similar entities that are correctly classified and ranking well in their top relevant searches. The various types of entities that Google recognizes and highlights in Knowledge Graph panels will have unifying elements that will change from entity to entity but will be the same for similar groups of entities or types of content. For instance, local businesses have had the same requirements for a long time, generally abbreviated as NAP – Name, Address and Phone number. This could be built out to include a logo and an image of the business. In other cases, like for a movie, most movie Knowledge Graph entries have a name, cast list, run time, age rating, release date, promo art and a video trailer. If your business is not classified as a particular kind of entity, and you would like it to be, then this will be an important step to take.
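One way to work with these essential elements is to keep a simple checklist per entity type and audit your own listing against it. The Python sketch below uses only the examples from the paragraph above; the `ESSENTIAL_ELEMENTS` dictionary, its field names and the `missing_elements` helper are hypothetical, and this is not a schema published by Google.

```python
# Illustrative element lists taken from the examples in the text; the
# dictionary keys and field names are hypothetical, not a Google schema.
ESSENTIAL_ELEMENTS = {
    "local_business": ["name", "address", "phone"],
    "movie": ["name", "cast", "run_time", "age_rating",
              "release_date", "promo_art", "trailer"],
}


def missing_elements(entity_type, listing):
    """List the essential elements that are absent or empty in a listing."""
    required = ESSENTIAL_ELEMENTS[entity_type]
    return [field for field in required if not listing.get(field)]


# A made-up listing with an empty phone number.
listing = {"name": "Example Bakery", "address": "123 Main St", "phone": ""}
print(missing_elements("local_business", listing))
```

The lists themselves would need to come from studying similar entities that are already classified correctly and ranking well, as described above.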

Conclusion

In the long run, this model could be difficult for publishers and companies that are not original content creators, but this is probably by design. Websites that use an ‘aggregation and monetization’ model, or that survive primarily on ad revenue, will struggle more; this is Google’s model, and they don’t appreciate the competition, and also, it hurts their users’ experience when they search! Google wants to raise the bar for quality content and limit the impact that low quality contributors have on the search ecosystem. By focusing more on entities, they also focus more on original, authoritative content, so this is easily a net-positive result for them. In the short term, it could even decrease the amount of urgency around Google’s effort to provide more safety and security for searchers, and to minimize the negative impact of ads, popups, malware and other nefarious online risks.

While many SEO’s, designers and developers will see moves in this direction as a huge threat, small business owners and users will probably see it as a huge benefit. Perhaps it will make the barrier to entry on the web high enough that nefarious actors will look elsewhere for spam and easy-money opportunities, and the web will become a more reliable, high-quality experience, on all devices. We can only hope. In the meantime, don’t get caught up on old SEO techniques and miss what is at the top of all of your actual search results – Knowledge Graph and entities.

This is the second article in a five part series about entities and language, and their relationship to the change to Mobile-First Indexing – what we are calling Entity-First Indexing. This article focused on the tools that Google used to classify the web, and reindex everything in an entity hierarchy. The next three articles will focus on our international research, and how the various translation APIs impact search results and Entity Understanding around the world, and how personal settings impact Google’s Entity Understanding and search results on an individual basis.

 

The Entity & Language Series: Entity-First Indexing with Mobile-First Crawling (1 of 5)

NOTE: Please use these links to catch up on the previous posts in the series: Article 1 / Article 2 / Article 3 / Article 4 / Article 5

By: Cindy Krum 

Mobile-First Indexing has been getting a lot of attention recently, but in my mind, most of it misses the point. Talking about Mobile-First Indexing only in terms of the different user-agent seems like a gross oversimplification. It is very unlikely that Google would need more than two years just to change the user-agent and viewport of the crawler – They have had both a desktop and mobile crawler since 2013 (or earlier if you count the WAP crawler), and Google has changed the user-agent and view-port of the primary crawler before, multiple times with minimal fanfare. Sure, Google is now using a different crawler for finding content to index, but my best SEO instincts say that Mobile-First Indexing is about much more than the different primary user-agent.

From what I can see, Google’s change to Mobile-First Indexing is much more about an entity classification and translation than it is about a different user-agent and viewport size for the bot. I believe this so much that I have started calling Mobile-First Indexing ‘Entity-First Indexing’. It is much more accurate and descriptive of the challenges and changes that SEO’s are about to face with Mobile-First/Entity-First Indexing. This article will focus on what the change to Entity-First Indexing means, plain-sight signals that ‘Entity-First Indexing’ is already underway, and how the change will impact SEO in the future.

This is the first in an article series that will dive much deeper into how Google understands languages and entities, how they use them in indexing and algorithms and why that is important for SEO. It will review what entities are and how they interact with language and keywords. Then it will speculate on how organizing their index based on entities might benefit Google, how they might have accomplished it during the switch to Mobile-First Indexing and how device context might be used in the future to help surface the right content within an entity. It wraps up with a discussion of what can go wrong with indexing based on entities, and what Google has said on the topic of Mobile-First Indexing.

The next article in this series will focus on the tools that Google used to break down the languages of the web and classify all the sites into entities, and then subsequent articles will focus on research that we completed that show how entity indexing works in different linguistic contexts, based on the different Google APIs that are used, and how those impact Google’s Entity Understanding.  Finally, the last article in the series will focus on how individual phone settings and search conditions like GPS location can impact query results, even when the query does not have a local intent, like a query for a local business might. 

Entity Understanding & Understanding Entities

Historically, Google’s reliance on links and keywords as the primary means of surfacing content in a search has eschewed the idea that the world had some larger order, hierarchy or organizing principle than language, but it does — it has entities! Entities are ideas or concepts that are universal and exist outside of language. As Dave Davies describes, in an excellent article about one of Google’s patents on entity relationships, “an entity is not simply a person, place or thing but also its characteristics. These characteristics are connected by relationships. If you read a [Google] patent, the entities are referred to as ‘nodes,’ and the relationships as ‘edges.’”

With that in mind, Entity Understanding is a process by which Google strives to understand and organize the relationships between different ‘nodes’ and ‘edges’ – or more readily, different thoughts, concepts, ideas and things and their modifying descriptors. They organize them into a hierarchy of relationships that is roughly what we all know as the Google Knowledge Graph. It is somewhat related to Semantic Understanding, but Semantic Understanding is based on language, and this is one step preceding the language, to be more conceptual, and universal; it is language agnostic.
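The ‘nodes’ and ‘edges’ vocabulary is easier to picture with a tiny example. The Python sketch below stores relationships as (subject, relationship, object) triples and looks up everything connected to one node. It is a hand-built toy, not a representation of the actual Knowledge Graph; the relationship names and both helper functions are hypothetical.

```python
from collections import defaultdict

# Edges as (subject, relationship, object) triples: a tiny hand-built
# sample, with relationship names invented for illustration.
EDGES = [
    ("Fraggle Rock", "is_a", "TV show"),
    ("Fraggle Rock", "created_by", "Jim Henson"),
    ("Jim Henson", "is_a", "Person"),
]


def build_graph(edges):
    """Group outgoing edges by their subject node."""
    graph = defaultdict(list)
    for subject, relationship, obj in edges:
        graph[subject].append((relationship, obj))
    return graph


def describe(graph, node):
    """Render each outgoing edge of a node as readable text."""
    return [f"{node} --{relationship}--> {obj}"
            for relationship, obj in graph.get(node, [])]


graph = build_graph(EDGES)
for line in describe(graph, "Fraggle Rock"):
    print(line)
```

Notice that nothing in the structure depends on the language of the labels; the labels could be swapped for any language, or for opaque IDs, without changing the relationships.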

Entities can be described by keywords, but can also be described by pictures, sounds, smells, feelings and concepts; (Think about the sound of a train station – it brings up a somewhat universal concept for anyone who might hear it, without needing a keyword.) A unified index that is based on entity concepts, eliminates the need for Google to sort through the immense morass of changing languages and keywords in all the languages in the world; instead, they can align their index based on these unifying concepts (entities), and then stem out from there in different languages as necessary.

The value of entities can be a bit hard to understand, but from the perspective of efficiency in search, the concept can’t be overstated. The internet has altered the way many of us think about knowledge, to make it seem like knowledge might be infinite, imperceivable and unending, but from the pragmatic and practical perspective of a search engine, this is not exactly true. While the potential of knowledge MAY be infinite, the number of ideas that we can describe, or that are regularly searched or discussed is somewhat limited. In fact, it used to fit in an encyclopedia, or at least into a library. For many years in history, libraries indexed all of the knowledge that they had available, and most carried more information than any one human could peruse in a lifetime. It is with this limitation that we must approach trying to understand ‘entities’ from the perspective of a search engine.

From a search engine perspective, it is important to understand that domains can be entities, but often have larger entities like ‘brands’ above them in an entity hierarchy. Indexing based on entities is what will allow Google to group all of a brand’s international websites as one entity, and switch in the appropriate one for the searcher, based on their individual country and language, as John Mueller describes in his recent Reddit AMA:

“You don’t need rel-alternate-hreflang. However, it can be really useful on international websites, especially where you have multiple countries with the same language. It doesn’t change rankings, but helps to get the “right” URL swapped in for the user. If it’s just a matter of multiple languages, we can often guess the language of the query and the better-fitting pages within your site. Eg, if you search for “blue shoes” we’ll take the English page (it’s pretty obvious), and for “blaue schuhe” we can get the German one. However, if someone searches for your brand, then the language of the query isn’t quite clear. Similarly, if you have pages in the same language for different countries, then hreflang can help us there.”

Notice how he talks about the brand as a whole, despite the fact that there might be different brand ccTLD domains or urls in the hreflang. Before Entity-First Indexing, the right international version of the website would have been more determined by algorithmic factors including links, because the websites were not grouped together under the brand and evaluated together as an entity. This concept is illustrated below in the first inverted pyramid. Historically, getting the correct ccTLD version of a site to rank in different countries was a constant struggle, (even with Search Console settings to help,) that this will hopefully solve.
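The ‘swapping in’ behavior can be sketched as a simple fallback lookup over all of a brand’s versions. This Python example is only our guess at a reasonable fallback order (exact language and country, then same language, then x-default), not Google’s actual logic; the `BRAND_VERSIONS` table, its example.com URLs and the `pick_version` function are all hypothetical.

```python
# Hypothetical brand entity with one URL per hreflang value.
BRAND_VERSIONS = {
    "en-us": "https://example.com/",
    "en-gb": "https://example.co.uk/",
    "de-de": "https://example.de/",
    "x-default": "https://example.com/global/",
}


def pick_version(versions, language, country=None):
    """Choose a URL: exact language-country match, then same language, then x-default."""
    language = language.lower()
    if country:
        exact = f"{language}-{country.lower()}"
        if exact in versions:
            return versions[exact]
    # Fall back to the first listed version in the same language.
    for code, url in versions.items():
        if code.split("-")[0] == language:
            return url
    return versions.get("x-default")


print(pick_version(BRAND_VERSIONS, "en", "GB"))
print(pick_version(BRAND_VERSIONS, "de", "AT"))
print(pick_version(BRAND_VERSIONS, "fr", "FR"))
```

The point of the sketch is that the lookup starts from the brand as a whole and only then selects a URL, rather than each ccTLD competing separately on its own links.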

For more topical queries that are less focused on a brand, the entity relationships may be looser, and include top resources on the topic, like blogs, books and accredited experts. These groupings could focus on domains, but depending on the strength of engagement with other content, such as a popular podcast on a niche topic, the domain could be less prominently displayed or expressed in the entity ranking, illustrated below.

The Relationship Between Entities, Languages & Keywords

Remember, when SEO and search are all about keywords, it is a language-specific task. Entities are different because they are universal concepts that keywords in any language can only describe. This means that entity-based search is more efficient, because the search engine can query more content faster (all languages at once), to find the best information. The algorithm can cut through the noise and nuance of language, spelling and keywords, and use entities and context to surface the appropriate type of response for the specific query. Though entities are language-agnostic, language is critical for informing Google’s Entity Understanding. It is this process that probably made the transition to Mobile-First Indexing so slow; the entire web had to be classified and re-indexed as entities, which is no small task.

NOTE: While many SEO’s agree that the hreflang protocol was established to help train Google’s machine learning algorithms to build and refine their translation API’s, we believe it was ALSO used, more holistically, to develop its Entity and Contextual understanding of the web, because it allowed Google to quickly compare the same textual content, in the same context, across many languages all at once.

(Did anyone wonder why so many of the questions that John Mueller responded to in the Reddit AMA were about hreflang? Probably because it is so important for Google’s ability to index domains based on entities, then switch in the correct version of the content based on language and location signals. Together with Schema, hreflang tagging is like Google’s Rosetta Stone for the internet and Entity Understanding. This is also why Mobile-First Indexing was rolled out globally, instead of US-first and then one country at a time, as Caffeine, the last major change to indexing, was. By design, Entity-First Indexing can’t be rolled out one country at a time.)
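For anyone who has not worked with the markup, hreflang annotations are just a set of link tags that list every alternate version of a page. The Python sketch below builds them for a hypothetical brand. The example.com URLs and the hreflang_links helper are invented for illustration, and HTML link tags are only one of the ways to supply the annotations (they can also go in XML sitemaps or HTTP headers).

```python
from html import escape


def hreflang_links(alternates, x_default=None):
    """Build <link rel="alternate" hreflang="..."> tags for one page.

    alternates: dict mapping a language or language-region code to a URL.
    x_default: optional fallback URL for searchers who match none of the codes.
    """
    tags = []
    # Sort by code so the output is stable from run to run.
    for code, url in sorted(alternates.items()):
        tags.append(
            '<link rel="alternate" hreflang="%s" href="%s" />'
            % (escape(code, quote=True), escape(url, quote=True))
        )
    if x_default is not None:
        tags.append(
            '<link rel="alternate" hreflang="x-default" href="%s" />'
            % escape(x_default, quote=True)
        )
    return tags


# Hypothetical brand: the same language (English) in two countries, plus German.
example = hreflang_links(
    {
        "en-us": "https://example.com/blue-shoes",
        "en-gb": "https://example.co.uk/blue-shoes",
        "de-de": "https://example.de/blaue-schuhe",
    },
    x_default="https://example.com/blue-shoes",
)
print("\n".join(example))
```

Every alternate version of the page would carry the same full set of tags, which is what lets the different ccTLD pages be read as one group.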

If you think about it, language is fluid; it is changing every day, as new slang is added and new words come in and out of vogue. This is even seen in the nuances of pronunciation and spelling, and it happens not only in English but in every language. It even happens in subversive ways, with images and icons, (as any teen who has sent dirty text messages with a standard set of emojis can tell you.) But rapid changes to language can also be empowering and political, as you can see in the tweet below, about the #MeToo movement in China, which has been suppressed by certain groups in mainstream communication.

Google does care about communication, and has actually enabled even more emoji to work in Chrome recently, potentially to help enable empowering political movements, but also simply because their focus on PWAs means that more and more chat and communication apps will be leveraging browser code for core functionality. This shift to enable emojis could also hint at how much content Google expects to index as many chat apps and social networks transition to crawlable PWAs, instead of keeping content locked away in native apps, where it is much harder to crawl and index; the level of public communication in crawler-accessible browsers could grow exponentially.

What Does It Mean to Index on Entities & Why Would Google Do it?

To be clear, entity understanding has existed at Google for a long time, but it has not been core to indexing; It has been a modifier. I believe that the shift to Mobile-First Indexing is a reorganization of the index – based on entity understanding; roughly, a shift from organizing the index based on the Link Graph to organizing it based on the Knowledge Graph. Continuing to organize and surface content based on the Link Graph is just not scalable for Google’s long-term understanding of information and the web and it is definitely questionable in terms of the development of AI and multi-dimensional search responses that go beyond the browser.

For years Google has been trying to distance themselves from the false economy that they created, based on the relative value of links from one page to another, but they have not been able to do it because it was core to the system – it was part of how content was discovered, prioritized and organized. As Dave Davies says, “The idea that we can push our rankings forward through entity associations, and not just links, is incredibly powerful and versatile. Links have tried to serve this function and have done a great job, but there are a LOT of advantages for Google to move toward the entity model for weighting as well as a variety of other internal needs.” While neither Dave nor I are recommending you abandon linking as a strategy, we all know that it is something Google has been actively advocating for years.

Constantly crawling and indexing content based on something as easy to manipulate as the Link Graph and as fluid as language is hard, resource intensive, and inefficient for Google; and it would only grow more inefficient over time, as the amount of information on the web continues to grow. It is also limiting in terms of machine learning and artificial intelligence, because it allows the country and language-specific algorithms to evolve separately, which John Mueller specifically said in his Reddit AMA that they don’t want to do. Separate algorithms would limit potential growth of Google’s AI and ensure that larger, more populous country and language combinations remained much more advanced, while other smaller groups continued to lag and be ripe for abuse by spammers. Finally, and most crucially for Google’s long-term goals, Google would not be able to benefit from the multiplier effect that ‘aggregation of ALL the information’ could have for the volume of machine learning and artificial intelligence training data that could be processed by their systems, if only they could get around the problem of language … And this is why entities are so powerful!

Just a Guess – How I Imagine Entity Indexing Works

With all that in mind, here is my vision of how Mobile-First Indexing works, or will eventually work, with entity indexing. Possible problems you may have experienced related to the new indexing process (which may have started around March 7th) are noted in parentheses next to the proposed step that I believe may be causing the problem:

  1. Content is crawled for Mobile-First Indexing (Most of the content has already been crawled and re-indexed. You have been notified in Search Console, but Mobile-First Indexing probably began at least 3 months before the notification so that the New Search Console could begin building up the data and comparing it to old Search Console data to validate it before the notification was sent.)
    • The User-Agent is: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    • The Viewport Size is: 410x730px (or probably 411x731px), with an emulated DPR (devicePixelRatio) of 2.625
      Please Note – This may be variable and will probably change as new phones come out. This is why building in Responsive Design is the safest bet.
  2. Entire domains are re-indexed in the new Mobile-First Indexing process. This happened one domain at a time, rather than one page at a time. The bot only follows links that are accessible to it as a smartphone, so only content and code that is present for the mobile user-agent is re-indexed with the Mobile-First Indexing process, (whatever that may be). It may still be crawled periodically by the desktop crawler. This detail has not been made clear in the Google communication.
    • Old web content and desktop-only content that was historically in the index, but can’t be found by the mobile crawler, will remain in Google’s index, but will not receive any value that is associated with being evaluated in the Mobile-First Indexing process.
    • In addition to being evaluated for the content and potentially for mobile rendering, domains are evaluated for entity understanding using on-page text, metadata, Schema and other signals.
      • The domain itself is considered an entity. It has relationships to other domain and non-domain entities. Pages on the domain are indexed along with the domain-entity, rather than the larger entity concept.  (Entity Clustering, Re-indexing XML Sitemaps)
      • Links from the Link Graph are aggregated and attributed to all alternate versions of the page equally, including mobile and desktop versions of the page as well as hreflang translated versions of the page. The same is true of Schema markup – especially if it is on the English or X-default version of the page. Google still uses local information as a ranking signal, but the signals may change in relationship to the larger entity.
        • Links continue to impact rankings, though they are less critical for indexing. The current Link Graph is probably noted so that the impact of aggregation can be rolled out slowly, over time in the algorithmic rankings. We can assume that links will remain part of the algorithm for a long time, or potentially even forever, until Google has vetted and tested many other suitable replacements. The most likely replacement will probably be some type of Real User Metric (RUM) similar to what we are seeing Google do with Page Speed as Tom Anthony brilliantly describes, but this may be some time off.
      • Pages (URLs on the domain) become interchangeable entities of the domain, which can be switched in as necessary depending on the search language, query, context and potentially, the physical location of the searcher. International versions of a page now share most or all ranking signals. (Weird international results)
      • Google’s understanding of structural content like subdomains and their association with specific countries as well as XML sitemap locations may be reset, and may need to be re-established in New Search Console. (Previously properly indexed XML sitemap files re-appearing in SERPs)
  3. Google’s newly organized index is based on entity hierarchy grouped roughly according to the Knowledge Graph, instead of however it has been organized historically (we assume based somehow on the Link Graph). This provides an efficiency benefit for Google but is not intended to *directly* impact rankings – indexing and ranking are two different parts of the process for a search engine. It may, however, make it easier for Google to de-emphasize links as part of the algorithm in the future. Remember, Knowledge Graph entities, Topic Carousels, App Packs, Map Packs, Utilities and other elements that so often surface at the top of SERPs now, do so without any links at all. The indexing establishes associations through an entity’s relationships to other entity concepts, and these associations can be loose or strong. These relationships are informed by their relative placement in the Knowledge Graph (proximity), but also probably fed by the historical information from the Link Graph. (FYI: Danny from Google specified that it is not a new index or a different index; it is the same index. We are simply speculating that this one index has been reorganized.)
  4. The entity hierarchy includes both domain and non-domain entities. Google will use their machine learning to inform, build-out and fine-tune the entity understanding of all entities over time
    • Non-domained Entities: Entities without a domain, like ideas, concepts or things in the Knowledge Graph are given a Google URL that represents their location in the index (or Knowledge Graph).
      • Indexed content like apps, maps, videos, audio and personal content deep linked on a personal phone also fall into this category. (EX: app deep links or system deep links, like ones for contacts – The contacts utility is essentially just an app.) Remember that more and more content that people eagerly consume is not ON websites, even if it is purchased from websites – though this may change with the rise of PWAs.
      • These non-domain entities are indexed along with existing websites in the hierarchy.
      • Temporary Google URLs are given to non-domain entities. The URL is not necessarily meant to build up traditional ranking signals, but instead, the URL is simply an encoded locator, so that the item can be found in the index. Once un-encoded, a unique ID allows the entity to be related to other content in the index, and surfaced in a search result whenever the top-level entity is the most appropriate result.
Follow-Up Discussion from Conferences Last Year: It seems like the idea that URL’s are optional might be an overstatement. Google still needs URLs to index content; they just don’t have to be unique, optimized, static or on a domain that an SEO optimizes. Google is creating Dynamic Link URLs for loads of types of content – especially when the content might qualify as an entity – and just putting them on different Google short links. If you have certain kinds of content that you want indexed but it doesn’t have a URL, Google will essentially just give it one. Examples include locations such as businesses, but also locations that don’t have specific addresses like cities, countries and regions. They are also giving URLs to Google Actions/Assistant Apps, and information that appears to be indexed as part of an instant app, such as movies in the Google Play Movies & TV app. Types of music, bands, musicians, musical instruments, actors, painters, cartoon characters – really anything that might have an entry in an incredibly comprehensive encyclopedia is getting a Google Short link.
      • Domain Entities: These are simply websites, which have historically been Google’s crawling and indexing focus. They are entities that already have their own domains, and don’t need to be given temporary URLs from Google.
        • Entities can be parts of other entities, so just because a website is a domain entity on its own, that does not preclude it from being a part of a larger concept, like the Florence & the Machine official website URL which is included as part of the official Google entity.
        • Larger entities like ‘brands’ may be related to domains but sit above the domains in the entity hierarchy. International brands could have many domains, and so the international brand is an entity, and the domains that are a part of it are also entities. Similarly, there could be concepts that are entities, that are smaller than domains, lower in the hierarchy.

  5. Search rankings and entity relationships will be fed, reinforced or put up for re-evaluation using automated machine learning processes that are based on the user-behavior and engagement with the SERPs over time, especially when Google perceives a potential gap in their understanding.

    • At launch, the big entity concepts will be strong for head-term searches, but the long-tail results will be weaker and Google can fall back on traditional web SERPs and the content that has yet to be migrated to Mobile-First Indexing whenever they want. Google will use machine learning and AI to localize and improve more niche results. (Weird long-tail results, Unrecognized entities)
    • In the short term, newly perceived relationships will only lead to a temporary change in rankings, but in the long term, with enough signals, sustained changes in entity relationships could trigger a re-crawl of the domain so that the content can be re-evaluated by the Mobile-First Indexing process for additional Entity Understanding and categorization.

  6. New types of assets that can rank will be indexed based on entity understanding, rather than the presence or absence of a website.
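To make the hierarchy in steps 2 and 4 a little more concrete, here is a minimal Python sketch of how I picture it: a brand entity sits above domain entities, and the pages on those domains can be switched in based on the searcher's language and country. Everything in it is hypothetical (the Entity class, the pick_url function, the example.com URLs and especially the fallback order), and it is only meant to illustrate the idea, not to describe how Google actually stores or selects anything.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Entity:
    """A node in a hypothetical entity hierarchy (brand, domain or concept)."""
    name: str
    kind: str  # e.g. "brand", "domain", "concept"
    # repr=False avoids infinite recursion when printing parent/child cycles.
    parent: Optional["Entity"] = field(default=None, repr=False)
    children: List["Entity"] = field(default_factory=list)
    # (language, country) -> URL; in this sketch only domain entities hold pages.
    pages: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_child(self, child: "Entity") -> "Entity":
        child.parent = self
        self.children.append(child)
        return child


def pick_url(brand: Entity, language: str, country: str) -> Optional[str]:
    """Choose a page under a brand entity for a given searcher.

    Assumed preference order (not documented behavior): an exact
    language + country match first, then the same language in any country.
    """
    same_language = None
    for domain in brand.children:
        for (lang, ctry), url in domain.pages.items():
            if lang == language and ctry == country:
                return url
            if lang == language and same_language is None:
                same_language = url
    return same_language


# Hypothetical international brand with two ccTLD domain entities.
brand = Entity("Example Shoes", "brand")
us_site = brand.add_child(Entity("example.com", "domain"))
us_site.pages[("en", "US")] = "https://example.com/blue-shoes"
de_site = brand.add_child(Entity("example.de", "domain"))
de_site.pages[("de", "DE")] = "https://example.de/blaue-schuhe"

print(pick_url(brand, "de", "DE"))  # exact match on the German domain
print(pick_url(brand, "en", "GB"))  # no en-GB page, falls back to English
```

The point of the sketch is that the query resolves to the brand first, and the specific URL is only chosen afterwards from everything grouped under it.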

[Note from the author: I am not a systems architect, database manager, sys-admin or even a developer. I am just a lowly SEO trying to make sense of what the smart people at Google do. Please forgive me for any poorly worded technical descriptions or missteps, and let me know if you have corrections or alternate theories. I would love to hear them!]

Does the Crawler Render Now or Later?

The other major change that might be part of the Mobile-First Indexing process is that indexing and ranking now seem less tightly tied to rendering.  This is surprising, since Google has historically focused so much on mobile UX as a dimension of feedback to webmasters. But feedback has also always been in the context of Google’s PageSpeed Insights tool, which as Tom Anthony describes, is now fed by Real User Metrics (RUM) rather than data that it synthesizes during an on-demand page render, as the tool previously did.

Most SEO’s have been focused on how the change to Mobile-First Indexing will impact crawling of their content, which is important because it happens before indexing. Whatever is not crawled, is not indexed, or at least that is how it worked before. But if the Mobile-First Indexing process has changed something about when and how the bot renders the page, this could be substantial. Is it possible that once Google knows about a domain, it is just waiting on RUM rendering data to be collected and compiled from real-user rendering sources for some of the data?
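If you want a rough look at what your server returns to the smartphone user-agent quoted in the crawl step earlier, a small Python sketch like the one below can help. The function names are my own, the user-agent string may change as Google updates its crawler, and this only fetches raw HTML: it does not render JavaScript, and any server that verifies Googlebot by reverse DNS will not treat the request as the real crawler.

```python
import urllib.request

# User-agent string as quoted earlier in this article; Google may change it.
SMARTPHONE_GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def build_mobile_request(url):
    """Create a request that presents the smartphone crawler user-agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": SMARTPHONE_GOOGLEBOT_UA}
    )


def fetch_as_mobile(url, timeout=10):
    """Return the raw HTML a server sends back for that user-agent."""
    with urllib.request.urlopen(build_mobile_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example (needs network access, so it is left commented out):
# html = fetch_as_mobile("https://example.com/")
# print(len(html))
```

Comparing that output with what a desktop user-agent receives is a quick way to spot links or content that the mobile crawler would never see.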

This is all still very unclear, but some SEO’s have reported that content that was previously penalized because of interstitials is now ranking again, something that was previously not possible. John Mueller also recently specified that Google could index CSS grid layouts even though Google’s rendering engine, Chrome 41, does not support them. This does not seem to be a one-off thing either – where Google used to be limited to indexing what it could render without changing tabs, now Google says it can index everything on all tabs, as long as no on-click events are required to fetch content from the server. In potentially related news, John also says that parameters no longer hinder URL rankings or need to be managed in Search Console. This is something that Google has been saying for a while, but so far it has never really been 100% true. In a recent Google Hangout, it was explained that parameters are now just considered signals for crawling, rather than rules; it is possible that they signal Google to use a different type of rendering engine after the content is indexed. This is something that we would love for John to expand on in future discussions.

Rendering is the most time and resource-intensive part of crawling, but recently, Google has not seemed worried about developers building their progressive web apps (PWAs) as single-page apps (SPAs). If unique URLs on a domain are just attributed to the domain entity anyway (or if links are less important for indexing overall), perhaps the entity as a whole can be rendered and evaluated later, with crawlers looking for deep links, long parameterized URLs, JavaScript requests for content from the server, or regular web URLs from internal links. If rendering doesn’t matter, or different bots can crawl the entity as needed, maybe Google will just lift whatever text it can, and try again with different bots later.

What Can Go Wrong When You Index on Entities?

As noted above, many SEO’s have noticed weird anomalies in the SERPs since the major update in March. Many of these anomalies seem much more related to indexing than ranking – changes in how an entire query or domain is perceived, strong domain-clustering, changes to AMP, Answer Boxes and Knowledge Graph inclusions, changes in schema inclusions and problems with local and international content and sitemaps. My supposition here is that some content, signals and indexing instructions may have been lost during the Entity-First Indexing process, but there are other things that can go wrong too.

From what we can tell, Google is still doing a great job responding to head-term queries, surfacing Knowledge Graph Entities like they have been for awhile. The problems only seem to come in for long-tail queries, where most SEOs focus. This is where Google’s Entity Understanding may be more vague or the relationships between different entities may be more complex.

The switch to Entity-First Indexing will certainly create instances where Google misunderstands context or makes assumptions that are wrong about when and where something is relevant to a searcher. Hopefully, this all gets sorted out quickly or rolled back until it is fixed. The fact that Google has announced that they will stop showing Google Instant results, where they used to include the keyword level entity disambiguation, may be a sign that they are worried it would expose too much of the inner workings of the system, at least in the short term. They do still appear to include simple definitions and occasionally a link to a Wikipedia result in the instant results, but that is it for now. It is interesting, though, that the old style of Google Instant results does still appear to be supported in the Google Assistant App, as shown below, but this could be temporary:

It is important to understand that Google’s Entity Understanding appears to be keyed off of the English definitions of words in most cases, so this means that there will be instances when the English concept of something is broken compared to the rest of the world’s concept of the same thing, like with pharmacies, as described in Article 4. Other examples might be the US reversal of the sports games ‘soccer’ and ‘football’ or disambiguation of the word ‘cricket’ where it is a popular sport instead of just a chirping bug – both quite strong and widely understood concepts that are regionally very different. In these cases, it is hard to know what to do, other than find a way to let Google know that they have made a mistake. 

Is Now Really the Time for Entities?

The biggest and most jarring change that has happened since the March update was when Google temporarily replaced the normal response to queries about the time with a single-entry answer, as shown below on the right.

This type of result only lasted a few days, and you can see why in the image below – Google was over-classifying too many queries as ‘time queries’ and this was causing problems; a query for a brand of scotch was being misunderstood as a time query. Google tried to perceive the intent of the query, but failed miserably, possibly because there were not enough entities included in the Knowledge Graph or Google’s index, possibly because they were not taking enough context into account or, most likely, a bit of both. This will be a big risk in the early days of Entity-First Indexing. For brands, missing classification or mis-classification is the biggest risk. I have been told that Time Magazine and the New York Times experienced similar problems during this test.

 

Context is King

With all this change, it is important to remember that Google’s mission is not limited to surfacing information that is available on a user-facing domain. Google does not consider itself a utility whose only job is to surface website content, and you shouldn’t either! Surfacing content on the web and surfacing websites are different. Google’s goal is to surface the most useful information to the searcher, and sometimes that will depend on the context that they are searching in. Google wants to serve their users, and the best information for their users may be a video, a song, a TV show, a deep link in an app, a web utility, or an answer from the Knowledge Graph.

Context allows Google to disambiguate multiple versions of a single entity, to know which one is the most relevant to the user at the time of their search. To better understand a complex entity and how its indexing might work, let’s look at the example of Monty Python. Among other things, Monty Python is in fact a domain, but it is also the name of a comedy group, the name of a series of comedy skits and compilations on video, a YouTube Channel, and part of the name of multiple albums of recorded comedy. When someone searches for the keyword ‘Monty Python’ how could Google know which one of those things they are looking for? They really couldn’t unless they knew more about the context of the search. If the user is searching on a computer, they could want any of those things, but if they are searching in a car or on a Google Home device, or something else without a screen, they are most likely looking for something with just audio – not videos. If they are searching on a computer or a phone, there is a chance they are looking for information, but if they are searching on a TV, the likelihood that they want to read information is low; they probably want to just watch a video.
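As a toy illustration of that reasoning, the Python sketch below picks which asset of an entity to surface based only on the device the search starts from. The asset labels, the device names and the preference order are all assumptions I made up for the example; the real signals are certainly richer than a lookup table.

```python
# Hypothetical assets that all belong to one entity, keyed by format.
MONTY_PYTHON_ASSETS = {
    "audio": "comedy album",
    "video": "sketch compilation",
    "text": "official website",
}

# Assumed format preferences per device, most preferred first.
DEVICE_PREFERENCES = {
    "smart_speaker": ["audio"],
    "car": ["audio"],
    "tv": ["video", "audio"],
    "phone": ["text", "video", "audio"],
    "computer": ["text", "video", "audio"],
}


def pick_asset(assets, device):
    """Return the first asset whose format suits the searching device."""
    # Unknown devices fall back to plain text results in this sketch.
    for fmt in DEVICE_PREFERENCES.get(device, ["text"]):
        if fmt in assets:
            return assets[fmt]
    return None


for device in ("smart_speaker", "tv", "computer"):
    print(device, "->", pick_asset(MONTY_PYTHON_ASSETS, device))
```

The same one-word query returns a different asset on each device, without the searcher ever adding ‘watch’ or ‘listen’ to it.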

Contextual signals are particularly important for delivering a great experience to mobile users. Google has been open about this, such as in this “Think With Google” article published in 2017 about advertising to mobile users, where Google says, “When we reach someone on mobile…there are loads of context signals, like time and location…To really break through with mobile users, it’s important to take full advantage of their context.”

When we index based on only keywords – keywords like ‘watch’ ‘listen’ ‘video’ ‘audio’ ‘play’ ‘clip’ ‘episode’ are necessary. When you index based on entity, the understanding of the search query is more natural, based on context. With context instead of additional keywords, queries become more simple, basic and natural. Indexing on entities allows Google to surface the right content based not only on the keyword query but also the context of the device that they start the search from, like a TV, a Google Home, a phone, a web-enabled car system or something else! We get closer to natural language.

The problem that SEO’s have is that we have focused on the broadest and most context-free devices first – our computers. This makes it hard to conceive of how strong a signal context could be in determining which part of an entity is most relevant in a particular search. But start to think about all the new devices that are getting Google Assistant, and how Google will work on those devices.

Someone searching on a Google Home or Android Auto might not be able to use a website at all. They will be much more interested in audio. Someone searching on a TV is also probably more interested in videos and apps than they are in websites. SEO’s who limit their understanding of their job to optimizing website experiences will limit their success. Just because Google crawls and indexes the web, does not mean that they are limited to websites, and SEO’s should not be either.

Discussion with Google

This change to the time queries has since been rolled back, but when it happened, I tweeted that this was a clear indication of Mobile-First Indexing. Danny Sullivan, a long-time friend, search personality, SEO expert, and Google’s Search Liaison, explained that it had nothing to do with Mobile-First Indexing, which I found confusing. I realize now that my tweet didn’t convey my more robust belief that Mobile-First Indexing is all about Entity Understanding, but suffice it to say that Google officially conceives of these two concepts as separate. Perhaps they are two separate projects, but I find it impossible to believe that they are totally unrelated. To me, it seems self-evident that the goal of any change towards Mobile-First [anything], especially if it was meant to support voice-search, would improve Entity Understanding. But in his response, Danny seemed to assert that Mobile-First Indexing has absolutely nothing to do with Entity Understanding.

Danny gave an analogy that I love, about Mobile-First Indexing being like removing old paper books from a library and replacing them with the same thing in an e-book format. This analogy was provided to prove the point that there is only one index, not a separate mobile and a desktop index which Danny emphasized as a very important point. This seems perfectly aligned to illustrate the efficiency of entity-based indexing – I love it! An eBook would not need to keep multiple paper copies of translated versions of the text, but could potentially be translated on the fly – the same way we describe language agnostic entity understanding here and in Article 4 of the Mobile-First Indexing series. It is overwhelmingly disappointing that Google is not willing to talk about this part of the change to Mobile-First Indexing, and that Danny is willing to give the analogy but not willing to discuss the full depth of the explanation at this point.

The only problem is that the library analogy is at odds with the explanation that is being given by John Mueller from the Webmaster team, that it is just about a change to the user-agent. If the only thing that changes is the user-agent, how do we get an eBook from the same crawler that previously only gave us paper books? Unfortunately, after the library analogy, the conversation got derailed (as it has before with other Google representatives) to focus on the number of indexes Google was using to organize content. The ‘one index vs. multiple indexes’ point can be a bit confusing, because some Google representatives repeatedly explained and implied that there was an old ‘desktop-oriented’ index (that we have been using historically) and a new ‘Mobile-First’ index that content was migrating to.

There is a lot to be confused about, starting with the change in messaging from when Google was telling us about sites “being moved into the Mobile First Index one domain at a time;” to the “same index, different crawler,” line that is now the official, go-to talking point on this topic, for Google representatives. The position allows Google to say that desktop content will be maintained even if it is not accessible by the mobile crawler, which makes the discussion of the new crawler…almost irrelevant! If desktop content will see no negative effect from the change, why bother making any accommodations for it at all? But ultimately, this ‘one index’ mantra is a nuanced point that really doesn’t matter and I think it is a bit of a red herring. The same index can have different partitions, formatting or organization, virtual partitions or any number of designations that make it function like one or two indexes as necessary. It is also true that one index can exist and simply be re-organized one domain at a time, without duplication. The net result for users and SEO’s does not change.

Conclusion

Google has made a big investment in voice search and Google Assistant, and recently doubled down on AI by promoting two new heads of search who both have extensive backgrounds in machine learning and artificial intelligence. All of these things should be taken as a sign of change in the lives and job descriptions of SEO’s. As more and more devices become web-enabled, and fewer and fewer of the best results for users are websites, the context for search is getting much broader.

New strategies will include adding audio versions of text-only content, adding video and voice-interactive versions of content, and getting all these assets indexed and associated correctly with the main entity. They will also include optimizing non-website entities, like Knowledge Graph relationships, to ensure that main entities are correctly correlated with the domain and all of its assets. They will include monitoring the translation and entity understanding, to make sure that all the interactions are happening correctly for users around the world, and they will include monitoring feedback like reviews, which Google will be using more and more to automate the sorting and filtering of content for voice navigation. They will also no doubt include technical recommendations like use of Schema and JSON-LD to mark up content, transition to Responsive Design and AMP-only design, and transition to PWA and PWAMP.
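As a small example of the Schema and JSON-LD recommendation, the Python sketch below generates an Organization block that ties a brand entity to its website and other profiles using the schema.org sameAs property. The organization_jsonld function, the brand name and all of the URLs are placeholders I invented; which properties matter most for Entity Understanding is my assumption, not something Google has confirmed.

```python
import json


def organization_jsonld(name, url, same_as):
    """Return a JSON-LD <script> block describing an organization entity."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs lists other URLs that identify the same entity.
        "sameAs": list(same_as),
    }
    return (
        '<script type="application/ld+json">\n%s\n</script>'
        % json.dumps(data, indent=2)
    )


# Hypothetical brand and profile URLs.
snippet = organization_jsonld(
    "Example Shoes",
    "https://example.com/",
    [
        "https://www.youtube.com/user/exampleshoes",
        "https://twitter.com/exampleshoes",
    ],
)
print(snippet)
```

The resulting block would be placed in the HTML of the brand’s main page, so that the domain and its off-site assets are declared as parts of the same entity.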

This has been the first of a five part article series on Google’s new Entity-First Indexing (what everyone else calls Mobile-First Indexing) and how it is related to language. Future articles in this series will provide deeper information about the relationship between language, location and entity understanding, and how these things can impact search results. The next article in the series will focus on the tools that Google has made available to marketers, that we think offer a good view into how their language and entity understanding works and the following three articles will walk through specific research we have done in this area to validate and explain our theories. This will include one article about the Google language APIs, one about how language impacts Entity Understanding and one about how personalization impacts Google’s Entity Understanding, and changes search results for individual users. The final article in the series will focus on how individual phone language settings and physical location can change search results for individuals, making it even harder for SEO’s to predict what a search result will look like, and how their content will rank in different search scenarios. 

 

Mobile-First Indexing or a Whole New Google? The Local & International Impact – Article 4 of 4

 


By: Cindy Krum 

Many technologists theorize that people will look back on the time-period we are living in now and describe it as a modern Industrial Revolution. Surely the massive expansion of trackable data will create a significant shift in how many aspects of business are approached, and the big tech companies, Apple, Google (Alphabet), Microsoft, Facebook and Amazon, have a huge amount of power in the marketplace because they control the flow of data. They fight to stay ahead, to one-up each other, and even to disrupt their own business models with newer, more innovative options and opportunities. ‘So, what does this have to do with SEO and Mobile-First Indexing?’ you might ask – actually, a lot!

For a while now, Google has been talking about micro-moments. Google describes these as instances where searchers are making a decision about their ultimate goal in a search. According to Google, micro-moments mostly fall into four categories: “I want to know,” “I want to go,” “I want to do” and “I want to buy.” While one entity or topic can easily fit into more than one micro-moment, each of the articles in this series has focused primarily on one of the micro-moments. The first article focused on information search, or “I want to know.” The second article focused on media, which fits with “I want to do,” as in, “I want to watch a video,” “I want to play a podcast,” or “I want to listen to a song.” It could also include, “I want to turn on the lights,” or “I want to cast this image to the TV.” The third article was about shopping, so it obviously fits with “I want to buy.” This last article will focus on location and maps, so it is mostly about the “I want to go” micro-moment, but it also takes a more global perspective, and will discuss how language and location fit into all of the other micro-moments.


Entity Understanding & How CCTLDs Might Factor In
Let’s start with global location as a concept in search. Historically, different country-versions of Google have operated with different algorithms that were updated at different times. As far as we know, the US algorithm was always the first to receive new updates, and other countries’ algorithms would be updated later, as Google was able to modify each update to fit the different local and linguistic needs of each country. Google would geo-detect a user’s location, either by the IP block or using other localization signals, and then often redirect the user to the local country version of Google that fit their specific location. As they got more sophisticated, they would geo-locate a person based on the GPS location of their phone to determine which Google ccTLD was most appropriate, and pass that information off to all other logged-in devices. This all changed a few months ago when Google announced that they would now be serving the same algorithmic results from each ccTLD of Google, and that the results would vary instead based simply on the location of the searcher. So now, theoretically, starting a search on Google.nl is the same as starting a search at Google.com or Google.mx. This may indicate a substantial milestone completion for Mobile-First Indexing and the entity understanding of the web.

It makes a lot of sense for Google to leverage the GPS location of a phone in a Mobile-First world, so this should be no surprise. But there are actually a lot of places where language settings can be changed in Google properties, so the increased reliance on these settings as algorithmic variables could complicate the predictability of SEO – something that is fine with Google as long as it also creates good results for searchers. As you can see in the search settings and troubleshooting guide below, Google is getting hints about languages from both Chrome and Android settings, which are not necessarily always in sync.

If a search is performed ‘Incognito,’ or the primary phone on the Google Account is turned off, logged-out, or otherwise not giving a good location signal, Google reverts to the IP address, as shown below. This may seem inconsequential, but it is definitely not.

The easiest way to understand the long-term potential impact of this change is within Google’s Image Search. Image search has historically been quite disconnected from true user-intent, because it focused on the keyword a user searched for, in the specific language that they searched in – not the actual goal of the search – which was of course, to find a picture of something. In reality, if someone is searching for an image of a ‘blue chair,’ they are searching for a concept, not a keyword. The language of the text surrounding the image is inconsequential. Any picture of a ‘blue chair’ would be a good result, regardless of the language in which the keyword ‘blue chair’ is written. A search for ‘blauwe stoel’ (Dutch), ‘blauer stuhl’ (German), or ‘青い椅子’ (Japanese) should all return basically the same image results for a blue chair, but historically, with the keyword-relevance algorithms, they did not. Now with an index based on entity understanding, along with Google’s translation APIs (already available for many languages), the intent of the search and the related entity understanding will become much more critical than the un-translated keyword relevance.
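To make the difference between keyword matching and entity matching concrete, here is a minimal sketch in Python. It is purely illustrative: the lookup tables, the entity ID and the image file names are all hypothetical, and nothing here reflects how Google actually stores or resolves entities. The only point is that queries in different languages resolve to one shared entity, and results are then fetched by entity rather than by keyword.

```python
# Hypothetical lookup tables; every entry is illustrative, not Google data.
ENTITY_BY_QUERY = {
    "blue chair": "entity:blue_chair",    # English
    "blauwe stoel": "entity:blue_chair",  # Dutch
    "blauer stuhl": "entity:blue_chair",  # German
    "青い椅子": "entity:blue_chair",        # Japanese
}

IMAGES_BY_ENTITY = {
    "entity:blue_chair": ["chair-1.jpg", "chair-2.jpg"],
}


def resolve_entity(query):
    """Map a query in any supported language to a shared entity ID (or None)."""
    return ENTITY_BY_QUERY.get(query.strip().casefold())


def image_results(query):
    """Return images for the entity behind the query, ignoring its language."""
    entity = resolve_entity(query)
    return IMAGES_BY_ENTITY.get(entity, [])


if __name__ == "__main__":
    for q in ["blue chair", "Blauwe Stoel", "blauer stuhl", "青い椅子"]:
        print(q, "->", image_results(q))
```

In this toy model, all four queries return the same list of images because they share an entity, which is the behavior the keyword-relevance approach could not deliver.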

As expressed in the previous article in this series, most image search results appear to already be derived from the Mobile-First Index. This change to entity understanding is easiest to see with images, because the goal of an image search is visual, and thus, not focused on language or location at all. In a web result, the language matters much more, because the intent would more likely include opening a web page and reading text. The same is true of a voice search, unless we assume that translation APIs would be used to live translate any results for a voice output. This seems unlikely in the short term, especially for long web pages, but potentially much more likely for search results with limited text, such as a Google Action in the near-term.


The Impact Schema & JSON-LD Have on Mobile-First Indexing
Whenever entity understanding comes up, especially in SEO circles, it ultimately leads you to a conversation about the importance of Schema markup and JSON-LD. These are coded methods of describing the content of a website, app, database, etc., that are easy for search engines to parse and understand. Historically, this type of markup has been used to help companies describe local addresses for maps and provide basic information about recipes, so that Google can show images, calories and cooking times directly in a search result. Schema also helps product pages display star rankings and other great things like that directly in SERPs.

What many SEO’s might not realize, is that Schema and JSON-LD are English based codes; so even if they are being used internationally, where content is written in a different language, the markup code is still in English. This makes international entity understanding easier for Google, because it converts the basic information for categorizing entities on websites around the World into English. This is HUGE in terms of potential impacts on international SEO, but also expeditious for Google and entity understanding as a whole.
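As a rough illustration of this point, the Python sketch below builds JSON-LD for a Dutch grocery store. The store name, street and description are made up for the example, and the helper function names are my own. What matters is that the property names and the type (`@type`, `name`, `address`, `GroceryStore`, `PostalAddress`) come from the English-language Schema.org vocabulary, while only the values are in Dutch.

```python
import json


def build_store_markup(name, description, street, city, country_code):
    """Build a Schema.org GroceryStore description as a plain dict.

    The keys are English Schema.org terms; the values can be in any language.
    """
    return {
        "@context": "https://schema.org",
        "@type": "GroceryStore",
        "name": name,
        "description": description,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressCountry": country_code,
        },
    }


def to_script_tag(markup):
    """Serialize the markup into the script tag used to embed JSON-LD in a page."""
    body = json.dumps(markup, ensure_ascii=False, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # All business details below are hypothetical example values.
    markup = build_store_markup(
        name="Voorbeeld Supermarkt",
        description="Verse groenten en fruit uit de buurt",
        street="Voorbeeldstraat 1",
        city="Amsterdam",
        country_code="NL",
    )
    print(to_script_tag(markup))
```

A parser that knows nothing about Dutch can still tell that this page describes a grocery store with a postal address in the Netherlands, which is the kind of language-independent categorization described above.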

Beyond using Schema and JSON-LD as the Rosetta Stone of the web, English-based entity-understanding has a secondary implication. Google’s most successful search algorithm has always been in English, so when all content is linked together with a unified ‘schema’ for entity understanding, it is easier to have algorithmic searches based on the US/English version of the search algorithm. While the effort to get this all set up and working properly is very large in the beginning, in the long-run, this move could save Google a lot of time and make surfacing search results faster, especially as the content of the web continues to grow at (near) exponential rates. In the long-run, it should also get them out of maintaining different versions of the search algorithm for each language and country combination, so most likely it will be even more important for webmasters to mark up their content with the appropriate Schema when the content is not natively in English.


Entity Understanding & Dynamic Links in Google Maps
Beyond just Image Search results, entity understanding currently seems to be more prevalent for Google outlets that address entertainment-oriented queries in Google, Google Play and Google Now, as discussed in the second article in this series, but we are also seeing strong examples of entity understanding in Google Shopping and Google Express as well as Google Maps. For all of these topics, language is less relevant or potentially even a hindrance for surfacing what a user wants. For instance, if you are in the Netherlands, searching Google Maps for a ‘grocery store’ in English, your intent and final destination should be the same as if you searched for the Dutch version of the keyword (markt/supermarkt). In fact, using entity understanding in map search solves a huge problem for travelers, because needing to translate a keyword like this into a local language before searching for it in Google (or Google Maps) is a slow and error-prone process.

You can already see the beginning of entity understanding in some versions of Google Maps now, with buttons that represent the most common entity searches that Google would like to understand and surface: Restaurant, Grocery Store, Pharmacy, Cafe, Gas Station and ATM, shown below. Notice in the image on the right, that synonyms for the entity understanding of the query are listed with each location that is surfaced, boxed in red. This does not happen in every map query for every entity – it appears to happen only where the entity understanding is strong enough that Google has hit a certain confidence threshold. At this point, ‘supermarket’ is the only entity that is consistently showing the entity keywords with the specific locations in Map search results. Google seems to be using their AI to build out the entity understanding in Maps over time, so sometimes you can click one of the standard buttons like ‘Pharmacy’ for an entity concept that Google is still learning, and see that the keywords are NOT included, possibly indicating that the confidence in the result or entity understanding is not as strong.

This is important, because some concepts, like ‘pharmacies’ are harder to define, or don’t translate as readily across borders, as illustrated with the Amsterdam and Denver example searches below. In the US, pharmacies like Walgreens, CVS and Duane Reade are all-purpose stores where you can pick up prescription drugs, but also many other things including snacks, toiletries and makeup; but in many other countries, including the Netherlands, where one of the example screenshots was taken, pharmacies are much more limited, focusing only on prescription drugs. Google may be trying to disambiguate the query intent, deciding if an American would be just as happy with the Dutch equivalent of a ‘corner store’, ‘convenience store’ or ‘bodega’, even though they clicked on the ‘pharmacy’ button. What is interesting here is that the English search from Denver does not appear to have entity understanding either, even in the US. This indicates that Google is insisting on multilingual entity understanding in all cases, including in English, where it has the greatest native understanding already, before entities can achieve the same keyword-inclusion level of confidence that the Grocery Stores are currently getting.

Restaurants and tourist destinations are more universal, so Google’s AI is generally more robust for these types of locations (though there is sometimes confusion with supermarkets that have eat-in delis). You can see in the image above, that in a Maps browse screen, Google is not only highlighting different types of location-groupings especially for restaurants, but is showing the time of day and weather at the top of the screen, which also impacts the results that are being suggested. It is noon, so we are being shown lunch restaurants, and the weather is cold, but not raining, so outdoor activities are ok, but not preferred. Most likely, these recommendations are based on simple logic like this, as well as crowd-sourced data about what other people have historically chosen to do on days like this.

The inclusion of a dynamic (shareable) link that is associated with the map (the three dot, triangular share links), each of the entity results in the map and each of the specific locations suggested in the map should give us a pretty clear idea of how this part of the Mobile-First Index is organized. There are entities, they are grouped, and other entities live within those entities. The entities are more or less relevant to different context, like time of day and weather, and they don’t require unique domains to be surfaced. The dynamic links allow them to be indexed, surfaced and shared without necessarily needing a website or traditional links. This concept will be critical moving forward into the future of Mobile-First Indexing.


Local Search, Shopping, Inventory Control & Just in Time Delivery
Rob Thomas, a leading software analyst from IBM, says that “The rise of the Data Era, coupled with software and connected device sprawl, creates an opportunity for some companies to outperform others. Those who figure out how to apply this advantage will drive unprecedented wealth creation and comprise the new S&P 500.” His prediction is that there will be no more ‘tech companies’ per se, but that all companies will be tech companies by default, and that the tech and the data will be deeply integrated into the fabric of every company. Well-known thought-leader in the technology space, Benedict Evans, agrees, saying simply that “It is easier for software to enter other industries than for other industries to hire software people.” I agree, and think that there is a big risk that the largest tech companies will invade more industries, empire-building with conquests in the form of M&A in offline sectors; we have seen this already with Amazon’s purchase of the Whole Foods grocery chain. But it could be more complicated than that. It seems likely that Google sees itself as a potential middle-man and will position itself to help businesses harness the data and stave off their own wholesale takeover by tech companies in offline industries.

Cross-device, multi-sensor data will be revolutionary, in many ways, and in the long run, it will allow Google to directly index offline goods, further bridging the gap between on and offline information. This will be powerful and revolutionary in its own right. Many online retailers have tried to launch websites tapped into the local store inventory systems, to allow shoppers to find items online and reserve them for pickup in the store, rather than paying for shipping. Target, BestBuy and DSW have all tried, but most encounter significant struggles because current inventory control systems are so bad. Google could easily fix this problem with cheap data sensors. It is possible that Google’s next move will be towards better Just-In-Time (JIT) inventory control systems that help businesses know the reality of their inventory, rather than simply knowing rough estimates which are only updated every 24 hours, as they do now.

Google may also be hedging its long-term bets on their ability to manage fleets of driverless cars, which they have been working on for many years. This would allow them to pull together information from product and inventory-oriented sensors, as well as information about maps, traffic and orders to seamlessly execute Just-in-Time deliveries for all kinds of stores, with maximal efficiency and minimal incremental cost. The new expectation will be set, where potential customers can order something that they are interested in, have it delivered in 24 hours, and return it in the next 24 hours if it does not meet their expectations, all using the internet, without leaving their house. No more waiting in line or trying things on in store dressing rooms.

Similarly, grocery delivery apps have been around for a number of years, but none of the stores made it easy enough to casually add things to the list or place orders. Being able to update a weekly grocery list using just your voice may change all that. When voice search is combined with AI, cloud data and mapping or potentially even driverless cars for Just-in-Time delivery services, we get something that Google would really be interested in. Looking into the future, if Google’s ultimate goal is to use sensors to help companies index offline inventory for Just-in-Time delivery, potentially using fleets of Google’s driverless cars, the long-term result of Mobile-First Indexing could ultimately help smaller, more local businesses, empowering small-scale retailers to compete more directly with the large, enterprise e-commerce sites. The model might help people easily get food that is locally grown, or organic and in-season, delivered on a regular basis without the markup that is associated with most retail stores; small start-ups could spend the money they would have invested in the physical store on inventory planning and order management systems. Shopping may resume a local focus, leveraging the internet to compete with the larger, global retailers and this would fit well with the direction of society, the demands of Millennials, and Google’s goals to reach the next Million Users. 


Is this A Whole New Google?
With all of this consolidation at Google, and the movement towards Mobile-First indexing, what should we expect next – Is this a whole new Google? Well in short, yes – I think so. The next thing to come will be Fuchsia – Google’s new cross-device, OS-browser combo, and one of the most important rumors that no one in the SEO community is really talking about. The rumor that Google Chrome could merge with the Android OS and launch a web-OS has been around for a while, and it most certainly would fit well with Mobile-First Indexing. At first it was unclear if Fuchsia was just another mobile OS, but according to Kyle Bradshaw, author of the new Fuchsia Friday at 9to5Google, “Now that we’ve seen it up and running on a Pixelbook, it seems more likely Fuchsia could eventually supplant both Android and Chrome OS.”

Google has wanted a unified experience that pulled together the web browser, app stores and operating system for a long time. The first try Google had at unifying the browser and OS was with their ChromeBooks, which had the Chrome operating system, and allowed users to access and download software from the Chrome Store. The Chrome Store offered Chrome Plugins and Apps that function somewhat similarly to a PWA, leveraging the browser code for the core functionality: Plugins used the normal Chrome layout, and apps could install minimal software but also re-style the browser display to suit the needs of the app. (It is rumored that much of the team working on PWAs at Google came from the old Chrome Store team. My guess is that they pioneered the concept for PWAs there. In my mind, it is still easiest to understand PWAs as browser-plugins that get to re-style the browser window.) It is possible that the next major handset launch will happen at Google I/O 2018 and showcase Fuchsia. Perhaps it is telling that the branding for the event seems to depict a wind or current map – like ‘The Winds of Change’? (Though I think if they were really going to go for it they would have done it all in bright pink, or fuchsia.)

Unsurprisingly, Fuchsia focuses heavily on Entity understanding. From his review of the Fuchsia documentation, Kyle suggests that “Entities are created and shared in JSON, a format which is designed to be human-readable and is nearly universal with parsing available in most modern programming languages. We also briefly learned last week, that Ledger [individualized Google device software that enables quick task and device switching for one user] is also designed to handle JSON objects well. This is certainly no coincidence. Ledger will almost certainly directly keep track of entities, among its other duties ….  Improved Dynamic Entity Extraction: The new entity extractor adds support for Microdata and listens for mutation events so that entities can be re-scraped. When the entities in the page change an event is triggered and the web view updates the context store’s list of entities.” This all fits extremely well with how we have described Mobile-First Indexing’s heavy reliance on JSON markup and other methods of entity understanding for dynamic presentation of data and on-going AI.

Additionally, for another indication of the good fit with Mobile-First Indexing, Kyle explains that Fuchsia leans heavily on Android Instant Apps. “Android Instant Apps allow users to try out new apps before committing to install. To do this though, developers have to create a specially-made and stripped-down modular app using the Instant Apps SDK. Where Fuchsia differs however, is that there is seemingly no distinction between an installed app and an “Instant” version. Whether you install it manually or run instantly from a suggestion, the app is the same… The most important thing though, is that these processes will be completely transparent. It looks like Google is building Fuchsia so that when users know what they want to do next, Fuchsia will be happy to accommodate. You won’t have to worry about whether or not you have the app you want installed, saving you from filling your devices with apps you “might need someday.”” If Fuchsia is a success, this could be a monumental change for SEO and progress for entity understanding and Google’s AI. Traditional ASO will fall into the past and be replaced with context-aware surfacing of Instant Apps.

From a more pragmatic, and immediate SEO perspective, Fuchsia is still important. Remember that Google’s “Speed” update is set to launch ‘this July’ and the update to fix the problematic cached AMP URLs is slated to launch ‘sometime in the middle of the Summer,’ so these both could be timed to coincide with a more full-throated launch of Mobile-First Indexing at Google I/O. Beyond that, we have already seen movements in the Chrome browser announcements that fit well with Mobile-First Indexing, and with a unified browser-OS, like what is being tested with Fuchsia: Chrome will not have a visible address bar (which is important if it is launching content without URLs, and for PWAs launching full-screen); Chrome will focus more on offline caching (important if it is running as both OS and browser); Chrome will auto-adjust for zooming in DevTools (if they can do it in DevTools, they can do it elsewhere, and this is an important feature for any OS-browser combo that might work on a non-standard device display, like on TV and car displays); and Chrome will focus much more on HTTPS, which is very important if it is running as both OS and browser.


Conclusion
In the same way that ‘tech’ may no longer be as relevant or descriptive of a classification for any company, ‘the internet’ may no longer be as relevant of a concept for where people spend their time, money and attention. It will just be a necessary part of the equation that will fade into the background and become a ‘given’ or a ‘constant’. Access to high-tech data processing and analysis on the internet will become analogous to having access to a unique skill, material or machine needed in the previous century. This is going a long way to break down barriers, and from the perspective of SEO, the most important barrier that is being broken down is a linguistic one. Mobile-First Indexing and entity understanding, along with translation APIs, are allowing Google to index the world based on a single set of ideas, which will speed up the organizing and surfacing of information, and help expedite the management of algorithms. It will also make international AI and machine learning for voice search more robust and meaningful much faster, with more data feeding the same system, instead of having the systems all segmented by country or language.

Technology has driven us to a new ‘on-demand’ economy, but the newest, most innovative opportunity might actually be the one that is closest to home. On-demand goods and services that are organized and ordered on the internet, but then appear nearly immediately, might be the next big thing. The concept of tech companies using ubiquitous cross-device internet, data and sensors to radically change non-tech industries and the concept of a Mobile-First Index could go hand-in-hand, especially if indexing of offline products and inventory becomes a reality. When searchers seek out these goods and services, their end goal is not the website, but the good or service.

The prospect of Fuchsia could be huge, both for Mobile-First Indexing and for AI and the Internet of Things. If it is a success, it will fundamentally change the job of marketers and SEO’s, hopefully for the better. Unfortunately, marketers have gotten so used to marketing on the website or in the physical store that they can’t imagine their jobs without those options, but Mobile-First Indexing could help make this part of the new reality. This is the fourth and final article in a series about Mobile-First Indexing. It explained how the entity understanding described in the previous articles, which focused on information searches, media searches and shopping searches, comes together globally, and how that will change SEO in the long run. Finally, it also covered how this change to Mobile-First Indexing may indicate, at least from an SEO perspective, that soon we really may be dealing with a whole new Google.

 

Other Articles in this Series:
Is Mobile-First the Same as Voice-First? (Article 1 of 4)
How Media & PWAs Fit Into the Larger Picture at Google (Article 2 of 4)
How Shopping Might Factor Into the Larger Picture (Article 3 of 4)
The Local & International Impact (Article 4 of 4)

SEO Strategy for Voice Search in Shopping – Supplemental for Article 3 of The Mobile-First Indexing Deep Dive

By: Cindy Krum

Google’s transition to Mobile-First Indexing is likely to shift a lot of SEO attention to voice search, especially as more and more devices are available with Google Assistant voice search capability built in. Amazon Alexa is also being added to a number of web-enabled connected TV’s, and they have big plans to monetize it with ads as well, making Amazon and Alexa more of a threat to Google and Google Assistant than they already are. In fact, many connected devices will actually now offer both Amazon Alexa and Google Assistant, so that users can choose, change or use both, depending on their needs and what they are trying to achieve.

Price is almost always the primary consideration for a person searching for something to buy online, so it is helpful to break down a searcher’s most likely behavior, in relationship to the price and risk associated with the purchase. If you assume a marketplace where the variation on price is minimal, then the purchase behavior will generally follow a predictable pattern: the cheaper and more consumable the product is, the lower the risk it will be to the consumer, and the more likely they are to purchase it with little information or comparison. The more expensive an item is, the more potential risk the shopper feels, so they will need more information and potentially, additional time for evaluation and comparison. This is important for voice-search and Mobile-First Indexing because it will impact the primary focus and SEO tasks necessary to optimize the products for search.

It also tends to be true that the cheaper something is, the more likely a person is to be loyal to the store that sells the product, rather than the product itself. As the price goes up, the person is more likely to be brand-loyal and impressed by features, but less store-loyal. At the top of the price spectrum, the consumer is more likely to be interested in features, evaluating and comparing a small group of brands, but not particularly store-loyal. These rules are not set in stone though, and all three behaviors are possible for all three levels of price. This is important because it will impact how initial product discovery in voice search might happen.
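For readers who think in code, the loyalty model can be written down as a simple lookup. This is only a sketch of the rule of thumb described here, with the ordering taken from the outline later in this article; the tier names and function names are my own, and as noted above, real shoppers will not always follow the pattern.

```python
from enum import Enum


class PriceTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Likely order in which a voice searcher narrows a product search.
# This is a rule of thumb from the article, not measured data.
SEARCH_PRIORITY = {
    PriceTier.LOW: ("store", "features", "brand"),
    PriceTier.MEDIUM: ("brand", "features", "store"),
    PriceTier.HIGH: ("features", "brand", "store"),
}


def primary_loyalty(tier):
    """Return what the searcher is most likely to start the search with."""
    return SEARCH_PRIORITY[tier][0]


def optimization_focus(tier):
    """Return the SEO focus areas for a tier, most important first."""
    return list(SEARCH_PRIORITY[tier])


if __name__ == "__main__":
    for tier in PriceTier:
        print(tier.value, "->", optimization_focus(tier))
```

A seller could use a table like this as a starting checklist: for a low-cost consumable, work on store-specific discovery first; for a high-cost item, work on feature classification first.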

The voice searcher will begin a search with their primary loyalty and then move out from there, depending on the risk that they feel is associated with a bad decision. You can actually use this model to determine, not only what is the main focus of loyalty for the searcher, but also to understand their perception of risk associated with a purchase. With that information, a seller could even tailor responses or information available to the voice-search utility, to respond with the best information, that will both answer the question, but also ease the searcher’s perception of risk with the purchase. An outline of these different use-cases is included below, with potential SEO opportunities for each kind of voice-only search-transaction.

  • Low Risk/Low-Cost Products – Voice Search Most Likely by Store, then Features and Brand:
    • Searchers are more likely to search for what a store has in their inventory so that they can add the purchase to an existing order of other similar purchases from the same store.
    • A great example here is paper towels:
    • These are things that are purchased often, and so once the user finds an item they like, they will probably stick with that one as long as the store carries it.
    • If the store stops carrying the preferred item, the user is more likely to test and find new items, rather than switching stores.
    • Users may be trying to add something on to an existing order to save on delivery or get loyalty points.
    • If you just order any paper towels that a store has in stock, you might not love what you get, but they will probably serve the intended purpose.
  • For SEO:
      • Being easy to find within a limited store context is important. Optimization could focus on surfacing within different store-specific search environments.
      • Brand-loyalty is weak, so being well associated with competitive brands, and surfacing that information, deals or coupons easily, is important.
      • Users are generally not looking for new products or innovation here, so when new products are launched, they must be associated with default products, as a new alternative. Deals and coupons will be successful here too.
      • Store brands have a huge opportunity here, because they know the pricing of all of their inventory, and can leverage existing customer loyalty. Offering consistently low prices on store-brands, offering more loyalty points for the purchase of store brands or discounted shipping will be incredibly persuasive.
      • Features may be set as default filters, so making sure that products are categorized correctly is essential. EX: Searchers may have a default setting to only see ‘gluten-free’ or ‘vegan’ food options. Fruit, vegetables, nuts and some packaged goods may not be marketed specifically to these shoppers, but they could definitely be a good fit for them. They need to be classified correctly so that they surface in as many results as possible.
  • Medium Risk/Medium-Cost Products – Voice Search Most Likely by Brand, then Features and Store:
    • As products get higher in price, they get higher in risk, and thus, people are more likely to be brand-loyal. These are things that people may have purchased before and had mixed experiences with, so they know which brands they had good experiences with, and which ones they want to avoid. They may have stores in mind because they know that the store sells the product, but they are mostly interested in finding the brand that they know that they like, for the best price.
    • A good example here is a high-quality sleeping bag:
    • This is an investment that is meant to last, and keep a person camping warm and comfortable.
    • This may also be something that plays into a person’s self-concept or conspicuous consumption, so brands are associated with external acceptance and consensus.
    • People are likely to shop at any store that offers the brand and product that they want, especially if it has desirable features.
    • If they make a bad decision, they might be uncomfortable and have to buy a replacement product sooner than expected.
  • For SEO:
      • Being grouped correctly by brand, and being associated with competitive brands with similar reputations is important.
      • Sorting and filtering by features is a secondary concern but still important.
      • Being easily found within a limited store environment is of least importance. The only variation here is if there is a store-brand that has a strong reputation (like REI’s house brand of luggage – it is the best!).

 

  • High Risk/High-Cost Products – Voice Search Most Likely by Features, then Brand and Store:
    • As product prices get even higher, the likelihood of someone having lots of experience with different brands and options goes way down, and so people focus mostly on features. They may have preferences for the brand and the store, but the main considerations are price and features.
    • A good example here is a car:
    • People generally only know about their own personal criteria of features. They may have brands in mind, that they like, but are easily persuaded to try new brands, since they don’t have as much of a history of experience to inform their evaluations of different brands.
    • Searchers may have brands or stores in mind, especially if there are pricing deals to be had, but the main decision criterion after the price is features.
    • Replacement purchases are not easily made, and the person may have to keep what they got, even if they don’t like it.
    • If they make a bad decision and don’t like what they get, they may have lost a lot of money, and may even risk personal safety.
  • For SEO:
      • Search will be focused on features and feature groupings. Think first about building awareness of features to get into the comparison-set.
      • Think in terms of feature grids, because this is how searchers will filter all of the possible results to generate a consideration set. If features are optional, the safest thing for filtering is to assume that all optional features are added. The exception is price: price categorization should be based on the fewest optional features, with a clarification that options cost extra. That clarification should be added to the process as soon as possible.
      • Comparison utilities and lists may be a good way to get into a consideration set. Searches like ‘Safest’ ‘Highest rated for safety’ ‘top safety rating’ are things that could easily be entered into a comparison grid, once a consideration set is established.
      • Optimizing for features, and having a good explanation of the importance of the feature will help searchers feel comfortable with their decisions.
      • If you are a store or a brand, you will have to focus on ways to decrease the risk by promoting guarantees and good return policies.
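To make the feature-grid idea above a bit more concrete, here is a minimal sketch of how a filter might build a consideration set. It is only an illustration under my own assumptions: the `Car` class, its field names and the sample catalog are all hypothetical, and nothing here reflects how Google or any store actually filters results. It follows the two rules described above: features are matched as if every optional feature were added, while price is matched on the base configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Car:
    """Hypothetical catalog entry; the field names are illustrative only."""
    name: str
    base_price: int  # price with no optional features added
    standard: set = field(default_factory=set)
    optional: set = field(default_factory=set)

    def filterable_features(self):
        # For filtering, assume every optional feature has been added.
        return self.standard | self.optional


def consideration_set(catalog, required, max_price):
    """Return the names of cars that can offer every required feature
    and whose base price (fewest optional features) fits the budget."""
    return [
        car.name
        for car in catalog
        if required <= car.filterable_features() and car.base_price <= max_price
    ]


# Made-up sample data, not real products or prices.
catalog = [
    Car("Sedan A", 24000, {"backup camera"}, {"lane assist", "heated seats"}),
    Car("Sedan B", 22000, {"backup camera", "lane assist"}, set()),
    Car("SUV C", 31000, {"backup camera", "lane assist", "heated seats"}, set()),
]

matches = consideration_set(catalog, {"backup camera", "lane assist"}, 25000)
print(matches)
```

In this sketch, Sedan A makes it into the consideration set even though lane assist is optional for it, which is exactly where the clarification about optional features and price would need to appear.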

 

Other Articles in this Series:
Is Mobile-First the Same as Voice-First? (Article 1 of 4)
How Media & PWAs Fit Into the Larger Picture at Google (Article 2 of 4)
How Shopping Might Factor Into the Larger Picture (Article 3 of 4)
The Local & International Impact (Article 4 of 4)

Mobile-First Indexing or a Whole New Google? How Shopping Might Factor Into the Larger Picture – Article 3 of 4

By: Cindy Krum

E-commerce has always been a cornerstone of SEO. Sellers need to rank well for keywords that are related to the products that they sell so that searchers can find the products and buy them. The switch to Mobile-First Indexing and entity understanding for search could be a huge opportunity for retailers, or it could be a threat, especially if the voice-search and ordering component is limited to or optimized for a certain set of retailers, or those who are willing to cut Google in on the action. Voice-only online payment and offline delivery add a bit of complexity to the mix, especially if less-durable items like groceries are being delivered on-demand. SEOs will need to prepare themselves for potential changes in e-commerce, many of which could be tied to voice search and Mobile-First Indexing.

The previous two articles in this series focused on major consolidations that are happening across a number of the Google brands that all seem related to Mobile-First Indexing. These innovations facilitate frictionless movement of information and experiences from one device to another, regardless of the differing capabilities of the various devices. The first article focused on Mobile-First Indexing in a basic information retrieval context and the second focused on the media and entertainment context. This is the third article in the series, and it will tie Mobile-First Indexing to current and potential future changes with Google’s online payment options, Google Wallet and Android Pay, as well as changes in Google Shopping and Google Express. It will also discuss the challenges and opportunities that e-commerce sites might face as Google fights to reclaim its market share in online shopping and protect its ad revenue from Apple, Amazon and other potential threats. The fourth and final article in the series will outline the geographic implications of Mobile-First Indexing, both international and local, especially as they occur in Google Maps. Then it will speculate more about how the preponderance of evidence in these four articles strongly suggests that we may soon be dealing with a whole new Google, and what it will mean for SEO.


Consolidation in Mobile Payment

At Google I/O this year, Google representatives were very enthusiastic about a new one-click, cross-device registration, authentication, and payment system that will be available soon, to make PWA shopping experiences much better. According to the speakers, the same technology that Google uses to make cross-device media consumption seamless will also be used to make any kind of cross-device authentication secure and seamless. Google’s new PWA-enabled cross-device payment system will work from the web, so it will be easily accessible and secure on any device with a browser, including iOS devices.

Historically, Google Wallet has been associated with Google-at-large and included by default in Google Chrome. Android Pay, on the other hand, has always been associated with the Android mobile operating system, as the default payment management system on the phone. As you can see on the right, it can hold multiple credit cards and even includes options for mobile carrier billing (T-Mobile), and PayPal. From what we can tell, the main thing that made both systems necessary was Google’s desire to enable payment on iOS devices. These two options could continue to be maintained as separate systems, or they could be combined, allowing Google to aggregate all the users that have signed up with the different services into one.

With one unified payment system, it will be easier for Google to integrate voice ‘buy’ commands into Google Home and Google Assistant for all of the cross-device shopping experiences in and outside of the Google ecosystem. With the increased focus on Shopping, and media, the ability to use voice commands to execute purchases without users needing to touch a device will be an especially big deal. The result of this strategy will be a seamless, frictionless shopping experience that is secure and deeply integrated into all of Google’s offerings.

When payment systems are deeply integrated and frictionless across all devices, it removes one of the biggest hassles for online shopping and makes people more likely to buy. It will go a long way to helping combat Google’s loss of users to Amazon Video, Netflix, Hulu, iTunes and other media outlets, as discussed in relation to Mobile-First Indexing in the previous article. It will also help protect Google from losing more mobile and desktop shopping market share to Amazon Prime, Amazon Fresh, Whole Foods and other future e-commerce competitors. Google believes that they can match their competitors in terms of their inventory and pricing – especially for digital goods, but eventually, they also hope to match Amazon Prime with cheap and immediate shipping of physical goods. This will be especially viable in the future if Google is able to use driverless cars for the automated pick-up and delivery of goods. This concept is discussed more, as it relates to Google Maps, in the next article in this series.


Mobile & Voice Search Technology for Commerce

Google’s Mobile-First Indexing will enable users to interact in a voice-only way to find the information that they need about products and services without a web search – at least the way we conceive of web search today. Either your brand will be strong enough in the mind of the consumer that they will ask for it by name, or your product and its specifications will be the best fit for the needs of the consumer. Brands that can demonstrate that they are the best fit for the needs of the consumer will win the business through (hopefully) honest and unbiased direct comparison, based on features and specifications that are comparable because they are well marked-up in Schema. The internet has already led to a decreasing viability for over-saturated offline retail stores, as shown in the graph below, and Mobile-First Indexing may further decrease the importance of offline retail, especially in industries that are more low-risk, repeat-purchase and voice-oriented, like groceries.
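As a rough illustration of what ‘well marked-up in Schema’ could look like for a product, here is a minimal sketch that builds a JSON-LD object using the schema.org `Product`, `Brand`, `Offer` and `PropertyValue` types. The helper name `product_jsonld`, the sample product and every value in it are hypothetical, and this is not a statement about which properties Google actually uses for comparison; it simply shows features expressed as structured name/value pairs rather than free text.

```python
import json


def product_jsonld(name, brand, price, currency, features):
    """Build a schema.org Product as a JSON-LD-ready dict.

    `features` is a plain dict of feature name -> value; each entry is
    emitted as a PropertyValue so the features are machine-comparable.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
        "additionalProperty": [
            {"@type": "PropertyValue", "name": key, "value": value}
            for key, value in features.items()
        ],
    }


# Made-up example product; none of these values come from a real catalog.
markup = product_jsonld(
    "Example Sleeping Bag",
    "ExampleBrand",
    189.0,
    "USD",
    {"temperature rating": "20F", "fill": "down"},
)

# The output would be placed in a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```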

Online shopping is changing the offline shopping world in pretty significant ways. The massive growth and adoption of Amazon Prime have removed many of the barriers that kept people from buying things online. Similarly, regular and variable subscription services like Amazon Subscribe & Save, StitchFix, ShoeDazzle, Ipsy, BirchBox, BarkBox, and the like are changing how people prefer to shop, as is on-demand product delivery that is enabled by Sharing-Economy apps like Postmates and TaskRabbit, both of which can be used for food or small-purchase product delivery. The ability to casually shop or even narrow down potential shopping options using your voice could be a game-changer for all of the online shopping business models in the near future, as adoption grows. Voice-only ordering, like what is already available and growing in adoption with the Amazon Alexa integration, allows people to make quick, low-risk purchases with minimal hassle, but could also allow people to do some simple research and filtering for more complex decisions.

Understanding how voice search will be used for shopping, and how to optimize content to work well in that context is a huge topic, so we moved the details about that into a separate post, linked here. The essence of it is that people will search by store, by brand, and by features. The likelihood that a search will start with one or the other depends on the price, risk, and regularity with which the user purchases the product. Success in optimizing for voice search will be based on an SEO’s ability to express and manage inventory-level information about the products, to help them surface in store-specific searches, capitalize on and reinforce brand loyalty, and filter appropriately for advanced searches, based on the features of the product.

Ultimately, voice search should make simple purchases easier and add confidence and satisfaction to more complex purchases. When the stakes of a purchase are higher, voice filtering and comparison could help users more accurately compare and evaluate a larger variety of options and information, actually giving them the opportunity to ‘talk through’ the complicated purchases and consider even more products and points of comparison. An automated system passes no judgment about your questions, priorities or requirements, or about how long you take to make a decision, which could even improve the shopping experience and the level of satisfaction that some users feel with their purchase.

 

Changes to Google Shopping & Google Express

Google has two shopping portals: Google Shopping, which has been around for many years, and Google Express, which is relatively new. Google Shopping allows users to search and compare product offerings from many online retailers who have agreed to share part of the revenue from each purchase with Google. Google Express is similar, but in Google Express, product availability is determined by your zip code, and more quick shipping options are available. It does not appear that Google is building huge AI-driven warehouses like Amazon, in order to facilitate this quick shipping. Instead, they seem to be delivering items directly from retail stores or small warehouses. If Google Express is eventually able to further decrease its delivery time, it could be a huge boon for local supermarkets that get on-board. Offline retailers (grocery stores and others) could use Google Express as a replacement for individual store-run and app-run delivery services.

This is relevant for Mobile-First Indexing because Google Home and Google Assistant’s voice-controlled shopping list is already integrated with Google Express, making it much easier for people to place one-time or recurring orders for products that they need using only their voice. This makes Google Express at least somewhat competitive with Amazon Alexa’s voice ordering capabilities, so Google might be able to take back some of the market share for product-oriented searches from Amazon. It is important to note that the Shopping List was originally part of Google Keep, which is already integrated with Google Docs, so there appears to be a more holistic, base-level integration happening there.

To this end, I expect we might see a merging of Google Shopping with Google Express in the next two years, to shift power away from Amazon Prime. Google may even be willing to take a financial hit or break even to accomplish this task, just as Amazon did in the early days of its existence and of the launch of Amazon Prime, to build momentum and buy-in for the brand. NOTE: We don’t work much with PLAs or Google Shopping, so this is all just based on external observations.

To make this idea really profitable, Google will have to isolate a new advantage that consumers want. Google Express does offer free shipping, but the total value of your order from each different store has to meet variable thresholds for each, as you can see in the image below. This might be fine for big shoppers, or shoppers who really just want easy online ordering and quick delivery from companies who are not known for those services, and most of the stores in the list are not. The upside is that Google Express will store and enable information for each of the stores’ loyalty programs, so that you can still accumulate loyalty points, now without leaving the house.
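To show why per-store thresholds can be a hassle for shoppers, here is a small sketch of the arithmetic involved: the cart is split by store, and each store’s subtotal is compared against that store’s own free-shipping minimum. The store names and minimums are hypothetical placeholders, not the actual Google Express values, and the function name is my own.

```python
from collections import defaultdict

# Hypothetical per-store minimums; real thresholds vary by store
# and are NOT taken from Google Express.
FREE_SHIPPING_MINIMUM = {"Store A": 25.00, "Store B": 35.00}


def shipping_status(cart):
    """Given (store, price) pairs, return {store: (subtotal, shortfall)}.

    A shortfall of 0.0 means the store's order qualifies for free shipping.
    """
    subtotals = defaultdict(float)
    for store, price in cart:
        subtotals[store] += price
    return {
        store: (
            round(total, 2),
            round(max(0.0, FREE_SHIPPING_MINIMUM[store] - total), 2),
        )
        for store, total in subtotals.items()
    }


# Made-up cart: one combined order, but each store is evaluated separately.
cart = [("Store A", 12.50), ("Store A", 15.00), ("Store B", 20.00)]
status = shipping_status(cart)
for store, (subtotal, shortfall) in status.items():
    print(store, subtotal, shortfall)
```

Even though the whole cart in this example is worth more than either minimum, only one of the two store orders qualifies, because each store is counted on its own.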

If you already have logins and account details for a site, beyond a simple loyalty card, it will store and manage those as well. When you add this loyalty and account management system to what we suspect is happening with the Google payment utilities, there appears to be an even more powerful consolidation opportunity on the horizon. Once the two payment utilities are merged, the resulting product can be rolled into the Google Express payment system to manage all payment options as well as loyalty accounts and logins, making the long-term prospect for automated and voice-only ordering from Google Express that much more appealing. Loyalty points are good, but voice-only ordering, same-day delivery and automated repeat delivery and billing, like Amazon Subscribe & Save offers, are great.

Companies originally seemed slow to onboard with Google Express, but there appears to have been a recent surge in the number of major department stores that are integrating with Google Express. The current list of brands includes: Target, Walmart, Overstock.com, Wayfair, Whole Foods, Costco, Kohl’s, Fry’s, The Home Depot, Walgreens, Bed Bath & Beyond, Ulta Beauty, Guitar Center, Hayneedle, JOANN, Payless Shoes, Ace Hardware, Pier 1, Sur La Table, Toys’R’Us, REI, The Vitamin Shoppe and more. As you can see in the list on the right, taken from Google Shopping, most of the top participants in Google Shopping are already part of Google Express. Some stores like Pier 1, Sur La Table, Fry’s, Costco, Whole Foods, Ulta Beauty, REI and Payless Shoes appear to be part of Google Express but do not appear to be part of Google Shopping; still, there is definitely a large amount of crossover, and this may be further corroboration of an impending merger.

A potential consolidation of Google Express and Google Shopping may seem tangential for SEO because technically these are both sponsored search options, not organic search. While this is true, I believe that it will still be important for SEOs to focus here, or at least be aware of it. Voice ordering may be compelling enough for consumers that they may change their search behavior to only shop with utilities that facilitate it, like Amazon and Google. This is exactly what Google wants because they get revenue from each transaction.

In the future, most shopping with Google may be ‘Sponsored’ to one degree or another, which could be further altered to help get them out of future antitrust problems in the EU, where they are paying a €2.42 billion (~$2.73BN) fine. In this case, it was proved that inclusion of products in Google Shopping made it harder for organic products to surface, so if Google could make all product inclusion the result of submission, and not rank products organically at all, that would be a win-win for Google. It is hard to know exactly how this change might happen, but it seems likely (or at least possible) that products which are not available in Google Express or Google Shopping will become even harder to find, especially in a voice search context. Items that cannot be easily surfaced or compared in voice search would suffer. Changes like this may force us to redefine the distinction between paid search engine marketing (SEM) and organic search optimization (SEO), as the base requirements for success in SEM will begin to look more and more like skills that have historically been associated with SEO.


Addition of Product Information in Google Images

Moving out of Sponsored product optimization and into a more pure-play organic optimization discussion about e-commerce and Mobile-First Indexing, we have to start with Google Images. It is quite possible that Google Image Search is already coming from the Mobile-First Index and has been for about the past 6 months. The images all appear to have been re-indexed, based on schema and entity understanding, rather than being indexed exclusively on the text content of the page where the image is from. With this new indexing, images are identified with special symbols when they are GIFs, still images that represent videos, news, recipes, and products, as shown below.

The change was most noticeable when product images got marked up with a little icon that indicates that they are products, and all the vital stats for the product got lifted into the image search results ‘details’ information, as shown below. This allowed images to be filtered based on color, price, size, seller and other shopping-oriented criteria, which is also shown below, though unfortunately the sorting feature seems to have been disabled for now.

This kind of image tagging shows that Google is not drawing a hard line between image results, and shopping results, news results, and any other image content. It seems like the entity understanding of the new Mobile-First Index allows these concepts to ‘sit together’ despite coming from different types of sources, which is nice because it adds utility and fits with how many people already shop online; finding items that they are interested in, then searching Google Images to see examples of how bloggers have styled an item of clothing, how the furniture or fixture looks in various rooms, or how the sporting gear looks after a real workout.

This presentation of image search results is also great for comparison shopping, allowing competitors like eBay to filter into the consideration set, where they might not previously have been.


INFO: Google Shopping has used image recognition to populate ‘Related Items’ for a while. I wrote about this phenomenon over two years ago (Nov 2015): When searchers click on an image of a garment on a mannequin, all of the ‘Related Images’ are on mannequins. Searchers who click on an image of an item on a human model are only presented ‘Related Images’ on human models. This is still true. (So having product images that look a lot like higher-end competitor product images is still a great way to show up in the Related Items results of Google Shopping.)


Google began adding entity-style filters to the top of Google Images about 18 months ago, and these may have been the beginning of Google’s entity understanding for images. Before that, Google Shopping and Google Images had both been using their image matching algorithms to recommend similar items, as described in the info box to the right, but this new image indexing seems much more driven by Schema and entity understanding, as discussed in Article 1, rather than image recognition. Image recognition and matching is still very important to Google, but it has to fit into their new entity understanding of the web. Google is even asking users to help with AI image recognition and categorization training tasks, using a crowdsourcing app which asks users questions about images to help Google categorize images more accurately and make its entity understanding of images better.

The other indication that Google Images is already using the Mobile-First Index is the Dynamic ‘Share’ Links that are now included with all the images. Remember, Google uses these links when they ingest an entire database of information without requiring that URLs are available to associate with each concept, idea or entity. The share links are visible with the new save button and vertical triple-dot ‘more’ button with each image, but these may become more prominent over time. Currently, most of the images in Google’s Image search results do appear to have URLs, but this may not always be the case; especially if Google begins ingesting entire product catalogs directly from a database that is marked up with JSON-LD, instead of crawling e-commerce websites.

Even though Google Images is pulling in information from e-commerce sites, it is not pulling in Google Shopping results, aside from the Sponsored carousel at the top of the page. The images always appear to link to their original seller, so unlike Google Shopping and Google Express, in Google Images Google is not getting a cut of any of the transactions.

Since Image Search results are organized with entity understanding, another interesting phenomenon related to image search is starting to pop up. The ‘People Also Searched For’ boxes that were just recently added to Google desktop results are now also appearing in image search results, in the image grid, as if they were an image, but each of the options is clickable and will allow users to filter down to those related images. Examples are shown below:

These could be hard to understand at first, but in a cross-device context, they make plenty of sense. Image searches will obviously struggle in a voice-only/eyes-free context, but they could be very useful in a voice-assisted scenario, such as a person who wants to search images on a voice-enabled TV. Adding a multiple-choice option for drill-down into the images will probably improve that user experience significantly, and also help the voice recognition AI, because the choices cause the user to prompt the system with related options that it is already expecting, much like a multiple-choice question in a voice-only experience.


Changes to Google Search Console & Google Analytics

Last but not least is tracking. Tracking is incredibly important for e-commerce SEO but may get much more complicated after Mobile-First Indexing. By now, everyone in the industry is aware that there is a new version of Google Search Console available, but there are some things that we might be able to glean from this about the Mobile-First Indexing update. The New Search Console is accessing data that the Old Search Console could not; it is not just a re-skin of the same data. It seems to have been launched separately, on a different subdomain, in order to keep the information about the two indexes from overlapping and creating duplication or confusion. Google may also have used statistics from the New Search Console to compare with the Old Search Console, until they achieved some threshold of similarity in reporting, to know when the rankings and behavior in the Mobile-First Index were starting to match up with the rankings and behavior of the traditional Desktop-First Index.


NOTE: At the Friends of Search conference last week, Gary Illyes did not deny the relationship between the new Search Console and the Mobile-First Index, and said that there was some relationship.


The main signal that indicates the relationship between Mobile-First Indexing and the New Search Console is the section that measures, ….wait for it….indexing! Since Mobile-First Indexing is new and will work for content that does not require a static web URL, and that can’t be tracked by Google Analytics, Google needs a tool to report on the new non-URL’d content; hence the Indexing Report in the New Search Console. Most likely, webmasters are only seeing URLs in their accounts because they only have websites set up to track to Google Search Console. It seems likely that once Mobile-First Indexing is fully launched, Google will start providing more information about how other things like Native Apps, Instant Apps, single-URL PWAs, databases and other content are performing in the Mobile-First Index.

Conclusion

E-Commerce is an important part of the internet economy, but changes spurred by Google’s transition to Mobile-First Indexing could change how shoppers find what they are looking for online, and how SEOs manage and optimize their online sales. The potential for voice-only shopping and ordering of products could be a huge opportunity for companies that are able to move quickly and adapt to the changing landscape in this space. Capitalizing on existing systems from Google or finding your own way to ensure secure payment and quick delivery will be critical to competing for the shopper’s business.

The first article in this series focused on how Mobile-First Indexing might change basic information retrieval and the second article focused on surfacing and serving media and entertainment across different devices with voice search. This article tied Mobile-First Indexing to current and potential future changes with Google’s online payment options, Google Wallet and Android Pay, and also discussed the potential impact of Google Shopping and Google Express. The fourth and final article in the series will outline the geographic implications of Mobile-First Indexing. It will focus more on Google Maps but discuss local and international implications for Mobile-First Indexing. It will speculate more about how all the changes discussed in this article series indicate that we may soon be dealing with a whole new Google.
