The Entity & Language Series: Frameworks, Translation, Natural Language & APIs (2 of 5)
April 20, 2018
By: Cindy Krum
All of a sudden, we are seeing translation happen everywhere in the Google ecosystem. Google Maps is now available in 39 new languages; Google Translate rolled out a new interface with faster access to ‘Conversation Mode’; and Google Translate on Android switched to server-side functionality that allows it to be invoked from within any app or website on the phone, as shown in the image on the right, where Google Translate has been invoked to translate a tweet in the Twitter native app. Google has clearly reached a new level in their voice and language capabilities!
We are also seeing Google find lots of new uses for their Cloud Natural Language API. Google just launched ‘Google Talk to Books,’ which uses the API to answer questions from knowledge it has gained by crawling and processing the contents of over 100,000 books. They also just launched a new word association game called Semantris, which has two modes that both have players racing against time to guess at on-screen word relationships before the screen fills up, as more and more words are added.
And the list goes on. We are also seeing some of this play out in search. Image search results are now clearly pulling from an international index, with captions that have been translated to fit the query language. Map results include translated entity understanding for some major generic queries, like ‘grocery store’ and ‘ATM,’ and they also auto-translate user reviews for local businesses into the searcher’s language.
The timing of all of these changes is not a coincidence. It is all a natural side-effect of the recent launch of Mobile-First Indexing, or as we call it, Entity-First Indexing. This is the second article in a multi-part series about entities, language and their relationship to Google’s shift to Mobile-First/Entity-First Indexing. The previous article provided fundamental background knowledge about the concepts of entity search and entity indexing and what they might mean in the context of Google. This article will focus on the tools that we believe Google used to classify all the content on the web as entities, organize them based on their relationships, and launch the new indexing methodology. Then we will speculate about what that might mean for SEO. The next three articles in this series will focus on research and findings that we completed to show evidence of these theories in play internationally, as they relate to the functioning of the different translation APIs, how those impact Entity Understanding, and how personalization plays into Google’s Entity Understanding and search results.
- Fuchsia & Why Entities are So Important to Google
- How Device Context & Google Play Fit In
- Are Local Businesses Already Being Treated Like Apps?
- Global Search Algorithms are Better than Local
- Two Parts of an Entity Classification Engine
- How Marketers Can Start Getting Value from the Tools
- Essential Entity Elements – Critical Requirements for Correct Classification of an Entity
Fuchsia & Why Entities are So Important to Google
To understand how language and entities fit into the larger picture at Google, you have to be able to see beyond just search and SEO. Google cares a lot about AI, to the point that they just made two people who previously worked on the AI team the heads of the Search Team. Google also cares a lot about voice search – so much that Google Assistant has already shipped on more than 400 million devices around the world. Finally, Google cares about reaching what they call The Next Billion Users – people living outside of North America and Western Europe, who have historically not been on the cutting edge of technology, but are now getting online and becoming active and engaged members of the online community. All of these goals may be brought together with new software that Google is working on, currently under the code name Fuchsia.
Fuchsia is a combination of a browser and an OS. What is most important about it from a search perspective is that it works almost entirely on entities and feeds. The documentation and specifications are still very thin, but if this is the direction that Google is headed, then we can be sure that search of some sort will be tightly integrated into the software. As SEOs, what we need to remember is that search is not just web, and this is something that Google now really seems to be taking seriously. Google knows that there are lots of different types of content that people want to surface and interact with on their phones and other devices, and not all of it can be found on websites. This is why entities are so important. They allow web and non-web content to be grouped together and surfaced when they are the most appropriate for the context, or surfaced together, if the context is not clear, to let the user decide. This is where Google Play comes in.
Even now, if you are not sold on the idea of Fuchsia ever impacting your marketing strategy, it is worth looking at the Chrome Feed, which is a default part of all Android phones and part of the Google App on iOS. This customizable feed, sometimes called ‘Articles for You,’ is almost entirely entity based, and according to NiemanLab and TechCrunch, traffic from this source increased 2,100% in 2017. Users get to select the specific topics that they want to ‘follow,’ and the feed updates based on those topics, but also shows carousels of related topics, as shown below. Users can click on the triple-dot menu at any time to update the customization of their feed. If you don’t think this is a powerful way of getting news, realize that people can’t search for a news story until they at least have an idea of the topic or keywords that they want to search for – they have to be aware of it to put it in a query. You can also think of how Twitter and Facebook work – both in feeds that you customize based on who you are friends with or follow – but most of us wish we could customize those feeds more. Google is hoping to get us there in their own offering!
How Device Context & Google Play Fit In
Once Google launched app indexing, most SEOs probably thought that Google’s ultimate goal was to integrate apps into normal search results. For a while, it probably was, but deep linking and app indexing proved to be so problematic and complex for so many companies that it fell off of most people’s radars and Google changed course.
Either your app and website had exact web parity all the time, and it was easy, or you didn’t, and it was much more complicated. The problems generally stemmed from large sites with different CMSs running the back-ends of their web and app platforms – sometimes even different between Android and iOS. This made mapping between all three systems to establish and maintain the required web parity between the apps and the website a nightmare. Beyond that, whenever anything moved in the app or the website, content had to be moved everywhere else to mirror the change. We think that this was one of the many good reasons that Google started advocating PWAs so strongly – it got them out of having to sort out the problems with deep linking and app indexing.
PWAs allowed one set of code to handle app and web interaction, which was brilliant, but what a lot of SEOs missed was the announcement that PWAs were being added to Google Play, Google’s app store. PWAs are essentially ‘websites that took all the right vitamins,’ according to Alex Russell from Google, so their addition to Google Play was a big deal! We have suspected it for a long time, but with the addition of PWAs (and Google Instant Apps) to Google Play, it is finally clear that apps are not being integrated into traditional web search at Google, like most SEOs suspected; instead, traditional web search is being integrated into Google Play – or at least using the Google Play framework. This fits perfectly into the concept of Entity-First Indexing, because Google Play already uses a cross-device, context-aware, entity-style classification hierarchy for its search!
Google Play can also handle the multi-media, cross-device content that Google probably wants to surface more in Mobile-First/Entity-First Indexing, including games, apps, music, movies, TV, etc., as shown below in the Google Play & Monty Python Google Play Search examples. All that content is already integrated, populated and ranking in Google Play. It is also set up well for entity classification, since things are already broken down based on basic classifications, like whether they are apps, games, movies, TV shows or books. Within each of those categories there are sub-categories, with related sub-categories. There are also main entities, like developer accounts or artists, from which multiple apps, albums and/or songs can be surfaced, and these also have relationships already built in – to other genres of related content – so this is all great for Entity Understanding.
Google Play is already set up to include device context in its search algorithm, so that it only surfaces apps and content that can be downloaded or played on the device that is searching. It is also set up to allow different media types in a SERP. As discussed in the first article in this series, context is incredibly important to Google right now because it is critical for the disambiguation of a searcher’s intent when it comes to entities.
Google’s additional focus on context could also make the addition of videos and GIFs to Google Image Search seem more logical, if context is considered. Perhaps this is now just a contextual grouping of visually oriented content, which would make it easier to interact with on devices like a TV, where you might use voice search or assisted search, casting or sharing your screen from a phone or laptop to the larger screen so that the viewing experience can be shared. Bill Slawski explains that many of Google’s recent patents focus on ‘user’s needs’ and ‘context.’ One of those was about Context Vectors, which Google told us ‘involved the use of context terms from knowledge bases, to help identify the meaning of terms that might have more than one meaning.’ We think that the ‘knowledge base’ that Google is referring to in this patent documentation is actually Google Knowledge and similar data repositories that may have since been merged into the Knowledge Graph. The current status of Google Image Search could just be a middle-term result that will change more, as more classification and UX positioning is added to the front-end side of the search interface.
From a linguistic perspective, Google Play was also a great candidate to use as a new indexing framework. For all the categories of content, but especially for apps, the categories that are available in the store stay the same in every language, though they are translated. More importantly, the metadata that app developers or ASOs submit to describe their apps in the store is auto-translated into all languages, so that an app can be surfaced for appropriate keyword searches in any language. So Google Play is already set up for a basic entity understanding, with all the hreflang information and hierarchical structure already in place.
Are Local Businesses Already Being Treated Like Apps?
If you are not focused on Local SEO, you might not be aware of the massive number of changes that have launched for Google My Business (GMB) listings in the past couple of weeks, in the time since the March 17th update. In general, small business owners have recently been given a lot more control over how their small businesses look in the Google Knowledge Graph listings. This includes the ability to add and edit a business description that shows at the top of the listing, the ability to actively edit the menu of services that the business offers, and more.
Before March 17, Google had also quietly been testing Google Posts, which allowed small businesses to use their GMB accounts to publish calls to action, and allow searchers to take actions directly from the GMB – Knowledge Graph panel, including booking appointments and reservations. It is essentially a micro-blogging platform that lets business owners make direct updates to their business listing whenever they want, and this is a big deal. Joel Headley and Miriam Ellis do a great job of covering it on the Moz Blog.
All of this makes it seem very much like Google is empathizing with, and trying to fix, one of the biggest pains of small businesses – maintaining their websites. Another aspect of the Google Play store that fits well into the model we believe Google is going for is that proven entity owners, such as app developers, are able to edit their app listings at will, to help market them and optimize them for search. If Google can empower small business owners to build out their GMB listings and keep them current, then it will save them a lot of time and money, and many of them would be just as happy, or happier, with that solution than with the burden and cost of maintaining a website.
From Google’s perspective, they just want to have the best and most accurate data that they can, as quickly and efficiently as they can. Google knows that small businesses often struggle to communicate business changes to web development teams in real time, and budget constraints may keep them from making changes as often as they would like. By empowering the business owners to control the listing directly, and even allowing them to set up calls to action and send push notifications, Google is really creating a win-win situation for many small businesses. There are some obvious SEO questions about how easy or hard it will be to optimize GMB listings in the complete absence of a website, but this is an area to watch. Google is likely using offline engagement data and travel radii to inform how wide a business’s ranking radius should be, and how relevant it is for various queries, so we could be in all-new territory here, as far as optimization and success metrics are concerned.
Global Search Algorithms are Better than Local
The websites that Google currently ranks in search results are translated by the website creators or their staff, but this is not necessarily true of the other entities that are ranked – for instance, Knowledge Graph results and related concepts that are linked there, like apps, videos and music. For these, Google is often using their own tools to translate content for presentation in search results (as they do aggressively with Android apps) or actively deciding that translation is not necessary, as is common with most media. They do this translation with basic translation APIs, Natural Language APIs and sometimes, potentially, human assistance.
Without a language-agnostic, unifying principle, organizing, sorting and surfacing all the information in the world will just get more and more unwieldy for Google over time. This is why, in our best guess, Google is not translating the entire web – they are just doing rough translations for the sake of entity classification. From there, they are ranking existing translations in search results, and their language APIs make it possible to translate other untranslated content on an as-needed basis, which may become more important as voice search grows in adoption. For Google, it is actually easier to unify their index on a singular set of language-agnostic entities than it is to crawl and index all of the concepts in all of the languages, without the unifying, organizing principles of entities.
This synthesis of information necessary for entity classification may actually create more benefit than is immediately apparent to most SEOs; most SEOs assume that there is an appropriate keyword for everything, but in reality, language translation is often not symmetrical or absolute. We have probably all heard that Eskimos have more than 50 words for ‘snow.’ These 50 words are not all exact translations but have slight variations in meaning which often do not directly translate into other languages. Similarly, you may have been exposed to the now-trendy Danish concept of ‘hygge,’ a warm, soft, homey feeling that one can create, usually including snacks and candlelight, but again, there is no direct translation for this word in English. If we required direct translation for classification, much of the richer, more detailed and nuanced meaning would be lost. This could also include the loss of larger data concepts that are valuable across international borders, as postulated in the example below:
EX: If I am a Danish climate researcher, and we develop a method for measuring the carbon footprint of a community, we create a new keyword to describe this new ‘collective community carbon footprint measurement’ concept, and the keyword is ‘voresfodspor.’ This word exists only in Danish, but the concept is easily described in other languages. We don’t want the data and our research to be lost just because the keyword does not universally translate, so we need to tie it to a larger entity – ‘climate change,’ ‘climate measurement,’ ‘carbon measurement,’ ‘community measurement.’ Entity understanding is not perfect translation, but it is great for making sure that concepts don’t get lost or ignored. It is great for allowing further refinement by humans or by machine learning and AI down the road.
We know that the nature and content of languages in the world changes over time, much more quickly than the nature and number of entities (described at length in the previous article). Keying Google’s index off of a singular list of entities, in this case, based in English, makes surfacing content on the ever-growing web faster than it would be if entities had to be coded into the hierarchy of all languages individually. This is perhaps why in John Mueller’s recent AMA, John clearly said that Google wants to get away from having language and country-specific search algorithms. According to John, “For the most part, we try not to have separate algorithms per country or language. It doesn’t scale if we have to do that. It makes much more sense to spend a bit more time on making something that works across the whole of the web. That doesn’t mean that you don’t see local differences, but often that’s just a reflection of the local content which we see.”
MarketFinder Tool is an Entity Classification Engine
In discussing Entity-First Indexing and the process by which Google may have approached it, we think it is useful to look at the tools that they have released recently, in case they can give us insights into what Google’s tech teams have been focusing on. The assumption here is that Google often seems to release cut-down versions of internal tools and technologies once they are ready to start helping marketers take advantage of the new options that Google has been focusing on in the background. The best example here is the Page Speed Insights tool, which came out after the PageSpeed server utility became available, and after the internal Google Page Speed Team had spent a couple of years helping speed up Chrome and helping webmasters speed up their own web pages.
In the past couple of months, along with the many other translation and language-oriented new releases, Google has launched the MarketFinder and promoted it to their advertising and AdWords clients (big thanks to Bill Hunt, one of the most notable PPC experts in the industry, for pointing this out to me!). In this tool, you can input a URL and it will quickly tell you what advertising categories it believes are most appropriate for the URL, as you can see below in the www.Chewy.com example; from there, it will tell you what markets and languages show the most potential for marketing and advertising success in these topics, depending on whether you sell products on the site. Then it gives you detailed information about each of the markets where it suggests you should advertise, including a country profile, economic profile, search and advertising information, online profile, purchase behavior and logistics for the country.
What is important to understand about the tool is that it is not telling you the value of the keyword, but the value of the keyword concept – the entity – based on the automatic categorization of the site. That covers the keyword and its related concepts, translated into all the relevant languages, in all the countries where people might be searching for this topic or related topics. It is ALMOST like Google published a lite version of their ‘Entity Classification Engine’ and made it available for PPC marketers to help them find the best markets for their advertising efforts – regardless of language, currency and other ideas that are often tied to countries, but are less tied to entities.
The other thing that is interesting about the tool, which could be a coincidence, or could be related to Mobile-First Indexing and Entity classification, is that it does not allow you to evaluate pages – only domains – but it evaluates domains very quickly. It is almost as if it is pulling the classification of each domain from an existing entity database – like Google already has all of the domains classified by what entities they are most closely related to. This part is still unclear, but interesting from an SEO perspective. If it is telling us exactly how a domain has been classified, we can verify that we agree with the classification, or potentially do things to try to alter the classification in future crawls.
Cloud Natural Language API Tool
The next somewhat newly released tool, and the one much of the newest translation technology has been based on, is the Google Cloud Natural Language API, which uses natural language technologies to help reveal the meaning of a text and how Google breaks it down into different linguistic structures to understand it. According to Google, the API uses the same Machine Learning technology that Google relies on for Google Search and Google Assistant. When you visit the API documentation, you can interact with the API directly, even without a project integration, by dropping text into the text box halfway down the page. The first thing it does is classify the submitted text, based on its understanding of it, as entities! The tab is even called the ‘Entities’ tab in the tool. (Those who doubt the importance of entities probably also don’t realize how hard it must have been to develop this technology for all languages around the world – the level of commitment required to develop and hone a tool like this is quite impressive!)
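To make that ‘Entities’ output more concrete, here is a minimal sketch of reading an analyzeEntities-style response. The response shape follows Google’s public Natural Language API documentation, but the sample values and the `top_entities` helper are our own invention for illustration – a real call would require a Google Cloud project and credentials.

```python
# A sample response in the shape returned by the Natural Language API's
# documents.analyzeEntities method (v1 REST). The entity names, types and
# salience values below are invented for illustration only.
sample_response = {
    "entities": [
        {"name": "MobileMoxie Toolset", "type": "CONSUMER_GOOD",
         "salience": 0.62, "metadata": {}},
        {"name": "marketing", "type": "OTHER",
         "salience": 0.23, "metadata": {}},
        {"name": "SEO", "type": "OTHER",
         "salience": 0.15, "metadata": {}},
    ],
    "language": "en",
}

def top_entities(response, min_salience=0.2):
    """Return (name, type, salience) tuples above a salience threshold,
    most central entity first."""
    picked = [
        (e["name"], e["type"], e["salience"])
        for e in response["entities"]
        if e["salience"] >= min_salience
    ]
    return sorted(picked, key=lambda t: t[2], reverse=True)

for name, etype, salience in top_entities(sample_response):
    print(f"{name} ({etype}): salience {salience:.2f}")
```

Dropping the threshold to 0 would surface the lower-salience terms too, which is roughly what the tool does when it lumps weakly classified words into ‘Other.’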
As you can see in the example below, with text taken from the MobileMoxie home page, our Toolset is somewhat correctly identified as a consumer good, though it might be better described as a ‘SaaS marketing service.’ A lot of keywords that the Cloud Natural Language API should be able to identify are identified as ‘other’ which might mean that it needs more context. It is also interesting that many of the words in the submission are totally dropped out and not evaluated at all. This probably means that these words are not impacting our entity classification at all, or at least not very much – because they did not add significant uniqueness or clarification to the text. What is interesting here, is that many of these words are classic marketing terminology, so it is possible that they are only being ignored BECAUSE something in the text has been identified as a Consumer Product.
For SEO’s, this tool might be a great way to evaluate new page copy, before it goes live, to determine how it might impact the evaluation and entity classification of a domain. If it turns out that a domain has been mis-classified, this tool might be the best option for quick guidance about how to change on-page text for a more accurate entity classification.
NOTE: Changing the capitalization on ‘MobileMoxie Toolset’ did change that classification from ‘Consumer Product’ to ‘Other’ but that did not change the number of words in the sentence that were evaluated, nor did removing the mention of the Toolset from the sentence all together.
Beyond just entity classification, another way the API reveals meaning is by determining Salience and Sentiment scores for an entity. According to Google, “Salience shows the importance or centrality of an entity to the entire document text.” In this tool, salience can probably only be evaluated based on what is submitted in the text box, using a score from 0 to 1, with 0 representing low salience and 1 representing high salience, but in any real algorithm, we are guessing that salience is measured as a relationship across multiple metrics, including the relationship to the page, to the entire domain and possibly to the larger entity as a whole, if there is one.
Sentiment isn’t defined, but it is generally agreed to be the positivity or negativity associated with a particular concept, and here Google provides a score from -1.0, which is very negative, to 1.0, which is very positive. The magnitude of this score is described as the strength of the sentiment (probably in the context of the page, or potentially at a more granular sentence level), regardless of the score.
The next part of the tool is a separate Sentiment Analysis section, which is a bit hard to understand because it uses new numbers and scoring, different from the numbers used in the Entities section of the tool. There are three sets of Sentiment and Magnitude scores. They are not labeled, so it is not entirely clear why there are three or what each of the three scores is associated with. Since only one of the Entities warranted a score of anything but 0, it is hard to know where the scores of .3 to .9 are coming from here, but a legend explains that -1.0 to -0.25 is red, presumably bad, -0.25 to 0.25 is yellow, presumably neutral, and 0.25 to 1.0 is green, presumably positive. Since this is different from the scoring used for Sentiment on the Entities tab, it is a bit hard to tell. Google offers more details about Sentiment Analysis values in separate documentation, but until the feedback from this tool is clearer, it will probably not be too useful for SEO.
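For what it’s worth, the color legend itself is simple enough to encode. This small helper applies the thresholds described above to a sentiment score; the bucket boundaries come straight from the tool’s legend, while the function name and the handling of exact boundary values are our own assumptions.

```python
def sentiment_bucket(score):
    """Map a sentiment score in [-1.0, 1.0] to the tool's color legend:
    red = negative, yellow = neutral, green = positive."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment scores range from -1.0 to 1.0")
    if score < -0.25:
        return "red"       # clearly negative
    if score <= 0.25:
        return "yellow"    # neutral / mixed
    return "green"         # clearly positive

# The unexplained scores of .3 to .9 mentioned above would all land
# in the green (positive) bucket under this legend.
for s in (0.3, 0.9, 0.0, -0.5):
    print(s, "->", sentiment_bucket(s))
```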
The next tab in this tool is very interesting – it is the Syntax evaluation. It basically breaks the sentences down, and shows how it understands each piece of it as a part of language. Using this in conjunction with the information on the Entity tab will allow you to understand how Google believes searchers are able to interact with Entities on your site.
After that is the shortest, but in my mind most important, information – the Categories. This takes whatever you have put into the tool and assigns it a Category, essentially telling you where in Google’s Knowledge Graph the information that you submitted would be classified. A full list of the categories that you can be classified under can be found here: https://cloud.google.com/natural-language/docs/categories
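As a rough sketch of what that category assignment looks like programmatically, here is how a classifyText-style response might be read. The response shape matches Google’s public documentation for the API, but the sample category paths, confidence values and the `confident_categories` helper are invented for illustration.

```python
# A sample response in the shape of the API's documents.classifyText
# method: slash-delimited category paths with confidence scores.
# The specific paths and values below are invented for illustration.
sample_response = {
    "categories": [
        {"name": "/Internet & Telecom/Web Services", "confidence": 0.87},
        {"name": "/Business & Industrial/Advertising & Marketing",
         "confidence": 0.54},
    ]
}

def confident_categories(response, threshold=0.5):
    """Return category paths the classifier is at least `threshold`
    confident about, highest confidence first."""
    cats = [
        (c["name"], c["confidence"])
        for c in response["categories"]
        if c["confidence"] >= threshold
    ]
    return sorted(cats, key=lambda c: c[1], reverse=True)

for path, conf in confident_categories(sample_response):
    print(f"{conf:.2f}  {path}")
```

For an SEO, comparing the top path here against the classification you expect for a page is the quick sanity check the article suggests.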
Two Parts of an Entity Classification Engine
While the value of these two tools to marketers might be hard to understand, their value and what they represent to Google is huge. We believe that these two tools together make up parts of what made it possible for Google to switch from the old method of indexing to Entity-First Indexing. They are basically both Entity Classification Engines that use the same core, internationally translated entity hierarchy, to either show how language and entity classification is done, in the case of the Natural Language API, or to show the financial results of entity classification for a business’s marketing plan in international markets, in the case of the MarketFinder. It is basically the upstream and downstream impacts of entity classification!
How Marketers Can Start Getting Value from the Tools
The value of these new Google tools for digital marketers is still evolving but here are some steps SEOs can take to start better understanding and using them for thinking about entities in the context of their SEO efforts:
- Make sure Google is categorizing your domain content correctly. Use the toolset to make sure that Google is classifying the most important pages on your site, like your homepage, as expected, since inaccurate classification could negatively impact your SEO. Google will struggle to display your page in the search results to the right people at the right time if Google has an incorrect and/or incomplete understanding of the page’s content. The MarketFinder tool can be used to determine how Google might be evaluating the domain as a whole, and the Cloud Natural Language API can be used to evaluate content on a page by page or tab by tab basis. If Google is classifying your site in an unexpected way, investigate which keywords on the page might be contributing to this misclassification.
- Read Google’s Natural Language API documentation about Sentiment Analysis. As described earlier in this article, the Sentiment section in the Natural Language API is not labeled clearly, so it will likely be challenging for most SEOs to use it in its current form. Google has separate documentation with more details about Sentiment Analysis that is worth checking out because it offers a bit more context, but more clarity from Google about Sentiment would be ideal. We’ll be keeping an eye open for documentation updates from Google that may help fill in the gaps.
- Learn as much as you can about “Entities” in the context of search. Entities can be a tough concept to understand, but we recommend keeping it top-of-mind. As Google moves into a new era that is focused much more on voice and cross-device interaction, entities will grow in importance, and it will be challenging to get the full value out of the Google tools without that foundational knowledge. Here are some great resources that will help you build that knowledge: the previous article in this series about “Entity-First Indexing,” this excellent article by Dave Davies about one of Google’s patents on entity relationships, this great patent breakdown by Bill Slawski, and Google’s official documentation about Analyzing Entities using the Google Natural Language API.
- Understand alternate theories about Mobile-First Indexing. MobileMoxie recently published a four-part series investigating various changes in search results and other aspects of the Google ecosystem that seem related to the switch to Mobile-First Indexing, but have not been elucidated by Google. Most SEOs and Google representatives are focusing on tactical changes and evaluations that need to be done on a website, but it is also important not to lose sight of the larger picture, and of Google’s larger, long-term goals, to understand how these changes fit into that mix. This will help you relate entities, entity search and entity indexing to your company’s larger strategy more readily.
Essential Entity Elements – Critical Requirements for Correct Classification of an Entity
Over time, Google will find new things that need to be classified as entities, or old things will need to be re-classified as different kinds of entities. SEOs will need to know 1) what type of entity they would want to be classified as, and 2) what critical requirements Google needs to find in order to classify something as a specific type of entity.
To do this, SEOs will need to determine what Google considers to be the essential elements of similar entities that are correctly classified and ranking well in their top relevant searches. The various types of entities that Google recognizes and highlights in Knowledge Graph panels will have unifying elements that change from entity type to entity type, but will be the same for similar groups of entities or types of content. For instance, local businesses have had the same requirements for a long time, generally abbreviated as NAP – Name, Address and Phone number. This could be built out to include a logo and an image of the business. In other cases, like for a movie, most movie Knowledge Graph entries have a name, cast list, run time, age rating, release date, promo art and a video trailer. If your business is not classified as a particular kind of entity, and would like to be, then this will be an important step to take.
In the long run, this model could be difficult for publishers and companies that are not original content creators, but this is probably by design. Websites that use an ‘aggregation and monetization’ model, or that survive primarily on ad revenue, will struggle more; this is Google’s model, and they don’t appreciate the competition – and it also hurts their users’ experience when they search! Google wants to raise the bar for quality content and limit the impact that low-quality contributors have on the search ecosystem. By focusing more on entities, they also focus more on original, authoritative content, so this is easily a net-positive result for them. In the short term, it could even decrease the amount of urgency around Google’s effort to provide more safety and security for searchers, and to minimize the negative impact of ads, popups, malware and other nefarious online risks.
While many SEOs, designers and developers will see moves in this direction as a huge threat, small business owners and users will probably see it as a huge benefit. Perhaps it will make the barrier to entry on the web high enough that nefarious actors will look elsewhere for spam and easy-money opportunities, and the web will become a more reliable, high-quality experience on all devices. We can only hope. In the meantime, don’t get caught up on old SEO techniques and miss what is at the top of all of your actual search results – Knowledge Graph and entities.
This is the second article in a five part series about entities and language, and their relationship to the change to Mobile-First Indexing – what we are calling Entity-First Indexing. This article focused on the tools that Google used to classify the web, and reindex everything in an entity hierarchy. The next three articles will focus on our international research, and how the various translation APIs impact search results and Entity Understanding around the world, and how personal settings impact Google’s Entity Understanding and search results on an individual basis.