Google Natural Language API
Natural language processing (NLP) is the technology behind popular language applications such as:
- Google Translate
- Microsoft Word
- OK Google, Siri, Cortana, and Alexa
NLP is also the field that Google's BERT model belongs to.
The Google Natural Language API consists of the following five services.
1. Syntax Analysis
Google breaks down a query into individual words and extracts linguistic information for each of them.
For example, the query “who is the father of science?” is broken down via syntax analysis into individual parts such as:
- who: part of speech = pronoun
- is: number = singular, tense = present
- the: part of speech = determiner
- father: part of speech = noun, number = singular
- of: part of speech = preposition
- science: part of speech = noun
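To make the breakdown above concrete, here is a minimal Python sketch of what a syntax analysis request and its result can look like. It assumes the public REST endpoint of the Cloud Natural Language API (v1) and only builds the request body; nothing is sent. The helper names and SAMPLE_RESPONSE are illustrative: the sample is hand-written to mirror the list above and is not real API output.

```python
import json

# Public REST endpoint for syntax analysis (v1); shown for reference only,
# this sketch never sends a request.
SYNTAX_URL = "https://language.googleapis.com/v1/documents:analyzeSyntax"


def build_syntax_request(text):
    """Return the JSON body for a syntax analysis call on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def summarize_tokens(response):
    """List (word, part-of-speech tag, grammatical number) for each token."""
    rows = []
    for token in response.get("tokens", []):
        pos = token.get("partOfSpeech", {})
        rows.append((token["text"]["content"], pos.get("tag"), pos.get("number")))
    return rows


# Hand-written sample shaped like a response; NOT real API output.
SAMPLE_RESPONSE = {
    "tokens": [
        {"text": {"content": "who"}, "partOfSpeech": {"tag": "PRON"}},
        {"text": {"content": "is"}, "partOfSpeech": {"tag": "VERB", "number": "SINGULAR"}},
        {"text": {"content": "the"}, "partOfSpeech": {"tag": "DET"}},
        {"text": {"content": "father"}, "partOfSpeech": {"tag": "NOUN", "number": "SINGULAR"}},
        {"text": {"content": "of"}, "partOfSpeech": {"tag": "ADP"}},
        {"text": {"content": "science"}, "partOfSpeech": {"tag": "NOUN"}},
    ]
}

print(json.dumps(build_syntax_request("who is the father of science?"), indent=2))
for word, tag, number in summarize_tokens(SAMPLE_RESPONSE):
    print(word, tag, number)
```

Note that the API labels prepositions with the broader tag ADP (adposition), which is why "of" appears that way in the sample.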
2. Sentiment Analysis
Google’s sentiment analysis system assigns an emotional score to the query. Here are some examples of sentiment analysis:
Please note: The values and examples above are arbitrary and serve only to illustrate the concept of sentiment analysis. The actual algorithm that Google uses is different and confidential.
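In the public Natural Language API, a sentiment result comes as two numbers: a score between -1.0 (negative) and 1.0 (positive), and a non-negative magnitude that reflects how much emotion the text contains overall. The sketch below shows one way to turn that pair into a rough label. The function name and both cut-off values are hypothetical, picked only for illustration, and say nothing about how Google scores queries internally.

```python
def describe_sentiment(score, magnitude, neutral_band=0.25, mixed_magnitude=2.0):
    """Turn a (score, magnitude) pair into a coarse label.

    neutral_band and mixed_magnitude are hypothetical cut-offs chosen for
    illustration; they are not values published or used by Google.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1.0, 1.0]")
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    # Near-zero score: a high magnitude suggests positive and negative
    # passages cancelling out, a low magnitude suggests little emotion.
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


# Illustrative inputs only.
for score, magnitude in [(0.8, 3.0), (-0.6, 4.0), (0.0, 0.1), (0.1, 4.0)]:
    print(score, magnitude, describe_sentiment(score, magnitude))
```

Looking at magnitude as well as score matters because a score near zero can mean either a flat, unemotional text or a text whose strong positive and negative parts cancel each other out.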
3. Entity Analysis
In this process, Google picks out “entities” from a query and generally uses Wikipedia as the database for identifying them.
For example, in the query “what is the age of Selena Gomez?”, Google detects “Selena Gomez” as the entity and returns a direct answer to the searcher from Wikipedia.
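The link between entities and Wikipedia is visible in the public API as well: an entity result can carry a Wikipedia URL in its metadata, together with a salience value that indicates how central the entity is to the text. The sketch below filters a response down to the entities that have such a link. The function name is illustrative, and SAMPLE_RESPONSE is hand-written with made-up salience values; it is not real API output.

```python
def linked_entities(response):
    """Return (name, type, salience, wikipedia_url) for every entity that
    carries a Wikipedia link, most salient first."""
    rows = []
    for entity in response.get("entities", []):
        url = entity.get("metadata", {}).get("wikipedia_url")
        if url:
            rows.append(
                (entity["name"], entity.get("type"), entity.get("salience", 0.0), url)
            )
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hand-written sample shaped like a response; salience values are made up.
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "age", "type": "OTHER", "salience": 0.3, "metadata": {}},
        {
            "name": "Selena Gomez",
            "type": "PERSON",
            "salience": 0.7,
            "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Selena_Gomez"},
        },
    ]
}

for name, kind, salience, url in linked_entities(SAMPLE_RESPONSE):
    print(name, kind, salience, url)
```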
4. Entity Sentiment Analysis
Google goes a step further and identifies the sentiment expressed toward the entities in a document. While processing web pages, Google assigns a sentiment score to each entity depending on how it is used in the document. The scoring is similar to the scoring used in sentiment analysis.
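As a rough illustration, an entity sentiment result in the public API attaches a score and a magnitude to each entity rather than to the text as a whole. The sketch below lists entities from most negative to most positive. The function name is illustrative, and the sample describes a hypothetical product review with made-up values; it is not real API output.

```python
def entity_sentiment_rows(response):
    """Return (name, score, magnitude) per entity, most negative first."""
    rows = []
    for entity in response.get("entities", []):
        sentiment = entity.get("sentiment", {})
        rows.append(
            (entity["name"], sentiment.get("score", 0.0), sentiment.get("magnitude", 0.0))
        )
    return sorted(rows, key=lambda row: row[1])


# Hand-written sample for a hypothetical review such as
# "The camera is excellent but the battery is terrible"; values are made up.
SAMPLE_ENTITY_SENTIMENT = {
    "entities": [
        {"name": "camera", "sentiment": {"score": 0.9, "magnitude": 0.9}},
        {"name": "battery", "sentiment": {"score": -0.9, "magnitude": 0.9}},
    ]
}

for name, score, magnitude in entity_sentiment_rows(SAMPLE_ENTITY_SENTIMENT):
    print(name, score, magnitude)
```

A single text can therefore score close to neutral overall while individual entities inside it are rated strongly positive or negative.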
5. Text Classification
Imagine having a large database of categories and subcategories like DMOZ (a multilingual open-content directory of World Wide Web links). When DMOZ was active, it classified websites into categories, subcategories, and further nested subcategories.
This is what text classification does. Google matches web pages to the closest subcategory for the query entered by the user.
For example, for a query like “design of a butterfly,” Google might identify different subcategories like “modern art,” “digital art,” “artistic design,” “illustration,” “architecture,” etc., and then choose the closest matching subcategory.
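The "choose the closest match" step can be sketched in a few lines. In the public API, a classification result is a list of categories, each with a slash-separated path and a confidence value. The function name below is illustrative, and the category paths and confidence values in the sample are made up from the subcategories mentioned above; they are not taken from Google's actual category list.

```python
def best_category(response):
    """Return (path levels, confidence) for the highest-confidence category,
    or None when the response contains no categories."""
    categories = response.get("categories", [])
    if not categories:
        return None
    top = max(categories, key=lambda category: category.get("confidence", 0.0))
    # "/Arts/Digital Art" -> ["Arts", "Digital Art"]
    levels = [part for part in top["name"].split("/") if part]
    return levels, top.get("confidence", 0.0)


# Hand-written sample; paths and confidences are hypothetical.
SAMPLE_RESPONSE = {
    "categories": [
        {"name": "/Arts/Illustration", "confidence": 0.41},
        {"name": "/Arts/Digital Art", "confidence": 0.62},
        {"name": "/Architecture", "confidence": 0.17},
    ]
}

print(best_category(SAMPLE_RESPONSE))
```

Splitting the path into levels keeps the category and subcategory nesting, much like the DMOZ hierarchy described earlier.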
In the words of Google:
“One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples.”
To address the shortage of training data, Google designed AutoML Natural Language, which lets users create customized machine learning models from their own labeled examples. Google's BERT model tackles the same problem from another direction: it is pre-trained on large amounts of unlabeled text and then fine-tuned on smaller task-specific datasets.
Please note: The Google BERT model understands the context of a webpage and presents the best documents to the searcher. Don’t think of BERT only as a method to refine search queries; it is also a way of understanding the context of the text contained in web pages.