Google’s SMITH Algorithm Outranks BERT

Google recently published a research paper on a new algorithm called SMITH, which the authors claim outperforms BERT at understanding long queries and documents. What sets the new model apart is that it can comprehend passages within documents in the same way that BERT interprets words and sentences, which enables it to read and understand longer documents.

Limitations of Google’s BERT Algorithm

The BERT algorithm uses the Transformer, an attention mechanism that learns the contextual relations between words in a text. In its simplest form, a Transformer includes two separate mechanisms: an encoder that reads the text and a decoder that produces a prediction for the task. BERT uses only the encoder and is trained to predict hidden words from their context. In the past few years, self-attention based models like Transformers… and BERT have achieved strong performance in text matching. However, because the computational cost of self-attention grows quadratically with the length of the input text, they are still limited to a few sentences or one paragraph.
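To make the quadratic growth concrete, here is a minimal sketch that counts the pairwise attention scores a single self-attention head computes for a given input length. The function name `attention_score_count` is hypothetical, and the count ignores hidden dimensions, heads, and layers, so treat it as a rough illustration rather than a measurement of BERT itself.

```python
def attention_score_count(num_tokens: int) -> int:
    """Return how many token-to-token scores one self-attention head computes.

    Every token attends to every token (itself included), so the score
    matrix has num_tokens * num_tokens entries.
    """
    return num_tokens * num_tokens


if __name__ == "__main__":
    for n in (128, 512, 2048):
        print(f"{n:>5} tokens -> {attention_score_count(n):>9,} scores")
```

Quadrupling the input from 512 to 2048 tokens multiplies the number of scores by 16, which is why simply feeding longer documents to a standard model quickly becomes impractical.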

Why Is It Difficult To Comprehend Long Documents?

The researchers, Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, and Marc Najork, write in the paper:

“Compared to semantic matching between short texts, or between short and long texts, semantic matching between long texts is a more challenging task due to a few reasons:

1) When both texts are long, matching them requires a more thorough understanding of semantic relations including matching pattern between text fragments with long distance;

2) Long documents contain internal structure like sections, passages, and sentences. For human readers, document structure usually plays a key role for content understanding. Similarly, a model also needs to take document structure information into account for better document matching performance;

3) The processing of long texts is more likely to trigger practical issues like out of TPU/GPU memories without careful model design.”

According to the researchers, the problem of matching long queries to long content has not been adequately explored, and they seek to address it with the SMITH algorithm.

What Is the SMITH Algorithm?

The current model, BERT (Bidirectional Encoder Representations from Transformers), is designed to understand the full context of a word by looking at the words around it in a sentence, which allows the algorithm to better comprehend the intent behind each search query. Such algorithms are trained on data sets to predict hidden words from the context within sentences.

Likewise, the SMITH model is trained to understand passages within the context of the complete document and to predict hidden blocks of sentences from the surrounding context. According to the researchers, this training helps the algorithm understand larger documents much better than the BERT algorithm does.
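Working with passages means the document first has to be cut into blocks of sentences. The sketch below shows one simple way to do that: it packs whole sentences into a block until a token budget is reached, then starts a new block. It is an illustration only, not the researchers' implementation. The function name `split_into_sentence_blocks`, the whitespace tokenizer, the regex sentence splitter, and the default budget of 32 tokens are all assumptions made for the example.

```python
import re
from typing import List


def split_into_sentence_blocks(text: str, max_tokens_per_block: int = 32) -> List[str]:
    """Greedily pack whole sentences into blocks of at most max_tokens_per_block tokens.

    Assumptions for this sketch: sentences end with '.', '!' or '?', and a
    "token" is a whitespace-separated word. A real model would use a
    subword tokenizer instead.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    blocks: List[str] = []
    current: List[str] = []

    for sentence in sentences:
        # Truncate a sentence that would not fit in a block even on its own.
        tokens = sentence.split()[:max_tokens_per_block]
        # Close the current block if adding this sentence would exceed the budget.
        if current and len(current) + len(tokens) > max_tokens_per_block:
            blocks.append(" ".join(current))
            current = []
        current.extend(tokens)

    if current:
        blocks.append(" ".join(current))
    return blocks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight."
    for i, block in enumerate(split_into_sentence_blocks(sample, max_tokens_per_block=6)):
        print(i, block)
```

Each block can then be encoded on its own, and the resulting block representations can be combined at the document level, which is the hierarchical idea the article describes.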

They also claim that the SMITH model outperforms many state-of-the-art models, including BERT, at understanding long-form content.

They say:

“The experimental results on several benchmark datasets show that our proposed SMITH model outperforms previous state-of-the-art Siamese matching models including HAN, SMASH, and BERT for long-form document matching.
Moreover, our proposed model increases the maximum input text length from 512 to 2048 when compared with BERT-based baseline methods.”
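A back-of-the-envelope comparison helps explain why a hierarchical design can afford longer inputs. The sketch below compares the number of attention scores for one flat pass over all tokens with a two-level scheme that attends within fixed-size blocks and then across block representations. The function names and the block size of 32 are hypothetical choices for illustration, and the count ignores hidden sizes, heads, and layers, so it is not a reproduction of the paper's figures.

```python
def flat_attention_cost(num_tokens: int) -> int:
    """Scores computed when every token attends to every token."""
    return num_tokens ** 2


def hierarchical_attention_cost(num_tokens: int, block_size: int) -> int:
    """Scores computed when attention runs inside blocks, then across blocks."""
    num_blocks = -(-num_tokens // block_size)   # ceiling division
    within_blocks = num_blocks * block_size ** 2  # each block attends to itself
    across_blocks = num_blocks ** 2               # one representation per block
    return within_blocks + across_blocks


if __name__ == "__main__":
    n, b = 2048, 32  # block size is an assumed value for this example
    flat = flat_attention_cost(n)
    hier = hierarchical_attention_cost(n, b)
    print(f"flat: {flat:,}  hierarchical: {hier:,}  ratio: {flat / hier:.1f}x")
```

Under these assumptions, a 2048-token document needs roughly 60 times fewer attention scores with the two-level scheme than with a single flat pass.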

Is Google Using the SMITH Algorithm?

Google generally does not disclose which algorithms it is using. It would therefore be purely speculative to say whether SMITH is in use unless Google formally announces that it uses the algorithm to comprehend passages within web pages.