Jun 11, 2019
Thank you!
Well, if you don’t want to train it from scratch, then you would have to go along with the existing vocabulary of BERT. The default BERT model has a vocabulary of 30,522 tokens, which is significantly smaller than, let’s say… GloVe, but keep in mind that BERT uses the WordPiece technique for its vocabulary. This means that words are split into pieces, which allows it to cover words that are not in the vocabulary in the first place.
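As a quick illustration, here is a minimal sketch using the Hugging Face transformers library (the word used below is just an example, not something from the original discussion):

```python
from transformers import BertTokenizer

# Load the default (uncased) BERT WordPiece vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(len(tokenizer.vocab))  # 30522 entries

# A word missing from the vocabulary is split into known pieces,
# where the "##" prefix marks a continuation of the previous piece.
print(tokenizer.tokenize("embeddings"))
# e.g. ['em', '##bed', '##ding', '##s']
```

So even if a word isn’t in the vocabulary as a whole, BERT can still represent it through its pieces.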
Additionally, a good resource for understanding BERT is this:
Hope this helps. :)