Thanks for your questions!
So basically BERT is a language model that can be fine-tuned for many NLP tasks. The extra inputs, such as the segment (sentence) and positional embeddings, matter mainly when fine-tuning on those tasks (e.g. SQuAD).
In this case, these extra inputs are simply ignored, since I only want to extract word features from BERT.
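For reference, here is a minimal sketch of this kind of feature extraction using the Hugging Face transformers library; the library and model name are illustrative rather than exactly what I used:

```python
import torch
from transformers import BertTokenizer, BertModel

# Illustrative model choice; any pre-trained BERT checkpoint works similarly.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()  # no fine-tuning, just feature extraction

sentence = "BERT features can be reused downstream."
inputs = tokenizer(sentence, return_tensors="pt")

# Only input_ids and the attention mask are passed here; the segment
# (token type) and positional inputs fall back to their defaults,
# i.e. they are effectively ignored for single-sentence feature extraction.
with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"],
                    attention_mask=inputs["attention_mask"])

# One contextual vector per (sub)word token: shape (1, seq_len, hidden_size)
word_features = outputs.last_hidden_state  # outputs[0] in older versions
print(word_features.shape)
```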
I have used these extracted features in a downstream, feature-based setup and it seems to work, so I hope this answers your question.
Best,
Antreas