Andreas Pogiatzis
1 min read · May 4, 2019

Hey,

If I remember correctly, the only difference is that the labels are an array (one label per token) instead of a single value, no?
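
To illustrate what I mean, here is a rough sketch in plain Python. I am assuming the comparison is between sentence-level classification and token-level tagging such as NER; the sentence, the tag names, and the `pad_labels` helper are all made up for illustration and do not come from any library.

```python
# Hypothetical data: the sentence and tag names are illustrative only.
tokens = ["Alice", "lives", "in", "London"]

# Sentence-level classification: a single label for the whole input.
sentence_label = 1

# Token-level tagging (e.g. NER): an array with one label per token.
token_labels = ["B-PER", "O", "O", "B-LOC"]


def pad_labels(labels, max_len, pad_tag="PAD"):
    """Truncate or pad a label sequence to a fixed length (hypothetical helper)."""
    return labels[:max_len] + [pad_tag] * max(0, max_len - len(labels))


# The label array must line up with the tokens, one label each.
assert len(token_labels) == len(tokens)

# Models usually expect fixed-length inputs, so labels are padded like the tokens.
padded = pad_labels(token_labels, max_len=6)
print(padded)
```

The point is only that the target changes shape: everything else in the input pipeline can stay roughly the same, as long as the labels stay aligned with the tokens after any padding or truncation.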

Also, here is an NER example using BERT in a BiLSTM-CRF architecture. Plus, there is the option to set the BERT layer as trainable (i.e. to be fine-tuned) or not.

Note that the code was written in a rush, so there are many possible improvements. Also, some parts need further explanation, so I may as well take the chance to write a blog post about it.

Hope this helps mate.

Written by Andreas Pogiatzis

☰ PhD Candidate @ UoG ● Combining Cyber Security with Data Science ● Writing to Understand
