Bidirectional Encoder Representations from Transformers


Reference: Devlin, Jacob, et al. “BERT: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).

The following pre-trained BERT models are available from the gluonnlp.model.get_model API:

  book_corpus_wiki_en_uncased book_corpus_wiki_en_cased wiki_multilingual wiki_multilingual_cased wiki_cn
bert_24_1024_16 x x x

where bert_12_768_12 refers to the BERT BASE model, and bert_24_1024_16 refers to the BERT LARGE model.
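
The three numbers in each model name match the published BERT sizes: number of layers, hidden units, and attention heads (12/768/12 for BASE, 24/1024/16 for LARGE). A minimal sketch that decodes a name under that reading; `parse_bert_name` and `BertSize` are hypothetical helpers for illustration and are not part of GluonNLP:

```python
from typing import NamedTuple


class BertSize(NamedTuple):
    num_layers: int
    hidden_units: int
    num_heads: int


def parse_bert_name(name: str) -> BertSize:
    """Decode a name such as 'bert_12_768_12' into its three size fields.

    Hypothetical helper for illustration; not part of GluonNLP.
    """
    prefix, *fields = name.split("_")
    if prefix != "bert" or len(fields) != 3:
        raise ValueError(f"unexpected model name: {name!r}")
    # int() raises ValueError if a field is not numeric.
    layers, units, heads = (int(field) for field in fields)
    return BertSize(layers, units, heads)


if __name__ == "__main__":
    for model_name in ("bert_12_768_12", "bert_24_1024_16"):
        print(model_name, parse_bert_name(model_name))
```

The larger configuration has twice the layers and a wider hidden state, which is why it is generally slower and more memory-hungry to fine-tune.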

BERT for Sentence Pair Classification

GluonNLP provides the following example script for fine-tuning a pre-trained BERT model on sentence pair classification.

Download the MRPC dataset:

$ curl -L -o
$ python3 --data_dir glue_data --tasks MRPC

Use the following command to fine-tune the BERT model for classification on the MRPC dataset.

$ GLUE_DIR=glue_data python --batch_size 32 --optimizer bertadam --epochs 3 --gpu --seed 1 --lr 2e-5

It reaches a validation accuracy of 87.3%, whereas the original TensorFlow implementation gives evaluation results between 84% and 88%.
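
For context on what "sentence pair" means here: BERT reads both sentences as a single sequence, `[CLS] A [SEP] B [SEP]`, with segment ids marking which sentence each token belongs to. The sketch below illustrates that packing only. It assumes whitespace tokenization in place of BERT's WordPiece tokenizer, and `pack_pair` is a hypothetical helper, not the code used by the fine-tuning script:

```python
from typing import List, Tuple


def pack_pair(sent_a: str, sent_b: str, max_len: int = 128) -> Tuple[List[str], List[int]]:
    """Pack two sentences into one BERT-style token sequence with segment ids.

    Illustration only: whitespace splitting stands in for WordPiece.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP] tokens")
    tokens_a = sent_a.lower().split()
    tokens_b = sent_b.lower().split()
    # Three positions are reserved for the special tokens; trim the longer
    # sentence one token at a time until the pair fits.
    while len(tokens_a) + len(tokens_b) > max_len - 3:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    toks, segs = pack_pair("He said hello", "He greeted them", max_len=16)
    for tok, seg in zip(toks, segs):
        print(seg, tok)
```

The classifier then predicts a single label for the whole pair, which for MRPC is whether the two sentences are paraphrases.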