models/official/projects/token_dropping

Token dropping aims to accelerate the pretraining of transformer models such as BERT without degrading their performance on downstream tasks.
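As a rough illustration of the core idea, the TensorFlow sketch below drops the least important tokens before the middle encoder layers and merges them back afterwards. It is a minimal sketch, not this project's actual implementation; the function names and the `importance_scores` input are illustrative assumptions.

```python
import tensorflow as tf


def drop_tokens(hidden_states, importance_scores, keep_ratio=0.5):
  """Keeps the top `keep_ratio` fraction of tokens per sequence.

  Args:
    hidden_states: [batch, seq_len, hidden] encoder activations.
    importance_scores: [batch, seq_len] per-token importance (assumed
      to come from some scoring heuristic, e.g. accumulated MLM loss).
    keep_ratio: fraction of tokens the middle layers will process.

  Returns:
    kept: [batch, num_keep, hidden] states for the retained tokens.
    keep_indices: [batch, num_keep] positions of the retained tokens.
  """
  seq_len = tf.shape(hidden_states)[1]
  num_keep = tf.cast(tf.cast(seq_len, tf.float32) * keep_ratio, tf.int32)
  # Select the most important tokens in each sequence.
  _, keep_indices = tf.math.top_k(importance_scores, k=num_keep)
  keep_indices = tf.sort(keep_indices, axis=-1)  # preserve token order
  kept = tf.gather(hidden_states, keep_indices, batch_dims=1)
  return kept, keep_indices


def merge_tokens(full_states, processed_kept, keep_indices):
  """Scatters the processed kept tokens back into the full sequence."""
  batch = tf.shape(full_states)[0]
  batch_idx = tf.broadcast_to(
      tf.range(batch)[:, tf.newaxis], tf.shape(keep_indices))
  scatter_idx = tf.stack([batch_idx, keep_indices], axis=-1)
  # Dropped tokens keep their old states; kept tokens are updated with
  # the output of the middle encoder layers.
  return tf.tensor_scatter_nd_update(full_states, scatter_idx, processed_kept)
```

Because only the retained tokens pass through the (expensive) middle layers, the per-step cost of those layers shrinks roughly in proportion to `keep_ratio`, which is where the pretraining speedup comes from.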

A BERT model pretrained with this token dropping method is no different from a BERT model pretrained in the conventional way: a BERT checkpoint pretrained with token dropping can be viewed and used as a normal BERT checkpoint, e.g. for finetuning.
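For example, such a checkpoint can be restored into a standard Model Garden encoder. This is a minimal sketch assuming the checkpoint layout matches a standard `BertEncoder`; the path and the encoder hyperparameters below are placeholders.

```python
import tensorflow as tf
from official.nlp.modeling import networks

# Build a standard BERT encoder (placeholder hyperparameters).
encoder = networks.BertEncoder(vocab_size=30522, num_layers=12)

# Restore a checkpoint produced by token-dropping pretraining as if it
# were any other BERT checkpoint, then finetune `encoder` as usual.
ckpt = tf.train.Checkpoint(encoder=encoder)
ckpt.restore("path/to/token_dropping_checkpoint").expect_partial()
```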