Tuesday, March 10, 2020

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators


This video explains the new Replaced Token Detection pre-training objective introduced in ELECTRA. ELECTRA is much more compute-efficient because its loss is defined over the entire input sequence, and it avoids introducing the artificial [MASK] token into the self-supervised learning task. ELECTRA-Small is trained on 1 GPU for 4 days and outperforms GPT, which was trained with 30x more compute. ELECTRA is on par with RoBERTa and XLNet at roughly 1/4 of their compute and surpasses those models given the same level of compute! Thanks for watching! Please Subscribe!

Paper Links:
ELECTRA: https://ift.tt/2oXvoFQ
BERT: https://ift.tt/2CCYvU9
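To make the Replaced Token Detection idea concrete, here is a minimal sketch of the objective, not the authors' code: the tiny embedding-plus-linear "generator" and "discriminator" modules, the 15% masking rate, and all tensor sizes are illustrative assumptions standing in for the transformer encoders used in the actual paper.

```python
# Sketch of ELECTRA's Replaced Token Detection (RTD) objective.
# Assumption: toy stand-in modules instead of real transformer encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden, seq_len, batch = 1000, 64, 16, 8

# Stand-in generator (a small masked LM head) and discriminator
# (a per-token binary classifier over the full sequence).
generator_lm_head = nn.Sequential(nn.Embedding(vocab_size, hidden),
                                  nn.Linear(hidden, vocab_size))
discriminator_head = nn.Sequential(nn.Embedding(vocab_size, hidden),
                                   nn.Linear(hidden, 1))

tokens = torch.randint(0, vocab_size, (batch, seq_len))   # original input ids
mask = torch.rand(batch, seq_len) < 0.15                  # ~15% positions selected

# 1) The generator proposes plausible replacements for the masked positions
#    (sampled, not argmax), producing a corrupted sequence with no [MASK] token.
gen_logits = generator_lm_head(tokens)                               # [B, T, V]
sampled = torch.distributions.Categorical(logits=gen_logits).sample()
corrupted = torch.where(mask, sampled, tokens)                       # [B, T]

# 2) The discriminator predicts, for EVERY token, whether it was replaced.
disc_logits = discriminator_head(corrupted).squeeze(-1)              # [B, T]
is_replaced = (corrupted != tokens).float()                          # per-token label
rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

print(rtd_loss.item())
```

Unlike BERT's masked language modeling loss, which is computed only on the ~15% of masked positions, the RTD loss above is defined over all positions of the input, which is the main source of the compute efficiency discussed in the video.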
