Tuesday, July 21, 2020

Don't Stop Pretraining!


This video explains a study of the benefits of continued pre-training with RoBERTa. Even though RoBERTa is pre-trained on 160GB of uncompressed text drawn from a massive range of sources, the authors show further gains from continuing pre-training (masked language modeling) on text from the domain of the downstream task (e.g. large collections of Amazon reviews, news articles, computer science papers, or biomedical research papers), and additional gains from pre-training on the unlabeled data of the task itself. This task-adaptive step is especially helpful when there is unlabeled data that is more closely curated to the task than the broader domain. A sketch of the task-adaptive step follows the links below.

Paper Links:
Don't Stop Pretraining: https://ift.tt/2WEdjdt
RoBERTa: https://ift.tt/32SZycF
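
As a rough illustration of the idea, here is a minimal sketch of task-adaptive pre-training using the Hugging Face transformers and datasets libraries: continue masked language modeling on the unlabeled task text before fine-tuning. The file name "task_unlabeled.txt" and the hyperparameters are illustrative assumptions, not values from the paper.

# Minimal sketch of task-adaptive pre-training (TAPT): continue masked
# language modeling with RoBERTa on the unlabeled text of the downstream task.
# The file path and hyperparameters below are assumptions for illustration.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Unlabeled task text, one example per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "task_unlabeled.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% token masking, matching RoBERTa's masked LM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="roberta-tapt",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=1e-4,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()

# The adapted checkpoint in "roberta-tapt" is then fine-tuned on the labeled
# task data as usual (e.g. with AutoModelForSequenceClassification).

The same recipe applies to domain-adaptive pre-training; the only difference is that the training file would hold a large corpus from the task's domain rather than the task's own unlabeled examples.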
