Tuesday, October 27, 2020

Self-Training improves Pre-Training for Natural Language Understanding


This video explains a new paper showing that Self-Training applied after language-model pre-training further improves the performance of RoBERTa-Large. The paper also reports Self-Training gains for Knowledge Distillation and Few-Shot Learning. In addition, the authors introduce SentAugment, an unlabeled-data filtering algorithm that retrieves task-relevant sentences from a large web corpus, improving performance and reducing the computational cost of the self-training loop (a rough sketch of this loop follows the paper links below). Thanks for watching! Please Subscribe!

Paper Links:
Paper Link: https://ift.tt/2JcWhzt
Distributed Representations of Words and Phrases: https://ift.tt/1PAG0Kt
Rethinking Pre-training and Self-training: https://ift.tt/2ULTfFp
Don't Stop Pretraining: https://ift.tt/2WEdjdt
Universal Sentence Encoder: https://ift.tt/2uwxVZJ
Common Crawl Corpus: https://ift.tt/1St4m0m
Fairseq: https://ift.tt/2K3FbUs
BERT: https://ift.tt/2pMXn84
Noisy Student: https://ift.tt/2Q8GfYV
POET: https://ift.tt/2xUnFwp
PET - Small Language Models are Also Few-Shot Learners: https://ift.tt/3mGNGV1
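To make the idea concrete, here is a minimal sketch of SentAugment-style self-training, not the authors' implementation: a sentence encoder retrieves unlabeled sentences that are close to the task's labeled data, a teacher fine-tuned on the labeled data pseudo-labels them, and a student is trained on the combined set. The encoder (sentence-transformers stands in for the paper's sentence embedder), the linear probes standing in for RoBERTa-Large fine-tuning, the toy data, and the similarity threshold are all assumptions for illustration.

```python
# Minimal sketch of SentAugment-style self-training (illustrative, not the paper's code).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Toy labeled task data and a (normally web-scale) unlabeled sentence bank.
labeled_texts = ["the movie was wonderful", "a dull, lifeless film"]
labels = np.array([1, 0])
unlabeled_bank = [
    "an absolute joy to watch",
    "I fell asleep halfway through",
    "the quarterly earnings report was released today",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in sentence encoder
emb_labeled = encoder.encode(labeled_texts, normalize_embeddings=True)
emb_bank = encoder.encode(unlabeled_bank, normalize_embeddings=True)

# 1) SentAugment-style retrieval: keep unlabeled sentences whose embedding is
#    close to the task embedding (here, the mean of the labeled embeddings).
task_embedding = emb_labeled.mean(axis=0)
scores = emb_bank @ task_embedding
keep = scores > 0.3                      # similarity threshold (assumption)
emb_retrieved = emb_bank[keep]

# 2) Teacher: fit on the labeled data (a linear probe stands in for
#    fine-tuning RoBERTa-Large), then pseudo-label the retrieved sentences.
teacher = LogisticRegression().fit(emb_labeled, labels)
pseudo_labels = teacher.predict(emb_retrieved)

# 3) Student: train on labeled + pseudo-labeled data (the self-training step).
X = np.vstack([emb_labeled, emb_retrieved])
y = np.concatenate([labels, pseudo_labels])
student = LogisticRegression().fit(X, y)

print(student.predict(encoder.encode(["what a great film"], normalize_embeddings=True)))
```

Filtering the bank before pseudo-labeling is what keeps the loop cheap: the teacher only has to label the retrieved, task-relevant sentences rather than the entire corpus.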
