Thursday, August 13, 2020

Well-Read Students Learn Better


Should you pre-train your compressed transformer model before knowledge distillation from an off-the-shelf teacher? This paper says yes, and explores a few details behind this pipeline. Thanks for watching! Please subscribe!

Paper links:
Well-Read Students Learn Better: https://ift.tt/3amnUiT
Patient Knowledge Distillation: https://ift.tt/3fNIb28
DistilBERT: https://ift.tt/2Y2cZa2
Don't Stop Pretraining: https://ift.tt/2WEdjdt
SimCLRv2: https://ift.tt/2PQ2MrW
AllenNLP MLM Demo: https://ift.tt/30TA9QV
HuggingFace Transformers: https://ift.tt/38KJ1K4
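For a concrete picture of the two-stage pipeline the video discusses, here is a minimal sketch in PyTorch with HuggingFace Transformers: first pre-train a compact student with masked language modeling, then distill from an off-the-shelf fine-tuned teacher using a standard soft-target loss. The model sizes, temperature, loss weighting, and the toy batch are illustrative assumptions, not the paper's exact recipe.

import torch
import torch.nn.functional as F
from transformers import BertConfig, BertForMaskedLM, BertForSequenceClassification

# Step 1: pre-train a compact student with masked language modeling (MLM).
# The hidden size / layer count here are arbitrary "small student" choices.
student_config = BertConfig(hidden_size=256, num_hidden_layers=6,
                            num_attention_heads=4, intermediate_size=1024)
student_mlm = BertForMaskedLM(student_config)
# ... run standard MLM pre-training of student_mlm on unlabeled text here ...

# Step 2: distill from an off-the-shelf teacher fine-tuned for the end task.
teacher = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
student = BertForSequenceClassification(student_config)
# Reuse the pre-trained encoder weights (the MLM model has no pooler, hence strict=False).
student.bert.load_state_dict(student_mlm.bert.state_dict(), strict=False)

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence (Hinton-style) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy training step on a single random batch of token IDs (illustrative only).
input_ids = torch.randint(0, student_config.vocab_size, (8, 32))
labels = torch.randint(0, 2, (8,))
with torch.no_grad():
    teacher_logits = teacher(input_ids).logits
student_logits = student(input_ids).logits
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()

In practice you would loop this over the task's training set with an optimizer; the point of the sketch is simply that the student's encoder arrives at distillation already pre-trained, rather than being distilled from scratch.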
