Tuesday, February 11, 2020

Turing-NLG, DeepSpeed and the ZeRO optimizer


Microsoft has trained Turing-NLG, a 17-billion-parameter language model that achieves state-of-the-art perplexity. This video takes a look at the ZeRO optimizer that enabled this result: ZeRO lets you combine model and data parallelism without large cuts in training throughput (a small usage sketch follows below the links).

https://ift.tt/2OFzlJa
https://ift.tt/2ScRwYK
https://ift.tt/2S8L84U
https://ift.tt/2OMUTU5

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://ift.tt/38iX6OV
Minds: https://ift.tt/37igBpB
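For a concrete sense of how ZeRO is typically used, here is a minimal, illustrative sketch with the DeepSpeed library. It is not the Turing-NLG training setup; the toy model, batch size, and learning rate are placeholders, and the exact config keys can vary between DeepSpeed versions. The core idea it shows is that ZeRO partitions optimizer state across data-parallel workers instead of replicating it on every GPU.

```python
# Hedged sketch of ZeRO-style training with DeepSpeed (illustrative only;
# config keys and defaults depend on the DeepSpeed version installed).
import torch
import deepspeed

# A toy model standing in for a large Transformer language model.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

# ZeRO stage 1 partitions optimizer states across data-parallel workers,
# so each GPU only stores a slice of the Adam moments.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,          # placeholder value
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

# deepspeed.initialize wraps the model and optimizer with the ZeRO machinery.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# The training loop then calls the engine's backward/step, which handle the
# partitioned optimizer states and the extra communication under the hood:
#   loss = compute_loss(model_engine(batch))   # hypothetical helper
#   model_engine.backward(loss)
#   model_engine.step()
```

The point of the config is that data parallelism keeps its usual throughput profile, while the memory that would normally be duplicated on every GPU (optimizer states, and at higher ZeRO stages gradients and parameters) is sharded, which is what makes models in the tens of billions of parameters feasible.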
