Wednesday, April 1, 2020

Distributed TensorFlow model training on Cloud AI Platform (TF Dev Summit '20)


In 2019, the Cruise machine learning platform team worked with the Google CMLE (Cloud Machine Learning Engine) team to enable distributed TensorFlow model training with Horovod. We will present that work and the lessons we learned about training performance analysis, fault tolerance, monitoring, and cost management.

Speaker: Yang Fan - Software Engineer

Resources:
AI Platform → https://goo.gle/38olZIc
GitHub Horovod → https://goo.gle/2PUFEJo
Distributed training with TensorFlow → https://goo.gle/39wMWdY
Cruise Origin → https://goo.gle/2Tv5XIo
Watch all TensorFlow Dev Summit 2020 sessions → https://goo.gle/TFDS20
Subscribe to the TensorFlow YouTube channel → https://goo.gle/TensorFlow
