Wednesday, July 29, 2020

Self-training with Noisy Student improves ImageNet classification (Paper Explained)


Data on the internet is abundant; unlabeled images in particular are plentiful and can be collected with ease. This paper investigates a new method for incorporating unlabeled data into a supervised learning pipeline. First, a teacher model is trained in a supervised fashion. Then, that teacher is used to label the unlabeled data. Next, a larger student model is trained on the combination of all data and achieves better performance than the teacher by itself. OUTLINE: 0:00 - Intro & Overview 1:05 - Semi-Supervised & Transfer Learning 5:45 - Self-Training & Knowledge Distillation 10:00 - Noisy Student Algorithm Overview 20:20 - Noise Methods 22:30 - Dataset Balancing 25:20 - Results 30:15 - Perturbation Robustness 34:35 - Ablation Studies 39:30 - Conclusion & Comments Paper: https://ift.tt/2Q8GfYV Code: https://ift.tt/2X9cbyR Models: https://ift.tt/2Mopwjh Abstract: We present Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. On ImageNet, we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher. Models are available at this https URL. Code is available at this https URL. Authors: Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar (preferred to Patreon): https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
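
As a rough illustration of the training loop described above, here is a minimal PyTorch-style sketch of one self-training round. The `teacher`, `student`, data loaders and optimizer are hypothetical placeholders, and the student's noise (dropout, stochastic depth, RandAugment) is assumed to live inside the model and data pipeline rather than being shown explicitly; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def noisy_student_round(teacher, student, labeled_loader, unlabeled_loader,
                        optimizer, epochs=1, device="cpu"):
    """One round of Noisy Student-style self-training (illustrative sketch).
    Assumes `teacher` and `student` are image classifiers returning logits and
    that `student` already contains its own noise (dropout, stochastic depth,
    augmented inputs). All names are hypothetical."""
    teacher.eval()
    # 1) Teacher generates (soft) pseudo labels on unlabeled data, without noise.
    pseudo = []
    with torch.no_grad():
        for x in unlabeled_loader:
            probs = F.softmax(teacher(x.to(device)), dim=-1)
            pseudo.append((x, probs))

    # 2) An equal-or-larger student trains on labeled + pseudo-labeled data.
    student.train()
    for _ in range(epochs):
        for (x_l, y_l), (x_u, p_u) in zip(labeled_loader, pseudo):
            logits_l = student(x_l.to(device))       # noised forward pass
            logits_u = student(x_u.to(device))
            loss = F.cross_entropy(logits_l, y_l.to(device)) \
                 + F.kl_div(F.log_softmax(logits_u, dim=-1), p_u.to(device),
                            reduction="batchmean")   # match soft pseudo labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student  # becomes the teacher for the next iteration
```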

Tuesday, July 28, 2020

An AI Learned To See Through Obstructions!


❤️ Check out Snap's Residency Program and apply here: https://ift.tt/3jfqDPm ❤️ Try Snap's Lens Studio here: https://ift.tt/2ArswSh 📝 The paper "Learning to See Through Obstructions" is available here: https://ift.tt/332MIe9 https://ift.tt/2x1u7S7 📝 Try it out here: https://ift.tt/3g6ZgVC  🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Thumbnail background image credit: https://ift.tt/2P6bGRT Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Tren Black - Siraj Raval Podcast #6


Podcast #6 is with Tren Black. Tren is a tech youtuber and one of my more vocal critics. We talk about the pros & cons of online courses, Computer Science, our backgrounds, the youtube lifestyle, and he freestyle raps about data structures at the end. I hope you enjoy the conversation as much as I did. The future of Education depends on all of us irreverently pushing the boundaries of what is perceived as normal by our peers. Education is not their word to define, it’s yours. We have to learn how to learn using the Internet, questioning every archaic tool that’s being used now, from the physical lecture hall to the proctored exam. Gamify your learning and don’t ever let anyone tell you that you can’t. Computer Science is my one true love, and I refuse to accept that all the ways it's being taught now are enough to educate the world. I want every man, woman, and child on this planet to have basic Computer Science literacy. Knowledge and action are inextricably connected, and the best way to prepare ourselves for the technological changes happening across every spectrum of our lives is with this knowledge. We have to completely reinvent Computer Science Education these next few years, and then the rest of Education will follow. Subscribe for more educational videos about Computer Science!

Sunday, July 26, 2020

[Classic] Playing Atari with Deep Reinforcement Learning (Paper Explained)


#ai #dqn #deepmind After the initial success of deep neural networks, especially convolutional neural networks on supervised image processing tasks, this paper was the first to demonstrate their applicability to reinforcement learning. Deep Q Networks learn from pixel input to play seven different Atari games and outperform baselines that require hand-crafted features. This paper kicked off the entire field of deep reinforcement learning and positioned DeepMind as one of the leading AI companies in the world. OUTLINE: 0:00 - Intro & Overview 2:50 - Arcade Learning Environment 4:25 - Deep Reinforcement Learning 9:20 - Deep Q-Learning 26:30 - Experience Replay 32:25 - Network Architecture 33:50 - Experiments 37:45 - Conclusion Paper: https://ift.tt/2dCgzS3 Abstract: We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them. Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar (preferred to Patreon): https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
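
For readers who want the gist of the update rule, here is a minimal PyTorch sketch of Q-learning with experience replay. The `q_net`, optimizer and stored transitions are hypothetical; the paper's Atari frame preprocessing, epsilon-greedy exploration and exact hyperparameters are omitted.

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

buffer = deque(maxlen=100_000)  # experience replay memory

def store(s, a, r, s_next, done):
    buffer.append((s, a, r, s_next, done))

def dqn_update(q_net, optimizer, batch_size=32, gamma=0.99):
    """One stochastic Q-learning step on a minibatch sampled uniformly
    from the replay buffer (illustrative; names are hypothetical)."""
    if len(buffer) < batch_size:
        return
    s, a, r, s2, done = zip(*random.sample(buffer, batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a)
    with torch.no_grad():                                    # bootstrap target
        target = r + gamma * (1 - done) * q_net(s2).max(dim=1).values
    loss = F.smooth_l1_loss(q_sa, target)                    # clipped error term
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```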

Saturday, July 25, 2020

AI Creates Dogs From Cats…And More!


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation of this paper is available here: https://ift.tt/2EjYXZH 📝 The paper "StarGAN v2: Diverse Image Synthesis for Multiple Domains" is available here: - Paper: https://ift.tt/2Sc4UvH - Code: https://ift.tt/2Yg4uWW - Youtube Video: https://youtu.be/0EVh5Ki4dIY The paper with the latent space material synthesis is available here: https://ift.tt/2HhNzx5  🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Friday, July 24, 2020

Momentum Predictive Representations Explained!


This video explains Momentum Predictive Representations, the latest advance in data-efficient reinforcement learning, which adds an auxiliary contrastive self-supervised learning loss. This is a very interesting setup of the contrastive learning problem: it enforces temporal consistency rather than only comparing augmented views of the same image. Thanks for watching! Please Subscribe! Paper Links: Momentum Predictive Representations: https://ift.tt/2Ei9qF1 Momentum Contrastive Learning: https://ift.tt/2xtZ81r CURL: https://ift.tt/3dLGpxF Bootstrap your own Latent: https://ift.tt/38m1mhx Can RL from Pixels be as Efficient as RL from State? https://ift.tt/3eKC8uu MuZero: https://ift.tt/37lLv1o Offline RL survey: https://ift.tt/2ZVnqg3 ICML 2020 Model-Based RL: https://ift.tt/3fGKp44 RL with Augmented Data: https://ift.tt/2YqpgFo Chapters 0:00 Beginning 0:13 Quick Overview 2:52 Data-Efficient Deep RL 3:48 Directions to Data-Efficient RL 5:58 Momentum Contrastive Learning (MoCo) 7:18 Bootstrap your own Latent (Power of Contrast w/ Online and Target Network) 8:14 CURL, using this in RL 9:08 Temporal Contrastive Loss 10:02 Algorithm Pseudocode Walkthrough 12:52 Ablations 13:20 MPR for Monte Carlo Tree Search as in MuZero - Future Work
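
The general pattern behind this kind of temporal self-supervision can be sketched as follows: an online encoder predicts the representation that a slowly moving (EMA) target encoder assigns to a future observation. This is an illustrative, BYOL-style sketch with hypothetical module names, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def ema_update(target_net, online_net, tau=0.99):
    """Momentum/EMA update of the target encoder's parameters."""
    with torch.no_grad():
        for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
            p_t.mul_(tau).add_((1 - tau) * p_o)

def temporal_consistency_loss(online_enc, target_enc, predictor, obs_t, obs_tk):
    """Predict the (EMA) target representation of a future observation from the
    online representation of the current one; cosine-similarity loss in the
    style of BYOL-like objectives (illustrative sketch, hypothetical modules)."""
    z_pred = predictor(online_enc(obs_t))       # prediction of the future latent
    with torch.no_grad():
        z_target = target_enc(obs_tk)           # stop-gradient target
    return 2 - 2 * F.cosine_similarity(z_pred, z_target, dim=-1).mean()
```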

Sentdex Channel Update


Long time no video! https://nnfs.io Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join Discord: https://ift.tt/2AZiVqD Support the content: https://ift.tt/2qsKFOO Twitter: https://twitter.com/sentdex Instagram: https://ift.tt/2J4Oa4h Facebook: https://ift.tt/1OI3cwB Twitch: https://ift.tt/2pcWGaq

Thursday, July 23, 2020

Distribution Augmentation for Generative Modeling


This video explains a recent paper from OpenAI exploring how to improve generative models with data augmentation. DistAug conditions models on the transformation in a multi-task learning way. This results in improved performance, particularly with more parameters, more augmentations, and less dropout! Thanks for watching! Please Subscribe! Paper Links: DistAug: https://ift.tt/2P2ZuBv ImageGPT: https://ift.tt/2Yap1hh A Survey on Image Data Augmentation: https://ift.tt/2DJXJDs Training GANs with Limited Data: https://ift.tt/3e8wmU8

Inside TensorFlow: Quantization aware training


In this episode of Inside TensorFlow, Software Engineer Pulkit Bhuwalka presents quantization aware training. In this tutorial, Pulkit takes us through the fundamentals of quantization aware training, the TensorFlow/Keras API used to achieve it, and how it is implemented. Add the Inside TensorFlow playlist → https://goo.gle/Inside-TensorFlow Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow
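
A minimal sketch of what quantization aware training looks like with the TensorFlow Model Optimization toolkit (API names as of TF 2.x; check the current docs before relying on them):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Build (or load) a normal float Keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Wrap it with fake-quantization ops so training "sees" quantization error.
q_model = tfmot.quantization.keras.quantize_model(model)
q_model.compile(optimizer="adam",
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=["accuracy"])

# q_model.fit(x_train, y_train, epochs=1)   # train as usual, then convert:
# converter = tf.lite.TFLiteConverter.from_keras_model(q_model)
# converter.optimizations = [tf.lite.Optimize.DEFAULT]
# tflite_model = converter.convert()
```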

[Classic] ImageNet Classification with Deep Convolutional Neural Networks (Paper Explained)


#ai #research #alexnet AlexNet was the start of the deep learning revolution. Up until 2012, the best computer vision systems relied on hand-crafted features and highly specialized algorithms to perform object classification. This paper was the first to successfully train a deep convolutional neural network on not one, but two GPUs and managed to outperform the competition on ImageNet by an order of magnitude. OUTLINE: 0:00 - Intro & Overview 2:00 - The necessity of larger models 6:20 - Why CNNs? 11:05 - ImageNet 12:05 - Model Architecture Overview 14:35 - ReLU Nonlinearities 18:45 - Multi-GPU training 21:30 - Classification Results 24:30 - Local Response Normalization 28:05 - Overlapping Pooling 32:25 - Data Augmentation 38:30 - Dropout 40:30 - More Results 43:50 - Conclusion Paper: https://ift.tt/10K1otj Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar (preferred to Patreon): https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
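
For reference, here is a compact, single-GPU PyTorch rendition of the ingredients the paper emphasizes (ReLU nonlinearities, overlapping max pooling, dropout in the fully connected layers). It deliberately omits the original two-GPU split and local response normalization, so it is an approximation rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AlexNetLike(nn.Module):
    """Simplified AlexNet-style classifier: ReLU nonlinearities, overlapping
    3x3/stride-2 max pooling, and dropout in the fully connected layers."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )
    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

# logits = AlexNetLike()(torch.randn(1, 3, 227, 227))   # -> shape (1, 1000)
```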

Wednesday, July 22, 2020

Contrastive Clustering with SwAV


This video explains a new algorithm combining self-supervised contrastive learning with clustering to learn better representations. This is done by predicting cluster assignments from image features, and it comes extremely close in performance to fully supervised learning when constrained to the ResNet-50 architecture. Thanks for watching! Please Subscribe! Paper Links: Facebook AI Blog Post on this Paper: https://ift.tt/2WElGFZ Contrasting Cluster Assignments: https://ift.tt/3dfRmqK Salesforce Blog Prototypical Contrastive Learning: https://ift.tt/2WZKQ1p Prototypical Contrastive Learning Paper: https://ift.tt/3fS5Zmj Supervised Contrastive Learning: https://ift.tt/2ZOe3ie SimCLR: https://ift.tt/31TZZTM Bootstrap your own Latent: https://ift.tt/38m1mhx CURL: https://ift.tt/3dLGpxF
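
A heavily simplified sketch of the swapped-prediction idea: each augmented view has to predict the cluster code assigned to the other view under a shared set of prototypes. The real method computes the codes with the Sinkhorn-Knopp algorithm under an equipartition constraint (and uses a queue of features); both are omitted here, so treat this purely as an illustration.

```python
import torch
import torch.nn.functional as F

def swapped_prediction_loss(z1, z2, prototypes, temp=0.1):
    """Simplified swapped-assignment loss: each view predicts the (soft)
    cluster code of the other view under shared prototypes. Real SwAV obtains
    the codes q via Sinkhorn-Knopp equipartitioning; here they are plain
    detached softmaxes for illustration only."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    c = F.normalize(prototypes, dim=1)                    # (K, d) prototypes
    p1, p2 = z1 @ c.t() / temp, z2 @ c.t() / temp         # scores per prototype
    q1, q2 = F.softmax(p1, dim=1).detach(), F.softmax(p2, dim=1).detach()
    return -(q2 * F.log_softmax(p1, dim=1)).sum(1).mean() \
           -(q1 * F.log_softmax(p2, dim=1)).sum(1).mean()

# z1, z2 = encoder(aug1(x)), encoder(aug2(x))       # two augmented views (hypothetical)
# prototypes = torch.nn.Parameter(torch.randn(3000, 128))
```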

Tuesday, July 21, 2020

This AI Creates Beautiful 3D Photographs!


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation of this paper is available here: https://ift.tt/39emCpQ 📝 The paper "3D Photography using Context-aware Layered Depth Inpainting" is available here: https://ift.tt/3ebIHHr Try it out! Weights & Biases notebook: https://ift.tt/3hqG6um Or try it out here - Author notebook: https://ift.tt/2WGelWu  🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Powered by TensorFlow: Air Cognizer predicts air quality with machine learning


Machine Learning is helping to solve challenging, real-world problems around the world. Poor air quality is a major obstacle that many face. See how Air Cognizer is using Machine Learning, powered by TensorFlow Lite, to help predict air quality from user images, eliminating the need for expensive, portable air quality meters. Check out the blog → https://goo.gle/2DWHcQ5 Check out the website → https://goo.gle/2CqagyG Add the Powered by TensorFlow playlist → https://goo.gle/Powered-by-TensorFlow Subscribe to TensorFlow → https://goo.gle/TensorFlow

Don't Stop Pretraining!


This video explains a study on the benefits of continued pre-training with RoBERTa. Even though RoBERTa is trained on 160GB of uncompressed text from a massive range of sources, the authors show gains from continuing pre-training in the domain of the downstream task (e.g., massive collections of Amazon reviews, news articles, computer science or biomedical research papers), and further gains from pre-training (masked language modeling) on the data of the task itself (especially helpful when there is unlabeled data that is better curated for that task than the broader "domain" of the task). Thanks for watching! Please Subscribe! Paper Links: Don't Stop Pretraining: https://ift.tt/2WEdjdt RoBERTa: https://ift.tt/32SZycF
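
A hedged sketch of what task-adaptive pretraining looks like with the Hugging Face libraries: continue masked language modeling on the task's own unlabeled text before fine-tuning. The corpus path and training arguments are made-up placeholders, not the paper's setup.

```python
from transformers import (RobertaTokenizerFast, RobertaForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tok = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Unlabeled, task-specific text (hypothetical file path).
raw = load_dataset("text", data_files={"train": "my_task_corpus.txt"})
def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=512)
dataset = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking of 15% of tokens, as in RoBERTa's MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tapt-roberta", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()   # afterwards, fine-tune the adapted encoder on the labeled task
```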

Neural Architecture Search without Training (Paper Explained)


#ai #research #machinelearning Neural Architecture Search is typically very slow and resource-intensive. A meta-controller has to train many hundreds or thousands of different models to find a suitable building plan. This paper proposes to use statistics of the Jacobian around data points to estimate the performance of proposed architectures at initialization. This method does not require training and speeds up NAS by orders of magnitude. OUTLINE: 0:00 - Intro & Overview 0:50 - Neural Architecture Search 4:15 - Controller-based NAS 7:35 - Architecture Search Without Training 9:30 - Linearization Around Datapoints 14:10 - Linearization Statistics 19:00 - NAS-201 Benchmark 20:15 - Experiments 34:15 - Conclusion & Comments Paper: https://ift.tt/2MWkfOd Code: https://ift.tt/2WJxYNz Abstract: The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be extremely slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be remedied if we could infer a network's trained accuracy from its initial state. In this work, we examine how the linear maps induced by data points correlate for untrained network architectures in the NAS-Bench-201 search space, and motivate how this can be used to give a measure of modelling flexibility which is highly indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU. Code to reproduce our experiments is available at this https URL. Authors: Joseph Mellor, Jack Turner, Amos Storkey, Elliot J. Crowley Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar (preferred to Patreon): https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
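
To make the idea concrete, here is a rough sketch of the kind of statistic involved: the correlation of per-example input Jacobians across a minibatch, computed at initialization. The scoring function below is a simplified heuristic for illustration; the paper's exact score is computed differently.

```python
import torch

def jacobian_correlation(net, x):
    """Correlation matrix of per-example input Jacobians at initialization.
    The output is reduced by summing, which is a simplification; illustrative
    only, not the paper's exact computation."""
    x = x.clone().requires_grad_(True)
    net(x).sum().backward()
    J = x.grad.flatten(1)                        # (batch, input_dim) rows
    J = J - J.mean(dim=1, keepdim=True)
    J = J / (J.norm(dim=1, keepdim=True) + 1e-8)
    return J @ J.t()                             # (batch, batch) correlations

def score(net, x):
    """Heuristic: architectures whose Jacobians are less correlated across data
    points (correlation matrix closer to identity) tend to train better."""
    C = jacobian_correlation(net, x)
    off_diag = C - torch.diag(torch.diag(C))
    return -off_diag.abs().mean()                # higher is better (assumption)

# Rank candidate architectures by score(candidate, minibatch), with no training.
```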

Monday, July 20, 2020

CheckList Explained! (ACL 2020 Best Paper)


This video explains CheckList, winner of the ACL 2020 best paper award! CheckList is a system to explore the linguistic capabilities of natural language processing models. Rather than relying on accuracy metrics and held-out test sets, CheckList generates explicit tests of linguistic phenomena like temporal understanding, negations, vocabulary, fairness, and more! Thanks for watching, please subscribe! Paper Links: CheckList: https://ift.tt/2WFvkIx Adversarial Examples for Evaluating Reading Comprehension: https://ift.tt/2x9BYvT SuperGLUE: https://ift.tt/2TH7wBi GPT-3 React App Tweet: https://twitter.com/sharifshameem/status/1284095222939451393 HuggingFace NLP Viewer: https://ift.tt/3fK2RJ2 SQuAD: https://ift.tt/2cwaHbg Microsoft Text Analytics API: https://ift.tt/23LzIl0 Pattern-Exploiting Training: https://ift.tt/2X7MuPK
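
A tiny, self-contained illustration of the idea (not the actual checklist library API): fill templates to generate minimum functionality and invariance tests, then count a model's failures.

```python
from itertools import product

def fill(template, **slots):
    """Expand a template over all combinations of slot values."""
    keys, values = zip(*slots.items())
    return [template.format(**dict(zip(keys, combo))) for combo in product(*values)]

# Minimum Functionality Test (MFT): simple negation should flip sentiment.
negation_tests = fill("I {negation} {verb} the {thing}.",
                      negation=["don't", "do not", "never"],
                      verb=["like", "enjoy", "recommend"],
                      thing=["food", "service", "movie"])

# Invariance test (INV): swapping a name should not change the prediction.
inv_pairs = [("Mark was great.", "Maria was great.")]

def run_mft(predict, tests, expected="negative"):
    failures = [t for t in tests if predict(t) != expected]
    print(f"MFT failure rate: {len(failures)}/{len(tests)}")

def run_inv(predict, pairs):
    failures = [(a, b) for a, b in pairs if predict(a) != predict(b)]
    print(f"INV failure rate: {len(failures)}/{len(pairs)}")

# run_mft(my_sentiment_model, negation_tests)   # `my_sentiment_model` is hypothetical
```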

Sunday, July 19, 2020

[Classic] Generative Adversarial Networks (Paper Explained)


#ai #deeplearning #gan GANs are one of the main models in modern deep learning. This is the paper that started it all! While the task of image classification was making progress, the task of image generation was still cumbersome and prone to artifacts. The main idea behind GANs is to pit two competing networks against each other, thereby creating a generative model that only ever has implicit access to the data through a second, discriminative, model. The paper combines architecture, experiments, and theoretical analysis beautifully. OUTLINE: 0:00 - Intro & Overview 3:50 - Motivation 8:40 - Minimax Loss Function 13:20 - Intuition Behind the Loss 19:30 - GAN Algorithm 22:05 - Theoretical Analysis 27:00 - Experiments 33:10 - Advantages & Disadvantages 35:00 - Conclusion Paper: https://ift.tt/2bW7dzq Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. Authors: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
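
A compact sketch of the minimax game from the abstract, alternating a discriminator step with a generator step. It uses the non-saturating generator loss (maximize log D(G(z))), as the paper recommends for practical training; G and D are placeholder modules.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, z_dim=64):
    """One alternating update of the GAN minimax game (illustrative sketch).
    G: z -> sample, D: sample -> probability in (0, 1); both ordinary nn.Modules."""
    b = real.size(0)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))).
    z = torch.randn(b, z_dim)
    fake = G(z).detach()
    d_loss = F.binary_cross_entropy(D(real), torch.ones(b, 1)) + \
             F.binary_cross_entropy(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: maximize log D(G(z)) instead of minimizing log(1 - D(G(z))).
    z = torch.randn(b, z_dim)
    g_loss = F.binary_cross_entropy(D(G(z)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```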

Friday, July 17, 2020

Can an AI Learn Lip Reading?


❤️ Check out Snap's Residency Program and apply here: https://ift.tt/3jfqDPm ❤️ Try Snap's Lens Studio here: https://ift.tt/2ArswSh 📝 The paper "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis" is available here: https://ift.tt/3hblVAl Our earlier video on the "bag of chips" sound reconstruction is available here: https://www.youtube.com/watch?v=2i1hrywDwPo 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Thumbnail background image credit: https://ift.tt/3hbg8Lc Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Thursday, July 16, 2020

[Classic] Word2Vec: Distributed Representations of Words and Phrases and their Compositionality


#ai #research #word2vec Word vectors have been one of the most influential techniques in modern NLP to date. This paper describes Word2Vec, which is the most popular technique to obtain word vectors. The paper introduces the negative sampling technique as an approximation to noise contrastive estimation and shows that this allows the training of word vectors from giant corpora on a single machine in a very short time. OUTLINE: 0:00 - Intro & Outline 1:50 - Distributed Word Representations 5:40 - Skip-Gram Model 12:00 - Hierarchical Softmax 14:55 - Negative Sampling 22:30 - Mysterious 3/4 Power 25:50 - Frequent Words Subsampling 28:15 - Empirical Results 29:45 - Conclusion & Comments Paper: https://ift.tt/2cP9cCD Code: https://ift.tt/20qhZ0S Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible. Authors: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
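
A minimal NumPy sketch of one skip-gram-with-negative-sampling update for a single (center, context) pair. The noise distribution is left as an argument; the paper uses the unigram distribution raised to the 3/4 power, while the runnable example below just plugs in a uniform one.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k = 10_000, 100, 5                         # vocab size, embedding dim, negatives
W_in  = 0.01 * rng.standard_normal((V, d))       # center-word ("input") vectors
W_out = 0.01 * rng.standard_normal((V, d))       # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, noise_dist, lr=0.025):
    """One SGD step of skip-gram with negative sampling for one (center, context)
    pair: logistic loss on the true pair plus k sampled noise words."""
    negatives = rng.choice(V, size=k, p=noise_dist)
    v_c = W_in[center].copy()
    grad_c = np.zeros(d)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(u @ v_c) - label             # gradient of the logistic loss
        grad_c += g * u
        W_out[word] = u - lr * g * v_c
    W_in[center] -= lr * grad_c

# Uniform noise distribution just to make the sketch runnable:
# sgns_step(center=42, context=7, noise_dist=np.full(V, 1.0 / V))
```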

Wednesday, July 15, 2020

Why TensorFlow?


TensorFlow is an open-source end-to-end platform for Machine Learning. It provides a comprehensive ecosystem of tools for developers, enterprises, and researchers who want to push the state of the art of Machine Learning and build scalable ML powered applications. Make sure to subscribe to our YouTube channel to stay up to date! Learn more at → https://tensorflow.org Featured Series: Coding TensorFlow → https://goo.gle/Coding-TensorFlow Inside TensorFlow → https://goo.gle/Inside-TensorFlow Natural Language Processing Zero to Hero → https://goo.gle/nlp-z2h TensorFlow Meets → https://goo.gle/TensorFlow-Meets Ask TensorFlow → https://goo.gle/AskTensorFlow Powered by TensorFlow → https://goo.gle/Powered-by-TensorFlow Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow

Tuesday, July 14, 2020

[Classic] Deep Residual Learning for Image Recognition (Paper Explained)


#ai #research #resnet ResNets are one of the cornerstones of modern Computer Vision. Before their invention, people were not able to scale deep neural networks beyond 20 or so layers, but with this paper's invention of residual connections, all of a sudden networks could be arbitrarily deep. This led to a big spike in the performance of convolutional neural networks and rapid adoption in the community. To this day, ResNets are the backbone of most vision models and residual connections appear all throughout deep learning. OUTLINE: 0:00 - Intro & Overview 1:45 - The Problem with Depth 3:15 - VGG-Style Networks 6:00 - Overfitting is Not the Problem 7:25 - Motivation for Residual Connections 10:25 - Residual Blocks 12:10 - From VGG to ResNet 18:50 - Experimental Results 23:30 - Bottleneck Blocks 24:40 - Deeper ResNets 28:15 - More Results 29:50 - Conclusion & Comments Paper: https://ift.tt/1UzqRP9 Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
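
The core idea fits in a few lines of PyTorch: a basic residual block computes two 3x3 convolutions and adds the (possibly projected) input back in, so the layers only have to learn a residual. The bottleneck block used in the deeper ResNets is omitted here.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet basic block: out = ReLU(F(x) + shortcut(x)), where F is two 3x3
    conv/BN layers. The identity shortcut becomes a 1x1 convolution when the
    number of channels or the stride changes."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1   = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2   = nn.BatchNorm2d(out_ch)
        self.relu  = nn.ReLU(inplace=True)
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + self.shortcut(x))   # the residual connection

# y = BasicBlock(64, 128, stride=2)(torch.randn(1, 64, 56, 56))  # -> (1, 128, 28, 28)
```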

Sunday, July 12, 2020

I'M TAKING A BREAK... (Channel Update July 2020)


Past, Present & Future of this Channel. OUTLINE: 0:00 - I'm going on a break 0:20 - Channel Stats 1:20 - Other Platforms 4:20 - Drama Videos 5:30 - Flatland 8:40 - SpineNet Thumbnail 9:55 - Future Content 12:55 - How do I select papers? 15:50 - Financial Support, Ads & Merch 18:50 - Conclusion Our Flatland Repo: https://ift.tt/2YglTz5 Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C LinkedIn: https://ift.tt/2Zo6XRA If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar (preferred to Patreon): https://ift.tt/2DuKOZ3 Patreon: https://ift.tt/390ewRH Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Saturday, July 11, 2020

Deep Ensembles: A Loss Landscape Perspective (Paper Explained)


#ai #research #optimization Deep Ensembles work surprisingly well for improving the generalization capabilities of deep neural networks. Notably, they outperform Bayesian neural networks, which are - in theory - doing the same thing. This paper investigates how Deep Ensembles are especially suited to capturing the non-convex loss landscape of neural networks. OUTLINE: 0:00 - Intro & Overview 2:05 - Deep Ensembles 4:15 - The Solution Space of Deep Networks 7:30 - Bayesian Models 9:00 - The Ensemble Effect 10:25 - Experiment Setup 11:30 - Solution Equality While Training 19:40 - Tracking Multiple Trajectories 21:20 - Similarity of Independent Solutions 24:10 - Comparison to Baselines 30:10 - Weight Space Cross-Sections 35:55 - Diversity vs Accuracy 41:00 - Comparing Ensembling Methods 44:55 - Conclusion & Comments Paper: https://ift.tt/2RwurQL Abstract: Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the network, are theoretically well-motivated by Bayesian principles, but do not perform as well as deep ensembles in practice, particularly under dataset shift. One possible explanation for this gap between theory and practice is that popular scalable variational Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space. We investigate this hypothesis by building on recent work on understanding the loss landscape of neural networks and adding our own exploration to measure the similarity of functions in the space of predictions. Our results show that random initializations explore entirely different modes, while functions along an optimization trajectory or sampled from the subspace thereof cluster within a single mode predictions-wise, while often deviating significantly in the weight space. Developing the concept of the diversity--accuracy plane, we show that the decorrelation power of random initializations is unmatched by popular subspace sampling methods. Finally, we evaluate the relative effects of ensembling, subspace based methods and ensembles of subspace based methods, and the experimental results validate our hypothesis. Authors: Stanislav Fort, Huiyi Hu, Balaji Lakshminarayanan Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
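
The method itself is almost trivial to write down, which is part of its appeal: train several copies from independent random initializations and average their predictive distributions. A sketch with hypothetical `make_model` / `train_fn` callables:

```python
import torch
import torch.nn.functional as F

def train_deep_ensemble(make_model, train_fn, m=5):
    """Train M networks that differ only in their random initialization (and
    data order). `make_model` returns a freshly initialized model and `train_fn`
    trains a model and returns it; both are hypothetical callables."""
    return [train_fn(make_model(seed=i)) for i in range(m)]

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the softmax outputs of the ensemble members; the averaged
    distribution is typically better calibrated and more robust than any
    single member's."""
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)
```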

Friday, July 10, 2020

Gradient Origin Networks (Paper Explained w/ Live Coding)


Neural networks for implicit representations, such as SIRENs, have been very successful at modeling natural signals. However, in the classical approach, each data point requires its own neural network to be fit. This paper extends implicit representations to an entire dataset by introducing latent vectors of data points to SIRENs. Interestingly, the paper shows that such latent vectors can be obtained without the need for an explicit encoder, by simply looking at the negative gradient of the zero-vector through the representation function. OUTLINE: 0:00 - Intro & Overview 2:10 - Implicit Generative Models 5:30 - Implicitly Represent a Dataset 11:00 - Gradient Origin Networks 23:55 - Relation to Gradient Descent 28:05 - Messing with their Code 37:40 - Implicit Encoders 38:50 - Using GONs as classifiers 40:55 - Experiments & Conclusion Paper: https://ift.tt/3gMbFhS Code: https://ift.tt/2BWs6t2 Project Page: https://ift.tt/2VVtLG2 My Video on SIREN: https://youtu.be/Q5g3p9Zwjrk Abstract: This paper proposes a new type of implicit generative model that is able to quickly learn a latent representation without an explicit encoder. This is achieved with an implicit neural network that takes as inputs points in the coordinate space alongside a latent vector initialised with zeros. The gradients of the data fitting loss with respect to this zero vector are jointly optimised to act as latent points that capture the data manifold. The results show similar characteristics to autoencoders, but with fewer parameters and the advantages of implicit representation networks. Authors: Sam Bond-Taylor, Chris G. Willcocks Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C
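
A minimal sketch of the trick as described above: run the implicit network with a zero latent, take the negative gradient of the reconstruction loss with respect to that zero vector as the latent code, then minimize the reconstruction loss again with the derived latent. Here `siren` is a placeholder for any coordinate-based network, not the authors' code.

```python
import torch
import torch.nn.functional as F

def gon_step(siren, coords, targets, optimizer, latent_dim=64):
    """One Gradient Origin Network training step (illustrative sketch).
    `siren` maps (coords, latent) -> predicted signal values."""
    b = targets.size(0)
    z0 = torch.zeros(b, latent_dim, requires_grad=True)

    # Inner "encoding" pass: the latent is the negative gradient of the
    # data-fitting loss with respect to the zero vector.
    inner_loss = F.mse_loss(siren(coords, z0), targets)
    z = -torch.autograd.grad(inner_loss, z0, create_graph=True)[0]

    # Outer pass: reconstruct with the derived latent and update the network.
    outer_loss = F.mse_loss(siren(coords, z), targets)
    optimizer.zero_grad()
    outer_loss.backward()
    optimizer.step()
    return outer_loss.item()
```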

Thursday, July 9, 2020

Universal Adversarial Perturbations and Language M… | Pamela Mishkin | OpenAI Scholars Demo Day 2020


Learn more: https://ift.tt/3iLBuAm

Towards Epileptic Seizure Prediction with Deep Network | Kata Slama | OpenAI Scholars Demo Day 2020


Learn more: https://ift.tt/2DnUdSd

Quantifying Interpretability of Models Trained on Coi… | Jorge Orbay | OpenAI Scholars Demo Day 2020


Learn more: https://ift.tt/3iMiDoU

Social learning in independent multi-agent reinfor… | Kamal N’dousse | OpenAI Scholars Demo Day 2020


Learn more: https://ift.tt/2W2rpVU

Long term credit assignment with temporal reward transp… | Cathy Yeh | OpenAI Scholars Demo Day 2020


Learn more: https://ift.tt/3gC9htM

Semantic Parsing English to GraphQL | Andre Carerra | OpenAI Scholars Demo Day 2020


Learn more: https://ift.tt/2O6wRCK

Looking For Grammar In All The Right Places | Alethea Power | OpenAI Scholars Demo Day 2020


Learn more: https://ift.tt/2W4CqWJ

NVAE: A Deep Hierarchical Variational Autoencoder (Paper Explained)


VAEs have been traditionally hard to train at high resolutions and unstable when going deep with many layers. In addition, VAE samples are often more blurry and less crisp than those from GANs. This paper details all the engineering choices necessary to successfully train a deep hierarchical VAE that exhibits global consistency and astounding sharpness at high resolutions. OUTLINE: 0:00 - Intro & Overview 1:55 - Variational Autoencoders 8:25 - Hierarchical VAE Decoder 12:45 - Output Samples 15:00 - Hierarchical VAE Encoder 17:20 - Engineering Decisions 22:10 - KL from Deltas 26:40 - Experimental Results 28:40 - Appendix 33:00 - Conclusion Paper: https://ift.tt/320lDIi Abstract: Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels. Authors: Arash Vahdat, Jan Kautz Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C

DeepMind x UCL | Deep Learning Lectures | 12/12 | Responsible Innovation


What can we do to build algorithms that are safe, reliable and robust? And what are the responsibilities of technologists who work in this area? In this talk, Chongli Qin and Iason Gabriel explore these questions — connected through the lens of responsible innovation — in two parts. In the first part, Chongli explores the question of why and how we can design algorithms that are safe, reliable and trustworthy through the lens of specification-driven machine learning. In the second part, Iason looks more closely at ethical dimensions of machine learning, at the responsibility of researchers, and at processes that can structure ethical deliberation in this domain. Taken together, they suggest that there are important measures that we can, and should, put in place — if we want to build systems that are beneficial to society. Download the slides here: https://ift.tt/2DqUZOt Find out more about how DeepMind increases access to science here: https://ift.tt/3dnjF7D Speaker Bios: Chongli Qin is a research scientist at DeepMind; her primary interest is in building safer, more reliable and more trustworthy machine learning algorithms. Over the past several years, she has contributed to developing algorithms that make neural networks more robust to noise. Key parts of her research focus on functional analysis: properties of neural networks that can naturally enhance robustness. She has also contributed to building mathematical frameworks to verify/guarantee that certain properties hold for neural networks. Prior to DeepMind, Chongli was at Cambridge, where she studied the mathematics tripos and scientific computing before doing a PhD in bioinformatics. Iason Gabriel is a Senior Research Scientist at DeepMind where he works in the ethics research team. His work focuses on the applied ethics of artificial intelligence, human rights, and the question of how to align technology with human values. Before joining DeepMind, Iason was a Fellow in Politics at St John’s College, Oxford, and a member of the Centre for the Study of Social Justice (CSSJ). He holds a doctorate in Political Theory from the University of Oxford and spent a number of years working for the United Nations in post-conflict environments. About the lecture series: The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence. Over the past decade, Deep Learning has evolved as the leading artificial intelligence paradigm providing us with the ability to learn complex functions from raw data at unprecedented accuracy and scale. Deep Learning has been applied to problems in object recognition, speech recognition, speech synthesis, forecasting, scientific computing, control and many more. The resulting applications are touching all of our lives in areas such as healthcare and medical research, human-computer interaction, communication, transport, conservation, manufacturing and many other fields of human endeavour. In recognition of this huge impact, the 2019 Turing Award, the highest honour in computing, was awarded to pioneers of Deep Learning. In this lecture series, research scientists from leading AI research lab, DeepMind, deliver 12 lectures on an exciting selection of topics in Deep Learning, ranging from the fundamentals of training neural networks via advanced ideas around memory, attention, and generative modelling to the important topic of responsible innovation.

DeepMind x UCL | Deep Learning Lectures | 11/12 | Modern Latent Variable Models


This lecture, by DeepMind Research Scientist Andriy Mnih, explores latent variable models, a powerful and flexible framework for generative modelling. After introducing this framework along with the concept of inference, which is central to it, Andriy focuses on two types of modern latent variable models: invertible models and intractable models. Special emphasis is placed on understanding variational inference as a key to training intractable latent variable models. Note this lecture was originally advertised as lecture 9. Download the slides here: https://ift.tt/2Dl27M5 Find out more about how DeepMind increases access to science here: https://ift.tt/3dnjF7D Speaker Bio: Andriy Mnih is a Research Scientist at DeepMind. He works on generative modelling, representation learning, variational inference, and gradient estimation for stochastic computation graphs. He did his PhD on learning representations of discrete data at the University of Toronto, where he was advised by Geoff Hinton. Prior to joining DeepMind, Andriy was a post-doctoral researcher at the Gatsby Unit, University College London, working with Yee Whye Teh. About the lecture series: The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence. Over the past decade, Deep Learning has evolved as the leading artificial intelligence paradigm providing us with the ability to learn complex functions from raw data at unprecedented accuracy and scale. Deep Learning has been applied to problems in object recognition, speech recognition, speech synthesis, forecasting, scientific computing, control and many more. The resulting applications are touching all of our lives in areas such as healthcare and medical research, human-computer interaction, communication, transport, conservation, manufacturing and many other fields of human endeavour. In recognition of this huge impact, the 2019 Turing Award, the highest honour in computing, was awarded to pioneers of Deep Learning. In this lecture series, research scientists from leading AI research lab, DeepMind, deliver 12 lectures on an exciting selection of topics in Deep Learning, ranging from the fundamentals of training neural networks via advanced ideas around memory, attention, and generative modelling to the important topic of responsible innovation.

Wednesday, July 8, 2020

Inside TensorFlow - New TF Lite Converter


In this episode of Inside TensorFlow, Software Engineer Yu-Cheng Ling demonstrates to us the new TF Lite Converter. Let us know what you think about this presentation in the comments below and make sure to subscribe! TensorFlow Lite → https://goo.gle/2Wk5MPM TensorFlow Lite Converter → https://goo.gle/2YDCVFM TensorFlow MLIR → https://goo.gle/2Y6WnOd MLIR overview → https://goo.gle/3aKHfcC TF Ops → https://goo.gle/38zCHGc TF Lite Ops → https://goo.gle/3f8C0WT TensorFlow to TensorFlow Lite Transformation passes → https://goo.gle/2Z4Z96V TensorFlow Lite legalization patterns → https://goo.gle/3iM9Bs3 TensorFlow Lite optimization pattern (TableGen) → https://goo.gle/3ecu59v TensorFlow Lite optimize (C++) → https://goo.gle/38M5zvb Add the Inside TensorFlow playlist → https://goo.gle/Inside-TensorFlow Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow

Addendum for Supermasks in Superposition: A Closer Look (Paper Explained)


I take a closer look at "Supermasks in Superposition" after I've already done a video on it. Specifically, I look at: 1. The intuition and theoretical justification behind the G objective, 2. Whether Supermasks and Superposition can be viewed as two distinct ideas and 3. The Paper's Broader Impact Statement. OUTLINE: 0:00 - Intro & Overview 2:00 - SupSup Recap 4:00 - In-Depth Analysis of the G Objective 20:30 - Superposition without Supermasks 25:40 - Broader Impact Statement 36:40 - Conclusion 37:20 - Live Coding Part 1 on SupSup: https://youtu.be/3jT1qJ8ETzk My Code: https://ift.tt/38AvQMP Paper: https://ift.tt/2BMrcPL Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C

Tuesday, July 7, 2020

SupSup: Supermasks in Superposition (Paper Explained)


Supermasks are binary masks of a randomly initialized neural network that result in the masked network performing well on a particular task. This paper considers the problem of (sequential) Lifelong Learning and trains one Supermask per Task, while keeping the randomly initialized base network constant. By minimizing the output entropy, the system can automatically derive the Task ID of a data point at inference time and distinguish up to 2500 tasks automatically. OUTLINE: 0:00 - Intro & Overview 1:20 - Catastrophic Forgetting 5:20 - Supermasks 9:35 - Lifelong Learning using Supermasks 11:15 - Inference Time Task Discrimination by Entropy 15:05 - Mask Superpositions 24:20 - Proof-of-Concept, Task Given at Inference 30:15 - Binary Maximum Entropy Search 32:00 - Task Not Given at Inference 37:15 - Task Not Given at Training 41:35 - Ablations 45:05 - Superfluous Neurons 51:10 - Task Selection by Detecting Outliers 57:40 - Encoding Masks in Hopfield Networks 59:40 - Conclusion Paper: https://ift.tt/2BMrcPL Code: https://ift.tt/3iy8Mmi My Video about Lottery Tickets: https://youtu.be/ZVVnvZdUMUk My Video about Supermasks: https://youtu.be/jhCInVFE2sc Abstract: We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not provided, SupSup can infer the task using gradient-based optimization to find a linear superposition of learned supermasks which minimizes the output entropy. In practice we find that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks. We also showcase two promising extensions. First, SupSup models can be trained entirely without task identity information, as they may detect when they are uncertain about new data and allocate an additional supermask for the new training distribution. Finally the entire, growing set of supermasks can be stored in a constant-sized reservoir by implicitly storing them as attractors in a fixed-sized Hopfield network. Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C
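
A rough sketch of the task-inference step when the task ID is not given: superimpose the supermasks with coefficients alpha, measure the output entropy on a batch, and read off which mask a single gradient step would favor. The mask application and the paper's one-shot algorithm differ in detail; the names here are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def infer_task(masked_forward, masks, x):
    """Guess the task of a batch `x` by entropy minimization over a convex
    superposition of supermasks (illustrative sketch). `masked_forward(x, mask)`
    runs the fixed, randomly initialized network with the given mask applied;
    `masks` is a list of per-task supermasks. Both are hypothetical."""
    k = len(masks)
    alpha = torch.full((k,), 1.0 / k, requires_grad=True)

    # Forward pass with the superimposed mask: sum_i alpha_i * mask_i.
    mixed = sum(a * m for a, m in zip(alpha, masks))
    probs = F.softmax(masked_forward(x, mixed), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()

    # One gradient step: the mask whose coefficient would grow the most
    # (largest negative gradient) is taken as the inferred task.
    grad = torch.autograd.grad(entropy, alpha)[0]
    return int(torch.argmin(grad))
```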

Monday, July 6, 2020

[Live Machine Learning Research] Plain Self-Ensembles (I actually DISCOVER SOMETHING) - Part 1


I share my progress of implementing a research idea from scratch. I attempt to build an ensemble model out of students of label-free self-distillation without any additional data or augmentation. Turns out, it actually works, and interestingly, the more students I employ, the better the accuracy. This leads to the hypothesis that the ensemble effect is not a process of extracting more information from labels. OUTLINE: 0:00 - Introduction 2:10 - Research Idea 4:15 - Adjusting the Codebase 25:00 - Teacher and Student Models 52:30 - Shipping to the Server 1:03:40 - Results 1:14:50 - Conclusion Code: https://ift.tt/3f5JHgi References: My Video on SimCLRv2: https://youtu.be/2lkUNDZld-4 Born-Again Neural Networks: https://ift.tt/2k3fMKN Deep Ensembles: A Loss Landscape Perspective: https://ift.tt/2RwurQL Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB Parler: https://ift.tt/38tQU7C

Sunday, July 5, 2020

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization (Paper Explained)


#machinelearning #ai #google The high-level architecture of CNNs has not really changed over the years. We tend to build high-resolution low-dimensional layers first, followed by ever more coarse, but deep layers. This paper challenges this decades-old heuristic and uses neural architecture search to find an alternative, called SpineNet that employs multiple rounds of re-scaling and long-range skip connections. OUTLINE: 0:00 - Intro & Overview 1:00 - Problem Statement 2:30 - The Problem with Current Architectures 8:20 - Scale-Permuted Networks 11:40 - Neural Architecture Search 14:00 - Up- and Downsampling 19:10 - From ResNet to SpineNet 24:20 - Ablations 27:00 - My Idea: Attention Routing for CNNs 29:55 - More Experiments 34:45 - Conclusion & Comments Papers: https://ift.tt/2YY641E Code: https://ift.tt/2NdsUhT Abstract: Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. Using similar building blocks, SpineNet models outperform ResNet-FPN models by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, SpineNet-190 achieves 52.5% AP with a MaskR-CNN detector and achieves 52.1% AP with a RetinaNet detector on COCO for a single model without test-time augmentation, significantly outperforms prior art of detectors. SpineNet can transfer to classification tasks, achieving 5% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset. Code is at: this https URL. Authors: Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song Thumbnail art by Lucas Ferreira Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Saturday, July 4, 2020

This is Geometry Processing Made Easy


❤️ Check out Linode here and get $20 free credit on your account: https://ift.tt/2LaDQJb 📝 The paper "Monte Carlo Geometry Processing: A Grid-Free Approach to PDE-Based Methods on Volumetric Domains" is available here: https://ift.tt/2VLOFY0 Implementations: - https://twitter.com/iquilezles/status/1258218688726962183 - https://twitter.com/iquilezles/status/1258237114958802944 - https://ift.tt/3eY2Lx2 Our mega video on Multiple Importance Sampling: https://www.youtube.com/watch?v=TbWQ4lMnLNw Koiava’s MIS implementation: https://ift.tt/1vHDH1F My course at the Vienna University of Technology on light transport is available here. It is completely free for everyone: https://ift.tt/2rdtvDu 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m
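
The grid-free approach builds on the classic walk-on-spheres estimator, which is small enough to show in full. Below is a NumPy sketch that solves the Laplace equation with Dirichlet boundary data on the unit disk; the paper handles far more general PDEs and geometry, so this is only a toy illustration of the underlying Monte Carlo idea.

```python
import numpy as np

rng = np.random.default_rng(0)

def walk_on_spheres(x, boundary_value, dist_to_boundary, eps=1e-3, max_steps=1000):
    """Estimate the harmonic interpolation of Dirichlet data at point `x` with a
    single random walk: each step jumps to a uniformly random point on the
    largest circle around `x` that fits inside the domain."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_steps):
        r = dist_to_boundary(x)
        if r < eps:                       # close enough: read off the boundary value
            return boundary_value(x)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        x = x + r * np.array([np.cos(theta), np.sin(theta)])
    return boundary_value(x)

# Example domain: the unit disk, with boundary data g(x, y) = x.
dist = lambda p: 1.0 - np.linalg.norm(p)
g    = lambda p: p[0] / max(np.linalg.norm(p), 1e-12)   # project to boundary, take x
u = np.mean([walk_on_spheres([0.3, 0.2], g, dist) for _ in range(2000)])
# For this boundary data the exact harmonic solution is u(x, y) = x, so u ≈ 0.3.
```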

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (Paper Explained)


#ai #attention #transformer #deeplearning Transformers are famous for two things: their superior performance and their insane requirements of compute and memory. This paper reformulates the attention mechanism in terms of kernel functions and obtains a linear formulation, which reduces these requirements. Surprisingly, this formulation also surfaces an interesting connection between autoregressive transformers and RNNs. OUTLINE: 0:00 - Intro & Overview 1:35 - Softmax Attention & Transformers 8:40 - Quadratic Complexity of Softmax Attention 9:40 - Generalized Attention Mechanism 13:45 - Kernels 20:40 - Linear Attention 25:20 - Experiments 28:30 - Intuition on Linear Attention 33:55 - Connecting Autoregressive Transformers and RNNs 41:30 - Caveats with the RNN connection 46:00 - More Results & Conclusion Paper: https://ift.tt/3g9qKtf Website: https://ift.tt/2YO6lV3 Code: https://ift.tt/38rVYsZ My Video on Attention: https://youtu.be/iDulhoQ2pro My Video on BERT: https://youtu.be/-9evrZnBorM Abstract: Transformers achieve remarkable performance in several tasks but due to their quadratic complexity with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from O(N^2) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences. Authors: Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
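The abstract's key trick is easy to see in code: with a feature map phi, softmax attention softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V), and associativity lets the K^T V product be computed once, giving linear cost in the sequence length. The NumPy sketch below uses the elu(x)+1 feature map described in the paper; the shapes and random inputs are illustrative, and this is a simplified non-causal version, not the authors' code.

```python
# Kernelized linear attention: O(N) in sequence length via associativity.
import numpy as np

def elu_plus_one(x):
    return np.where(x > 0, x + 1.0, np.exp(x))  # positive feature map phi

def linear_attention(Q, K, V, eps=1e-6):
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)    # (N, d)
    KV = Kf.T @ V                                # (d, d_v): computed once
    Z = Qf @ Kf.sum(axis=0)                      # (N,): per-query normalizer
    return (Qf @ KV) / (Z[:, None] + eps)        # (N, d_v)

N, d, d_v = 1024, 64, 64
Q, K, V = (np.random.randn(N, dim) for dim in (d, d, d_v))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```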

Friday, July 3, 2020

On the Measure of Intelligence by François Chollet - Part 4: The ARC Challenge (Paper Explained)


In this part, we look at the ARC challenge as a proposed test of machine intelligence. The dataset features 1000 tasks that test rapid generalization based on human core knowledge priors, such as object-ness, symmetry, and navigation. OUTLINE: 0:00 - Intro 0:55 - What is ARC? 6:30 - The Goals of ARC 10:40 - Assumed Priors & Examples 21:50 - An Imagined Solution 28:15 - Consequences of a Solution 31:00 - Weaknesses 31:25 - My Comments & Ideas Paper: https://ift.tt/2CfFoxr ARC: https://ift.tt/2oSNQ2o Abstract: To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans. Authors: François Chollet Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
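For readers who want to poke at the tasks themselves, each ARC task in the public repository is a JSON file with "train" and "test" demonstration pairs whose grids are lists of lists of integers 0-9. Below is a minimal sketch of loading a task and scoring a candidate solver on it; the identity "solver" and the file path are purely illustrative placeholders, not part of the challenge itself.

```python
# Load one ARC task and check a candidate program against its pairs.
import json

def solve(grid):
    # Placeholder program: a real solver would apply core-knowledge priors
    # such as objectness, symmetry, or counting inferred from the train pairs.
    return grid

def score_task(path):
    with open(path) as f:
        task = json.load(f)
    train_ok = all(solve(p["input"]) == p["output"] for p in task["train"])
    test_ok = all(solve(p["input"]) == p["output"] for p in task["test"])
    return train_ok, test_ok

# Usage (path is hypothetical):
# print(score_task("ARC/data/training/some_task.json"))
```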

Thursday, July 2, 2020

Generating Emotional Landscapes | Hannah Davis | OpenAI Scholars Demo Day 2018


Hannah Davis talks about Generating Emotional Landscapes on OpenAI Scholars Demo Day on September 20, 2018. Learn more: https://ift.tt/2YUntIC

Art Composition Attributes + CycleGAN | Holly Grimm | OpenAI Scholars Demo Day 2018


Holly Grimm talks about Art Composition Attributes + CycleGAN on OpenAI Scholars Demo Day on September 20, 2018. Learn more: https://ift.tt/3glNkiB

Physics Net | Ifu Aniemeka | OpenAI Scholars Demo Day 2018


Ifu Aniemeka talks about Physics Net on OpenAI Scholars Demo Day on September 20, 2018. Learn more: https://ift.tt/31FGy34

Deephypebot | Nadja Rhodes | OpenAI Scholars Demo Day 2018


Nadja Rhodes talks about Deephypebot on OpenAI Scholars Demo Day on September 20, 2018. Learn more: https://ift.tt/2Zvk7Lk

Music Generation | Christine Payne | OpenAI Scholars Demo Day 2018


Christine Payne talks about Music Generation on OpenAI Scholars Demo Day on September 20, 2018. Learn more: https://ift.tt/2ZuHIf7

BERTology Meets Biology: Interpreting Attention in Protein Language Models (Paper Explained)


Proteins are the workhorses of almost all cellular functions and a core component of life. But despite their versatility, all proteins are built as sequences of the same 20 amino acids. These sequences can be analyzed with tools from NLP. This paper investigates the attention mechanism of a BERT model that has been trained on protein sequence data and discovers that the language model has implicitly learned non-trivial higher-order biological properties of proteins. OUTLINE: 0:00 - Intro & Overview 1:40 - From DNA to Proteins 5:20 - BERT for Amino Acid Sequences 8:50 - The Structure of Proteins 12:40 - Investigating Biological Properties by Inspecting BERT 17:45 - Amino Acid Substitution 24:55 - Contact Maps 30:15 - Binding Sites 33:45 - Linear Probes 35:25 - Conclusion & Comments Paper: https://ift.tt/2VGc5y6 Code: https://ift.tt/3idS1N9 My Video on BERT: https://youtu.be/-9evrZnBorM My Video on Attention: https://youtu.be/iDulhoQ2pro Abstract: Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. Through the lens of attention, we analyze the inner workings of the Transformer and explore how the model discerns structural and functional properties of proteins. We show that attention (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We also present a three-dimensional visualization of the interaction between attention and protein structure. Our findings align with known biological processes and provide a tool to aid discovery in protein engineering and synthetic biology. The code for visualization and analysis is available at this https URL. Authors: Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
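The paper's analyses boil down to asking how often a head's strongest attention weights land on biologically meaningful pairs. As a rough sketch of that kind of measurement, the NumPy snippet below computes, for one attention head, the fraction of high-attention token pairs that correspond to amino acids in contact in the folded structure. The threshold and random inputs are illustrative assumptions, not the paper's exact protocol.

```python
# Fraction of a head's high-attention pairs that align with structural contacts.
import numpy as np

def attention_contact_alignment(attn, contact_map, threshold=0.3):
    """attn: (L, L) attention weights; contact_map: (L, L) boolean matrix."""
    high = attn > threshold                    # pairs the head attends to strongly
    if high.sum() == 0:
        return 0.0
    return float((high & contact_map).sum() / high.sum())

L = 128                                        # toy protein length
attn = np.random.dirichlet(np.ones(L), size=L)        # rows sum to 1
contact_map = np.random.rand(L, L) < 0.05              # toy contact map
print(attention_contact_alignment(attn, contact_map))
```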

Wednesday, July 1, 2020

Amazing AR Effects Are Coming!


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their mentioned post is available here: https://ift.tt/35gPw6D 📝 The paper "Consistent Video Depth Estimation" is available here: https://ift.tt/3aRWGQ1 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Paper Explained)


Google builds a 600 billion parameter transformer to do massively multilingual, massive machine translation. Interestingly, the larger model scale does not come from increasing depth of the transformer, but from increasing width in the feedforward layers, combined with a hard routing to parallelize computations on up to 2048 TPUs. A very detailed engineering paper! OUTLINE: 0:00 - Intro & Overview 4:10 - Main Results 5:10 - Mixture-of-Experts 16:00 - Difference to Scaling Classic Transformers 18:50 - Backpropagation in Mixture-of-Experts 20:05 - MoE Routing Algorithm in GShard 38:20 - GShard Einsum Examples 47:40 - Massively Multilingual Translation 56:00 - Results 1:11:30 - Conclusion & Comments ERRATA: I said the computation of MoE scales linearly, but actually, it's sub(!)-linear. Paper: https://ift.tt/2VB5vJ0 Abstract: Neural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute. Although this trend of scaling is affirmed to be a sure-fire approach for better model quality, there are challenges on the path such as the computation cost, ease of programming, and efficient implementation on parallel devices. GShard is a module composed of a set of lightweight annotation APIs and an extension to the XLA compiler. It provides an elegant way to express a wide range of parallel computation patterns with minimal changes to the existing model code. GShard enabled us to scale up multilingual neural machine translation Transformer model with Sparsely-Gated Mixture-of-Experts beyond 600 billion parameters using automatic sharding. We demonstrate that such a giant model can efficiently be trained on 2048 TPU v3 accelerators in 4 days to achieve far superior quality for translation from 100 languages to English compared to the prior art. Authors: Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
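The "increasing width with hard routing" idea is the Sparsely-Gated Mixture-of-Experts layer: each token is sent to a small number of expert feed-forward networks chosen by a learned gate, so parameter count grows with the number of experts while per-token compute stays roughly constant. Below is a minimal NumPy sketch of top-2 routing in that spirit; expert capacity limits, the load-balancing loss, and the actual sharding across TPUs are omitted, and the shapes are illustrative.

```python
# Toy Mixture-of-Experts layer with top-2 gating (forward pass only).
import numpy as np

rng = np.random.default_rng(0)
num_experts, d_model, d_ff, num_tokens = 4, 16, 32, 8

# Each expert is a tiny two-layer feed-forward network.
W1 = rng.normal(size=(num_experts, d_model, d_ff)) * 0.1
W2 = rng.normal(size=(num_experts, d_ff, d_model)) * 0.1
Wg = rng.normal(size=(d_model, num_experts)) * 0.1    # gating weights

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, k=2):
    gates = softmax(x @ Wg)                            # (tokens, experts)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(gates[t])[-k:]                # indices of top-k experts
        weights = gates[t, top] / gates[t, top].sum()  # renormalize over top-k
        for e, w in zip(top, weights):
            h = np.maximum(x[t] @ W1[e], 0.0)          # expert FFN with ReLU
            out[t] += w * (h @ W2[e])
    return out

tokens = rng.normal(size=(num_tokens, d_model))
print(moe_layer(tokens).shape)  # (8, 16)
```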

This anime was made by AI


I asked an AI to make me a DBZ episode based on the trolley problem. Will Goku sacrifice 1 life to save the lives of 5 others? How was this made? I used OpenAI's GPT-2 model via https://ift.tt/2LmtszK and then paid an animator to best interpret the story the AI generated. This story is the result of feeding GPT-2 the very first sentence; I then cherry-picked the best continuation that had a resolution, any resolution, to the trolley problem. Animated by Chey Fergus - https://ift.tt/2VBQEhv Exact Generated Script: (Human) Goku & Vegeta see a trolley coming, problem is, if they switch the rail it will kill 1 person instead of five, so they (AI) switch both rail's. Goku is reluctant to do this but, unfortunately for him, he needs to do it or they will only save 4 people instead of 5. She reminds Goku he has to be a hero for everyone, after a brief moment of hesitation Goku reluctantly takes the first person that comes to mind, and the train explodes. She later tells Vegeta she is glad he noticed him just in time. After Super Saiyan Vegeta returns to normal, he believes that this was the result of Goku's training and that now he will be able to save everyone. Vegeta says he did the right thing and that this is an official Kai, therefore he decides to go search for the rest of them. SUBSCRIBE FOR MORE: http://jabrils.com/yt WISHLIST MY VIDEO GAME: https://ift.tt/33NgHFz SUPPORT ON PATREON: https://ift.tt/2pZACkg JOIN DISCORD: https://ift.tt/2QkDa9O Please follow me on social networks: twitter: https://twitter.com/jabrils_ instagram: https://ift.tt/2QNVYvI REMEMBER TO ALWAYS FEED YOUR CURIOSITY
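The episode was generated through a web interface, but the same prompt-and-cherry-pick workflow can be reproduced locally. Here is a minimal sketch using the publicly released GPT-2 checkpoint via the Hugging Face transformers library; the sampling parameters are illustrative, and the final selection step is still a human reading the candidates.

```python
# Sample several GPT-2 continuations of the first sentence, then pick one.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = ("Goku & Vegeta see a trolley coming, problem is, if they switch "
          "the rail it will kill 1 person instead of five, so they")
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,
    max_length=200,
    top_p=0.9,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
for i, out in enumerate(outputs):
    # The cherry-picking step: read each candidate and keep the one that
    # actually resolves the trolley problem.
    print(f"--- candidate {i} ---")
    print(tokenizer.decode(out, skip_special_tokens=True))
```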