Sunday, May 31, 2020

Synthesizer: Rethinking Self-Attention in Transformer Models (Paper Explained)


Do we really need dot-product attention? The attention mechanism is a central part of modern Transformers, and dot-product attention sits at its core. This paper replaces that mechanism, removing the quadratic token-token interaction terms, and comes up with a new model, the Synthesizer. As it turns out, you can do pretty well like that! OUTLINE: 0:00 - Intro & High Level Overview 1:00 - Abstract 2:30 - Attention Mechanism as Information Routing 5:45 - Dot Product Attention 8:05 - Dense Synthetic Attention 15:00 - Random Synthetic Attention 17:15 - Comparison to Feed-Forward Layers 22:00 - Factorization & Mixtures 23:10 - Number of Parameters 25:35 - Machine Translation & Language Modeling Experiments 36:15 - Summarization & Dialogue Generation Experiments 37:15 - GLUE & SuperGLUE Experiments 42:00 - Weight Sizes & Number of Head Ablations 47:05 - Conclusion Paper: https://ift.tt/3cldH5V My Video on Transformers (Attention Is All You Need): https://youtu.be/iDulhoQ2pro My Video on BERT: https://youtu.be/-9evrZnBorM Abstract: The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models. But is it really required? This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models. Via extensive experiments, we find that (1) random alignment matrices surprisingly perform quite competitively and (2) learning attention weights from token-token (query-key) interactions is not that important after all. To this end, we propose Synthesizer, a model that learns synthetic attention weights without token-token interactions. Our experimental results show that Synthesizer is competitive against vanilla Transformer models across a range of tasks, including MT (EnDe, EnFr), language modeling (LM1B), abstractive summarization (CNN/Dailymail), dialogue generation (PersonaChat) and Multi-task language understanding (GLUE, SuperGLUE). Authors: Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
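
To make the difference from dot-product attention concrete, here is a minimal single-head PyTorch sketch of the Dense and Random Synthesizer variants as described above. Layer sizes, the two-layer projection, and the absence of multi-head, factorized, and mixture logic are simplifying assumptions, not the authors' reference implementation.

```python
# Minimal sketch of Dense and Random Synthesizer attention (single head).
# The attention matrix is "synthesized" without any query-key dot products.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizer(nn.Module):
    def __init__(self, d_model, max_len):
        super().__init__()
        # each token predicts its own row of the (len x len) attention matrix
        self.proj = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, max_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                   # x: (batch, len, d_model), len <= max_len
        L = x.size(1)
        scores = self.proj(x)[:, :, :L]     # (batch, len, len), no token-token interaction
        attn = F.softmax(scores, dim=-1)
        return attn @ self.value(x)         # route values with the synthetic weights

class RandomSynthesizer(nn.Module):
    def __init__(self, d_model, max_len):
        super().__init__()
        # the attention matrix is a trainable parameter, independent of the input
        self.scores = nn.Parameter(torch.randn(max_len, max_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):
        L = x.size(1)
        attn = F.softmax(self.scores[:L, :L], dim=-1)
        return attn.unsqueeze(0) @ self.value(x)

x = torch.randn(2, 10, 64)
print(DenseSynthesizer(64, 128)(x).shape, RandomSynthesizer(64, 128)(x).shape)
```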

Saturday, May 30, 2020

DeepMind’s New AI Helps Detecting Breast Cancer


❤️ Check out Linode here and get $20 free credit on your account: https://ift.tt/2LaDQJb 📝 The paper "International evaluation of an AI system for breast cancer screening" is available here: https://ift.tt/2QPxHF9 ❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: - https://ift.tt/2icTBUb - https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

[Code] How to use Facebook's DETR object detection algorithm in Python (Full Tutorial)


Watch me as I struggle my way up the glorious path of using the DETR object detection model in PyTorch. Original Video on DETR: https://youtu.be/T35ba_VXkMY Their GitHub repo: https://ift.tt/2ZDh1q6 My Colab: https://ift.tt/3gBWEzQ OUTLINE: 0:00 - Intro 0:45 - TorchHub Model 2:00 - Getting an Image 6:00 - Image to PyTorch Tensor 7:50 - Handling Model Output 15:00 - Draw Bounding Boxes 20:10 - The Dress 22:00 - Rorschach Ink Blots 23:00 - Forcing More Predictions 28:30 - Jackson Pollock Images 32:00 - Elephant Herds Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
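
For reference, a condensed sketch of the workflow covered in the video: load the pre-trained DETR model from TorchHub and run it on a single image. The ImageNet normalization constants are the standard ones; the file name 'example.jpg' and the 0.7 confidence threshold are placeholders rather than values taken from the Colab.

```python
# Load DETR from TorchHub and run inference on one local image.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()

transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet stats
])

img = Image.open('example.jpg').convert('RGB')       # hypothetical local image
out = model(transform(img).unsqueeze(0))             # dict: 'pred_logits', 'pred_boxes'

probs = out['pred_logits'].softmax(-1)[0, :, :-1]    # drop the "no object" class
keep = probs.max(-1).values > 0.7                    # keep confident predictions only
print(out['pred_boxes'][0, keep])                    # (center_x, center_y, w, h), normalized
```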

Friday, May 29, 2020

GPT-3: Language Models are Few-Shot Learners (Paper Explained)


How far can you go with ONLY language modeling? Can a large enough language model perform NLP tasks out of the box? OpenAI takes on these and other questions by training a transformer that is an order of magnitude larger than anything that has ever been built before and the results are astounding. OUTLINE: 0:00 - Intro & Overview 1:20 - Language Models 2:45 - Language Modeling Datasets 3:20 - Model Size 5:35 - Transformer Models 7:25 - Fine Tuning 10:15 - In-Context Learning 17:15 - Start of Experimental Results 19:10 - Question Answering 23:10 - What I think is happening 28:50 - Translation 31:30 - Winograd Schemas 33:00 - Commonsense Reasoning 37:00 - Reading Comprehension 37:30 - SuperGLUE 40:40 - NLI 41:40 - Arithmetic Expressions 48:30 - Word Unscrambling 50:30 - SAT Analogies 52:10 - News Article Generation 58:10 - Made-up Words 1:01:10 - Training Set Contamination 1:03:10 - Task Examples https://ift.tt/2Xdo3Ac https://ift.tt/36FzDY4 Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general. Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
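
Since "in-context learning" just means putting demonstrations into the prompt text, here is a toy sketch of how such a few-shot prompt could be assembled. The exact per-task formatting used in the paper differs; the translation pairs and arrow separator below are illustrative assumptions.

```python
# Few-shot prompting: the "training examples" are just text, no gradient updates.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]
query = "peppermint"

prompt = "Translate English to French:\n"
for en, fr in examples:
    prompt += f"{en} => {fr}\n"          # demonstrations shown in the context window
prompt += f"{query} =>"                  # the language model completes this line

print(prompt)
```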

Thursday, May 28, 2020

DETR: End-to-End Object Detection with Transformers (Paper Explained)


Object detection in images is a notoriously hard task! Objects can be of a wide variety of classes, can be numerous or absent, they can occlude each other or be out of frame. All of this makes it even more surprising that the architecture in this paper is so simple. Thanks to a clever loss function, a single Transformer stacked on a CNN is enough to handle the entire task! OUTLINE: 0:00 - Intro & High-Level Overview 0:50 - Problem Formulation 2:30 - Architecture Overview 6:20 - Bipartite Match Loss Function 15:55 - Architecture in Detail 25:00 - Object Queries 31:00 - Transformer Properties 35:40 - Results ERRATA: When I introduce bounding boxes, I say they consist of x and y, but you also need the width and height. My Video on Transformers: https://youtu.be/iDulhoQ2pro Paper: https://ift.tt/2MjA8xT Blog: https://ift.tt/2B9zron Code: https://ift.tt/2ZDh1q6 Abstract: We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at this https URL. Authors: Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
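
The "clever loss function" hinges on a one-to-one bipartite matching between the fixed set of predictions and the ground-truth objects. Below is a toy sketch of that matching step using the Hungarian algorithm; the real DETR cost additionally includes a generalized-IoU term on the boxes, which is omitted here, and all array shapes are illustrative.

```python
# Toy bipartite matching between predicted queries and ground-truth objects.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(pred_prob, pred_boxes, tgt_labels, tgt_boxes):
    # pred_prob: (num_queries, num_classes); boxes: (n, 4) in normalized coordinates
    cost_class = -pred_prob[:, tgt_labels]                             # higher prob -> lower cost
    cost_box = np.abs(pred_boxes[:, None] - tgt_boxes[None]).sum(-1)   # L1 box distance
    cost = cost_class + cost_box
    rows, cols = linear_sum_assignment(cost)                           # optimal one-to-one assignment
    return list(zip(rows, cols))                                       # (query index, target index) pairs

pred_prob = np.random.rand(5, 4); pred_prob /= pred_prob.sum(-1, keepdims=True)
print(match(pred_prob, np.random.rand(5, 4), np.array([1, 2]), np.random.rand(2, 4)))
```

Each matched pair then contributes a classification and box loss; unmatched queries are trained to predict the "no object" class.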

Pattern-Exploiting Training for NLP!


This video explains Pattern-Exploiting Training (PET), a new technique to leverage pre-trained language models to label data for downstream tasks such as Text Classification or Natural Language Inference. This is done by introducing "patterns" or templates that guide the language model to label the review or pair of sentences. This also includes a "verbalizer" that maps from the language model's vocabulary into downstream task labels! This tutorial also uses HuggingFace's NLP viewer and Renato Violin's Next Word Prediction Demo that I highly recommend checking out! Thank you for watching! Please Subscribe! Links: Paper (Pattern-Exploiting Training): https://ift.tt/2X7MuPK HuggingFace NLP Viewer: https://ift.tt/2ZRJXL4 Next-Word-Prediction: https://ift.tt/3gjQUue PragmaticML blog summary of PET: https://ift.tt/3dCn2Y8 Self-Training with Noisy Student: https://ift.tt/2Q8GfYV FixMatch: https://ift.tt/2upcQ3A
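
To make the pattern/verbalizer idea concrete, here is a small sketch using a masked language model from HuggingFace Transformers. The template wording, the two verbalizer words, and the choice of bert-base-uncased are illustrative assumptions, not the paper's exact setup.

```python
# Pattern: wrap the input in a cloze template; Verbalizer: map LM words to task labels.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

review = "the movie was a complete waste of time"
pattern = f"{review}. It was {tok.mask_token}."             # the "pattern"
verbalizer = {"great": "positive", "terrible": "negative"}  # LM word -> task label

inputs = tok(pattern, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
logits = model(**inputs).logits[0, mask_pos]                # scores over the vocabulary

scores = {label: logits[tok.convert_tokens_to_ids(word)].item()
          for word, label in verbalizer.items()}
print(max(scores, key=scores.get))                          # predicted downstream label
```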

Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos


Benet Oriol, Jordi Luque, Ferran Diego, Xavier Giro-i-Nieto Telefonica Research / Universitat Politecnica de Catalunya (UPC) CVPR 2020 Workshop on Egocentric Perception, Interaction and Computing. In this work, we propose an effective approach for training unique embedding representations by combining three simultaneous modalities: image and spoken and textual narratives. The proposed methodology departs from a baseline system that spawns an embedding space trained with only spoken narratives and image cues. Our experiments on the EPIC-Kitchen and Places Audio Caption datasets show that introducing the human-generated textual transcriptions of the spoken narratives helps the training procedure, yielding better embedding representations. The triad of speech, image and words allows for a better estimate of the point embedding and shows an improvement in performance on tasks like image and speech retrieval, even when the third modality, text, is not present in the task.

Wednesday, May 27, 2020

mixup: Beyond Empirical Risk Minimization (Paper Explained)


Neural Networks often draw hard boundaries in high-dimensional space, which makes them very brittle. Mixup is a technique that linearly interpolates between data and labels at training time and achieves much smoother and more regular class boundaries. OUTLINE: 0:00 - Intro 0:30 - The problem with ERM 2:50 - Mixup 6:40 - Code 9:35 - Results https://ift.tt/2M0bHW1 Abstract: Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks. Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
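
The method itself fits in a few lines. Here is a PyTorch sketch of one mixup training step, close in spirit to the pseudo-code shown in the paper; alpha=0.2 is one of the values the authors discuss, and the surrounding training-loop details are generic assumptions.

```python
# One mixup training step: mix inputs and mix the losses with the same coefficient,
# which is equivalent to training on the convex combination of the one-hot labels.
import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(model, x, y, alpha=0.2):
    lam = np.random.beta(alpha, alpha)           # mixing coefficient
    idx = torch.randperm(x.size(0))              # pair each example with a random partner
    x_mix = lam * x + (1 - lam) * x[idx]         # convex combination of the inputs
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y[idx])
```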

Tuesday, May 26, 2020

AI Weekly Update - May 26th, 2020 (#22)


Thank you for watching! Please Subscribe! ZeRO-2 & DeepSpeed: https://ift.tt/2ZDrsdj Open-Sourcing BiT: https://ift.tt/2XhoGYt Yannic Kilcher's analysis of BiT: https://www.youtube.com/watch?v=k1GOF2jmX7c&t=1042s Yannic's reaction to OpenAI Code Generation Demo: https://www.youtube.com/watch?v=utuz7wBGjKM&t=327s Facebook Marketplace and GrokNet: https://ift.tt/2LITD2A Making Sense of Vision and Touch: https://ift.tt/2ZGfUG7 OmniTact: https://ift.tt/3dHXp8b Grounding Language in Play: https://ift.tt/2LHsAEM Simple Sensor Intentions for Exploration: https://ift.tt/2TDICDJ Universal Adversarial Perturbations: https://ift.tt/3giZYQ6 Feature Purification: https://ift.tt/2zn8yNj Self-Supervised Learning in NLP: https://ift.tt/2ZvBkFU AI Generated ArXiv Papers: https://ift.tt/2LTi3pQ Next Word Prediction Github Demo: https://ift.tt/3gjQUue On the value of out-of-distribution testing: https://ift.tt/2LZXx79 ScaledML: https://ift.tt/2TIuFVf Keras BERT SQuAD example: https://ift.tt/2X3MHU5

Can We Make An Image Synthesis AI Controllable?


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf The showcased post is available here: https://ift.tt/2SIjBr4 📝 The paper "Semantically Multi-modal Image Synthesis" is available here: https://ift.tt/3d5DzE0 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

A critical analysis of self-supervision, or what we can learn from a single image (Paper Explained)


Does self-supervision really need a lot of data? How low can you go? This paper shows that a single image is enough to learn the lower layers of a deep neural network. Interestingly, more data does not appear to help as long as enough data augmentation is applied. OUTLINE: 0:00 - Overview 1:40 - What is self-supervision 4:20 - What does this paper do 7:00 - Linear probes 11:15 - Linear probe results 17:10 - Results 22:25 - Learned Features https://ift.tt/2TCJtow Abstract: We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that: (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, that (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and that (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset. Authors: Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi Thumbnail Image: https://ift.tt/2zng2zV Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Monday, May 25, 2020

Deep image reconstruction from human brain activity (Paper Explained)


Can you peek into people's brains? Reading human thoughts is a long-standing dream of the AI field. This paper reads fMRI signals from a person and then reconstructs what that person's eyes currently see. This is achieved by translating the fMRI signal to features of a Deep Neural Network and then iteratively optimizing the input of the network to match those features. The results are impressive. OUTLINE: 0:00 - Overview 1:35 - Pipeline 4:00 - Training 5:20 - Image Reconstruction 7:00 - Deep Generator Network 8:15 - Results Paper: https://ift.tt/2ziqt7P My Video on OpenAI Microscope (what I called Atlas): https://youtu.be/Ok44otx90D4 Abstract: The mental contents of perception and imagery are thought to be encoded in hierarchical representations in the brain, but previous attempts to visualize perceptual contents have failed to capitalize on multiple levels of the hierarchy, leaving it challenging to reconstruct internal imagery. Recent work showed that visual cortical activity measured by functional magnetic resonance imaging (fMRI) can be decoded (translated) into the hierarchical features of a pre-trained deep neural network (DNN) for the same input image, providing a way to make use of the information from hierarchical visual features. Here, we present a novel image reconstruction method, in which the pixel values of an image are optimized to make its DNN features similar to those decoded from human brain activity at multiple layers. We found that our method was able to reliably produce reconstructions that resembled the viewed natural images. A natural image prior introduced by a deep generator neural network effectively rendered semantically meaningful details to the reconstructions. Human judgment of the reconstructions supported the effectiveness of combining multiple DNN layers to enhance the visual quality of generated images. While our model was solely trained with natural images, it successfully generalized to artificial shapes, indicating that our model was not simply matching to exemplars. The same analysis applied to mental imagery demonstrated rudimentary reconstructions of the subjective content. Our results suggest that our method can effectively combine hierarchical neural representations to reconstruct perceptual and subjective images, providing a new window into the internal contents of the brain. Authors: Guohua Shen, Tomoyasu Horikawa, Kei Majima, Yukiyasu Kamitani Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Mariona Carós et al, Automatic Reminiscence Therapy for Dementia. ACM ICMR 2020.


Caros M, Garolera M, Radeva P, Giró-i-Nieto X. Automatic Reminiscence Therapy for Dementia. In ACM International Conference on Multimedia Retrieval (ICMR). Dublin, Ireland: 2020. With people living longer than ever, the number of dementia cases such as Alzheimer's disease increases steadily. It affects more than 46 million people worldwide, and it is estimated that in 2050 more than 100 million will be affected. While there are no effective treatments for these terminal diseases, therapies such as reminiscence, which stimulate memories from the past, are recommended. Currently, reminiscence therapy takes place in care homes and is guided by a therapist or a carer. In this work, we present an AI-based solution to automate reminiscence therapy, which consists of a dialogue system that uses photos as input to generate questions. We run a usability case study with patients diagnosed with mild cognitive impairment, which shows they found the system very entertaining and challenging. Overall, this paper presents how reminiscence therapy can be automated using machine learning and deployed to smartphones and laptops, making the therapy more accessible to every person affected by dementia. ACM International Conference on Multimedia Retrieval (ICMR) 2020 CVPR Visual Question Answering & Visual Dialog Workshop 2020 Learn more at: https://ift.tt/2LTZCBp https://ift.tt/3ehuVm6

Sunday, May 24, 2020

Regularizing Trajectory Optimization with Denoising Autoencoders (Paper Explained)


Can you plan with a learned model of the world? Yes, but there's a catch: The better your planning algorithm is, the more the errors of your world model will hurt you! This paper solves this problem by regularizing the planning algorithm to stay in high probability regions, given its experience. https://ift.tt/2LU9MlF Abstract: Trajectory optimization using a learned model of the environment is one of the core elements of model-based reinforcement learning. This procedure often suffers from exploiting inaccuracies of the learned model. We propose to regularize trajectory optimization by means of a denoising autoencoder that is trained on the same trajectories as the model of the environment. We show that the proposed regularization leads to improved planning with both gradient-based and gradient-free optimizers. We also demonstrate that using regularized trajectory optimization leads to rapid initial learning in a set of popular motor control tasks, which suggests that the proposed approach can be a useful tool for improving sample efficiency. Authors: Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Saturday, May 23, 2020

The NeurIPS Broader Impact Statement


For the first time, all authors submitting to the NeurIPS conference are forced to write a statement about the broader impact of their research on society. The messaging around this and how exactly this can influence the paper acceptance process is highly confusing. OUTLINE: 0:00 - Intro 0:30 - VentureBeat Article 1:35 - Official Communication 9:55 - Special Ethics Reviewers 11:00 - Unofficial Communication 22:55 - Conclusion Sources: https://ift.tt/38NHhQw https://ift.tt/2TzR1Im https://ift.tt/3d0Pra8 https://ift.tt/39QYMQ6 https://ift.tt/2HNxT3w https://ift.tt/2ywMHCC https://ift.tt/2Xn3Ve7 https://ift.tt/2J6Dst2 https://ift.tt/2TvAs0e https://gdpr-info.eu/ Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

DeepMind Made A Superhuman AI For 57 Atari Games! 🕹


❤️ Check out Lambda here and sign up for their GPU Cloud: https://ift.tt/35NkCT7 📝 The paper "Agent57: Outperforming the Atari Human Benchmark" is available here: https://ift.tt/3aIfim0 https://ift.tt/39xTXKX ❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: - https://ift.tt/2icTBUb - https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m #Agent57

Friday, May 22, 2020

When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)


BERT is a giant model. Turns out you can prune away many of its components and it still works. This paper analyzes BERT pruning in light of the Lottery Ticket Hypothesis and finds that even the "bad" lottery tickets can be fine-tuned to good accuracy. OUTLINE: 0:00 - Overview 1:20 - BERT 3:20 - Lottery Ticket Hypothesis 13:00 - Paper Abstract 18:00 - Pruning BERT 23:00 - Experiments 50:00 - Conclusion https://ift.tt/2yqSAkL Abstract: Much of the recent success in NLP is due to the large Transformer-based models such as BERT (Devlin et al, 2019). However, these models have been shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis. For fine-tuned BERT, we show that (a) it is possible to find a subnetwork of elements that achieves performance comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. However, the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful. We also show that the "good" subnetworks vary considerably across GLUE tasks, opening up the possibilities to learn what knowledge BERT actually uses at inference time. Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

AI-Based Game Engines with GameGAN research paper


Learning to Simulate Dynamic Environments with GameGAN: https://ift.tt/2A0E8jT GameGAN short video: https://www.youtube.com/watch?v=4OzJUNsPx60&feature=youtu.be Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join Discord: https://ift.tt/2AZiVqD Support the content: https://ift.tt/2qsKFOO Twitter: https://twitter.com/sentdex Instagram: https://ift.tt/2J4Oa4h Facebook: https://ift.tt/1OI3cwB Twitch: https://ift.tt/2pcWGaq

GameGAN Explained!


This video explains the new Neural Game Engine GameGAN from researchers at NVIDIA! This paper uses Deep Learning to store Pacman inside of a learned world model such that you can play the game by sending actions to the generative neural network. This video describes the problem and how the proposed solution tackles it through careful architecture and loss function design! Thanks for Watching! Please Subscribe! Paper Links: NVIDIA GameGAN Blog Post: https://ift.tt/2ypJkxf NVIDIA Quick video presenting GameGAN: https://www.youtube.com/watch?v=BYt6r8z6pUY World Models: https://ift.tt/2IYv5zG GauGAN (SPADE layer) demo video: https://www.youtube.com/watch?v=p5U4NgVGAwg Four Novel Approaches to Manipulating Fabric: https://ift.tt/2Zp6eQt Intuitively Understanding Variational Autoencoders: https://ift.tt/2BJBr5O How much Knowledge Can You Pack into the Parameters of a Language Model? https://ift.tt/2WRaJBs MuZero: https://ift.tt/2qKQkRA Neural Turing Machines: https://ift.tt/2ai9KUr GAN Compression: https://ift.tt/3ggTGQX CycleGAN: https://ift.tt/2opD3rk Yann LeCun's 2020 ICLR Keynote (Importance of multi-modal predictions mentioned in video): https://ift.tt/3dzm8vf Regularizing Trajectory Optimization with Denoising Autoencoders: https://ift.tt/2TwseVH

Thursday, May 21, 2020

[News] OpenAI Model Generates Python Code


This code completion engine can write an entire function from just the name! OpenAI demonstrates what happens when you train a language model on thousands of GitHub Python repositories. Source Clip: https://youtu.be/fZSFNUT6iY8 Full Video: https://ift.tt/2LMiuSX Kite: https://kite.com/ TabNine: https://ift.tt/3gaf4r9 Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Wednesday, May 20, 2020

Now We Can Relight Paintings…and Turns Out, Photos Too! 🎨


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation for this paper is available here: https://ift.tt/3g4iprI 📝 The paper "Generating Digital Painting Lighting Effects via RGB-space Geometry" is available here: https://ift.tt/2ynntGI The brush synthesizer project is available here: https://ift.tt/2ze9qDC Image credits: Style2Paints Team Pepper and Carrot David Revoy - https://ift.tt/2WN2SVs 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

AI Weekly Update - May 20th, 2020 (#21)


Thank you for watching! Please Subscribe! I apologize about the Audio Quality, I had the mic too close to my mouth and didn't realize this until I finished editing the video. I didn't have the energy to re-record this episode, but will fix this in the future. Thanks for understanding, I hope you can still get some information out of this video. Paper Links: Plan2Explore: https://ift.tt/2Lrtepu Movement Pruning: https://ift.tt/2ZhGoh8 Visual Guide to Data Augmentation in NLP: https://ift.tt/3cI1J7i NVIDIA Ampere GPU: https://ift.tt/2Xgo59j NVIDIA MegatronBERT: https://ift.tt/2LqHv5Q Flowtron: https://ift.tt/3bxSDbV Data Echoing: https://ift.tt/3bmjLKQ Salesforce ESPRIT: https://ift.tt/3deZsjM Meta-Dataset: https://ift.tt/3dBvExY MOReL: https://ift.tt/3cPWF0M Bayesian Bits: https://ift.tt/3cOt1Jn Prototypical Contrastive Learning: https://ift.tt/2WZKQ1p Oscar Vision-Language Model: https://ift.tt/361BUMY Harmful Memes: https://ift.tt/2yR1c4i HuggingFace meets Weights and Biases: https://ift.tt/2TogvbI HuggingFace nlp library: https://ift.tt/2WxRcpt Transformer Reinforcement Learning: https://ift.tt/2z6snIE BERT and DistilBERT: https://ift.tt/3cOAckP Intro to BERT with HuggingFace and PyTorch: https://ift.tt/35UCWdH Sayak Paul's Interview with Colin Raffel: https://ift.tt/2zIiv7S

Investigating Human Priors for Playing Video Games (w/ Live Gameplay)


Why are humans so good at video games? Maybe it's because a lot of games are designed with humans in mind. What happens if we change that? This paper removes the influence of human priors from a game and ends up with a pretty fun experience. Paper: https://ift.tt/2KozaxZ Website: https://ift.tt/2F4Dfo9 Code: https://ift.tt/2ADDHMv Abstract: What makes humans so good at solving seemingly complex video games? Unlike computers, humans bring in a great deal of prior knowledge about the world, enabling efficient decision making. This paper investigates the role of human priors for solving video games. Given a sample game, we conduct a series of ablation studies to quantify the importance of various priors on human performance. We do this by modifying the video game environment to systematically mask different types of visual information that could be used by humans as priors. We find that removal of some prior knowledge causes a drastic degradation in the speed with which human players solve the game, e.g. from 2 minutes to over 20 minutes. Furthermore, our results indicate that general priors, such as the importance of objects and visual consistency, are critical for efficient game-play. Videos and the game manipulations are available at this https URL Authors: Rachit Dubey, Pulkit Agrawal, Deepak Pathak, Thomas L. Griffiths, Alexei A. Efros Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Tuesday, May 19, 2020

iMAML: Meta-Learning with Implicit Gradients (Paper Explained)


Gradient-based Meta-Learning requires full backpropagation through the inner optimization procedure, which is a computational nightmare. This paper is able to circumvent this and implicitly compute meta-gradients by the clever introduction of a quadratic regularizer. OUTLINE: 0:00 - Intro 0:15 - What is Meta-Learning? 9:05 - MAML vs iMAML 16:35 - Problem Formulation 19:15 - Proximal Regularization 26:10 - Derivation of the Implicit Gradient 40:55 - Intuition why this works 43:20 - Full Algorithm 47:40 - Experiments Paper: https://ift.tt/2Ab3XKU Blog Post: https://ift.tt/32S41LE Abstract: A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks. Authors: Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
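
The punchline can be stated in one identity. Here is a LaTeX sketch of the implicit meta-gradient as I read it from the proximally regularized inner problem; the notation is approximated from the paper rather than copied verbatim.

```latex
% Inner problem (proximal regularization toward the meta-parameters \theta):
%   \phi_i^*(\theta) = \arg\min_{\phi} \; \hat{L}_i(\phi) + \tfrac{\lambda}{2}\,\|\phi - \theta\|^2 .
% Differentiating its stationarity condition gives the implicit Jacobian
\[
\frac{d\,\phi_i^*(\theta)}{d\theta}
  \;=\; \Big( I + \tfrac{1}{\lambda}\,\nabla^2_{\phi}\hat{L}_i\big(\phi_i^*(\theta)\big) \Big)^{-1},
\]
% so the meta-gradient of the outer loss L_i depends only on the inner solution \phi_i^*,
% not on the optimization path taken to reach it:
\[
\nabla_{\theta}\, L_i\big(\phi_i^*(\theta)\big)
  \;=\; \Big( I + \tfrac{1}{\lambda}\,\nabla^2_{\phi}\hat{L}_i(\phi_i^*) \Big)^{-1}
        \nabla_{\phi} L_i(\phi_i^*).
\]
% In practice the inverse is never formed explicitly; the paper approximates the
% matrix-vector product with a few conjugate-gradient steps using Hessian-vector products.
```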

Monday, May 18, 2020

[Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)


Huggingface released its newest library called NLP, which gives you easy access to almost any NLP dataset and metric in one convenient interface. We will combine this with a BERT model from Huggingface's Transformers library to build a sentiment classifier for IMDB. OUTLINE: 0:00 - Intro 1:30 - Boilerplate 3:20 - PyTorch Lightning Module 9:50 - Load Dataset 12:15 - Tokenization 20:50 - Torch Tensors 25:50 - Data Loader 28:00 - Create BERT Model 32:00 - Implement Validation and Train Step 47:00 - Run & Recap 50:20 - Epilogue My Code: https://ift.tt/3cJJEpg NLP Library: https://ift.tt/2WxRcpt Tutorial Colab: https://ift.tt/2LKsnkh Transformers Library: https://ift.tt/2lgsdXY Pytorch Lightning: https://ift.tt/2G5WIGR Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
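
For orientation, a condensed sketch of the data-loading part of the tutorial: fetch IMDB through the NLP library (since renamed to `datasets`), tokenize with a Transformers tokenizer, and wrap everything in a PyTorch DataLoader. Batch size, sequence length, and the fast-tokenizer choice are illustrative, not the exact values used in the video.

```python
# Load IMDB with the nlp library, tokenize it, and feed it to a PyTorch DataLoader.
import nlp
import torch
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
train = nlp.load_dataset("imdb", split="train")

def encode(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train = train.map(encode, batched=True)                       # tokenize the whole split
train.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
loader = torch.utils.data.DataLoader(train, batch_size=8, shuffle=True)

batch = next(iter(loader))
print(batch["input_ids"].shape, batch["label"][:4])
```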

Sunday, May 17, 2020

Planning to Explore via Self-Supervised World Models (Paper Explained)


What can an agent do without any reward? Explore the world! While many formulations of intrinsic rewards exist (Curiosity, Novelty, etc.), they all look back in time to learn. Plan2Explore is the first model that uses planning in a learned imaginary latent world model to seek out states where it is uncertain about what will happen. OUTLINE: 0:00 - Intro & Problem Statement 3:30 - Model 5:10 - Intrinsic Motivation 9:05 - Planning in Latent Space 11:15 - Latent Disagreement 16:30 - Maximizing Information Gain 21:00 - More problems with the model 26:45 - Experiments 32:10 - Final Comments Paper: https://ift.tt/2Aj2CVs Website: https://ift.tt/2Lrtepu Code: https://ift.tt/2yXhGYO Abstract: Reinforcement learning allows solving complex tasks, however, the learning tends to be task-specific and the sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both these challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During exploration, unlike prior methods which retrospectively compute the novelty of observations after the agent has already reached them, our agent acts efficiently by leveraging planning to seek out expected future novelty. After exploration, the agent quickly adapts to multiple downstream tasks in a zero or a few-shot manner. We evaluate on challenging control tasks from high-dimensional image inputs. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods, and in fact, almost matches the performance of an oracle which has access to rewards. Videos and code at this https URL Authors: Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
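
A rough sketch of the "latent disagreement" idea: an ensemble of one-step models predicts the next latent state, and the variance across their predictions serves as the intrinsic reward that planning maximizes. Network sizes, the ELU activation, and the standalone module are arbitrary assumptions; the actual agent works inside Dreamer's latent dynamics model.

```python
# Ensemble disagreement as an intrinsic reward for exploration.
import torch
import torch.nn as nn

class Ensemble(nn.Module):
    def __init__(self, latent_dim, action_dim, k=5):
        super().__init__()
        self.models = nn.ModuleList([
            nn.Sequential(nn.Linear(latent_dim + action_dim, 128), nn.ELU(),
                          nn.Linear(128, latent_dim))
            for _ in range(k)])

    def intrinsic_reward(self, z, a):
        inp = torch.cat([z, a], dim=-1)
        preds = torch.stack([m(inp) for m in self.models])   # (k, batch, latent_dim)
        return preds.var(dim=0).mean(dim=-1)                 # high disagreement -> high reward

ens = Ensemble(latent_dim=32, action_dim=4)
print(ens.intrinsic_reward(torch.randn(8, 32), torch.randn(8, 4)).shape)
```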

Saturday, May 16, 2020

NX Xavier Devkit + latest updates from NVIDIA!


Xavier NX devkit more info and purchasing: https://nvda.ws/3bqcNEx NVIDIA GTC 2020 Part 6: Ampere/A100: https://www.youtube.com/watch?v=onbnb_D1wC8 Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join Discord: https://ift.tt/2AZiVqD Support the content: https://ift.tt/2qsKFOO Twitter: https://twitter.com/sentdex Instagram: https://ift.tt/2J4Oa4h Facebook: https://ift.tt/1OI3cwB Twitch: https://ift.tt/2pcWGaq

Two Shots of Green Screen Please!


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation for this paper is available here: https://ift.tt/3g17wXR 📝 The paper "Background Matting: The World is Your Green Screen" is available here: https://ift.tt/2XgzUxX 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

[News] Facebook's Real-Time TTS system runs on CPUs only!


Facebook AI's new Text-To-Speech system is able to create 1 second of speech in as little as 500ms, making it real-time. What's even more impressive is the fact that this does not require a rack of GPUs, but runs on merely 4 CPUs. OUTLINE: 0:00 - Intro 1:00 - Problem Formulation 3:20 - System Explanation 15:00 - Speeding up the computation https://ift.tt/2X3O8R2 Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Friday, May 15, 2020

Training an AI to create poetry (NLP Zero to Hero - Part 6)


Through this series so far you’ve been learning the basics of NLP using TensorFlow. You saw how to tokenize and then sequence text, preparing it to train neural networks. You saw how sentiment in text can be represented with embeddings, and how the semantics of text over long stretches might be learned using recurrent neural networks and LSTMs. In this video we’ll put all of that together into a fun scenario -- creating a model and training it on the lyrics to traditional Irish songs. Irish songs generator Colab → https://goo.gle/3aSTLGx Predict Shakespeare with Cloud TPUs and Keras → https://goo.gle/2zy4A40 NLP Zero to Hero playlist → https://goo.gle/nlp-z2h Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow
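
A compressed sketch of the recipe from this episode: tokenize the corpus, turn every line into n-gram prefix sequences, and train an Embedding+LSTM model to predict the next word. The two-line corpus, layer sizes, and the sparse loss are stand-ins, not the notebook's exact code (which uses the full Irish lyrics and a bidirectional LSTM).

```python
# Next-word prediction on a tiny corpus with Keras.
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ["come all ye maidens young and fair", "and you that are blooming in your prime"]

tok = Tokenizer()
tok.fit_on_texts(corpus)
vocab = len(tok.word_index) + 1

sequences = []
for line in corpus:
    ids = tok.texts_to_sequences([line])[0]
    for i in range(2, len(ids) + 1):
        sequences.append(ids[:i])                 # every prefix becomes a training example
sequences = pad_sequences(sequences)              # pre-pad so the label is the last token
x, y = sequences[:, :-1], sequences[:, -1]

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(vocab, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(x, y, epochs=10, verbose=0)
```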

My Channel Is Dying.. I need your Help..


SUBSCRIBE FOR MORE: http://jabrils.com/yt WISHLIST MY VIDEO GAME: https://ift.tt/33NgHFz SUPPORT ON PATREON: https://ift.tt/2pZACkg JOIN DISCORD: https://ift.tt/2QkDa9O Please follow me on social networks: twitter: https://twitter.com/jabrils_ instagram: https://ift.tt/2QNVYvI REMEMBER TO ALWAYS FEED YOUR CURIOSITY

Weight Standardization (Paper Explained)


It's common for neural networks to include data normalization such as BatchNorm or GroupNorm. This paper extends the normalization to also include the weights of the network. This surprisingly simple change leads to a boost in performance and - combined with GroupNorm - new state-of-the-art results. https://ift.tt/2Ly3Puq Abstract: In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small batch sizes are not enough for training networks with Batch Normalization (BN), while other normalization methods that do not rely on batch knowledge still have difficulty matching the performances of BN in large-batch training. Our WS ends this problem because when used with Group Normalization and trained with 1 image/GPU, WS is able to match or outperform the performances of BN trained with large batch sizes with only 2 more lines of code. In micro-batch training, WS significantly outperforms other normalization methods. WS achieves these superior results by standardizing the weights in the convolutional layers, which we show is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients. The effectiveness of WS is verified on many tasks, including image classification, object detection, instance segmentation, video recognition, semantic segmentation, and point cloud recognition. The code is available here: this https URL. Authors: Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
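
The change really is only a couple of lines. Below is a PyTorch sketch of a weight-standardized Conv2d paired with GroupNorm; the eps value, group count, and layer sizes are assumptions for illustration.

```python
# Weight Standardization: normalize each conv filter over its input-channel and
# spatial dimensions before applying it, typically combined with GroupNorm.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    def forward(self, x):
        w = self.weight                                  # (out_ch, in_ch, kH, kW)
        mean = w.mean(dim=[1, 2, 3], keepdim=True)
        std = w.std(dim=[1, 2, 3], keepdim=True) + 1e-5
        w = (w - mean) / std                             # standardized weights
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

layer = nn.Sequential(WSConv2d(3, 16, 3, padding=1), nn.GroupNorm(4, 16), nn.ReLU())
print(layer(torch.randn(2, 3, 32, 32)).shape)
```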

Thursday, May 14, 2020

Neural Networks from Scratch - P.5 Hidden Layer Activation Functions


Neural Networks from Scratch book, access the draft now: https://nnfs.io NNFSiX Github: https://ift.tt/2VybXkn Playlist for this series: https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3 Python 3 basics: https://ift.tt/37OxERs Intermediate Python (w/ OOP): https://ift.tt/2UKxT97 Mug link for fellow mug aficionados: https://amzn.to/3bvkZ6B Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join Discord: https://ift.tt/2AZiVqD Support the content: https://ift.tt/2qsKFOO Twitter: https://twitter.com/sentdex Instagram: https://ift.tt/2J4Oa4h Facebook: https://ift.tt/1OI3cwB Twitch: https://ift.tt/2pcWGaq #nnfs #python #neuralnetworks
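
In the spirit of the series, here is a NumPy-only sketch of a dense layer followed by a ReLU hidden-layer activation. The class names mirror the book's style from memory and are not copied from it; the random toy batch stands in for the spiral dataset used in the videos.

```python
# A dense layer and a ReLU activation from scratch.
import numpy as np

np.random.seed(0)

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.10 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
    def forward(self, inputs):
        self.output = inputs @ self.weights + self.biases

class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)   # clip negative pre-activations to zero

X = np.random.randn(5, 4)                     # toy batch: 5 samples, 4 features
dense = Layer_Dense(4, 3)
relu = Activation_ReLU()
dense.forward(X)
relu.forward(dense.output)
print(relu.output)
```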

[Trash] Automated Inference on Criminality using Face Images


This paper sets out to build a classifier to distinguish criminals from non-criminals using nothing but a face picture. I explore why the research is trash and what lessons we can learn from it. https://ift.tt/2eLV0PF Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Wednesday, May 13, 2020

Catherine Nelson & Hannes Hapke interview (TensorFlow Meets)


Welcome to a brand new episode of TensorFlow Meets hosted by Laurence Moroney, AI Advocate for TensorFlow. Today, we are joined by Catherine Nelson (Senior Data Scientist, Concur Labs) and Hannes Hapke (Senior Data Scientist, Concur Labs) to discuss their upcoming book “Building ML Pipelines” and why they decided on TensorFlow Extended (TFX) for their work. Read “Building ML Pipelines” → https://goo.gle/35NGmPo Watch more episodes of TensorFlow Meets → https://goo.gle/TensorFlow-Meets Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow

Faster Neural Network Training with Data Echoing (Paper Explained)


CPUs are often bottlenecks in Machine Learning pipelines. Data fetching, loading, preprocessing and augmentation can be slow to a point where the GPUs are mostly idle. Data Echoing is a technique to re-use data that is already in the pipeline to reclaim this idle time and keep the GPUs busy at all times. https://ift.tt/2NXVUur Abstract: In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training. However, earlier stages of the training pipeline, such as disk I/O and data preprocessing, do not run on accelerators. As accelerators continue to improve, these earlier stages will increasingly become the bottleneck. In this paper, we introduce "data echoing," which reduces the total computation used by earlier pipeline stages and speeds up training whenever computation upstream from accelerators dominates the training time. Data echoing reuses (or "echoes") intermediate outputs from earlier pipeline stages in order to reclaim idle capacity. We investigate the behavior of different data echoing algorithms on various workloads, for various amounts of echoing, and for various batch sizes. We find that in all settings, at least one data echoing algorithm can match the baseline's predictive performance using less upstream computation. We measured a factor of 3.25 decrease in wall-clock time for ResNet-50 on ImageNet when reading training data over a network. Authors: Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
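
The simplest variant (batch-level echoing) can be sketched as a tiny generator that hands each batch from the slow input pipeline to the accelerator several times; the echo factor of 2 below is an arbitrary example, and the paper also studies example-level echoing with reshuffling.

```python
# Batch-level data echoing: reuse each batch for several optimizer steps.
def data_echoing(loader, echo_factor=2):
    for batch in loader:              # slow: disk I/O, decoding, augmentation
        for _ in range(echo_factor):
            yield batch               # fast: the accelerator sees the batch multiple times

# usage sketch with any iterable of batches
for batch in data_echoing(range(3), echo_factor=2):
    print(batch)                      # 0, 0, 1, 1, 2, 2
```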

Tuesday, May 12, 2020

Can We Teach a Robot Hand To Keep Learning?


❤️ Check out Linode here and get $20 free credit on your account: https://ift.tt/2LaDQJb 📝 The paper "Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation" is available here: https://ift.tt/2Yc6l1c ❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: - https://ift.tt/2icTBUb - https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Group Normalization (Paper Explained)


The dirty little secret of Batch Normalization is its intrinsic dependence on the training batch size. Group Normalization attempts to achieve the benefits of normalization without batch statistics and, most importantly, without sacrificing performance compared to Batch Normalization. https://ift.tt/2HVFc77 Abstract: Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems --- BN's error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN's usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN's computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries. Authors: Yuxin Wu, Kaiming He Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
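
Spelled out, the normalization really is only a few lines. Here is a PyTorch sketch of the computation; PyTorch's built-in nn.GroupNorm does the same thing, and the tensor sizes are illustrative.

```python
# Group Normalization: statistics are computed per sample and per group of channels,
# so they do not depend on the batch size.
import torch

def group_norm(x, gamma, beta, num_groups, eps=1e-5):
    n, c, h, w = x.shape
    x = x.view(n, num_groups, c // num_groups, h, w)
    mean = x.mean(dim=[2, 3, 4], keepdim=True)
    var = x.var(dim=[2, 3, 4], keepdim=True)
    x = (x - mean) / torch.sqrt(var + eps)
    return x.view(n, c, h, w) * gamma + beta        # per-channel scale and shift

x = torch.randn(2, 8, 4, 4)
out = group_norm(x, gamma=torch.ones(1, 8, 1, 1), beta=torch.zeros(1, 8, 1, 1), num_groups=4)
print(out.shape)
```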

Monday, May 11, 2020

AI Weekly Update - May 11th, 2020 (#20)


Thank you for watching! Please Subscribe! Machine Learning Street Talk: https://www.youtube.com/channel/UCMLtBahI5DMrt0NPvDSoIRQ Paper Links: Deep Learning with Graph-Structured Representations: https://ift.tt/2yw1gGr Yoshua Bengio ICLR 2020 Keynote: https://ift.tt/3dzm8vf Transformers are GNNs: https://ift.tt/2vv1zzK Meta-Learning Curiosity Algorithms: https://ift.tt/2zEq8fp A Critical Analysis of Self-Supervised Learning: https://ift.tt/3brDDw6 The Rebirth of Robotics: https://ift.tt/3bbhems Four Novel Approaches to Manipulating Fabric: https://ift.tt/3b8ZDfd AI and Efficiency: https://ift.tt/3dnOvwf Exploring Bayesian Optimization: https://ift.tt/2yw1gGr Offline RL Survey: https://ift.tt/3dCav6K Steerability in GANs: https://ift.tt/2ynKucF The Enhanced POET: https://ift.tt/2WyXyDP Pattern-Exploiting Training: https://ift.tt/3dCn2Y8 Deformer: https://ift.tt/3dyDb0w

Concept Learning with Energy-Based Models (Paper Explained)


This is a hard paper! Energy-functions are typically a mere afterthought in current machine learning. A core function of the Energy - its smoothness - is usually not exploited at inference time. This paper takes a stab at it. Inferring concepts, world states, and attention masks via gradient descent on a learned energy function leads to an interesting framework with many possibilities. Paper: https://ift.tt/2DscjBd Blog: https://ift.tt/2SWCtCp Videos: https://ift.tt/2WJBN4t Abstract: Many hallmarks of human intelligence, such as generalizing from limited experience, abstract reasoning and planning, analogical reasoning, creative problem solving, and capacity for language require the ability to consolidate experience into concepts, which act as basic building blocks of understanding and reasoning. We present a framework that defines a concept by an energy function over events in the environment, as well as an attention mask over entities participating in the event. Given few demonstration events, our method uses inference-time optimization procedure to generate events involving similar concepts or identify entities involved in the concept. We evaluate our framework on learning visual, quantitative, relational, temporal concepts from demonstration events in an unsupervised manner. Our approach is able to successfully generate and identify concepts in a few-shot setting and resulting learned concepts can be reused across environments. Example videos of our results are available at this http URL Authors: Igor Mordatch Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
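
The core trick, inference by gradient descent on a smooth learned energy, can be sketched in a few lines. The toy energy network, step size, and iteration count below are illustrative assumptions and not the paper's relational architecture with attention masks.

```python
# Inference-time optimization in an energy-based model: find the event x that
# minimizes a learned energy E(x, concept) by plain gradient descent.
import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(2 + 4, 64), nn.ReLU(), nn.Linear(64, 1))  # toy E(x, concept)

concept = torch.randn(1, 4)                  # a fixed concept code
x = torch.randn(1, 2, requires_grad=True)    # the event/state being inferred

opt = torch.optim.SGD([x], lr=0.1)
for _ in range(50):                          # descend the energy surface
    e = energy(torch.cat([x, concept], dim=-1)).sum()
    opt.zero_grad()
    e.backward()
    opt.step()
print(x.detach(), e.item())
```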

Sunday, May 10, 2020

[News] Google’s medical AI was super accurate in a lab. Real life was a different story.


A closer look at a story of how the deployment of AI brings its own challenges and what can go wrong. https://ift.tt/2VH7IDh Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Saturday, May 9, 2020

This AI Does Nothing In Games…And Still Wins!


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation for this paper is available here: https://ift.tt/3cuWJ65 📝 The paper "Adversarial Policies" is available here: https://ift.tt/2uFKQJH 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Big Transfer (BiT): General Visual Representation Learning (Paper Explained)


One CNN to rule them all! BiT is a pre-trained ResNet that can be used as a starting point for any visual task. This paper explains what it takes to pre-train such a large model and details how fine-tuning on downstream tasks is done best. Paper: https://ift.tt/2MTC9kO Code & Models: TBA Abstract: Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance. Authors: Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Friday, May 8, 2020

Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)


When AI makes a plan it usually does so step by step, forward in time. But often it is beneficial to define intermediate goals to divide a large problem into easier sub-problems. This paper proposes a generalization of MCTS that searches not for the best next actions to take, but for the best way to sub-divide the problem recursively into problems so tiny that they can each be solved in a single step. Paper: https://ift.tt/3cj5znk Site: https://ift.tt/2yuv7iH Abstract: Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partition trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments. Authors: Giambattista Parascandolo, Lars Buesing, Josh Merel, Leonard Hasenclever, John Aslanides, Jessica B. Hamrick, Nicolas Heess, Alexander Neitz, Theophane Weber Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
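The core recursion is easy to picture. The sketch below is an illustrative skeleton only; the actual algorithm searches over partition trees with MCTS and a learned proposal distribution, and `propose_subgoal` and `low_level_reachable` here are hypothetical stand-ins for those learned components.

```python
# Divide-and-conquer planning skeleton (illustration, not the authors' code):
# recursively propose a sub-goal between the current state and the goal,
# then solve the two halves independently.
def dc_plan(start, goal, propose_subgoal, low_level_reachable, depth=0, max_depth=5):
    """Return a list of sub-goals guiding the low-level policy from start to goal."""
    if depth >= max_depth or low_level_reachable(start, goal):
        return [goal]                     # base case: the imperfect policy can do it directly
    mid = propose_subgoal(start, goal)    # learned proposal in the paper
    left = dc_plan(start, mid, propose_subgoal, low_level_reachable, depth + 1, max_depth)
    right = dc_plan(mid, goal, propose_subgoal, low_level_reachable, depth + 1, max_depth)
    return left + right
```

Note how the plan is refined in an arbitrary order (the middle of the plan is fixed before its beginning or end), which is exactly the relaxation of the sequential planning assumption the abstract describes.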

Thursday, May 7, 2020

Long Short-Term Memory for NLP (NLP Zero to Hero - Part 5)


Welcome to episode 5 of our Natural Language Processing with TensorFlow series. In this video we take a look at how to manage context in language across longer sentences, where a word early in the sentence can determine the meaning and semantics of the sentence's end. We'll use something called an LSTM, or Long Short-Term Memory, to achieve this. NLP Zero to Hero playlist → https://goo.gle/nlp-z2h Subscribe to the TensorFlow channel → https://goo.gle/TensorFlow
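For reference, a minimal Keras model in the spirit of this episode could look like the following; the vocabulary size, embedding dimension, and sequence length are placeholder values you would take from your own tokenizer and dataset.

```python
import tensorflow as tf

# Placeholder hyperparameters; set these from your tokenizer and padded sequences.
vocab_size, embedding_dim, max_length = 10000, 64, 120

# Embedding -> bidirectional LSTM, so information from early tokens can
# influence the representation of the whole sequence.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),   # e.g. binary sentiment
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```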

WHO ARE YOU? 10k Subscribers Special (w/ Channel Analytics)


An in-depth look at this channel's analytics. Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Wednesday, May 6, 2020

AI Summarizes Scientific Papers


This video explains the TLDR dataset from AI2, a really cool dataset that can facilitate the work involved in researching machine learning. It also explores the BART summarization model used as a baseline for this task. Paper Links: TLDR: https://ift.tt/3diopuA TLDR AI2 Demo: https://ift.tt/2KQwHxT BART: https://ift.tt/2oNKlKK BART HuggingFace Docs: https://ift.tt/2yySzew BERT: https://ift.tt/2pMXn84 ELI5: https://ift.tt/3do5Hlr? XSum: https://ift.tt/2A66qcK CNN/DM: https://ift.tt/2oKCzN4 Cora Dataset: https://ift.tt/2WbMBcx Compressive Transformers: https://ift.tt/2uqNAdY Thanks for watching! Please Subscribe!
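As a quick illustration of the kind of baseline involved, here is a sketch of running BART summarization through Hugging Face's pipeline API; "facebook/bart-large-cnn" is the CNN/DailyMail checkpoint, used purely as an example rather than a TLDR-finetuned model.

```python
from transformers import pipeline

# BART summarization baseline via the Hugging Face pipeline API.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

abstract = (
    "We introduce a dataset of one-sentence summaries of scientific papers "
    "and study strong abstractive baselines for generating them."
)
print(summarizer(abstract, max_length=40, min_length=5, do_sample=False)[0]["summary_text"])
```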

Finally, A Blazing Fast Fluid Simulator!


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf The shown blog post is available here: https://ift.tt/339uzcB 📝 The paper "Fast Fluid Simulations with Sparse Volumes on the GPU" is available here: https://ift.tt/2CbxXF5 📸 Our Instagram page is available here: https://ift.tt/2KBCNkT 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Reinforcement Learning with Augmented Data (Paper Explained)


This ONE SIMPLE TRICK can take a vanilla RL algorithm to state-of-the-art performance. What is it? Simply augment your training data before feeding it to the learner! This can be dropped into any RL pipeline and promises big improvements across the board. Paper: https://ift.tt/3fpqB5A Code: https://ift.tt/2xJJH5u Abstract: Learning from visual observations is a fundamental yet challenging problem in reinforcement learning (RL). Although algorithmic advancements combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) sample efficiency of learning and (b) generalization to new environments. To this end, we present RAD: Reinforcement Learning with Augmented Data, a simple plug-and-play module that can enhance any RL algorithm. We show that data augmentations such as random crop, color jitter, patch cutout, and random convolutions can enable simple RL algorithms to match and even outperform complex state-of-the-art methods across common benchmarks in terms of data-efficiency, generalization, and wall-clock speed. We find that data diversity alone can make agents focus on meaningful information from high-dimensional observations without any changes to the reinforcement learning method. On the DeepMind Control Suite, we show that RAD is state-of-the-art in terms of data-efficiency and performance across 15 environments. We further demonstrate that RAD can significantly improve the test-time generalization on several OpenAI ProcGen benchmarks. Finally, our customized data augmentation modules enable faster wall-clock speed compared to competing RL techniques. Our RAD module and training code are available at this https URL. Authors: Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
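To show how simple the plug-in augmentation can be, here is a minimal NumPy sketch of the random-crop augmentation (one of those listed in the abstract) applied to a batch of image observations before they reach the learner; the batch shape and crop size are arbitrary examples.

```python
import numpy as np

def random_crop(obs_batch, out_size):
    """Randomly crop each observation. obs_batch: (B, C, H, W) -> (B, C, out_size, out_size)."""
    b, c, h, w = obs_batch.shape
    crops = np.empty((b, c, out_size, out_size), dtype=obs_batch.dtype)
    for i in range(b):
        top = np.random.randint(0, h - out_size + 1)
        left = np.random.randint(0, w - out_size + 1)
        crops[i] = obs_batch[i, :, top:top + out_size, left:left + out_size]
    return crops

# Usage: augment observations sampled from the replay buffer, then train as usual.
aug_obs = random_crop(np.random.rand(32, 3, 100, 100).astype(np.float32), out_size=84)
print(aug_obs.shape)  # (32, 3, 84, 84)
```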

Tuesday, May 5, 2020

TAPAS: Weakly Supervised Table Parsing via Pre-training (Paper Explained)


Answering complex questions about tabular information is hard. No two tables are alike, and sometimes the answer you're looking for is not even in the table but has to be computed from a subset of its cells. Surprisingly, this model can figure it all out by itself through some clever input encoding and loss engineering. Paper: https://ift.tt/2SDcfEG Code: https://ift.tt/2zQL3fo Abstract: Answering natural language questions over tables is usually seen as a semantic parsing task. To alleviate the collection cost of full logical forms, one popular approach focuses on weak supervision consisting of denotations instead of logical forms. However, training semantic parsers from weak supervision poses difficulties, and in addition, the generated logical forms are only used as an intermediate step prior to retrieving the denotation. In this paper, we present TAPAS, an approach to question answering over tables without generating logical forms. TAPAS trains from weak supervision, and predicts the denotation by selecting table cells and optionally applying a corresponding aggregation operator to such selection. TAPAS extends BERT's architecture to encode tables as input, initializes from an effective joint pre-training of text segments and tables crawled from Wikipedia, and is trained end-to-end. We experiment with three different semantic parsing datasets, and find that TAPAS outperforms or rivals semantic parsing models by improving state-of-the-art accuracy on SQA from 55.1 to 67.2 and performing on par with the state-of-the-art on WIKISQL and WIKITQ, but with a simpler model architecture. We additionally find that transfer learning, which is trivial in our setting, from WIKISQL to WIKITQ, yields 48.7 accuracy, 4.2 points above the state-of-the-art. Authors: Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Martin Eisenschlos Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
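The prediction step described in the abstract, selecting cells and optionally applying an aggregation operator, can be sketched conceptually as follows. This is an illustration of the idea only, not the authors' implementation; the operator list and threshold are made up.

```python
import numpy as np

def predict_denotation(cell_scores, cell_values, agg_logits, threshold=0.5):
    """cell_scores: per-cell selection probabilities, cell_values: numeric cell
    contents (same shape), agg_logits: scores over [NONE, SUM, AVERAGE, COUNT]."""
    selected = cell_values[cell_scores > threshold]
    op = ["NONE", "SUM", "AVERAGE", "COUNT"][int(np.argmax(agg_logits))]
    if op == "SUM":
        return selected.sum()
    if op == "AVERAGE":
        return selected.mean()
    if op == "COUNT":
        return len(selected)
    return selected  # NONE: the answer is the selected cells themselves

scores = np.array([0.9, 0.1, 0.8])
values = np.array([3.0, 7.0, 5.0])
print(predict_denotation(scores, values, agg_logits=np.array([0.1, 2.0, 0.0, 0.3])))  # SUM -> 8.0
```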

Monday, May 4, 2020

Chip Placement with Deep Reinforcement Learning (Paper Explained)


The AI Singularity is here! Computers designing new computers! It takes human experts multiple weeks to design new computer chips. What looks like a large game of Tetris is actually a very complex optimization problem. This paper uses Deep Reinforcement Learning to solve this optimization both faster and better than humans. https://ift.tt/2zipSm0 Abstract: In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks. Authors: Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Anand Babu, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
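As a toy picture of "placement as RL" (vastly simplified compared to the paper, which trains policy and value networks against a learned PPA proxy), consider an environment where the agent places one netlist node per step onto a grid and receives a negative wirelength proxy as the final reward. Everything below is made up for illustration.

```python
# Toy placement environment: state = which node to place next,
# action = grid cell index, reward = negative Manhattan-wirelength proxy at the end.
class ToyPlacementEnv:
    def __init__(self, n_nodes, grid=16, edges=()):
        self.n_nodes, self.grid, self.edges = n_nodes, grid, list(edges)

    def reset(self):
        self.positions, self.next_node = {}, 0
        return self.next_node

    def step(self, cell):
        self.positions[self.next_node] = divmod(cell, self.grid)   # (row, col)
        self.next_node += 1
        done = self.next_node == self.n_nodes
        reward = -self._wirelength() if done else 0.0
        return self.next_node, reward, done

    def _wirelength(self):
        return sum(abs(self.positions[a][0] - self.positions[b][0]) +
                   abs(self.positions[a][1] - self.positions[b][1])
                   for a, b in self.edges)

env = ToyPlacementEnv(n_nodes=3, grid=8, edges=[(0, 1), (1, 2)])
env.reset()
for cell in (0, 9, 18):                      # a fixed "policy" just to show the loop
    node, reward, done = env.step(cell)
print(reward)                                 # -4.0 for this placement
```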

Sunday, May 3, 2020

3 Lessons from Interviewing Jonathan Frankle


Please Subscribe to Machine Learning Street Talk! https://www.youtube.com/channel/UCMLtBahI5DMrt0NPvDSoIRQ Full Conversation with Jonathan: https://www.youtube.com/watch?v=SfjJoevBbjU&t=1320s Paper Links discussed in this video: Linear Mode Connectivity and the Lottery Ticket Hypothesis: https://ift.tt/2Q1qYYj Dissecting Pruned Neural Networks: https://ift.tt/2ybGkVe

I talk to the new Facebook Blender Chatbot


This is what a 9 Billion parameter transformer can do. I take a look at FAIR's new paper "Recipes for building an open-domain chatbot" and try out their chatbot live! Jump to 3:00 to see the chatbot in action. Paper: https://ift.tt/3aSrjEY Blog: https://ift.tt/2Yk9U5i Code: https://ift.tt/3aNdEPv Abstract: Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available under the collective name Blender. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models. Authors: Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Saturday, May 2, 2020

Neural Network Dreams About Beautiful Natural Scenes


❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf The shown blog post is available here: https://ift.tt/2QgzovF 📝 The paper "Manipulating Attributes of Natural Scenes via Hallucination" is available here: https://ift.tt/2VVY001 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Jukebox: A Generative Model for Music (Paper Explained)


This generative model for music can make entire songs with remarkable quality and consistency. It can be conditioned on genre, artist, and even lyrics. Blog: https://ift.tt/2WbzKpE Paper: https://ift.tt/2WpqUoq Code: https://ift.tt/2Wnd8CJ Abstract: We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable. We are releasing thousands of non cherry-picked samples, along with model weights and code. Authors: Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
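The compression step that makes this tractable, replacing raw audio frames with discrete codebook indices, can be sketched in a few lines. This is a generic vector-quantization lookup, not Jukebox's actual multiscale implementation, and the shapes are arbitrary.

```python
import torch

def quantize(latents, codebook):
    """Replace each continuous encoder output with its nearest codebook vector.
    latents: (T, D) encoder outputs, codebook: (K, D) learned code vectors."""
    dists = torch.cdist(latents, codebook)   # (T, K) pairwise distances
    codes = dists.argmin(dim=-1)             # (T,) discrete token ids
    quantized = codebook[codes]              # (T, D) nearest-code lookup
    return codes, quantized

codes, quantized = quantize(torch.randn(128, 64), torch.randn(512, 64))
# The integer `codes` sequence is what the autoregressive Transformer then models.
```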

Friday, May 1, 2020

[ML Coding Tips] Separate Computation & Plotting using locals


Here's a lazy way to separate computation and subsequent analysis in a notebook without the overhead of manually saving local variables. WARNING: Don't do this in a serious project. Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
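For concreteness, here is one possible version of the trick (a sketch, not necessarily the exact approach shown in the video): let the computation cell return its locals() as a dict, then unpack that dict into the plotting code, so plots can be tweaked and re-run without redoing the computation. In a notebook, compute() would live in one cell and plot(**state) in another.

```python
import matplotlib.pyplot as plt
import numpy as np

def compute():
    # Pretend this is the expensive part.
    xs = np.linspace(0, 10, 200)
    ys = np.sin(xs) * np.exp(-xs / 5)
    return locals()              # capture every local variable in one dict

state = compute()                # run once

def plot(xs, ys, **_):           # ignore any extra captured variables
    plt.plot(xs, ys)
    plt.show()

plot(**state)                    # tweak and re-run plotting freely
```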

Neural Networks from Scratch - P.4 Batches, Layers, and Objects


Neural Networks from Scratch book: https://nnfs.io NNFSiX Github: https://ift.tt/2VybXkn Playlist for this series: https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3 Neural Networks IN Scratch (the programming language): https://youtu.be/eJ1HdTZAcn4 Python 3 basics: https://ift.tt/37OxERs Intermediate Python (w/ OOP): https://ift.tt/2UKxT97 Mug link for fellow mug aficionados: https://amzn.to/2KFwsWn Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join Discord: https://ift.tt/2AZiVqD Support the content: https://ift.tt/2qsKFOO Twitter: https://twitter.com/sentdex Instagram: https://ift.tt/2J4Oa4h Facebook: https://ift.tt/1OI3cwB Twitch: https://ift.tt/2pcWGaq #nnfs #python #neuralnetworks
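For readers following along, a compact version of the kind of layer object built in this episode might look like the sketch below; shapes and naming follow the spirit of the series rather than its exact code.

```python
import numpy as np

# A dense layer whose forward pass handles a whole batch of inputs at once.
class LayerDense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.10 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        # inputs: (batch_size, n_inputs) -> output: (batch_size, n_neurons)
        self.output = np.dot(inputs, self.weights) + self.biases
        return self.output

X = np.random.randn(3, 4)        # a batch of 3 samples with 4 features each
layer1 = LayerDense(4, 5)
layer2 = LayerDense(5, 2)
print(layer2.forward(layer1.forward(X)).shape)   # (3, 2)
```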