Free artificial intelligence and machine learning video tutorial resource: June 2020

Tuesday, June 30, 2020

Object-Centric Learning with Slot Attention (Paper Explained)

Visual scenes are often comprised of sets of independent objects. Yet, current vision models make no assumptions about the nature of the pictures they look at. By imposing an objectness prior, this paper introduces a module that is able to recognize permutation-invariant sets of objects from pixels in both supervised and unsupervised settings. It does so by introducing a slot attention module that combines an attention mechanism with dynamic routing. OUTLINE: 0:00 - Intro & Overview 1:40 - Problem Formulation 4:30 - Slot Attention Architecture 13:30 - Slot Attention Algorithm 21:30 - Iterative Routing Visualization 29:15 - Experiments 36:20 - Inference Time Flexibility 38:35 - Broader Impact Statement 42:05 - Conclusion & Comments Paper: https://ift.tt/2VsMQ1W My Video on Facebook's DETR: https://youtu.be/T35ba_VXkMY My Video on Attention: https://youtu.be/iDulhoQ2pro My Video on Capsules: https://youtu.be/nXGHJTtFYRU Abstract: Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with perceptual representations such as the output of a convolutional neural network and produces a set of task-dependent abstract representations which we call slots. These slots are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention. We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions when trained on unsupervised object discovery and supervised property prediction tasks.
Authors: Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
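The "competitive procedure over multiple rounds of attention" mentioned in the abstract above can be illustrated with a small plain-Python sketch. This is a simplified stand-in, not the paper's module: it leaves out the learned projections and the learned slot update, and the function names, toy inputs, and iteration count are assumptions made for illustration. The key point it shows is that the softmax runs over slots, so slots compete for each input.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified round: slots compete for each input, then average what they won."""
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    # attn[i][k]: share of input i claimed by slot k (softmax over slots, not inputs)
    attn = [softmax([scale * dot(x, s) for s in slots]) for x in inputs]
    new_slots = []
    for k in range(len(slots)):
        weights = [attn[i][k] for i in range(len(inputs))]
        total = sum(weights) + eps
        # Weighted mean of the inputs this slot attends to
        new_slots.append([
            sum(w * x[d] for w, x in zip(weights, inputs)) / total
            for d in range(dim)
        ])
    return new_slots, attn

def run_slot_attention(slots, inputs, iterations=3):
    attn = None
    for _ in range(iterations):
        slots, attn = slot_attention_step(slots, inputs)
    return slots, attn

# Toy "feature map": two inputs from one object, two from another (hypothetical values)
inputs = [[5.0, 0.0], [4.5, 0.5], [0.0, 5.0], [0.5, 4.5]]
init_slots = [[0.6, 0.4], [0.4, 0.6]]
slots, attn = run_slot_attention(init_slots, inputs)
```

After a few rounds each slot has specialized on one group of inputs, which is the binding behaviour the abstract describes.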

Monday, June 29, 2020

Set Distribution Networks: a Generative Model for Sets of Images (Paper Explained)

We've become very good at making generative models for images and classes of images, but not yet of sets of images, especially when the number of sets is unknown and can contain sets that have never been encountered during training. This paper builds a probabilistic framework and a practical implementation of a generative model for sets of images based on variational methods. OUTLINE: 0:00 - Intro & Overview 1:25 - Problem Statement 8:05 - Architecture Overview 20:05 - Probabilistic Model 33:50 - Likelihood Function 40:30 - Model Architectures 44:20 - Loss Function & Optimization 47:30 - Results 58:45 - Conclusion Paper: https://ift.tt/2Vt8tPQ Abstract: Images with shared characteristics naturally form sets. For example, in a face verification benchmark, images of the same identity form sets. For generative models, the standard way of dealing with sets is to represent each as a one hot vector, and learn a conditional generative model p(x|y). This representation assumes that the number of sets is limited and known, such that the distribution over sets reduces to a simple multinomial distribution. In contrast, we study a more generic problem where the number of sets is large and unknown. We introduce Set Distribution Networks (SDNs), a novel framework that learns to autoencode and freely generate sets. We achieve this by jointly learning a set encoder, set discriminator, set generator, and set prior. We show that SDNs are able to reconstruct image sets that preserve salient attributes of the inputs in our benchmark datasets, and are also able to generate novel objects/identities. We examine the sets generated by SDN with a pre-trained 3D reconstruction network and a face verification network, respectively, as a novel way to evaluate the quality of generated sets of images. Authors: Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Carlos Guestrin, Josh M. 
Susskind

Sunday, June 28, 2020

Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection (Paper Explained)

Object detection often does not occur in a vacuum. Static cameras, such as wildlife traps, collect lots of irregularly sampled data over a large time frame and often capture repeating or similar events. This model learns to dynamically incorporate other frames taken by the same camera into its object detection pipeline. OUTLINE: 0:00 - Intro & Overview 1:10 - Problem Formulation 2:10 - Static Camera Data 6:45 - Architecture Overview 10:00 - Short-Term Memory 15:40 - Long-Term Memory 20:10 - Quantitative Results 22:30 - Qualitative Results 30:10 - False Positives 32:50 - Appendix & Conclusion Paper: https://ift.tt/38ftsL9 My Video On Attention Is All You Need: https://youtu.be/iDulhoQ2pro Abstract: In static monitoring cameras, useful contextual information can stretch far beyond the few seconds typical video understanding models might see: subjects may exhibit similar behavior over multiple days, and background objects remain static. Due to power and storage constraints, sampling frequencies are low, often no faster than one frame per second, and sometimes are irregular due to the use of a motion trigger. In order to perform well in this setting, models must be robust to irregular sampling rates. In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera. Specifically, we propose an attention-based approach that allows our model, Context R-CNN, to index into a long term memory bank constructed on a per-camera basis and aggregate contextual features from other frames to boost object detection performance on the current frame. We apply Context R-CNN to two settings: (1) species detection using camera traps, and (2) vehicle detection in traffic cameras, showing in both settings that Context R-CNN leads to performance gains over strong baselines. Moreover, we show that increasing the contextual time horizon leads to improved results. 
When applied to camera trap data from the Snapshot Serengeti dataset, Context R-CNN with context from up to a month of images outperforms a single-frame baseline by 17.9% mAP, and outperforms S3D (a 3d convolution based baseline) by 11.2% mAP. Authors: Sara Beery, Guanhang Wu, Vivek Rathod, Ronny Votel, Jonathan Huang
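The idea of letting the current frame "index into a long term memory bank" with attention can be sketched in a few lines of plain Python. This is only an illustration of dot-product attention over stored features; the memory contents, the temperature parameter, and the choice to add the context back onto the box feature are assumptions, not details confirmed by the description above.

```python
import math

def attend(query, memory, temperature=1.0):
    """Softmax-weighted average of memory rows, weighted by dot-product similarity to the query."""
    scores = [sum(q * m for q, m in zip(query, row)) / temperature for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    context = [sum(w * row[d] for w, row in zip(weights, memory)) for d in range(dim)]
    return context, weights

# Hypothetical per-camera memory bank: one feature vector per past detection
memory_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
current_box = [2.0, 0.0, 0.0]

context, weights = attend(current_box, memory_bank)
# Assumed fusion step: add the aggregated context to the current box feature
enriched = [f + c for f, c in zip(current_box, context)]
```

Memory entries that resemble the current detection receive most of the weight, so frames from other times at the same camera contribute evidence to the current prediction.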

Saturday, June 27, 2020

Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures (Paper Explained)

Backpropagation is one of the central components of modern deep learning. However, it's not biologically plausible, which limits the applicability of deep learning to understand how the human brain works. Direct Feedback Alignment is a biologically plausible alternative and this paper shows that, contrary to previous research, it can be successfully applied to modern deep architectures and solve challenging tasks. OUTLINE: 0:00 - Intro & Overview 1:40 - The Problem with Backpropagation 10:25 - Direct Feedback Alignment 21:00 - My Intuition why DFA works 31:20 - Experiments Paper: https://ift.tt/3873KZ7 Code: https://ift.tt/2VoGOzI Referenced Paper by Arild Nøkland: https://ift.tt/2cPIbzi Abstract: Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective, and study the applicability of Direct Feedback Alignment to neural view synthesis, recommender systems, geometric learning, and natural language processing. In contrast with previous studies limited to computer vision tasks, our findings show that it successfully trains a large range of state-of-the-art deep learning architectures, with performance close to fine-tuned backpropagation. At variance with common beliefs, our work supports that challenging tasks can be tackled in the absence of weight transport. Authors: Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala
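To make the contrast with backpropagation concrete, here is a minimal plain-Python sketch of how a Direct Feedback Alignment teaching signal can be formed for hidden layers. It assumes tanh activations and toy sizes, and the matrix names B1 and B2 are hypothetical. Each hidden layer receives the output error through its own fixed random matrix instead of through the transposed forward weights, so no weight transport is needed and the layers do not wait on each other.

```python
import math
import random

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def dfa_hidden_delta(feedback, error, pre_activation):
    """Teaching signal for one hidden layer: (B e) * tanh'(a)."""
    projected = matvec(feedback, error)
    return [p * (1.0 - math.tanh(a) ** 2) for p, a in zip(projected, pre_activation)]

rng = random.Random(0)
n_hidden, n_out = 4, 2
# Fixed random feedback matrices, one per hidden layer; they are never trained
B1 = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]
B2 = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]

error = [0.5, -0.25]          # output minus target for one example (toy values)
a1 = [0.1, -0.3, 0.0, 0.7]    # pre-activations of hidden layer 1 (toy values)
a2 = [0.2, 0.4, -0.6, 0.0]    # pre-activations of hidden layer 2 (toy values)

# Both signals depend only on the output error, so they can be computed independently
delta1 = dfa_hidden_delta(B1, error, a1)
delta2 = dfa_hidden_delta(B2, error, a2)
```

In a full training loop each delta would then be combined with that layer's input to form the weight update, exactly as the backpropagated delta would be.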

How Do Neural Networks Learn? 🤖

❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation of a previous work we covered is available here: https://ift.tt/3cuWJ65 📝 The paper "CNN Explainer: Learning Convolutional Neural Networks with Interactive Visualization" is available here: https://ift.tt/3d1pfvC 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Friday, June 26, 2020

On the Measure of Intelligence by François Chollet - Part 3: The Math (Paper Explained)

In this part, we go over the formal definition of the measure of intelligence. In order to do this, we have to frame and quantify the notions of generalization difficulty, priors, and experience in terms of algorithmic complexity. OUTLINE: 0:00 - Intro & Recap 2:50 - Concept Schema 10:00 - Algorithmic Complexity 13:00 - Definitions 15:25 - Generalization Difficulty 18:55 - Developer Aware Generalization Difficulty 22:40 - Priors 25:10 - Experience 30:50 - The Measure Of Intelligence 38:00 - An Ideal Intelligence Benchmark 42:30 - Conclusion Paper: https://ift.tt/2CfFoxr Part 1: https://youtu.be/3_qGrmD6iQY Part 2: https://youtu.be/THcuTJbeD34 Abstract: To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. 
We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans. Authors: François Chollet

Thursday, June 25, 2020

Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained)

Neural networks are very good at predicting systems' numerical outputs, but not very good at deriving the discrete symbolic equations that govern many physical systems. This paper combines Graph Networks with symbolic regression and shows that the strong inductive biases of these models can be used to derive accurate symbolic equations from observation data. OUTLINE: 0:00 - Intro & Outline 1:10 - Problem Statement 4:25 - Symbolic Regression 6:40 - Graph Neural Networks 12:05 - Inductive Biases for Physics 15:15 - How Graph Networks compute outputs 23:10 - Loss Backpropagation 24:30 - Graph Network Recap 26:10 - Analogies of GN to Newtonian Mechanics 28:40 - From Graph Network to Equation 33:50 - L1 Regularization of Edge Messages 40:10 - Newtonian Dynamics Example 43:10 - Cosmology Example 44:45 - Conclusions & Appendix Paper: https://ift.tt/2CqXiQY Code: https://ift.tt/2VinUu0 Abstract: We develop a general approach to distill symbolic representations of a learned deep model by introducing strong inductive biases. We focus on Graph Neural Networks (GNNs). The technique works as follows: we first encourage sparse latent representations when we train a GNN in a supervised setting, then we apply symbolic regression to components of the learned model to extract explicit physical relations. We find the correct known equations, including force laws and Hamiltonians, can be extracted from the neural network. We then apply our method to a non-trivial cosmology example, a detailed dark matter simulation, and discover a new analytic formula which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures. The symbolic expressions extracted from the GNN using our technique also generalized to out-of-distribution data better than the GNN itself. Our approach offers alternative directions for interpreting neural networks and discovering novel physical principles from the representations they learn.
Authors: Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, Shirley Ho
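The step "encourage sparse latent representations" (listed in the outline as L1 regularization of edge messages) can be written as a simple loss term. The sketch below is an assumption about the general shape of such an objective, not the authors' code: the function names, the mean-squared-error task loss, and the weight 0.01 are all illustrative.

```python
def l1_message_penalty(messages):
    """Mean absolute value over all components of all edge messages."""
    total = sum(abs(v) for msg in messages for v in msg)
    count = sum(len(msg) for msg in messages)
    return total / count

def regularized_loss(predictions, targets, messages, l1_weight=0.01):
    """Supervised loss plus a sparsity penalty on the messages passed along edges."""
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    return mse + l1_weight * l1_message_penalty(messages)

# Toy values: two edges, each carrying a 2-component message
messages = [[1.0, -1.0], [0.0, 2.0]]
loss = regularized_loss([1.0, 2.0], [1.0, 4.0], messages)
```

Pushing most message components towards zero leaves only a few active ones, which is what makes it feasible to fit a compact symbolic expression to them afterwards.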

Tuesday, June 23, 2020

An AI Made All of These Faces

❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation of this paper is available here: https://ift.tt/31cJX9g You can even play with their notebook below! https://ift.tt/3hU1mJH 📝 The paper "Adversarial Latent Autoencoders" is available here: https://ift.tt/3bvoqLk 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Women at DeepMind | Applying for Technical Roles

It’s no secret that the gender gap still exists within STEM. Despite a slight increase in recent years, studies show that women only make up about a quarter of the overall STEM workforce in the UK, for example. While the reasons vary, many women report feeling held back by a lack of representation, clear opportunities and information on what working in the sector actually involves. Closing the gap within STEM is not a quick fix. Various organisations like Women in Machine Learning (WiML) actively work to help create a more inclusive environment where the successes of women are amplified. They also stand as an important point of information for the many women who want to learn more about what it’s like to work in STEM. For this year’s International Women in Engineering Day, we asked the Women in Machine Learning (WiML) community to share their questions about technical interviewing. To answer and to discuss what it’s actually like to work at DeepMind, we brought together Mihaela Rosca (Research Engineer), Feryal Behbahani (Research Scientist) and Kate Parkyn (Recruitment Lead - Research & Engineering). Read more advice from Mihaela, Feryal and Kate on our blog: https://ift.tt/2B1Lt3r Find out more about WiML: https://ift.tt/2UuZein Find out more about DeepMind: https://deepmind.com/

RepNet: Counting Out Time - Class Agnostic Video Repetition Counting in the Wild (Paper Explained)

Counting repeated actions in a video is one of the easiest tasks for humans, yet remains incredibly hard for machines. RepNet achieves state-of-the-art by creating an information bottleneck in the form of a temporal self-similarity matrix, relating video frames to each other in a way that forces the model to surface the information relevant for counting. Along with that, the authors produce a new dataset for evaluating counting models. OUTLINE: 0:00 - Intro & Overview 2:30 - Problem Statement 5:15 - Output & Loss 6:25 - Per-Frame Embeddings 11:20 - Temporal Self-Similarity Matrix 19:00 - Periodicity Predictor 25:50 - Architecture Recap 27:00 - Synthetic Dataset 30:15 - Countix Dataset 31:10 - Experiments 33:35 - Applications 35:30 - Conclusion & Comments Paper Website: https://ift.tt/37YgJMH Colab: https://ift.tt/3hTn3d1 Abstract: We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck that allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model, allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds the state of the art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (~90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos. 
Authors: Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
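A temporal self-similarity matrix, the bottleneck RepNet is built around, relates every frame embedding to every other frame embedding. The plain-Python sketch below uses negative squared Euclidean distance followed by a row-wise softmax; that specific similarity, the temperature parameter, and the toy embeddings are assumptions for illustration rather than details taken from the description above.

```python
import math

def self_similarity(embeddings, temperature=1.0):
    """Row i holds a softmax over how similar frame i is to every frame j."""
    n = len(embeddings)
    matrix = []
    for i in range(n):
        logits = [
            -sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j])) / temperature
            for j in range(n)
        ]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix

# Toy "video" with period 2: frames alternate between two poses
frames = [[0.0, 1.0], [1.0, 0.0]] * 3
tsm = self_similarity(frames)
```

For a periodic clip the matrix shows a regular stripe pattern (here every second entry in a row is large), and that pattern is all the downstream period predictor gets to see.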

Monday, June 22, 2020

DeepMind x UCL | Deep Learning Lectures | 10/12 | Unsupervised Representation Learning

Unsupervised learning is one of the three major branches of machine learning (along with supervised learning and reinforcement learning). It is also arguably the least developed branch. Its goal is to find a parsimonious description of the input data by uncovering and exploiting its hidden structures. This is presumed to be more reminiscent of how the brain learns compared to supervised learning. Furthermore, it is hypothesised that the representations discovered through unsupervised learning may alleviate many known problems with deep supervised and reinforcement learning. However, lacking an explicit ground truth goal to optimise towards, developmental progress in unsupervised learning has been slow. In this talk DeepMind Research Scientist Irina Higgins and DeepMind Research Engineer Mihaela Rosca give an overview of the historical role of unsupervised representation learning and difficulties with developing and evaluating such algorithms. They then take a multidisciplinary approach to think about what might make a good representation and why, before doing a broad overview of the current state-of-the-art approaches to unsupervised representation learning. Download the slides here: https://ift.tt/3fNRyPS Find out more about how DeepMind increases access to science here: https://ift.tt/3dnjF7D Speaker Bios: Irina is a research scientist at DeepMind, where she works in the Frontiers team. Her work aims to bring together insights from the fields of neuroscience and physics to advance general artificial intelligence through improved representation learning. Before joining DeepMind, Irina was a British Psychological Society Undergraduate Award winner for her achievements as an undergraduate student in Experimental Psychology at Westminster University, followed by a DPhil at the Oxford Centre for Computational Neuroscience and Artificial Intelligence, where she focused on understanding the computational principles underlying speech processing in the auditory brain.
During her DPhil, Irina also worked on developing poker AI, applying machine learning in the finance sector, and working on speech recognition at Google Research. Mihaela Rosca is a Research Engineer at DeepMind and PhD student at UCL, focusing on generative models research and probabilistic modelling, from variational inference to generative adversarial networks and reinforcement learning. Prior to joining DeepMind, she worked for Google on using deep learning to solve natural language processing tasks. She has an MEng in Computing from Imperial College London. About the lecture series: The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence. Over the past decade, Deep Learning has evolved as the leading artificial intelligence paradigm providing us with the ability to learn complex functions from raw data at unprecedented accuracy and scale. Deep Learning has been applied to problems in object recognition, speech recognition, speech synthesis, forecasting, scientific computing, control and many more. The resulting applications are touching all of our lives in areas such as healthcare and medical research, human-computer interaction, communication, transport, conservation, manufacturing and many other fields of human endeavour. In recognition of this huge impact, the 2019 Turing Award, the highest honour in computing, was awarded to pioneers of Deep Learning. In this lecture series, research scientists from leading AI research lab, DeepMind, deliver 12 lectures on an exciting selection of topics in Deep Learning, ranging from the fundamentals of training neural networks via advanced ideas around memory, attention, and generative modelling to the important topic of responsible innovation.

DeepMind x UCL | Deep Learning Lectures | 9/12 | Generative Adversarial Networks

Generative adversarial networks (GANs), first proposed by Ian Goodfellow et al. in 2014, have emerged as one of the most promising approaches to generative modeling, particularly for image synthesis. In their most basic form, they consist of two "competing" networks: a generator which tries to produce data resembling a given data distribution (e.g., images), and a discriminator which predicts whether its inputs come from the real data distribution or from the generator, guiding the generator to produce increasingly realistic samples as it learns to "fool" the discriminator more effectively. This lecture discusses the theory behind these models, the difficulties involved in optimising them, and theoretical and empirical improvements to the basic framework. It also discusses state-of-the-art applications of this framework to other problem formulations (e.g., CycleGAN), domains (e.g., video and speech synthesis), and their use for representation learning (e.g., VAE-GAN hybrids, bidirectional GAN). Note: this lecture was originally advertised as number 11 in the series. Download the slides here: https://ift.tt/3828cIV Find out more about how DeepMind increases access to science here: https://ift.tt/3dnjF7D Speaker Bios: Jeff Donahue is a research scientist at DeepMind on the Deep Learning team, currently focusing on adversarial generative models and unsupervised representation learning. He has worked on the BigGAN, BigBiGAN, DVD-GAN, and GAN-TTS projects. He completed his Ph.D. at UC Berkeley, focusing on visual representation learning, with projects including DeCAF, R-CNN, and LRCN, some of the earliest applications of transferring deep visual representations to traditional computer vision tasks such as object detection and image captioning. While at Berkeley he also co-led development of the Caffe deep learning framework, which was awarded with the Mark Everingham Prize in 2017 for contributions to the computer vision community. 
Mihaela Rosca is a Research Engineer at DeepMind and PhD student at UCL, focusing on generative models research and probabilistic modelling, from variational inference to generative adversarial networks and reinforcement learning. Prior to joining DeepMind, she worked for Google on using deep learning to solve natural language processing tasks. She has an MEng in Computing from Imperial College London.
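For the basic two-network setup described in the GAN lecture above, the competing objectives can be written as a pair of loss functions. This plain-Python sketch assumes the discriminator outputs a probability that its input is real, and it uses the non-saturating generator loss as one common choice; the function names and the example probabilities are illustrative.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(x) towards 1 on real data and D(G(z)) towards 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating form: the generator is rewarded when D(G(z)) is close to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs on a batch of real and generated samples
d_real = [0.9, 0.8]
d_fake = [0.1, 0.2]

d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss(d_fake)
g_loss_when_fooling = generator_loss([0.9, 0.8])
```

With a confident discriminator the generator loss is high, and it drops as the generated samples start to "fool" the discriminator, which is the training signal the lecture description refers to.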

DeepMind x UCL | Deep Learning Lectures | 8/12 | Attention and Memory in Deep Learning

Attention and memory have emerged as two vital new components of deep learning over the last few years. This lecture by DeepMind Research Scientist Alex Graves covers a broad range of contemporary attention mechanisms, including the implicit attention present in any deep network, as well as both discrete and differentiable variants of explicit attention. It then discusses networks with external memory and explains how attention provides them with selective recall. It briefly reviews transformers, a particularly successful type of attention network, and lastly looks at variable computation time, which can be seen as a form of 'attention by concentration'. Download the slides here: https://ift.tt/310FLJM Find out more about how DeepMind increases access to science here: https://ift.tt/3dnjF7D Speaker Bio: Alex Graves completed a BSc in Theoretical Physics at the University of Edinburgh, Part III Maths at the University of Cambridge and a PhD in artificial intelligence at IDSIA with Jürgen Schmidhuber, followed by postdocs at the Technical University of Munich and with Geoff Hinton at the University of Toronto. He is now a research scientist at DeepMind. His contributions include the Connectionist Temporal Classification algorithm for sequence labelling (widely used for commercial speech and handwriting recognition), stochastic gradient variational inference, the Neural Turing Machine / Differentiable Neural Computer architectures, and the A2C algorithm for reinforcement learning.

DeepMind x UCL | Deep Learning Lectures | 7/12 | Deep Learning for Natural Language Processing

This lecture, by DeepMind Research Scientist Felix Hill, is split into three parts. First, he discusses the motivation for modelling language with ANNs: language is highly contextual, typically non-compositional and relies on reconciling many competing sources of information. This section also covers Elman's Finding Structure in Time and simple recurrent networks, the importance of context and transformers. In the second part, he explores unsupervised and representation learning for language from Word2Vec to BERT. Finally, Felix discusses situated language understanding, grounding and embodied language learning. Download the slides here: https://ift.tt/37OqX2g Find out more about how DeepMind increases access to science here: https://ift.tt/3dnjF7D Speaker Bio: Felix Hill is a Research Scientist working on grounded language understanding, and has been at DeepMind for almost 4 years. He studied pure maths as an undergrad, then got very interested in linguistics and psychology after reading the PDP books by McClelland and Rumelhart, so started graduate school at the University of Cambridge, and ended up in the NLP group. To satisfy his interest in artificial neural networks, he visited Yoshua Bengio's lab in 2013 and started a series of collaborations with Kyunghyun Cho and Yoshua applying neural nets to text processing. This led to some of the first work on transfer learning with sentence representations (and a neural crossword solver). He also interned at FAIR in NYC with Jason Weston. At DeepMind, he's worked on developing agents that can understand language in the context of interactive 3D worlds, together with problems relating to mathematical and analogical reasoning. About the lecture series: The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence. 

Sunday, June 21, 2020

SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)

Implicit neural representations are created when a neural network is used to represent a signal as a function. SIRENs are a particular type of INR that can be applied to a variety of signals, such as images, sound, or 3D shapes. This is an interesting departure from regular machine learning and required me to think differently. OUTLINE: 0:00 - Intro & Overview 2:15 - Implicit Neural Representations 9:40 - Representing Images 14:30 - SIRENs 18:05 - Initialization 20:15 - Derivatives of SIRENs 23:05 - Poisson Image Reconstruction 28:20 - Poisson Image Editing 31:35 - Shapes with Signed Distance Functions 45:55 - Paper Website 48:55 - Other Applications 50:45 - Hypernetworks over SIRENs 54:30 - Broader Impact Paper: https://ift.tt/2NcBNpo Website: https://ift.tt/2CeIu7T Abstract: Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. We analyze Siren activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how Sirens can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. 
Lastly, we combine Sirens with hypernetworks to learn priors over the space of Siren functions. Authors: Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, Gordon Wetzstein Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
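
As a rough illustration of the sine layers and initialization scheme mentioned in the abstract, here is a minimal sketch in plain Python. The frequency factor `w0 = 30`, the uniform bounds, the zero biases and the function names `init_siren_layer` and `siren_layer` are assumptions made for this example, not code taken from the paper.

```python
import math
import random

def init_siren_layer(n_in, n_out, w0=30.0, is_first=False, rng=random):
    """Return (weights, biases) for one sine layer.

    Assumed scheme: first layer uniform in [-1/n_in, 1/n_in], later layers
    uniform in [-sqrt(6/n_in)/w0, sqrt(6/n_in)/w0]. Biases are set to zero
    here purely for simplicity.
    """
    bound = 1.0 / n_in if is_first else math.sqrt(6.0 / n_in) / w0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def siren_layer(x, weights, biases, w0=30.0):
    """Apply y = sin(w0 * (W x + b)) to a single input vector x."""
    return [math.sin(w0 * (sum(w * xi for w, xi in zip(row, x)) + b))
            for row, b in zip(weights, biases)]

# Map one 2D pixel coordinate in [-1, 1]^2 to 4 hidden features.
rng = random.Random(0)
W, b = init_siren_layer(2, 4, is_first=True, rng=rng)
features = siren_layer([0.25, -0.5], W, b)
```

An implicit representation of an image would stack several such layers and fit the output to the pixel value at each coordinate; the sketch only shows the building block.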

Saturday, June 20, 2020

Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)

This paper proposes SimCLRv2 and shows that semi-supervised learning benefits a lot from self-supervised pre-training. And stunningly, that effect gets larger the fewer labels are available and the more parameters the model has. OUTLINE: 0:00 - Intro & Overview 1:40 - Semi-Supervised Learning 3:50 - Pre-Training via Self-Supervision 5:45 - Contrastive Loss 10:50 - Retaining Projection Heads 13:10 - Supervised Fine-Tuning 13:45 - Unsupervised Distillation & Self-Training 18:45 - Architecture Recap 22:25 - Experiments 34:15 - Broader Impact Paper: https://ift.tt/3hKMixV Code: https://ift.tt/39HpA5o Abstract: One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. 
This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels (≤13 labeled images per class) using ResNet-50, a 10× improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels. Authors: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
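
The contrastive loss from the outline can be sketched as follows. This is a generic SimCLR-style normalized temperature-scaled cross-entropy written in plain Python; the temperature value, the pairing convention (`z[2k]` and `z[2k+1]` are two views of the same image) and the function names are assumptions for illustration, not the paper's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z, temperature=0.5):
    """Average contrastive loss over 2N embeddings.

    z[2k] and z[2k+1] are assumed to be two augmented views of image k;
    every other embedding in the batch acts as a negative.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner (0<->1, 2<->3, ...)
        positive = cosine(z[i], z[j]) / temperature
        logits = [cosine(z[i], z[k]) / temperature for k in range(n) if k != i]
        total += math.log(sum(math.exp(l) for l in logits)) - positive
    return total / n

aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

When the two views of each image agree (`aligned`), the loss is lower than when positives point in different directions (`mixed`), which is the pressure that shapes the representation during pre-training.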

NVIDIA’s AI Recreated PacMan! 👻

❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation of this paper is available here: https://ift.tt/3hNA4Vm 📝 The paper "Learning to Simulate Dynamic Environments with GameGAN" is available here: https://ift.tt/2A0E8jT Our paper with the neural renderer is available here: https://ift.tt/2HhNzx5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Friday, June 19, 2020

On the Measure of Intelligence by François Chollet - Part 2: Human Priors (Paper Explained)

In this part, we go much more in-depth into the relationship between intelligence, generality, skill, experience, and prior knowledge and take a close look at what priors are built into humans. This will form the basis for comparing the intelligence of humans and AI systems. OUTLINE: 0:00 - Intro & Recap 3:00 - Optimize for Generality 5:45 - Buying Skill with Data and Priors 12:40 - The Human Scope 17:30 - Human Priors 24:05 - Core Knowledge 28:50 - Comments & Conclusion Paper: https://ift.tt/2CfFoxr Tim Scarfe's Video: https://youtu.be/GpWLZUbPhr0 Abstract: To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. 
Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans. Authors: François Chollet Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Thursday, June 18, 2020

Image GPT: Generative Pretraining from Pixels (Paper Explained)

BERT and GPT-2/3 have shown the enormous power of using generative models as pre-training for classification tasks. However, for images, pre-training is usually done with supervised or self-supervised objectives. This paper investigates how far you can get when applying the principles from the world of NLP to the world of images. OUTLINE: 0:00 - Intro & Overview 2:50 - Generative Models for Pretraining 4:50 - Pretraining for Visual Tasks 7:40 - Model Architecture 15:15 - Linear Probe Experiments 24:15 - Fine-Tuning Experiments 30:25 - Conclusion & Comments Paper: https://ift.tt/2YKKAEf Blog: https://ift.tt/2Yap1hh Code: https://ift.tt/2YdnreJ Abstract: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full finetuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features. Authors: Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
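
To make "predict pixels without knowledge of the 2D input structure" concrete, here is a minimal sketch of how an image might be turned into next-token training pairs. The raster ordering and the function name `to_autoregressive_pairs` are assumptions for illustration; the actual model also reduces resolution and color depth before flattening.

```python
def to_autoregressive_pairs(image):
    """Flatten a 2D grid of pixel tokens in raster order.

    Returns (inputs, targets) where targets[t] is the pixel that follows
    inputs[t] in the flattened sequence, as in next-token prediction.
    """
    sequence = [token for row in image for token in row]
    return sequence[:-1], sequence[1:]

inputs, targets = to_autoregressive_pairs([[1, 2], [3, 4]])
```

After flattening, the model sees only a 1D sequence, so any notion of rows and columns has to be learned from data.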

Wednesday, June 17, 2020

Surprise Video With Our New Paper On Material Editing! 🔮

📝 Our "Photorealistic Material Editing Through Direct Image Manipulation" paper is available here: https://ift.tt/2EytbF6 The previous paper with the microplanet scene is available here: https://ift.tt/2HhNzx5 ❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: - https://ift.tt/2icTBUb - https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m #NeuralRendering

BYOL: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (Paper Explained)

Self-supervised representation learning relies on negative samples to keep the encoder from collapsing to trivial solutions. However, this paper shows that negative samples, which are a nuisance to implement, are not necessary for learning good representations, and their algorithm BYOL is able to outperform other baselines using just positive samples. OUTLINE: 0:00 - Intro & Overview 1:10 - Image Representation Learning 3:55 - Self-Supervised Learning 5:35 - Negative Samples 10:50 - BYOL 23:20 - Experiments 30:10 - Conclusion & Broader Impact Paper: https://ift.tt/30XKSds Abstract: We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods intrinsically rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using the standard linear evaluation protocol with a ResNet-50 architecture and 79.6% with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Authors: Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
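
The "slow-moving average" update of the target network can be sketched as an exponential moving average over parameters. The decay value `tau = 0.99` and the flat parameter lists are assumptions chosen to keep the example small; the function name `ema_update` is hypothetical.

```python
def ema_update(target, online, tau=0.99):
    """Return new target parameters: tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]

online = [1.0, -2.0, 0.5]   # stand-in for the online network's weights
target = [0.0, 0.0, 0.0]    # stand-in for the target network's weights
for _ in range(1000):
    target = ema_update(target, online)
```

With the online weights held fixed, the target drifts towards them over many steps. In training the online weights keep changing, so the target lags behind as a smoothed copy, and no gradient flows into it.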

Tuesday, June 16, 2020

TUNIT: Rethinking the Truly Unsupervised Image-to-Image Translation (Paper Explained)

Image-to-Image translation usually requires corresponding samples or at least domain labels of the dataset. This paper removes that restriction and allows for fully unsupervised image translation of a source image to the style of one or many reference images. This is achieved by jointly training a guiding network that provides style information and pseudo-labels. OUTLINE: 0:00 - Intro & Overview 1:20 - Unsupervised Image-to-Image Translation 7:05 - Architecture Overview 14:15 - Pseudo-Label Loss 19:30 - Encoder Style Contrastive Loss 25:30 - Adversarial Loss 31:20 - Generator Style Contrastive Loss 35:15 - Image Reconstruction Loss 36:55 - Architecture Recap 39:55 - Full Loss 42:05 - Experiments Paper: https://ift.tt/3e2yM6P Code: https://ift.tt/2Y7j8kR Abstract: Every recent image-to-image translation model uses either image-level (i.e. input-output pairs) or set-level (i.e. domain labels) supervision at minimum. However, even the set-level supervision can be a serious bottleneck for data collection in practice. In this paper, we tackle image-to-image translation in a fully unsupervised setting, i.e., neither paired images nor domain labels. To this end, we propose the truly unsupervised image-to-image translation method (TUNIT) that simultaneously learns to separate image domains via an information-theoretic approach and generate corresponding images using the estimated domain labels. Experimental results on various datasets show that the proposed method successfully separates domains and translates images across those domains. In addition, our model outperforms existing set-level supervised methods under a semi-supervised setting, where a subset of domain labels is provided. 
Authors: Kyungjune Baek, Yunjey Choi, Youngjung Uh, Jaejun Yoo, Hyunjung Shim Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Monday, June 15, 2020

A bio-inspired bistable recurrent cell allows for long-lasting memory (Paper Explained)

Even though LSTMs and GRUs mitigate the vanishing and exploding gradient problems, they have trouble learning to remember things over very long time spans. Inspired by bistability, a property of biological neurons, this paper constructs a recurrent cell with an inherent memory property, with only minimal modification to existing architectures. OUTLINE: 0:00 - Intro & Overview 1:10 - Recurrent Neural Networks 6:00 - Gated Recurrent Unit 14:40 - Neuronal Bistability 22:50 - Bistable Recurrent Cell 31:00 - Neuromodulation 32:50 - Copy First Benchmark 37:35 - Denoising Benchmark 48:00 - Conclusion & Comments Paper: https://ift.tt/2Y5WedU Code: https://ift.tt/2ACofRg Abstract: Recurrent neural networks (RNNs) provide state-of-the-art performances in a wide variety of tasks that require memory. These performances can often be achieved thanks to gated recurrent cells such as gated recurrent units (GRU) and long short-term memory (LSTM). Standard gated cells share a layer internal state to store information at the network level, and long term memory is shaped by network-wide recurrent connection weights. Biological neurons on the other hand are capable of holding information at the cellular level for an arbitrarily long amount of time through a process called bistability. Through bistability, cells can stabilize to different stable states depending on their own past state and inputs, which permits the durable storing of past information in neuron state. In this work, we take inspiration from biological neuron bistability to embed RNNs with long-lasting memory at the cellular level. This leads to the introduction of a new bistable biologically-inspired recurrent cell that is shown to strongly improve RNN performance on time-series which require very long memory, despite using only cellular connections (all recurrent connections are from neurons to themselves, i.e. a neuron state is not influenced by the state of other neurons). 
Furthermore, equipping this cell with recurrent neuromodulation makes it possible to link them to standard GRU cells, taking a step towards the biological plausibility of GRU. Authors: Nicolas Vecoven, Damien Ernst, Guillaume Drion Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
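
Bistability itself can be seen in a one-neuron toy model. The sketch below is not the paper's cell; it only assumes a single unit with self-feedback gain `a` updated as h <- tanh(a * h + x), and the function name `settle` is hypothetical.

```python
import math

def settle(h0, a, x=0.0, steps=200):
    """Iterate h <- tanh(a * h + x) and return the value it settles at."""
    h = h0
    for _ in range(steps):
        h = math.tanh(a * h + x)
    return h

# Gain above 1: two stable states, chosen by the sign of the starting state.
high = settle(0.1, a=2.0)
low = settle(-0.1, a=2.0)
# Gain below 1: a single stable state at zero, so the past is forgotten.
forgotten = settle(0.1, a=0.5)
```

With a gain above 1 the unit keeps remembering which side it started on even with zero input, which is the kind of cell-level memory the abstract describes. With a gain below 1 that information decays away.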

Sunday, June 14, 2020

On the Measure of Intelligence (Introduction)

This video will present some of the ideas in "On the Measure of Intelligence". I think it's really interesting to think of the scale of generalization from absent, local, broad, to extreme generalization. I think local to broad will quickly become the same idea with the use of data augmentation and generative models. Extreme generalization seems highly ambitious, but I think something like POET at least puts an actionable framework together to think about it. The video will also briefly mention human prior knowledge, the cognitive hierarchy, and benchmarks in AI. Thanks for watching! Please Subscribe! Please also check out Yannic Kilcher's part 1 series on this paper and be on the lookout for Tim's overview of this on "Machine Learning Dojo with Tim Scarfe". We will be releasing our discussion of the paper on Machine Learning Street Talk very soon. Paper Links: On the Measure of Intelligence: https://ift.tt/36TesBG Yannic Kilcher Part 1: Foundations: https://www.youtube.com/watch?v=3_qGrmD6iQY DermGAN: https://ift.tt/2vwCAMt On the Steerability of GANs: https://ift.tt/2PxnVdn POET: https://ift.tt/2xUnFwp AI-GAs: https://ift.tt/325yHZi Animal AI Olympics: https://ift.tt/338skGA Thanks again for watching!

SynFlow: Pruning neural networks without any data by iteratively conserving synaptic flow

The Lottery Ticket Hypothesis has shown that it's theoretically possible to prune a neural network at the beginning of training and still achieve good performance, if we only knew which weights to prune away. This paper not only explains where other attempts at pruning fail, but also provides an algorithm that provably reaches maximum compression capacity, all without looking at any data! OUTLINE: 0:00 - Intro & Overview 1:00 - Pruning Neural Networks 3:40 - Lottery Ticket Hypothesis 6:00 - Paper Story Overview 9:45 - Layer Collapse 18:15 - Synaptic Saliency Conservation 23:25 - Connecting Layer Collapse & Saliency Conservation 28:30 - Iterative Pruning avoids Layer Collapse 33:20 - The SynFlow Algorithm 40:45 - Experiments 43:35 - Conclusion & Comments Paper: https://ift.tt/2UFVA4m Code: https://ift.tt/3fzNK4H My Video on the Lottery Ticket Hypothesis: https://youtu.be/ZVVnvZdUMUk Street Talk about LTH: https://youtu.be/SfjJoevBbjU Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory-driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow Pruning (SynFlow). 
This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.9 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that data must be used to quantify which synapses are important. Authors: Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
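
A data-free "synaptic flow" score and the layer-wise conservation it obeys can be sketched for a tiny two-layer linear network. This is a simplified illustration under the assumption that the score of a weight is its magnitude times the total flow through the paths it sits on, with an all-ones input; the function name `synflow_scores` and the example weights are made up.

```python
def synflow_scores(w1, w2):
    """Score each weight of a two-layer linear net without any data.

    w1 is hidden x input, w2 is output x hidden (lists of rows). With
    R = sum(|W2| |W1| 1), the score of a weight is |weight| * dR/d|weight|.
    """
    a1 = [[abs(v) for v in row] for row in w1]
    a2 = [[abs(v) for v in row] for row in w2]
    hidden = range(len(a1))
    # Flow arriving at each hidden unit when the input is all ones.
    inflow = [sum(a1[j]) for j in hidden]
    # Flow leaving each hidden unit towards all outputs.
    outflow = [sum(row[j] for row in a2) for j in hidden]
    s1 = [[a1[j][i] * outflow[j] for i in range(len(a1[j]))] for j in hidden]
    s2 = [[row[j] * inflow[j] for j in hidden] for row in a2]
    return s1, s2

w1 = [[0.5, -1.0], [2.0, 0.0], [-0.1, 0.3]]
w2 = [[1.0, -0.5, 0.2]]
s1, s2 = synflow_scores(w1, w2)
total_layer1 = sum(sum(row) for row in s1)
total_layer2 = sum(sum(row) for row in s2)
```

Both layer totals come out equal, a small instance of the conservation law the abstract refers to. An iterative pruner would remove the lowest-scoring weights, recompute the scores and repeat, rather than pruning everything in one shot.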

Saturday, June 13, 2020

How Well Can an AI Learn Physics? ⚛

❤️ Check out Lambda here and sign up for their GPU Cloud: https://ift.tt/35NkCT7 📝 The paper "Learning to Simulate Complex Physics with Graph Networks" is available here: https://ift.tt/2YshsBw https://ift.tt/2BYWu5z 🌊 The thesis on fluids is available here: https://ift.tt/2pj2zSw ❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: - https://ift.tt/2icTBUb - https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Deep Differential System Stability - Learning advanced computations from examples (Paper Explained)

Determining the stability properties of differential systems is a challenging task that involves very advanced symbolic and numeric mathematical manipulations. This paper shows that given enough training data, a simple language model with no underlying knowledge of mathematics can learn to solve these problems with remarkably high accuracy. OUTLINE: 0:00 - Intro & Overview 3:15 - Differential System Tasks 11:30 - Datasets & Models 15:15 - Experiments 21:00 - Discussion & My Comments Paper: https://ift.tt/3foFILU My Video on Deep Learning for Symbolic Mathematics: https://youtu.be/p3sAF3gVMMA Abstract: Can advanced mathematical computations be learned from examples? Using transformers over large generated datasets, we train models to learn properties of differential systems, such as local stability, behavior at infinity and controllability. We achieve near perfect estimates of qualitative characteristics of the systems, and good approximations of numerical quantities, demonstrating that neural networks can learn advanced theorems and complex computations without built-in mathematical knowledge. Authors: François Charton, Amaury Hayat, Guillaume Lample Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
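
For context, the classical way to decide local stability is to look at the eigenvalues of the system's Jacobian at an equilibrium: the equilibrium is locally stable when every eigenvalue has a negative real part. Below is a minimal sketch for 2x2 Jacobians only, using the characteristic polynomial; the function names are hypothetical, and this is the kind of computation the model has to approximate, not the paper's code.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from its characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def is_locally_stable(a, b, c, d):
    """True if every eigenvalue of the Jacobian has a negative real part."""
    return all(ev.real < 0 for ev in eigenvalues_2x2(a, b, c, d))

# Damped oscillator x' = y, y' = -x - 0.5 * y, linearized at the origin.
damped = is_locally_stable(0.0, 1.0, -1.0, -0.5)
# Saddle point x' = x, y' = -y: one direction grows, so it is unstable.
saddle = is_locally_stable(1.0, 0.0, 0.0, -1.0)
```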

Friday, June 12, 2020

VirTex: Learning Visual Representations from Textual Annotations (Paper Explained)

Pre-training a CNN backbone for visual transfer learning has recently seen a big push into the direction of incorporating more data, at the cost of less supervision. This paper investigates the opposite: Visual transfer learning by pre-training from very few, but very high-quality samples on an image captioning task. OUTLINE: 0:00 - Intro & Overview 1:00 - Pre-Training for Visual Tasks 3:40 - Quality-Quantity Tradeoff 5:50 - Image Captioning 8:35 - VirTex Method 14:30 - Linear Classification 20:30 - Ablations 22:05 - Fine-Tuning 25:45 - Attention Visualization 27:30 - Conclusion & Remarks Paper: https://ift.tt/2MNsw6Z Code: https://ift.tt/3cTZFZ1 Abstract: The de-facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet. Recent methods have explored unsupervised pretraining to scale to vast quantities of unlabeled images. In contrast, we aim to learn high-quality visual representations from fewer images. To this end, we revisit supervised pretraining, and seek data-efficient alternatives to classification-based pretraining. We propose VirTex -- a pretraining approach using semantically dense captions to learn visual representations. We train convolutional networks from scratch on COCO Captions, and transfer them to downstream recognition tasks including image classification, object detection, and instance segmentation. On all tasks, VirTex yields features that match or exceed those learned on ImageNet -- supervised or unsupervised -- despite using up to ten times fewer images. Authors: Karan Desai, Justin Johnson Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Thursday, June 11, 2020

Linformer: Self-Attention with Linear Complexity (Paper Explained)

Transformers are notoriously resource-intensive because their self-attention mechanism requires memory and computation that grow quadratically with the length of the input sequence. The Linformer model gets around that by using the fact that often, the actual information in the attention matrix is of lower rank and can be approximated. OUTLINE: 0:00 - Intro & Overview 1:40 - The Complexity of Self-Attention 4:50 - Embedding Dimension & Multiple Heads 8:45 - Formal Attention 10:30 - Empirical Investigation into RoBERTa 20:00 - Theorem: Self-Attention is Low Rank 28:10 - Linear Self-Attention Method 36:15 - Theorem: Linear Self-Attention 44:10 - Language Modeling 46:40 - NLP Benchmarks 47:50 - Compute Time & Memory Gains 48:20 - Broader Impact Statement 49:55 - Conclusion Paper: https://ift.tt/3fc6l6t Abstract: Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses O(n^2) time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space. The resulting linear transformer, the Linformer, performs on par with standard Transformer models, while being much more memory- and time-efficient. Authors: Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
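
The idea of projecting keys and values along the sequence axis can be sketched in plain Python. In the paper the projections are learned; here the matrices `e` and `f` are fixed pair-averaging matrices chosen only for illustration, and all names and sizes (n = 4 tokens, d = 2 features, 2 projected positions) are assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax_rows(m):
    """Row-wise softmax with the usual max subtraction for stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def linear_attention(q, k, v, e, f):
    """q, k, v are n x d; e and f are k_proj x n sequence projections."""
    d = len(q[0])
    k_small = matmul(e, k)                      # k_proj x d
    v_small = matmul(f, v)                      # k_proj x d
    k_small_t = [list(col) for col in zip(*k_small)]
    scores = matmul(q, k_small_t)               # n x k_proj, not n x n
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), v_small)

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
e = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # average adjacent pairs
f = e
out = linear_attention(q, k, v, e, f)
```

The score matrix has shape n x k_proj instead of n x n, so with a fixed projected length the cost grows linearly with the sequence length.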

Wednesday, June 10, 2020

End-to-End Adversarial Text-to-Speech (Paper Explained)

Text-to-speech engines are usually multi-stage pipelines that transform the signal into many intermediate representations and require supervision at each step. When trying to train TTS end-to-end, the alignment problem arises: Which text corresponds to which piece of sound? This paper uses an alignment module to tackle this problem and produces astonishingly good sound. OUTLINE: 0:00 - Intro & Overview 1:55 - Problems with Text-to-Speech 3:55 - Adversarial Training 5:20 - End-to-End Training 7:20 - Discriminator Architecture 10:40 - Generator Architecture 12:20 - The Alignment Problem 14:40 - Aligner Architecture 24:00 - Spectrogram Prediction Loss 32:30 - Dynamic Time Warping 38:30 - Conclusion Paper: https://ift.tt/2A4yJsw Website: https://ift.tt/2MNVAuR Abstract: Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audio outputs. Our proposed generator is feed-forward and thus efficient for both training and inference, using a differentiable monotonic interpolation scheme to predict the duration of each input token. It learns to produce high fidelity audio through a combination of adversarial feedback and prediction losses constraining the generated audio to roughly match the ground truth in terms of its total duration and mel-spectrogram. To allow the model to capture temporal variation in the generated audio, we employ soft dynamic time warping in the spectrogram-based prediction loss. The resulting model achieves a mean opinion score exceeding 4 on a 5 point scale, which is comparable to the state-of-the-art models relying on multi-stage training and additional supervision. 
Authors: Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
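The abstract mentions soft dynamic time warping in the spectrogram loss. As a rough illustration only, here is a minimal NumPy sketch of the generic soft-DTW recursion, where the hard minimum of classic DTW is replaced by a smooth soft-minimum. The function names, the squared-distance cost, and the `gamma` temperature are assumptions for this sketch; this is not the paper's exact loss.

```python
import numpy as np


def pairwise_sq_dist(a, b):
    """Squared Euclidean distance between every frame of a (T1, D) and b (T2, D)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    values = np.asarray(values, dtype=float)
    m = values.min()
    return m - gamma * np.log(np.sum(np.exp(-(values - m) / gamma)))


def soft_dtw(cost, gamma=1.0):
    """Soft-DTW value for a (T1, T2) cost matrix (hypothetical helper)."""
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Best of: advance in a only, advance in b only, advance in both.
            R[i, j] = cost[i - 1, j - 1] + soft_min(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma
            )
    return R[n, m]
```

With a small `gamma` the value approaches ordinary DTW, so a sequence and a time-stretched copy of it score close to zero, while a larger `gamma` spreads credit over many alignments and keeps the loss smooth.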

Tuesday, June 9, 2020

OpenAI’s Jukebox AI Writes Amazing New Songs 🎼

❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation of this paper is available here: https://ift.tt/2XLLkJR 📝 The paper "Jukebox: A Generative Model for Music" is available here: https://ift.tt/2WbzKpE 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh More info if you would like to appear here: https://ift.tt/2icTBUb Thumbnail background image credit: https://ift.tt/3f9x9o0 Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

TransCoder: Unsupervised Translation of Programming Languages (Paper Explained)

Code migration between languages is an expensive and laborious task. To translate from one language to the other, one needs to be an expert in both. Current automatic tools often produce illegible and complicated code. This paper applies unsupervised neural machine translation to source code of Python, C++, and Java and is able to translate between them, without ever being trained in a supervised fashion. OUTLINE: 0:00 - Intro & Overview 1:15 - The Transcompiling Problem 5:55 - Neural Machine Translation 8:45 - Unsupervised NMT 12:55 - Shared Embeddings via Token Overlap 20:45 - MLM Objective 25:30 - Denoising Objective 30:10 - Back-Translation Objective 33:00 - Evaluation Dataset 37:25 - Results 41:45 - Tokenization 42:40 - Shared Embeddings 43:30 - Human-Aware Translation 47:25 - Failure Cases 48:05 - Conclusion Paper: https://ift.tt/2Uk8NQk Abstract: A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. 
In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin. Authors: Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
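The abstract says translations are checked with unit tests rather than by text overlap. A minimal sketch of that idea, assuming each translated function is available as a Python callable and each test case is an `(args, expected)` pair; the helper names and data layout are hypothetical and not the authors' evaluation harness.

```python
from typing import Any, Callable, Sequence, Tuple

TestCase = Tuple[tuple, Any]  # (positional arguments, expected return value)


def passes_unit_tests(candidate: Callable, tests: Sequence[TestCase]) -> bool:
    """True only if the candidate returns the expected value on every test."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed translation.
            return False
    return True


def unit_test_accuracy(
    candidates: Sequence[Callable],
    tests_per_function: Sequence[Sequence[TestCase]],
) -> float:
    """Fraction of translated functions that pass all of their unit tests."""
    passed = sum(
        passes_unit_tests(c, t) for c, t in zip(candidates, tests_per_function)
    )
    return passed / len(candidates)
```

Scoring by behavior rather than by token overlap means a translation that looks different from the reference but computes the same results still counts as correct.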

Monday, June 8, 2020

JOIN ME for the NeurIPS 2020 Flatland Multi-Agent RL Challenge!

Join me to solve the NeurIPS 2020 challenge on multi-agent reinforcement learning in the Flatland environment. This challenge has participants optimize a complex train scheduling system, subject to accidents, delays and re-routing. The plan is to solve this as a community with no expectations of winning and fully in the open. Discord: https://ift.tt/3dJpBrR Community GitHub Repo: https://ift.tt/2YglTz5 NeurIPS 2020 Flatland Challenge: https://ift.tt/2AStbB3 Flatland Environment: https://ift.tt/2MHvCZX OUTLINE: 0:00 - Intro 1:00 - The Flatland Environment 2:00 - The NeurIPS 2020 Flatland Challenge 3:20 - Let's do this as a Community 4:10 - Ground Rules 6:15 - Conclusion Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ift.tt/3dJpBrR BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Sunday, June 7, 2020

BLEURT: Learning Robust Metrics for Text Generation (Paper Explained)

Proper evaluation of text generation models, such as machine translation systems, requires expensive and slow human assessment. As these models have gotten better in recent years, proxy scores like BLEU are becoming less and less useful. This paper proposes to learn a proxy score and demonstrates that it correlates well with human raters, even as the data distribution shifts. OUTLINE: 0:00 - Intro & High-Level Overview 1:00 - The Problem with Evaluating Machine Translation 5:10 - Task Evaluation as a Learning Problem 10:45 - Naive Fine-Tuning BERT 13:25 - Pre-Training on Synthetic Data 16:50 - Generating the Synthetic Data 18:30 - Priming via Auxiliary Tasks 23:35 - Experiments & Distribution Shifts 27:00 - Concerns & Conclusion Paper: https://ift.tt/2y7CYSL Code: https://ift.tt/2Yc7y8G Abstract: Text generation has made significant advances in the last few years. Yet, evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate poorly with human judgments. We propose BLEURT, a learned evaluation metric based on BERT that can model human judgments with a few thousand possibly biased training examples. A key aspect of our approach is a novel pre-training scheme that uses millions of synthetic examples to help the model generalize. BLEURT provides state-of-the-art results on the last three years of the WMT Metrics shared task and the WebNLG Competition dataset. In contrast to a vanilla BERT-based approach, it yields superior results even when the training data is scarce and out-of-distribution. Authors: Thibault Sellam, Dipanjan Das, Ankur P. Parikh Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
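To make concrete why surface-overlap scores such as BLEU can disagree with human judgment, here is a small sketch of clipped n-gram precision, which is one ingredient of BLEU. It is not the full metric (no brevity penalty, no averaging over n-gram orders), and the function names and whitespace tokenization are assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(candidate, reference, n=1):
    """Share of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the reference.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total
```

Against the reference "the cat sat on the mat", the scrambled sentence "mat the on sat cat the" reaches a perfect unigram precision, while the reasonable paraphrase "a feline rested on the rug" matches only two of its six words. A learned metric is meant to avoid exactly this kind of mismatch.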

Saturday, June 6, 2020

This AI Helps Controlling Virtual Quadrupeds! 🐕

❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their instrumentation for this previous work is available here: https://ift.tt/2wthYVQ 📝 The paper "CARL: Controllable Agent with Reinforcement Learning for Quadruped Locomotion" is available here: https://ift.tt/2MwUvHX Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search (Paper Explained)

Neural Architecture Search is usually too expensive in both time and resources to be practical. A search strategy has to keep evaluating new models, training them to convergence in an inner loop to find out if they are any good. This paper proposes to abstract the problem: it extracts the essential part of the architecture to be optimized into a much smaller network and evaluates that network on a few learned synthetic data points to predict its performance, which is much faster and cheaper than training the full model. OUTLINE: 0:00 - Intro & High-Level Overview 1:00 - Neural Architecture Search 4:30 - Predicting performance via architecture encoding 7:50 - Synthetic Petri Dish 12:50 - Motivating MNIST example 18:15 - Entire Algorithm 23:00 - Producing the synthetic data 26:00 - Combination with architecture search 27:30 - PTB RNN-Cell Experiment 29:20 - Comments & Conclusion Paper: https://ift.tt/2ZJWfFl Code: https://ift.tt/3gYZ4Zx Abstract: Neural Architecture Search (NAS) explores a large space of architectural motifs -- a compute-intensive process that often involves ground-truth evaluation of each motif by instantiating it within a large network, and training and evaluating the network with thousands of domain-specific data samples. Inspired by how biological motifs such as cells are sometimes extracted from their natural environment and studied in an artificial Petri dish setting, this paper proposes the Synthetic Petri Dish model for evaluating architectural motifs. In the Synthetic Petri Dish, architectural motifs are instantiated in very small networks and evaluated using very few learned synthetic data samples (to effectively approximate performance in the full problem). The relative performance of motifs in the Synthetic Petri Dish can substitute for their ground-truth performance, thus accelerating the most expensive step of NAS. 
Unlike other neural network-based prediction models that parse the structure of the motif to estimate its performance, the Synthetic Petri Dish predicts motif performance by training the actual motif in an artificial setting, thus deriving predictions from its true intrinsic properties. Experiments in this paper demonstrate that the Synthetic Petri Dish can therefore predict the performance of new motifs with significantly higher accuracy, especially when insufficient ground truth data is available. Our hope is that this work can inspire a new research direction in studying the performance of extracted components of models in an alternative controlled setting. Authors: Aditya Rawal, Joel Lehman, Felipe Petroski Such, Jeff Clune, Kenneth O. Stanley Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB

Friday, June 5, 2020

CornerNet: Detecting Objects as Paired Keypoints (Paper Explained)

Many object detectors focus on locating the center of the object they want to find. However, this leaves them with the secondary problem of determining the specifications of the bounding box, leading to undesirable solutions like anchor boxes. This paper directly detects the top left and the bottom right corners of objects independently, along with descriptors that allow the two to be matched later to form a complete bounding box. For this, a new pooling method, called corner pooling, is introduced. OUTLINE: 0:00 - Intro & High-Level Overview 1:40 - Object Detection 2:40 - Pipeline I - Hourglass 4:00 - Heatmap & Embedding Outputs 8:40 - Heatmap Loss 10:55 - Embedding Loss 14:35 - Corner Pooling 20:40 - Experiments Paper: https://ift.tt/2Eq7VDA Code: https://ift.tt/2S31D08 Abstract: We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors. Authors: Hei Law, Jia Deng Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
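Corner pooling is easiest to see in code. The sketch below shows the top-left variant on single-channel NumPy arrays: at every position it takes the maximum of one feature map over everything below it and the maximum of a second feature map over everything to its right, then adds the two. The function name and the single-channel, unbatched layout are simplifications assumed for this sketch.

```python
import numpy as np


def top_left_corner_pool(f_top, f_left):
    """Top-left corner pooling on two (H, W) feature maps.

    f_top is max-pooled from each pixel down to the bottom edge,
    f_left is max-pooled from each pixel across to the right edge.
    """
    # Reverse cumulative max along rows: max over rows i..H-1 in each column.
    top = np.maximum.accumulate(f_top[::-1, :], axis=0)[::-1, :]
    # Reverse cumulative max along columns: max over columns j..W-1 in each row.
    left = np.maximum.accumulate(f_left[:, ::-1], axis=1)[:, ::-1]
    return top + left
```

A top-left corner usually sits outside the object itself, so the evidence for it lies along the row to its right and the column below it, and this pooling carries that evidence back to the corner position. The bottom-right variant would scan in the opposite directions.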

Thursday, June 4, 2020

Movement Pruning: Adaptive Sparsity by Fine-Tuning (REUPLOAD w/ better sound)

Deep neural networks are large models and pruning has become an important part of ML product pipelines, making models small while keeping their performance high. However, the classic pruning method, Magnitude Pruning, is suboptimal in models that are obtained by transfer learning. This paper proposes a solution called Movement Pruning and demonstrates its superior performance. OUTLINE: 0:00 - Intro & High-Level Overview 0:55 - Magnitude Pruning 4:25 - Transfer Learning 7:25 - The Problem with Magnitude Pruning in Transfer Learning 9:20 - Movement Pruning 22:20 - Experiments 24:20 - Improvements via Distillation 26:40 - Analysis of the Learned Weights Paper: https://ift.tt/2ZhGoh8 Code: https://ift.tt/2Bv8rzL Abstract: Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters. Authors: Victor Sanh, Thomas Wolf, Alexander M. Rush Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
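The contrast between the two pruning criteria can be sketched in a few lines of NumPy. Magnitude pruning keeps the weights with the largest absolute value (zeroth-order), while a movement-style score rewards weights that are moving away from zero during fine-tuning (first-order). In the sketch, the score is accumulated as minus gradient times weight over recorded fine-tuning steps; the function names and the idea of passing recorded steps as lists are assumptions, and in the paper the scores are learned jointly during fine-tuning rather than computed afterwards.

```python
import numpy as np


def top_k_mask(scores, keep_fraction):
    """Boolean mask keeping the highest-scoring fraction (ties may keep extra)."""
    k = max(1, int(round(keep_fraction * scores.size)))
    threshold = np.sort(scores, axis=None)[-k]
    return scores >= threshold


def magnitude_mask(weights, keep_fraction):
    """Zeroth-order criterion: keep the weights with the largest |w|."""
    return top_k_mask(np.abs(weights), keep_fraction)


def movement_scores(weight_steps, grad_steps):
    """First-order criterion: accumulate -(dL/dw) * w over fine-tuning steps.

    A weight whose gradient step pushes it away from zero gets a positive
    contribution; one that is shrinking toward zero gets a negative one.
    """
    scores = np.zeros_like(weight_steps[0], dtype=float)
    for w, g in zip(weight_steps, grad_steps):
        scores -= g * w
    return scores
```

A small weight that fine-tuning keeps growing can therefore outrank a large pretrained weight that fine-tuning is shrinking, which is the case where magnitude pruning makes the wrong call.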

Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper Explained)

Wednesday, June 3, 2020

Is Style Transfer For Fluid Simulations Possible? 🌊

❤️ Check out Weights & Biases and sign up for a free demo here: https://ift.tt/2YuG7Yf Their showcased post is available here: https://ift.tt/2A1zIt8 📝 The paper "Lagrangian Neural Style Transfer for Fluids" is available here: https://ift.tt/2zQWLqO Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://ift.tt/2TnVBd3 Károly Zsolnai-Fehér's links: Instagram: https://ift.tt/2KBCNkT Twitter: https://twitter.com/twominutepapers Web: https://ift.tt/1NwkG9m

Learning To Classify Images Without Labels (Paper Explained)

How do you learn labels without labels? How do you classify images when you don't know what to classify them into? This paper investigates a new combination of representation learning, clustering, and self-labeling in order to group visually similar images together - and achieves surprisingly high accuracy on benchmark datasets. OUTLINE: 0:00 - Intro & High-level Overview 2:15 - Problem Statement 4:50 - Why naive Clustering does not work 9:25 - Representation Learning 13:40 - Nearest-neighbor-based Clustering 28:00 - Self-Labeling 32:10 - Experiments 38:20 - ImageNet Experiments 41:00 - Overclustering Paper: https://ift.tt/3eQqHCd Code: https://ift.tt/2MpCJWP Abstract: Is it possible to automatically classify images without the use of ground-truth annotations? Or when even the classes themselves, are not a priori known? These remain important, and open questions in computer vision. Several approaches have tried to tackle this problem in an end-to-end fashion. In this paper, we deviate from recent works, and advocate a two-step approach where feature learning and clustering are decoupled. First, a self-supervised task from representation learning is employed to obtain semantically meaningful features. Second, we use the obtained features as a prior in a learnable clustering approach. In doing so, we remove the ability for cluster learning to depend on low-level features, which is present in current end-to-end learning approaches. Experimental evaluation shows that we outperform state-of-the-art methods by huge margins, in particular +26.9% on CIFAR10, +21.5% on CIFAR100-20 and +11.7% on STL10 in terms of classification accuracy. Furthermore, results on ImageNet show that our approach is the first to scale well up to 200 randomly selected classes, obtaining 69.3% top-1 and 85.5% top-5 accuracy, and marking a difference of less than 7.5% with fully-supervised methods. 
Finally, we applied our approach to all 1000 classes on ImageNet, and found the results to be very encouraging. The code will be made publicly available. Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
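The two-step idea, learning features first and then clustering with nearest neighbors as a prior, can be sketched roughly as follows. The first function mines each sample's nearest neighbors in feature space by cosine similarity. The second scores a set of cluster probabilities by how well each sample agrees with its neighbors, plus an entropy term that discourages collapsing everything into one cluster. The exact loss form, the function names, and the `entropy_weight` parameter are assumptions for this sketch and may differ from the paper's formulation.

```python
import numpy as np


def nearest_neighbors(features, k):
    """Indices of the k most cosine-similar other samples, shape (n, k)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def neighbor_consistency_loss(probs, neighbors, entropy_weight=1.0, eps=1e-8):
    """Lower when neighbors share confident predictions and clusters stay balanced.

    probs: (n, c) cluster probabilities; neighbors: (n, k) neighbor indices.
    """
    # Dot product between each sample's prediction and each neighbor's prediction.
    dots = np.einsum("nc,nkc->nk", probs, probs[neighbors])
    consistency = -np.mean(np.log(dots + eps))
    # Negative entropy of the average prediction: minimized by balanced clusters.
    mean_probs = probs.mean(axis=0)
    neg_entropy = np.sum(mean_probs * np.log(mean_probs + eps))
    return consistency + entropy_weight * neg_entropy
```

Because the neighbors come from pretrained features rather than raw pixels, the clustering step has less opportunity to latch onto low-level cues such as color, which is the failure mode of end-to-end approaches that the abstract describes.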