Resource of free step by step video how to guides to get you started with machine learning.
Sunday, April 12, 2020
Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
DDL is an auxiliary task for an agent to learn distances between states in episodes. This can then be used further to improve the agent's policy learning procedure. Paper: https://ift.tt/2VsWotk Blog: https://ift.tt/2K0Xxmy Abstract: Reinforcement learning requires manual specification of a reward function to learn a task. While in principle this reward function only needs to specify the task goal, in practice reinforcement learning can be very time-consuming or even infeasible unless the reward function is shaped so as to provide a smooth gradient towards a successful outcome. This shaping is difficult to specify by hand, particularly when the task is learned from raw observations, such as images. In this paper, we study how we can automatically learn dynamical distances: a measure of the expected number of time steps to reach a given goal state from any other state. These dynamical distances can be used to provide well-shaped reward functions for reaching new goals, making it possible to learn complex tasks efficiently. We show that dynamical distances can be used in a semi-supervised regime, where unsupervised interaction with the environment is used to learn the dynamical distances, while a small amount of preference supervision is used to determine the task goal, without any manually engineered reward function or goal examples. We evaluate our method both on a real-world robot and in simulation. We show that our method can learn to turn a valve with a real-world 9-DoF hand, using raw image observations and just ten preference labels, without any other supervision. Videos of the learned skills can be found on the project website: this https URL. Authors: Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://ift.tt/38iX6OV Minds: https://ift.tt/37igBpB
Subscribe to:
Post Comments (Atom)
-
JavaやC++で作成された具体的なルールに従って動く従来のプログラムと違い、機械学習はデータからルール自体を推測するシステムです。機械学習は具体的にどのようなコードで構成されているでしょうか? 機械学習ゼロからヒーローへの第一部ではそのような疑問に応えるため、ガイドのチャー...
-
#deeplearning #noether #symmetries This video includes an interview with first author Ferran Alet! Encoding inductive biases has been a lo...
-
Using GPUs in TensorFlow, TensorBoard in notebooks, finding new datasets, & more! (#AskTensorFlow) [Collection] In a special live ep...
-
How to Do PS2 Filter (Tiktok PS2 Filter Tutorial), AI tiktok filter Create your own PS2 Filter photos with this simple guide! 🎮📸 Please...
-
Challenge scenario You were recently hired as a Machine Learning Engineer at a startup movie review website. Your manager has tasked you wit...
-
#ai #attention #transformer #deeplearning Transformers are famous for two things: Their superior performance and their insane requirements...
-
Visual scenes are often comprised of sets of independent objects. Yet, current vision models make no assumptions about the nature of the p...
-
Hello Friends, In this episode we will explore AI tool Craiyan which helps us to create images just by providing the text information. ht...
-
Why are humans so good at video games? Maybe it's because a lot of games are designed with humans in mind. What happens if we change t...
-
#alibi #transformers #attention Transformers are essentially set models that need additional inputs to make sense of sequence data. The mo...
No comments:
Post a Comment