Tuesday, October 20, 2020

Vokenization Explained!


This video explains a new approach to visually supervising language models that achieves performance gains on language-only tasks such as the GLUE benchmark and SQuAD question answering. This is done by constructing token-image matchings ("vokens") and classifying the corresponding tokens with a weakly supervised loss function. Thanks for watching! Please Subscribe!

Paper Links:
Vokenization: https://ift.tt/3lYmiAy
ImageBERT: https://ift.tt/398zzAe
VilBERT: https://ift.tt/3dKAWbD
LXMERT: https://ift.tt/31lEAE0
UNITER: https://ift.tt/31r8Du6
Visual Genome: https://ift.tt/1lVDTtg
12-in-1: Multi-task Vision and Language Representation Learning: https://ift.tt/2H7VZcD
How Context Affects Language Models' Factual Predictions: https://ift.tt/3o2SrZA
Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines: https://ift.tt/3dBurrA
ConVIRT: https://ift.tt/2IRJsKV
Climbing towards NLU: https://ift.tt/2IRJsKV
Weak Supervision: A New Programming Paradigm for Machine Learning: https://ift.tt/2Tt0Bim

Chapters
0:00 Introduction
1:16 Idea of Vision-Language Models
2:40 Overview of Vokenization
3:38 Voken Examples
4:45 Weak Supervision
6:00 Image Retrieval for Supervision
7:47 What is Grounded Language?
8:25 Issues with Existing Datasets
10:28 Exciting Results for Vision-Language!
13:07 Multi-Modal Learning
14:45 On Meaning, Form, and Understanding
16:04 Information Retrieval in NLP
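The token-image matching step described above can be sketched as a nearest-neighbor retrieval: each contextual token embedding is matched against a pool of image embeddings, and the index of the most similar image becomes that token's "voken" label, which the language model is then trained to classify. Here is a minimal toy sketch of the retrieval step; the embedding values, dimensions, and the `vokenize` helper are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def vokenize(token_embeddings, image_embeddings):
    """For each contextual token embedding, retrieve the index of the
    most similar image embedding; that index serves as the token's voken.
    (Toy stand-in for the paper's learned retrieval model.)"""
    vokens = []
    for tok in token_embeddings:
        scores = [cosine(tok, img) for img in image_embeddings]
        vokens.append(max(range(len(scores)), key=scores.__getitem__))
    return vokens

# Made-up 2-D vectors standing in for token / image features.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.4]]
images = [[0.8, 0.1], [0.1, 0.9]]
print(vokenize(tokens, images))  # → [0, 1, 0], one voken id per token
```

These voken ids would then be used as classification targets in an auxiliary loss alongside the usual masked-language-modeling objective.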
