Friday, February 7, 2020

ImageBERT


This video explores the ImageBERT model from Microsoft Research! This is a really interesting combination of vision and language tokens to achieve state of the art results on MSCOCO and Flickr30k image and text retrieval tasks! I hope this video helped you get a better sense of how image and text tokens can be combined in the transformer architecture and how self-attention uses visual tokens to inform the text task output of BERT's masked language modeling! Paper Links: ImageBERT: https://ift.tt/398zzAe Conceptual Captions: https://ift.tt/2M1bcIx Thanks for watching! Please Subscribe!

No comments:

Post a Comment