Prof. Vicente Ordóñez R.

Department of Computer Science at Rice University, Houston

Speaker 3

Envisioning the Next Generation of Vision and Language Models

Training large-scale models that learn about the world purely through language has yielded impressive capabilities. However, models trained on both text and images have also produced an impressive set of recent results. I will summarize the extent to which vision-and-language models have the potential to replace some purely visually trained models, and review the evolution and progress of vision-and-language models over the years. I will also take the opportunity to discuss some recent work from my group in this area, including CLIP-Lite, an effort to investigate how to train CLIP models on limited and scarce data; Attention-Mask-Consistency (AMC), a technique to improve the visual grounding capabilities of vision-and-language models by aligning them with human-provided explanations; and SynCLIP, an effort to improve the compositional reasoning of vision-and-language models through the use of synthetically and procedurally generated data.

Prof. Vicente Ordóñez Román is an Associate Professor in the Department of Computer Science at Rice University, where he directs a research group focusing on computer vision, natural language processing, and machine learning. He is also an Amazon Visiting Academic at Amazon Alexa AI. His focus is on building efficient visual recognition models that can perform tasks that leverage both images and text. He is a recipient of a Best Paper Award at the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2017 and the Best Paper Award (Marr Prize) at the International Conference on Computer Vision (ICCV) 2013. He has also received an NSF CAREER Award, an IBM Faculty Award, a Google Faculty Research Award, a Facebook Research Award, and a Google Inclusion Research Award. Previously, he was an Assistant Professor in the Department of Computer Science at the University of Virginia. Vicente obtained his PhD in Computer Science at the University of North Carolina at Chapel Hill, an MS at Stony Brook University, and an engineering degree at the Escuela Superior Politécnica del Litoral in Ecuador. He has also been a visiting researcher at the Allen Institute for Artificial Intelligence and a visiting professor at Adobe Research.