Unifying Vision Representation
Abstract

The rapid advancement of artificial intelligence (AI) techniques, propelled by foundation models such as large language models (LLMs), has ignited transformative revolutions. However, unlike language, foundation models in other modalities are trailing behind, presenting hurdles in extending AI systems to diverse applications. Vision, unlike language, comprises natural signals captured from the environment, encompassing diverse representations such as 3D structures like point clouds and meshes, as well as 2D images and videos. Consequently, vision involves a more intricate and redundant representation, making the development of a foundation model within the vision community a formidable task. This seminar explores vision systems through the lens of self-supervised representation learning, a cornerstone of many foundation models, including LLMs. The talk will assess existing challenges in mainstream vision self-supervised learning methods, propose feasible solutions, and delve into promising directions for further investigation. Additionally, this seminar will discuss future endeavors in developing versatile representations across modalities, tasks, and architectures, which could propel the evolution of vision foundation models.


Speaker: Dr Tong ZHANG
Date: 2 May 2024 (Thursday)
Time: 3:00pm – 4:00pm


Biography

Dr Tong ZHANG received the B.S. degree from Beihang University, Beijing, China, in 2011, the M.S. degree from New York University, New York, United States, in 2014, and the Ph.D. degree from the Australian National University, Canberra, Australia, in 2020. He is a postdoctoral researcher at the Image and Visual Representation Lab (IVRL), EPFL. He received the ACCV 2016 Best Student Paper Honorable Mention and was a CVPR 2020 Paper Award Nominee. His research interests include subspace clustering, deep geometric learning, 3D vision, and representation learning.