In-person workshop at NeurIPS 2024

Date: December 14, 2024

Contact Email: sslneurips2024@gmail.com

Location: Room TBD, Vancouver, Canada

Time: 9:00 AM -- 5:30 PM

Speakers: here

Paper Submission: OpenReview

Self-supervised learning (SSL) is an approach to representation learning that does not rely on human-labeled data. Instead, it creates auxiliary tasks from unlabeled input data and learns representations by solving these tasks. SSL has shown significant success across various domains such as images (e.g., MAE, DINO, MoCo, PIRL, SimCLR), speech (e.g., wav2vec, Whisper), and text (e.g., BERT, GPT, Llama). It has also demonstrated promising results in other data modalities, including graphs, time series, and audio. Recent large language models, which are predominantly trained on web-scale data using self-supervised methods, have exhibited remarkable generalizability and are beginning to transform numerous research fields. Without using human-provided labels, SSL can achieve performance comparable to, or even surpassing, that of fully supervised methods. Furthermore, generative SSL techniques such as Imagen, Stable Diffusion, and Sora have significantly enhanced the artistic capabilities of AI models.
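
As a concrete illustration of such an auxiliary task, below is a minimal sketch of the contrastive (NT-Xent) objective popularized by SimCLR, written in PyTorch. The function and variable names are our own placeholders, not code from any of the works cited above: each input is augmented into two views, and each view is trained to identify its counterpart among all other examples in the batch.

    # Minimal sketch of a SimCLR-style NT-Xent contrastive loss (illustrative;
    # names are placeholders, not from any cited implementation).
    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.5):
        # z1, z2: (N, D) embeddings of two augmented views of the same N inputs.
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
        sim = z @ z.t() / temperature                       # pairwise cosine similarities
        sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
        # The positive for row i is its other view: i+N (first half), i-N (second half).
        targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(sim.device)
        return F.cross_entropy(sim, targets)

    # Usage: given an encoder and two augmentations aug1, aug2 of a batch x,
    # loss = nt_xent_loss(encoder(aug1(x)), encoder(aug2(x)))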

Existing research on SSL has primarily concentrated on enhancing empirical performance without substantial theoretical underpinnings. Although SSL approaches are empirically effective across various benchmarks, their theoretical foundations and practical implications remain less explored. Key questions are still largely unanswered: why certain auxiliary tasks perform better than others, how much unlabeled data is required to learn effective representations, how neural architectures affect SSL performance, and in which practical scenarios SSL outperforms supervised models.

In the fifth iteration of this workshop, we aim to address these gaps by fostering a dialogue between theory and practice, especially in the context of LLMs. We bring together researchers interested in SSL from a range of domains to discuss the theoretical foundations of empirically well-performing SSL approaches and how these theoretical insights can further improve SSL's empirical performance.

We invite submissions of theoretical work, empirical work, and work at the intersection of the two. Topics include, but are not limited to:

  • Theoretical foundations of SSL
  • SSL for computer vision, natural language processing, robotics, speech processing, time-series analysis, graph analytics, etc.
  • Sample complexity of SSL methods
  • Theory-driven design of auxiliary tasks in SSL
  • Comparative analysis of different auxiliary tasks
  • Comparative analysis of SSL and supervised approaches
  • Information theory and SSL
  • SSL for healthcare, social media, neuroscience, biology, social science, etc.
  • Cognitive foundations of SSL

This year, the workshop will feature invited talks by leading researchers from diverse backgrounds, including theoretical ML, computer vision, NLP, and robotics. The workshop will also include contributed talks, poster sessions, and a panel discussion to share perspectives on establishing a foundational understanding of existing SSL approaches and on theoretically principled ways of developing new SSL methods. We accept short submissions (up to 4 pages). Each submission will be peer-reviewed by at least three reviewers. Accepted papers may also be submitted to other conference venues. Please see our call for papers for details.

Previous editions of our workshop: 2020, 2021, 2022, 2023.