Building Content Embeddings with Self-Supervised Learning


Twitter is what's happening in the world right now. To understand and organize content on the platform, we leverage a semantic text representation that is useful across a variety of tasks. Because content on Twitter spans diverse topics and changes constantly, supervised training that traditionally relies on human-annotated corpora is expensive and does not scale.
Sijun He and Kenny Leung share their experience building and serving self-supervised content representations for heterogeneous content on Twitter. They also highlight applications of content embeddings in recommendation systems, as well as the engineering challenges of maintaining such embeddings at scale.


Kenny is interested in bringing human capabilities, particularly language, vision, and the acquisition of everyday knowledge, to modern technology.