Learning Self-Supervised Audio-Visual Representations for Sound Recommendations
Published on December 10
Categories: cs.CV, cs.MM, cs.SD, eess.AS
Release date: December 10, 2024
Author: Sudha Krishnamurthy
Affiliation: Sony Interactive Entertainment, San Mateo, CA

| Encoder | Loss | Accuracy (%) |
|---|---|---|
| baseline | BCE | 69.5 |
| attention contrastive | BCE + margin | 87.4 |
| fine-tuned attention contrastive | BCE + margin | 87.8 |
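The table pairs the two attention-contrastive encoders with a "BCE + margin" objective, while the baseline uses binary cross-entropy (BCE) alone. Because the encoder also changes between those rows, the 69.5% to 87.4% gap cannot be attributed to the loss by itself. The form of the margin term is not given here. The following is a minimal sketch, assuming a hinge-style ranking term that pushes a matching audio-visual pair to score at least `margin` above a mismatched pair. The function names, the `margin` and `weight` values, and the use of raw probabilities as scores are all hypothetical illustration choices, not the author's method.

```python
import math


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def margin_term(pos_score: float, neg_score: float, margin: float = 0.2) -> float:
    """Assumed hinge ranking term: zero once the matching pair outscores
    the mismatched pair by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


def bce_plus_margin(pos_prob: float, neg_prob: float,
                    margin: float = 0.2, weight: float = 1.0) -> float:
    """Hypothetical combined loss: BCE on a matching pair (label 1) and a
    mismatched pair (label 0), plus a weighted margin term on their scores."""
    classification = bce(pos_prob, 1) + bce(neg_prob, 0)
    ranking = margin_term(pos_prob, neg_prob, margin)
    return classification + weight * ranking


# A well-separated pair incurs only the BCE part; a poorly separated
# pair also pays the margin penalty.
print(bce_plus_margin(0.9, 0.1))
print(bce_plus_margin(0.6, 0.5))
```

In this sketch the BCE part scores each pair independently, while the margin part only looks at the gap between the two scores, so it stops contributing once the pairs are separated by `margin`.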