SCORE! investigates a new way of matching sound to archival videos by using deep learning.

Recent advances in deep learning have allowed us to take raw data, such as images and video, and reduce it to latent representations. A latent representation is a point in a low-dimensional space — the latent space — from which the data can be recovered. The most important high-level variations in the data, such as facial expression or painting style, are mapped to directions in the latent space. Moving the point around in the latent space corresponds to, for instance, turning a photograph of a frowning person into a smiling one, or turning a photograph of a man into one of a woman.
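The idea of moving along a semantic direction can be sketched as simple vector arithmetic. The latent dimensionality, the example point, and the "smile" direction below are all hypothetical; in a real model they would be learned from data, and a trained decoder would turn the shifted point back into an image.

```python
import numpy as np

# Latent representation of a photograph of a frowning person
# (made-up values in a toy 4-dimensional latent space).
z_frown = np.array([0.2, -1.0, 0.5, 0.1])

# A direction in latent space assumed to correspond to smiling (made up).
smile_direction = np.array([0.0, 1.0, 0.0, 0.0])

def move_in_latent_space(z, direction, amount):
    """Shift a latent point along a semantic direction."""
    return z + amount * direction

# Shift the point toward "smiling"; decoding the result would yield
# an image of the same person, now smiling.
z_smile = move_in_latent_space(z_frown, smile_direction, amount=2.0)
```

The same mechanism covers any learned direction: swapping `smile_direction` for, say, an "age" direction would instead make the decoded face look older or younger.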

SCORE! aims to be a playful and engaging proof of concept, showing the potential of these techniques. Latent representations are extracted from the film and meaningfully mapped to sound in order to automatically compose music that directly corresponds to the film’s characteristics: if the scenery changes from a city to nature, or if a character suddenly smiles, the music will respond. The results from SCORE! will provide several user groups (artists, the general public, and heritage institutions) with new, innovative mechanisms, tools and experiences.
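One way such a mapping could work is to turn latent video features into musical parameters. The feature names, value ranges, and the specific tempo and pitch formulas below are illustrative assumptions, not the project's actual mapping:

```python
def features_to_music(nature_score, smile_score):
    """Map two hypothetical latent features (each in [0, 1]) to music.

    nature_score: 0 = dense city, 1 = open nature (assumed feature).
    smile_score:  0 = neutral face, 1 = broad smile (assumed feature).
    """
    # Calmer scenery -> slower tempo: interpolate between 140 and 60 BPM.
    tempo_bpm = 140 - 80 * nature_score
    # Happier expression -> higher base pitch: MIDI notes 48 to 72.
    pitch_midi = 48 + round(24 * smile_score)
    return tempo_bpm, pitch_midi

# A character suddenly smiles during a city scene:
tempo, pitch = features_to_music(nature_score=0.1, smile_score=0.9)
```

Run per frame (or per shot), a mapping like this yields a stream of musical parameters that a synthesizer or composition engine can render, so the score follows the film as it plays.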

This project is financed by the NWO Creatieve Industrie KIEM programme.

Project partners

Sound and Vision, VU University, Lakker