January 13, 2017 | Deep learning for better audio and video content archiving
CEA Tech institute List recently worked with France's national television broadcasting company to develop an automated, real-time analysis system that facilitates audio and video content annotation.
Manually annotating audio and video content rarely produces complete or consistent information. France's national television broadcasting company turned to List for a solution that would get more out of its content. List leveraged its strong know-how in deep learning for image recognition to develop a tool capable of automatically identifying nearly 15,000 people and around 20 different sports in real time. The institute demonstrated its semantic video analysis tool on footage of the 2016 French Open tennis tournament.
List's researchers started with a neural network that was already capable of identifying 2,000 different faces. They added a learning layer to the network to increase the number of faces to what was required for the project. "We searched the internet for images of the 15,000 people on our partner's list and we integrated them into the system," said a List researcher. "This brought our image recognition rate to 95% and gave us a model that we can expand as needed." The researchers also developed a tool to identify around 20 different sports from video footage.
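The article does not describe List's architecture, but the approach it outlines — keeping a pretrained network and adding a new learning layer to cover more identities — is a standard transfer-learning pattern. Below is a minimal, hypothetical sketch: a frozen "backbone" stands in for the pretrained face network, and only a new classification layer over the 15,000 target identities is trained. All names, dimensions, and the toy gradient step are assumptions for illustration, not List's actual system.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, NUM_IDENTITIES, IMG_PIXELS = 128, 15_000, 64 * 64

# Frozen "backbone" weights: a stand-in for the pretrained network,
# which already maps a face image to a fixed-length embedding.
P = rng.standard_normal((IMG_PIXELS, EMBED_DIM)) * 0.01

def embed(image):
    """Embedding from the frozen pretrained network (left untouched)."""
    return np.tanh(image.reshape(-1) @ P)

# New learning layer: the only trainable weights, mapping embeddings
# to scores for each of the 15,000 target identities.
W = np.zeros((EMBED_DIM, NUM_IDENTITIES))

def train_step(image, label, lr=0.1):
    """One softmax cross-entropy gradient step on the new layer only."""
    global W
    e = embed(image)
    logits = e @ W
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    probs[label] -= 1.0                         # dL/dlogits for cross-entropy
    W -= lr * np.outer(e, probs)                # update only the new layer

def identify(image):
    """Predicted identity index for a face image."""
    return int(np.argmax(embed(image) @ W))
```

Because the backbone stays frozen, expanding from 2,000 to 15,000 identities only requires collecting labeled images for the new people and training this one layer, which is far cheaper than retraining the whole network.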
France's major broadcasting companies have expressed interest in List's semantic analysis tools, so there should be ample opportunity to take the results of these advances even further!