Greek-led artificial intelligence program reads lips
A team of researchers led by a Greek IT specialist has created an artificial intelligence program that “reads” people’s lips from video with great precision. The program was developed by researchers from Google and its British subsidiary DeepMind, an international pioneer in artificial intelligence, and the team was led by Yannis Assael, a Ph.D. candidate in machine learning.
For the millions of people who cannot hear, reading other people’s lips is a “window” for communication besides sign language. But lip reading is not easy and is often inaccurate.
The new “smart” system has an average error rate of 41% in correctly recognizing the words a speaker’s lips form. This may seem high, but the best computational method to date had an error rate of 77%, so the new program has nearly halved the mistakes.
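The article does not say exactly how the 41% and 77% figures were computed. Assuming they are word error rates in the usual sense (the minimum number of word substitutions, deletions and insertions needed to turn the system’s output into the correct transcript, divided by the number of words in the transcript), the sketch below shows that calculation, plus the arithmetic behind “nearly halved”. The function names and the example sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction of the old system's errors that the new system eliminates."""
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    # Hypothetical transcript pair: two of six words are wrong.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a hat"))
    # 77% -> 41% error rate, as reported in the article.
    print(round(relative_reduction(0.77, 0.41), 3))
```

The first line printed is one third (two wrong words out of six); the second is about 0.468, meaning the new system makes roughly 47% fewer word errors than the previous best, which is why the drop is described as “nearly half” rather than exactly half.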
According to “Science”, the researchers, led by Assael and Brendan Shillingford, who posted a preprint of the work on arXiv, created algorithms that outperform all previous ones and do better even than professional lip readers.
Creating lip-reading algorithms has so far proved extremely difficult. The researchers fed their system 140,000 hours of YouTube videos of people speaking English, along with the corresponding transcripts. They then let the machine learning system learn on its own how to “connect” the different lip movements to the corresponding phonemes and, eventually, to the corresponding words.
The system is based on artificial neural networks: collections of algorithms, each performing a different and simpler task, that are interconnected and cooperate to process information, much as neurons in the human brain do.
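To make the idea of many simple, interconnected units concrete, here is a toy two-layer neural network in plain Python. It is only an illustration of the general principle, not DeepMind’s architecture: the weights are hypothetical hand-picked numbers (a real system learns millions of them from data such as the YouTube videos described above), and the three inputs are made-up values standing in for measurements of a lip image.

```python
import math


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One simple unit: a weighted sum of its inputs, squashed by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs: list[float], weight_rows: list[list[float]],
          biases: list[float]) -> list[float]:
    """A layer is several neurons reading the same inputs side by side."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def tiny_network(features: list[float]) -> float:
    # Hypothetical weights; in practice these are learned, not written by hand.
    hidden = layer(features, [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, -0.1])
    # The output neuron combines the hidden neurons' results.
    output = layer(hidden, [[1.2, -0.7]], [0.05])
    return output[0]


if __name__ == "__main__":
    # Made-up input numbers standing in for features of one video frame.
    print(tiny_network([0.9, 0.1, 0.4]))
```

Each neuron does very little on its own; the useful behaviour comes from how their outputs feed into one another, and from training, which adjusts the weights until the network’s answers match the transcripts.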
After the system had “taught” itself, the researchers tested its lip reading on 37 minutes of video that it had never “seen” before. The program got 41% of the words wrong, whereas people who watched the same video, including professional lip readers, had an average error rate of 93%. Of course, in real-life conditions rather than on video, human error rates are somewhat lower, because the human brain can draw on other cues, such as the speaker’s body language.
In any case, although this is real progress, a 41% word error rate clearly means that the system still needs substantial improvement. Once that happens, the system could be used more widely in everyday life.
Yannis Assael attended Anatolia College in Thessaloniki and studied Applied Informatics at the University of Macedonia (2008-2013). He then earned a degree in computer science at the University of Oxford, where he is now completing his Ph.D. in machine learning while working at Google’s DeepMind.