Interview with Clarifai founder Matthew Zeiler

Published September 28, 2015

Matthew Zeiler is the founder and CEO of deep-learning startup Clarifai, which uses machine learning and deep neural networks to develop the world’s most advanced image recognition system. He received a PhD in machine learning and image recognition, and his research produced the top five results in the 2013 ImageNet classification competition.


1. Tell us about Clarifai. How was it created?
I created Clarifai straight out of my PhD at New York University. I had the chance to work with some of the pioneers of neural networks (Geoff Hinton during my undergraduate degree at the University of Toronto, and Yann LeCun while doing my PhD under the supervision of Rob Fergus at NYU). Learning from these pioneers helped me push the limits of neural networks for object recognition in the final year of my PhD. As I built demos around these models and tried them on my own images, I instantly knew they were ready for real-world applications. I saw a real opportunity to turn these models into products valued and loved by people, developers, and businesses of all types, and so I started Clarifai. Immediately after starting Clarifai, the very first models I created won the top five places in the international ImageNet 2013 competition, putting Clarifai in front of everyone from startups to Fortune 500 companies.
2. How does the technology work? What models do you typically use, and why?
The technology is based on what we call artificial neural networks. They are computer algorithms that were originally designed to simulate how the brain works. This is a very loose description, because nobody fully understands how the brain operates, but these algorithms resemble neurological findings quite closely. They are built up from what are known as layers: the more layers, the deeper the model is said to be, hence the alternative name for these methods, deep learning.
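To make "layers" and "depth" concrete, here is a minimal plain-Python sketch. It is not Clarifai's model. A network is a list of layers, each computing weighted sums followed by a nonlinearity, and its depth is simply how many layers it has. The layer sizes, the ReLU nonlinearity and the softmax output are illustrative assumptions. Real image models use convolutional layers and far more units.

```python
import math
import random


def make_network(layer_sizes, seed=0):
    """Create randomly initialised fully connected layers.

    layer_sizes such as [4, 8, 8, 3] means 4 inputs, two hidden layers of
    8 units and 3 output categories, i.e. 3 layers of weights.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def dense(inputs, weights, biases):
    """Each output unit is a weighted sum of every input plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Simple nonlinearity applied between layers."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn the final layer's scores into probabilities over categories."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, inputs):
    """Pass the input through every layer in turn; a deeper model has more layers."""
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:  # no ReLU on the final output scores
            activations = relu(activations)
    return softmax(activations)


network = make_network([4, 8, 8, 3])
probabilities = forward(network, [0.2, 0.7, 0.1, 0.9])  # four made-up "pixel" values
print("depth:", len(network), "probabilities:", probabilities)
```

Making the model deeper here only means passing a longer `layer_sizes` list, for example `[4, 16, 16, 16, 3]` for four layers.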
In practice, the power in these algorithms comes from their ability to learn a mapping from input data to output data by looking only at the data itself. The more data you show a network, the smarter it gets, because it sees more of the variation in the world. In image recognition, the input data are typically pixels and the output data are typically categories that represent the things you want to recognize. The layers of the network start in a random state but soon adapt to predict the output categories. In this training process, the layers form a collective understanding of the input image, from colors and edges to corners and curves, all the way up to full object representations and predictions of the corresponding categories.
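The training process described above can be shown with a deliberately tiny sketch. For brevity it trains a single layer, which starts from random weights and is nudged by gradient descent until it maps made-up four-"pixel" inputs to the right category. The toy data, learning rate and epoch count are assumptions chosen for illustration. A real deep network applies the same kind of update to every layer through backpropagation and learns from very large collections of labelled images.

```python
import math
import random

# Hypothetical toy data: 4 "pixel" intensities per image.
# Category 0 = bright on the left, category 1 = bright on the right.
images = [
    [1.0, 0.9, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.2],
    [0.0, 0.1, 0.9, 1.0],
    [0.2, 0.0, 1.0, 0.8],
]
labels = [0, 0, 1, 1]
n_inputs, n_classes = 4, 2

rng = random.Random(42)
# The layer starts in a random state.
weights = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in range(n_classes)]
biases = [0.0] * n_classes


def predict(x):
    """Return class probabilities for one image (linear layer + softmax)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_loss():
    """Mean cross-entropy: low when the correct category gets high probability."""
    return sum(-math.log(predict(x)[y]) for x, y in zip(images, labels)) / len(images)


def accuracy():
    """Fraction of images whose most probable category matches the label."""
    correct = 0
    for x, y in zip(images, labels):
        probs = predict(x)
        correct += probs.index(max(probs)) == y
    return correct / len(images)


learning_rate = 0.5
initial_loss = average_loss()
for epoch in range(200):
    for x, y in zip(images, labels):
        probs = predict(x)
        for k in range(n_classes):
            # Gradient of cross-entropy w.r.t. score k is p_k - 1[k == y].
            grad = probs[k] - (1.0 if k == y else 0.0)
            for i in range(n_inputs):
                weights[k][i] -= learning_rate * grad * x[i]
            biases[k] -= learning_rate * grad

print(f"loss {initial_loss:.3f} -> {average_loss():.3f}, accuracy {accuracy():.0%}")
```

Only the data drives the change here: no rule about "left" or "right" is written into the code, yet the weights come to encode it.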
3. What are the new advances in image and video recognition technology that have fueled the deep learning renaissance in recent years? And how do you compete with giants like Google, Yahoo, Facebook and Microsoft?
Neural network algorithms had an initial surge of interest in the 1980s, and little has changed algorithmically since then. The two key components driving recent advances are vast increases in computational power and the availability of training data. For computation, graphics processors (GPUs) have become the go-to device, running these workloads about 30 times faster than traditional CPUs. They were first used this way around five years ago, enabling much more complex models to process much more data and ultimately delivering markedly better performance. Graphics processors also happen to be small, relatively cheap, and easy to operate without a massive data center.
Additionally, with the exponential increase in data across the internet, valuable data sources have emerged to help train these systems. Unlocking the potential value living within existing enterprise data is one of Clarifai’s core focuses.
4. What are the challenges you face at the moment?
The biggest challenge currently is keeping up with the growth of the company. We’re always looking for more of the world’s best researchers and engineers to tackle the world’s most difficult machine learning problems. We have a lot of exciting new features queued up and a new suite of products coming soon.
5. You launched Clarifai to bring large-scale deep learning into everyday use. Tell us about some of your future applications.
I would love to once they are ready for launch. We’ll have plenty to tell you soon!
6. Who are the major customers of Clarifai’s technology?
We have built an incredible platform that solves problems across numerous industries, including events and weddings, travel, real estate, e-commerce, consumer photography, social media, advertising, brand analytics, and medical imagery. To highlight a few interesting applications, we work with Style Me Pretty, a wedding site, to enable search over large collections of user-uploaded wedding photos. In the travel industry, we work with Trivago to automatically analyze hotel photos, determining which have ocean views and distinguishing shots of bedrooms from shots of lobbies. We are also helping to push the limits of medical imagery: we can train our systems to understand images from X-rays, CT scans, and MRIs. While those are immediate and useful applications, we’re most interested in the potential impact of new devices outside the hospital. To that end, we are partnering with device makers to use images from within the ear, nose, and mouth to diagnose dozens of diseases without the need for a doctor. This allows treatment to shift out of the hospital and into nursing homes, schools, and developing countries, where our technology can fundamentally change lives.
Matthew Zeiler will be speaking at the RE.WORK Deep Learning Summit in San Francisco on 28-29 January 2016. Book your ticket using our discount code BDMS20 for 20% off! For more information and to register, visit the event website.