Developing computer vision systems that talk back
Devi Parikh, an assistant professor of electrical and computer engineering (ECE), has received a Young Investigator Program (YIP) award from the Army Research Office. The three-year, $150,000 award will support the development of computer vision systems that can alert the operator when circumstances are likely to cause the system to fail. According to Parikh, "having fewer unpleasant surprises improves the overall user experience and gains operator trust." She plans to reduce these surprises.
These computer vision systems will be able to identify the potential causes of failure and describe the problem in human terms. For example, Parikh explains that a system trying to recognize a specific person's face might tell the operator that it will fail if there is poor lighting, if the face is turned away, or if the person is wearing sunglasses. This lets the operator decide how best to deploy the system: an operator who knows that the system will probably fail in poor lighting might be able to reposition the camera for better light. Even when the operator can't improve the situation, this kind of communication at least tells them when to trust the system and when not to.
Another scenario involves an operator who notices a situation where the system will fail and can raise a flag to invoke a different recognition algorithm specific to that case. The system will start by giving the user a "specification sheet"—a list of situations that might cause the program to fail. This empowers the user to watch out for any of these situations and tell the system when to use an alternate algorithm.
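The "specification sheet" idea can be pictured as a simple dispatch mechanism. The sketch below is illustrative only, not Parikh's actual system: the recognizer names, the listed failure conditions, and the flagging interface are all hypothetical stand-ins.

```python
# Hypothetical sketch of a "specification sheet": the system publishes
# the conditions under which its default recognizer is unreliable, and
# the operator can flag one of them to route the input to an alternate
# algorithm. All names here are illustrative placeholders.

def default_recognizer(image):
    return "default result for " + image

def low_light_recognizer(image):
    return "low-light result for " + image

def profile_face_recognizer(image):
    return "profile-view result for " + image

# The "spec sheet": known failure conditions and their fallbacks.
SPEC_SHEET = {
    "poor lighting": low_light_recognizer,
    "face turned away": profile_face_recognizer,
}

def recognize(image, flagged_condition=None):
    """Use the default algorithm unless the operator flags a
    condition listed on the spec sheet."""
    algorithm = SPEC_SHEET.get(flagged_condition, default_recognizer)
    return algorithm(image)

print(sorted(SPEC_SHEET))                      # conditions shown to the operator
print(recognize("frame_01"))                   # default path
print(recognize("frame_02", "poor lighting"))  # operator-flagged path
```

The point of the sketch is the division of labor: the system enumerates its known weaknesses up front, and the human supplies the situational judgment about when a weakness applies.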
Adaptation to new challenges
Parikh's research also opens possibilities for improving the system itself, either by using better training data or by self-compensating for potential failures. Some failure situations can be alleviated with additional specific training data. If a facial recognition system reports that it is likely to fail when a woman is wearing heavy makeup, the system might be improved when trained with more pictures of women wearing makeup, Parikh suggests.
Other systems can be programmed to compensate for certain failures. For example, a system identifying a person from a video feed could be programmed to first focus on frames where a person is facing the camera—when the algorithm has the highest chance of success—and then apply those identifications to the difficult frames where the person is turned away.
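The easy-frames-first strategy can be sketched in a few lines. This is not Parikh's implementation: the frontal-pose scores, track IDs, and the `identify` stub below are invented stand-ins for a real pose estimator, tracker, and face recognizer.

```python
# Illustrative sketch: identify people in the easy frames first -- those
# where a (hypothetical) pose score says the face is near-frontal --
# then carry the identities over to the hard frames via the track ID.

# Each frame carries a tracker ID and a made-up frontal-pose score.
frames = [
    {"t": 0, "track": "A", "frontal": 0.95},
    {"t": 1, "track": "A", "frontal": 0.20},  # turned away
    {"t": 2, "track": "B", "frontal": 0.90},
    {"t": 3, "track": "B", "frontal": 0.10},  # turned away
]

def identify(frame):
    # Stand-in for a recognizer that only works on frontal views.
    return {"A": "alice", "B": "bob"}[frame["track"]]

THRESHOLD = 0.5
identities = {}

# Pass 1: run recognition only on the high-confidence (frontal) frames.
for f in sorted(frames, key=lambda f: -f["frontal"]):
    if f["frontal"] >= THRESHOLD and f["track"] not in identities:
        identities[f["track"]] = identify(f)

# Pass 2: propagate identities to the difficult frames via the track.
labels = {f["t"]: identities.get(f["track"], "unknown") for f in frames}
print(labels)  # every frame labeled, including the turned-away ones
```

The design choice is simply to spend the recognizer's effort where it is most likely to succeed and let cheaper temporal association (tracking) cover the rest.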
To determine and describe potential failure scenarios, Parikh will use clustering techniques to group the failures by similar features, then use crowdsourcing to let a large number of people describe the clusters. One cluster of faces the system failed to identify might turn out to consist of people wearing sunglasses, and the people reviewing that cluster would note that the common feature is the sunglasses.
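The clustering step can be illustrated with a toy k-means pass. Everything here is a simplified stand-in: real failure cases would be described by learned image features, not the hand-made binary attributes below, and the cluster labels would come from crowd workers rather than code.

```python
import random

# Toy sketch of the clustering step: each failed example is a feature
# vector (here, made-up binary attributes), and a small k-means pass
# groups similar failures so crowd workers can then name what each
# cluster has in common (e.g. "sunglasses").

# Attribute order: [sunglasses, low_light, profile_view]
failures = [
    [1, 0, 0], [1, 0, 0], [1, 1, 0],   # mostly sunglasses
    [0, 1, 0], [0, 1, 0], [0, 1, 1],   # mostly low light
]

def dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # random initial centers
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return groups

clusters = kmeans(failures, k=2)
for c in clusters:
    print(c)  # crowd workers would label each cluster in plain language
```

The machine only discovers that certain failures look alike; the crowdsourcing step supplies the human-readable explanation of why.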
This research has many applications, ranging from autonomous vehicle perception to mining large quantities of visual data on social media.
Parikh teaches an Introduction to Computer Vision class and an Advanced Computer Vision class that focuses on reading and critiquing recent and classical papers. In addition, she and Assistant Professor Dhruv Batra organize a machine vision and learning reading group where students can learn about cutting-edge techniques and get feedback on their work. The two recently organized a Mid-Atlantic Computer Vision (MACV) workshop at Virginia Tech, which was attended by more than 100 computer vision faculty members and students from neighboring universities, including Carnegie Mellon University, Duke University, George Mason University, George Washington University, Johns Hopkins University, the University of Maryland, and the University of North Carolina.