How does a human understand a 3-D structure? Obviously, by identifying lines, shapes, symmetries, and the patterns and relationships between them in things like buildings, sidewalks, and everyday objects. But can a computer be taught to do the same?
Zihan Zhou, assistant professor of information sciences and technology at Penn State, plans to develop a new data-driven framework for structure discovery, leveraging the availability of massive visual data and recent advances in machine learning techniques.
These techniques could then be applied to a wide spectrum of real-world computer vision problems, including 3-D modeling of urban environments, virtual and augmented reality, and autonomous driving. The research could also impact cognitive sciences, by suggesting new computational mechanisms for image understanding; and human-robot interaction, by enabling robots to reason in terms of geometric shape, physics, and dynamics.
“We want a computer to see 3-D space as humans do,” said Zhou. “This particular award and project are about structure perception, which has been largely ignored in 3-D vision. This is something that has not been done before.”
“If a robot recognizes something as a specific type of structure, then it knows how to interact with it,” said Zhou. “For example, if a robot is able to recognize a structure with a flat top, it would know that it could put an object like a cup on it.”
This framework may also impact the work of architects, designers, and engineers.
“If you think of those architects, they are working with 3-D models every day,” said Zhou. “If they build something, they first create line drawings. So if a computer can understand doors and windows in the drawings, it would be very useful for architectural design and engineering.”
Zhou developed an interest in this topic during an internship with Adobe. He also studied the relationship between camera motion and the environment.
“I tried to extract some kinds of structures from the videos and the sequence of the camera,” he said. “At that point it was to analyze camera trajectory for the movie industry, but later we realized it was more systematic.”
Now, at Penn State, Zhou hopes to leverage the university's interdisciplinary network to advance his work.
“IST has people working in diverse areas, and many of them can be impacted by this kind of work,” he said. “This has generated a lot of interest in different areas. We are looking to extend this beyond and to find applications to make this more collaborative.”
“About 70 percent of information we obtain is from visual cues from our eyes,” he concluded. “Obviously we have areas like natural language processing to help understand speaking and sounds, but human vision is the dominating factor in how we understand this world. To make the computer see the world as we do is one of the most exciting areas in artificial intelligence and computer science.”