Deep Learning for Computer Vision

Deep convolutional neural networks (CNNs) have achieved high performance in visual recognition tasks such as image classification, object detection, and semantic segmentation. At the University of Washington, we design new CNN-based architectures as well as systems for important real-world applications such as digital pathology, expression recognition, and assistive technologies.

Group members

Linda Shapiro

Hannaneh Hajishirzi

Deepali Aneja

Sachin Mehta

Bindita Chaudhuri

Beibin Li

Meredith Wu

Recent projects

Project Page

Efficient Convolutional Neural Networks for Mobile Devices

We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high-resolution images under resource constraints. ESPNet is based on a new convolutional module, the efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power. ESPNet is 22 times faster (on a standard GPU) and 180 times smaller than the state-of-the-art semantic segmentation network PSPNet, while its category-wise accuracy is only 8% lower. We evaluated ESPNet on a variety of semantic segmentation datasets, including Cityscapes, PASCAL VOC, and a breast biopsy whole slide image dataset. Under the same constraints on memory and computation, ESPNet outperforms current efficient CNNs such as MobileNet, ShuffleNet, and ENet on both standard metrics and our newly introduced performance metrics that measure efficiency on edge devices. Our network can process high-resolution images at 112 and 9 frames per second on a standard GPU and an edge device, respectively.
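The ESP module's efficiency comes from factorizing a standard convolution: a pointwise (1x1) convolution first reduces the channel dimension, and a spatial pyramid of K parallel dilated convolutions then operates on the reduced feature maps. The parameter-count arithmetic behind the savings can be sketched as follows (the channel sizes and pyramid width K below are illustrative choices, not the paper's exact configuration):

```python
def standard_conv_params(m, n, k=3):
    """Parameters of a standard k x k convolution mapping M input
    channels to N output channels: M * N * k * k."""
    return m * n * k * k

def esp_module_params(m, n, K=5, k=3):
    """Parameters of an ESP-style decomposition: a 1x1 pointwise conv
    reduces M channels to d = N / K, then K parallel k x k dilated
    convolutions (each d -> d) run at different dilation rates, and
    their outputs are fused and concatenated back to N channels."""
    d = n // K                       # reduced channel dimension
    pointwise = m * d                # 1x1 reduction
    dilated = K * (k * k * d * d)    # K parallel dilated convs
    return pointwise + dilated
```

For example, with 128 input and output channels and K = 5, the decomposition needs 31,325 parameters versus 147,456 for a standard 3x3 convolution, a roughly 4.7x reduction; the dilated branches also enlarge the effective receptive field at no extra parameter cost.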

Project Page

Expression Recognition using Deep Neural Nets

In order to create a successful animated story, the emotional state of a character must be staged so that it is unmistakable and clear. The viewer's perception of the character's facial expressions is key to successfully staging the emotion. Traditionally, animators and automatic expression-transfer systems have relied on geometric markers and features modeled on human faces to create character expressions, yet these features do not transfer accurately to stylized character faces. Relying on human geometric features alone to generate stylized character expressions leads to expressions that are perceptually confusing or different from the intended expression. Our framework avoids these pitfalls by learning how to transfer human facial expressions to character expressions that are both perceptually consistent and geometrically correct.

Project Page

Digital Pathology: Accuracy, Viewing Behavior and Image Characterization

Pathologic assessment for a cancer diagnosis has for years been the “gold standard” guiding clinical care and research endeavors. However, diagnostic misclassification that can compromise clinical and research outcomes occurs in the interpretation of breast biopsy specimens, particularly at the benign-to-malignant and in-situ-to-invasive thresholds. Technological advances are now enabling widespread use of digitized whole slides. While this new technology has not yet been validated against glass slides, it provides a far more feasible medium than a traditional microscope for evaluating visual screening behavior. For these reasons, we have designed a research program that will evaluate diagnostic accuracy on digital slides compared to glass slides and evaluate individuals' visual screening of digital images. In-depth scientific evaluation of the visual screening behind each pathologist’s review of a slide, including characteristics of both viewing behavior and the image regions they deem important, is critical to understanding the possible sources of errors and identifying the most effective viewing techniques for reducing them. Bringing together a multi-faceted group of investigators will move us closer to our ultimate goal of ensuring high-quality clinical care.


Publications

  1. Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi. ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation, ECCV, 2018.
  2. Sachin Mehta, Ezgi Mercan, Jamen Bartlett, Donald Weaver, Joann Elmore, and Linda Shapiro. Y-Net: Joint Segmentation and Classification for Diagnosis of Breast Biopsy Images, MICCAI, 2018.
  3. Sachin Mehta, Ezgi Mercan, Jamen Bartlett, Donald Weaver, Joann Elmore, and Linda Shapiro. Learning to Segment Breast Biopsy Whole Slide Images, WACV, 2018.
  4. Deepali Aneja, Bindita Chaudhuri, Alex Colburn, Gary Faigin, Linda G. Shapiro, and Barbara Mones. Learning to Generate 3D Stylized Character Expressions from Humans, WACV, 2018.
  5. Deepali Aneja, Alex Colburn, Gary Faigin, Linda G. Shapiro, and Barbara Mones. Modeling Stylized Character Expressions via Deep Learning, ACCV, 2016.