Jianbo Shi

I work on computer vision, machine learning, and broader questions in dynamic intelligence. My research has studied image segmentation, grouping, object recognition, human behavior analysis, first-person vision, and computational models that connect perception with action and memory.

I received my B.A. in Computer Science and Mathematics from Cornell University in 1994 and my Ph.D. in Computer Science from UC Berkeley in 1998. Before joining Penn, I was a research faculty member at the Robotics Institute at Carnegie Mellon University.

Dynamic intelligence: visual memory, learning, and systems that reason from visual experience.
Perception: normalized cuts, spectral grouping, segmentation, object recognition, and visual organization.
People: human activity, body analysis, social interactions, and first-person visual understanding.

Research Areas

Embodied First-Person Vision

Models that interpret intention, social interaction, and sensorimotor behavior from egocentric video.

Image Segmentation and Grouping

Graph-based visual grouping, normalized cuts, contour organization, and spectral methods.

Human Recognition and Activity

Algorithms for understanding human motion, activity, identity, gesture, and behavior in video.

Open-source Packages

ncut-pytorch

Nyström Normalized Cuts PyTorch

Normalized Cut with Nyström approximation; runs on million-scale graphs in milliseconds. O(n) time complexity, O(1) space complexity.

Website Huggingface GitHub Slides

Selected Projects and Publications

Vibe Spaces for Creatively Connecting and Expressing Visual Concepts

Huzheng Yang, Katherine Xu, Andrew Lu, Michael D. Grossberg, Yutong Bai, Jianbo Shi

CVPR 2026

Paper Website GitHub Demo

Artifacts and attention sinks: Structured approximations for efficient vision transformers

Andrew Lu, Wentinn Liao, Liuhui Wang, Huzheng Yang, Jianbo Shi

arXiv

Paper

Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models

Katherine Xu, Lingzhi Zhang, Jianbo Shi

WACV 2025

Paper GitHub

AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space

Huzheng Yang, James Gee*, Jianbo Shi*

arXiv

Paper Website

Brain Decodes Deep Nets

Huzheng Yang, James Gee*, Jianbo Shi*

CVPR 2024 Highlight, 2.8% of submissions

Paper Website GitHub Poster

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, ..., Jianbo Shi, et al.

CVPR 2024

A large-scale multimodal multiview video dataset and benchmark for skilled human activity from egocentric and exocentric perspectives.

Paper Website

Amodal Completion via Progressive Mixed Context Diffusion

Katherine Xu, Lingzhi Zhang, Jianbo Shi

CVPR 2024 Highlight, 2.8% of submissions

Paper GitHub Website

Memory Encoding Model

Huzheng Yang, James Gee*, Jianbo Shi*

arXiv. Algonauts 2023 competition winner

Paper Website GitHub Leaderboard

Object Detection in Video with Spatiotemporal Sampling Networks

Gedas Bertasius, Lorenzo Torresani, Jianbo Shi

European Conference on Computer Vision, 2018

Paper

Egocentric Basketball Motion Planning from a Single Image

Gedas Bertasius, Aaron Chan, Jianbo Shi

Computer Vision and Pattern Recognition, 2018

Paper

Unsupervised Learning of Important Objects from First-Person Videos

Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi

International Conference on Computer Vision, 2017

Paper

Normalized Cuts

Normalized Cuts and Image Segmentation

Jianbo Shi, Jitendra Malik

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000

A graph partitioning framework for image segmentation and perceptual grouping.

Paper Code

Resources

Current Ph.D. Students

Postdoctoral Researchers

Hyun Soo Park

Associate Professor, McKnight Presidential Fellow, University of Minnesota

Ph.D. Dissertations Supervised

Stella Yu, Professor, University of Michigan; Adjunct Professor, UC Berkeley. "Computational Models of Perceptual Organization", Carnegie Mellon University, May 2003.
Abhinav Gupta, Professor, Carnegie Mellon University. "Beyond Nouns and Verbs", U. Maryland, 2009. Co-advisor with Larry Davis.
Qihui Zhu, Nine Chapters Capital Management. "Shape Detection by Packing Contours and Regions", December 2010.
Alexander Toshev, Google. "Shape Representations for Object Recognition", December 2010. Co-advisor with K. Daniilidis, B. Taskar, Google.
Praveen Srinivasan, TLM, Perception at Aurora. "Holistic Shape-Based Object Recognition Using Bottom-up Image Structures", February 2011.
Elena Bernardis, Assistant Professor of Medical School, UPenn. "Finding Dots in Microscopic Images", April 2011. Co-advisor with S. Yu.
Song Gang, Google. "Quantitative Analysis of Thoracic Computed Tomography Images", May 2013. Co-advisor with J. Gee.
Katerina Fragkiadaki, Associate Professor, Carnegie Mellon University. "Multi-granularity Representations for Human Interactions: Pose, Motion and Intention", September 2013.
Jeff Byrne, STR. "Shape Representations using Nested Descriptors", April 2014.
Weiyu Zhang, Apple. "Visual Recognition via Matching Discriminative Deformable Patterns", Proposal May 2014.
Gedas Bertasius, Assistant Professor, UNC. "Embodied Visual Perception Models for Human Behavior", 2019.
Jyh-Jing Hwang, Research Scientist at Waymo Research. "Learning Image Segmentation with Relation-Centric Loss and Representation", 2020.
Lingzhi Zhang, Research Scientist at Adobe. "Bridging Visual Generation and Recognition", 2023.