JoonHo (Brian) Lee


I completed my Master's in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. I have been advised by Professor Byron Boots in the UW Robot Learning Lab since Spring 2021. My research focuses on robotic vision and deep learning. I also received my BS degree from the University of Washington. My current research interests lie in end-to-end learning for autonomous driving, imitation learning, and generalizable perception (open set recognition, domain adaptation).

Email: joonl4(at)cs(dot)washington(dot)edu
LinkedIn Profile
Research Blogs

GitHub Profile

Using Attention in Computer Vision

This blog surveys a set of DNN architectures in computer vision that use attention.

(ViT) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale


Vision Transformer

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows


Swin Transformer

Shifting Window

Full Swin architecture

(PVT, PVTv2) Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

[PVTv1 arXiv] [PVTv2 arXiv]

Pyramid Vision Transformer

Pyramid Vision Transformer architecture

Pyramid Vision Transformer v2

PVT Swin detection comparison

Image GPT


SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds



SWFormer details

(left) Strided Sparse Window Partition (right) SWFormer block

Multi-scale feature fusion

SW Diffusion

SW detection results