
Predicting and Optimizing Runtime Performance of Deep Learning Models


In this tutorial, we introduce techniques for quickly finding GPU underutilization and performance bottlenecks in deep learning (DL) workloads. We then give a brief introduction to CUDA programming as an example of how such bottlenecks and underutilization are typically addressed today. We wrap up by introducing Hidet (ASPLOS 2023 paper), a new DL compiler that allows rapid development of performant tensor programs in a higher-level language such as Python.

Scope and Objectives

This tutorial has three objectives. First, we will demonstrate modern tools for rapidly profiling DNN workloads, which you can adopt in your day-to-day research or work. Second, we will cover the basics of the CUDA programming model to provide the background that motivates the new DL compiler Hidet. Third, we will introduce Hidet and demonstrate its expressive power relative to CUDA. By the end of this tutorial, you will have everything you need to get started with Hidet and rapidly develop performant tensor programs.


To get the most out of the tutorial, please have the following ready when you attend:

  1. Bring a laptop with Visual Studio Code and its Remote-SSH extension installed. The extension will be used to launch profiling on a remote workstation.
  2. (Recommended) Have access to a remote Linux workstation with an NVIDIA GPU that you can SSH into. Python and CUDA must be installed on that workstation.
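
One way to confirm the workstation meets the second prerequisite is a short check run on the remote machine. The sketch below is not part of the tutorial materials. It assumes that `nvidia-smi` (shipped with the NVIDIA driver) and `nvcc` (shipped with the CUDA Toolkit) are the commands of interest, and the helper name `check_prerequisites` is hypothetical.

```python
import shutil
import sys


def check_prerequisites(tools=("nvidia-smi", "nvcc")):
    """Map each tool name to its location on PATH, or None if it is missing.

    The default tool list is an assumption: nvidia-smi indicates an installed
    NVIDIA driver, and nvcc indicates an installed CUDA Toolkit.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    # Report the Python interpreter version running this script.
    print(f"Python {sys.version.split()[0]}")
    for tool, path in check_prerequisites().items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
```

If either tool is reported as not found, the driver or toolkit may be missing, or simply absent from `PATH` in the current shell.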

March 25th, 2023

  • 1:40 – 3:00: Find Inefficiencies and Rapid Model Profiling with CentML DeepView (Yubo Gao)
  • 3:00 – 3:20: Brief Introduction to CUDA Programming (Yaoyao Ding)
  • 3:20 – 3:40: Coffee Break
  • 3:40 – 5:00: Build Tensor Programs with Hidet in Python (Yaoyao Ding)


DeepView Slides

Hidet Slides


Gennady Pekhimenko

Assistant Professor at University of Toronto, CEO at CentML Inc.

Yaoyao Ding

PhD student at University of Toronto, Research SDE at CentML Inc.

Yubo Gao

PhD student at University of Toronto, Research SDE at CentML Inc.

Anand Jayarajan

PhD student at University of Toronto, Chief Software Architect at CentML Inc.

See also

  • Skyline: Interactive performance profiling and debugging tool for PyTorch neural networks.
  • Hidet: An open-source efficient deep learning framework.