
Predicting and Optimizing Runtime Performance of Deep Learning Models


In this tutorial, we introduce techniques for finding GPU underutilization and performance bottlenecks in deep-learning (DL) workloads. We then give a brief introduction to CUDA programming as an example of how these bottlenecks are typically addressed today. We close by introducing Hidet (ASPLOS 2023 paper), a new DL compiler that allows rapid development of performant tensor programs in a higher-level language such as Python.

Scope and Objectives

This tutorial has three objectives. First, we demonstrate how to use modern tools to rapidly profile DNN workloads, which you can adopt in your day-to-day research or work. Second, we cover the basics of the CUDA programming model to provide the background that motivates the new DL compiler Hidet. Third, we introduce Hidet and demonstrate its expressive power relative to CUDA. By the end of this tutorial, you will have everything you need to get started with Hidet and rapidly develop performant tensor programs.


Gennady Pekhimenko

Assistant Professor at University of Toronto, CEO at CentML Inc.

Yaoyao Ding

PhD student at University of Toronto, Research SDE at CentML Inc.

Yubo Gao

PhD student at University of Toronto, Research SDE at CentML Inc.

Anand Jayarajan

PhD student at University of Toronto, Chief Software Architect at CentML Inc.


March 25th, 2023

1:40 – 3:00: Find Inefficiencies and Rapid Model Profiling with CentML DeepView (Yubo Gao). To get the most out of this tutorial, please read the preparation section.

3:00 – 3:20

3:20 – 4:00: Brief Introduction to CUDA Programming (Yaoyao Ding)

4:00 – 4:10

4:10 – 5:00: Build Tensor Programs with Hidet in Python (Yaoyao Ding)

See also

  • Skyline: Interactive performance profiling and debugging tool for PyTorch neural networks.
  • Hidet: An open-source efficient deep learning framework.