Learn how to write high-performance CUDA kernels directly in Python, using tools and best practices that maximize GPU acceleration. In this NVIDIA GTC 2025 session, we’ll explore kernel structure, memory management, thread coordination, and optimization strategies, making it easier than ever to integrate CUDA into your Python applications.

Speaker: Leo Fang, Python CUDA Tech Lead, NVIDIA

Watch more NVIDIA GTC sessions on demand: https://www.nvidia.com/en-us/on-demand/?ncid=so-yout-194474-vt33
CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit

Topic: Development and Optimization - Programming Languages / Compilers
Level: General Interest
NVIDIA technology: CUDA, CUDA-X

Replay of NVIDIA GTC 2025 session S72449.