Just a heads-up for anyone running LLMs, diffusion models, or other heavy GPU workloads: the latest NVIDIA CUDA driver (R570+ / CUDA 12.8) brings a few changes worth noting:
✅ Reduced overhead when running multiple models or processes on the same GPU.
✅ New cuDNN frontend APIs – up to 30% faster attention kernels for transformers.
✅ Windows WSL2 improvements – near-native PCIe bandwidth, finally, for dual-GPU setups.
⚠️ Breaking change – older CUDA 11.x binaries may need recompilation if they use dynamic parallelism.
Update if you're running modern transformers or multi-stream workloads. Wait if you're stuck on a legacy CUDA 11.x codebase.
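If you're not sure where your machine stands, a quick first step is checking the highest CUDA version your installed driver supports. Here's a rough Python sketch, assuming the `pynvml` package (NVIDIA's NVML bindings, `nvidia-ml-py` on PyPI) is installed; `decode_cuda_version`, `supports`, and `installed_driver_cuda_version` are hypothetical helpers written just for this post. It decodes the integer format that NVML and `cudaDriverGetVersion` report (1000 * major + 10 * minor, so 12080 means 12.8) and compares it against 12.8. It won't tell you whether your own code uses dynamic parallelism; that still means checking your build.

```python
from __future__ import annotations


def decode_cuda_version(encoded: int) -> tuple[int, int]:
    """Decode the 1000 * major + 10 * minor format (12080 -> (12, 8))."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


def supports(encoded: int, required: tuple[int, int] = (12, 8)) -> bool:
    """True if the driver's max supported CUDA version is >= `required`."""
    return decode_cuda_version(encoded) >= required


def installed_driver_cuda_version() -> int | None:
    """Ask NVML for the driver's max CUDA version; None if unavailable.

    Assumes the optional `pynvml` package; returns None when it is missing
    or when no NVIDIA driver can be queried.
    """
    try:
        import pynvml
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
        try:
            return pynvml.nvmlSystemGetCudaDriverVersion()
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        return None


if __name__ == "__main__":
    version = installed_driver_cuda_version()
    if version is None:
        print("Could not query the NVIDIA driver (pynvml missing or no GPU).")
    else:
        major, minor = decode_cuda_version(version)
        verdict = "OK" if supports(version) else "too old"
        print(f"Driver supports up to CUDA {major}.{minor} ({verdict} for CUDA 12.8).")
```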