ROCm 6.4.0 released


ROCm 6.4.0 has been released with a number of updates over its predecessor. A key highlight is new kernel support in Megatron-LM for ROCm, a fork of the Megatron-LM framework built for training large language models on AMD GPUs. The fork now supports fused kernels such as Fused Attention, Fused Layer Norm, and Fused ROPE. On AMD Instinct MI300X systems, Core Partitioned X-celerator (CPX) mode is now supported in combination with the NUMA Per Socket (NPS4) memory mode, which improves performance with small language models.

ROCm 6.4.0 also improves compatibility between the kernel-mode GPU driver (KMD) and user-space software: users can combine a KMD and ROCm user-space software from ROCm releases up to a year apart, provided hardware support is maintained. The documentation has been updated for clarity and usability, particularly for Radeon GPUs used in workstation environments.
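To make the "up to a year apart" rule concrete, here is a minimal sketch of how such a window check might look. It assumes the rule can be modeled as a gap of at most 365 days between two release dates; the function name, the threshold, and the dates are hypothetical, and AMD's compatibility matrix remains the authoritative source.

```python
from datetime import date

# Assumption: "up to a year apart" is modeled as at most 365 days between
# the two release dates. The real support policy may be defined differently.
MAX_GAP_DAYS = 365


def within_compat_window(driver_release: date, rocm_release: date,
                         max_gap_days: int = MAX_GAP_DAYS) -> bool:
    """Return True if the two release dates fall within the assumed window."""
    return abs((rocm_release - driver_release).days) <= max_gap_days


# Hypothetical release dates, for illustration only.
print(within_compat_window(date(2024, 6, 1), date(2025, 4, 1)))
print(within_compat_window(date(2023, 12, 1), date(2025, 4, 1)))
```

The check is symmetric, so it does not matter whether the driver or the user-space release is the newer of the two.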

Additional enhancements in ROCm 6.4.0 include:

1. Support for PyTorch Versions: Compatibility is extended to PyTorch 2.5 and 2.6.
2. Improved Codec Support: VP9 support has been added to rocDecode and rocPyDecode.
3. Updates to Profiling Tools: The ROCm Compute Profiler and Systems Profiler have received new features and enhancements, including experimental support for multi-node profiling.
4. Enhancements to Libraries: Various libraries, such as rocWMMA and hipTensor, have been optimized for better performance and reduced binary sizes.
5. New Modules in ROCm Data Center Tool: Additional metrics and modules have been integrated, enhancing GPU resource management and monitoring.
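For the PyTorch item above, a quick way to see which PyTorch version is installed and whether it is a ROCm build is to inspect `torch.__version__` and `torch.version.hip` (the latter is `None` on non-ROCm builds). The helper `describe_build` below is a hypothetical name introduced for illustration, and the snippet is only a sketch that falls back gracefully when PyTorch is not installed.

```python
def describe_build(torch_version: str, hip_version) -> str:
    """Summarize a PyTorch build; hip_version is None on non-ROCm builds."""
    # Keep only "major.minor", e.g. "2.6" from "2.6.0+rocm6.4".
    major_minor = ".".join(torch_version.split(".")[:2])
    backend = f"ROCm/HIP {hip_version}" if hip_version else "no ROCm/HIP backend"
    return f"PyTorch {major_minor} ({backend})"


try:
    import torch
    print(describe_build(torch.__version__, torch.version.hip))
except ImportError:
    print("PyTorch is not installed in this environment")
```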

The release also introduces improvements in the ROCm Offline Installer Creator, adding support for Oracle Linux 9 and better dependency management for other distributions.

With ROCm 6.4.0, users can expect a more robust and flexible platform for developing and deploying GPU-accelerated applications, particularly in AI and machine learning contexts. As the ecosystem evolves, further enhancements and deprecations are planned, emphasizing the importance of staying updated with the latest releases for optimal performance and support.

In summary, ROCm 6.4.0 enhances existing features and introduces several new capabilities, positioning AMD's ROCm as a competitive platform for high-performance computing and deep learning applications. Users are encouraged to review the detailed release notes and documentation for information on compatibility, installation, and support for various frameworks and tools.


ROCm 6.4.0 has been released, introducing several notable changes since the previous release. These include new kernel support in Megatron-LM for ROCm, a specialized fork of Megatron-LM made to train large language models efficiently on AMD GPUs. The fork now supports fused kernels such as Fused Attention, Fused Layer Norm, and Fused ROPE. On AMD Instinct MI300X systems, Core Partitioned X-celerator (CPX) mode is now supported in combination with the NUMA Per Socket (NPS4) memory mode, enabling better performance with small language models. ROCm 6.4.0 also improves compatibility between the AMD kernel-mode GPU driver (KMD) and user-space software, allowing users to combine a KMD and ROCm user-space software from ROCm releases up to a year apart.
