Using the Radeon Compute Profiler

The Radeon Compute Profiler provides two main modes of operation

Application Trace Mode

This mode provides a high-level overview of an OpenCL or ROCm/HSA application. It collects the following data:

  1. An API Trace, showing all OpenCL of HSA APIs called by the application, including their arguments and return values.
  2. A timeline showing the call sequence and duration of all OpenCL or HSA APIs called by the host as well as inter-device data transfers and kernels executing on a device.
  3. A set of Summary Pages, providing a set of statistics for the application, as well as the results of detailed analysis of the application.

Performance Counter Mode

This mode collects the following data from the AMD Radeon GPU or APU for each kernel dispatched to the device by either an OpenCL or ROCm/HSA application:

  1. Hardware performance counters.
  2. Statistics from the compiler for each kernel dispatched.

The performance counters and statistics can be used to discover bottlenecks in a particular kernel.

This mode also can also extract the kernel source code, the generated IL code, and the compiled ISA code for an OpenCL kernel dispatched to a GPU.

Additional Data

For OpenCL programs, both profiling modes can also generate Kernel Occupancy information for each kernel dispatched to a GPU. For HSA applications Kernel Occupancy information is only available in Performance Counter mode. See Kernel Occupancy for more information.

Usage Model

To use the profiler, simply run the rcprof executable with the desired command line options, passing in the name of the executable to be profiled. The profiler will launch the executable, and track all data until the application runs to completion. Once it is complete, the profiler data files will be written to disk.

See Command Line Documentation for more information on the supported command line options.