HPC Toolkit 4.2.1 review

by rbytes.net on

HPCToolkit is an open-source suite of multi-platform tools for profile-based performance analysis of applications

License: GPL (GNU General Public License)
File size: 31307K
Developer: The Center for High Performance Software Research

HPCToolkit is an open-source suite of multi-platform tools for profile-based performance analysis of applications. The figure provides an overview of the toolkit components and their relationships.

Here are some key features of "HPC Toolkit":
hpcrun: a tool for profiling executions of unmodified application binaries using statistical sampling of hardware performance counters.
hpcprof & xprof: tools for interpreting sample-based execution profiles and relating them back to program source lines.
bloop: a tool for analyzing application binaries to recover program structure; namely, to identify where loops are present and what program source lines they contain.
hpcview: a tool for correlating program structure information, multiple sample-based performance profiles, and program source code to produce a performance database.
hpcviewer: a Java-based GUI for exploring databases consisting of performance information correlated with program source.

A program called hpcview is at the toolkit's center. It takes performance profiles and program structure information and, under the direction of a configuration file, correlates them with application source code to produce a browsable performance database.

hpcview also enables the user to define expressions that compute derived metrics as functions of other metrics already defined (e.g. measured metrics read from data files or previously computed derived metrics).
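The idea of a derived metric can be illustrated with a minimal Python sketch. It does not use hpcview's actual expression syntax or configuration format, which are not shown here; the function `add_derived_metric`, the metric names, and all numbers are hypothetical.

```python
def add_derived_metric(scopes, name, formula):
    """Evaluate `formula` on each scope's metrics and store the result under `name`."""
    for metrics in scopes.values():
        metrics[name] = formula(metrics)
    return scopes


# Hypothetical measured metrics per procedure (made-up values).
measured = {
    "solve": {"cycles": 8000.0, "instructions": 4000.0},
    "init": {"cycles": 1000.0, "instructions": 2000.0},
}

# A derived metric computed from two measured metrics.
add_derived_metric(measured, "cpi", lambda m: m["cycles"] / m["instructions"])

# A derived metric computed from a previously computed derived metric.
# IDEAL_CPI is an assumed machine parameter, chosen only for illustration.
IDEAL_CPI = 0.5
add_derived_metric(measured, "stall_fraction", lambda m: 1.0 - IDEAL_CPI / m["cpi"])
```

Because each derived metric is stored next to the measured ones, later expressions can refer to it by name in the same way.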

Performance databases are explored with hpcviewer, our Java-based user interface, which presents an application's performance data in a top-down fashion and makes it easy to navigate back and forth between performance data and source code.

The user interface presents performance data in a hierarchical display. At any time, you are looking at some program context (program, file, procedure, loop, or line). Also displayed is the data for both the parent and the children of the current context. Up and down arrows on the lines of the display are used to walk the hierarchy.

To speed up top-down analysis, the interface also provides 'flatten' and 'un-flatten' buttons. Their icons hint at their function. 'Flatten' modifies the hierarchy by eliding non-leaf children of the current node and replacing them with the grandchildren.

Unflatten reverses this. Since the tables are sorted, the flatten operation makes short work of diving into the program from the top to identify the most important files, procedures, loops and statements.
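The flatten and unflatten operations can be sketched in Python as follows. This is a toy model under stated assumptions, not hpcviewer's implementation: the `Scope` and `View` classes, the history stack used for unflatten, and the example costs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """One node of the hierarchy: program, file, procedure, loop, or line."""
    name: str
    cost: float
    children: list = field(default_factory=list)


def by_cost(scopes):
    """Sort scopes by descending cost, mimicking a sorted table."""
    return sorted(scopes, key=lambda s: s.cost, reverse=True)


class View:
    """Hypothetical model of the rows shown beneath the current context."""

    def __init__(self, context):
        self.rows = by_cost(context.children)
        self.history = []  # earlier row sets, so unflatten can undo a flatten

    def flatten(self):
        self.history.append(self.rows)
        rows = []
        for scope in self.rows:
            # Non-leaf scopes are elided and replaced by their children;
            # leaf scopes stay in place.
            rows.extend(scope.children if scope.children else [scope])
        self.rows = by_cost(rows)

    def unflatten(self):
        if self.history:
            self.rows = self.history.pop()

    def names(self):
        return [scope.name for scope in self.rows]


# Made-up example: two files, three procedures, one loop.
program = Scope("program", 100.0, [
    Scope("a.c", 70.0, [
        Scope("solve", 60.0, [Scope("loop at a.c:10", 55.0), Scope("a.c:30", 5.0)]),
        Scope("init", 10.0),
    ]),
    Scope("b.c", 30.0, [Scope("io", 30.0)]),
])

view = View(program)
view.flatten()            # files are replaced by their procedures
procedures = view.names()
view.flatten()            # the non-leaf procedure is replaced by its loop and line
hot_spots = view.names()
view.unflatten()          # back to the procedure-level rows
```

In this example, two flatten steps bring the most expensive loop to the top of the table without the user having to descend through the file and procedure levels by hand.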

Performance data manipulated by hpcview can come from any source, as long as the profile data can be translated or saved directly to a standard, profile-like input format. To date, the principal sources of input data for hpcview have been hardware performance counter profiles.

Such profiles are generated by setting up a hardware counter to monitor events of interest (e.g., primary cache misses), to generate a trap when the counter overflows, and then to histogram the program counter values at which these traps occur. For Linux, we developed the hpcrun tool to collect profiles by sampling hardware performance counters.
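The sampling scheme described above can be simulated in a few lines of Python. This is a software model only: it does not call PAPI or touch real hardware counters, and the function name `sample_profile`, the addresses, and the event counts are hypothetical. As a simplification, it records one sample per full period even when a single instruction causes several overflows.

```python
from collections import Counter


def sample_profile(event_trace, period):
    """Histogram the program counter (PC) values at which a simulated counter overflows.

    event_trace: iterable of (pc, events) pairs, where `events` is the number of
                 monitored events (e.g. cache misses) caused at that PC.
    period:      number of events between overflow traps.
    """
    histogram = Counter()
    counter = 0
    for pc, events in event_trace:
        counter += events
        while counter >= period:
            histogram[pc] += 1  # the trap handler records the current PC
            counter -= period   # re-arm the counter for the next period
    return histogram


# Made-up trace: a loop body of two instructions executed nine times, where the
# second instruction causes four times as many events as the first.
trace = [(0x400100, 1), (0x400108, 4)] * 9
profile = sample_profile(trace, period=3)
```

Here the 45 events yield 15 samples, and the samples split between the two addresses in the same 1:4 ratio as the events themselves. Real profiles only approximate this proportionality, and the approximation improves as the number of samples grows.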

This tool uses UTK's PAPI library for access to hardware performance counters. A second tool, hpcprof, is used to map profiles collected using hpcrun back to program source lines. hpcprof is based on code from Curt Janssen's cprof/vprof profiler. On operating systems other than Linux, we use vendor-supplied tools to collect profile data. On MIPS+Irix platforms, we use SGI's ssrun tool to collect profiles. On Alpha+Tru64, we use either Compaq's uprofile or DCPI utilities for this purpose.

hpcview and hpcviewer can be used to view profile-like data of any type, not just data sampled from hardware performance counters. To analyze one program that contained many register spills, we built a Perl script that examines the assembly code generated by the SGI compilers for MIPS+Irix and creates profiles that map register spills back to source code lines.

To facilitate automation, the programs in HPCToolkit are intended to be run using scripts and configuration files. Once these are set up, rerunning the program to collect new data and all of the steps that go into generating a browsable dataset can be completely automated. The scripts automate the collection of data and the conversion of profile data into a common, XML-based format.

Other performance tools (e.g. SGI's ssrun) report performance data at the line, procedure, and program level. However, since much of the time in scientific programs is spent in loops, having data at the loop level as well is critical to facilitate performance tuning.

For this reason, HPCToolkit includes a binary analyzer, bloop, that extracts loop nesting structure from application binaries and uses symbol table line map information to map this structure back to the source program level. Because bloop works on binaries, this process is independent of the language used (though in practice it can be somewhat compiler dependent).

The loop nesting structure information produced by bloop enables hpcview to associate performance data with each loop in a program without incurring any additional overhead for data collection during program execution.
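Why no extra collection overhead is needed can be seen in a small Python sketch: once line-level sample counts and the loop structure are both available, loop-level totals are computed after the run. The representation of loops as (name, first line, last line) ranges is an assumption made for illustration. bloop's actual output format is not shown here, and loops recovered from a binary need not map to contiguous line ranges.

```python
def attribute_to_loops(line_samples, loops):
    """Sum per-line sample counts into every loop whose line range contains the line.

    line_samples: dict mapping source line number to sample count.
    loops:        list of (name, first_line, last_line) tuples; nested loops
                  simply have nested ranges.
    """
    totals = {name: 0 for name, _, _ in loops}
    for line, count in line_samples.items():
        for name, first, last in loops:
            if first <= line <= last:
                totals[name] += count
    return totals


# Hypothetical structure: an outer loop on lines 10-30 containing an inner
# loop on lines 15-20, with made-up per-line sample counts.
loops = [("outer", 10, 30), ("inner", 15, 20)]
line_samples = {8: 2, 12: 5, 16: 40, 18: 25, 25: 3}
loop_totals = attribute_to_loops(line_samples, loops)
```

Samples on lines inside the inner loop count toward both loops, so the outer loop's total includes the inner loop's, while line 8 lies outside both and contributes to neither.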

Supported platforms: Pentium+Linux, Opteron+Linux, Athlon+Linux, Itanium+Linux, Alpha+Tru64 and MIPS+Irix.

HPCToolkit is open-source software released with a BSD-like license.
