Efficient Code Region Characterization Through Automatic Performance Counters Reduction Using Machine Learning Techniques

Authors

HARUTYUNYAN Suren, CÉSAR Eduardo, SIKORA Anna, FILIPOVIČ Jiří, DUTTA Akash, JANNESARI Ali, ALCARAZ Jordi

Year of publication 2024
Type Article in Proceedings
Conference European Conference on Parallel Processing
MU Faculty or unit Institute of Computer Science

DOI http://dx.doi.org/10.1007/978-3-031-69577-3_2
Keywords Performance counters; Automatic dimension reduction; machine learning ensembles; parallel region classification
Description Leveraging hardware performance counters provides valuable insights into system resource utilization, aiding performance analysis and tuning for parallel applications. The available counters vary with architecture and are collected at execution time. The large number of available counters, combined with the limited number of registers available to measure them, makes gathering them laborious and costly. Efficient characterization of parallel regions therefore requires a dimension reduction strategy. While recent efforts have focused on manually reducing the number of counters for specific architectures, this paper introduces a novel approach: an automatic dimension reduction technique for efficiently characterizing parallel code regions across diverse architectures. The methodology is based on Machine Learning ensembles because of their precision and their ability to capture different relationships between the input features and the target variables. Evaluation results show that ensembles can successfully reduce the number of hardware performance counters that characterize a code region. We validate our approach on CPUs using a comprehensive dataset of OpenMP regions, showing that any region can be accurately characterized by 8 relevant hardware performance counters. In addition, we apply the proposed methodology on GPUs using a reduced set of kernels, demonstrating its effectiveness across various hardware configurations and workloads.
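The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of ranking counters with an ensemble and keeping the top k (k = 8 in the CPU results above). It uses bagged regression stumps trained on random feature subsets, a simplified stand-in for a random forest, and scores each counter by its total variance reduction. The functions `ensemble_importance` and `select_counters`, the counter names `c0`..`c5`, and the synthetic data are hypothetical; this is not the authors' method, and the paper's actual ensembles and target variables may differ.

```python
import random
from statistics import mean

# Hypothetical sketch: NOT the paper's method. Ranks "counters" (features)
# with a tiny bagged ensemble of regression stumps and keeps the top k.


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_stump(rows, targets, features):
    """Return (feature, threshold, gain) for the split with the largest SSE reduction."""
    base = sse(targets)
    best = (None, None, 0.0)
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            if not left or not right:
                continue
            gain = base - sse(left) - sse(right)
            if gain > best[2]:
                best = (f, thr, gain)
    return best


def ensemble_importance(rows, targets, n_estimators=50, max_features=None, seed=0):
    """Bagged stumps on random feature subsets; importance = normalized total gain."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max_features or max(1, n_features // 2)
    importance = [0.0] * n_features
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        feats = rng.sample(range(n_features), k)  # random feature subset
        f, _, gain = best_stump(sample_rows, sample_targets, feats)
        if f is not None:
            importance[f] += gain
    total = sum(importance) or 1.0
    return [v / total for v in importance]


def select_counters(names, rows, targets, k=8, **kwargs):
    """Keep the k counter names with the highest ensemble importance."""
    imp = ensemble_importance(rows, targets, **kwargs)
    ranked = sorted(range(len(names)), key=lambda i: imp[i], reverse=True)
    return [names[i] for i in ranked[:k]]


# Synthetic demo: the target depends only on c0 and c1; c2..c5 are noise.
data_rng = random.Random(1)
names = [f"c{i}" for i in range(6)]
rows = [[data_rng.random() for _ in names] for _ in range(60)]
targets = [3 * r[0] + 2 * r[1] + data_rng.gauss(0, 0.05) for r in rows]
top = select_counters(names, rows, targets, k=2)
print(top)
```

On this toy data the two informative counters should dominate the ranking. A real pipeline would more likely use a library ensemble (for example a random forest or gradient boosting model) on measured counter values rather than hand-rolled stumps.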