Huawei adds new compiler to OpenEuler operating system
The OpenEuler community has announced that the Bisheng compiler has been added to the OpenEuler operating system's software repository, where it can now be fetched for use.
The new Huawei compiler for OpenEuler is at version 2.1.0. In SPEC CPU 2017 testing by its development team, it scored 24.3 percent higher than GCC 9.3.0 in the same hardware and software environment.
Bisheng Compiler:
Bisheng Compiler is a high-performance, reliable, and easily extensible compiler created by Huawei Compiler Lab. It supports C/C++/Fortran and other programming languages.
The compiler also enhances existing compilation and optimization techniques and introduces new ones, targeting specific application scenarios. It is tuned in particular for high-performance computing (HPC) workloads.
Bisheng 2.1.0 was released on December 30 last year. This version enhances loop optimization, structure reorganization optimization, and block reordering optimization, and improves performance on multiple SPEC CPU 2017 sub-items and on HPC workloads.
The update adds precision control options, including pow initialization immediate data fitting and mathematical function control, to allow finer precision tuning.
The update also supports multi-threaded parallel programming, Fortran 2003 input/output enhancements, and asynchronous I/O, to meet the needs of the Fortran ecosystem on Kunpeng.
Features Optimized with this update:
Bisheng Compiler adopts a variety of enhanced compilation optimization techniques, including but not limited to the following optimization features:
Loop optimization:
- Loop unswitching: Reduces the number of branch jumps executed.
- Loop unroll-and-jam: Improves memory and cache locality and utilization.
- Loop fusion: Directly reuses values across loops, exposing more instruction scheduling opportunities.
- Loop distribution: Reduces register pressure in loops and exposes more vectorization opportunities.
- Loop unrolling: Reduces the number of dynamic instructions and exposes more optimization opportunities, such as data reuse, wider instruction scheduling, and greater data parallelism for vectorization.
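To make two of these transformations concrete, here is a minimal hand-written sketch in C of loop unswitching and loop fusion. The function names are hypothetical and chosen only for illustration. The code shows the general shape of the rewrite, not how the Bisheng compiler implements it internally.

```c
#include <stddef.h>

/* Before unswitching: the test on `scale` never changes inside the loop,
 * yet it is evaluated on every iteration. */
void apply_original(float *dst, const float *src, size_t n, int scale, float k)
{
    for (size_t i = 0; i < n; i++) {
        if (scale)
            dst[i] = src[i] * k;
        else
            dst[i] = src[i];
    }
}

/* After unswitching: the invariant test is hoisted out of the loop,
 * leaving two loops with no branch in their bodies. */
void apply_unswitched(float *dst, const float *src, size_t n, int scale, float k)
{
    if (scale) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}

/* Before fusion: two separate passes over the same array. */
void stats_separate(const float *a, size_t n, float *sum, float *sum_sq)
{
    float s = 0.0f, q = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    for (size_t i = 0; i < n; i++)
        q += a[i] * a[i];
    *sum = s;
    *sum_sq = q;
}

/* After fusion: one pass; each element is loaded once and reused
 * by both reductions. */
void stats_fused(const float *a, size_t n, float *sum, float *sum_sq)
{
    float s = 0.0f, q = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float v = a[i];
        s += v;
        q += v * v;
    }
    *sum = s;
    *sum_sq = q;
}
```

In both pairs the rewritten function computes the same result as the original; only the loop structure changes.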
Memory layout optimization:
Converts Array of Structures (AoS) to Structure of Arrays (SoA) and rearranges arrays. These transformations improve the cache hit rate and, with it, program performance.
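The difference between the two layouts can be sketched in C as follows. The particle types, the `MAX_PARTICLES` capacity, and the function names are hypothetical and not taken from the Bisheng documentation. The conversion is written out by hand here, whereas the article describes the compiler doing it automatically.

```c
#include <stddef.h>

#define MAX_PARTICLES 1024  /* hypothetical capacity for this sketch */

/* Array of Structures: the fields of one particle sit next to each other,
 * so a loop that reads only `mass` also pulls x, y and z into the cache. */
struct ParticleAoS {
    float x, y, z, mass;
};

/* Structure of Arrays: each field is stored contiguously, so a loop over
 * `mass` touches only the bytes it needs. */
struct ParticlesSoA {
    float x[MAX_PARTICLES];
    float y[MAX_PARTICLES];
    float z[MAX_PARTICLES];
    float mass[MAX_PARTICLES];
};

/* Copy up to MAX_PARTICLES elements from the AoS layout into the SoA layout. */
void aos_to_soa(const struct ParticleAoS *in, struct ParticlesSoA *out, size_t n)
{
    for (size_t i = 0; i < n && i < MAX_PARTICLES; i++) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
        out->mass[i] = in[i].mass;
    }
}

/* Strided access: one useful float out of every four loaded. */
float total_mass_aos(const struct ParticleAoS *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].mass;
    return total;
}

/* Unit-stride access over a single contiguous array. */
float total_mass_soa(const struct ParticlesSoA *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n && i < MAX_PARTICLES; i++)
        total += p->mass[i];
    return total;
}
```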
Software prefetch:
Working closely with the Kunpeng processor, the Bisheng compiler accurately models hardware characteristics, so its prefetch analysis can simulate the Kunpeng processor's memory access behavior and insert precise prefetch instructions into the code. This improves the cache hit rate and, with it, program performance.
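As a rough illustration of what an inserted prefetch looks like at the source level, the sketch below uses the `__builtin_prefetch` builtin available in GCC and Clang. The function name and the `PREFETCH_DISTANCE` value are assumptions made for this example. A compiler with an accurate hardware model would choose the distance itself.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance; a real value depends on memory latency
 * and on how much work each iteration does. */
#define PREFETCH_DISTANCE 8

/* Sum data[] through an index array. The indirect access pattern is hard
 * for hardware prefetchers to predict, which makes it a typical candidate
 * for software prefetch. */
double gather_sum(const double *data, const size_t *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Request the element needed PREFETCH_DISTANCE iterations from now.
         * Second argument 0 = prefetch for read; third argument 1 = low
         * temporal locality. The bounds check keeps idx[] in range. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint: it does not change the result, and whether it helps depends on the processor and the access pattern.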
Automatic vectorization:
Using the Kunpeng NEON and SVE instruction sets, the Bisheng compiler enhances automatic vectorization, converting scalar code that performs similar operations into vector code so that a single instruction processes multiple data elements, improving program performance.
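A loop of the following shape is a common candidate for automatic vectorization. This is a generic sketch, not an example from the Bisheng documentation. Whether a given compiler actually emits NEON or SVE instructions for it depends on the optimization flags and the target.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for every element.
 * Each iteration is independent of the others, and `restrict` promises the
 * compiler that x and y do not overlap, so several elements can be processed
 * by one vector instruction. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The source stays scalar; the vector form is produced by the compiler.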
Autotuner:
Using ML-based automatic search, the Autotuner iterates over the space of tunable options to find the best-performing combination, then compiles the target program with it.
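The idea of searching an option space can be sketched with a toy example. This is not the Autotuner's algorithm: the sketch swaps in a plain exhaustive search for the ML-guided search, the `Config` fields are hypothetical and not real Bisheng options, and `measure_cost` is a synthetic stand-in for compiling and timing the program.

```c
#include <stddef.h>

/* Hypothetical tuning point; these are not real compiler options. */
struct Config {
    int unroll;
    int prefetch_distance;
};

/* Stand-in for "compile with this configuration, run, and time it".
 * The synthetic cost is lowest at unroll = 4, prefetch_distance = 8. */
static double measure_cost(struct Config c)
{
    double du = (double)(c.unroll - 4);
    double dp = (double)(c.prefetch_distance - 8);
    return 100.0 + du * du + 0.5 * dp * dp;
}

/* Try every combination of the candidate values and keep the cheapest.
 * A real tuner would use a smarter search to avoid measuring every point. */
struct Config tune(const int *unrolls, size_t nu, const int *dists, size_t nd)
{
    struct Config best = { unrolls[0], dists[0] };
    double best_cost = measure_cost(best);

    for (size_t i = 0; i < nu; i++) {
        for (size_t j = 0; j < nd; j++) {
            struct Config c = { unrolls[i], dists[j] };
            double cost = measure_cost(c);
            if (cost < best_cost) {
                best = c;
                best_cost = cost;
            }
        }
    }
    return best;
}
```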
Performance:
Test environment:
- OS: OpenEuler 20.03
- CPU: Kunpeng 920
The Bisheng compiler development team evaluated the performance of version 2.1.0. In the SPEC CPU 2017 test report, Bisheng compiler 2.1.0 achieved an overall score of 399 points, compared with 321 points for GCC 9.3.0. In the same hardware and software environment, the Bisheng compiler's score is 24.3% higher than GCC's.
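The 24.3% figure follows from the two reported scores. A small helper, named here only for illustration, reproduces the arithmetic:

```c
/* Relative improvement of `score` over `baseline`, in percent. */
double percent_gain(double score, double baseline)
{
    return (score - baseline) / baseline * 100.0;
}

/* percent_gain(399.0, 321.0) is (399 - 321) / 321 * 100, about 24.3. */
```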
(via – ithome)