Flang Vs. Gfortran: LAPACK Test Discrepancy

by Alex Johnson

This article examines a discrepancy observed while running the LAPACK (Linear Algebra PACKage) test suite: a build using the Flang-21 compiler reported four failures (three numerical errors and one other error), while a build using Gfortran-15 passed every test. Four failures out of more than five million tests is a tiny fraction, but the difference raises questions about compiler optimization strategies and floating-point behavior, which we explore below.

Initial Observations and Setup

The initial report describes two builds with broadly similar optimization flags: Flang-21 reports a handful of failures in the LAPACK test suite, while Gfortran-15 reports none. The make.inc settings for the two builds are:

For Flang-21:

CC = clang
CFLAGS = -O3
FC = flang-21
FFLAGS = -O2
FFLAGS_DRV = $(FFLAGS)
FFLAGS_NOOPT = -O0

For Gfortran-15:

CC = gcc
CFLAGS = -O3
FC = gfortran
FFLAGS = -O2 -frecursive
FFLAGS_DRV = $(FFLAGS)
FFLAGS_NOOPT = -O0 -frecursive

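Apart from the compilers themselves (clang versus gcc for the C sources, flang-21 versus gfortran for the Fortran sources), the main difference in the setup is the -frecursive flag passed to gfortran. It forces gfortran to allocate all local arrays on the stack, so that routines can safely be entered recursively; LAPACK's example make.inc for gfortran includes this flag, and the reference code contains recursive routines such as xGETRF2. No corresponding flag is passed to Flang here. The tests are built and run from the top-level LAPACK directory with make lapack_testing -j.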
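To see at a glance which settings actually differ, the two fragments can be compared mechanically. The Python sketch below uses hypothetical helpers (parse_make_inc and diff_settings are not part of LAPACK) and only understands plain NAME = value lines; comments, continuation lines, and make conditionals are deliberately ignored.

```python
def parse_make_inc(text):
    """Parse simple 'NAME = value' lines from a make.inc fragment into a dict.

    Simplifying assumption: only plain assignments are handled; anything
    after '#' is treated as a comment and dropped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            name, value = line.split("=", 1)
            settings[name.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# The two configurations reported above.
FLANG = """
CC = clang
CFLAGS = -O3
FC = flang-21
FFLAGS = -O2
FFLAGS_DRV = $(FFLAGS)
FFLAGS_NOOPT = -O0
"""

GFORTRAN = """
CC = gcc
CFLAGS = -O3
FC = gfortran
FFLAGS = -O2 -frecursive
FFLAGS_DRV = $(FFLAGS)
FFLAGS_NOOPT = -O0 -frecursive
"""

for name, (f, g) in diff_settings(parse_make_inc(FLANG), parse_make_inc(GFORTRAN)).items():
    print(f"{name:13} flang: {f!r:14} gfortran: {g!r}")
```

A purely textual comparison reports FFLAGS_DRV as identical, because both files contain the literal $(FFLAGS); once make expands it, the driver flags differ just as FFLAGS does.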

Detailed Test Results

Upon executing the tests, the following results were observed:

Flang-21 Results:

--> LAPACK TESTING SUMMARY  <--
 Processing LAPACK Testing output found in the TESTING directory
SUMMARY              nb test run    numerical error     other error
================     ===========    ===============     ===============
REAL                     1569648    0 (0.000%)          0 (0.000%)
DOUBLE PRECISION         1568910    1 (0.000%)          1 (0.000%)
COMPLEX                  1020850    2 (0.000%)          0 (0.000%)
COMPLEX16                1030797    0 (0.000%)          0 (0.000%)

--> ALL PRECISIONS       5190205    3 (0.000%)          1 (0.000%)

Gfortran-15 Results:

--> LAPACK TESTING SUMMARY  <--
 Processing LAPACK Testing output found in the TESTING directory
SUMMARY              nb test run    numerical error     other error
================     ===========    ===============     ===============
REAL                     1569648    0 (0.000%)          0 (0.000%)
DOUBLE PRECISION         1570470    0 (0.000%)          0 (0.000%)
COMPLEX                   980455    0 (0.000%)          0 (0.000%)
COMPLEX16                1030797    0 (0.000%)          0 (0.000%)

--> ALL PRECISIONS       5151370    0 (0.000%)          0 (0.000%)

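The Flang-21 build reports one numerical error and one other error in the DOUBLE PRECISION tests and two numerical errors in the COMPLEX tests, four failures in total, while Gfortran-15 reports none in any precision. (The 0.000% figures are just rounded: four failures out of about 5.2 million tests is below 0.0001%.) The number of tests run also differs: the Flang build ran 1,560 fewer DOUBLE PRECISION tests (1568910 versus 1570470) and 40,395 more COMPLEX tests (1020850 versus 980455), so the two builds did not exercise exactly the same set of cases. That difference deserves an explanation alongside the failures, and this is where the investigation begins.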
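Comparing the two summaries by eye makes it easy to miss the differing test counts. The Python sketch below parses the per-precision rows and reports the differences; the regular expression assumes the row layout shown above, which may vary between LAPACK versions, and parse_summary and compare are hypothetical helper names.

```python
import re

# Per-precision rows of the two summaries reported above:
# name, tests run, numerical errors, other errors.
FLANG = """
REAL 1569648 0 (0.000%) 0 (0.000%)
DOUBLE PRECISION 1568910 1 (0.000%) 1 (0.000%)
COMPLEX 1020850 2 (0.000%) 0 (0.000%)
COMPLEX16 1030797 0 (0.000%) 0 (0.000%)
"""

GFORTRAN = """
REAL 1569648 0 (0.000%) 0 (0.000%)
DOUBLE PRECISION 1570470 0 (0.000%) 0 (0.000%)
COMPLEX 980455 0 (0.000%) 0 (0.000%)
COMPLEX16 1030797 0 (0.000%) 0 (0.000%)
"""

# COMPLEX16 is listed before COMPLEX so the longer name is tried first.
ROW = re.compile(
    r"^(REAL|DOUBLE PRECISION|COMPLEX16|COMPLEX)\s+(\d+)\s+(\d+)\s+\([\d.]+%\)\s+(\d+)\s+\([\d.]+%\)"
)


def parse_summary(text):
    """Map each precision to (tests run, numerical errors, other errors)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            name, run, num, other = m.groups()
            rows[name] = (int(run), int(num), int(other))
    return rows


def compare(a, b):
    """Per precision: (difference in tests run, difference in total failures), a minus b."""
    return {
        name: (a[name][0] - b[name][0], (a[name][1] + a[name][2]) - (b[name][1] + b[name][2]))
        for name in a
        if name in b
    }


flang, gfortran = parse_summary(FLANG), parse_summary(GFORTRAN)
for name, (d_run, d_fail) in compare(flang, gfortran).items():
    print(f"{name:17} tests run {d_run:+7d}   failures {d_fail:+d}")
```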

Why the Discrepancy? Potential Causes

Several factors could contribute to these discrepancies between Flang-21 and Gfortran-15 when running the LAPACK test suite.

Compiler Optimizations

Compiler optimizations can change the numerical results of floating-point computations. Floating-point arithmetic is not associative, so any transformation that changes the order of operations, or that fuses a multiply and an add into a single fused multiply-add (FMA) instruction, can change the last bits of a result. Note that -O3 applies only to the C sources here (CFLAGS); the Fortran code is built with -O2 by both compilers. At -O2, without value-unsafe options such as -ffast-math, neither compiler should reassociate floating-point arithmetic on its own, but the two can still legitimately differ in choices such as FMA contraction, and the most demanding LAPACK tests can be sensitive to such last-bit differences.
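As a minimal illustration of why evaluation order matters, the Python snippet below sums the same three doubles with two different groupings. It says nothing about what either compiler actually does; it only shows that a change of order alone can move a result by one unit in the last place.

```python
import math

# Same three values, two evaluation orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# math.fsum returns the correctly rounded sum of the exact binary values,
# which here happens to match the right-hand grouping.
print(math.fsum([0.1, 0.2, 0.3]))  # 0.6
```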

Numerical Accuracy and Floating-Point Arithmetic

Numerical accuracy is paramount in linear algebra libraries like LAPACK. The IEEE 754 standard specifies how floating-point numbers are represented and requires basic operations to be correctly rounded, but it does not dictate how a compiler maps a Fortran expression onto those operations. Some compilers and targets evaluate intermediate results in extended precision (historically the 80-bit x87 registers on 32-bit x86), and on hardware with FMA instructions a compiler may compute a*b + c with a single rounding instead of two. Either choice can change the final result.
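The effect of one rounding versus two can be shown without special hardware. The Python sketch below emulates an FMA by computing a*b + c exactly with fractions.Fraction and rounding once; the input values are chosen for illustration, and whether Flang-21 or Gfortran-15 actually emits FMA instructions depends on the target and on their -ffp-contract defaults, which this example does not establish.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    # Two roundings: a*b is rounded to double, then the sum is rounded again.
    return a * b + c


def fused_mul_add(a, b, c):
    # Emulated FMA: a*b + c computed exactly with rationals, rounded once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = b = 1.0 + 2.0**-30  # a*b = 1 + 2**-29 + 2**-60 exactly
c = -(1.0 + 2.0**-29)

print(mul_then_add(a, b, c))   # 0.0: the 2**-60 term is lost when a*b is rounded
print(fused_mul_add(a, b, c))  # 2**-60: the exact result survives a single rounding
```

Python 3.13 and later also provide math.fma, which performs the same single-rounding operation directly.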

Differences in Compiler Implementation

Compiler implementation details also matter. Flang and Gfortran are developed by different teams on different infrastructure: Flang lowers Fortran through its own MLIR-based intermediate representation to LLVM, while Gfortran uses the GCC middle and back end, and each ships its own runtime library. These differences can show up in how loops are transformed, how intrinsic functions are evaluated, and how operations such as complex division are implemented, all of which can influence numerical results.
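Complex division is a concrete case where two reasonable implementations of the same operation disagree, which may be relevant given that two of the failures are in COMPLEX tests. The Python sketch below contrasts the textbook formula with Smith's scaled algorithm; it is an illustration only and makes no claim about which method either compiler or runtime uses.

```python
def naive_div(x, y):
    # Textbook formula: (ac + bd) / (c^2 + d^2) overflows when |y| is large,
    # even if the quotient itself is moderate.
    a, b, c, d = x.real, x.imag, y.real, y.imag
    den = c * c + d * d
    return complex((a * c + b * d) / den, (b * c - a * d) / den)


def smith_div(x, y):
    # Smith's algorithm: divide through by the larger component of y first.
    a, b, c, d = x.real, x.imag, y.real, y.imag
    if abs(c) >= abs(d):
        r = d / c
        den = c + d * r
        return complex((a + b * r) / den, (b - a * r) / den)
    r = c / d
    den = c * r + d
    return complex((a * r + b) / den, (b * r - a) / den)


x = y = complex(1e200, 1e200)
print(naive_div(x, y))  # (nan+nanj): c*c and a*c overflow to inf
print(smith_div(x, y))  # (1+0j)
```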

Library Versions and Dependencies

Library versions and dependencies could also be a factor. LAPACK relies on BLAS (Basic Linear Algebra Subprograms) for low-level computations, so linking the two builds against different BLAS libraries or versions could introduce discrepancies, and a fair comparison requires the same BLAS in both. If BLASLIB points at the reference BLAS built from the LAPACK source tree (as in LAPACK's make.inc.example), that BLAS is compiled by the same Fortran compiler as LAPACK itself, so any BLAS-related difference would again trace back to the compiler rather than to a library version.

Test Suite Sensitivity

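Test suite sensitivity is another aspect to consider. LAPACK's numerical tests compute test ratios, typically residuals or errors scaled by machine epsilon and problem size, and count a test as failed when a ratio reaches a threshold read from the test input file. A ratio sitting just below the threshold with one compiler can cross it with another after a last-bit change, so a failure may reflect a borderline case rather than a genuinely wrong answer. If the failing Flang-21 tests are of this kind, they point to sensitivity in edge cases rather than to a clear compiler bug; confirming this requires looking at the actual ratios reported in the test output.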
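The Python sketch below shows the general shape of such a test ratio for a linear solve: a 1-norm residual scaled by n, the norms of A and x, and machine epsilon, compared against a threshold. The exact normalization differs between LAPACK test routines, and both the threshold of 30.0 and the 2x2 system are assumptions chosen for illustration.

```python
import sys

# Assumptions for illustration: IEEE double epsilon and a threshold of 30.0.
# The real tests read their threshold from the input file, and LAPACK's
# DLAMCH('E') typically returns half of sys.float_info.epsilon.
EPS = sys.float_info.epsilon
THRESH = 30.0


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def norm1(a):
    # Matrix 1-norm: largest absolute column sum.
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def residual_ratio(a, x, b):
    """Scaled residual ||b - A x||_1 / (n * ||A||_1 * ||x||_1 * eps)."""
    n = len(a)
    r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
    return sum(map(abs, r)) / (n * norm1(a) * sum(map(abs, x)) * EPS)


# Hypothetical 2x2 system, solved with Cramer's rule.
a = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
x = [(b[0] * a[1][1] - a[0][1] * b[1]) / det, (a[0][0] * b[1] - a[1][0] * b[0]) / det]

good = residual_ratio(a, x, b)
bad = residual_ratio(a, [x[0] * (1 + 1e-10), x[1]], b)  # slightly perturbed solution
print(f"good: {good:.2f} (pass={good < THRESH})  bad: {bad:.3g} (pass={bad < THRESH})")
```

An accurate solution yields a ratio of order one, far below the threshold; only a result that is wrong by many units in the last place, or a test case that is already borderline, pushes the ratio past it.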

Investigating the Issue

To pinpoint the exact cause of the discrepancy, a systematic investigation is necessary. Here’s a potential approach:

  1. Isolate the Failing Tests: Identify the specific LAPACK tests that are failing in Flang-21. This will help narrow down the scope of the investigation.
  2. Examine the Input Data: Analyze the input data used in the failing tests to see if there are any patterns or characteristics that might be causing issues.
  3. Reduce Optimization Levels: Rebuild with a lower optimization level (for example, FFLAGS = -O1 or -O0 instead of -O2) and rerun the failing tests. If the failures disappear, an optimization is the likely culprit; if they persist, the cause lies elsewhere.
  4. Compare Generated Code: If possible, compare the assembly code generated by Flang-21 and Gfortran-15 for the failing tests. This can reveal differences in how the compilers are handling the computations.
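  5. Check Floating-Point Settings: Make sure both builds use comparable floating-point options. One way to test whether FMA contraction is involved is to disable it with -ffp-contract=off, which Gfortran accepts (check that your Flang version accepts it as well). Gfortran's -ffloat-store mainly matters on x87 targets, where it keeps excess precision in registers from leaking into stored variables.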
  6. Update Compilers and Libraries: Use the latest versions of Flang, Gfortran, LAPACK, and BLAS to rule out any bugs or issues that have been fixed in newer releases.
  7. Consult Upstream: Before making any drastic changes, check with the development teams of Flang and LAPACK to see if they are aware of any known issues or recommended workarounds.
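Step 1 can be partly automated by scanning the test output files. The Python sketch below is a hypothetical helper that collects lines from *.out files under a directory that contain a given keyword; it assumes that failing cases are reported on lines mentioning "fail", which should be checked against the .out files produced by your LAPACK version.

```python
import pathlib


def find_failures(testing_dir, keywords=("fail",)):
    """Return (file name, line) pairs for .out lines containing any keyword.

    Assumption: failing cases are reported on lines that mention "fail".
    Keywords must be lowercase; lines are matched case-insensitively.
    Extend `keywords` if other messages need to be caught.
    """
    hits = []
    for path in sorted(pathlib.Path(testing_dir).rglob("*.out")):
        for line in path.read_text(errors="replace").splitlines():
            if any(k in line.lower() for k in keywords):
                hits.append((path.name, line.strip()))
    return hits
```

For example, running `for name, line in find_failures("TESTING"): print(name, line)` from the top-level LAPACK directory lists each matching line together with the file it came from, which identifies the test program and input file to rerun in isolation.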

Practical Implications

While a few failing tests out of millions might seem insignificant, they can have practical implications in real-world applications. In scientific computing and engineering, where precision is critical, even small numerical errors can propagate and lead to incorrect results. Therefore, it’s essential to understand the limitations of the tools being used and to validate the results against known benchmarks.

Conclusion

The discrepancy between Flang-21 and Gfortran-15 in the LAPACK test suite highlights the complexities of compiler optimization and numerical accuracy. Gfortran-15's clean result inspires confidence, while Flang-21's four failures are a reminder that careful validation and testing are still needed. Investigating these failures systematically, including the steps suggested above, will show whether this small set of errors reflects a systemic problem with certain classes of problems or an isolated anomaly, and will give developers insight into how their compilers behave. Understanding these differences is crucial for developers and researchers who rely on these tools, and addressing them helps improve the reliability and accuracy of scientific computing.

For more information on LAPACK and its testing procedures, visit the Netlib LAPACK page. This site provides comprehensive documentation, software, and resources for linear algebra computations.