Habakkuk

Fortran code analysis for performance prediction

Download as .zip Download as .tar.gz View on GitHub

habakkuk

Introduction

Habakkuk is a tool that performs analysis of Fortran code fragments for the purposes of performance prediction. Habakkuk is built-around the Fortran parser developed in the f2py project. It parses Fortran source code and uses the resulting Abstract Syntax Tree to compute various metrics and construct a Directed Acyclic Graph showing data-flow through the code.

For example, processing the following Fortran code fragment:

Example DAG

produces the following graph:

Example DAG

Getting started

You will need the 'git' revision control system installed. You can then clone the repository to your local machine:

git clone https://github.com/arporter/habakkuk.git

You will then need to set your python path like so:

cd habakkuk
export PYTHONPATH=${PWD}/f2py_93:${PWD}/src

Having done this you should be all set to try the tool on some Fortran code. There are various examples in src/tests/test_files. The tool may be run like so:

cd habakkuk/src
python ./make_dag.py tests/test_files/triple_product.f90

You should then see output similar to the following:

Wrote DAG to test_triple_product.gv
Stats for DAG test_triple_product:
  0 addition operators.
  0 subtraction operators.
  2 multiplication operators.
  0 division operators.
  0 fused multiply-adds.
  2 FLOPs in total.
  0 array references.
  0 distinct cache-line references.
  Did not find any array/memory references
  Whole DAG in serial:
    Sum of cost of all nodes = 2 (cycles)
    2 FLOPs in 2 cycles => 1.0000*CLOCK_SPEED FLOPS
  Everything in parallel to Critical path:
    Critical path contains 4 nodes, 2 FLOPs and is 2 cycles long
    FLOPS (ignoring memory accesses) = 1.0000*CLOCK_SPEED
Wrote DAG to test_triple_product_step0.gv
Wrote DAG to test_triple_product_step1.gv
Wrote DAG to test_triple_product_step2.gv
Schedule contains 2 steps:
0 * None (cost = 1)
1 * None (cost = 1)
  Estimate using computed schedule:
    Cost of schedule as a whole = 2 cycles
    FLOPS from schedule (ignoring memory accesses) = 1.0000*CLOCK_SPEED
  Estimate using perfect schedule:
    Cost if all ops on different execution ports are perfectly overlapped = 2 cycles
  e.g. at 3.85 GHz, these different estimates give (GFLOPS): 
  No ILP  |  Computed Schedule  |  Perfect Schedule | Critical path
   3.85   |          3.85       |        3.85       |    3.85
No opportunities to fuse multiply-adds

The tool produces a Directed Acyclic Graph (DAG) for the body of the inner-most loop of every loop-nest it encounters. If a routine (or main program unit) contains no loops then a DAG is generated for the executable part of that routine. Each DAG is written to file in the dot language (e.g. test_triple_product.gv in the above example). If you have dot installed (part of the graphviz package) then you can process these files to produce an image of the DAG, e.g.:

cat triple_product_test.gv | dot -Tpng > triple.png