Abstract:
Numerical models are used extensively for simulating complex physical
systems including fluid flows, astronomical events, weather, and
climate.  Many researchers struggle to bring their model developments
from single-computer, interpreted languages to parallel high-performance
computing (HPC) systems.  There are initiatives to make interpreted
languages such as MATLAB, Python, and Julia feasible for HPC
programming.  In this talk I argue that the computational overhead
is far costlier than any potential development time saved.  Instead,
doing model development in C and unix tools from the start minimizes
porting headaches between platforms, reduces energy use on all
systems, and ensures reproducibility of results.


## brcon2020 - 2020-05-02

       title: Energy efficient programming in science

      author: Anders Damsgaard (adc)

     contact: [email protected]
              gopher://adamsgaard.dk
              https://adamsgaard.dk


## About me

* 33 y/o Dane
* #bitreich-en since 2019-12-16

present:

* postdoctoral scholar at Stanford University (US)
* lecturer at Aarhus University (DK)

previous:

* Danish Environmental Protection Agency (DK)
* Scripps Institution of Oceanography (US)
* National Oceanic and Atmospheric Administration (NOAA, US)
* Princeton University (US)

#pause
academic interests:

* ice sheets, glaciers, and climate
* earthquake and landslide physics
* modeling of fluid flows and granular materials


## Numerical modeling

* numerical models used for simulating complex physical systems

 * n-body simulations: planetary formation, icebergs, soil/rock mechanics

 * fluid flows (CFD): aerodynamics, weather, climate


* domains and physical processes split up into small, manageable chunks


## From idea to application


   1. Construct system of equations

     |
     v

   2. Derivation of numerical algorithm

     |
     v

   3. Prototype in high-level language

     |
     v

   4. Re-implementation in low-level language


## From idea to application

,-----------------------------------------------.
|  1. Construct system of equations             |
|                                               |
|    |                                          |
|    v                                          |          _
|                                               |     ___ | | __
|  2. Derivation of numerical algorithm         |    / _ \| |/ /
|                                               |   | (_) |   <
|    |                                          |    \___/|_|\_\
|    v                                          |
|                                               |
|  3. Prototype in high-level language          |
`-----------------------------------------------'
     |                                              _       _
     v                                             | | ___ | | __
                                                   | |/ _ \| |/ /
   4. Re-implementation in low-level language      |_| (_) |   <
                                                   (_)\___/|_|\_\


## Numerical modeling

     task: Solve partial differential equations (PDEs) by stepping through time
           PDEs: conservation laws; mass, momentum, enthalpy

  example: Heat diffusion through homogenous medium

           ∂T
           -- = -k ∇²(T)
           ∂t

   domain:

      .---------------------------------------------------------------------.
      |                                                                     |
      |                                  T                                  |
      |                                                                     |
      '---------------------------------------------------------------------'

## Numerical modeling

   domain: discritize into n=7 cells

      .---------+---------+---------+---------+---------+---------+---------.
      |         |         |         |         |         |         |         |
      |    T₁   |    T₂   |    T₃   |    T₄   |    T₅   |    T₆   |    T₇   |
      |         |         |         |         |         |         |         |
      '---------+---------+---------+---------+---------+---------+---------'

#pause
* Numerical solution with high-level programming:

   MATLAB: sol = pdepe(0, @heat_pde, @heat_initial, @heat_bc, x, t)

   Python: fenics.solve(lhs==rhs, heat_pde, heat_bc)

    Julia: sol = solve(heat_pde, CVODE_BPF(linear_solver=:Diagonal); rel_tol, abs_tol)

       (the above are not entirely equivalent, but you get the point...)


## Numerical solution: Low-level programming

   example BC: outer boundaries constant temperature (T₁ & T₇)

* computing ∇²(T)

      .---------+---------+---------+---------+---------+---------+---------.
      |         |         |         |         |         |         |         |
 t    |    T₁   |    T₂   |    T₃   |    T₄   |    T₅   |    T₆   |    T₇   |
      |         |         |         |         |         |         |         |
      '----|--\-+----|--\-+-/--|--\-+-/--|--\-+-/--|--\-+-/--|----+-/--|----'
           |   \     |   \ /   |   \ /   |   \ /   |   \ /   |     /   |
           |    \    |    /    |    /    |    /    |    /    |    /    |
           |     \   |   / \   |   / \   |   / \   |   / \   |   /     |
      .----|----+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+----|----.
      |         |         |         |         |         |         |         |
t + dt |    T₁   |    T₂   |    T₃   |    T₄   |    T₅   |    T₆   |    T₇   |
      |         |         |         |         |         |         |         |
      '---------+---------+---------+---------+---------+---------+---------'
      |<- dx  ->|


## Numerical solution: Low-level programming

* explicit solution with central finite differences:

       for (t=0.0; t<t_end; t+=dt) {
           for (i=1; i<n-1; i++)
               T_new[i] = T[i] - k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt;
           tmp = T;
           T = T_new;
           T_new = tmp;
       }
#pause

* implicit, iterative solution with central finite differences:

       for (t=0.0; t<t_end; t+=dt) {
           do {
               for (i=1; i<n-1; i++) {
                   T_new[i] = T[i] - k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt;
               r_norm_max = 0.0;
               for (i=1; i<n-1; i++)
                   if (fabs((T_new[i] - T[i])/T[i]) > r_norm_max)
                       r_norm_max = fabs((T_new[i] - T[i])/T[i]);
               tmp = T;
               T = T_new;
               T_new = tmp;
           } while (r_norm_max < RTOL);
       }


## HPC platforms

* Stagnation of CPU clock frequency

* Performance through massively parallel deployment (MPI, GPGPU)

   * NOAA/DOE NCRC Gaea cluster
       * 2x Cray XC40, "Cray Linux Environment"
       * 4160 nodes, each 32 to 36 cores, 64 GB memory
       * infiniband
       * total: 200 TB memory, 32 PB SSD, 5.25 petaflops (peak)

## A (non-)solution

* high-level, interpreted code with extensive solver library -> low-level, compiled, parallel code

* suggested workaround: port interpreted high-level languages to HPC platforms

#pause

NO!

* high computational overhead
* many machines
* reduced performance and energy efficiency


## A better way

   1. Construct system of equations

     |
     v

   2. Derivation of numerical algorithm

     |
     v

   3. Prototype in low-level language

     |
     v

   4. Add parallelization for HPC


## Example: Ice-sheet flow with sediment/fluid modeling


   --------------------------._____                   ATMOSPHERE
               ----->              ```--..
       ICE                                 `-._________________      __
               ----->                             ------>      |vvvv|  |vvv
                                              _________________|    |__|
               ----->                      ,'
                                         ,'    <><      OCEAN
               ---->                    /                        ><>
   ____________________________________/___________________________________
     SEDIMENT  -->
   ________________________________________________________________________

* example: granular dynamics and fluid flow simulation for glacier flow

* 90% of Antarctic ice sheet mass driven by ice flow over sediment

* need to understand ice-basal sliding in order to project sea-level rise


## Algorithm matters

               sphere: git://src.adamsgaard.dk/sphere
                       C++, Nvidia C, cmake, Python, Paraview
                       massively parallel, GPGPU
                       detailed physics
                       20,191 LOC
#pause
                       3 month computing time on nvidia tesla k40 (2880 cores)

#pause
* gained understanding of the mechanics (what matters and what doesn't)
* simplify the physics, algorithm, and numerics

#pause
   1d_fd_simple_shear: git://src.adamsgaard.dk/1d_fd_simple_shear
                       C99, makefiles, gnuplot
                       single threaded
                       simple physics
                       2,348 LOC
#pause
                       real: 0m00.07 s on laptop from 2012

#pause
                       ...guess which one is more portable?

## Summary

for numerical simulation:

* high-level languages
   * easy
   * produces results quickly
   * does not develop low-level programming skills
   * no insight into numerical algorithm
   * realistically speaking: no direct way to HPC

* low-level languages
   * require low-level skills
   * saves electrical energy
   * directly to HPC, just sprinkle some MPI on top


## Thanks

   20h && /names #bitreich-en