Abstract:
Numerical models are used extensively for simulating complex physical
systems, including fluid flows, astronomical events, weather, and
climate. Many researchers struggle to bring their model developments
from single-computer, interpreted languages to parallel high-performance
computing (HPC) systems. There are initiatives to make interpreted
languages such as MATLAB, Python, and Julia feasible for HPC
programming. In this talk I argue that their computational overhead
costs far more than any development time they save. Instead,
developing models in C and Unix tools from the start minimizes
porting headaches between platforms, reduces energy use on all
systems, and ensures reproducibility of results.
## brcon2020 - 2020-05-02
title: Energy efficient programming in science
author: Anders Damsgaard (adc)
contact:
[email protected]
gopher://adamsgaard.dk
https://adamsgaard.dk
## About me
* 33 y/o Dane
* #bitreich-en since 2019-12-16
present:
* postdoctoral scholar at Stanford University (US)
* lecturer at Aarhus University (DK)
previous:
* Danish Environmental Protection Agency (DK)
* Scripps Institution of Oceanography (US)
* National Oceanic and Atmospheric Administration (NOAA, US)
* Princeton University (US)
#pause
academic interests:
* ice sheets, glaciers, and climate
* earthquake and landslide physics
* modeling of fluid flows and granular materials
## Numerical modeling
* numerical models used for simulating complex physical systems
* n-body simulations: planetary formation, icebergs, soil/rock mechanics
* fluid flows (CFD): aerodynamics, weather, climate
* domains and physical processes split up into small, manageable chunks
## From idea to application
1. Construct system of equations
|
v
2. Derivation of numerical algorithm
|
v
3. Prototype in high-level language
|
v
4. Re-implementation in low-level language
## From idea to application
,-----------------------------------------------.
| 1. Construct system of equations |
| |
| | |
| v | _
| | ___ | | __
| 2. Derivation of numerical algorithm | / _ \| |/ /
| | | (_) | <
| | | \___/|_|\_\
| v |
| |
| 3. Prototype in high-level language |
`-----------------------------------------------'
| _ _
v | | ___ | | __
| |/ _ \| |/ /
4. Re-implementation in low-level language |_| (_) | <
(_)\___/|_|\_\
## Numerical modeling
task: Solve partial differential equations (PDEs) by stepping through time
PDEs: conservation laws; mass, momentum, enthalpy
example: Heat diffusion through a homogeneous medium
∂T
-- = k ∇²(T)
∂t
domain:
.---------------------------------------------------------------------.
| |
| T |
| |
'---------------------------------------------------------------------'
## Numerical modeling
domain: discretize into n=7 cells
.---------+---------+---------+---------+---------+---------+---------.
| | | | | | | |
| T₁ | T₂ | T₃ | T₄ | T₅ | T₆ | T₇ |
| | | | | | | |
'---------+---------+---------+---------+---------+---------+---------'
#pause
* Numerical solution with high-level programming:
MATLAB: sol = pdepe(0, @heat_pde, @heat_initial, @heat_bc, x, t)
Python: fenics.solve(lhs==rhs, heat_pde, heat_bc)
Julia: sol = solve(heat_pde, CVODE_BPF(linear_solver=:Diagonal); rel_tol, abs_tol)
(the above are not entirely equivalent, but you get the point...)
## Numerical solution: Low-level programming
example BC: constant temperature at the outer boundaries (T₁ & T₇)
* computing ∇²(T)
.---------+---------+---------+---------+---------+---------+---------.
| | | | | | | |
t | T₁ | T₂ | T₃ | T₄ | T₅ | T₆ | T₇ |
| | | | | | | |
'----|--\-+----|--\-+-/--|--\-+-/--|--\-+-/--|--\-+-/--|----+-/--|----'
| \ | \ / | \ / | \ / | \ / | / |
| \ | / | / | / | / | / |
| \ | / \ | / \ | / \ | / \ | / |
.----|----+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+----|----.
| | | | | | | |
t + dt | T₁ | T₂ | T₃ | T₄ | T₅ | T₆ | T₇ |
| | | | | | | |
'---------+---------+---------+---------+---------+---------+---------'
|<- dx ->|
## Numerical solution: Low-level programming
* explicit solution with central finite differences:
for (t=0.0; t<t_end; t+=dt) {
	for (i=1; i<n-1; i++)
		T_new[i] = T[i] + k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt;
	tmp = T;
	T = T_new;
	T_new = tmp;
}
#pause
* implicit, iterative solution with central finite differences:
s = k*dt/(dx*dx);
for (t=0.0; t<t_end; t+=dt) {
	do {	/* Gauss-Seidel iterations for backward Euler */
		r_norm_max = 0.0;
		for (i=1; i<n-1; i++) {
			T_prev = T_new[i];
			T_new[i] = (T[i] + s*(T_new[i+1] + T_new[i-1]))
			           /(1.0 + 2.0*s);
			if (fabs((T_new[i] - T_prev)/T_new[i]) > r_norm_max)
				r_norm_max = fabs((T_new[i] - T_prev)/T_new[i]);
		}
	} while (r_norm_max > RTOL);
	tmp = T;
	T = T_new;
	T_new = tmp;
}
## HPC platforms
* Stagnation of CPU clock frequency
* Performance through massively parallel deployment (MPI, GPGPU)
* NOAA/DOE NCRC Gaea cluster
* 2x Cray XC40, "Cray Linux Environment"
* 4160 nodes, each 32 to 36 cores, 64 GB memory
* InfiniBand
* total: 200 TB memory, 32 PB SSD, 5.25 petaflops (peak)
## A (non-)solution
* the gap: high-level, interpreted code with extensive solver libraries -> low-level, compiled, parallel code
* suggested workaround: port interpreted high-level languages to HPC platforms
#pause
NO!
* high computational overhead
* overhead multiplied across many machines
* reduced performance and energy efficiency
## A better way
1. Construct system of equations
|
v
2. Derivation of numerical algorithm
|
v
3. Prototype in low-level language
|
v
4. Add parallelization for HPC
## Example: Ice-sheet flow with sediment/fluid modeling
--------------------------._____ ATMOSPHERE
-----> ```--..
ICE `-._________________ __
-----> ------> |vvvv| |vvv
_________________| |__|
-----> ,'
,' <>< OCEAN
----> / ><>
____________________________________/___________________________________
SEDIMENT -->
________________________________________________________________________
* example: granular dynamics and fluid flow simulation for glacier flow
* 90% of Antarctic ice-sheet mass transport is driven by ice flow over sediment
* need to understand ice-basal sliding in order to project sea-level rise
## Algorithm matters
sphere: git://src.adamsgaard.dk/sphere
C++, CUDA C, CMake, Python, ParaView
massively parallel, GPGPU
detailed physics
20,191 LOC
#pause
3 months of computing time on an Nvidia Tesla K40 (2880 CUDA cores)
#pause
* gained understanding of the mechanics (what matters and what doesn't)
* simplify the physics, algorithm, and numerics
#pause
1d_fd_simple_shear: git://src.adamsgaard.dk/1d_fd_simple_shear
C99, makefiles, gnuplot
single threaded
simple physics
2,348 LOC
#pause
real: 0m0.07s on a laptop from 2012
#pause
...guess which one is more portable?
## Summary
for numerical simulation:
* high-level languages
	* easy
	* produce results quickly
	* do not develop low-level programming skills
	* give no insight into the numerical algorithm
	* realistically speaking: no direct way to HPC
* low-level languages
	* require low-level skills
	* save electrical energy
	* go directly to HPC: just sprinkle some MPI on top
## Thanks
20h && /names #bitreich-en