Add more content - brcon2020_adc - my presentation for brcon2020 | |
git clone git://src.adamsgaard.dk/.brcon2020_adc | |
--- | |
commit 0a73d2e077f88526c454d3157a496bbdd627e4bc | |
parent 0440b2990984378674500cd0872456946751efef | |
Author: Anders Damsgaard | |
Date: Mon, 27 Apr 2020 22:16:48 +0200 | |
Add more content | |
Diffstat: | |
M brcon2020_adc.md | 181 +++++++++++++++++++----------… | |
1 file changed, 109 insertions(+), 72 deletions(-) | |
--- | |
diff --git a/brcon2020_adc.md b/brcon2020_adc.md | |
@@ -25,46 +25,37 @@ systems, and ensures reproducibility of results. | |
## About me | |
+* 33 y/o Dane | |
+* #bitreich-en since 2019-12-16 | |
+ | |
Present: | |
-* 33 y/o Dane, linux/bsd user since 2001 | |
-* #bitreich-en since 2019-12-16 | |
-* EDITOR=vi | |
* Postdoctoral scholar at Stanford University (US) | |
* Lecturer at Aarhus University (DK) | |
-#- | |
- | |
+#pause | |
Previous: | |
+* Danish Environmental Protection Agency (DK) | |
* Scripps Institution of Oceanography (US) | |
* National Oceanic and Atmospheric Administration (NOAA, US) | |
* Princeton University (US) | |
-#- | |
- | |
+#pause | |
Academic interests: | |
* ice sheets, glaciers, and climate | |
-* earthquake physics and landslides | |
+* earthquake and landslide physics | |
* modeling of fluid flows and granular materials | |
+ | |
## Numerical modeling | |
-* models used for complex physical systems (fluid flows, astronomical events, … | |
-* domains and physical processes split up into small, manageable chunks | |
+* numerical models used for simulating complex physical systems | |
+ * n-body simulations: granular materials, gravitational interaction | |
+ * fluid flows: CFD, weather, climate | |
-Numerical models are used extensively for simulating complex physical | |
-systems including fluid flows, astronomical events, weather, and | |
-climate. Many researchers struggle to bring their model developments | |
-from single-computer, interpreted languages to parallel high-performance | |
-computing (HPC) systems. There are initiatives to make interpreted | |
-languages such as MATLAB, Python, and Julia feasible for HPC | |
-programming. In this talk I argue that the computational overhead | |
-is far costlier than any potential development time saved. Instead, | |
-doing model development in C and unix tools from the start minimizes | |
-porting headaches between platforms, reduces energy use on all | |
-systems, and ensures reproducibility of results. | |
+* domains and physical processes split up into small, manageable chunks | |
## Numerical modeling | |
@@ -77,7 +68,8 @@ systems, and ensures reproducibility of results. | |
∂T | |
-- = -k ∇² T | |
∂t | |
-#- | |
+#pause | |
+ | |
domain: | |
.---------------------------------------------------------------------. | |
@@ -87,7 +79,7 @@ systems, and ensures reproducibility of results. | |
'---------------------------------------------------------------------' | |
-## Numerical solution | |
+## Numerical modeling | |
task: Solve partial differential equations (PDEs) by stepping through ti… | |
PDEs: conservation laws; mass, momentum, enthalpy | |
@@ -105,8 +97,9 @@ systems, and ensures reproducibility of results. | |
| T₁ | T₂ | T₃ | T₄ | T₅ | T₆ … | |
| | | | | | | | | |
'---------+---------+---------+---------+---------+---------+---------' | |
-#- | |
- MATLAB: sol = pdepe(0, @heat_pde, @heat_initial, @heat_bc, x, t); | |
+ | |
+#pause | |
+ MATLAB: sol = pdepe(0, @heat_pde, @heat_initial, @heat_bc, x, t) | |
Python: fenics.solve(lhs==rhs, heat_pde, heat_bc) | |
@@ -147,56 +140,43 @@ systems, and ensures reproducibility of results. | |
## Numerical solution (finite differences) | |
- .---------+---------+---------+---------+---------+---------+---------. | |
- | | | | | | | | | |
- t | T₁ | T₂ | T₃ | T₄ | T₅ | T₆ … | |
- | | | | | | | | | |
- '----|--\-+----|--\-+-/--|--\-+-/--|--\-+-/--|--\-+-/--|----+-/--|----' | |
- | \ | \ / | \ / | \ / | \ / | / | | |
- | \ | / | / | / | / | / | | |
- | \ | / \ | / \ | / \ | / \ | / | | |
- .----|----+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+----|----. | |
- | | | | | | | | | |
- t+dt | T₁ | T₂ | T₃ | T₄ | T₅ | T₆ … | |
- | | | | | | | | | |
- '---------+---------+---------+---------+---------+---------+---------' | |
- | |
explicit solution with central finite differences: | |
for (t=0.0; t<t_end; t+=dt) { | |
for (i=1; i<n-1; i++) | |
T_new[i] = T[i] - k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt; | |
+ tmp = T; | |
+ T = T_new; | |
+ T_new = tmp; | |
} | |
iterative Jacobi solution with central finite differences: | |
for (t=0.0; t<t_end; t+=dt) { | |
- for (i=1; i<n-1; i++) | |
- T_old[i] = T[i]; | |
do { | |
for (i=1; i<n-1; i++) { | |
- dT[i] = -k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt; | |
- T_new[i] = T_old[i] + dT[i]; | |
+ T_new[i] = -k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt; | |
+ r_norm_max = 0.0; | |
+ for (i=1; i<n-1; i++) | |
+ if (fabs((T_new[i] - T[i])/T[i]) > r_norm_max) | |
+ r_norm_max = fabs((T_new[i] - T[i])/T[i]); | |
+ tmp = T; | |
+ T = T_new; | |
+ T_new = tmp; | |
} | |
} while (r_norm_max > RTOL); | |
} | |
-## HPC platforms | |
- | |
-* Free lunch is over | |
-* Parallelization is key | |
- | |
- | |
## From idea to application | |
- 1. Conceptualization | |
+ 1. Construct system of equations | |
| | |
v | |
- 2. Derivation of mathematical formulation | |
+ 2. Derivation of numerical algorithm | |
| | |
v | |
@@ -212,12 +192,12 @@ systems, and ensures reproducibility of results. | |
## From idea to application | |
,-----------------------------------------------. | |
- | 1. Conceptualization | | |
+ | 1. Construct system of equations | | |
| | | |
| | | | |
| v | _ | |
| | ___ | | __ | |
- | 2. Derivation of mathematical formulation | / _ \| |/ / | |
+ | 2. Derivation of numerical algorithm | / _ \| |/ / | |
| | | (_) | < | |
| | | \___/|_|\_\ | |
| v | | |
@@ -231,25 +211,39 @@ systems, and ensures reproducibility of results. | |
(_)\___/|_|\_\ | |
-# Our scientific training includes learning how to make an solid idea, | |
-# translate said idea into a set of equations, and how to implement it | |
-# in high-level programming languages | |
+% Our scientific training includes learning how to form a solid idea, | |
+% translate said idea into a set of equations, and how to implement it | |
+% in high-level programming languages | |
+ | |
+% using high-level languages: | |
+% - quick development == quick results | |
+% - lose touch with numerical workings | |
+% - develop non-transferrable skills | |
+% - code not transferrable between platforms | |
+% - use of loop structures discouraged, library calls encouraged | |
+ | |
+% using low-level languages: | |
+% - slower development == delayed results | |
+% - gain intimate familiarity with numerical workings | |
+% - develop transferrable code and skills | |
+% - high computational performance when done right | |
-# using high-level languages: | |
-# - quick development == quick results | |
-# - loose touch with numerical workings | |
-# - develop non-transferrable skills | |
-# - code not transferrable between platforms | |
-# - use of loop structures discouraged, library calls encouraged | |
+% 4. apply the new algorithm to HPC | |
-# using high-level languages: | |
-# - slower development == delayed results | |
-# - gain intimate familiarity with numerical workings | |
-# - develop transferrable code and skills | |
-# - high computational performance when done right | |
+% requires basic C programming, usually no syscalls besides file IO | |
-# requires basic C programming, usually no syscalls besides file IO | |
+## HPC platforms | |
+ | |
+* Stagnation of CPU clock frequency | |
+ | |
+* Performance through massively parallel deployment (MPI, GPGPU) | |
+ | |
+ * NOAA/NCRC Gaea cluster | |
+ * 2x Cray XC40, "Cray Linux Environment" | |
+ * 4160 nodes, each 32 to 36 cores, 64 GB memory | |
+ * infiniband | |
+ * total: 200 TB memory, 32 PB SSD, 5.25 petaflops (peak) | |
## Scaling problem | |
@@ -258,19 +252,62 @@ New algorithms hard to implement in HPC codes | |
## A (non-)solution | |
-Port/apply high-level languages to HPC platforms | |
+* Suggested workaround: Apply interpreted high-level languages to HPC platforms | |
-high overhead on many machines -> substantially lower performance and energy e… | |
+#pause | |
+ | |
+NO! | |
+ | |
+* high computational overhead | |
+* many machines | |
+* reduced performance and energy efficiency | |
## Measuring computational energy use | |
+## Algorithm matters | |
+ | |
+* example: granular dynamics and fluid flow simulation for glacier flow | |
+ | |
+ sphere: git://src.adamsgaard.dk/sphere | |
+ C++, Nvidia C, cmake, Python, Paraview | |
+ massively parallel, GPGPU | |
+ detailed physics | |
+#pause | |
+ 3 month computing time on nvidia tesla k40 (2880 cores) | |
+ | |
+#pause | |
+* gained understanding of the mechanics (what matters and what doesn't) | |
+* simplify the physics, algorithm, and numerics | |
+ | |
+#pause | |
+ 1d_fd_simple_shear: git://src.adamsgaard.dk/1d_fd_simple_shear | |
+ C99, makefiles, gnuplot | |
+ single threaded | |
+ simple physics | |
+#pause | |
+ real: 0m00.07 s on potato laptop from 2012 | |
+ | |
+#pause | |
+ ...guess which one is portable? | |
+ | |
## Summary | |
-* Programming in low-level languages during prototyping can save energy and fr… | |
+for numerical simulation: | |
+ | |
+* high-level languages | |
+ * easy | |
+ * produces results quickly | |
+ * no insight into numerical algorithm | |
+ * no direct way to HPC | |
+ | |
+* low-level languages | |
+ * requires low-level skills | |
+ * saves electrical energy | |
+ * directly to HPC | |
## Thanks | |
- 20h & Freenode/#bitreich-en | |
+ 20h && /names #bitreich-en |
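
The explicit finite-difference loop on the "Numerical solution (finite differences)" slide can be written as standalone C99 roughly as follows. This is only a sketch under stated assumptions: it uses the conventional heat equation ∂T/∂t = k ∇²T with k > 0 (the opposite sign from the formula on the slide), it holds the two boundary cells fixed, and it counts time steps with an integer instead of accumulating `t += dt`. The function names `heat_step` and `heat_solve` are hypothetical and do not come from the presentation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * One explicit step (forward Euler in time, central differences in space)
 * of dT/dt = k * d2T/dx2 on n cells.
 * Assumption: cells 0 and n-1 are fixed-value boundaries and are copied
 * unchanged from T to T_new.
 */
void
heat_step(const double *T, double *T_new, size_t n,
          double k, double dx, double dt)
{
	size_t i;
	double r = k * dt / (dx * dx);

	assert(n >= 3);
	T_new[0] = T[0];
	T_new[n - 1] = T[n - 1];
	for (i = 1; i < n - 1; i++)
		T_new[i] = T[i] + r * (T[i + 1] - 2.0 * T[i] + T[i - 1]);
}

/*
 * Advance nsteps time steps. The two buffers are swapped after every step,
 * as on the slide, so no values are copied. Returns a pointer to whichever
 * buffer holds the final state.
 */
double *
heat_solve(double *T, double *T_new, size_t n,
           double k, double dx, double dt, size_t nsteps)
{
	size_t s;
	double *tmp;

	for (s = 0; s < nsteps; s++) {
		heat_step(T, T_new, n, k, dx, dt);
		tmp = T;
		T = T_new;
		T_new = tmp;
	}
	return T;
}
```

A caller allocates two arrays of length n, fills the first with the initial temperatures, and reads the result through the returned pointer. The explicit scheme is only stable for k*dt/(dx*dx) <= 0.5, so dt has to shrink with the square of the cell size.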