Further work on converting latex doc to rst format - sphere - GPU-based 3D dis… | |
git clone git://src.adamsgaard.dk/sphere | |
--- | |
commit 99ea811d3941526f75be48dac1cdf053d41ee933 | |
parent 780f5a4ad608633cea970558ab0a2fad0f749f87 | |
Author: Anders Damsgaard Christensen | |
Date: Tue, 4 Dec 2012 21:59:47 +0100 | |
Further work on converting latex doc to rst format | |
Diffstat: | |
M doc/sphinx/Makefile | 5 ++++- | |
A doc/sphinx/dem.rst | 6 ++++++ | |
M doc/sphinx/index.rst | 309 +++++++++++++++++++++++++++++… | |
A doc/sphinx/introduction.rst | 14 ++++++++++++++ | |
4 files changed, 331 insertions(+), 3 deletions(-) | |
--- | |
diff --git a/doc/sphinx/Makefile b/doc/sphinx/Makefile | |
@@ -16,7 +16,7 @@ I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . | |
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp… | |
-default: html | |
+default: doxygen-xml html | |
help: | |
@echo "Please use \`make <target>' where <target> is one of" | |
@@ -43,6 +43,9 @@ help: | |
clean: | |
-rm -rf $(BUILDDIR)/* | |
+doxygen-xml: ../doxygen/Makefile ../doxygen/Doxyfile | |
+ $(MAKE) -C ../doxygen/ | |
+ | |
html: | |
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html | |
@echo | |
diff --git a/doc/sphinx/dem.rst b/doc/sphinx/dem.rst | |
@@ -0,0 +1,6 @@ | |
+Discrete element method | |
+======================= | |
+The discrete element method (or distinct element method) was initially formula… | |
+ | |
+ | |
+ | |
diff --git a/doc/sphinx/index.rst b/doc/sphinx/index.rst | |
@@ -18,12 +18,317 @@ Contents: | |
:maxdepth: 2 | |
introduction | |
+ dem | |
python_api | |
cpp | |
-Introduction | |
-============ | |
+ | |
+sphere work flow | |
+================ | |
+After compiling the \texttt{SPHERE} binary (see sub-section \ref{subsec:compil… | |
+ \item Setup of particle assemblage, physical properties and conditions… | |
+ \item Execution of \texttt{SPHERE} software, which simulates the parti… | |
+ \item Inspection, analysis, interpretation and visualization of \textt… | |
+ | |
+\subsection{The \texttt{SPHERE} algorithm} | |
+\label{subsec:spherealgo} | |
+The \texttt{SPHERE}-binary is launched from the system terminal by passing the… | |
+#. System check, including search for NVIDIA CUDA compatible devices (\texttt{… | |
+ | |
+#. Initial data import from binary input file (\texttt{main.cpp}). | |
+ | |
+#. Allocation of memory for all host variables (particles, grid, walls, etc.) … | |
+ | |
+#. Continued import from binary input file (\texttt{main.cpp}). | |
+ | |
+#. Control handed to GPU-specific function \texttt{gpuMain(\ldots)} (\texttt{d… | |
+ | |
+#. Memory allocation of device memory (\texttt{device.cu}). | |
+ | |
+#. Transfer of data from host to device variables (\texttt{device.cu}). | |
+ | |
+#. Initialization of Thrust\footnote{\url{https://code.google.com/p/thrust/}} … | |
+ | |
+#. Calculation of GPU workload configuration (thread and block layout) (\textt… | |
+ | |
+#. Status and data written to \verb"<simulation_ID>.status.dat" and \verb"<sim… | |
+ | |
+#. Main loop (while \texttt{time.current <= time.total}) (functions called in … | |
+ | |
+ | |
+ #. \label{loopstart}CUDA thread synchronization point. | |
+ | |
+ #. \texttt{calcParticleCellID<<<,>>>(\ldots)}: Particle-grid hash value calc… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{thrust::sort\_by\_key(\ldots)}: Thrust radix sort of particle-gri… | |
+ | |
+  #. \texttt{cudaMemset(\ldots)}: Writing sentinel value (\texttt{0xffffffff}) to … | |
+ | |
+ #. \texttt{reorderArrays<<<,>>>(\ldots)}: Reordering of particle arrays, bas… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. Optional: \texttt{topology<<<,>>>(\ldots)}: If particle contact history i… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{interact<<<,>>>(\ldots)}: For each particle: Search of contacts i… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{integrate<<<,>>>(\ldots)}: Updating of spatial degrees of freedom… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{summation<<<,>>>(\ldots)}: Particle contributions to the net forc… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{integrateWalls<<<,>>>(\ldots)}: Updating of spatial degrees of fr… | |
+ | |
+ #. Update of timers and loop-related counters (e.g. \texttt{time.current}), … | |
+ | |
+ #. If file output interval is reached: | |
+ | |
+ \item Optional write of data to output binary (\verb"<simulation_ID>.o… | |
+ \item Update of \verb"<simulation_ID>.status#..bin" (\texttt{device.cu… | |
+ | |
+ \item Return to point \ref{loopstart}, unless \texttt{time.current >= ti… | |
+ | |
+ | |
+#. \label{loopend}Liberation of device memory (\texttt{device.cu}). | |
+ | |
+#. Control returned to \texttt{main(\ldots)}, liberation of host memory (\text… | |
+ | |
+#. End of program, return status equal to zero (0) if no problems were encoun… | |
+ | |
+ | |
+ | |
+Sphere algorithm | |
+================ | |
+The \texttt{SPHERE}-binary is launched from the system terminal by passing the… | |
+ | |
+#. System check, including search for NVIDIA CUDA compatible devices (\texttt{… | |
+ | |
+#. Initial data import from binary input file (\texttt{main.cpp}). | |
+ | |
+#. Allocation of memory for all host variables (particles, grid, walls, etc.) … | |
+ | |
+#. Continued import from binary input file (\texttt{main.cpp}). | |
+ | |
+#. Control handed to GPU-specific function \texttt{gpuMain(\ldots)} (\texttt{d… | |
+ | |
+#. Memory allocation of device memory (\texttt{device.cu}). | |
+ | |
+#. Transfer of data from host to device variables (\texttt{device.cu}). | |
+ | |
+#. Initialization of Thrust\footnote{\url{https://code.google.com/p/thrust/}} … | |
+ | |
+#. Calculation of GPU workload configuration (thread and block layout) (\textt… | |
+ | |
+#. Status and data written to \verb"<simulation_ID>.status.dat" and \verb"<sim… | |
+ | |
+#. Main loop (while \texttt{time.current <= time.total}) (functions called in … | |
+ | |
+ | |
+ #. \label{loopstart}CUDA thread synchronization point. | |
+ | |
+ #. \texttt{calcParticleCellID<<<,>>>(\ldots)}: Particle-grid hash value calc… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{thrust::sort\_by\_key(\ldots)}: Thrust radix sort of particle-gri… | |
+ | |
+  #. \texttt{cudaMemset(\ldots)}: Writing sentinel value (\texttt{0xffffffff}) to … | |
+ | |
+ #. \texttt{reorderArrays<<<,>>>(\ldots)}: Reordering of particle arrays, bas… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. Optional: \texttt{topology<<<,>>>(\ldots)}: If particle contact history i… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{interact<<<,>>>(\ldots)}: For each particle: Search of contacts i… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{integrate<<<,>>>(\ldots)}: Updating of spatial degrees of freedom… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{summation<<<,>>>(\ldots)}: Particle contributions to the net forc… | |
+ | |
+ #. CUDA thread synchronization point. | |
+ | |
+ #. \texttt{integrateWalls<<<,>>>(\ldots)}: Updating of spatial degrees of fr… | |
+ | |
+ #. Update of timers and loop-related counters (e.g. \texttt{time.current}), … | |
+ | |
+ #. If file output interval is reached: | |
+ | |
+ * Optional write of data to output binary (\verb"<simulation_ID>.outpu… | |
+ * Update of \verb"<simulation_ID>.status#..bin" (\texttt{device.cu}). | |
+ | |
+ #. Return to point \ref{loopstart}, unless \texttt{time.current >= time.tota… | |
+ | |
+ | |
+#. \label{loopend}Liberation of device memory (\texttt{device.cu}). | |
+ | |
+#. Control returned to \texttt{main(\ldots)}, liberation of host memory (\text… | |
+ | |
+#. End of program, return status equal to zero (0) if no problems were encoun… | |
+ | |
+ | |
+ | |
+The length of the computational time steps (\texttt{time.dt}) is calculated vi… | |
+\begin{equation} | |
+\label{eq:dt} | |
+\Delta t = 0.17 \min \left( m/\max(k_n,k_t) \right) | |
+\end{equation} | |
+where $m$ is the particle mass, and $k$ are the elastic stiffnesses. This equa… | |
+ | |
+\subsubsection{Host and device memory types} | |
+\label{subsubsec:memorytypes} | |
+A full, listed description of the \texttt{SPHERE} source code variables can be… | |
+ | |
+The floating point precision operating internally in \texttt{SPHERE} is define… | |
+ | |
+Three-dimensional variables (e.g. spatial vectors in $E^3$) are in global memo… | |
+ | |
+\begin{figure}[htbp] | |
+\label{fig:memory} | |
+\begin{center} | |
+\begin{small} | |
+\begin{tikzpicture}[scale=1, node distance = 2cm, auto] | |
+ % Place nodes | |
+ \node [harddrive] (freadbin) {\textbf{Hard Drive}:\\[1mm] Input binary: \v… | |
+ | |
+ \node [processor, below of=freadbin, node distance=2cm] (cpu) … | |
+ | |
+ \node [harddrive, below of=cpu, node distance=2.5cm] (fwritebin) {\textbf{… | |
+ | |
+ \node [mem, right of=cpu, node distance=2.5cm] (host) {\textbf… | |
+ | |
+ \node [mem, right of=host, node distance=3.5cm] (textures) {\t… | |
+ \node [mem, above of=textures, node distance=2cm] (device) {\t… | |
+ \node [mem, below of=textures, node distance=2cm] (constant) {… | |
+ | |
+ | |
+ \node [processor, right of=textures, node distance=4cm] (gpu) … | |
+ | |
+ | |
+ \draw node [above of=gpu, shape aspect=1, diamond, draw, node distance… | |
+ \draw node [below of=gpu, shape aspect=1, diamond, draw, node distance… | |
+ | |
+ \node [mem, above of=gpu, node distance=3cm] (local) {\textbf{Local re… | |
+ | |
+ \node [mem, below of=gpu, node distance=3cm] (shared) {\textbf{Sha… | |
+ | |
+ % Place hardware description | |
+ \node [above of=freadbin, node distance=1.5cm] {\large Host system}; | |
+ %\node [above of=device, node distance=4cm] {\Large CUDA device}; | |
+ \node [at={(7.0,1.5)}] {\large CUDA device}; | |
+ | |
+ \node [at={(4.0,0.8)}, rotate=90] {PCIe $\times$16 Gen2}; | |
+ \path [draw] (4.3, 2.0) -- (4.3,-0.5); | |
+ \path [draw] (4.3,-3.5) -- (4.3,-6.5); | |
+ | |
+ \node [at={(6.0,-6.3)}] {Off-chip}; | |
+ | |
+ \path [draw, gray] (8.0, 0.5) -- (8.0,-0.5); | |
+ \path [draw, gray] (8.0,-4.5) -- (8.0,-6.5); | |
+ | |
+ \node [at={(10.0,-6.3)}] {On-chip}; | |
+ | |
+ % Draw lines | |
+ \path [draw, -latex', thick] (freadbin) -- (cpu); | |
+ | |
+ \path [draw, -latex'] (cpu) -- (host); | |
+ \path [draw, -latex'] (host) -- (cpu); | |
+ | |
+ \path [draw, -latex', dashed] (host) -- (device); | |
+ \path [draw, -latex', dashed] (device) -- (host); | |
+ | |
+ \path [draw, -latex', dashed] (host) -- (constant); | |
+ \path [draw, -latex', dashed] (constant) -- (host); | |
+ %\path [draw, -latex'] (constant) -- (device); | |
+ | |
+ \path [draw, -latex', dashed] (host) -- (textures); | |
+ \path [draw, -latex', dashed] (textures) -- (host); | |
+ | |
+ %\path [draw, -latex', dashed] (host) -- (shared); | |
+ %\path [draw, -latex', dashed] (shared) -- (host); | |
+ | |
+ \path [draw, -latex'] (device) -- (gpu); | |
+ \path [draw, -latex'] (gpu) -- (device); | |
+ | |
+ \path [draw, -latex'] (textures) -- (gpu); | |
+ \path [draw, -latex'] (gpu) -- (textures); | |
+ \node [at={(7.7,-1.8)}] {\footnotesize Cached}; | |
+ \node [at={(7.7,-2.2)}] {\footnotesize reads}; | |
+ | |
+ \path [draw, -latex'] (constant) -- (gpu); | |
+ %\path [draw, -latex'] (gpu) -- (constant); | |
+ \node [at={(7.7,-2.9)}, rotate=25] {\footnotesize Cached}; | |
+ \node [at={(7.9,-3.2)}, rotate=25] {\footnotesize reads}; | |
+ | |
+ \path [draw, -latex'] (shared) -- (gpu); | |
+ \path [draw, -latex'] (gpu) -- (shared); | |
+ \node [at={(9.85,-3.9)}] {\footnotesize Cached reads}; | |
+ %\node [at={(8.0,-4.2)}, rotate=45] {\footnotesize reads}; | |
+ | |
+ \path [draw, -latex'] (local) -- (gpu); | |
+ \path [draw, -latex'] (gpu) -- (local); | |
+ | |
+ | |
+ %\path [draw, -latex'] (device) -- (shared); | |
+ | |
+ \path [draw, -latex', thick] (cpu) -- (fwritebin); | |
+ | |
+    % Bandwidth text | |
+ \node [at={(4.2,-2.3)}] (host-dev) {8 GB/s}; | |
+ %\node [at={(3,-3)}] (PCIe) {(PCIe Gen2)}; | |
+ %\node [at={(6,-2.3)}] (dev-dev) {89.6 GB/s}; | |
+ | |
+\end{tikzpicture} | |
+\end{small} | |
+ | |
+\caption{Flow chart of system memory types and communication paths. RAM: Rando… | |
+\end{center} | |
+ | |
+\end{figure} | |
+ | |
+ | |
+\paragraph{Host memory} is the main random-access computer memory (RAM), i.e. … | |
+ | |
+ | |
+\paragraph{Device memory} is the main, global device memory. It resides off-ch… | |
+ | |
+\marginpar{Todo: Expand section on device memory types} | |
+ | |
+\paragraph{Constant memory} values cannot be changed after they are set, and a… | |
+ | |
+ | |
+ | |
+%\subsection{The main loop} | |
+%\label{subsec:mainloop} | |
+%The \texttt{SPHERE} software calculates particle movement and rotation based … | |
+ | |
+ | |
+\subsection{Performance} | |
+\marginpar{Todo: insert graph of performance vs. np and performance vs. $\Delt… | |
+\subsubsection{Particles and computational time} | |
+ | |
+\subsection{Compilation} | |
+\label{subsec:compilation} | |
+An important note is that the \texttt{C} examples of the NVIDIA CUDA SDK shoul… | |
+ | |
+\texttt{SPHERE} is supplied with several Makefiles, which automate the compila… | |
+ | |
diff --git a/doc/sphinx/introduction.rst b/doc/sphinx/introduction.rst | |
@@ -0,0 +1,14 @@ | |
+Introduction | |
+============ | |
+The \texttt{SPHERE}-software is used for three-dimensional discrete element me… | |
+ | |
+The ultimate aim of the \texttt{SPHERE} software is to simulate soft-bedded su… | |
+\begin{itemize} | |
+ \item UNIX, Linux or Mac OS X operating system. | |
+ \item GCC, the GNU compiler collection. | |
+ \item A CUDA-enabled GPU with compute capability 1.1 or greater\footnote{See… | |
+ \item The CUDA Developer Drivers and the CUDA Toolkit\footnote{Obtainable fr… | |
+\end{itemize} | |
+For simulation setup and data handling, a Python distribution of a recent vers… | |
+ | |
+ |
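Note on the time-step expression added to doc/sphinx/index.rst in this commit: the hunk gives \Delta t = 0.17 \min ( m/\max(k_n,k_t) ), where m is the particle mass and k_n, k_t are the elastic stiffnesses. A minimal Python sketch of that expression, evaluated literally as printed, might look like the following. The function name, the argument names, and the use of one shared stiffness pair for all particles are assumptions made for illustration. Nothing here is taken from the SPHERE source, and it makes no claim about how `time.dt` is actually computed.

```python
def time_step(masses, k_n, k_t, factor=0.17):
    """Evaluate factor * min(m / max(k_n, k_t)) over all particle masses.

    Hypothetical helper: it mirrors the expression as printed in the diff,
    with a single normal/tangential stiffness pair shared by all particles
    (an assumption, not something the diff states).
    """
    if not masses:
        raise ValueError("at least one particle mass is required")
    k_max = max(k_n, k_t)
    if k_max <= 0:
        raise ValueError("the largest stiffness must be positive")
    # The smallest mass-to-stiffness ratio limits the step length.
    return factor * min(m / k_max for m in masses)


if __name__ == "__main__":
    # Two particles with masses 2.0 and 4.0, stiffnesses k_n = 1.0, k_t = 2.0.
    print(time_step([2.0, 4.0], k_n=1.0, k_t=2.0))
```

Because the minimum is taken over all particles, the lightest particle (for a given stiffness) sets the step length in this sketch.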