Further work on converting latex doc to rst format - sphere - GPU-based 3D dis…
git clone git://src.adamsgaard.dk/sphere
---
commit 99ea811d3941526f75be48dac1cdf053d41ee933
parent 780f5a4ad608633cea970558ab0a2fad0f749f87
Author: Anders Damsgaard Christensen <[email protected]>
Date: Tue, 4 Dec 2012 21:59:47 +0100
Further work on converting latex doc to rst format
Diffstat:
M doc/sphinx/Makefile | 5 ++++-
A doc/sphinx/dem.rst | 6 ++++++
M doc/sphinx/index.rst | 309 +++++++++++++++++++++++++++++…
A doc/sphinx/introduction.rst | 14 ++++++++++++++
4 files changed, 331 insertions(+), 3 deletions(-)
---
diff --git a/doc/sphinx/Makefile b/doc/sphinx/Makefile
@@ -16,7 +16,7 @@ I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp…
-default: html
+default: doxygen-xml html
help:
@echo "Please use \`make <target>' where <target> is one of"
@@ -43,6 +43,9 @@ help:
clean:
-rm -rf $(BUILDDIR)/*
+doxygen-xml: ../doxygen/Makefile ../doxygen/Doxyfile
+ $(MAKE) -C ../doxygen/
+
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
diff --git a/doc/sphinx/dem.rst b/doc/sphinx/dem.rst
@@ -0,0 +1,6 @@
+Discrete element method
+=======================
+The discrete element method (or distinct element method) was initially formula…
+
+
+
diff --git a/doc/sphinx/index.rst b/doc/sphinx/index.rst
@@ -18,12 +18,317 @@ Contents:
:maxdepth: 2
introduction
+ dem
python_api
cpp
-Introduction
-============
+
+sphere work flow
+================
+After compiling the \texttt{SPHERE} binary (see sub-section \ref{subsec:compil…
+ \item Setup of particle assemblage, physical properties and conditions…
+ \item Execution of \texttt{SPHERE} software, which simulates the parti…
+ \item Inspection, analysis, interpretation and visualization of \textt…
+
+\subsection{The \texttt{SPHERE} algorithm}
+\label{subsec:spherealgo}
+The \texttt{SPHERE}-binary is launched from the system terminal by passing the…
+#. System check, including search for NVIDIA CUDA compatible devices (\texttt{…
+
+#. Initial data import from binary input file (\texttt{main.cpp}).
+
+#. Allocation of memory for all host variables (particles, grid, walls, etc.) …
+
+#. Continued import from binary input file (\texttt{main.cpp}).
+
+#. Control handed to GPU-specific function \texttt{gpuMain(\ldots)} (\texttt{d…
+
+#. Memory allocation of device memory (\texttt{device.cu}).
+
+#. Transfer of data from host to device variables (\texttt{device.cu}).
+
+#. Initialization of Thrust\footnote{\url{https://code.google.com/p/thrust/}} …
+
+#. Calculation of GPU workload configuration (thread and block layout) (\textt…
+
+#. Status and data written to \verb"<simulation_ID>.status.dat" and \verb"<sim…
+
+#. Main loop (while \texttt{time.current <= time.total}) (functions called in …
+
+
+ #. \label{loopstart}CUDA thread synchronization point.
+
+ #. \texttt{calcParticleCellID<<<,>>>(\ldots)}: Particle-grid hash value calc…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{thrust::sort\_by\_key(\ldots)}: Thrust radix sort of particle-gri…
+
+ #. \texttt{cudaMemset(\ldots)}: Writing fill value (\texttt{0xffffffff}) to …
+
+ #. \texttt{reorderArrays<<<,>>>(\ldots)}: Reordering of particle arrays, bas…
+
+ #. CUDA thread synchronization point.
+
+ #. Optional: \texttt{topology<<<,>>>(\ldots)}: If particle contact history i…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{interact<<<,>>>(\ldots)}: For each particle: Search of contacts i…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{integrate<<<,>>>(\ldots)}: Updating of spatial degrees of freedom…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{summation<<<,>>>(\ldots)}: Particle contributions to the net forc…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{integrateWalls<<<,>>>(\ldots)}: Updating of spatial degrees of fr…
+
+ #. Update of timers and loop-related counters (e.g. \texttt{time.current}), …
+
+ #. If file output interval is reached:
+
+ \item Optional write of data to output binary (\verb"<simulation_ID>.o…
+ \item Update of \verb"<simulation_ID>.status#..bin" (\texttt{device.cu…
+
+ \item Return to point \ref{loopstart}, unless \texttt{time.current >= ti…
+
+
+#. \label{loopend}Liberation of device memory (\texttt{device.cu}).
+
+#. Control returned to \texttt{main(\ldots)}, liberation of host memory (\text…
+
+#. End of program, return status equal to zero (0) if no problems were encoun…
+
+
+
+Sphere algorithm
+================
+The \texttt{SPHERE}-binary is launched from the system terminal by passing the…
+
+#. System check, including search for NVIDIA CUDA compatible devices (\texttt{…
+
+#. Initial data import from binary input file (\texttt{main.cpp}).
+
+#. Allocation of memory for all host variables (particles, grid, walls, etc.) …
+
+#. Continued import from binary input file (\texttt{main.cpp}).
+
+#. Control handed to GPU-specific function \texttt{gpuMain(\ldots)} (\texttt{d…
+
+#. Memory allocation of device memory (\texttt{device.cu}).
+
+#. Transfer of data from host to device variables (\texttt{device.cu}).
+
+#. Initialization of Thrust\footnote{\url{https://code.google.com/p/thrust/}} …
+
+#. Calculation of GPU workload configuration (thread and block layout) (\textt…
+
+#. Status and data written to \verb"<simulation_ID>.status.dat" and \verb"<sim…
+
+#. Main loop (while \texttt{time.current <= time.total}) (functions called in …
+
+
+ #. \label{loopstart}CUDA thread synchronization point.
+
+ #. \texttt{calcParticleCellID<<<,>>>(\ldots)}: Particle-grid hash value calc…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{thrust::sort\_by\_key(\ldots)}: Thrust radix sort of particle-gri…
+
+ #. \texttt{cudaMemset(\ldots)}: Writing fill value (\texttt{0xffffffff}) to …
+
+ #. \texttt{reorderArrays<<<,>>>(\ldots)}: Reordering of particle arrays, bas…
+
+ #. CUDA thread synchronization point.
+
+ #. Optional: \texttt{topology<<<,>>>(\ldots)}: If particle contact history i…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{interact<<<,>>>(\ldots)}: For each particle: Search of contacts i…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{integrate<<<,>>>(\ldots)}: Updating of spatial degrees of freedom…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{summation<<<,>>>(\ldots)}: Particle contributions to the net forc…
+
+ #. CUDA thread synchronization point.
+
+ #. \texttt{integrateWalls<<<,>>>(\ldots)}: Updating of spatial degrees of fr…
+
+ #. Update of timers and loop-related counters (e.g. \texttt{time.current}), …
+
+ #. If file output interval is reached:
+
+ * Optional write of data to output binary (\verb"<simulation_ID>.outpu…
+ * Update of \verb"<simulation_ID>.status#..bin" (\texttt{device.cu}).
+
+ #. Return to point \ref{loopstart}, unless \texttt{time.current >= time.tota…
+
+
+#. \label{loopend}Liberation of device memory (\texttt{device.cu}).
+
+#. Control returned to \texttt{main(\ldots)}, liberation of host memory (\text…
+
+#. End of program, return status equal to zero (0) if no problems were encoun…
+
+
+
+The length of the computational time steps (\texttt{time.dt}) is calculated vi…
+\begin{equation}
+\label{eq:dt}
+\Delta t = 0.17 \min \left( m/\max(k_n,k_t) \right)
+\end{equation}
+where $m$ is the particle mass, and $k$ are the elastic stiffnesses. This equa…
+
+\subsubsection{Host and device memory types}
+\label{subsubsec:memorytypes}
+A full, listed description of the \texttt{SPHERE} source code variables can be…
+
+The floating point precision operating internally in \texttt{SPHERE} is define…
+
+Three-dimensional variables (e.g. spatial vectors in $E^3$) are in global memo…
+
+\begin{figure}[htbp]
+\label{fig:memory}
+\begin{center}
+\begin{small}
+\begin{tikzpicture}[scale=1, node distance = 2cm, auto]
+ % Place nodes
+ \node [harddrive] (freadbin) {\textbf{Hard Drive}:\\[1mm] Input binary: \v…
+
+ \node [processor, below of=freadbin, node distance=2cm] (cpu) …
+
+ \node [harddrive, below of=cpu, node distance=2.5cm] (fwritebin) {\textbf{…
+
+ \node [mem, right of=cpu, node distance=2.5cm] (host) {\textbf…
+
+ \node [mem, right of=host, node distance=3.5cm] (textures) {\t…
+ \node [mem, above of=textures, node distance=2cm] (device) {\t…
+ \node [mem, below of=textures, node distance=2cm] (constant) {…
+
+
+ \node [processor, right of=textures, node distance=4cm] (gpu) …
+
+
+ \draw node [above of=gpu, shape aspect=1, diamond, draw, node distance…
+ \draw node [below of=gpu, shape aspect=1, diamond, draw, node distance…
+
+ \node [mem, above of=gpu, node distance=3cm] (local) {\textbf{Local re…
+
+ \node [mem, below of=gpu, node distance=3cm] (shared) {\textbf{Sha…
+
+ % Place hardware description
+ \node [above of=freadbin, node distance=1.5cm] {\large Host system};
+ %\node [above of=device, node distance=4cm] {\Large CUDA device};
+ \node [at={(7.0,1.5)}] {\large CUDA device};
+
+ \node [at={(4.0,0.8)}, rotate=90] {PCIe $\times$16 Gen2};
+ \path [draw] (4.3, 2.0) -- (4.3,-0.5);
+ \path [draw] (4.3,-3.5) -- (4.3,-6.5);
+
+ \node [at={(6.0,-6.3)}] {Off-chip};
+
+ \path [draw, gray] (8.0, 0.5) -- (8.0,-0.5);
+ \path [draw, gray] (8.0,-4.5) -- (8.0,-6.5);
+
+ \node [at={(10.0,-6.3)}] {On-chip};
+
+ % Draw lines
+ \path [draw, -latex', thick] (freadbin) -- (cpu);
+
+ \path [draw, -latex'] (cpu) -- (host);
+ \path [draw, -latex'] (host) -- (cpu);
+
+ \path [draw, -latex', dashed] (host) -- (device);
+ \path [draw, -latex', dashed] (device) -- (host);
+
+ \path [draw, -latex', dashed] (host) -- (constant);
+ \path [draw, -latex', dashed] (constant) -- (host);
+ %\path [draw, -latex'] (constant) -- (device);
+
+ \path [draw, -latex', dashed] (host) -- (textures);
+ \path [draw, -latex', dashed] (textures) -- (host);
+
+ %\path [draw, -latex', dashed] (host) -- (shared);
+ %\path [draw, -latex', dashed] (shared) -- (host);
+
+ \path [draw, -latex'] (device) -- (gpu);
+ \path [draw, -latex'] (gpu) -- (device);
+
+ \path [draw, -latex'] (textures) -- (gpu);
+ \path [draw, -latex'] (gpu) -- (textures);
+ \node [at={(7.7,-1.8)}] {\footnotesize Cached};
+ \node [at={(7.7,-2.2)}] {\footnotesize reads};
+
+ \path [draw, -latex'] (constant) -- (gpu);
+ %\path [draw, -latex'] (gpu) -- (constant);
+ \node [at={(7.7,-2.9)}, rotate=25] {\footnotesize Cached};
+ \node [at={(7.9,-3.2)}, rotate=25] {\footnotesize reads};
+
+ \path [draw, -latex'] (shared) -- (gpu);
+ \path [draw, -latex'] (gpu) -- (shared);
+ \node [at={(9.85,-3.9)}] {\footnotesize Cached reads};
+ %\node [at={(8.0,-4.2)}, rotate=45] {\footnotesize reads};
+
+ \path [draw, -latex'] (local) -- (gpu);
+ \path [draw, -latex'] (gpu) -- (local);
+
+
+ %\path [draw, -latex'] (device) -- (shared);
+
+ \path [draw, -latex', thick] (cpu) -- (fwritebin);
+
+ % Bandwidth text
+ \node [at={(4.2,-2.3)}] (host-dev) {8 GB/s};
+ %\node [at={(3,-3)}] (PCIe) {(PCIe Gen2)};
+ %\node [at={(6,-2.3)}] (dev-dev) {89.6 GB/s};
+
+\end{tikzpicture}
+\end{small}
+
+\caption{Flow chart of system memory types and communication paths. RAM: Rando…
+\end{center}
+
+\end{figure}
+
+
+\paragraph{Host memory} is the main random-access computer memory (RAM), i.e. …
+
+
+\paragraph{Device memory} is the main, global device memory. It resides off-ch…
+
+\marginpar{Todo: Expand section on device memory types}
+
+\paragraph{Constant memory} values cannot be changed after they are set, and a…
+
+
+
+%\subsection{The main loop}
+%\label{subsec:mainloop}
+%The \texttt{SPHERE} software calculates particle movement and rotation based …
+
+
+\subsection{Performance}
+\marginpar{Todo: insert graph of performance vs. np and performance vs. $\Delt…
+\subsubsection{Particles and computational time}
+
+\subsection{Compilation}
+\label{subsec:compilation}
+An important note is that the \texttt{C} examples of the NVIDIA CUDA SDK shoul…
+
+\texttt{SPHERE} is supplied with several Makefiles, which automate the compila…
+
diff --git a/doc/sphinx/introduction.rst b/doc/sphinx/introduction.rst
@@ -0,0 +1,14 @@
+Introduction
+============
+The \texttt{SPHERE}-software is used for three-dimensional discrete element me…
+
+The ultimate aim of the \texttt{SPHERE} software is to simulate soft-bedded su…
+\begin{itemize}
+ \item UNIX, Linux or Mac OS X operating system.
+ \item GCC, the GNU compiler collection.
+ \item A CUDA-enabled GPU with compute capability 1.1 or greater\footnote{See…
+ \item The CUDA Developer Drivers and the CUDA Toolkit\footnote{Obtainable fr…
+\end{itemize}
+For simulation setup and data handling, a Python distribution of a recent vers…
+
+
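Note on the time step expression added to doc/sphinx/index.rst in this commit: the formula can be illustrated with a minimal Python sketch. It follows the expression exactly as it is written in the diff, \Delta t = 0.17 \min(m/\max(k_n,k_t)), and makes no claim about how the sphere source code computes time.dt. The function name time_step, its parameter names and the sample values are hypothetical.

```python
def time_step(masses, k_n, k_t, safety_factor=0.17):
    """Return safety_factor * min(m / max(k_n, k_t)) over all particle masses.

    masses        -- iterable of particle masses (hypothetical input)
    k_n, k_t      -- normal and tangential elastic stiffnesses
    safety_factor -- the constant 0.17 from the expression in the diff
    """
    masses = list(masses)
    if not masses:
        raise ValueError("at least one particle mass is required")

    # The stiffer of the two springs limits the stable time step.
    k_max = max(k_n, k_t)
    if k_max <= 0.0:
        raise ValueError("stiffness values must be positive")

    # The lightest particle gives the smallest ratio m / k_max.
    return safety_factor * min(m / k_max for m in masses)


if __name__ == "__main__":
    # Two particles with made-up masses and stiffnesses.
    print(time_step([2.0, 4.0], k_n=10.0, k_t=20.0))
```

With masses 2.0 and 4.0 and stiffnesses k_n = 10.0 and k_t = 20.0, the smallest ratio is 2.0 / 20.0 = 0.1, so the sketch returns approximately 0.017.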