Dendro
5.01
Dendro means tree in Greek. The Dendro library is a large-scale (262K cores on ORNL's Titan) distributed-memory adaptive octree framework. Its main goal is to perform large-scale multiphysics simulations efficiently on modern supercomputers. Dendro consists of efficient parallel data structures and algorithms for variational (finite element) and finite difference methods on 2:1 balanced, arbitrarily adaptive octrees. This enables simulations ranging from binary black hole mergers to blood flow in the human body, with applications in relativity, astrophysics, and biomedical engineering.
Contains host-side utility functions related to GPUs.
Classes | |
class | _Block |
Functions | |
cudaDeviceProp * | getGPUDeviceInfo (unsigned int device) |
sends device information to the GPU | |
template<typename T > | |
T * | copyArrayToDevice (const T *in, unsigned int numElems) |
template<typename T > | |
T * | copyValueToDevice (const T *in) |
template<typename T > | |
T * | alloc1DCudaArray (unsigned int sz1) |
template<typename T > | |
T ** | alloc2DCudaArray (unsigned int sz1, unsigned int sz2) |
allocates a 2D cuda array on the device. | |
template<typename T > | |
T ** | alloc2DCudaArray (T **&hostPtr, unsigned int sz1, unsigned int sz2) |
allocates a 2D cuda array on the device. | |
template<typename T > | |
T ** | alloc2DCudaArray (const T **in, unsigned int sz1, unsigned int sz2) |
allocates a 2D cuda array on the device and copies data. | |
template<typename T > | |
void | copyArrayToDeviceAsync (const T *in, T *__deviceptr, unsigned int numElems, const cudaStream_t stream) |
template<typename T > | |
void | copy2DCudaArrayToDeviceAsync (const T **in, T **__devicePtr, unsigned int sz1, unsigned int sz2, const cudaStream_t stream) |
copies a 2D array from the host to the device asynchronously on the given stream. | |
template<typename T > | |
void | copyArrayToHostAsync (T *host_ptr, const T *__deviceptr, unsigned int numElems, const cudaStream_t stream) |
template<typename T > | |
void | dealloc2DCudaArray (T **&__array2D, unsigned int sz1) |
deallocates the 2D cuda array. | |
void | computeDendroBlockToGPUMap (const ot::Block *blkList, unsigned int numBlocks, unsigned int *&blockMap, dim3 &gridDim) |
template<typename T > | |
void | copyArrayToHost (T *host_ptr, const T *__device_ptr, unsigned int numElems) |
template<typename T > | |
void | copy2DArrayToHost (T **host_ptr, const T **__device_ptr, unsigned int sz1, unsigned int sz2) |
T * cuda::alloc1DCudaArray | ( | unsigned int | sz1 | ) |
allocates a 1D array on the device.
[in] | sz1 | dim 1 size |
T ** cuda::alloc2DCudaArray | ( | unsigned int | sz1, |
unsigned int | sz2 | ||
) |
allocates a 2D cuda array on the device.
[in] | sz1 | dim 1 size |
[in] | sz2 | dim 2 size |
T ** cuda::alloc2DCudaArray | ( | T **& | hostPtr, |
unsigned int | sz1, | ||
unsigned int | sz2 | ||
) |
allocates a 2D cuda array on the device.
[out] | hostPtr | 2D pointer accessible from the host. |
[in] | sz1 | dim 1 size |
[in] | sz2 | dim 2 size |
T ** cuda::alloc2DCudaArray | ( | const T ** | in, |
unsigned int | sz1, | ||
unsigned int | sz2 | ||
) |
allocates a 2D cuda array on the device and copies data.
[in] | in | host 2D array to copy from |
[in] | sz1 | dim 1 size |
[in] | sz2 | dim 2 size |
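The `T **` return type suggests a table of row pointers rather than a single pitched allocation. The following is a minimal sketch of that common layout using only CUDA runtime calls. It is not Dendro's implementation, and the helper names `sketchAlloc2D` and `sketchDealloc2D` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper, NOT Dendro's implementation: returns a device-resident
// table of sz1 row pointers, each pointing to sz2 elements of device memory.
// Returns nullptr if any CUDA call fails (anything already allocated is freed).
template <typename T>
T** sketchAlloc2D(unsigned int sz1, unsigned int sz2)
{
    std::vector<T*> rows(sz1, nullptr);  // host-side copy of the row addresses
    T** table = nullptr;

    bool ok = cudaMalloc(reinterpret_cast<void**>(&table),
                         sizeof(T*) * sz1) == cudaSuccess;
    for (unsigned int i = 0; ok && i < sz1; ++i)
        ok = cudaMalloc(reinterpret_cast<void**>(&rows[i]),
                        sizeof(T) * sz2) == cudaSuccess;

    // Publish the row addresses to the device so kernels can index table[i][j].
    ok = ok && cudaMemcpy(table, rows.data(), sizeof(T*) * sz1,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    if (!ok) {
        for (T* r : rows) cudaFree(r);   // cudaFree(nullptr) is a no-op
        cudaFree(table);
        return nullptr;
    }
    return table;
}

// Hypothetical counterpart: frees every row, then the table itself.
// The row addresses live in device memory, so they are fetched first.
template <typename T>
void sketchDealloc2D(T**& table, unsigned int sz1)
{
    if (table == nullptr) return;
    std::vector<T*> rows(sz1, nullptr);
    cudaMemcpy(rows.data(), table, sizeof(T*) * sz1, cudaMemcpyDeviceToHost);
    for (T* r : rows) cudaFree(r);
    cudaFree(table);
    table = nullptr;
}
```

Under this layout the host cannot dereference the returned table directly. That is one plausible reason for an overload that also hands back a host-accessible `hostPtr`, and for `dealloc2DCudaArray` taking `sz1`.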
void cuda::copy2DArrayToHost | ( | T ** | host_ptr, |
const T ** | __device_ptr, | ||
unsigned int | sz1, | ||
unsigned int | sz2 | ||
) |
copies a 2D array from the device to host memory
[in] | __device_ptr | 2D pointer to the device array |
[in] | sz1 | dim 1 size |
[in] | sz2 | dim 2 size |
[out] | host_ptr | host pointer |
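If the device array is stored as a device-resident table of row pointers (an assumption; this page does not state the layout), a device-to-host copy has to fetch the row addresses before it can copy the rows. The sketch below shows that two-step copy with a hypothetical helper `sketchCopy2DToHost`; it is an illustration, not Dendro's code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper, NOT Dendro's implementation: copies a sz1 x sz2 device
// array, stored as a device table of row pointers, into host rows host[i].
// Each host[i] must already point to at least sz2 elements.
template <typename T>
cudaError_t sketchCopy2DToHost(T** host, T* const* deviceTable,
                               unsigned int sz1, unsigned int sz2)
{
    // Step 1: the table itself is in device memory, so copy the row
    // addresses to the host before using them.
    std::vector<T*> rows(sz1, nullptr);
    cudaError_t err = cudaMemcpy(rows.data(), deviceTable, sizeof(T*) * sz1,
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;

    // Step 2: copy each row's payload.
    for (unsigned int i = 0; i < sz1; ++i) {
        err = cudaMemcpy(host[i], rows[i], sizeof(T) * sz2,
                         cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

Issuing one `cudaMemcpy` per row is simple but adds per-call latency; a single contiguous allocation would allow one copy, at the cost of a different indexing scheme.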
void cuda::copy2DCudaArrayToDeviceAsync | ( | const T ** | in, |
T ** | __devicePtr, | ||
unsigned int | sz1, | ||
unsigned int | sz2, | ||
const cudaStream_t | stream | ||
) |
copies a 2D array from the host to the device asynchronously on the given stream.
[in] | sz1 | dim 1 size |
[in] | sz2 | dim 2 size |
T * cuda::copyArrayToDevice | ( | const T * | in, |
unsigned int | numElems | ||
) |
copies an array from the host to the GPU (e.g. to send mesh blocks)
[in] | in | input array |
[in] | numElems | number of elements in the array |
Returns the device pointer where the data is copied to.
void cuda::copyArrayToDeviceAsync | ( | const T * | in, |
T * | __deviceptr, | ||
unsigned int | numElems, | ||
const cudaStream_t | stream | ||
) |
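copies an array from the host to the GPU asynchronously (e.g. to send mesh blocks)
[in] | in | input array |
[out] | __deviceptr | device pointer where the data is copied to. |
[in] | numElems | number of elements in the array |
[in] | stream | CUDA stream on which the copy is issued |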
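An asynchronous copy only enqueues work on a stream, so the caller must synchronize before reading the result. The sketch below round-trips an array through the device with plain CUDA runtime calls. It is an illustration of the pattern, not Dendro's code; `sketchAsyncRoundTrip` is a hypothetical name, and the use of pinned host memory is an assumption about how such copies are best issued.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper, NOT part of Dendro: copies numElems doubles host ->
// device -> host on one stream and returns true if the data survives intact.
bool sketchAsyncRoundTrip(unsigned int numElems)
{
    const size_t bytes = sizeof(double) * numElems;
    double *hIn = nullptr, *hOut = nullptr, *dBuf = nullptr;

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;

    // Pinned (page-locked) host buffers let the copies run asynchronously;
    // with pageable memory the runtime may fall back to staged copies.
    bool ok = cudaMallocHost(reinterpret_cast<void**>(&hIn), bytes) == cudaSuccess
           && cudaMallocHost(reinterpret_cast<void**>(&hOut), bytes) == cudaSuccess
           && cudaMalloc(reinterpret_cast<void**>(&dBuf), bytes) == cudaSuccess;

    for (unsigned int i = 0; ok && i < numElems; ++i) hIn[i] = 0.5 * i;

    // Both copies are enqueued on the same stream, so they execute in order.
    ok = ok
      && cudaMemcpyAsync(dBuf, hIn, bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess
      && cudaMemcpyAsync(hOut, dBuf, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess
      && cudaStreamSynchronize(stream) == cudaSuccess;  // wait before reading hOut

    for (unsigned int i = 0; ok && i < numElems; ++i) ok = (hOut[i] == hIn[i]);

    if (hIn)  cudaFreeHost(hIn);
    if (hOut) cudaFreeHost(hOut);
    cudaFree(dBuf);
    cudaStreamDestroy(stream);
    return ok;
}
```

A caller would typically enqueue kernels on the same stream between the two copies and synchronize once at the end.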
void cuda::copyArrayToHost | ( | T * | host_ptr, |
const T * | __device_ptr, | ||
unsigned int | numElems | ||
) |
copies an array from the device to host memory
[in] | __device_ptr | device pointer |
[in] | numElems | number of elements |
[out] | host_ptr | host pointer |
T * cuda::copyValueToDevice | ( | const T * | in | ) |
inline
copies a value to the device
[in] | in | input value |
Returns the device pointer where the data is copied to.
void cuda::dealloc2DCudaArray | ( | T **& | __array2D, |
unsigned int | sz1 | ||
) |
deallocates the 2D cuda array.
[in] | __array2D | 2D cuda array to deallocate |
[in] | sz1 | dim 1 size |
cudaDeviceProp * cuda::getGPUDeviceInfo | ( | unsigned int | device | ) |
sends device information to the GPU
[in] | device | : gpu device ID |
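This page does not say whether the returned `cudaDeviceProp *` points to host or device memory. As a hedged illustration of where such information comes from, the sketch below queries the properties of one device on the host through the CUDA runtime. The helper names are hypothetical and this is not Dendro's implementation.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper, NOT Dendro's code: fills `prop` with the properties of
// the given device. Returns false if the device ID is out of range or the
// runtime reports an error.
bool sketchQueryDevice(unsigned int device, cudaDeviceProp& prop)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return false;
    if (device >= static_cast<unsigned int>(count)) return false;
    return cudaGetDeviceProperties(&prop, static_cast<int>(device)) == cudaSuccess;
}

// Prints a few fields that are commonly used to size kernel launches.
void sketchPrintDevice(const cudaDeviceProp& prop)
{
    std::printf("name: %s\n", prop.name);
    std::printf("compute capability: %d.%d\n", prop.major, prop.minor);
    std::printf("multiprocessors: %d\n", prop.multiProcessorCount);
    std::printf("global memory: %zu bytes\n", prop.totalGlobalMem);
    std::printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
}
```

If kernels need these values, the filled struct can be copied to device memory with `cudaMemcpy`, which would be consistent with the description above.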