What is Dendro-4.0?
Dendro is a distributed-memory solver for partial differential equations (PDEs). It supports numerical methods such as the finite difference method, the finite element method, and wavelet methods, and it uses adaptive octree meshes as the geometric discretization.
How do we partition the octree among p processors?
We use a flexible partitioning scheme based on space-filling curves (SFCs) to partition the adaptive octree among 'p' processors. The current implementation of Dendro-4.0 supports both the Hilbert curve and the Morton curve. We have experimented with the partitions produced by each curve and found that, at large scale, Hilbert-curve-based partitioning gives more energy-efficient and communication-efficient partitions than the Morton curve.
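To make the idea of SFC-based partitioning concrete, here is a minimal Python sketch. It is not Dendro's implementation: it assumes a uniform grid of integer cell coordinates rather than an adaptive octree, it uses only the Morton curve (the simpler of the two to compute), and the function names morton_key and partition_by_sfc are hypothetical. The cells are sorted by their position along the curve and the sorted list is cut into p contiguous chunks of nearly equal size.

```python
def morton_key(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)      # x takes bit positions 0, 3, 6, ...
        key |= ((y >> i) & 1) << (3 * i + 1)  # y takes bit positions 1, 4, 7, ...
        key |= ((z >> i) & 1) << (3 * i + 2)  # z takes bit positions 2, 5, 8, ...
    return key


def partition_by_sfc(points, p, bits=10):
    """Sort points along the Morton curve and cut them into p contiguous chunks."""
    ordered = sorted(points, key=lambda c: morton_key(*c, bits=bits))
    n = len(ordered)
    # Chunk r holds indices [r*n//p, (r+1)*n//p), so sizes differ by at most one.
    return [ordered[r * n // p:(r + 1) * n // p] for r in range(p)]


# Illustrative input (an assumption): a 4 x 4 x 4 grid split among 4 processors.
cells = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
parts = partition_by_sfc(cells, p=4)
for rank, chunk in enumerate(parts):
    print(rank, len(chunk))
```

Because each chunk is a contiguous run along the curve, the cells assigned to one processor tend to be close together in space. A Hilbert ordering would replace only the key function; the sort-and-cut step stays the same.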
How to run Dendro-4.0
You can clone the repository using 'git clone https://github.com/paralab/Dendro4.git'.
How to build Dendro-4.0?
You need to install PETSc 3.5.4 (http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.5.4.tar.gz) in order to use Dendro-4.0. We will provide support for the latest PETSc version soon.
You need CMake to build Dendro. Create a build directory using 'mkdir build', go into it with 'cd build', and then execute 'ccmake ..' to generate the makefiles. You can build Dendro-4.0 with several options.
- ALLTOALLV_FIX: OFF. This needs to be turned off.
- DIM_2: OFF. Turn this on if you need to run Dendro-4.0 in 2D. The default is OFF, which means a 3D domain is assumed.
- HILBERT_ORDERING: ON. This specifies which SFC is used to partition the data. ON means the Hilbert curve is used; otherwise the Morton curve is used.
- PROFILE_TREE_SORT: OFF
- NUM_NPES_THRESHOLD: the square root of P, where P is the number of processors.
- SPLITTER_SELECTION_FIX: ON. This performs the data exchange in the octree partitioning in stages. It is mandatory when you run Dendro at very large scale.
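As a rough illustration of how these options fit together, the Python sketch below assembles a non-interactive 'cmake' command line from the values listed above. Several things here are assumptions rather than documented Dendro behaviour: that the options can be set through CMake's standard -D<name>=<value> syntax instead of through 'ccmake', that NUM_NPES_THRESHOLD should be an integer, and that rounding the square root down is acceptable. The helper name cmake_configure_command is hypothetical.

```python
import math


def cmake_configure_command(num_procs, hilbert=True, dim_2=False):
    """Build a cmake command line using the option values listed above."""
    # Assumption: the threshold is the integer square root of the processor count.
    threshold = math.isqrt(num_procs)
    options = {
        "ALLTOALLV_FIX": "OFF",
        "DIM_2": "ON" if dim_2 else "OFF",
        "HILBERT_ORDERING": "ON" if hilbert else "OFF",
        "PROFILE_TREE_SORT": "OFF",
        "NUM_NPES_THRESHOLD": str(threshold),
        "SPLITTER_SELECTION_FIX": "ON",
    }
    flags = " ".join(f"-D{name}={value}" for name, value in options.items())
    # ".." points at the source tree, as when running from the build directory.
    return f"cmake {flags} .."


print(cmake_configure_command(262144))
```

For a run on 262144 processors this sets NUM_NPES_THRESHOLD to 512, since 512 squared is 262144.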
What can you run?
See the code in the example folder to get started.
Scalability studies on Dendro-4.0
We have performed octree generation and partitioning on up to 262144 cores of ORNL's Titan supercomputer. We managed to partition 1.3x10^12 octants among 262144 processors within 4 seconds.
SC16 Poster: http://sc16.supercomputing.org/sc-archive/tech_poster/poster_files/post245s2-file2.pdf
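To put the scalability figures above in perspective, the short Python calculation below derives the average number of octants per core and a lower bound on the aggregate partitioning rate. It uses only the three numbers quoted above. Treating "within 4 seconds" as exactly 4 seconds is an assumption, so the rate is a lower bound, not a measured value.

```python
octants = 1.3e12   # total octants partitioned
cores = 262144     # processors used on Titan
seconds = 4.0      # upper bound on the partitioning time

# Average load per core if the octants are spread evenly.
octants_per_core = octants / cores

# Lower bound on throughput, since the run finished within `seconds`.
min_rate = octants / seconds

print(f"{octants_per_core:.3g} octants per core on average")
print(f"at least {min_rate:.3g} octants partitioned per second")
```

This works out to a little under 5 million octants per core.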