[Dune] Question on parallel computation
Aleksejs Fomins
aleksejs.fomins at lspr.ch
Thu Apr 23 10:05:40 CEST 2015
Dear Bernd,
I have thought a bit more about A.3.
There is a simple algorithm, BFS (breadth-first search), that can be employed to improve the performance slightly.
It goes like this:
1) Push the initial element onto the queue
2) Loop while the queue is non-empty:
2.1) Pop an element from the queue and mark it as a radius-neighbour
2.2) Retrieve all real neighbours of this element
2.3) Keep only those that have not been marked already
2.4) Keep only those that lie within the radius of the initial element
2.5) Push the kept neighbours onto the queue
This way, the complexity is bounded by O(n * n_radius_neighbor) when the search is run from every element; a sketch follows below.
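
A minimal sketch of this search in C++ (my illustration, not code from the thread; neighboursOf() and dist() are hypothetical stand-ins for the mesh adjacency and the element-centre distance):

#include <functional>
#include <queue>
#include <unordered_set>
#include <vector>

// Collect all elements within 'radius' of 'start' by a breadth-first
// search over the mesh adjacency graph.
std::vector<int> radiusNeighbours(
    int start, double radius,
    const std::function<std::vector<int>(int)>& neighboursOf, // real neighbours
    const std::function<double(int, int)>& dist)              // centre distance
{
  std::vector<int> result;
  std::unordered_set<int> marked{start};
  std::queue<int> toVisit;   // BFS uses a FIFO queue
  toVisit.push(start);
  while (!toVisit.empty()) {
    int e = toVisit.front();
    toVisit.pop();
    result.push_back(e);     // e is a radius-neighbour of 'start'
    for (int nb : neighboursOf(e))
      if (dist(start, nb) < radius      // 2.4) keep only those within radius
          && marked.insert(nb).second)  // 2.3) skip already-marked elements
        toVisit.push(nb);
  }
  return result;
}

Each element enters the queue at most once, which gives the O(n_radius_neighbor) bound per initial element.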
Cheers,
Aleksejs
On 23/04/15 09:24, Bernd Flemisch wrote:
> Hi Aleksejs,
>
> thank you for your answer. I did not know much about MPI graph topologies/communicators yet; they seem to be the right tool for this. I will try to do the following:
>
> A. Setup
> 1. Balance the grid.
> 2. Gatherv element indices and coordinates onto root.
> 3. Root detects neighbors and builds graph topology.
> 4. Scatterv the information about which process needs to provide which numbers for each of its neighbors.
> 5. Broadcast the graph topology, call MPI_Graph_create.
>
> B. Run
> 1. Use MPI_Neighbor_alltoallv to communicate.
>
> Obviously, A.3 can be the performance killer. I hope to get away with it because it has to be done only once and the overall element count is still moderate. If the count gets larger, I will have to do more sophisticated tree stuff.
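>
> For steps A.5 and B.1, a minimal sketch (my illustration, not Bernd's code), using the distributed-graph variant MPI_Dist_graph_create_adjacent as a scalable alternative to MPI_Graph_create; the neighbor ranks and the counts/displacements are placeholders that would come out of the setup phase A:
>
> #include <mpi.h>
> #include <vector>
>
> // A.5: build a distributed graph communicator from the neighbor ranks.
> MPI_Comm makeGraphComm(MPI_Comm world,
>                        const std::vector<int>& sources,
>                        const std::vector<int>& destinations)
> {
>   MPI_Comm graphComm;
>   MPI_Dist_graph_create_adjacent(world,
>       (int)sources.size(),      sources.data(),      MPI_UNWEIGHTED,
>       (int)destinations.size(), destinations.data(), MPI_UNWEIGHTED,
>       MPI_INFO_NULL, 0 /* no reordering */, &graphComm);
>   return graphComm;
> }
>
> // B.1: exchange values with the graph neighbors only.
> void exchange(MPI_Comm graphComm,
>               const std::vector<double>& sendBuf,
>               const std::vector<int>& sendCounts,
>               const std::vector<int>& sendDispls,
>               std::vector<double>& recvBuf,
>               const std::vector<int>& recvCounts,
>               const std::vector<int>& recvDispls)
> {
>   MPI_Neighbor_alltoallv(
>       sendBuf.data(), sendCounts.data(), sendDispls.data(), MPI_DOUBLE,
>       recvBuf.data(), recvCounts.data(), recvDispls.data(), MPI_DOUBLE,
>       graphComm);
> }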
>
> Kind regards
> Bernd
>
> On 04/22/2015 04:25 PM, Aleksejs Fomins wrote:
>>
>> Dear Bernd,
>>
>> So, the naive algorithm is:
>> 1) Determine all radius-neighbours in serial mode using pairwise checking. Complexity O(n(n+1)/2) for n elements.
>> 2) Determine all averages on the master process:
>> 2.1) All numbers get communicated to the master via comm.gather
>> 2.2) The master calculates all averages. Complexity O(n * n_neighbor_within_radius)
>> 2.3) The master communicates all averages back to their owners via comm.scatter (step 2 is sketched below)
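>>
>> A minimal sketch of step 2 (my illustration, not code from the thread): it assumes one value per process for brevity (the real code would use gatherv/scatterv for several elements per process), it uses raw MPI calls rather than the comm.gather wrapper mentioned above, and 'radiusNbr' stands for a precomputed radius-neighbour table held on the master:
>>
>> #include <mpi.h>
>> #include <vector>
>>
>> void averageStep(MPI_Comm comm, double x, double& a,
>>                  const std::vector<std::vector<int>>& radiusNbr)
>> {
>>   int rank, size;
>>   MPI_Comm_rank(comm, &rank);
>>   MPI_Comm_size(comm, &size);
>>
>>   std::vector<double> allX(size), allA(size);
>>   // 2.1) gather all numbers on the master
>>   MPI_Gather(&x, 1, MPI_DOUBLE, allX.data(), 1, MPI_DOUBLE, 0, comm);
>>
>>   // 2.2) master calculates all averages
>>   if (rank == 0)
>>     for (int e = 0; e < size; ++e) {
>>       double s = 0.0;
>>       for (int n : radiusNbr[e]) s += allX[n];
>>       allA[e] = s / radiusNbr[e].size();
>>     }
>>
>>   // 2.3) scatter the averages back to their owners
>>   MPI_Scatter(allA.data(), 1, MPI_DOUBLE, &a, 1, MPI_DOUBLE, 0, comm);
>> }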
>>
>> My thoughts on optimizing the above algorithm:
>> 1) If the radius is small compared to the size of the mesh, the mesh can be placed into an octree. This way, the octants get compared first, and if two octants are farther apart than the radius, the element pairs within them do not get checked at all. The complexity will be reduced to approximately O(n * n_neighbor_within_radius / 2).
>>
>> 2) The computation of the averages on the master process can be optimized by dynamic programming. Since the sets of radius-neighbours of nearby elements overlap significantly, instead of storing the full set of neighbours for each element, it makes sense to store only the change of the neighbour set between this element and some other element close to it, perhaps one of its real neighbours, thus building a sort of tree. The complexity is then O(n * K), where K is approximately (n_neighbor_within_radius)^{2/3}.
>>
>> 3) A different approach would be to send to each element only the values it needs to compute its own average. Then your friend is the new MPI routine MPI_Neighbor_allgather, which lets you specify the graph of communication so that each process sends data only to its recipients (sketched below). The computation is then cheap, and this type of MPI communication definitely scales to very large processor counts, unlike the standard gather. Unfortunately, it is not yet implemented in Dune, but I have written a wrapper for one of these routines; you can have it if you want.
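>>
>> A minimal sketch of 3) (my illustration; MPI_Neighbor_allgather itself is standard MPI-3, but the surrounding setup is assumed: a graph communicator 'graphComm' built beforehand, e.g. with MPI_Dist_graph_create_adjacent, and one value per process):
>>
>> #include <mpi.h>
>> #include <vector>
>>
>> // Each rank contributes its own value and receives one value from
>> // every incoming neighbour in the communication graph.
>> std::vector<double> gatherFromNeighbours(MPI_Comm graphComm, double myValue)
>> {
>>   int indegree, outdegree, weighted;
>>   MPI_Dist_graph_neighbors_count(graphComm, &indegree, &outdegree, &weighted);
>>
>>   std::vector<double> nbrValues(indegree);
>>   MPI_Neighbor_allgather(&myValue, 1, MPI_DOUBLE,
>>                          nbrValues.data(), 1, MPI_DOUBLE, graphComm);
>>   return nbrValues;
>> }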
>>
>> Greetings,
>> Aleksejs
>>
>>
>> On 22/04/15 15:08, Bernd Flemisch wrote:
>>> Dear Dune,
>>>
>>> I want to calculate a number a_e for each element e that is the average of other numbers x_n associated with elements n that lie within a certain radius r of e:
>>>
>>> a_e = avg ( all x_n with n such that dist(e, n) < r )
>>>
>>> Maybe some of you already had a similar problem: how can I achieve this MPI-parallel on a Dune grid without overlap, making maximal use of the existing Dune infrastructure?
>>>
>>> I can think of determining all the interacting elements n for each element e sequentially on the macro grid before it is balanced over the processes. Based on that pattern, I then want to calculate the a_e's frequently.
>>>
>>> Thank you for your thoughts. Kind regards
>>> Bernd
>>>