The argument for performing DMC calculations on a parallel computer is even more compelling than for VMC calculations, because DMC calculations require approximately an order of magnitude more CPU time than the equivalent VMC calculations.
As with VMC calculations, the DMC algorithm is intrinsically parallel.
In the algorithm outlined in section , an ensemble of
walkers is used to evaluate the local energy of the guiding function
at each time step of the simulation. In our parallel
version of the DMC algorithm, this ensemble of walkers is distributed
across all NNODES nodes of the parallel machine. Each node is responsible
for performing stages (2)-(8) of the algorithm (the diffusion, drift
and creation/annihilation of walkers) on its own subset of the total
ensemble of walkers.
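The distribution of the ensemble over the nodes can be illustrated with a minimal Python sketch. It is not the thesis code: the function name `distribute_walkers` and the round-robin assignment are assumptions made for illustration, and the walkers are represented by integer labels rather than real electron configurations.

```python
def distribute_walkers(walkers, nnodes):
    """Deal walkers out to nnodes subsets in round-robin order.

    Hypothetical helper: subset i receives walkers i, i + nnodes, ...,
    so the subset sizes differ by at most one.
    """
    if nnodes < 1:
        raise ValueError("nnodes must be at least 1")
    return [walkers[node::nnodes] for node in range(nnodes)]


# Stand-in ensemble: 42 walkers labelled 0..41, shared between 4 nodes.
ensemble = list(range(42))
subsets = distribute_walkers(ensemble, nnodes=4)
print([len(subset) for subset in subsets])  # [11, 11, 10, 10]
```

Each node would then carry out the diffusion, drift and creation/annihilation stages on its own subset only.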
After all the walkers have been advanced for a block of time steps, the mean energy across all the walkers on all the nodes is used to update the trial energy as in stage (11) of the algorithm.
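Writing $E_i$ for the accumulated local energy of the subset of walkers on the $i$th node, the mean is obtained by summing $E_i$ over all the nodes and dividing by the total number of walker samples accumulated during the block.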
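The combination of the per-node accumulators into a single mean energy can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `global_mean_energy`, `node_energy_sums` and `node_sample_counts` are hypothetical, and the nodes are modelled as entries in ordinary lists rather than as processes exchanging messages. The subsequent update of the trial energy is left out, because it depends on the details of stage (11).

```python
def global_mean_energy(node_energy_sums, node_sample_counts):
    """Mean local energy over all walkers on all nodes.

    node_energy_sums[i]   : local energy accumulated on node i over the block
    node_sample_counts[i] : number of walker samples accumulated on node i
    Both names are hypothetical and used only for this sketch.
    """
    if len(node_energy_sums) != len(node_sample_counts):
        raise ValueError("one energy sum and one count are needed per node")
    total_samples = sum(node_sample_counts)
    if total_samples == 0:
        raise ValueError("no walker samples were accumulated")
    return sum(node_energy_sums) / total_samples


# Three nodes with different numbers of accumulated samples.
energy_sums = [-10.0, -20.0, -30.0]
sample_counts = [10, 20, 30]
print(global_mean_energy(energy_sums, sample_counts))  # -1.0
```

Dividing the summed energies by the summed counts, rather than averaging the per-node means, weights every walker equally even when the nodes hold different numbers of walkers.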
The renormalisation of the number of walkers at the end of a block is performed across all the nodes in the following way.
It is important to try to keep the number of walkers on each node equal, i.e. to `load balance' the algorithm efficiently. The efficiency of the algorithm at any one time step is determined by
\[
\mathrm{efficiency} = \frac{N_{\mathrm{total}}}{\mathrm{NNODES} \times N_{\mathrm{max}}} ,
\]
where $N_{\mathrm{total}}$ is the total number of walkers across all the nodes and $N_{\mathrm{max}}$ is the number of walkers on the node which at that particular time step has the largest number of `live' walkers. The efficiency of the algorithm can therefore be improved in two ways,
In other words, no node can ever hold more than one walker more than any other node. For the DMC calculations reported in this thesis, there are typically 10 walkers per node, yielding an average efficiency of approximately 95%.
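The effect of load balancing on the efficiency can be checked with a short Python sketch. It assumes that the efficiency is the total number of walkers divided by NNODES times the largest per-node walker count, and that rebalancing leaves the per-node counts differing by at most one; the function names `parallel_efficiency` and `balanced_counts` are hypothetical and are not taken from the thesis code.

```python
def parallel_efficiency(walkers_per_node):
    """Assumed efficiency measure: N_total / (NNODES * N_max)."""
    n_max = max(walkers_per_node)
    if n_max == 0:
        raise ValueError("no live walkers on any node")
    return sum(walkers_per_node) / (len(walkers_per_node) * n_max)


def balanced_counts(n_total, nnodes):
    """Per-node walker counts that differ by at most one walker."""
    base, extra = divmod(n_total, nnodes)
    return [base + 1] * extra + [base] * (nnodes - extra)


# Hypothetical per-node counts after creation/annihilation on 4 nodes.
counts = [14, 8, 11, 9]
print(parallel_efficiency(counts))  # 0.75

# The same 42 walkers after rebalancing.
targets = balanced_counts(sum(counts), len(counts))
print(targets)  # [11, 11, 10, 10]
print(round(parallel_efficiency(targets), 3))  # 0.955
```

With roughly 10 walkers per node, a balanced ensemble gives an efficiency between 10/11 (about 91%) and 100%, which is consistent with an average of approximately 95%.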
The parallel DMC algorithm requires a set of equilibrated configurations as an input, in the same way as the serial DMC algorithm. These configurations are produced by instructing each node in the parallel VMC algorithm to write out an equilibrated ensemble of configurations.