Next: 8. Conclusions Up: Parallel fast Fourier transforms Previous: 6. Results

7. Load balancing

The completeness of the plane-wave basis set is usually specified by a single parameter, the kinetic energy cutoff $E_{\mathrm{cut}}$ . Only those plane-waves with a kinetic energy less than $E_{\mathrm{cut}}$ are included in the basis set used to describe the eigenfunctions. The cutoff defines a sphere in momentum-space, which is shown in Fig. 5 in relation to the full FFT grid.

**Figure 5:** The sphere of plane-waves included in the basis set in relation to the full FFT grid.
$\includegraphics [height=60mm]{sphere.eps}$

In order to ensure an even division of labour among the nodes (load balancing) of the parallel computer, it is necessary to assign a roughly equal number of plane-waves to each one. In the traditional method, this is achieved by sorting the -rods in order of their length (once they have been truncated by the cutoff sphere) and assigning the rods to the nodes in turn according to that order. This inevitably requires some book-keeping which can add to the cost of the traditional method.

We note here that the new method automatically satisifies the demand of load balancing. Since the plane-waves assigned to each node are spread uniformly throughout the grid (see e.g. Fig. 3), the effect of the cutoff sphere is to truncate the set of plane-waves on each node in the same fashion, so that the number of plane-waves on each node remains balanced. In fact, the cutoff sphere on the full FFT grid is transformed into a cutoff sphere on the FFT grid on each node in the new method. Thus the book-keeping cost is only associated with the local 3D-FFT performed on each node, whereas in the traditional method it affects the communication pattern between nodes (e.g. so that the packet sizes are no longer identical) and thus has a more significant effect on performance.

Next: 8. Conclusions Up: Parallel fast Fourier transforms Previous: 6. Results

Peter Haynes