In order to ensure an even division of labour among the nodes (load balancing) of the parallel computer, it is necessary to assign a roughly equal number of plane-waves to each one. In the traditional method, this is achieved by sorting the -rods in order of their length (once they have been truncated by the cutoff sphere) and assigning the rods to the nodes in turn according to that order. This inevitably requires some book-keeping which can add to the cost of the traditional method.
We note here that the new method automatically satisifies the demand of load balancing. Since the plane-waves assigned to each node are spread uniformly throughout the grid (see e.g. Fig. 3), the effect of the cutoff sphere is to truncate the set of plane-waves on each node in the same fashion, so that the number of plane-waves on each node remains balanced. In fact, the cutoff sphere on the full FFT grid is transformed into a cutoff sphere on the FFT grid on each node in the new method. Thus the book-keeping cost is only associated with the local 3D-FFT performed on each node, whereas in the traditional method it affects the communication pattern between nodes (e.g. so that the packet sizes are no longer identical) and thus has a more significant effect on performance.