In Fig. 4 we plot the results for two different
FFT grid sizes:
and
. The implementation of each method used the same communication
library call. The plotted results were obtained by averaging the
communication times over 50 convolutions (each comprising a forward and a
backward FFT) on the
grid and 100 convolutions on the
grid. The error bars show the resulting standard deviations. The
dashed and dotted lines show the best fits to Eqs. 4 and
5 respectively.
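
As a minimal sketch of the averaging step described above, the mean and the error bar for one data point could be computed as follows. The function name `summarize_timings` and the sample values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize_timings(times):
    """Return (mean, sample standard deviation) of a list of timings.

    Assumption: the error bars are sample standard deviations (n - 1
    in the denominator); the text does not specify the estimator.
    """
    if len(times) < 2:
        raise ValueError("need at least two timings to estimate a spread")
    return statistics.mean(times), statistics.stdev(times)


# Hypothetical communication times in seconds, for illustration only;
# these are not measurements from the experiments reported here.
sample = [0.12, 0.11, 0.13, 0.12]
mean, std = summarize_timings(sample)
print(f"mean = {mean:.4f} s, std = {std:.4f} s")
```

In the experiments the list would hold one communication time per convolution (50 or 100 entries, depending on the grid).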
[Figure 4]
The quality of the fits of Eqs. 4 and 5 to the
data supports our analysis. In particular, the cross-over occurs at a
smaller number of processors for the case of the smaller FFT grid, as
expected. On the machine used here, the new method is outperformed by
the old method for the reasons given above, and would be advantageous
only for very small FFT grids or very large numbers of nodes (over
1000). However, given the quality of the fit to the analysis of
Sec. 5, we would expect to be able to apply Eqs. 4
and 5 with confidence to estimate how the methods would scale
on other machines. For example, on the cluster of PCs mentioned in
Sec. 5, on which we measured a latency of about
and a bandwidth of
, the
expected cross-over points are 40 and 140 nodes for the
and
grids respectively.
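
The way a cross-over point can be read off from two fitted cost models is sketched below. This is an illustration only: `cross_over`, `toy_old` and `toy_new` are hypothetical names, and the two toy models are placeholders that do not reproduce Eqs. 4 and 5. In practice the two callables would evaluate those equations with the measured latency and bandwidth of the target machine.

```python
def cross_over(cost_old, cost_new, max_nodes=100_000):
    """Return the smallest node count p >= 2 at which the new method is
    no slower than the old one, or None if that never happens up to
    max_nodes.

    cost_old and cost_new are callables mapping a node count to a
    predicted communication time.
    """
    for p in range(2, max_nodes + 1):
        if cost_new(p) <= cost_old(p):
            return p
    return None


# Toy placeholder models, NOT Eqs. 4 and 5: they only mimic the
# qualitative behaviour in which the new method wins at large p.
def toy_old(p):
    return 2.0 * p


def toy_new(p):
    return 30.0 + 0.5 * p


print(cross_over(toy_old, toy_new))
```

A linear scan is used rather than a root finder because the node count is an integer and the range of interest is small.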