The traditional formulation of density-functional theory (DFT) [1] in terms of a set of extended single-particle wave functions has led to the development of schemes for performing total-energy calculations whose computational effort scales as the cube of the system-size (i.e. the number of atoms or electrons, or the volume of the system). This scaling results from the cost of diagonalizing the Hamiltonian or orthogonalizing the wave functions. Methods based upon the single-particle density-matrix (DM), which is free from orthogonality constraints and short-ranged in real-space, offer the prospect of electronic structure calculations at a cost which scales only linearly with system-size. We investigate one scheme that has been proposed for achieving this goal, which uses a penalty-functional to impose the idempotency constraint on the DM. We show that the form of penalty-functional which must be chosen to obtain a variational principle for the total energy precludes the use of the efficient minimization algorithms commonly employed in electronic structure calculations. We apply the method to crystalline silicon and show that the desired minimum cannot be found in practice, so that the variational principle cannot be exploited.
In terms of a set of orthonormal orbitals {ψ_i(r)} and occupation numbers {f_i}, the DM ρ(r, r′) is written as

ρ(r, r′) = Σ_i f_i ψ_i(r) ψ_i*(r′).    (1)

The ground-state DM is idempotent,

ρ² = ρ,    (3)

which requires that the occupation numbers take the values f_i ∈ {0, 1}. The energy functional is defined by

E[ρ] = 2 Tr(ρĤ) = 2 Σ_i f_i ⟨ψ_i|Ĥ|ψ_i⟩,    (4)

where Ĥ is the Kohn-Sham Hamiltonian and the factor of two accounts for spin degeneracy.
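As a concrete numerical sketch of these definitions (a hypothetical six-state example in a discrete basis, not taken from the paper), the DM built from orthonormal orbitals is idempotent exactly when every occupation number is 0 or 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # size of a hypothetical orthonormal basis (illustrative only)
C, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns = orbitals psi_i
f = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])      # occupation numbers

# rho = sum_i f_i |psi_i><psi_i|, cf. eq. (1) in a discrete basis
rho = (C * f) @ C.T
assert np.allclose(rho @ rho, rho)  # idempotent when every f_i is 0 or 1

f_frac = f.copy(); f_frac[2] = 0.5                # a fractional occupation
rho_frac = (C * f_frac) @ C.T
assert not np.allclose(rho_frac @ rho_frac, rho_frac)  # idempotency lost
```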
Exploiting the short-ranged behaviour of the DM, i.e. that ρ(r, r′) → 0 exponentially [2] as |r − r′| → ∞, by imposing a spatial cut-off r_c (such that ρ(r, r′) = 0 for |r − r′| > r_c) results in a linear-scaling method. The
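The effect of such a cut-off on the amount of data to be stored can be sketched with a toy one-dimensional model (the decay length and cut-off below are arbitrary illustrative choices): for a fixed cut-off, the number of retained DM elements grows linearly rather than quadratically with system-size.

```python
import numpy as np

def truncated_dm_nonzeros(n_sites, r_cut):
    # Toy 1-D model DM with exponential decay, truncated beyond r_cut sites
    i = np.arange(n_sites)
    sep = np.abs(i[:, None] - i[None, :])
    rho = np.where(sep <= r_cut, np.exp(-sep), 0.0)
    return np.count_nonzero(rho)

# Doubling the system size roughly doubles the storage for fixed cut-off
n_small = truncated_dm_nonzeros(100, 5)
n_large = truncated_dm_nonzeros(200, 5)
assert 1.9 * n_small < n_large < 2.1 * n_small
```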
most significant hurdle to overcome is the imposition of the
idempotency constraint. This can be achieved implicitly using a
purifying transformation [3] which has been
implemented in several tight-binding and DFT schemes [4].
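The purifying transformation in question is McWeeny's ρ → 3ρ² − 2ρ³, which drives occupation numbers near 0 or 1 towards exactly 0 or 1. A minimal sketch (a small random matrix with illustrative near-idempotent occupations, not a production scheme):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
C, _ = np.linalg.qr(rng.standard_normal((n, n)))
# Nearly idempotent starting DM: occupations close to (but not exactly) 0 or 1
f = np.array([0.9, 0.95, 1.05, 0.1, -0.02, 0.05])
rho = (C * f) @ C.T

for _ in range(20):
    rho = 3 * rho @ rho - 2 * rho @ rho @ rho  # McWeeny purification

# Occupations have flowed to 0 or 1: the DM is now idempotent,
# and its trace counts the states that converged to full occupation
assert np.allclose(rho @ rho, rho, atol=1e-10)
assert abs(np.trace(rho) - 3.0) < 1e-8
```

The iteration converges quadratically for eigenvalues in the interval (−1/2, 3/2), which is why it can be used to impose idempotency implicitly.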
An alternative approach to imposing the idempotency constraint has been proposed by Kohn [5], who suggested minimizing the functional defined by

E_Q[ρ̃; α] = E_NI[ρ̃²] + α P[ρ̃],    (5)

in which the charge density entering the non-interacting energy E_NI is calculated from the square of the trial DM, n(r) = 2 ρ̃²(r, r), and the penalty-functional is

P[ρ̃] = { Tr[ ρ̃² (1 − ρ̃)² ] }^{1/2},    (6)

which vanishes if and only if ρ̃ is idempotent.
Rather than minimizing the non-interacting energy, as proposed by Kohn, we can instead minimize the interacting energy self-consistently. Using the square of the DM to calculate the charge density in (5) has the advantage that it guarantees that the charge density is positive-definite. However, in order to simplify the analysis here we consider the functional defined by

Q[ρ; α] = E[ρ] + α P[ρ],    (7)

in which the charge density is obtained directly from the DM, n(r) = 2 ρ(r, r). In terms of the orbitals and occupation numbers of (1), the two contributions are

E[ρ] = 2 Σ_i f_i ⟨ψ_i|Ĥ|ψ_i⟩,    (8)

P[ρ] = [ Σ_i f_i² (1 − f_i)² ]^{1/2}.    (9)

The minimum of Q[ρ; α] coincides with the idempotent ground-state DM provided that

α > 2 max_i |ε_i − μ|,    (10)

where ε_i = ⟨ψ_i|Ĥ|ψ_i⟩ and μ is the chemical potential.
This is illustrated schematically in Fig. 1 for the case of
a single occupation number f corresponding to a state above the chemical
potential. For α > 2(ε − μ)
the minimum value of Q
is obtained when f = 0. This outlines the variational principle
established by Kohn, but we note that the functional Q is minimal
only in the sense that it takes its minimum value at the ground-state,
but not in the sense that its derivatives vanish at the ground-state,
since they are undefined at that point.
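This behaviour can be reproduced with a one-occupation-number model of the functional (all parameter values are illustrative, not taken from the paper): a state at energy ε above the chemical potential μ contributes q(f) = 2(ε − μ)f + α|f(1 − f)|, which for α > 2(ε − μ) is minimized at exactly f = 0, where the left and right slopes differ.

```python
# One-occupation-number model of the penalty term (illustrative parameters):
# q(f) = 2*(eps - mu)*f + alpha*|f*(1 - f)|
eps_minus_mu = 1.0
alpha = 5.0  # chosen larger than 2*(eps - mu)

def q(f):
    return 2.0 * eps_minus_mu * f + alpha * abs(f * (1.0 - f))

d = 1e-3
assert q(0.0) < q(d) and q(0.0) < q(-d)   # minimum exactly at f = 0

# One-sided slopes at f = 0 differ: a branch point, so the derivative
# is undefined at the minimum
slope_right = (q(d) - q(0.0)) / d    # about 2*(eps - mu) + alpha
slope_left = (q(0.0) - q(-d)) / d    # about 2*(eps - mu) - alpha
assert slope_right > 0.0 > slope_left
```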
The penalty-functional has a branch point at its minimum, due
to the square-root form employed in (6). However, this
square-root is crucial to establishing the variational principle. In
Fig. 2, the effect of using an analytic penalty-functional
(the square of P) is plotted, and it is clear that the minimum now
occurs for f ≠ 0 for all finite values of α. The total energy
calculated in this case will no longer be a variational upper bound to
the ground-state energy. We have recently introduced a method for
obtaining accurate estimates of the true ground-state energy from
such nearly-idempotent DMs, and details can be found elsewhere
[7]. Nevertheless, the non-analytic form of the
penalty-functional (6) must be employed if we wish to
obtain a variational principle for the energy.
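Repeating the one-occupation-number model with the squared (analytic) penalty α[f(1 − f)]² shows the displaced minimum; the parameters are again illustrative:

```python
import numpy as np

# Same single-state model, but with the analytic (squared) penalty:
# q2(f) = 2*(eps - mu)*f + alpha*(f*(1 - f))**2, smooth at f = 0
eps_minus_mu = 1.0
alpha = 5.0
f = np.linspace(-0.5, 1.5, 200001)
q2 = 2.0 * eps_minus_mu * f + alpha * (f * (1.0 - f)) ** 2

# The slope of the penalty vanishes at f = 0, so the energy term pushes
# the minimum away from the idempotent point
f_min = f[np.argmin(q2)]
assert f_min < 0.0
```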
Several schemes exist for directly minimizing functions of many
variables. The simplest of these is the method of steepest descents in
which the gradient of the function is used as a search direction in
the multidimensional space. The minimum value of the function along
this direction is found and the process iterated to convergence. In
Fig. 3a, the results of applying this method to an exactly
quadratic function
are plotted, starting from
the point (5,1). Successive search directions are always orthogonal,
and the method is not guaranteed to find the minimum in a finite
number of steps. The method is clearly inefficient, since all of the
information from previous function and gradient evaluations is ignored
when calculating new search directions. In contrast, the conjugate
gradients method [8] uses this information to construct
independent search directions. Each successive step effectively
eliminates one dimension of the space to be searched, so that the
minimum is found in a number of steps no greater than the
dimensionality of the space. The results for this method are plotted
in Fig. 3b for the same quadratic function plotted in Fig. 3a. This method has been successfully implemented in
traditional electronic structure calculations [9].
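The two methods can be compared on a small model quadratic (the matrix below is an arbitrary stand-in for the function of Fig. 3, whose exact form is not reproduced here): steepest descents only approaches the minimum gradually, while conjugate gradients terminates exactly in two steps in two dimensions.

```python
import numpy as np

# Model quadratic F(x) = 1/2 x^T A x with an assumed (illustrative) Hessian
A = np.array([[2.0, 0.0], [0.0, 10.0]])

def grad(x):
    return A @ x

def line_min(x, d):
    # Exact line minimum of the quadratic along direction d
    t = -(d @ grad(x)) / (d @ A @ d)
    return x + t * d

# Steepest descents: each direction is the negative local gradient
x_sd = np.array([5.0, 1.0])
for _ in range(50):
    x_sd = line_min(x_sd, -grad(x_sd))
assert np.linalg.norm(x_sd) < 1e-6   # converges, but only gradually

# Conjugate gradients (Fletcher-Reeves): exact in at most 2 steps in 2-D
x_cg = np.array([5.0, 1.0])
g = grad(x_cg)
d = -g
for _ in range(2):
    x_cg = line_min(x_cg, d)
    g_new = grad(x_cg)
    d = -g_new + (g_new @ g_new) / (g @ g) * d
    g = g_new
assert np.allclose(x_cg, 0.0, atol=1e-10)
```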
However, the conjugate gradients method relies on the accuracy of
a quadratic approximation of the function around the minimum. As
observed, the functional has a branch point at its minimum which
arises from the square-root that appears in the penalty-functional,
and is therefore non-analytic at the minimum. No multidimensional
Taylor expansion for the functional exists about the minimum, and so
the local information (functional values and gradients at points) used
to construct the conjugate directions gives a misleading picture of
the global behaviour of the functional. In Fig. 4a the
results of a steepest descents minimization of the square-root of the
quadratic function of Fig. 3 are plotted, and exactly the same
sequence of points is generated as in Fig. 3a: taking the square-root
leaves the contours, and hence the steepest descents directions and
line minima, unchanged. However, the
results obtained using the conjugate gradients method, plotted in
Fig. 4b, are now worse than for steepest descents.
In Fig. 4, the exact line minimum is found in each case. In practice, however, the position of the line minimum is usually estimated by making a parabolic interpolation of the functional along the search direction, using the initial value and first derivative of the functional and its value at a trial step. For a non-analytic functional this estimate will also be wrong, further reducing the efficiency of both methods.
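The failure can be reproduced on the square-root of the model quadratic used above (an illustrative stand-in for the paper's function). Because the square-root is monotonic, the exact line minimum along any direction coincides with that of the underlying quadratic, so the line search can still be done in closed form; the conjugate gradients construction, however, uses the rescaled gradients and no longer terminates.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 10.0]])  # model Hessian (illustrative)

def F(x):
    return np.sqrt(0.5 * x @ A @ x)      # branch point at the minimum x = 0

def grad_F(x):
    return (A @ x) / (2.0 * F(x))

def line_min(x, d):
    # sqrt is monotonic, so the exact line minimum coincides with that of
    # the underlying quadratic and is available in closed form
    t = -(d @ (A @ x)) / (d @ A @ d)
    return x + t * d

x = np.array([5.0, 1.0])
g = grad_F(x)
d = -g
for _ in range(10):
    x = line_min(x, d)
    if np.allclose(x, 0.0):
        break                            # guard: gradient undefined at 0
    g_new = grad_F(x)
    d = -g_new + (g_new @ g_new) / (g @ g) * d
    g = g_new

# Unlike the quadratic case (exact in two steps), the iteration stalls
# well away from the minimum even after ten steps
assert np.linalg.norm(x) > 1e-6
```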
These problems are not confined to the conjugate gradients method alone, but apply equally to any method which attempts to use gradients to build up an estimate of the Hessian of the functional, e.g. the Fletcher-Powell or Broyden-Fletcher-Goldfarb-Shanno algorithms [10]. The Car-Parrinello scheme [11] would also fail in this case, since the derivative of the penalty-functional would appear in the equations of motion for the molecular-dynamics Lagrangian. Simulated annealing methods [12] such as the Metropolis algorithm [13] will successfully minimize functions of this kind, since they do not use gradients, but the number of iterations required would increase with system-size as the number of dimensions to be searched increases, thus spoiling the linear scaling of the method.
We can attempt to find a set of conjugate directions for the
non-quadratic functions encountered here. For a function
with a quadratic minimum, characterized by a Hessian matrix A, a set
of conjugate directions {d_i} is defined by

d_iᵀ A d_j = 0  for i ≠ j.    (11)

Starting from the steepest descents direction d_0 = −g_0, successive
conjugate directions can be generated from the gradients g_i = ∇f(x_i)
using

d_{i+1} = −g_{i+1} + γ_i d_i,    (12)

γ_i = (g_{i+1} · g_{i+1}) / (g_i · g_i).    (13)
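On an exactly quadratic model function (the 3×3 matrix below is illustrative), directions generated by (12) and (13) with exact line minimization can be checked against the defining property (11):

```python
import numpy as np

# Quadratic model f(x) = 1/2 x^T A x with an illustrative positive-definite A
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, 0.0, 0.0])

g = A @ x          # gradient of the quadratic
d = -g             # initial steepest descents direction
dirs = [d]
for _ in range(2):
    t = -(d @ g) / (d @ A @ d)          # exact line minimum along d
    x = x + t * d
    g_new = A @ x
    gamma = (g_new @ g_new) / (g @ g)   # eq. (13)
    d = -g_new + gamma * d              # eq. (12)
    dirs.append(d)
    g = g_new

# eq. (11): every pair of generated directions is conjugate w.r.t. A
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(dirs[i] @ A @ dirs[j]) < 1e-8
```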
When using the conjugate gradients method to minimize functions
which are not exactly quadratic in form, it is common practice to
reset the conjugate directions after a certain number of iterations by
taking a steepest descent step. For functions of the same form as the
penalty-functional (6), the method becomes rapidly less efficient as the number of
successive conjugate gradients steps is increased for the reasons
discussed above. Indeed, for functions of many variables the method
may completely fail to find the true minimum. Instead, the method
appears to converge to a value which is not the minimum. This trend
can be seen in Fig. 4b, in which the conjugate directions
become orthogonal to the gradient. Therefore, the method fails not as
a result of false minima, but because the minimization becomes so slow
as to be indistinguishable from true convergence.
In order to see whether such behaviour appears in a genuine DFT
calculation, we have implemented this scheme to perform total energy
calculations, and have applied it to crystalline silicon. The
density-matrix was expanded in separable form in terms of a sparse
Hermitian matrix and a set of localized functions, as in other schemes
[4], and the localized functions themselves were centred on
the ions and expanded in terms of a localized spherical-wave basis-set
[14] using an energy cut-off of 200 eV and a fixed maximum
angular momentum component. No attempt was made to converge
the calculation with respect to the density-matrix cut-off. The
variational principle proved by Kohn states that for α above the
critical value, P[ρ] = 0 at the minimum. In Fig. 5, we
show the contribution of the penalty term αP[ρ] to the
total minimized functional Q[ρ; α]. As α is increased, this
contribution does not vanish above some critical value (estimated from
(10) and by Kohn's limit to be of the order of 50 eV for
silicon) but rather decreases slowly. Also plotted in Fig. 5 is the corresponding root-mean-square error in the
occupation numbers, which also decreases as α increases. Thus
the total energy will approach the true ground-state value, but no
variational principle can be invoked, since that principle holds only
when the penalty vanishes exactly.
This failure to observe Kohn's result in practice stems from
the inability of the minimization procedure to locate the true minimum
of the functional, owing to its non-analytic form.
In conclusion, we have implemented a method based upon the variational principle derived by Kohn [5] and demonstrated a fundamental difficulty in using it in a computational scheme, arising from the non-analyticity of the required penalty-functional. Calculations on crystalline silicon confirm the trends observed from considering simple model functions. We have shown that the variational principle cannot be exploited in practice because the nature of the functional makes it unsuitable for use with current minimization techniques. Other schemes based upon the purifying transformation, or which use an analytic penalty-functional to impose idempotency approximately (and subsequently correct for the error introduced by the residual lack of idempotency), will therefore be more efficient.
We acknowledge helpful discussions with E. Sandré. PDH was supported by an EPSRC studentship.