In this Chapter, we will describe -- in tensor notation -- a number of fundamental concepts that
will be highly relevant to the rest of our narrative. Thus, this Chapter will not only help you
increase your proficiency in the tensor notation, but will also help you prepare for what is to
come. Among other topics, we will cover linear combinations, the dot product, differentiation of
multivariable functions, as well as multivariable inverse functions. By the end of the Chapter, we
will be ready to return to Euclidean spaces in order to begin constructing the tensor framework.
8.1 Linear combinations in the tensor notation
Let us begin with linear combinations, the fundamental algebraic building block of Linear Algebra.
As a sum of products, a linear combination has the potential to be interpreted as a
contraction but, first, an important adjustment needs to be made.
Consider the decomposition of a vector $\mathbf{v}$ as a linear combination of the basis vectors $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ with coefficients $v_1$, $v_2$, and $v_3$, i.e.
$$\mathbf{v} = v_1\mathbf{b}_1 + v_2\mathbf{b}_2 + v_3\mathbf{b}_3.$$
Since both indices are subscripts, the above summation does not represent a valid contraction. In order to make it one, either the coefficients or the basis vectors must be enumerated by a superscript. We arbitrarily choose the coefficients. With the coefficients denoted by $v^1$, $v^2$, and $v^3$, the linear combination becomes
$$\mathbf{v} = v^1\mathbf{b}_1 + v^2\mathbf{b}_2 + v^3\mathbf{b}_3.$$
As a result, it is now a valid contraction. Invoking the summation convention, we can now express $\mathbf{v}$ by the remarkably compact equation
$$\mathbf{v} = v^i\mathbf{b}_i.$$
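To make the contraction concrete, here is a minimal numerical sketch (in Python with NumPy; the particular basis and components are arbitrary, hypothetical choices) showing that the compact equation $\mathbf{v} = v^i\mathbf{b}_i$ is an ordinary sum of products:

```python
import numpy as np

# A hypothetical basis for R^3: the rows of B are b_1, b_2, b_3.
B = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])

# Hypothetical components v^1, v^2, v^3.
v_components = np.array([2.0, -1.0, 3.0])

# The contraction v^i b_i, written as an explicit sum of products...
v_explicit = (v_components[0] * B[0]
              + v_components[1] * B[1]
              + v_components[2] * B[2])

# ...and as a single einsum contraction over the repeated index i.
v_contracted = np.einsum('i,ij->j', v_components, B)

assert np.allclose(v_explicit, v_contracted)
print(v_contracted)
```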
This example perfectly illustrates the guiding aspect of the tensor framework. Our decision to
enumerate the coefficients by a
superscript was governed by nothing other than our desire to invoke the summation convention.
However, in doing so, we have made a prediction of how the coefficients
transform under a change of basis. Crucially, our prediction is correct, as will be
confirmed in Chapter 14. We have highlighted this
surprising aspect of the tensor notation before but it is worth reiterating: it not only
captures the properties of objects, but also predicts them.
8.2 Equating linear combinations
We will frequently encounter the situation where the decomposition of one and the same vector with
respect to one and the same basis is arrived at in two different ways. Since, as is well known
from elementary Linear Algebra, the decomposition is unique, the decomposition coefficients -- i.e.
the components -- in the two alternatively-derived expansions must coincide. In other words, from
the equality of two linear combinations we can conclude the equality of the coefficients. While
this is a straightforward matter, the tensor notation captures linear combinations in such a
compact manner that this logic may sometimes prove elusive. It is therefore worth illustrating with
a few examples.
Suppose that the basis consists of the vectors $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$. Then from
$$u^i\mathbf{b}_i = v^i\mathbf{b}_i$$
we can conclude that
$$u^i = v^i.$$
To see how this conclusion is reached, unpack both sides of the former equation, i.e.
$$u^1\mathbf{b}_1 + u^2\mathbf{b}_2 + u^3\mathbf{b}_3 = v^1\mathbf{b}_1 + v^2\mathbf{b}_2 + v^3\mathbf{b}_3.$$
With the equation in this form, it becomes clear that the Linear Algebra principle of equating expansion coefficients applies. Also, the unpacked form prevents you from slipping into the incorrect argument of "dividing" both sides of the identity by $\mathbf{b}_i$. Note that even as your experience with the tensor notation grows, you will not cease to practice unpacking. You will simply learn to imagine and process the unpacked form more quickly.
The equality of linear combinations may appear in more algebraically complicated forms. For example, from the identity
$$A^i_j v^j\,\mathbf{b}_i = w^i\mathbf{b}_i$$
we can conclude that
$$A^i_j v^j = w^i.$$
Note that we need not concern ourselves with the interpretation of the combination $A^i_j v^j$ in order to reach this conclusion.
In situations where the repeated indices enumerating the elements of the basis are different on the two sides of the equation, e.g.
$$u^i\mathbf{b}_i = v^j\mathbf{b}_j,$$
you will need to rename one of the pairs of repeated indices so that the two pairs match. In the above example, switch the roles of $i$ and $j$ on the right, i.e.
$$u^i\mathbf{b}_i = v^i\mathbf{b}_i.$$
From this form of the identity, we are able to conclude that
$$u^i = v^i.$$
I hope that the foregoing examples illustrate that while the tensor notation may require some
practice, its compactness does not obfuscate the ideas that you may be used to seeing in the
unpacked form.
8.3 The relationship between matrices relating two bases
In the near future, we will study transformation of systems under a change of coordinates. The
corresponding topic in Linear Algebra is referred to as change of basis. In this Section, we
will use the tensor notation to demonstrate that the matrices relating the two bases are the
inverses of each other.
Suppose that $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ and $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$ are two alternative bases. Let the system $A^j_i$ express the elements of the basis $\mathbf{b}$ with respect to $\mathbf{a}$, i.e.
$$\mathbf{b}_i = A^j_i\mathbf{a}_j,$$
and $B^j_i$ express the elements of the basis $\mathbf{a}$ with respect to $\mathbf{b}$, i.e.
$$\mathbf{a}_i = B^j_i\mathbf{b}_j.$$
In other words, $A^j_i$ is the $j$-th component of $\mathbf{b}_i$ with respect to the basis $\mathbf{a}$ and $B^j_i$ is the $j$-th component of $\mathbf{a}_i$ with respect to the basis $\mathbf{b}$.
Because the systems $A^j_i$ and $B^j_i$ perform opposite conversions, we may anticipate that they are the inverses of each other in the matrix sense. This is, indeed, the case. Furthermore, this fact is naturally demonstrated in the tensor notation and will offer us an opportunity to practice combining indicial expressions.
The idea is to substitute the identity $\mathbf{b}_i = A^j_i\mathbf{a}_j$ into $\mathbf{a}_i = B^j_i\mathbf{b}_j$ and thus express the basis $\mathbf{a}$ in terms of itself. However, since the equation $\mathbf{a}_i = B^j_i\mathbf{b}_j$ features $\mathbf{b}_j$ (with a $j$) while the equation $\mathbf{b}_i = A^j_i\mathbf{a}_j$ features $\mathbf{b}_i$ (with an $i$), substitution is not possible until we properly coordinate the index names. In the identity
$$\mathbf{b}_i = A^j_i\mathbf{a}_j,$$
rename $j$ into $k$ (in order to free up $j$ for the next step), i.e.
$$\mathbf{b}_i = A^k_i\mathbf{a}_k,$$
and then $i$ into $j$, i.e.
$$\mathbf{b}_j = A^k_j\mathbf{a}_k.$$
Next, substitute the above identity into
$$\mathbf{a}_i = B^j_i\mathbf{b}_j,$$
which yields
$$\mathbf{a}_i = B^j_iA^k_j\,\mathbf{a}_k.$$
From here, one can proceed according
to one of two approaches: the tensor-novice or the tensor-expert. We will describe
both approaches. The tensor-novice approach analyzes the elementary operations that take place
under the hood. The tensor-expert approach showcases the elegance and the efficiency of the tensor
notation.
For the tensor-novice approach, begin by unpacking the identity
$$\mathbf{a}_i = B^j_iA^k_j\,\mathbf{a}_k$$
on the dummy index $j$, i.e.
$$\mathbf{a}_i = B^1_iA^k_1\mathbf{a}_k + B^2_iA^k_2\mathbf{a}_k + B^3_iA^k_3\mathbf{a}_k,$$
and subsequently on the live index $i$ which, as we discussed in the previous Chapter, expands the equation into a set of three, i.e.
$$\begin{aligned}
\mathbf{a}_1 &= B^1_1A^k_1\mathbf{a}_k + B^2_1A^k_2\mathbf{a}_k + B^3_1A^k_3\mathbf{a}_k\\
\mathbf{a}_2 &= B^1_2A^k_1\mathbf{a}_k + B^2_2A^k_2\mathbf{a}_k + B^3_2A^k_3\mathbf{a}_k\\
\mathbf{a}_3 &= B^1_3A^k_1\mathbf{a}_k + B^2_3A^k_2\mathbf{a}_k + B^3_3A^k_3\mathbf{a}_k,
\end{aligned}$$
where the identities are still not fully unpacked: each equation contains three un-unpacked contractions on the index $k$.
The above identities express the elements of the basis $\mathbf{a}$ with respect to itself. From elementary Linear Algebra, we know that there is a unique way of doing so and that is
$$\begin{aligned}
\mathbf{a}_1 &= 1\cdot\mathbf{a}_1 + 0\cdot\mathbf{a}_2 + 0\cdot\mathbf{a}_3\\
\mathbf{a}_2 &= 0\cdot\mathbf{a}_1 + 1\cdot\mathbf{a}_2 + 0\cdot\mathbf{a}_3\\
\mathbf{a}_3 &= 0\cdot\mathbf{a}_1 + 0\cdot\mathbf{a}_2 + 1\cdot\mathbf{a}_3.
\end{aligned}$$
Since equality of linear combinations implies equality of coefficients, we arrive at the following nine identities:
$$\begin{aligned}
B^j_1A^1_j &= 1, & B^j_1A^2_j &= 0, & B^j_1A^3_j &= 0,\\
B^j_2A^1_j &= 0, & B^j_2A^2_j &= 1, & B^j_2A^3_j &= 0,\\
B^j_3A^1_j &= 0, & B^j_3A^2_j &= 0, & B^j_3A^3_j &= 1.
\end{aligned}$$
Finally, note that with the help of the indicial notation and the Kronecker delta $\delta^k_i$, these nine identities can be captured by the single tensor equation
$$B^j_iA^k_j = \delta^k_i.$$
We are now able to formulate our conclusion. Since the Kronecker delta corresponds to the identity matrix, this identity states precisely that the matrices corresponding to $B^j_i$ and $A^k_j$ are the inverses of each other, as we set out to show. The advantage of the approach that we have just described is that it exposed almost every elementary low-level detail of the calculation. On the other hand, its exhaustive details may have obscured the simple algebraic structure of the equations. This approach therefore cannot be considered an effective use of the tensor notation.
The tensor-expert approach brings out the elegance of the argument by keeping the equations compact. In the equation
$$\mathbf{a}_i = B^j_i\mathbf{b}_j,$$
replace $\mathbf{b}_j$ with the equivalent expression $A^k_j\mathbf{a}_k$. The resulting identity
$$\mathbf{a}_i = B^j_iA^k_j\,\mathbf{a}_k,$$
read alongside the tautology $\mathbf{a}_i = \delta^k_i\mathbf{a}_k$, shows the equivalence of two linear combinations with respect to the same basis, which is precisely the situation we described earlier. Equating the coefficients, we find
$$\delta^k_i = B^j_iA^k_j.$$
Switching the sides, i.e.
$$B^j_iA^k_j = \delta^k_i,$$
we immediately recognize it as precisely the equation we set out to prove.
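As a quick numerical sanity check (a sketch in Python with NumPy; the bases and the way the systems $A^j_i$ and $B^j_i$ are computed here are our own illustrative choices), we can generate two bases, solve for both change-of-basis systems, and confirm that the corresponding matrices multiply to the identity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns of the matrices a and b are the basis vectors a_i and b_i.
a = rng.random((3, 3)) + np.eye(3)  # generic invertible matrices
b = rng.random((3, 3)) + np.eye(3)

# b_i = A^j_i a_j  means  b = a @ A  (the entry A[j, i] holds A^j_i).
A = np.linalg.solve(a, b)
# a_i = B^j_i b_j  means  a = b @ B.
B = np.linalg.solve(b, a)

# B^j_i A^k_j = delta^k_i  is the matrix identity  A @ B = I.
print(np.allclose(A @ B, np.eye(3)))  # True
```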
8.4 The dot product in the tensor notation
Recall from Chapter 2 that for two vectors $\mathbf{u}$ and $\mathbf{v}$, the matrix form of the component space expression for $\mathbf{u}\cdot\mathbf{v}$ reads
$$\mathbf{u}\cdot\mathbf{v} = U^TMV,$$
where $U$ and $V$ are the matrices representing the components of $\mathbf{u}$ and $\mathbf{v}$, and $M$ is the matrix of pairwise dot products of the basis vectors $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$, i.e.
$$M = \begin{bmatrix}
\mathbf{b}_1\cdot\mathbf{b}_1 & \mathbf{b}_1\cdot\mathbf{b}_2 & \mathbf{b}_1\cdot\mathbf{b}_3\\
\mathbf{b}_2\cdot\mathbf{b}_1 & \mathbf{b}_2\cdot\mathbf{b}_2 & \mathbf{b}_2\cdot\mathbf{b}_3\\
\mathbf{b}_3\cdot\mathbf{b}_1 & \mathbf{b}_3\cdot\mathbf{b}_2 & \mathbf{b}_3\cdot\mathbf{b}_3
\end{bmatrix}.$$
As elegant as the equation $\mathbf{u}\cdot\mathbf{v} = U^TMV$ may be, its tensor analogue is every bit as aesthetically pleasing. In fact, with the introduction of index juggling in Chapter 11, it will become even more so.
The entries of the matrix $M$ are naturally enumerated by a pair of subscripts, i.e.
$$M_{ij} = \mathbf{b}_i\cdot\mathbf{b}_j.$$
Once again, the choice of subscripts is dictated strictly by the rules of the tensor notation rather than some a priori insight into the nature of the system $M_{ij}$ or knowledge of how it transforms under a change of basis. The tensor notation stipulates that all terms in an identity must have matching indicial signatures. Thus, the fact that the expression $\mathbf{b}_i\cdot\mathbf{b}_j$ has two subscripts means that we have no choice but to enumerate the entries of $M$ by subscripts. The fact that this choice accurately predicts the manner in which $M_{ij}$ transforms under a change of basis, which will be confirmed in Chapter 14, is simply yet another illustration of the great predictive ability of the tensor notation.
Next, following the convention adopted earlier, which was also dictated by the rules of the tensor notation, enumerate the components $u^i$ and $v^i$ of $\mathbf{u}$ and $\mathbf{v}$ by superscripts, i.e.
$$\mathbf{u} = u^i\mathbf{b}_i \quad\text{and}\quad \mathbf{v} = v^i\mathbf{b}_i.$$
We will now show that in terms of $M_{ij}$, $u^i$, and $v^j$, the expression for $\mathbf{u}\cdot\mathbf{v}$ reads
$$\mathbf{u}\cdot\mathbf{v} = M_{ij}u^iv^j.$$
First, let us convince ourselves that this formula is correct by fully unpacking the contractions on the right that represent a sum of nine terms, i.e.
$$\begin{aligned}
\mathbf{u}\cdot\mathbf{v} ={}& M_{11}u^1v^1 + M_{12}u^1v^2 + M_{13}u^1v^3\\
{}+{}& M_{21}u^2v^1 + M_{22}u^2v^2 + M_{23}u^2v^3\\
{}+{}& M_{31}u^3v^1 + M_{32}u^3v^2 + M_{33}u^3v^3.
\end{aligned}$$
This equation is identical to the equation
$$\mathbf{u}\cdot\mathbf{v} = U^TMV$$
derived in Chapter 2, except for the fact that the components of the vectors are now enumerated by superscripts. Thus, the combination $M_{ij}u^iv^j$ indeed represents the dot product $\mathbf{u}\cdot\mathbf{v}$.
Thanks to the summation convention, the expression $M_{ij}u^iv^j$ is every bit as compact as its matrix analogue $U^TMV$. Furthermore, the indicial form offers a few notational advantages over the matrix form. First, it does not require the operation of the transpose. As we discussed in the previous Chapter, this speaks to the extreme economy of operations in Tensor Calculus. Second, as we also discussed in the previous Chapter, the order of the multiplicative terms in the expression $M_{ij}u^iv^j$ is immaterial. Third, the expression $M_{ij}u^iv^j$ gives access to the individual entries of the matrix $M$ as well as the individual components of $\mathbf{u}$ and $\mathbf{v}$. This will prove to be of crucial advantage in numerous applications, including the quadratic form minimization discussed below. Finally, note that with the help of index juggling introduced in Chapter 11, the expression $M_{ij}u^iv^j$ will be supplanted by the remarkably compact equivalent
$$\mathbf{u}\cdot\mathbf{v} = u_iv^i,$$
which, while valid for all bases, exhibits the utmost simplicity of the dot product expressed with respect to an orthonormal basis.
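For readers who like to see contractions executed numerically, here is a brief sketch (Python with NumPy; the particular basis and components are hypothetical) verifying that $M_{ij}u^iv^j$ agrees with the dot product computed directly:

```python
import numpy as np

# Columns of the matrix b form a hypothetical (non-orthonormal) basis of R^3.
b = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

M = b.T @ b  # M_ij = b_i . b_j, the matrix of pairwise dot products

u_comp = np.array([1.0, 2.0, 3.0])   # components u^i
v_comp = np.array([-1.0, 0.5, 2.0])  # components v^j

# The contraction M_ij u^i v^j ...
dot_tensor = np.einsum('ij,i,j->', M, u_comp, v_comp)

# ... should equal the dot product of the vectors u = u^i b_i and v = v^j b_j.
u = b @ u_comp
v = b @ v_comp
assert np.isclose(dot_tensor, u @ v)
print(dot_tensor)
```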
Next, let us re-derive the identity
$$\mathbf{u}\cdot\mathbf{v} = M_{ij}u^iv^j$$
strictly in the tensor notation, as opposed to by matching up with a previously established result. On the one hand, the upcoming calculation is almost too simple to be called a derivation. On the other hand, it does require a careful manipulation of indices and will therefore serve as a worthwhile exercise of the tensor notation.
Start with the decompositions of $\mathbf{u}$ and $\mathbf{v}$ in terms of the basis $\mathbf{b}_i$, i.e.
$$\mathbf{u} = u^i\mathbf{b}_i \quad\text{and}\quad \mathbf{v} = v^i\mathbf{b}_i.$$
Since we are about to combine these expressions in a single product, they cannot both use the index $i$ -- otherwise, we would end up with the invalid combination $u^iv^i\,\mathbf{b}_i\cdot\mathbf{b}_i$. Thus, we will keep $i$ in the expression for $\mathbf{u}$ and switch to $j$ in the expression for $\mathbf{v}$, i.e.
$$\mathbf{u} = u^i\mathbf{b}_i \quad\text{and}\quad \mathbf{v} = v^j\mathbf{b}_j.$$
Dotting the two identities, we find
$$\mathbf{u}\cdot\mathbf{v} = u^i\mathbf{b}_i\cdot v^j\mathbf{b}_j.$$
(Recall from Section 7.8.2 the discussion concerning the subtleties inherent in expressions that feature two or more simultaneous contractions.) Rearrange the terms on the right to bring the two vectors together, i.e.
$$\mathbf{u}\cdot\mathbf{v} = u^iv^j\,\mathbf{b}_i\cdot\mathbf{b}_j.$$
Since
$$\mathbf{b}_i\cdot\mathbf{b}_j = M_{ij},$$
we arrive at the desired result
$$\mathbf{u}\cdot\mathbf{v} = M_{ij}u^iv^j.$$
The foregoing discussion is important for two reasons. First, the dot product is a central operation in Geometry, and therefore in Tensor Calculus, and its component space representation -- later to be referred to as the coordinate space representation -- is of utmost value. Second, the discussion illustrated that the tensor notation is an effective tool for deriving algebraic relationships. Note that in Chapter 2, we essentially guessed the equation
$$\mathbf{u}\cdot\mathbf{v} = U^TMV$$
and subsequently observed its correctness. In this Section, with the help of the tensor notation, we were able to arrive at the equivalent equation
$$\mathbf{u}\cdot\mathbf{v} = M_{ij}u^iv^j$$
by straightforward algebraic manipulation.
8.5 The chain rule in the tensor notation
Many of the fundamental identities in Tensor Calculus are obtained by differentiating identities
involving composite functions. As a result, such analyses rely heavily on the use of the chain
rule. Fortunately, as we are about to demonstrate, the multivariate chain rule lends itself
perfectly to the tensor notation and illustrates another one of its natural applications.
8.5.1 The case of one independent variable
We will begin with a function $h$ of a single variable $t$ given by the composition
$$h(t) = F(x(t), y(t), z(t)),$$
where $F(x,y,z)$ is a function of three variables, and each of $x$, $y$, and $z$ is a function of $t$. According to the chain rule, the derivative $dh/dt$ is given in terms of the partial derivatives of $F$ and the ordinary derivatives of $x$, $y$, and $z$ by the identity
$$\frac{dh}{dt} = \frac{\partial F}{\partial x}\frac{dx}{dt} + \frac{\partial F}{\partial y}\frac{dy}{dt} + \frac{\partial F}{\partial z}\frac{dz}{dt}.$$
Note the convention that we will use throughout our narrative of suppressing the arguments of functions when they are clear from the context. With the full detail of the arguments included, the above equation would read
$$\frac{dh(t)}{dt} = \frac{\partial F(x(t),y(t),z(t))}{\partial x}\frac{dx(t)}{dt} + \frac{\partial F(x(t),y(t),z(t))}{\partial y}\frac{dy(t)}{dt} + \frac{\partial F(x(t),y(t),z(t))}{\partial z}\frac{dz(t)}{dt},$$
which makes it evident why we prefer the abbreviated form.
Our present goal is to express the right side in the tensor notation. Fortunately, being a sum of products, it is ready to be interpreted as a contraction. To this end, denote the arguments of $F$ by $Z^1$, $Z^2$, and $Z^3$, turning $F(x,y,z)$ into $F\left(Z^1,Z^2,Z^3\right)$. Also, denote the functions $x(t)$, $y(t)$, and $z(t)$ by $Z^1(t)$, $Z^2(t)$, and $Z^3(t)$. In terms of the new symbols, $h(t)$ is given by
$$h(t) = F\left(Z^1(t), Z^2(t), Z^3(t)\right),$$
while the expression for its derivative reads
$$\frac{dh}{dt} = \frac{\partial F}{\partial Z^1}\frac{dZ^1}{dt} + \frac{\partial F}{\partial Z^2}\frac{dZ^2}{dt} + \frac{\partial F}{\partial Z^3}\frac{dZ^3}{dt}.$$
Now the expression on the right can be easily captured with the help of the summation convention, i.e.
$$\frac{dh}{dt} = \frac{\partial F}{\partial Z^i}\frac{dZ^i}{dt}.$$
As we discussed at the end of Section 7.3, the index $i$ in the symbol $\frac{\partial F}{\partial Z^i}$ can be thought of as a subscript because it is a superscript in the "denominator" of a "fraction". Therefore, the repeated index appears once as a subscript and once as a superscript, thus properly triggering Einstein's summation convention.
Finally, if you still find it helpful to visualize the matrix form of indicial expressions, note that the above identity can be captured by the equation
$$\frac{dh}{dt} = \begin{bmatrix}\dfrac{\partial F}{\partial Z^1} & \dfrac{\partial F}{\partial Z^2} & \dfrac{\partial F}{\partial Z^3}\end{bmatrix}\begin{bmatrix}\dfrac{dZ^1}{dt}\\[4pt] \dfrac{dZ^2}{dt}\\[4pt] \dfrac{dZ^3}{dt}\end{bmatrix}.$$
We will continue to provide the corresponding matrix forms for each of the chain rule identities in this Section.
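As an illustration, the following sketch (Python with NumPy; the specific functions $F$ and $Z^i$ are hypothetical choices) compares the contraction $\frac{\partial F}{\partial Z^i}\frac{dZ^i}{dt}$ against a finite-difference estimate of $dh/dt$:

```python
import numpy as np

# A hypothetical F of three variables and its partial derivatives dF/dZ^i.
F = lambda Z: Z[0] ** 2 + Z[1] * Z[2]
dF = lambda Z: np.array([2 * Z[0], Z[2], Z[1]])

# Hypothetical functions Z^i(t) and their derivatives dZ^i/dt.
Z = lambda t: np.array([np.sin(t), np.cos(t), t ** 2])
dZ = lambda t: np.array([np.cos(t), -np.sin(t), 2 * t])

t, eps = 0.7, 1e-6

# The chain rule as a contraction: dh/dt = (dF/dZ^i)(dZ^i/dt).
dh_chain = dF(Z(t)) @ dZ(t)

# Finite-difference check of dh/dt for h(t) = F(Z(t)).
dh_fd = (F(Z(t + eps)) - F(Z(t - eps))) / (2 * eps)
print(dh_chain, dh_fd)  # the two values should agree closely
```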
8.5.2 The case of several independent variables
Let us now consider a function $h(u,v)$ of two variables formed by composing the function $F(x,y,z)$ with functions of two variables $x(u,v)$, $y(u,v)$, and $z(u,v)$, i.e.
$$h(u,v) = F(x(u,v), y(u,v), z(u,v)).$$
Even though in this example there are only two independent variables, our analysis will apply to functions with an arbitrary number of arguments.
Applying the chain rule for each independent variable, we find
$$\begin{aligned}
\frac{\partial h}{\partial u} &= \frac{\partial F}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial u}\\
\frac{\partial h}{\partial v} &= \frac{\partial F}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial v}.
\end{aligned}$$
Let us convert these identities into a single indicial equation. In addition to denoting the arguments of $F$ by $Z^i$ and the functions $x$, $y$, and $z$ by $Z^i$, denote the independent variables $u$ and $v$ as $U^1$ and $U^2$, i.e.
$$h\left(U^1,U^2\right) = F\left(Z^1\left(U^1,U^2\right), Z^2\left(U^1,U^2\right), Z^3\left(U^1,U^2\right)\right).$$
Collectively, we will refer to $U^1$ and $U^2$ as $U^\alpha$. The superscript $\alpha$ is taken from a different alphabet to highlight the fact that the number of independent variables $U^\alpha$ is different from the number of the arguments of $F$. In terms of the new symbols, the above differential identities read
$$\begin{aligned}
\frac{\partial h}{\partial U^1} &= \frac{\partial F}{\partial Z^1}\frac{\partial Z^1}{\partial U^1} + \frac{\partial F}{\partial Z^2}\frac{\partial Z^2}{\partial U^1} + \frac{\partial F}{\partial Z^3}\frac{\partial Z^3}{\partial U^1}\\
\frac{\partial h}{\partial U^2} &= \frac{\partial F}{\partial Z^1}\frac{\partial Z^1}{\partial U^2} + \frac{\partial F}{\partial Z^2}\frac{\partial Z^2}{\partial U^2} + \frac{\partial F}{\partial Z^3}\frac{\partial Z^3}{\partial U^2}.
\end{aligned}$$
We can now "pack" these identities into the single indicial equation
$$\frac{\partial h}{\partial U^\alpha} = \frac{\partial F}{\partial Z^i}\frac{\partial Z^i}{\partial U^\alpha}.$$
As usual, the repeated index $i$ represents a summation while the free index $\alpha$ enumerates independent equations. Note that the object $\frac{\partial Z^i}{\partial U^\alpha}$ represents six partial derivatives: the derivative of each of the three functions $Z^i$ with respect to each of the two variables $U^\alpha$.
If the Latin index is treated as first and the Greek as second, the above identity can be captured in matrix form by the equation
$$\begin{bmatrix}\dfrac{\partial h}{\partial U^1} & \dfrac{\partial h}{\partial U^2}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial F}{\partial Z^1} & \dfrac{\partial F}{\partial Z^2} & \dfrac{\partial F}{\partial Z^3}\end{bmatrix}\begin{bmatrix}\dfrac{\partial Z^1}{\partial U^1} & \dfrac{\partial Z^1}{\partial U^2}\\[4pt] \dfrac{\partial Z^2}{\partial U^1} & \dfrac{\partial Z^2}{\partial U^2}\\[4pt] \dfrac{\partial Z^3}{\partial U^1} & \dfrac{\partial Z^3}{\partial U^2}\end{bmatrix}.$$
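In code, the packed identity is again a single contraction. A minimal sketch (Python with NumPy; the arrays stand for hypothetical values of $\partial F/\partial Z^i$ and $\partial Z^i/\partial U^\alpha$ at some point):

```python
import numpy as np

dF_dZ = np.array([1.0, -2.0, 0.5])  # dF/dZ^i, shape (3,)
dZ_dU = np.array([[0.3, 1.0],       # dZ^i/dU^alpha, shape (3, 2):
                  [2.0, -1.0],      # row i, column alpha
                  [0.0, 4.0]])

# dh/dU^alpha = (dF/dZ^i)(dZ^i/dU^alpha): contract on i, alpha stays free.
dh_dU = np.einsum('i,ia->a', dF_dZ, dZ_dU)
print(dh_dU)  # two numbers, one per independent variable U^alpha
```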
A note on terminology is in order. The phrase differentiation with respect to $U^\alpha$ refers to the evaluation of the partial derivatives of a function with respect to each of the independent variables $U^1$ and $U^2$. However, from the point of view of the mechanics of differentiation, the derivatives are evaluated as if with respect to a single variable, such as $U^1$ or $U^2$. In other words, thanks to the tensor notation, the simultaneous nature of the operation does not increase the complexity of the analysis compared to the evaluation of a single derivative. As a matter of basic tensor proficiency, you should be able to go fluently from the equation
$$h(U) = F(Z(U))$$
defining $h$ in terms of $F$ and $Z^i$ to the equation
$$\frac{\partial h}{\partial U^\alpha} = \frac{\partial F}{\partial Z^i}\frac{\partial Z^i}{\partial U^\alpha}$$
that gives its derivatives in terms of the derivatives of $F$ and $Z^i$.
We will now turn our attention to the most general case of several functions of several variables.
8.5.3 The case of several functions of several independent variables
Finally, let us consider the general case of several functions $F^a$ of several variables $Z^i$ composed with a matching number of functions $Z^i$ of several variables $U^\alpha$. We have run out of alphabets, so we are going to use a letter, $a$, from a different part of the Latin alphabet to indicate that the number of functions $F^a$ may be different from the number of arguments $Z^i$ in each function and from the number of independent variables $U^\alpha$. For the sake of concreteness, suppose that there are four functions $F^a$, i.e. $F^1$, $F^2$, $F^3$, and $F^4$, and thus there are four composite functions $h^a$, i.e.
$$\begin{aligned}
h^1\left(U^1,U^2\right) &= F^1\left(Z^1\left(U^1,U^2\right), Z^2\left(U^1,U^2\right), Z^3\left(U^1,U^2\right)\right)\\
h^2\left(U^1,U^2\right) &= F^2\left(Z^1\left(U^1,U^2\right), Z^2\left(U^1,U^2\right), Z^3\left(U^1,U^2\right)\right)\\
h^3\left(U^1,U^2\right) &= F^3\left(Z^1\left(U^1,U^2\right), Z^2\left(U^1,U^2\right), Z^3\left(U^1,U^2\right)\right)\\
h^4\left(U^1,U^2\right) &= F^4\left(Z^1\left(U^1,U^2\right), Z^2\left(U^1,U^2\right), Z^3\left(U^1,U^2\right)\right).
\end{aligned}$$
Pack these equations into a single one with the help of a live index $a$, i.e.
$$h^a(U) = F^a(Z(U)).$$
Differentiating the combined equation with respect to $U^1$ and $U^2$ yields
$$\begin{aligned}
\frac{\partial h^a}{\partial U^1} &= \frac{\partial F^a}{\partial Z^1}\frac{\partial Z^1}{\partial U^1} + \frac{\partial F^a}{\partial Z^2}\frac{\partial Z^2}{\partial U^1} + \frac{\partial F^a}{\partial Z^3}\frac{\partial Z^3}{\partial U^1}\\
\frac{\partial h^a}{\partial U^2} &= \frac{\partial F^a}{\partial Z^1}\frac{\partial Z^1}{\partial U^2} + \frac{\partial F^a}{\partial Z^2}\frac{\partial Z^2}{\partial U^2} + \frac{\partial F^a}{\partial Z^3}\frac{\partial Z^3}{\partial U^2}.
\end{aligned}$$
These equations represent a total of eight identities, as each equation represents four identities corresponding to $a = 1$, $2$, $3$, and $4$. Express the contractions on the right by using a dummy index $i$, i.e.
$$\frac{\partial h^a}{\partial U^1} = \frac{\partial F^a}{\partial Z^i}\frac{\partial Z^i}{\partial U^1} \quad\text{and}\quad \frac{\partial h^a}{\partial U^2} = \frac{\partial F^a}{\partial Z^i}\frac{\partial Z^i}{\partial U^2},$$
and, subsequently, combine the two equations into a single one by using a free index $\alpha$, i.e.
$$\frac{\partial h^a}{\partial U^\alpha} = \frac{\partial F^a}{\partial Z^i}\frac{\partial Z^i}{\partial U^\alpha}.$$
This single equation captures the partial derivatives $\frac{\partial h^a}{\partial U^\alpha}$ of the functions $h^a$ in terms of the partial derivatives of $F^a$ and $Z^i$. The application of the chain rule to multivariate composite functions will be one of the most common operations going forward.
Finally, let us give the matrix form of this equation. If the superscript $a$ enumerating the functions is considered first and the subscript enumerating the variables second, then the corresponding equation in matrix form reads
$$\begin{bmatrix}\dfrac{\partial h^1}{\partial U^1} & \dfrac{\partial h^1}{\partial U^2}\\ \vdots & \vdots\\ \dfrac{\partial h^4}{\partial U^1} & \dfrac{\partial h^4}{\partial U^2}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial F^1}{\partial Z^1} & \dfrac{\partial F^1}{\partial Z^2} & \dfrac{\partial F^1}{\partial Z^3}\\ \vdots & \vdots & \vdots\\ \dfrac{\partial F^4}{\partial Z^1} & \dfrac{\partial F^4}{\partial Z^2} & \dfrac{\partial F^4}{\partial Z^3}\end{bmatrix}\begin{bmatrix}\dfrac{\partial Z^1}{\partial U^1} & \dfrac{\partial Z^1}{\partial U^2}\\[4pt] \dfrac{\partial Z^2}{\partial U^1} & \dfrac{\partial Z^2}{\partial U^2}\\[4pt] \dfrac{\partial Z^3}{\partial U^1} & \dfrac{\partial Z^3}{\partial U^2}\end{bmatrix}.$$
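The matrix product above is, once again, a single contraction in code. A sketch (Python with NumPy; the arrays are hypothetical Jacobian values at a point):

```python
import numpy as np

rng = np.random.default_rng(1)
dF_dZ = rng.random((4, 3))  # dF^a/dZ^i: 4 functions of 3 arguments
dZ_dU = rng.random((3, 2))  # dZ^i/dU^alpha: 3 functions of 2 variables

# dh^a/dU^alpha = (dF^a/dZ^i)(dZ^i/dU^alpha): contract on the dummy index i.
dh_dU = np.einsum('ai,ib->ab', dF_dZ, dZ_dU)

# The same contraction is exactly the matrix product of the two Jacobians.
assert np.allclose(dh_dU, dF_dZ @ dZ_dU)
print(dh_dU.shape)  # (4, 2): one derivative per (a, alpha) pair
```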
The natural ability of the tensor notation to handle the chain rule pays immediate dividends in the
analysis of inverse functions, to which we now turn.
8.6 Inverse functions
As we have already mentioned on a number of occasions, the tensor property describes how a
system transforms under a change of coordinates. A change of coordinates is, in turn, specified by
two sets of inverse functions. In this Section, we will derive the relationship between the partial
derivatives of those sets of functions. This exercise will serve the dual purpose of preparing us
for future analyses of coordinate changes as well as increasing our fluency with the tensor
notation.
8.6.1 Coordinate transformations as inverse functions
To describe a change of coordinates in an $n$-dimensional space requires a set of $n$ functions of $n$ variables. Indeed, we must specify how each of the $n$ new coordinates is obtained from the $n$ old coordinates. For example, the functions that describe the transformation from Cartesian coordinates $x, y$ to polar coordinates $r, \theta$ are
$$r(x,y) = \sqrt{x^2+y^2} \quad\text{and}\quad \theta(x,y) = \arctan\frac{y}{x}.$$
Naturally, the inverse coordinate transformation -- from the new coordinates $r, \theta$ back to the old coordinates $x, y$ -- is also described by a set of $n$ functions of $n$ variables. For the same coordinate transformation, the functions that describe the inverse transformation are
$$x(r,\theta) = r\cos\theta \quad\text{and}\quad y(r,\theta) = r\sin\theta.$$
Thus, we have two sets of functions, i.e. those that translate from old coordinates to new and those that translate from new to old. By definition, the two sets are function inverses of each other. In other words, if one set maps the values $x, y$ to $r, \theta$, then the other sends $r, \theta$ back to $x, y$.
Remaining in two dimensions for now, denote the functions that translate the old coordinates to the new coordinates by $F^1$ and $F^2$ or, collectively, $F^i$. Denote the functions that translate in the opposite direction by $G^1$ and $G^2$ or, collectively, $G^i$. For the transformation between Cartesian and polar coordinates, we have
$$\begin{aligned}
F^1(x,y) &= \sqrt{x^2+y^2}, & F^2(x,y) &= \arctan\frac{y}{x},\\
G^1(r,\theta) &= r\cos\theta, & G^2(r,\theta) &= r\sin\theta.
\end{aligned}$$
Let us confirm that these sets of functions are indeed the inverses of each other by evaluating the composite functions
$$F^1\left(G^1(r,\theta), G^2(r,\theta)\right) \quad\text{and}\quad F^2\left(G^1(r,\theta), G^2(r,\theta)\right).$$
We expect $F^1$ to recover $r$ and $F^2$ to recover $\theta$. Substituting $r\cos\theta$ and $r\sin\theta$ for the arguments of $F^1$ and $F^2$, we find
$$F^1\left(G^1(r,\theta), G^2(r,\theta)\right) = \sqrt{(r\cos\theta)^2 + (r\sin\theta)^2} = r.$$
It is left as an exercise to show that
$$\arctan\frac{r\sin\theta}{r\cos\theta} = \theta,$$
and therefore
$$F^2\left(G^1(r,\theta), G^2(r,\theta)\right) = \theta,$$
as we set out to show. It is left as an exercise to show that composing the functions in the opposite way yields the same result, i.e.
$$G^1\left(F^1(x,y), F^2(x,y)\right) = x \quad\text{and}\quad G^2\left(F^1(x,y), F^2(x,y)\right) = y.$$
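These compositions are easy to spot-check numerically. A brief Python sketch (the sample point is an arbitrary choice):

```python
import numpy as np

r, theta = 1.5, 0.8  # an arbitrary point with r > 0

# G: polar -> Cartesian, then F: Cartesian -> polar.
x, y = r * np.cos(theta), r * np.sin(theta)
r_back = np.sqrt(x ** 2 + y ** 2)
theta_back = np.arctan2(y, x)  # arctan(y/x), with the correct quadrant

print(np.isclose(r_back, r), np.isclose(theta_back, theta))  # True True
```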
In $n$ dimensions, the collection of the partial derivatives of the functions $F^i$ with respect to each of their $n$ independent variables is a second-order system with $n^2$ elements. The same is true of the partial derivatives of $G^i$ with respect to each of their independent variables. The two systems are connected by an extraordinarily elegant relationship which we are about to derive. We will discover that the tensor notation, and its particular effectiveness in expressing the chain rule, truly shines in this application.
8.6.2 Functions of one variable
Let us start our discussion with functions of one variable. Consider two ordinary functions $f$ and $g$ that are the inverses of each other, i.e. if $f$ maps the number $x$ to the number $y$, then $g$ maps $y$ back to $x$. In other words, if
$$y = f(x),$$
then
$$x = g(y).$$
Here are a few examples of inverse functions that we may use to confirm the derived relationships:
$$\begin{aligned}
f(x) &= 2x, & g(y) &= \frac{y}{2},\\
f(x) &= e^x, & g(y) &= \ln y,\\
f(x) &= x^3, & g(y) &= y^{1/3}.
\end{aligned}$$
Let us also document their derivatives:
$$\begin{aligned}
f'(x) &= 2, & g'(y) &= \frac{1}{2},\\
f'(x) &= e^x, & g'(y) &= \frac{1}{y},\\
f'(x) &= 3x^2, & g'(y) &= \frac{1}{3}y^{-2/3}.
\end{aligned}$$
While, on the face of it, the derivatives $f'(x)$ and $g'(y)$ do not appear to be related in an obvious way, you may remember from ordinary Calculus that they are, in a certain sense, the algebraic reciprocals of each other. Of course, they are not reciprocals in the sense that
$$g'(x) = \frac{1}{f'(x)},$$
which can be immediately seen from the examples above. Instead, the proper relationship can be described like this: if $y$ is the image of $x$ under $f$, i.e.
$$y = f(x)$$
or, equivalently,
$$x = g(y),$$
then
$$g'(y) = \frac{1}{f'(x)}.$$
In other words, the derivatives are the reciprocals of each other at the appropriate values of their arguments -- namely, $g'$ must be evaluated at $y$, i.e. the image of $x$, rather than $x$ itself. It is left as an exercise for the reader to remind themselves why this relationship makes perfect sense when one considers the relationship between the graphs of the inverse functions $f$ and $g$.
Let us confirm this relationship for one of the examples above. Suppose that $f(x) = x^3$, $x = 2$, and thus $y = 8$. Since $f'(x) = 3x^2$ and $x = 2$, we have
$$f'(2) = 12.$$
Since $g'(y) = \frac{1}{3}y^{-2/3}$, we have
$$g'(8) = \frac{1}{3}\cdot 8^{-2/3} = \frac{1}{3}\cdot\frac{1}{4} = \frac{1}{12},$$
and therefore the relationship
$$g'(y) = \frac{1}{f'(x)}$$
indeed holds since both sides equal $\frac{1}{12}$.
Let us now present a derivation of the relationship
$$g'(y) = \frac{1}{f'(x)},$$
which will serve as a blueprint for deriving the analogous relationship in the multidimensional case. Most differential relationships are derived by forming an identity with respect to the independent variables and subsequently evaluating the derivative of both sides with respect to each variable. In the case of inverse functions of one variable, the identity reads
$$g(f(x)) = x.$$
In words, it states that applying $g$ to $f(x)$ recovers the value of $x$. An application of the chain rule yields
$$g'(f(x))\,f'(x) = 1,$$
from which we have
$$g'(y) = \frac{1}{f'(x)},$$
which is precisely the relationship we set out to derive.
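A two-line numerical check of this identity (a Python sketch; the pair $f(x) = x^3$, $g(y) = y^{1/3}$ follows the example above):

```python
f = lambda x: x ** 3
g = lambda y: y ** (1.0 / 3.0)

x, eps = 2.0, 1e-6
fp = (f(x + eps) - f(x - eps)) / (2 * eps)        # f'(x) by finite differences
gp = (g(f(x) + eps) - g(f(x) - eps)) / (2 * eps)  # g'(y) evaluated at y = f(x)

print(fp * gp)  # should be very close to 1
```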
We will now turn our attention to inverse pairs of functions of two variables and, subsequently, to the general case of inverse sets of $n$ functions of $n$ variables.
8.6.3 Inverse sets of two functions of two variables
Denote one pair of functions by $F^1$ and $F^2$ and the inverse pair by $G^1$ and $G^2$. For the sake of greater clarity, we will use the letters $x$ and $y$ for the arguments of $F^i$, and $u$ and $v$ for the arguments of $G^i$. By definition, the two sets of functions are the inverses of each other if
$$\begin{aligned}
F^1\left(G^1(u,v), G^2(u,v)\right) &= u\\
F^2\left(G^1(u,v), G^2(u,v)\right) &= v.
\end{aligned}$$
In other words, if $G^1$ and $G^2$ send $u$ and $v$ to $x$ and $y$, then $F^1$ and $F^2$ send $x$ and $y$ back to $u$ and $v$. For example, as we showed above, the functions describing the coordinate transformation from Cartesian to polar coordinates -- or between any two coordinate systems, for that matter -- represent such sets of functions.
Differentiate each one of the above identities with respect to $u$ and $v$. Applying the chain rule to the first identity yields
$$\begin{aligned}
\frac{\partial F^1}{\partial x}\frac{\partial G^1}{\partial u} + \frac{\partial F^1}{\partial y}\frac{\partial G^2}{\partial u} &= 1\\
\frac{\partial F^1}{\partial x}\frac{\partial G^1}{\partial v} + \frac{\partial F^1}{\partial y}\frac{\partial G^2}{\partial v} &= 0.
\end{aligned}$$
Doing the same for the second identity yields
$$\begin{aligned}
\frac{\partial F^2}{\partial x}\frac{\partial G^1}{\partial u} + \frac{\partial F^2}{\partial y}\frac{\partial G^2}{\partial u} &= 0\\
\frac{\partial F^2}{\partial x}\frac{\partial G^1}{\partial v} + \frac{\partial F^2}{\partial y}\frac{\partial G^2}{\partial v} &= 1.
\end{aligned}$$
As we did earlier in the Chapter, we omitted the arguments of the functions for the sake of conciseness. A more detailed version of, say, the first identity would read
$$\frac{\partial F^1\left(G^1(u,v), G^2(u,v)\right)}{\partial x}\frac{\partial G^1(u,v)}{\partial u} + \frac{\partial F^1\left(G^1(u,v), G^2(u,v)\right)}{\partial y}\frac{\partial G^2(u,v)}{\partial u} = 1.$$
However, this level of detail obscures the overall structure of the expressions. We will, therefore, continue to use the concise form, but we must remember that the derivatives of $F^1$ and $F^2$ are to be evaluated at $x = G^1(u,v)$ and $y = G^2(u,v)$.
When we organize the partial derivatives into matrices
$$\begin{bmatrix}\dfrac{\partial F^1}{\partial x} & \dfrac{\partial F^1}{\partial y}\\[4pt] \dfrac{\partial F^2}{\partial x} & \dfrac{\partial F^2}{\partial y}\end{bmatrix} \quad\text{and}\quad \begin{bmatrix}\dfrac{\partial G^1}{\partial u} & \dfrac{\partial G^1}{\partial v}\\[4pt] \dfrac{\partial G^2}{\partial u} & \dfrac{\partial G^2}{\partial v}\end{bmatrix},$$
we observe that the four equations above are captured by the identity
$$\begin{bmatrix}\dfrac{\partial F^1}{\partial x} & \dfrac{\partial F^1}{\partial y}\\[4pt] \dfrac{\partial F^2}{\partial x} & \dfrac{\partial F^2}{\partial y}\end{bmatrix}\begin{bmatrix}\dfrac{\partial G^1}{\partial u} & \dfrac{\partial G^1}{\partial v}\\[4pt] \dfrac{\partial G^2}{\partial u} & \dfrac{\partial G^2}{\partial v}\end{bmatrix} = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}.$$
Thus, the two matrices representing the partial derivatives of the functions $F^i$ and $G^i$ are the inverses of each other. This is our central conclusion, and it is a direct generalization of the one-dimensional identity
$$g'(y)\,f'(x) = 1.$$
Let us confirm that the newly discovered relationship holds for the transformation between Cartesian and polar coordinates. Recall that the two coordinate systems are related by the equations
$$r = \sqrt{x^2+y^2},\quad \theta = \arctan\frac{y}{x} \qquad\text{and}\qquad x = r\cos\theta,\quad y = r\sin\theta.$$
Thus, using the letters $u$ and $v$, and $x$ and $y$ for the independent variables, the functions $F^i$ and $G^i$ are given by
$$\begin{aligned}
F^1(x,y) &= \sqrt{x^2+y^2}, & F^2(x,y) &= \arctan\frac{y}{x},\\
G^1(u,v) &= u\cos v, & G^2(u,v) &= u\sin v.
\end{aligned}$$
Evaluating their derivatives, we find
$$\frac{\partial F^1}{\partial x} = \frac{x}{\sqrt{x^2+y^2}},\quad \frac{\partial F^1}{\partial y} = \frac{y}{\sqrt{x^2+y^2}},\quad \frac{\partial F^2}{\partial x} = -\frac{y}{x^2+y^2},\quad \frac{\partial F^2}{\partial y} = \frac{x}{x^2+y^2}$$
and
$$\frac{\partial G^1}{\partial u} = \cos v,\quad \frac{\partial G^1}{\partial v} = -u\sin v,\quad \frac{\partial G^2}{\partial u} = \sin v,\quad \frac{\partial G^2}{\partial v} = u\cos v.$$
Once again, keep in mind that the partial derivatives of $F^i$ are to be evaluated at
$$x = u\cos v \quad\text{and}\quad y = u\sin v.$$
Performing this substitution, we find
$$\frac{\partial F^1}{\partial x} = \cos v,\quad \frac{\partial F^1}{\partial y} = \sin v,\quad \frac{\partial F^2}{\partial x} = -\frac{\sin v}{u},\quad \frac{\partial F^2}{\partial y} = \frac{\cos v}{u}.$$
Then, multiplying the two matrices yields
$$\begin{bmatrix}\cos v & \sin v\\[4pt] -\dfrac{\sin v}{u} & \dfrac{\cos v}{u}\end{bmatrix}\begin{bmatrix}\cos v & -u\sin v\\ \sin v & u\cos v\end{bmatrix} = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix},$$
which confirms the general identity.
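The same confirmation can be carried out numerically. A sketch (Python with NumPy) at an arbitrarily chosen point:

```python
import numpy as np

u, v = 2.0, 0.6  # an arbitrary point (r, theta) with u != 0

# Jacobian of F = (r(x, y), theta(x, y)) evaluated at x = u cos v, y = u sin v.
dF = np.array([[np.cos(v),      np.sin(v)],
               [-np.sin(v) / u, np.cos(v) / u]])

# Jacobian of G = (x(u, v), y(u, v)) = (u cos v, u sin v).
dG = np.array([[np.cos(v), -u * np.sin(v)],
               [np.sin(v),  u * np.cos(v)]])

print(np.allclose(dF @ dG, np.eye(2)))  # True: the Jacobians are inverses
```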
8.6.4 Inverse sets of $n$ functions of $n$ variables
Finally, let us analyze the general $n$-dimensional case by using the tensor notation from start to finish. Suppose that the sets of functions $F^i$ and $G^i$ are the inverses of each other. In the tensor notation, this relationship is captured by the single identity
$$F^i(G(U)) = U^i.$$
Differentiate this identity with respect to $U^j$. In effect, we are simultaneously differentiating each of the $n$ equations represented by the above identity with respect to each of the $n$ independent variables. We find
$$\frac{\partial F^i}{\partial Z^k}\frac{\partial G^k}{\partial U^j} = \frac{\partial U^i}{\partial U^j}.$$
As we described in Section 7.3, the expression on the right is captured by the Kronecker delta symbol $\delta^i_j$, i.e.
$$\frac{\partial U^i}{\partial U^j} = \delta^i_j.$$
Thus, an application of the chain rule yields
$$\frac{\partial F^i}{\partial Z^k}\frac{\partial G^k}{\partial U^j} = \delta^i_j,$$
where we once again dropped all of the functional arguments for the sake of conciseness. That is all there is to it -- this is the central identity that we set out to establish. The fact that we achieved it in one effortless step is a tribute to the effectiveness of the tensor notation.
As we have done previously, arrange the partial derivatives into the matrices
$$J = \begin{bmatrix}\dfrac{\partial F^1}{\partial Z^1} & \cdots & \dfrac{\partial F^1}{\partial Z^n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial F^n}{\partial Z^1} & \cdots & \dfrac{\partial F^n}{\partial Z^n}\end{bmatrix} \quad\text{and}\quad J' = \begin{bmatrix}\dfrac{\partial G^1}{\partial U^1} & \cdots & \dfrac{\partial G^1}{\partial U^n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial G^n}{\partial U^1} & \cdots & \dfrac{\partial G^n}{\partial U^n}\end{bmatrix}.$$
In Chapter 13, we will refer to these matrices as the Jacobians $J$ and $J'$ of the coordinate transformation. In terms of these matrices, the tensor identity reads
$$JJ' = I.$$
Thus, we have proven the general result that the matrices of partial derivatives of inverse sets of functions are the inverses of each other. As we have already mentioned, this fundamental fact plays a critical role in the construction of the tensor framework. More importantly for our present purposes, the calculation demonstrated the great effectiveness of the tensor notation.
8.7 Quadratic form minimization
Quadratic form minimization is not particularly relevant to the goals of this book. However, it is
an essential problem in Applied Mathematics and, additionally, our discussion will serve as an
excellent illustration of one important feature of the tensor notation: its ability to access the
individual elements of systems.
Suppose that $x$ is a vector in the sense of $\mathbb{R}^n$ described in Section 2.7. Denote the entries of $x$ by $x^i$. Quadratic form minimization is the task of finding the minimum of the function
$$f(x) = \frac{1}{2}x^TMx - x^Tb,$$
where $M$ is a symmetric positive definite matrix and $b$ is an arbitrary vector.
Finding the extremal values of a function is a classical problem in ordinary Calculus. If you recall, the extremal values occur at those points where all partial derivatives of $f$ are equal to zero. However, the above form is not conducive to differentiation since the latter requires access to the individual entries of $x$. Thus, the tensor notation is far better suited for this calculation.
Let us write the expression for $f$ in the tensor form
$$f = \frac{1}{2}M_{ij}x^ix^j - b_ix^i.$$
Since the indices $i$ and $j$ are being used for the contractions, we are unable to differentiate the above identity with respect to $x^i$ or $x^j$. Indeed, the expressions
$$\frac{\partial}{\partial x^i}\left(\frac{1}{2}M_{ij}x^ix^j - b_ix^i\right) \quad\text{and}\quad \frac{\partial}{\partial x^j}\left(\frac{1}{2}M_{ij}x^ix^j - b_ix^i\right)$$
are invalid. Thus, instead, we will differentiate the above identity with respect to $x^k$, i.e.
$$\frac{\partial f}{\partial x^k} = \frac{\partial}{\partial x^k}\left(\frac{1}{2}M_{ij}x^ix^j - b_ix^i\right).$$
As we discussed in Section 7.8.3, the product rule applies to contractions as if they were simple products. Therefore, we have
$$\frac{\partial f}{\partial x^k} = \frac{1}{2}M_{ij}\frac{\partial x^i}{\partial x^k}x^j + \frac{1}{2}M_{ij}x^i\frac{\partial x^j}{\partial x^k} - b_i\frac{\partial x^i}{\partial x^k}.$$
As we discussed in Section 7.3, the derivatives $\frac{\partial x^i}{\partial x^k}$ and $\frac{\partial x^j}{\partial x^k}$ are perfectly captured by the Kronecker delta symbol, i.e.
$$\frac{\partial x^i}{\partial x^k} = \delta^i_k \quad\text{and}\quad \frac{\partial x^j}{\partial x^k} = \delta^j_k.$$
Thus,
$$\frac{\partial f}{\partial x^k} = \frac{1}{2}M_{ij}\delta^i_kx^j + \frac{1}{2}M_{ij}x^i\delta^j_k - b_i\delta^i_k.$$
Since $M_{ij}\delta^i_k = M_{kj}$, $M_{ij}\delta^j_k = M_{ik}$, and $b_i\delta^i_k = b_k$, we have
$$\frac{\partial f}{\partial x^k} = \frac{1}{2}M_{kj}x^j + \frac{1}{2}M_{ik}x^i - b_k.$$
In order to collect like terms, the independent variables must be enumerated by the same index. This can be achieved by renaming the repeated index $j$ into $i$ in the first term, i.e.
$$\frac{\partial f}{\partial x^k} = \frac{1}{2}M_{ki}x^i + \frac{1}{2}M_{ik}x^i - b_k.$$
Next, factor out $x^i$ and switch the order of the terms inside the parentheses, i.e.
$$\frac{\partial f}{\partial x^k} = \frac{1}{2}\left(M_{ik} + M_{ki}\right)x^i - b_k.$$
Equating the partial derivative $\frac{\partial f}{\partial x^k}$ to zero, we arrive at the linear system that determines the critical values of the independent variables $x^i$:
$$\frac{1}{2}\left(M_{ik} + M_{ki}\right)x^i = b_k.$$
If you prefer to use the indices $i$ and $j$ in the final equation, you may rewrite this equation as
$$\frac{1}{2}\left(M_{ij} + M_{ji}\right)x^j = b_i.$$
For a symmetric system $M_{ij}$, i.e.
$$M_{ij} = M_{ji},$$
the above equation reads
$$M_{ij}x^j = b_i.$$
This is the classical conclusion for the problem of quadratic form minimization.
For, perhaps, one final time, let us unpack the identity
$$\frac{1}{2}\left(M_{ij} + M_{ji}\right)x^j = b_i$$
in order to show the elementary equations that it represents. Taking $n = 3$, begin by unpacking the free index $i$ to reveal the three individual equations
$$\begin{aligned}
\tfrac{1}{2}\left(M_{1j} + M_{j1}\right)x^j &= b_1\\
\tfrac{1}{2}\left(M_{2j} + M_{j2}\right)x^j &= b_2\\
\tfrac{1}{2}\left(M_{3j} + M_{j3}\right)x^j &= b_3.
\end{aligned}$$
Next, unpack the contraction in each of the equations, i.e.
$$\begin{aligned}
M_{11}x^1 + \tfrac{1}{2}\left(M_{12} + M_{21}\right)x^2 + \tfrac{1}{2}\left(M_{13} + M_{31}\right)x^3 &= b_1\\
\tfrac{1}{2}\left(M_{21} + M_{12}\right)x^1 + M_{22}x^2 + \tfrac{1}{2}\left(M_{23} + M_{32}\right)x^3 &= b_2\\
\tfrac{1}{2}\left(M_{31} + M_{13}\right)x^1 + \tfrac{1}{2}\left(M_{32} + M_{23}\right)x^2 + M_{33}x^3 &= b_3.
\end{aligned}$$
For a symmetric $M_{ij}$, the unpacked version of the corresponding equation $M_{ij}x^j = b_i$ is
$$\begin{aligned}
M_{11}x^1 + M_{12}x^2 + M_{13}x^3 &= b_1\\
M_{21}x^1 + M_{22}x^2 + M_{23}x^3 &= b_2\\
M_{31}x^1 + M_{32}x^2 + M_{33}x^3 &= b_3.
\end{aligned}$$
Finally, let us rewrite the equation
$$\frac{1}{2}\left(M_{ij} + M_{ji}\right)x^j = b_i$$
in the matrix form. If $M_{ij}$ corresponds to the matrix $M$, then $M_{ji}$ corresponds to $M^T$ and, therefore, the above equation reads
$$\frac{1}{2}\left(M + M^T\right)x = b.$$
For a symmetric matrix $M$, i.e. $M = M^T$, this equation assumes the classical form
$$Mx = b.$$
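As a numerical illustration (a sketch in Python with NumPy; the particular $M$ and $b$ are hypothetical), we can solve $Mx = b$ and confirm that the solution is indeed a minimum of $f(x) = \frac{1}{2}x^TMx - x^Tb$ by checking that small perturbations never decrease $f$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((3, 3))
M = A @ A.T + 3 * np.eye(3)  # a hypothetical symmetric positive definite M
b = rng.random(3)

f = lambda x: 0.5 * x @ M @ x - x @ b

x_star = np.linalg.solve(M, b)  # the critical point: M x = b

# For positive definite M, f(x* + dx) - f(x*) = (1/2) dx^T M dx >= 0.
for _ in range(5):
    dx = 1e-3 * rng.standard_normal(3)
    assert f(x_star + dx) >= f(x_star)
print(x_star)
```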
8.8 Exercises
Exercise 8.1 For
$$h(t) = F\left(Z^1(t), Z^2(t), Z^3(t)\right),$$
show that the second derivative $\frac{d^2h}{dt^2}$ is given by
$$\frac{d^2h}{dt^2} = \frac{\partial^2F}{\partial Z^i\partial Z^j}\frac{dZ^i}{dt}\frac{dZ^j}{dt} + \frac{\partial F}{\partial Z^i}\frac{d^2Z^i}{dt^2}.$$
Exercise 8.2 For
$$h(U) = F(Z(U)),$$
show that the collection of second derivatives $\frac{\partial^2h}{\partial U^\alpha\partial U^\beta}$ is given by
$$\frac{\partial^2h}{\partial U^\alpha\partial U^\beta} = \frac{\partial^2F}{\partial Z^i\partial Z^j}\frac{\partial Z^i}{\partial U^\alpha}\frac{\partial Z^j}{\partial U^\beta} + \frac{\partial F}{\partial Z^i}\frac{\partial^2Z^i}{\partial U^\alpha\partial U^\beta}.$$
Exercise 8.3 For
$$h^a(U) = F^a(Z(U)),$$
show that the collection of second derivatives $\frac{\partial^2h^a}{\partial U^\alpha\partial U^\beta}$ is given by
$$\frac{\partial^2h^a}{\partial U^\alpha\partial U^\beta} = \frac{\partial^2F^a}{\partial Z^i\partial Z^j}\frac{\partial Z^i}{\partial U^\alpha}\frac{\partial Z^j}{\partial U^\beta} + \frac{\partial F^a}{\partial Z^i}\frac{\partial^2Z^i}{\partial U^\alpha\partial U^\beta}.$$
Exercise 8.4 Show that
Exercise 8.5 For
confirm that
Exercise 8.6 Explain why the relationship
$$g'(y) = \frac{1}{f'(x)}$$
makes sense by describing the relationship between the graphs of the inverse functions $f$ and $g$.
Exercise 8.7 Confirm the relationship
$$g'(y) = \frac{1}{f'(x)}$$
for the examples of inverse functions given in Section 8.6.2, as well as a handful of other pairs of inverse functions of your choice.
Exercise 8.8 For inverse functions $f$ and $g$, derive the second-order equation
$$g''(y) = -\frac{f''(x)}{f'(x)^3}$$
and verify this identity for the examples of inverse functions given in Section 8.6.2.
Exercise 8.9 Derive the third-order equation analogous to the one in the previous exercise and test it against the same set of functions.
Exercise 8.10 Show that the location of the minimum of the ordinary function
$$f(x) = \frac{1}{2}mx^2 - bx$$
is given by the equation
$$mx = b.$$
Note the complete analogy with the multivariate function
$$f(x) = \frac{1}{2}M_{ij}x^ix^j - b_ix^i,$$
whose minimum is given by the equation
$$M_{ij}x^j = b_i.$$