6. Coordinate Systems in a Euclidean Space

6.1 Introductory remarks

A coordinate system is a method for enumerating points in a Euclidean space by numbers. In order for the coordinate system to be reasonably regular, the number of coordinates must match the dimension of the Euclidean space, i.e. three coordinates in a three-dimensional space, two coordinates on a plane, and one coordinate on a straight line. Furthermore, the correspondence between the points and the coordinates must be reasonably smooth. More precisely, the position vector

\mathbf{R}

should be a sufficiently differentiable function of the coordinates. Other than this requirement, a coordinate system may be completely arbitrary.

For a general coordinate system, the coordinates will be denoted by the capital letter

Z

with a superscript, i.e.

Z^{1},Z^{2},Z^{3}

or, collectively,

Z^{i}

. When indicating the coordinates of a particular point, we will put the coordinates in parentheses, i.e.

\left( Z^{1},Z^{2} ,Z^{3}\right)

. The unusual placement of the index as a superscript is a crucial element of the tensor notation which is the bedrock of Tensor Calculus. Generally speaking, the term tensor notation refers to the use of indices, both as superscripts and subscripts, to enumerate sets of related objects. Its most basic elements will be described in Chapter 7. Many of its other important elements, such as index juggling, will emerge in later chapters.

The use of a superscript for enumerating coordinates is a completely arbitrary choice, and we could have just as well chosen to use a subscript. However, Tensor Calculus has strict rules for coordinating the placements of indices. Once we have chosen to use a superscript for coordinates, the placement of all other indices is uniquely determined.

Obviously, there are an unlimited number of ways to impose a coordinate system upon a Euclidean space. There are a handful of well-known families of coordinate systems that are frequently used for analyzing problems with special geometries. The most common coordinate systems in a three-dimensional space are Cartesian or, more generally, affine coordinates denoted by

x,y,z

, cylindrical coordinates denoted by

r,\theta,z

, and spherical coordinates denoted by

r,\theta,\varphi

. In two dimensions, the most common coordinate systems are once again Cartesian or affine coordinates denoted by

x,y

and polar coordinates denoted by

r,\theta

.

In two dimensions, a coordinate system can be represented graphically by its coordinate lines, i.e. curves that consist of points that correspond to a fixed value of one variable while the other is allowed to vary. The following figure illustrates the coordinate lines for a generic coordinate system

Z^{1},Z^{2}

in the plane.

(6.1)

In three dimensions, coordinate lines are replaced by coordinate surfaces, i.e. surfaces that correspond to a fixed value of one variable while the other two are allowed to vary. We will use this method of illustrating coordinate systems for cylindrical and spherical coordinates later in this Chapter.

6.2 An example illustrating the great utility of coordinates

In the early chapters, we discovered the impressive utility of geometric vectors when treated as pure geometric objects. However, we also observed the serious limitations of pure geometric methods. Most of these limitations are removed by the use of coordinate systems. We will now begin to explore the remarkable power of analytical methods that leverage the utility of coordinate systems.

For a simple but effective demonstration, let us revisit the problem of differentiating the vector-valued function

\mathbf{R}\left( \gamma\right)

that corresponds to the unit circle as

\gamma

changes from

0

to

2\pi

.

(6.2)

In Chapter 4, we solved this problem by a geometric analysis in a coordinate-free setting. Our solution was intuitive, insightful, and intellectually satisfying. On the other hand, our argument was lengthy and is, in practice, applicable only to very simple problems. Just imagine an

\mathbf{R}\left( \gamma\right)

that traces out an ellipse instead of a circle -- the problem instantly becomes worthy of an eighteenth-century graduate thesis. With the help of coordinates, the circle and the ellipse are equally simple, and the solution is quicker, more straightforward, and more powerful than the coordinate-free approach.

Introduce a Cartesian coordinate system

x,y

with the origin at the center of the circle. Let the unit vectors pointing in the direction of the coordinate axes be denoted by

\mathbf{i}

and

\mathbf{j}

.

(6.3)

Then

\mathbf{R}\left( \gamma\right)

is given by the equation

\mathbf{R}\left( \gamma\right) =\mathbf{i}\cos\gamma+\mathbf{j}\sin\gamma\tag{6.4}

for which differentiation with respect to

\gamma

readily yields

\mathbf{R}^{\prime}\left( \gamma\right) =-\mathbf{i}\sin\gamma +\mathbf{j}\cos\gamma.\tag{6.5}

The resulting analytical expression for

\mathbf{R}^{\prime}\left( \gamma\right)

can now be interpreted geometrically. The following figure shows

\mathbf{R}^{\prime}\left( \gamma\right)

placed at the tip of

\mathbf{R}\left( \gamma\right)

.

(6.6)

It is clear that the vector

\mathbf{R}^{\prime}\left( \gamma\right)

is a unit vector orthogonal to

\mathbf{R}\left( \gamma\right)

. This can also be verified by evaluating the dot products

\mathbf{R}^{\prime}\left( \gamma\right) \cdot\mathbf{R}^{\prime}\left( \gamma\right)

and

\mathbf{R}\left( \gamma\right) \cdot\mathbf{R}^{\prime}\left( \gamma\right)

. Note that the inner product matrix with respect to the basis

\mathbf{i},\mathbf{j}

is the identity matrix, i.e.

\left[ \begin{array} {cc} \mathbf{i}\cdot\mathbf{i} & \mathbf{i}\cdot\mathbf{j}\\ \mathbf{i}\cdot\mathbf{j} & \mathbf{j}\cdot\mathbf{j} \end{array} \right] =\left[ \begin{array} {cc} 1 & 0\\ 0 & 1 \end{array} \right] ,\tag{6.7}

and, therefore, according to the formula for evaluating dot products in the component space derived in Section 2.6,

\mathbf{R}^{\prime}\left( \gamma\right) \cdot\mathbf{R}^{\prime}\left( \gamma\right) = \begin{array} {c} \left[ \begin{array} {cc} -\sin\gamma & \cos\gamma \end{array} \right] \\ \\ \end{array} \left[ \begin{array} {cc} 1 & 0\\ 0 & 1 \end{array} \right] \left[ \begin{array} {r} -\sin\gamma\\ \cos\gamma \end{array} \right] =1\tag{6.8}

and

\mathbf{R}\left( \gamma\right) \cdot\mathbf{R}^{\prime}\left( \gamma\right) = \begin{array} {c} \left[ \begin{array} {cc} \cos\gamma & \sin\gamma \end{array} \right] \\ \\ \end{array} \left[ \begin{array} {cc} 1 & 0\\ 0 & 1 \end{array} \right] \left[ \begin{array} {r} -\sin\gamma\\ \cos\gamma \end{array} \right] =0\tag{6.9}

where the first identity confirms that

\mathbf{R}^{\prime}\left( \gamma\right)

has unit length and the second confirms that it is orthogonal to

\mathbf{R}\left( \gamma\right)

.
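
As a quick numerical sanity check of equations (6.8) and (6.9), the following Python sketch evaluates both dot products at a few sample values of the parameter. Storing vectors as pairs of components with respect to the basis i, j, as well as the helper names and the sample values, are assumptions made purely for illustration.

```python
import math

def R(gamma):
    # Components of R(gamma) with respect to the basis i, j, as in equation (6.4).
    return (math.cos(gamma), math.sin(gamma))

def R_prime(gamma):
    # Components of R'(gamma), as in equation (6.5).
    return (-math.sin(gamma), math.cos(gamma))

def dot(u, v):
    # Sum of products of components. This equals the dot product only because
    # the inner product matrix (6.7) of the basis i, j is the identity.
    return u[0] * v[0] + u[1] * v[1]

samples = [0.0, 0.7, math.pi / 3, 2.5, 5.0]  # arbitrary sample values of gamma
checks = [(dot(R_prime(g), R_prime(g)), dot(R(g), R_prime(g))) for g in samples]

for g, (length_sq, ortho) in zip(samples, checks):
    print(f"gamma={g:.3f}  R'.R'={length_sq:.12f}  R.R'={ortho:.12f}")
```

Up to rounding error, the first number in each row should equal one and the second should vanish.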

6.3 An example illustrating the peril of coordinates

Consider the problem that appeared in Exercise 4.16 in Chapter 4. Given a point

A

and a curve

\Gamma

, show that for the point

B

on

\Gamma

that is closest to

A

, the segment

AB

is orthogonal to

\Gamma

.

(6.10)

The intended solution was as follows. Let

\mathbf{R}\left( \gamma\right)

be the vector equation of the curve

\Gamma

, where the origin for the position vector

\mathbf{R}

is placed at

A

. The problem then is to find the value

\gamma_{0}

of

\gamma

that yields the shortest vector

\mathbf{R} \left( \gamma\right)

.

(6.11)

Denote the objective function by

F\left( \gamma\right)

, i.e.

F\left( \gamma\right) =\mathbf{R}\left( \gamma\right) \cdot\mathbf{R} \left( \gamma\right) ,\tag{6.12}

where we omitted the square root of the right side since, for a nonnegative quantity, minimizing it is equivalent to minimizing its square. Suppose that the minimum of

F\left( \gamma\right)

occurs at

\gamma=\gamma_{0}

, i.e.

F^{\prime}\left( \gamma_{0}\right) =0.\tag{6.13}

By the dot product rule

\left( \mathbf{U}\cdot\mathbf{V}\right) ^{\prime}=\mathbf{U}^{\prime} \cdot\mathbf{V}+\mathbf{U}\cdot\mathbf{V}^{\prime},\tag{6.14}

we find that the derivative

F^{\prime}\left( \gamma\right)

is given by

F^{\prime}\left( \gamma\right) =2\mathbf{R}\left( \gamma\right) \cdot\mathbf{R}^{\prime}\left( \gamma\right) .\tag{6.15}

Equating

F^{\prime}\left( \gamma\right)

to zero, we conclude that the critical value

\gamma_{0}

is characterized by the equation

\mathbf{R}\left( \gamma_{0}\right) \cdot\mathbf{R}^{\prime}\left( \gamma_{0}\right) =0.\tag{6.16}

In other words,

\mathbf{R}\left( \gamma_{0}\right)

, which corresponds to the segment

AB

, is orthogonal to the tangent

\mathbf{R}^{\prime}\left( \gamma_{0}\right)

, as we set out to prove.

The great advantage of this approach is, of course, its geometric insight. By considering vectors themselves rather than their components, we never let go of the geometric meaning and, as a result, the final identity yielded itself to an immediate geometric interpretation. On the other hand, the great shortcoming of this approach is that, while it perfectly characterizes the solution in geometric terms, it does not provide a means of finding it for a specific geometric configuration, i.e. finding the specific point

B

on a specific curve

\Gamma

that is closest to a specific point

A

.

Let us demonstrate the coordinate approach by attempting the same problem with the help of Cartesian coordinates. Suppose that the coordinates of the point

A

are

\left( x_{1},y_{1}\right)

and that the curve

\Gamma

is given by the functions

x\left( \gamma\right)

and

y\left( \gamma\right)

.

(6.17)

Then the objective function

F\left( \gamma\right)

is given by the equation

F\left( \gamma\right) =\left( x\left( \gamma\right) -x_{1}\right) ^{2}+\left( y\left( \gamma\right) -y_{1}\right) ^{2}.\tag{6.18}

Its derivative

F^{\prime}\left( \gamma\right)

is

F^{\prime}\left( \gamma\right) =2\left( x\left( \gamma\right) -x_{1}\right) x^{\prime}\left( \gamma\right) +2\left( y\left( \gamma\right) -y_{1}\right) y^{\prime}\left( \gamma\right) .\tag{6.19}

Equating

F^{\prime}\left( \gamma_{0}\right)

to

0

, we obtain the desired algebraic equation for

\gamma_{0}

, i.e.

\left( x\left( \gamma_{0}\right) -x_{1}\right) x^{\prime}\left( \gamma_{0}\right) +\left( y\left( \gamma_{0}\right) -y_{1}\right) y^{\prime}\left( \gamma_{0}\right) =0.\tag{6.20}

The great advantage of this approach is, of course, the fact that, for a specific problem, it can identify the specific point of the curve

\Gamma

that is closest to the point

A

. For example, if

\Gamma

is the parabola given by the equations

\begin{aligned}x\left( \gamma\right) & =\gamma\ \ \ \ \ \ \ \ \ \ \left(6.21\right)\\y\left( \gamma\right) & =\gamma^{2},\ \ \ \ \ \ \ \ \ \ \left(6.22\right)\end{aligned}

and the coordinates of

A

are

\left( 3,1\right)

,

(6.23)

then the equation for

\gamma_{0}

reads

\left( \gamma_{0}-3\right) \times1+\left( \gamma_{0}^{2}-1\right) \times\left( 2\gamma_{0}\right) =0\tag{6.24}

or

2\gamma_{0}^{3}-\gamma_{0}-3=0.\tag{6.25}

An approximate solution of this equation,

\gamma_{0}\approx 1.289623901485060347262

, gives a precise location of the sought after point

B

.
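
The approximate root quoted above can be reproduced with a few lines of Python. The sketch below applies Newton's method to equation (6.25) and then checks that the segment from A to B is orthogonal to the tangent of the parabola at B. The choice of Newton's method, the starting guess, and the function names are assumptions for illustration, not the method used in the text.

```python
def f(g):
    # Left side of equation (6.25).
    return 2.0 * g**3 - g - 3.0

def f_prime(g):
    # Derivative of the left side of (6.25) with respect to gamma.
    return 6.0 * g**2 - 1.0

def newton(g, steps=50, tol=1e-15):
    # Plain Newton iteration; stops early once the update becomes negligible.
    for _ in range(steps):
        step = f(g) / f_prime(g)
        g -= step
        if abs(step) < tol:
            break
    return g

gamma0 = newton(1.0)  # the starting guess 1.0 is an arbitrary choice

A = (3.0, 1.0)
B = (gamma0, gamma0**2)         # point on the parabola (6.21)-(6.22)
AB = (B[0] - A[0], B[1] - A[1])
tangent = (1.0, 2.0 * gamma0)   # (x'(gamma0), y'(gamma0))
orthogonality = AB[0] * tangent[0] + AB[1] * tangent[1]

print("gamma0 =", gamma0)
print("B =", B)
print("AB . tangent =", orthogonality)
```

The last printed value is the left side of equation (6.20) evaluated at the computed root, so it should vanish up to rounding error.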

What, then, is the great disadvantage of this approach? It is this: neither the precise numerical answer for the specific problem, nor the more general equation

\left( x\left( \gamma_{0}\right) -x_{1}\right) x^{\prime}\left( \gamma_{0}\right) +\left( y\left( \gamma_{0}\right) -y_{1}\right) y^{\prime}\left( \gamma_{0}\right) =0 \tag{6.20}

yields the geometric insight that

AB

must be orthogonal to the curve. While it is true that an experienced eye may spot the dot-product structure in the equation above, keep in mind that this is one of the simplest problems one may encounter. In a more complicated situation, the geometric interpretation is likely to be irrevocably lost with the introduction of coordinates. This phenomenon is exemplified by Euler's minimal surface equation

r^{\prime\prime}\left( z\right) r\left( z\right) -r^{\prime}\left( z\right) ^{2}-1=0 \tag{1.2}

briefly discussed in Chapter 1, which did not yield the geometric insight that a minimal surface is characterized by zero mean curvature.

The last two examples have demonstrated both the great utility and the great peril of coordinate systems. The beauty of Tensor Calculus is in its remarkable ability to combine the geometric and the coordinate approaches in a way that extracts the full benefits of both.

6.4 A common ill-advised way of introducing special coordinate systems

In all likelihood, you are already familiar with the most common special coordinate systems described below. Nevertheless, I hope that you do not skip this discussion since it describes coordinate systems differently from most textbooks. The common approach to introducing a special coordinate system is to relate it to an a priori Cartesian coordinate system. This approach is typified by the following figure from the Wikipedia article on spherical coordinates, where one notices the ever-present background Cartesian grid.

(6.26)

Subsequently, spherical coordinates

r,\theta,\varphi

are related to Cartesian coordinates

x,y,z

by the equations

\begin{aligned}r & =\sqrt{x^{2}+y^{2}+z^{2}}\ \ \ \ \ \ \ \ \ \ \left(6.27\right)\\\theta & =\arctan\left( z,\sqrt{x^{2}+y^{2}}\right)\ \ \ \ \ \ \ \ \ \ \left(6.28\right)\\\varphi & =\arctan\left( x,y\right)\ \ \ \ \ \ \ \ \ \ \left(6.29\right)\end{aligned}

as well as the (more elegant) inverse equations

\begin{aligned}x & =r\sin\theta\cos\varphi\ \ \ \ \ \ \ \ \ \ \left(6.30\right)\\y & =r\sin\theta\sin\varphi\ \ \ \ \ \ \ \ \ \ \left(6.31\right)\\z & =r\cos\theta.\ \ \ \ \ \ \ \ \ \ \left(6.32\right)\end{aligned}
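
For readers who wish to experiment with these textbook formulas, here is a minimal Python sketch of the conversion in both directions. It assumes that the colatitude is determined by requiring its tangent to equal the ratio of the square root of x^2+y^2 to z, and the tangent of the azimuth to equal y/x, which is what the inverse equations (6.30)-(6.32) imply. The function names are hypothetical. Note that Python's math.atan2(b, a) takes the ordinate first and the abscissa second.

```python
import math

def to_spherical(x, y, z):
    # Radius, colatitude and azimuth of the point with Cartesian coordinates x, y, z.
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(math.hypot(x, y), z)    # colatitude, between 0 and pi
    phi = math.atan2(y, x) % (2.0 * math.pi)   # azimuth, reduced modulo 2*pi
    return r, theta, phi

def to_cartesian(r, theta, phi):
    # Equations (6.30)-(6.32).
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

r, theta, phi = to_spherical(1.0, 2.0, 2.0)
x, y, z = to_cartesian(r, theta, phi)
print("spherical:", (r, theta, phi))
print("round trip:", (x, y, z))
```

The round trip should return the original triple up to rounding error. The angles are not meaningful at the origin, and the azimuth is not meaningful on the z-axis.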

This common approach violates the spirit of Tensor Calculus by arbitrarily singling out a single coordinate system -- in this case, the Cartesian coordinates

x,y,z

. From the point of view of the geometric space, this approach is not only aesthetically and philosophically objectionable but is, in fact, logically flawed since it does not describe how the coordinates

x,y,z

were introduced in the first place. As a result, the construction is, at its very outset, detached from the very Euclidean space that it is meant to describe. For example, one is not able to answer the question: what is the distance between the points with Cartesian coordinates

\left( 0,0,0\right)

and

\left( 1,2,2\right)

? If one answers

\sqrt{1^{2}+2^{2}+2^{2}}=3

, then it would seem that the presence of the coordinate system has imposed the concept of length upon the parent Euclidean space. This is contrary to our approach in which the relationship is logically reversed: the concept of length comes first as an inalienable characteristic of the Euclidean space. Thus, the better alternative, and one that is consistent with the spirit of Tensor Calculus, is to describe the coordinate system in absolute terms by referring to the inherent geometric characteristics of the Euclidean space. This will be our approach.

6.5 Cartesian coordinates

Let us start with Cartesian coordinates. Cartesian coordinates are, without a doubt, the most commonly used -- and misused -- coordinate systems. That said, they are indeed a natural choice in many situations and, in a number of ways, the easiest coordinates to use. Our initial discussion will focus on the two-dimensional plane, as it is easier to visualize than the three-dimensional space, but is still sufficiently rich to illustrate all of the most important characteristics of Cartesian coordinates.

Cartesian coordinates are easiest to describe in terms of the coordinate basis

\mathbf{i},\mathbf{j}

. Choose an arbitrary origin

O

and a pair of unit orthogonal vectors

\mathbf{i}

and

\mathbf{j}

. To reiterate, in order for the coordinate system to qualify as Cartesian, the vectors

\mathbf{i}

and

\mathbf{j}

must be a) orthogonal and b) of unit length. If either condition is violated, the resulting coordinates are no longer Cartesian, but merely affine.

(6.33)

Given the origin

O

and the pair of unit orthogonal vectors

\mathbf{i}

and

\mathbf{j}

, the Cartesian coordinates

x,y

of a point

P

are the components of the vector

\mathbf{U}

from

O

to

P

with respect to

\mathbf{i}

and

\mathbf{j}

, i.e.

\mathbf{U}=x~\mathbf{i}+y~\mathbf{j}.\tag{6.34}

The corresponding geometric construction is illustrated in the following figure.

(6.35)

The resulting coordinate lines corresponding to integer values of

x

and

y

form a regular square grid spaced by precisely one Euclidean unit.

(6.36)

Another common way of representing Cartesian coordinates is by drawing the coordinate axes. The

x

-axis is a straight line that passes through the origin

O

in the direction of the basis vector

\mathbf{i}

. In other words, the

x

-axis is the coordinate line that corresponds to

y=0

. Similarly, the

y

-axis is a straight line that passes through

O

in the direction of the basis vector

\mathbf{j}

, and is the coordinate line that corresponds to

x=0

.

(6.37)

This representation is attractive since it is less cluttered. For the rest of this Section, however, we will stick with the coordinate line representation for the sake of consistency with other special coordinate systems.

There are infinitely many Cartesian coordinate systems in the plane since we are free to choose any point for the origin

O

and any orientation (in the sense of rotation) of the orthonormal basis vectors

\mathbf{i}

and

\mathbf{j}

. The following figure illustrates a different Cartesian coordinate system that differs from the one above in both the location of the origin

O

and the orientation of

\mathbf{i}

and

\mathbf{j}

.

(6.38)

Finally, we also have the choice of orientation (in the sense of Section 3.1) of the basis

\mathbf{i,j}

. If the vectors

\mathbf{i}

and

\mathbf{j}

form a positively oriented set, then the coordinate system is said to be positively oriented or right-handed. Otherwise, it is negatively oriented or left-handed. The following figure shows a left-handed Cartesian coordinate system.

(6.39)

As we have already mentioned, the requirement that

\mathbf{i}

and

\mathbf{j}

are unit vectors is essential to the definition of Cartesian coordinates. Even if

\mathbf{i}

and

\mathbf{j}

are orthogonal and have equal but non-unit lengths, the resulting system can no longer be considered Cartesian. For example, the coordinate system illustrated in the following figure (where the reference segment on the bottom right has unit length) is not Cartesian, even though its coordinate lines form a regular square grid.

(6.40)

Note that without the reference segment, there would have been no way of determining whether the system is Cartesian.

In three dimensions, a Cartesian coordinate system is constructed by selecting an arbitrary origin

O

and a set of three pairwise-orthogonal unit vectors

\mathbf{i}

,

\mathbf{j}

, and

\mathbf{k}

.

(6.41)

Echoing the two-dimensional case, the Cartesian coordinates

x,y,z

of a point

P

are the components of the vector

\mathbf{U}

from

O

to

P

with respect to

\mathbf{i}

,

\mathbf{j}

, and

\mathbf{k}

, i.e.

\mathbf{U}=x~\mathbf{i}+y~\mathbf{j}+z~\mathbf{k}.\tag{6.42}

The resulting coordinate lines corresponding to integer values form a regular cubic grid spaced by precisely one Euclidean unit.

(6.43)

The coordinate system is right-handed or positively oriented if the set

\mathbf{i},\mathbf{j},\mathbf{k}

is positively oriented. Otherwise, it is left-handed or negatively oriented.

6.6 Affine coordinates

Affine or rectilinear coordinates are a generalization of Cartesian coordinates without the constraints of orthogonality and unit length. Affine coordinates are constructed in the exact same way as Cartesian coordinates from an arbitrary linearly independent set of vectors

\mathbf{i}

,

\mathbf{j}

, and

\mathbf{k}

.

(6.44)

Once again, the affine coordinates

x,y,z

of a point

P

are the components of the vector

\mathbf{U}

from

O

to

P

with respect to the vectors

\mathbf{i}

,

\mathbf{j}

, and

\mathbf{k}

, i.e.

\mathbf{U}=x~\mathbf{i}+y~\mathbf{j}+z~\mathbf{k}.\tag{6.45}

The resulting coordinate lines corresponding to integer values form a skewed regular parallelepiped grid, as illustrated in the following figure.

(6.46)

The term rectilinear refers to the straightness of the coordinate lines. Non-affine coordinate systems are known as curvilinear.

The concept of orientation applies to affine coordinates just as well as Cartesian. An affine coordinate system is said to be positively oriented or right-handed if the set of vectors

\mathbf{i} ,\mathbf{j},\mathbf{k}

is positively oriented. Otherwise, it is negatively oriented or left-handed.

Any two affine coordinate systems are related by a combination of a linear transformation and a shift. Suppose that

x,y,z

and

x^{\prime},y^{\prime },z^{\prime}

are two sets of affine coordinates corresponding to the respective origins at

O

and

O^{\prime}

and the coordinate bases

\mathbf{i,j,k}

and

\mathbf{i}^{\prime},\mathbf{j}^{\prime},\mathbf{k} ^{\prime}

. Then the coordinates

x,y,z

and

x^{\prime},y^{\prime},z^{\prime }

are related by

\left[ \begin{array} {c} x^{\prime}\\ y^{\prime}\\ z^{\prime} \end{array} \right] =\left[ \begin{array} {ccc} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{array} \right] \left[ \begin{array} {c} x\\ y\\ z \end{array} \right] +\left[ \begin{array} {c} x_{0}^{\prime}\\ y_{0}^{\prime}\\ z_{0}^{\prime} \end{array} \right] ,\tag{6.47}

where

x_{0}^{\prime},y_{0}^{\prime},z_{0}^{\prime}

are the coordinates of

O

in the primed coordinate system and (the transpose of) the matrix relates the unprimed and primed coordinate bases according to the formal identity

\left[ \begin{array} {c} \mathbf{i}\\ \mathbf{j}\\ \mathbf{k} \end{array} \right] =\left[ \begin{array} {ccc} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{array} \right] ^{T}\left[ \begin{array} {c} \mathbf{i}^{\prime}\\ \mathbf{j}^{\prime}\\ \mathbf{k}^{\prime} \end{array} \right] .\tag{6.48}

We can eschew the unwelcome transpose by organizing the elements of the coordinate bases in rows instead of columns, i.e.

\begin{array} {c} \left[ \begin{array} {ccc} \mathbf{i} & \mathbf{j} & \mathbf{k} \end{array} \right] =\left[ \begin{array} {ccc} \mathbf{i}^{\prime} & \mathbf{j}^{\prime} & \mathbf{k}^{\prime} \end{array} \right] \\ \\ \end{array} \left[ \begin{array} {ccc} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{array} \right] .\tag{6.49}

The proof of this property of affine coordinates is left as an exercise.

Interestingly, the matrix

A

participates in the translation from unprimed to primed coordinates and -- note the reverse direction -- from primed to unprimed coordinate bases. Thus, coordinates themselves and their associated bases transform in fundamentally opposite ways. This simple observation, it turns out, will prove to be the cornerstone of the tensor framework.
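
The relationship between equations (6.47) and (6.49) can be checked numerically. The following Python sketch does so in the plane rather than in three dimensions, to keep the arithmetic short. All points and vectors are stored by their components in an auxiliary orthonormal frame, which serves only as a bookkeeping device. The specific origins, bases, test point, and helper names are hypothetical.

```python
def solve2(a, b, c):
    # Return (p, q) such that c = p*a + q*b for plane vectors a, b, c (Cramer's rule).
    det = a[0] * b[1] - a[1] * b[0]
    p = (c[0] * b[1] - c[1] * b[0]) / det
    q = (a[0] * c[1] - a[1] * c[0]) / det
    return p, q

def affine_coordinates(P, origin, e1, e2):
    # Components of the vector from `origin` to P with respect to the basis e1, e2.
    return solve2(e1, e2, (P[0] - origin[0], P[1] - origin[1]))

# Hypothetical data for two affine coordinate systems in the plane.
O, i, j = (1.0, 1.0), (1.0, 0.0), (1.0, 1.0)       # unprimed origin and basis
Op, ip, jp = (0.0, 2.0), (2.0, 0.0), (0.0, 0.5)    # primed origin and basis

# Two-dimensional analogue of (6.49): the columns of A hold the components
# of i and j with respect to the primed basis i', j'.
A11, A21 = solve2(ip, jp, i)
A12, A22 = solve2(ip, jp, j)

# Coordinates of the unprimed origin O in the primed system.
x0p, y0p = affine_coordinates(O, Op, ip, jp)

P = (4.0, 3.0)                                     # arbitrary test point
x, y = affine_coordinates(P, O, i, j)
xp, yp = affine_coordinates(P, Op, ip, jp)

# Two-dimensional analogue of (6.47): the same A maps unprimed to primed coordinates.
predicted = (A11 * x + A12 * y + x0p, A21 * x + A22 * y + y0p)
print("directly computed primed coordinates:", (xp, yp))
print("predicted from the matrix A:         ", predicted)
```

The two printed pairs should agree, illustrating that the matrix that expresses the unprimed basis in terms of the primed basis is the same matrix that converts unprimed coordinates into primed ones.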

6.7 Polar coordinates

Polar coordinates

r,\theta

are well suited for a wide range of geometries in the plane, especially those that are naturally described in terms of the distance to a reference point, such as the star-shaped region in the figure below. A star-shaped region is one for which there exists a fixed point from which all points on the boundary are in a direct line of sight. As a result, each direction from the fixed point corresponds to exactly one boundary point and, therefore, to a unique distance. Such shapes can be captured in polar coordinates by a single function.

(6.50)

The construction of a polar coordinate system is illustrated in the figure below. Designate an arbitrary point

O

as the pole or the origin, and select an arbitrary ray

l

, known as the polar axis, emanating from

O

. The polar coordinates of a point

P

are the numbers

r

and

\theta

, where

r

is the Euclidean distance from

P

to the pole

O

and

\theta

is the signed angle, measured in radians, from the polar axis

l

to the segment

OP

, with the counterclockwise direction taken as positive.

(6.51)

In order to uniquely determine the numerical value of the angle

\theta

, it must be constrained to a semi-open range of length

2\pi

, such as

\left[ 0,2\pi\right)

or

\left( -\pi,\pi\right]

. Choosing

\left[ 0,2\pi\right)

, for example, results in the coordinate lines illustrated in the following figure.

(6.52)

This figure could be made to appear even more regular by choosing radial coordinate lines corresponding to multiples of, say,

\pi/4

instead of integer values.

(6.53)

In some applications, such as analysis of curves, it is often more convenient not to restrict the range of

\theta

and to allow it to be any real number. For example, the following figure shows the curve corresponding to the equations

\begin{aligned}r\left( \gamma\right) & =10+\gamma\ \ \ \ \ \ \ \ \ \ \left(6.54\right)\\\theta\left( \gamma\right) & =\gamma\ \ \ \ \ \ \ \ \ \ \left(6.55\right)\end{aligned}

for the parameter

\gamma

-- and therefore

\theta

-- ranging from

-2\pi

to

2\pi

.

(6.56)

Consider the point

P

on the curve in the figure above. Had we not already known the equation of the curve, we might think that the

\theta

-coordinate of

P

is

\pi/2

. However,

P

corresponds to

\gamma=-3\pi/2

and, therefore, to

\theta=-3\pi/2

. Thus, the choice to allow

\theta

to take on arbitrary values results in a great deal of convenience at the cost of uniqueness.

Furthermore, we can also allow the variable

r

to take on negative values. By convention, the point

P

with coordinates

\left( r,\theta\right)

, where

r

is negative, is found at the point with proper polar coordinates

\left( -r,\theta+\pi\right)

. In other words, for negative

r

, we find

P

by moving in the "negative" direction along the ray corresponding to the angle

\theta

. A curve given by the equations

\begin{aligned}r\left( \gamma\right) & =3\sin2\gamma\ \ \ \ \ \ \ \ \ \ \left(6.57\right)\\\theta\left( \gamma\right) & =\sin\gamma,\ \ \ \ \ \ \ \ \ \ \left(6.58\right)\end{aligned}

where

0\leq\gamma\leq2\pi

and therefore

r

assumes negative values, is shown in the following figure.

(6.59)

Note that the variable

r

changes sign when the parameter passes through multiples of

\pi/2

. For a continuous curve, this change of sign in

r

can occur only when the curve passes through the origin.
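
The two conventions just described, unrestricted angles and negative radii, can be summarized by a small helper that maps an arbitrary pair to proper polar coordinates, i.e. a nonnegative radius and an angle reduced modulo a full turn. The Python sketch below is illustrative only. The function name and the choice of the range starting at zero are assumptions.

```python
import math

def proper_polar(r, theta):
    # For negative r, the point lies at distance -r in the direction theta + pi.
    if r < 0:
        r, theta = -r, theta + math.pi
    # Reduce the angle modulo 2*pi.
    return r, theta % (2.0 * math.pi)

# The point P on the curve (6.54)-(6.55) that corresponds to gamma = -3*pi/2.
gamma = -3.0 * math.pi / 2.0
r_P, theta_P = proper_polar(10.0 + gamma, gamma)
print("P:", (r_P, theta_P), " pi/2 =", math.pi / 2.0)

# A pair with a negative radius.
r_Q, theta_Q = proper_polar(-2.0, math.pi / 4.0)
print("Q:", (r_Q, theta_Q), " 5*pi/4 =", 5.0 * math.pi / 4.0)
```

The first call recovers the angle that one would read off the figure for P, while the second shows a negative radius being traded for a half-turn in the angle.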

6.8 Cylindrical coordinates

Cylindrical coordinates extend polar coordinates to three dimensions. In order to construct a cylindrical coordinate system, first, select a plane known as the coordinate plane. The coordinate plane divides the space into two half-spaces. Arbitrarily select one of the half-spaces as positive and the other as negative. Construct a polar coordinate system within the coordinate plane by selecting an arbitrary pole

O

and an arbitrary ray

l

. The polar angle

\theta

increases in the direction that appears counterclockwise from the positive half-space. Then, to each point

P

in the space, assign the coordinates

r

,

\theta

, and

z

, where

r

and

\theta

are the polar coordinates of the orthogonal projection of

OP

onto the coordinate plane and

z

is the signed Euclidean distance between

P

and the coordinate plane, i.e.

z

is positive if

P

is found in the positive half-space and negative otherwise.

(6.60)

The cylindrical or longitudinal axis is the straight line orthogonal to the coordinate plane that passes through the origin

O

. It consists of the points for which

r=0

. The distance between a point

P

and the cylindrical axis equals

r

. The term cylindrical comes from the fact that points characterized by constant

r

form a cylinder. The other two families of coordinate surfaces are planes.

A selection of coordinate lines for cylindrical coordinates is shown in the following figure.

(6.61)

A selection of coordinate surfaces is shown in the following figure.

(6.62)

6.9 Spherical coordinates

Spherical coordinates, denoted by the letters

r

,

\theta

, and

\varphi

, are perfectly intuitive. Using a planetary analogy, the angles

\theta

and

\varphi

correspond to colatitude and longitude on the surface of the Earth, while

r

corresponds to the Euclidean distance to the center of the Earth. To construct a spherical coordinate system, start by selecting an arbitrary origin

O

. The coordinate

r

of the point

P

is the distance between

O

and

P

. Next, select an arbitrary coordinate plane that passes through

O

. We will refer to the straight line orthogonal to the coordinate plane that passes through the origin

O

as the spherical axis. The angle

\theta

, known as the colatitude, varies from

0

to

\pi

and gives the angle between the segment

OP

and the spherical axis. The remaining coordinate is the azimuth

\varphi

which varies from

0

to

2\pi

. It corresponds to the angle between the projection of

P

onto the coordinate plane and a fixed, arbitrarily chosen polar axis

l

that passes through the origin

O

in the coordinate plane.

(6.63)
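
The construction above can be mirrored directly in code by working with lengths, angles and projections. The Python sketch below assumes that vectors are stored by their components in some orthonormal basis, that the spherical axis and the polar axis are given by unit vectors, and that the azimuth increases from the polar axis toward the cross product of the spherical axis direction with the polar axis direction. That last choice, along with all names, is an assumption made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def spherical_coordinates(OP, axis, polar):
    # OP: vector from the origin O to the point P.
    # axis: unit vector along the spherical axis.
    # polar: unit vector along the polar axis l, orthogonal to `axis`.
    r = math.sqrt(dot(OP, OP))
    if r == 0.0:
        raise ValueError("the angles are undefined at the origin")
    # Colatitude: angle between OP and the spherical axis (clamped against rounding).
    theta = math.acos(max(-1.0, min(1.0, dot(OP, axis) / r)))
    # Orthogonal projection of OP onto the coordinate plane.
    height = dot(OP, axis)
    proj = tuple(p - height * a for p, a in zip(OP, axis))
    # Azimuth: angle of the projection measured from the polar axis.
    side = cross(axis, polar)
    phi = math.atan2(dot(proj, side), dot(proj, polar)) % (2.0 * math.pi)
    return r, theta, phi

r, theta, phi = spherical_coordinates((1.0, 2.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
print("r =", r, " theta =", theta, " phi =", phi)
```

For points on the spherical axis the projection vanishes, so the returned azimuth carries no meaning there, in agreement with the remark that the azimuth is undefined along the axis.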

The points corresponding to a given value of

r

form a coordinate sphere. If

\varphi

is fixed in addition to

r

, the result is a "meridian" on the corresponding coordinate sphere. If

\theta

is fixed in addition to

r

, the result is a "parallel". Neither angle is defined at the origin

O

. The azimuth

\varphi

is undefined along the entire spherical axis. The following figure shows one coordinate surface for each variable.

(6.64)

This completes our descriptions of the special coordinate systems that will be featured throughout our narrative.

6.10 An attempt at a coordinate-based expression for the gradient

In Chapter 4, we gave a geometric definition for the gradient

\mathbf{\nabla}U

of a scalar field

U

, as well as an alternative definition found in most textbooks, where the gradient is defined as the collection of the partial derivatives

\left( \frac{\partial U}{\partial x},\frac{\partial U}{\partial y} ,\frac{\partial U}{\partial z}\right)\tag{6.65}

with respect to Cartesian coordinates. Having now formally introduced the concept of the coordinate basis

\mathbf{i},\mathbf{j,k}

for Cartesian coordinates, we may conjecture that the connection between the two definitions is captured by the equation

\mathbf{\nabla}U=\frac{\partial U}{\partial x}\mathbf{i}+\frac{\partial U}{\partial y}\mathbf{j}+\frac{\partial U}{\partial z}\mathbf{k}\tag{6.66}

that interprets the partial derivatives as the components of

\mathbf{\nabla }U

with respect to the coordinate basis

\mathbf{i},\mathbf{j,k}

. This identity is indeed correct in Cartesian coordinates and it is left as an exercise to prove it. However, we must wonder if this relationship continues to be valid in affine coordinates. Furthermore, we are interested in deriving the general analytical expression for the gradient that works in all coordinate systems. We will save the second task for later, but will now show why the above equation does not hold in non-Cartesian affine coordinates.

In the plane, consider two orthogonal affine coordinate systems. Let the first coordinate system be Cartesian coordinates

x,y

corresponding to the coordinate basis

\mathbf{i,j}

. For the other coordinate system

x^{\prime },y^{\prime}

choose the affine coordinates with the coordinate basis

\mathbf{i}^{\prime},\mathbf{j}^{\prime}

obtained from

\mathbf{i,j}

by a twofold stretch, i.e.

\begin{aligned}\mathbf{i}^{\prime} & =2\mathbf{i}\ \ \ \ \ \ \ \ \ \ \left(6.67\right)\\\mathbf{j}^{\prime} & =2\mathbf{j}.\ \ \ \ \ \ \ \ \ \ \left(6.68\right)\end{aligned}

In other words, in the new, "primed" coordinate system, integer coordinate lines are two Euclidean units apart. In particular, this means that the primed coordinates

x^{\prime},y^{\prime}

are given in terms of the unprimed coordinates

x,y

by the equations

\begin{aligned}x^{\prime} & =\frac{1}{2}x\ \ \ \ \ \ \ \ \ \ \left(6.69\right)\\y^{\prime} & =\frac{1}{2}y.\ \ \ \ \ \ \ \ \ \ \left(6.70\right)\end{aligned}

Notice that, once again, we are observing coordinates and the associated coordinate bases transforming by opposite rules.

The two coordinate systems are illustrated side by side in the following figure. The two plots represent the same scalar field

U

which is, of course, independent of the coordinates.

(6.71)

Let

U\left( x,y\right)

denote

U

as a function of

x

and

y

and

U\left( x^{\prime},y^{\prime}\right)

denote

U

as a function of

x^{\prime}

and

y^{\prime}

. Importantly, the functions

U\left( x,y\right)

and

U\left( x^{\prime},y^{\prime}\right)

are different functions. For example, if

U\left( x,y\right) =xy,\tag{6.72}

then

U\left( x^{\prime},y^{\prime}\right) =4x^{\prime}y^{\prime}.\tag{6.73}

Even though the three objects -- the scalar field

U

along with the functions

U\left( x,y\right)

and

U\left( x^{\prime},y^{\prime}\right)

-- are different, it does make sense to denote them by the same letter

U

due to their close relationship.
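
The distinction between the two functions can be checked with a few lines of Python. The following is a minimal sketch that uses the example field from the text; the names `U_unprimed` and `U_primed` and the sample point are our own choices for illustration.

```python
# Example field from the text, as a function of the Cartesian coordinates.
def U_unprimed(x, y):
    return x * y

# The same field as a function of the primed coordinates.
# Since x' = x/2 and y' = y/2, we substitute x = 2x' and y = 2y'.
def U_primed(xp, yp):
    return U_unprimed(2 * xp, 2 * yp)

# One point of the plane, described in both coordinate systems.
x, y = 3.0, 5.0
xp, yp = x / 2, y / 2

# Evaluated at the same point, the two functions agree.
same_point_values = (U_unprimed(x, y), U_primed(xp, yp))

# Evaluated at the same pair of numbers, they do not.
same_argument_values = (U_unprimed(x, y), U_primed(x, y))

print(same_point_values)
print(same_argument_values)
```

The first pair agrees because both functions describe the same scalar field at the same point. The second pair differs by a factor of four, in agreement with the formula for the function of the primed coordinates.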

We are now in a position to compare the values of the expressions

\frac{\partial U}{\partial x}\mathbf{i}+\frac{\partial U}{\partial y}\mathbf{j}\quad\text{and}\quad\frac{\partial U}{\partial x^{\prime}}\mathbf{i}^{\prime}+\frac{\partial U}{\partial y^{\prime}}\mathbf{j}^{\prime}.\tag{6.74}

Recall that the coordinate basis vectors are related by the equations

\begin{aligned}\mathbf{i}^{\prime} & =2\mathbf{i}\ \ \ \ \ \ \ \ \ \ \left(6.67\right)\\\mathbf{j}^{\prime} & =2\mathbf{j.} \ \ \ \ \ \ \ \ \ \ \left(6.68\right)\end{aligned}

In other words, in the change from the unprimed to the primed coordinates, the coordinate basis vectors double. Thus, the only remaining question is whether the partial derivatives correspondingly double or halve. We will now show that they, too, double and thus the combined expression quadruples.

Let us compare the partial derivatives

\partial U\left( x,y\right) /\partial x

and

\partial U\left( x^{\prime},y^{\prime}\right) /\partial x^{\prime}

at a single point

P

with unprimed coordinates

\left( x,y\right)

and primed coordinates

\left( x^{\prime},y^{\prime}\right)

. In each coordinate system, increase the first coordinate by

h

, i.e. consider the point

P_{h}

with unprimed coordinates

\left( x+h,y\right)

along with the point

P_{h}^{\prime}

with primed coordinates

\left( x^{\prime}+h,y^{\prime}\right)

. These distinct points are illustrated in the figure above. Observe that the point

P_{h}^{\prime}

is twice as far from

P

as the point

P_{h}

. Thus, the ratio

\frac{U\left( x^{\prime}+h,y^{\prime}\right) -U\left( x^{\prime},y^{\prime }\right) }{h}\tag{6.75}

is roughly twice as great as

\frac{U\left( x+h,y\right) -U\left( x,y\right) }{h}.\tag{6.76}

Therefore, in the limit as

h

approaches

0

, we have

\frac{\partial U\left( x^{\prime},y^{\prime}\right) }{\partial x^{\prime} }=2\frac{\partial U\left( x,y\right) }{\partial x}.\tag{6.77}

Similarly, for the partial derivatives with respect to the second coordinate, we find

\frac{\partial U\left( x^{\prime},y^{\prime}\right) }{\partial y^{\prime} }=2\frac{\partial U\left( x,y\right) }{\partial y}.\tag{6.78}
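
The doubling of the partial derivatives can also be observed numerically. The sketch below assumes the example field with the Cartesian representation xy and replaces the one-sided difference quotient used above with a central difference, which is our own substitution made for better accuracy. The helper names and the sample point are hypothetical.

```python
# Example field in Cartesian coordinates and in the stretched coordinates.
def U_unprimed(x, y):
    return x * y

def U_primed(xp, yp):
    # x = 2x', y = 2y'
    return U_unprimed(2 * xp, 2 * yp)

# Central difference approximations of the two partial derivatives.
def partial_first(f, a, b, h=1e-6):
    return (f(a + h, b) - f(a - h, b)) / (2 * h)

def partial_second(f, a, b, h=1e-6):
    return (f(a, b + h) - f(a, b - h)) / (2 * h)

# The same point P in both coordinate systems.
x, y = 3.0, 5.0
xp, yp = x / 2, y / 2

dU_dx = partial_first(U_unprimed, x, y)
dU_dy = partial_second(U_unprimed, x, y)
dU_dxp = partial_first(U_primed, xp, yp)
dU_dyp = partial_second(U_primed, xp, yp)

# Each ratio should be close to 2.
print(dU_dxp / dU_dx, dU_dyp / dU_dy)
```

Note that the same increment h is applied to the first argument of both functions, even though it corresponds to a displacement that is twice as long in the primed coordinates. This is precisely the source of the factor of two.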

In summary, when we change from the Cartesian coordinates

x,y

to the affine coordinates

x^{\prime},y^{\prime}

, both the coordinate vectors and the partial derivatives double. Consequently, the value of the proposed coordinate expression for the gradient quadruples, i.e.

\frac{\partial U}{\partial x^{\prime}}\mathbf{i}^{\prime}+\frac{\partial U}{\partial y^{\prime}}\mathbf{j}^{\prime}=4\left( \frac{\partial U}{\partial x}\mathbf{i}+\frac{\partial U}{\partial y}\mathbf{j}\right).\tag{6.79}

Thus, we have reached the important conclusion that the value of the expression

\frac{\partial U\left( x,y\right) }{\partial x}\mathbf{i}+\frac{\partial U\left( x,y\right) }{\partial y}\mathbf{j}\tag{6.80}

depends on the particular choice of the coordinate system. In particular, it cannot be the coordinate-space representation of the gradient

\mathbf{\nabla }U

.
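
To see the dependence on the coordinate system in numbers, we can evaluate the proposed expression in both coordinate systems and express each result by its components with respect to the Cartesian basis. The following sketch assumes a hypothetical field with the Cartesian representation x² + 3y, chosen only for illustration, and approximates the partial derivatives by central differences.

```python
# Hypothetical scalar field, given by its Cartesian representation.
def U_unprimed(x, y):
    return x**2 + 3 * y

# The same field in the primed coordinates, where x = 2x' and y = 2y'.
def U_primed(xp, yp):
    return U_unprimed(2 * xp, 2 * yp)

def partial_first(f, a, b, h=1e-6):
    return (f(a + h, b) - f(a - h, b)) / (2 * h)

def partial_second(f, a, b, h=1e-6):
    return (f(a, b + h) - f(a, b - h)) / (2 * h)

# The same point in both coordinate systems.
x, y = 3.0, 5.0
xp, yp = x / 2, y / 2

# Proposed expression in Cartesian coordinates: components along i and j.
unprimed_expression = (
    partial_first(U_unprimed, x, y),
    partial_second(U_unprimed, x, y),
)

# Proposed expression in the primed coordinates. Since i' = 2i and j' = 2j,
# each partial derivative is multiplied by 2 to obtain components along i and j.
primed_expression = (
    2 * partial_first(U_primed, xp, yp),
    2 * partial_second(U_primed, xp, yp),
)

print(unprimed_expression)
print(primed_expression)
```

Up to rounding, the second vector is four times the first. The same formula therefore produces two different vectors at the same point, so it cannot represent a geometric object such as the gradient.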

As we mentioned above, this observation makes it clear that a more effective analytical framework is needed for constructing coordinate-dependent expressions that produce the same value in all coordinate systems. The task of building such a framework will be accomplished in the next few chapters. In particular, the correct coordinate-space expression for the gradient will be given in Chapter 10.

6.11 Exercises

Exercise 6.1 Calculate

\mathbf{R}^{\prime}\left( \gamma\right)

for an ellipse with semiaxes

a

and

b

that corresponds to

\mathbf{R}\left( \gamma\right) =a\cos\gamma\ \mathbf{i}+b\sin\gamma \ \mathbf{j.}\tag{6.81}

What is the length of

\mathbf{R}^{\prime}\left( \gamma\right)

as a function of

\gamma

?

Exercise 6.2 Consider a particle moving uniformly around a circle of radius

r

making a complete revolution in time

T

. Show that its acceleration points towards the center of the circle and has the magnitude

r\omega^{2}

, where

\omega =2\pi/T

.

Exercise 6.3 Describe the six degrees of freedom in choosing a Cartesian coordinate system in a three-dimensional space.

Exercise 6.4 Show that the vectors

\mathbf{i}

,

\mathbf{j}

, and

\mathbf{k}

of a right-handed Cartesian basis satisfy the equations

\begin{aligned}\mathbf{i}\times\mathbf{j} & =\mathbf{k}\ \ \ \ \ \ \ \ \ \ \left(6.82\right)\\\mathbf{j}\times\mathbf{k} & =\mathbf{i}\ \ \ \ \ \ \ \ \ \ \left(6.83\right)\\\mathbf{k}\times\mathbf{i} & =\mathbf{j.}\ \ \ \ \ \ \ \ \ \ \left(6.84\right)\end{aligned}

Exercise 6.5 Given two affine coordinate systems

x,y,z

and

x^{\prime},y^{\prime },z^{\prime}

with the respective origins at

O

and

O^{\prime}

and the coordinate bases

\mathbf{i,j,k}

and

\mathbf{i}^{\prime},\mathbf{j}^{\prime },\mathbf{k}^{\prime}

related by

\begin{array} {c} \left[ \begin{array} {ccc} \mathbf{i} & \mathbf{j} & \mathbf{k} \end{array} \right] =\left[ \begin{array} {ccc} \mathbf{i}^{\prime} & \mathbf{j}^{\prime} & \mathbf{k}^{\prime} \end{array} \right] \\ \\ \end{array} \left[ \begin{array} {ccc} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{array} \right] , \tag{6.49}

show that

x,y,z

and

x^{\prime},y^{\prime},z^{\prime}

are related by a combination of a linear transformation and a shift, i.e.

\left[ \begin{array} {c} x^{\prime}\\ y^{\prime}\\ z^{\prime} \end{array} \right] =\left[ \begin{array} {ccc} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{array} \right] \left[ \begin{array} {c} x\\ y\\ z \end{array} \right] +\left[ \begin{array} {c} x_{0}^{\prime}\\ y_{0}^{\prime}\\ z_{0}^{\prime} \end{array} \right] , \tag{6.47}

where

x_{0}^{\prime},y_{0}^{\prime},z_{0}^{\prime}

are the primed coordinates of

O

.

Exercise 6.6 For a scalar field

U

, show that the expression

\frac{\partial U\left( x,y,z\right) }{\partial x}~\mathbf{i}+\frac{\partial U\left( x,y,z\right) }{\partial y}~\mathbf{j}+\frac{\partial U\left( x,y,z\right) }{\partial z}~\mathbf{k}\tag{6.85}

yields the same vector in all Cartesian coordinate systems.

Exercise 6.7 Furthermore, demonstrate that the expression

\frac{\partial U\left( x,y,z\right) }{\partial x}~\mathbf{i}+\frac{\partial U\left( x,y,z\right) }{\partial y}~\mathbf{j}+\frac{\partial U\left( x,y,z\right) }{\partial z}~\mathbf{k}\tag{6.86}

corresponds to the gradient

\nabla U

of the scalar field

U

as defined in Chapter 4.

Exercise 6.8 Show that the expression

\frac{1}{\mathbf{i}\cdot\mathbf{i}}\frac{\partial U\left( x,y,z\right) }{\partial x}~\mathbf{i}+\frac{1}{\mathbf{j}\cdot\mathbf{j}}\frac{\partial U\left( x,y,z\right) }{\partial y}~\mathbf{j}+\frac{1}{\mathbf{k} \cdot\mathbf{k}}\frac{\partial U\left( x,y,z\right) }{\partial z}~\mathbf{k}\tag{6.87}

yields the same vector in all orthogonal affine coordinates, i.e. affine coordinates characterized by an orthogonal coordinate basis

\mathbf{i,j,k}

.
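
Before attempting a proof, the claim of Exercise 6.8 can be sanity-checked numerically in a simplified setting. The sketch below is not a proof and covers only the planar case with basis vectors obtained by stretching the Cartesian basis along its own directions. The field, the stretch factors, and the function names are hypothetical choices made for illustration.

```python
# Hypothetical scalar field, given by its Cartesian representation.
def U(x, y):
    return x**2 + 3 * y

def central_difference(f, a, b, axis, h=1e-6):
    # Approximate the partial derivative with respect to the chosen argument.
    if axis == 0:
        return (f(a + h, b) - f(a - h, b)) / (2 * h)
    return (f(a, b + h) - f(a, b - h)) / (2 * h)

def weighted_expression(stretch_i, stretch_j, x, y):
    """Evaluate the expression of Exercise 6.8 in the planar affine system
    with basis i' = stretch_i * i, j' = stretch_j * j at the point whose
    Cartesian coordinates are (x, y). Returns components along i and j."""
    def U_primed(xp, yp):
        return U(stretch_i * xp, stretch_j * yp)

    xp, yp = x / stretch_i, y / stretch_j
    dU_dxp = central_difference(U_primed, xp, yp, 0)
    dU_dyp = central_difference(U_primed, xp, yp, 1)

    # Divide by i'.i' = stretch_i**2 and multiply by i' = stretch_i * i.
    component_i = dU_dxp * stretch_i / stretch_i**2
    component_j = dU_dyp * stretch_j / stretch_j**2
    return component_i, component_j

cartesian_result = weighted_expression(1.0, 1.0, 3.0, 5.0)
stretched_result = weighted_expression(2.0, 3.0, 3.0, 5.0)

print(cartesian_result)
print(stretched_result)
```

The two results agree up to rounding: the factor gained by the partial derivative and the factor gained by the basis vector are together cancelled by the division by the dot product of the basis vector with itself.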