Chapter 2

A First Course in Numerical Methods · 17 exercises

Problem 1

The fraction in a single precision word has 23 bits (alas, less than half the length of the double precision word). Show that the corresponding rounding unit is approximately $6 \times 10^{-8}$.

4 step solution

Problem 2

The function $f_{1}\left(x_{0}, h\right)=\sin \left(x_{0}+h\right)-\sin \left(x_{0}\right)$ can be transformed into another form, $f_{2}\left(x_{0}, h\right)$, using the trigonometric formula $$ \sin (\phi)-\sin (\psi)=2 \cos \left(\frac{\phi+\psi}{2}\right) \sin \left(\frac{\phi-\psi}{2}\right) $$ Thus, $f_{1}$ and $f_{2}$ have the same values, in exact arithmetic, for any given argument values $x_{0}$ and $h$. (a) Derive $f_{2}\left(x_{0}, h\right)$. (b) Suggest a formula that avoids cancellation errors for computing the approximation $\left(f\left(x_{0}\right)\right.$ $\left.+h)-f\left(x_{0}\right)\right) / h$ to the derivative of $f(x)=\sin (x)$ at $x=x_{0} .$ Write a MATLAB program that implements your formula and computes an approximation of $f^{\prime}(1.2)$, for $h=1 \mathrm{e}-20,1 \mathrm{e}-19, \ldots, 1$ (c) Explain the difference in accuracy between your results and the results reported in Example $1.3$.

4 step solution

Problem 3

(a) How many distinct positive numbers can be represented in a floating point system using base $\beta=10$, precision $t=2$ and exponent range $L=-9, U=10 ?$ (Assume normalized fractions and don't worry about underflow.) (b) How many normalized numbers are represented by the floating point system $(\beta, t, L, U) ?$ Provide a formula in terms of these parameters.

2 step solution

Problem 4

Suppose a computer company is developing a new floating point system for use with their machines. They need your help in answering a few questions regarding their system. Following the terminology of Section $2.2$, the company's floating point system is specified by $(\beta, t, L, U) .$ Assume the following: \- All floating point values are normalized (except the floating point representation of zero). \- All digits in the mantissa (i.e., fraction) of a floating point value are explicitly stored. \- The number 0 is represented by a float with a mantissa and an exponent of zeros. (Don't worry about special bit patterns for $\pm \infty$ and NaN.) Here is your part: (a) How many different nonnegative floating point values can be represented by this floating point system? (b) Same question for the actual choice $(\beta, t, L, U)=(8,5,-100,100)$ (in decimal) which the company is contemplating in particular. (c) What is the approximate value (in decimal) of the largest and smallest positive numbers that can be represented by this floating point system? (d) What is the rounding unit?

4 step solution

Problem 5

(a) The number $\frac{8}{7}=1.14285714285714 \ldots$ obviously has no exact representation in any decimal floating point system $(\beta=10)$ with finite precision $t$. Is there a finite floating point system (i.e., some finite integer base $\beta$ and precision $t$ ) in which this number does have an exact representation? If yes, then describe such a system. (b) Answer the same question for the irrational number $\pi$.

2 step solution

Problem 6

Write a MATLAB program that receives as input a number $x$ and a parameter $n$ and returns $x$ rounded to $n$ decimal digits. Write your program so that it can handle an array as input, returning an array of the same size in this case. Use your program to generate numbers for Example $2.2$, demonstrating the phenomenon depicted there without use of single precision.

5 step solution

Problem 9

Suggest a way to determine approximately the rounding unit of your calculator. State the type of calculator you have and the rounding unit you have come up with. If you do not have a calculator, write a short MATLAB script to show that your algorithm works well on the standard IEEE floating point system.

5 step solution

Problem 10

The function $f_{1}(x, \delta)=\cos (x+\delta)-\cos (x)$ can be transformed into another form, $f_{2}(x, \delta)$, using the trigonometric formula $$ \cos (\phi)-\cos (\psi)=-2 \sin \left(\frac{\phi+\psi}{2}\right) \sin \left(\frac{\phi-\psi}{2}\right) . $$ Thus, $f_{1}$ and $f_{2}$ have the same values, in exact arithmetic, for any given argument values $x$ and $\delta$. (a) Show that, analytically, $f_{1}(x, \delta) / \delta$ or $f_{2}(x, \delta) / \delta$ are effective approximations of the function $-\sin (x)$ for $\delta$ sufficiently small. (b) Derive $f_{2}(x, \delta)$. (c) Write a MATLAB script which will calculate $g_{1}(x, \delta)=f_{1}(x, \delta) / \delta+\sin (x)$ and $g_{2}(x, \delta)=$ $f_{2}(x, \delta) / \delta+\sin (x)$ for $x=3$ and $\delta=1 . \mathrm{e}-11 .$ (d) Explain the difference in the results of the two calculations.

4 step solution

Problem 11

(a) Show that $$ \ln \left(x-\sqrt{x^{2}-1}\right)=-\ln \left(x+\sqrt{x^{2}-1}\right) $$ (b) Which of the two formulas is more suitable for numerical computation? Explain why, and provide a numerical example in which the difference in accuracy is evident.

3 step solution

Problem 12

For the following expressions, state the numerical difficulties that may occur, and rewrite the formulas in a way that is more suitable for numerical computation: (a) $\sqrt{x+\frac{1}{x}}-\sqrt{x-\frac{1}{x}}$, where $x \gg 1$. (b) $\sqrt{\frac{1}{a^{2}}+\frac{1}{b^{2}}}$, where $a \approx 0$ and $b \approx 1$.

6 step solution

Problem 14

Consider the approximation to the first derivative $$ f^{\prime}(x) \approx \frac{f(x+h)-f(x)}{h} $$ The truncation (or discretization) error for this formula is $\mathcal{O}(h)$. Suppose that the absolute error in evaluating the function $f$ is bounded by $\varepsilon$ and let us ignore the errors generated in basic arithmetic operations. (a) Show that the total computational error (truncation and rounding combined) is bounded by $$ \frac{M h}{2}+\frac{2 \varepsilon}{h}, $$ where $M$ is a bound on $\left|f^{\prime \prime}(x)\right|$. (b) What is the value of $h$ for which the above bound is minimized? (c) The rounding unit we employ is approximately equal to $10^{-16}$. Use this to explain the behavior of the graph in Example $1.3$. Make sure to explain the shape of the graph as well as the value where the apparent minimum is attained. (d) It is not difficult to show, using Taylor expansions, that $f^{\prime}(x)$ can be approximated more accurately (in terms of truncation error) by $$ f^{\prime}(x) \approx \frac{f(x+h)-f(x-h)}{2 h} $$ For this approximation, the truncation error is $\mathcal{O}\left(h^{2}\right)$. Generate a graph similar to Figure $1.3$ (please generate only the solid line) for the same function and the same value of $x$, namely, for $\sin (1.2)$, and compare the two graphs. Explain the meaning of your results.

7 step solution

Problem 15

Suppose a machine with a floating point system $(\beta, t, L, U)=(10,8,-50,50)$ is used to calculate the roots of the quadratic equation $$ a x^{2}+b x+c=0 $$ where $a, b$, and $c$ are given, real coefficients. For each of the following, state the numerical difficulties that arise if one uses the standard formula for computing the roots. Explain how to overcome these difficulties (when possible). (a) $a=1 ; b=-10^{5} ; c=1$. (b) $a=6 \cdot 10^{30} ; b=5 \cdot 10^{30} ; c=-4 \cdot 10^{30}$ (c) $a=10^{-30} ; b=-10^{30} ; c=10^{30}$.

6 step solution

Problem 16

Write a quadratic equation solver. Your MATLAB script should get $a, b, c$ as input, and accurately compute the roots of the corresponding quadratic equation. Make sure to check end cases such as $a=0$, and consider ways to avoid an overflow and cancellation errors. Implement your algorithm and demonstrate its performance on a few cases (for example, the cases mentioned in Exercise 15). Show that your algorithm produces better results than the standard formula for computing roots of a quadratic equation.

5 step solution

Problem 17

Write a MATLAB program that (a) sums up $1 / n$ for $n=1,2, \ldots, 10,000$; (b) rounds each number $1 / n$ to 5 decimal digits and then sums them up in 5 -digit decimal arithmetic for $n=1,2, \ldots, 10,000 ;$ (c) sums up the same rounded numbers (in 5 -digit decimal arithmetic) in reverse order, i.e., for $n=10,000, \ldots, 2,1$. Compare the three results and explain your observations. For generating numbers with the requested precision, you may want to do Exercise 6 first.

4 step solution

Problem 18

(a) Explain in detail how to avoid overflow when computing the $\ell_{2}$ -norm of a (possibly large in size) vector. (b) Write a MATLAB script for computing the norm of a vector in a numerically stable fashion. Demonstrate the performance of your code on a few examples.

6 step solution

Problem 19

In the statistical treatment of data one often needs to compute the quantities $$ \bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i}, \quad s^{2}=\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}, $$ where $x_{1}, x_{2}, \ldots, x_{n}$ are the given data. Assume that $n$ is large, say, $n=10,000$. It is easy to see that $s^{2}$ can also be written as $$ s^{2}=\frac{1}{n} \sum_{i=1}^{n} x_{i}^{2}-\bar{x}^{2} $$ (a) Which of the two methods to calculate $s^{2}$ is cheaper in terms of overall computational cost? Assume $\bar{x}$ has already been calculated and give the operation counts for these two options. (b) Which of the two methods is expected to give more accurate results for $s^{2}$ in general? (c) Give a small example, using a decimal system with precision $t=2$ and numbers of your choice, to validate your claims.

5 step solution

Problem 21

The IEEE 754 (known as the floating point standard) specifies the 128 -bit word as having 15 bits for the exponent. What is the length of the fraction? What is the rounding unit? How many significant decimal digits does this word have? Why is quadruple precision more than twice as accurate as double precision, which is in turn more than twice as accurate as single precision?

5 step solution

Show/ page(17 total)