- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

No, not just yet. Take the problem I posted and we can move on. You're shifting the goalposts.

- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

'Robust' h is a non-issue for the first derivative.How about [$]f(x) = 1/(sin(x))[$] evaluated at x=0.001? When I use your method of [$]f(z)[$] where [$]z = x + ih[$], with h = 0.001, and compute the imaginary part of [$]f(z)/h[$], I get f'(0.001) ≈ -499,999.833335 whereas the correct answer is 1,000,000. Is there a robust approach to picking h?

Don't be afraid to take small h (e.g. h = 1.0e-50). There is no subtraction cancellation.

See eq. (3) of Squire and Trapp. It's magic.

I get f'(x) = - 9999999.83333 for both exact and the ST method.... With h = 0.001 I get -499,999.833335.

Did you do your calculus proper?

- Traden4Alpha
**Posts:**23951**Joined:**

Sorry, I now have less idea on what the goalposts are. Are they: specific or general f(x)? Specific or general value of x? Highest performance?

What is "the problem"? And what are the constraints on the solution?

What is "the problem"? And what are the constraints on the solution?

- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

Ok, one last time. Compute the derivative of [$]f(x) = cos(sin(sin(cos(x)))[$] at x = 1.0.Sorry, I now have less idea on what the goalposts are. Are they: specific or general f(x)? Specific or general value of x? Highest performance?

What is "the problem"? And what are the constraints on the solution?

You only need to produce a single number.

- Traden4Alpha
**Posts:**23951**Joined:**

OK.Ok, one last time. Compute the derivative of [$]f(x) = cos(sin(sin(cos(x)))[$] at x = 1.0.Sorry, I now have less idea on what the goalposts are. Are they: specific or general f(x)? Specific or general value of x? Highest performance?

What is "the problem"? And what are the constraints on the solution?

You only need to produce a single number.

df(x)/dx = sin(x) * cos(cos(x)) * cos(sin(cos(x))) * sin(sin(sin(cos(x))))

Evaluated at x = 1 yields f'(1) = 0.29677089

The computational complexity is four calls to sin(), three calls to cos(), and three multiplies.

ST works, too, although I'm not sure how to set a robust value of h. (Can h be arbitrarily tiny or does that risk round-off errors with some functions???)

In any case, the computational complexity of ST is three calls to sin(), four calls to cos(), four calls to sinh(), three calls to cosh(), and seven multiplies.

- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

At this stage it is clear that the method is robust down to the machine level, see Squire&Trapp FORMULA (4).

I suspect you computed f'(x) symbolically(?). If you did it manually I would by very impressed..But it's really boring high-school calculus. In real life exact derivatives are a mirage.

See output and catastrophy with FIRST_ORDER classic FM.

Exact derivative: 0.2967708904695361

Step size, Squire/Trapp, FDM classico

1, 0.346385733136783, 0.04225614025248303

0.1, 0.2996009285356403, 0.2993676773608678

0.01, 0.2967991573202214, 0.2973099839978843

0.001, 0.2967711731345647, 0.2968273693718659

0.0001, 0.2967708932961859, 0.2967765638250963

1e-05, 0.2967708904978026, 0.2967714580637271

1e-06, 0.2967708904698187, 0.2967709471501933

1e-07, 0.296770890469539, 0.296770896746068

1e-08, 0.296770890469536, 0.2967708856438377

1e-09, 0.296770890469536, 0.296770941154989

1e-10, 0.2967708904695361, 0.2967703860434766

1e-11, 0.296770890469536, 0.2967626144823043

1e-12, 0.2967708904695361, 0.2967626144823043

1e-13, 0.296770890469536, 0.2964295475749167

1e-14, 0.296770890469536, 0.2997602166487922

1e-15, 0.296770890469536, 0.333066907387547

1e-16, 0.296770890469536, 0

1e-17, 0.296770890469536, 0

1e-18, 0.296770890469536, 0

1e-19, 0.296770890469536, 0

1e-20, 0.2967708904695361, 0

1e-21, 0.296770890469536, 0

1e-22, 0.296770890469536, 0

1e-23, 0.296770890469536, 0

1e-24, 0.296770890469536, 0

1e-25, 0.296770890469536, 0

1e-26, 0.296770890469536, 0

1e-27, 0.296770890469536, 0

1e-28, 0.296770890469536, 0

1e-29, 0.296770890469536, 0

1e-30, 0.296770890469536, 0

1e-31, 0.296770890469536, 0

1e-32, 0.296770890469536, 0

1e-33, 0.296770890469536, 0

1e-34, 0.296770890469536, 0

1e-35, 0.2967708904695361, 0

1e-36, 0.2967708904695361, 0

1e-37, 0.296770890469536, 0

1e-38, 0.296770890469536, 0

1e-39, 0.296770890469536, 0

1e-40, 0.2967708904695361, 0

1e-41, 0.296770890469536, 0

1e-42, 0.296770890469536, 0

1e-43, 0.296770890469536, 0

1e-44, 0.296770890469536, 0

1e-45, 0.2967708904695361, 0

1e-46, 0.296770890469536, 0

1e-47, 0.296770890469536, 0

1e-48, 0.296770890469536, 0

1e-49, 0.296770890469536, 0

1e-50, 0.296770890469536, 0

I suspect you computed f'(x) symbolically(?). If you did it manually I would by very impressed..But it's really boring high-school calculus. In real life exact derivatives are a mirage.

See output and catastrophy with FIRST_ORDER classic FM.

Exact derivative: 0.2967708904695361

Step size, Squire/Trapp, FDM classico

1, 0.346385733136783, 0.04225614025248303

0.1, 0.2996009285356403, 0.2993676773608678

0.01, 0.2967991573202214, 0.2973099839978843

0.001, 0.2967711731345647, 0.2968273693718659

0.0001, 0.2967708932961859, 0.2967765638250963

1e-05, 0.2967708904978026, 0.2967714580637271

1e-06, 0.2967708904698187, 0.2967709471501933

1e-07, 0.296770890469539, 0.296770896746068

1e-08, 0.296770890469536, 0.2967708856438377

1e-09, 0.296770890469536, 0.296770941154989

1e-10, 0.2967708904695361, 0.2967703860434766

1e-11, 0.296770890469536, 0.2967626144823043

1e-12, 0.2967708904695361, 0.2967626144823043

1e-13, 0.296770890469536, 0.2964295475749167

1e-14, 0.296770890469536, 0.2997602166487922

1e-15, 0.296770890469536, 0.333066907387547

1e-16, 0.296770890469536, 0

1e-17, 0.296770890469536, 0

1e-18, 0.296770890469536, 0

1e-19, 0.296770890469536, 0

1e-20, 0.2967708904695361, 0

1e-21, 0.296770890469536, 0

1e-22, 0.296770890469536, 0

1e-23, 0.296770890469536, 0

1e-24, 0.296770890469536, 0

1e-25, 0.296770890469536, 0

1e-26, 0.296770890469536, 0

1e-27, 0.296770890469536, 0

1e-28, 0.296770890469536, 0

1e-29, 0.296770890469536, 0

1e-30, 0.296770890469536, 0

1e-31, 0.296770890469536, 0

1e-32, 0.296770890469536, 0

1e-33, 0.296770890469536, 0

1e-34, 0.296770890469536, 0

1e-35, 0.2967708904695361, 0

1e-36, 0.2967708904695361, 0

1e-37, 0.296770890469536, 0

1e-38, 0.296770890469536, 0

1e-39, 0.296770890469536, 0

1e-40, 0.2967708904695361, 0

1e-41, 0.296770890469536, 0

1e-42, 0.296770890469536, 0

1e-43, 0.296770890469536, 0

1e-44, 0.296770890469536, 0

1e-45, 0.2967708904695361, 0

1e-46, 0.296770890469536, 0

1e-47, 0.296770890469536, 0

1e-48, 0.296770890469536, 0

1e-49, 0.296770890469536, 0

1e-50, 0.296770890469536, 0

it's fairly straightforward[$]f(x) = cos(sin(sin(cos(x)))[$]

[$]f'(x) = \cos'(\sin(\sin(\cos x)))\frac{d\;}{dx}\sin(\sin(\cos x))[$]

[$]= \cos'(\sin(\sin(\cos x)))\sin'(\sin(\cos x))\frac{d\;}{dx}\sin(\cos x)[$]

[$]= \cos'(\sin(\sin(\cos x)))\sin'(\sin(\cos x))\sin'(\cos x)\frac{d\;}{dx}\cos x[$]

[$]= \sin(\sin(\sin(\cos x)))\cos(\sin(\cos x))\cos(\cos x)\sin x

[$]

- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

No wonder I had to ask T4A twice It's a lot of work.it's fairly straightforward[$]f(x) = cos(sin(sin(cos(x)))[$]

[$]f'(x) = \cos'(\sin(\sin(\cos x)))\frac{d\;}{dx}\sin(\sin(\cos x))[$]

[$]= \cos'(\sin(\sin(\cos x)))\sin'(\sin(\cos x))\frac{d\;}{dx}\sin(\cos x)[$]

[$]= \cos'(\sin(\sin(\cos x)))\sin'(\sin(\cos x))\sin'(\cos x)\frac{d\;}{dx}\cos x[$]

[$]= \sin(\sin(\sin(\cos x)))\cos(\sin(\cos x))\cos(\cos x)\sin x

[$]

For x = 1.0, I get [$]f''(x) ~0.11352696[$] Two function calls to f are needed.

Last edited by Cuchulainn on May 10th, 2018, 11:40 am, edited 2 times in total.

- Traden4Alpha
**Posts:**23951**Joined:**

As ppauper shows, f'(x) isn't too hard to compute symbolically -- it's just a bit of chain rule crank turning. But I must confess that I had some help from Stephen (Wolfram).

S&T is good magic and formula #4 does seem to imply that one can, in theory, pick arbitrarily small values of h with impunity. But, in practice, the mixing of very large numbers and very small numbers in a numerical calculation seems risky to me. Using S&T with tiny h requires proving that the imaginary part of all digital/numerical implementations of every complex function is robust with respect to round-off error (e.g., from adding h to some larger number during the calculation) or overflow (e.g., something like 1/h occurs in the calculation). That we are often relying on black-box, third-party implementations for complex functions would seem to imply there's risk in trusting that all of this code is robust for tiny h.

Moreover S&T assumes that the higher order derivatives are well behaved so that i*h^3*f'''(x0)/3! is small relative to i*h*f'(x0). I'm sure you know better than I about this so maybe all Taylor series in complex variables are guaranteed to converge. Is h = 1e-50 guaranteed to give an accurate answer all functions?

S&T is good magic and formula #4 does seem to imply that one can, in theory, pick arbitrarily small values of h with impunity. But, in practice, the mixing of very large numbers and very small numbers in a numerical calculation seems risky to me. Using S&T with tiny h requires proving that the imaginary part of all digital/numerical implementations of every complex function is robust with respect to round-off error (e.g., from adding h to some larger number during the calculation) or overflow (e.g., something like 1/h occurs in the calculation). That we are often relying on black-box, third-party implementations for complex functions would seem to imply there's risk in trusting that all of this code is robust for tiny h.

Moreover S&T assumes that the higher order derivatives are well behaved so that i*h^3*f'''(x0)/3! is small relative to i*h*f'(x0). I'm sure you know better than I about this so maybe all Taylor series in complex variables are guaranteed to converge. Is h = 1e-50 guaranteed to give an accurate answer all functions?

- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

You mean, ppauper was cheating? Naughty!

This is not allowed (in my requirement). Now that you are here can you spuke the second derivative symbolically for me.

That covers a magnitude. S&T addresses this point. It's a non-issue. Look at my output. That's not the issue here.

You know that FD classic to compute gradients numerically in NN is probably your concern and the concern is justified because they lead to cancellation (in my output it give zero at some stage)..

Not really.. It is a clever application of Taylor's theorem using complex analysis. Like a black swan we see for the first time.

Very few people,know the method. I only found it few weeks ago. Actually, the DL book by Goodfellow et al has 10 sentences on it.

- Traden4Alpha
**Posts:**23951**Joined:**

There's nothing in S&T about the universal properties of digital code for numerical approximations of complex functions. S&T provides proof in the analytic domain only.But, in practice, the mixing of very large numbers and very small numbers in a numerical calculation seems risky to me.

That covers a magnitude. S&T addresses this point. It's a non-issue. Look at my output. That's not the issue here.

You know that FD classic to compute gradients numerically in NN is probably your concern and the concern is justified because they lead to cancellation (in my output it give zero at some stage)..

And I'm shocked that you would cite stable performance on a single run on a single example that uses well-behaved functions as evidence that all implementations of every possible complex function evaluated at every possible value are also robust for arbitrarily small h.

- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

You miss the point. And you don't know what I have tested..I did mention option greeks example but you chose to ignore it in favour of your inocuous 1/sin(x) example. Let's not get melodramatic. I am experimenting. As soon as something goes wrong I will let you know.There's nothing in S&T about the universal properties of digital code for numerical approximations of complex functions. S&T provides proof in the analytic domain only.But, in practice, the mixing of very large numbers and very small numbers in a numerical calculation seems risky to me.

That covers a magnitude. S&T addresses this point. It's a non-issue. Look at my output. That's not the issue here.

You know that FD classic to compute gradients numerically in NN is probably your concern and the concern is justified because they lead to cancellation (in my output it give zero at some stage)..

And I'm shocked that you would cite stable performance on a single run on a single example that uses well-behaved functions as evidence that all implementations of every possible complex function evaluated at every possible value are also robust for arbitrarily small h.

This method has been around for a long time in use. And there are lots to see.

Let me Google it for you (read Abreu et al and repeat the question)

https://www.google.nl/search?source=hp& ... yd73VM6Ojc

Last edited by Cuchulainn on May 10th, 2018, 6:15 pm, edited 3 times in total.

- Cuchulainn
**Posts:**59924**Joined:****Location:**Amsterdam-
**Contact:**

Give this discussion a chance to run its course. Going all defensive is a bit silly.

And less cherry picking, por favor

And less cherry picking, por favor

I did that one by handAs ppauper shows, f'(x) isn't too hard to compute symbolically -- it's just a bit of chain rule crank turning. But I must confess that I had some help from Stephen (Wolfram).

You mean, ppauper was cheating? Naughty!

- FaridMoussaoui
**Posts:**433**Joined:****Location:**Genève, Genf, Ginevra, Geneva

The use of the complex-step in calculating the derivative of an analytic function was introduced by Lyness & Moler.Icomplexify[$]f(z)[$] where [$]z = x + ih[$] and h = 0.001, e.g.

Then compute theimaginary partof [$]f(z)/h[$] and you are done.(*)

For x = 1.87 I get 1.3684..... e+289 for both exact and this method.(12 digits accuracy)

No subtraction cancellation as with FD classic.

(*) Squire and Trapp 1998.

Numerical Differentiation of Analytics Functions, SIAM Vol 4, N2, 1967 available here: link to the pdf paper

PS: In a previous life, I used the complex-step gradients to compute the sensittivity derivatives of the aerodynamic cost function.

GZIP: On