
AAD in practice

Posted: April 21st, 2017, 4:37 pm
by snufkin
Hi all,

A newbie question: is AAD (adjoint algorithmic differentiation) actually used in practice? What is it good for, and what are its limitations? Some of the quants say that it's only good for first derivatives. I can see a brilliant description at https://www.wilmott.com/automatic-for-the-greeks/, but then I was under the impression that CCR is not actually used?

Best regards,

Re: AAD in practice

Posted: April 21st, 2017, 4:42 pm
by snufkin
It seems the question was raised before, in 2011: viewtopic.php?f=34&t=85423&p=647774&hilit=AAD#p647774. No conclusion though, and the referenced webinar is gone...

Re: AAD in practice

Posted: April 21st, 2017, 5:04 pm
by Cuchulainn
There was a discussion here
viewtopic.php?f=34&t=85423&p=565013&hilit=AAD#p565013

At the time I had a light read of some of the articles but, to be honest, I did not understand much.

Re: AAD in practice

Posted: April 21st, 2017, 5:42 pm
by mtsm
Yes, it is used extremely heavily in ML to perform optimization by gradient descent. A lot of ML packages implement it as a standard feature; just look at any of the packages released by the big tech firms. It's central to all of them.

It's also used for various risk calculations in some global IBs. 

Re: AAD in practice

Posted: April 21st, 2017, 5:46 pm
by Cuchulainn
mtsm wrote:
Yes, it is used extremely heavily in ML to perform optimization by gradient descent. A lot of ML packages implement it as a standard feature; just look at any of the packages released by the big tech firms. It's central to all of them.

It's also used for various risk calculations in some global IBs.

It is used to compute gradients and Hessians, that kind of thing?

Is it possible to understand AAD from a simple 101 example, or does one need certain background knowledge?

Re: AAD in practice

Posted: April 21st, 2017, 7:10 pm
by outrun
Cuchulainn wrote:
It is used to compute gradients and Hessians, that kind of thing?
Is it possible to understand AAD from a simple 101 example, or does one need certain background knowledge?
Yes, exactly, for gradients. In ML frameworks like TensorFlow you specify a graph of computations (like Excel does) with some end result that is typically a cost function (least-squares error, likelihood, entropy), and then it automatically computes the gradient through the whole dependency tree and lets you search for the minimum cost.

E.g.
https://stats.stackexchange.com/questio ... tensorflow
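
A minimal sketch of what that looks like, assuming the TensorFlow 2 API (the tensors and names here are just illustrative): build a least-squares cost from a small graph of operations and ask the framework for its gradient.

[code]
# Sketch only: a least-squares cost and its gradient w.r.t. a parameter,
# computed by TensorFlow's reverse-mode automatic differentiation.
import tensorflow as tf

x = tf.constant([[1.0], [2.0], [3.0]])   # inputs
y = tf.constant([[2.0], [4.0], [6.0]])   # targets
w = tf.Variable([[0.5]])                 # parameter to optimise

with tf.GradientTape() as tape:
    cost = tf.reduce_sum((tf.matmul(x, w) - y) ** 2)   # least-squares cost

grad = tape.gradient(cost, w)   # d(cost)/dw through the whole dependency tree
print(grad.numpy())             # the gradient you would feed to gradient descent
[/code]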

Re: AAD in practice

Posted: April 21st, 2017, 9:06 pm
by snufkin
Cuchulainn wrote:
Is it possible to understand AAD from a simple 101 example, or does one need certain background knowledge?
Cuch, for me the eye-opener was this article: https://www.wilmott.com/automatic-for-the-greeks/ — it explains the basic idea and shows the application, which is quite impressive (given how simple the basic idea is!)

Re: AAD in practice

Posted: April 21st, 2017, 9:23 pm
by snufkin
Cuchulainn wrote:
Is it possible to understand AAD from a simple 101 example, or does one need certain background knowledge?
The core idea is as follows:

Dual numbers are an extension of the real numbers, similar to complex numbers, except that instead of an imaginary unit [$]i[$] with the property [$]i^2 = -1[$], we have an infinitesimal unit [$]\varepsilon[$] with the property [$]\varepsilon^2 = 0[$]. You evaluate the function at [$]x + \varepsilon[$], and the coefficient of [$]\varepsilon[$] in the result is the derivative with respect to [$]x[$]; the coefficient on the input is initially 1 since [$]dx/dx = 1[$].
Since most of the transformations you use in numerical methods are linear, the derivative simply gets propagated alongside the value; voila.
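
For instance, take a toy function (just an illustration) [$]f(x) = 3x + 2[$] at [$]x = 5[$]: \[ f(5 + \varepsilon) = 3(5 + \varepsilon) + 2 = 17 + 3\varepsilon \] so the value [$]17[$] and the derivative [$]f'(5) = 3[$] come out of a single evaluation.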

Moreover, if you redefine the operations to support differentiation, you can work with more complicated models, too: as long as you know what the derivative of the result of an operation is, in terms of the values and derivatives of the operands, you're fine. E.g. for multiplication, since [$]\varepsilon^2 = 0[$]: \[ (x + x'\varepsilon) \times (y + y'\varepsilon) = xy +  (xy' + x'y)\varepsilon \]
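
To make it concrete, here is a minimal sketch in Python (my own illustrative Dual class, not from any particular library), overloading + and * exactly as in the formula above:

[code]
# Minimal dual-number sketch: carry (value, derivative) pairs through + and *.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value    # real part
        self.deriv = deriv    # coefficient of epsilon

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        # (x + x'eps)(y + y'eps) = xy + (xy' + x'y)eps, because eps^2 = 0
        other = self._wrap(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

# f(x) = x*x + 3*x at x = 2, seeded with dx/dx = 1
x = Dual(2.0, 1.0)
f = x * x + 3 * x
print(f.value, f.deriv)   # 10.0 7.0  (and indeed f'(x) = 2x + 3 gives 7 at x = 2)
[/code]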