
snufkin
Topic Author
Posts: 64
Joined: January 25th, 2017, 9:05 am
Location: Cambridge

Hi all,

A newbie question: is AAD (adjoint algorithmic differentiation) actually used in practice? What is it good for, and what are its limitations? Some quants say that it's only good for first derivatives. The article at https://www.wilmott.com/automatic-for-the-greeks/ gives a brilliant description, but I was under the impression that CCR is not actually used?

Best regards,
Last edited by snufkin on April 21st, 2017, 9:57 pm, edited 1 time in total.

snufkin
Topic Author
Posts: 64
Joined: January 25th, 2017, 9:05 am
Location: Cambridge

It seems the question was raised before, in 2011: viewtopic.php?f=34&t=85423&p=647774&hilit=AAD#p647774 (no conclusion though, and the webinar referred to there is gone...)
Last edited by snufkin on April 21st, 2017, 9:58 pm, edited 1 time in total.

Cuchulainn
Posts: 64699
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

There was a discussion here

At the time I tried lightly reading some of the articles, but to be honest I did not understand much.
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl

mtsm
Posts: 353
Joined: July 28th, 2010, 1:40 pm

Yes, it is used extremely heavily in ML to perform optimization by gradient descent. There are a lot of ML packages that implement this de facto. Just look at any of the packages released by the big tech firms. It's all about it.

It's also used for various risk calculations in some global IBs.

Cuchulainn
Posts: 64699
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Is it used to compute gradients and Hessians, that kind of area?

Is it possible to understand AAD from a simple 101 example, or does one need certain background knowledge?

outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Yes, exactly, for the gradient. In ML frameworks like TensorFlow you specify a graph of computations (as Excel does) with some end result that's typically a cost function (least-squares error, likelihood, entropy), and the framework then automatically computes the gradient through the whole dependency tree and lets you search for a minimal cost.

Eg
https://stats.stackexchange.com/questio ... tensorflow
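
To make the "graph plus automatic gradient" idea concrete, here is a minimal sketch in plain Python rather than TensorFlow. The names `Var`, `backward` and `cost` are made up for this illustration and are not part of any library. Each operation records its operands and local partial derivatives, a single reverse sweep then yields the gradient of the cost with respect to every input, and plain gradient descent uses it to fit a straight line by least squares.

```python
class Var:
    """A node in a computation graph that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local partial derivative)
        self.grad = 0.0         # adjoint: d(output)/d(this node)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Var) else Var(x)

    def __add__(self, other):
        other = Var.lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Var.lift(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var.lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Reverse sweep: push adjoints from the output back to the inputs."""
    order, seen = [], set()

    def visit(node):
        # Post-order walk, so parents are listed before the nodes built from them.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Going through the list backwards, a node's adjoint is complete
    # before it is passed on to its parents.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


def cost(a, b, data):
    """Sum of squared errors of the line y = a*x + b (illustrative cost)."""
    total = Var(0.0)
    for x, y in data:
        err = a * x + b - y
        total = total + err * err
    return total


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on y = 2x + 1
a_val, b_val = 0.0, 0.0
for _ in range(500):
    a, b = Var(a_val), Var(b_val)
    backward(cost(a, b, data))   # one reverse sweep gives both partials
    a_val -= 0.05 * a.grad       # gradient descent step, step size 0.05
    b_val -= 0.05 * b.grad
```

After the loop, `a_val` and `b_val` are close to 2 and 1. The point for the AAD discussion is that the reverse sweep costs roughly the same whether there are two inputs or two thousand, which is why this mode suits a scalar cost with many parameters.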

snufkin
Topic Author
Posts: 64
Joined: January 25th, 2017, 9:05 am
Location: Cambridge

Cuchulainn wrote: "Is it possible to understand AAD from a simple 101 example, or does one need certain background knowledge?"
Cuch, for me the eye-opener was this article: https://www.wilmott.com/automatic-for-the-greeks/ — it explains the basic idea and shows the application, which is quite impressive (given how simple the basic idea is!)
Last edited by snufkin on April 21st, 2017, 9:58 pm, edited 1 time in total.

snufkin
Topic Author
Posts: 64
Joined: January 25th, 2017, 9:05 am
Location: Cambridge

Dual numbers are an extension of the real numbers, similar to complex numbers, except that instead of an imaginary unit $i$ with the property $i^2 = -1$, we have an infinitesimal unit $\varepsilon$ with the property $\varepsilon^2 = 0$. Evaluating a function at $x + x'\varepsilon$ gives $f(x) + f'(x)\,x'\varepsilon$, because every higher-order term of the Taylor expansion contains $\varepsilon^2$ and vanishes. The coefficient of $\varepsilon$ is therefore the derivative with respect to $x$; for the input itself it is initially 1, since $dx/dx = 1$.
Moreover, if you redefine the arithmetic operations to propagate derivatives, you can work with more complicated models too: as long as you know the derivative of an operation's result in terms of the values and derivatives of its operands, you're fine. E.g. $(x + x'\varepsilon) \times (y + y'\varepsilon) = xy + (xy' + x'y)\varepsilon$, since the $x'y'\varepsilon^2$ term vanishes.
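
A minimal sketch of the above in Python, assuming only addition, multiplication and `exp` are needed. The class `Dual` and the helpers `exp`, `derivative` and `f` are hypothetical names chosen for this example. Note that dual numbers give the forward (tangent) mode of algorithmic differentiation: each evaluation yields the derivative with respect to one seeded input, whereas the adjoint mode sweeps backwards from the output.

```python
import math


class Dual:
    """A dual number real + eps*ε, where ε² = 0."""

    def __init__(self, real, eps=0.0):
        self.real = real  # value
        self.eps = eps    # derivative carried along with the value

    @staticmethod
    def lift(x):
        # Plain constants have zero derivative.
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # (x + x'ε)(y + y'ε) = xy + (xy' + x'y)ε
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__


def exp(d):
    # Chain rule: the derivative of exp(u) is exp(u) * u'.
    d = Dual.lift(d)
    return Dual(math.exp(d.real), math.exp(d.real) * d.eps)


def derivative(f, x):
    """Evaluate f at x + 1ε and read the derivative off the ε coefficient."""
    return f(Dual(x, 1.0)).eps


def f(x):
    # Hypothetical test function: f(x) = x³ + 2x, so f'(x) = 3x² + 2.
    return x * x * x + 2 * x
```

For instance, `derivative(f, 2.0)` evaluates to `14.0`, matching $3 \cdot 2^2 + 2$, and `derivative(lambda x: x * exp(x), 1.0)` agrees with $2e$. Supporting a new operation only requires stating its value and its derivative in terms of the operands, as `exp` does here.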