*Softmax is non-linear so it's not affine.*

My bad. Its input argument is an affine transformation. That's what I meant, of course.

Are ODEs any good for DL?

- Cuchulainn
**Posts:**54536**Joined:****Location:**Amsterdam-
**Contact:**

Cuchulainn wrote:Softmax is non-linear so it's not affine.

My bad. Its input argument is an affine transformation. That's what I meant, of course.

Are ODEs any good for DL?

The most common architecture elements in a layer are an affine transform (each unit in a layer computes a weighted sum of inputs that come from a previous layer) followed by a non-linear activation function such as softmax, sigmoid, or ReLU.
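To make the "affine transform followed by a non-linear activation" description concrete, here is a minimal sketch in plain Python. The function names (`affine`, `relu`, `sigmoid`, `layer`) and the toy weights are illustrative assumptions, not taken from any particular library.

```python
import math

def affine(W, b, x):
    # Each unit computes a weighted sum of the inputs plus a bias: W x + b.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def relu(z):
    # Element-wise non-linearity: max(0, z).
    return [max(0.0, v) for v in z]

def sigmoid(z):
    # Element-wise logistic function.
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def layer(W, b, x, activation):
    # One layer = affine transform followed by a non-linear activation.
    return activation(affine(W, b, x))

# Toy example (hypothetical numbers): 2 inputs, 2 units.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [-0.5, 1.0]
x = [2.0, 1.0]

print(affine(W, b, x))       # pre-activations
print(layer(W, b, x, relu))  # after the non-linearity
```

The affine part alone is linear up to a shift; it is the activation that makes the layer non-linear, which is the distinction being discussed above.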

Softmax is mostly used at the output layer for modelling class probabilities. If the input to the softmax is an affine transform of some previous layer then those inputs are called 'logits'. The terminology doesn't come from ML but is much older: a single-layer NN that does affine + softmax is equivalent to classical https://en.wikipedia.org/wiki/Multinomi ... regression
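A minimal sketch of the affine + softmax combination, in plain Python. The names `softmax` and `class_probabilities` and the toy numbers are assumptions for illustration. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math

def softmax(logits):
    # Shift by the max for numerical stability; softmax is shift-invariant.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(W, b, x):
    # Logits are an affine transform of the input; softmax turns them
    # into a probability vector (non-negative, sums to 1).
    logits = [sum(w * x_j for w, x_j in zip(row, x)) + b_i
              for row, b_i in zip(W, b)]
    return softmax(logits)

# Toy example (hypothetical numbers): 3 classes, 2 features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
p = class_probabilities(W, b, [2.0, 1.0])
print(p, sum(p))
```

With two classes and the second logit fixed at zero, the first softmax output reduces to the sigmoid of the first logit, which is ordinary (binary) logistic regression.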

What sub-area would you apply ODEs to? I can imagine if you had some physical (preservation?) laws as a starting point of some new theory then you would write those down as ODEs before you try to solve them? Did you have something in mind?

Agree with "underlying maths (chapter 4, for example) is fairly basic and somewhat outdated".

However, what it enables is solving problems a lot faster than was previously possible, and with reduced manual intervention, especially in credit risk modeling (retail, SME & wholesale) applications.

Just as in those days financial underwriters scorned generalized & opaque underwriting, I see many quants ridiculing DL or ML these days.

They laughed at the Gaussian copula with base correlation skew, but we proved them wrong!

- Cuchulainn

All truth passes through three stages. First, it is ridiculed. Second, it is violently opposed. Third, it is accepted as being self-evident.

For me personally (I am a beginner here), I am interested in the short term in why (and when) DL and ML work in theory (i.e. the maths). I am still intrigued by how gradient descent (and its variations) has become the darling of Computer Science. I hope to find out.

And there's a wee bit of hubris perhaps maybe possibly that DL solves all problems. In the 80s the West feared that Japan's 5th gen project (the secret tool was Prolog LOL) would take over the world.

My gut feeling says

Last edited by Cuchulainn on November 20th, 2017, 9:51 pm

- Cuchulainn

DL/PDE? I have not been able to find anything I can understand yet. I haven't lost hope.

- Cuchulainn

rrao4 wrote:Agree with "underlying maths (chapter 4, for example) is fairly basic and somewhat outdated".

However, what it enables is solving problems a lot faster than was previously possible, and with reduced manual intervention, especially in credit risk modeling (retail, SME & wholesale) applications.

Just as in those days financial underwriters scorned generalized & opaque underwriting, I see many quants ridiculing DL or ML these days.

It is a challenge to get a handle on this stuff. AFAIK there is no mathematically sharp text on DL (Goodfellow et al. is not hands-on). Many articles seem to have been written in a hurry. There is a lot of textual description (which is fine), but an algorithmic description is also needed. It takes some reverse engineering to get from the solution back to the problem.

Freeman and Skapura might be a bit outdated but it explains things well IMO.

Last edited by Cuchulainn on November 21st, 2017, 12:06 pm

What do you mean by "hands on"? If you want mathematical rigor, it will not be an easy read. If you want it to be a page turner, it will not be mathematically rigorous.

Cuchulainn wrote:rrao4 wrote:Agree with "underlying maths (chapter 4, for example) is fairly basic and somewhat outdated".

However, what it enables is solving problems a lot faster than was previously possible, and with reduced manual intervention, especially in credit risk modeling (retail, SME & wholesale) applications.

Just as in those days financial underwriters scorned generalized & opaque underwriting, I see many quants ridiculing DL or ML these days.

It is a challenge to get a handle on this stuff. AFAIK there is no mathematically sharp text on DL (Goodfellow et al. is not hands-on). Many articles seem to have been written in a hurry. There is a lot of textual description (which is fine), but an algorithmic description is also needed. It takes some reverse engineering to get from the solution back to the problem.

Freeman and Skapura might be a bit outdated but it explains things well IMO.

I understand your point, being a purist to some extent myself. However, the world consists mostly of non-maths people, many with MBAs from the Ivy League, who get scared by simple log transformations, never mind EVT and PDEs. Long story short, people learn to drive a car without learning Newton's Laws.

- Cuchulainn

rrao4 wrote:I understand your point, being a purist to some extent myself. However, the world consists mostly of non-maths people, many with MBAs from the Ivy League, who get scared by simple log transformations, never mind EVT and PDEs. Long story short, people learn to drive a car without learning Newton's Laws.

Plan B: if I can get open-source C++ code that is not complete spaghetti, then I can reverse engineer it. I can replace algo 1 by algo 2 while keeping the original I/O.

- Cuchulainn

ISayMoo wrote:What do you mean by "hands on"? If you want mathematical rigor, it will not be an easy read. If you want it to be a page turner, it will not be mathematically rigorous.

I mean step-by-step, a whodunnit approach to see how all the parts fit, e.g. not as in Goodfellow, in which chapter 4 is quarantined from the main theme.

- Cuchulainn

rrao4 wrote:I understand your point, being a purist to some extent myself. However, the world consists mostly of non-maths people, many with MBAs from the Ivy League, who get scared by simple log transformations, never mind EVT and PDEs. Long story short, people learn to drive a car without learning Newton's Laws.

I am a numerical analyst by training, so it's incumbent on me (fun) to kick the tires of the bespoke car. WLOG we assume Newton's Laws to be true for the moment.

- Traden4Alpha
**Posts:**23508**Joined:**

Cuchulainn wrote:Softmax is non-linear so it's not affine.

My bad. Its input argument is an affine transformation. That's what I meant, of course.

Are ODEs any good for DL?

Are ODEs any good for solving simultaneous equations? (It may be a stupid question. I don't know the answer to that because I've never looked.)

DL seems more like an atemporal system in which one wants to estimate values for a large set of parameters that simultaneously output the right values for each of a large set of inputs.

One might use gradient descent to solve for the values, but the iteration is a nuisance.
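As a rough illustration of "estimate parameters that output the right values for a set of inputs" via gradient descent, here is a minimal sketch in plain Python. The model (a straight line), the squared-error loss, the function name `fit_line`, and the learning rate and step count are all assumptions chosen for the toy example, not anything proposed in the thread.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    # Fit y ~ w*x + b by gradient descent on the mean squared error.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # One descent step: move against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (hypothetical example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)
```

One link back to the ODE question: each update above has the form of an explicit Euler step, with step size `lr`, for the gradient-flow ODE dθ/dt = -∇L(θ), so the "nuisance" iteration can be read as a time-stepping scheme.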

- Cuchulainn

My question is not random. Are you familiar with RNNs?