
outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Is there a name for this transform method?

I ran into this algorithm and want to know more about its mathematical properties.

It's an algorithm that lets you build highly complex transforms of multivariate systems out of arbitrary functions, yet the transform is always invertible! A possible use case is e.g. to learn a function that maps a complex empirical distribution to an easy-to-sample distribution (Gaussian, uniform), the inverse of which can be used to generate new samples from the empirical distribution.

Below you see an example of the forward and back transform. There is a conceptual link with copulas, but can copulas do things like this?

It goes like this: we have a D-dimensional variable, $\overline{x} = \{x_1, x_2, \ldots, x_D\}$. Select a subset of coordinates that don't change, i.e. $y_1 = x_1, y_2 = x_2, \ldots, y_d = x_d$; the remaining coordinates you can transform using any type of function of the unchanged coordinates. In particular you can use non-invertible functions, e.g. $f(a) = \sin(a^2)$.

Forward transform
\begin{align} y_1 &= x_1 \\ y_2 &= x_2 \\ &... \\ y_d &= x_d \\ y_{d+1} &= x_{d+1} + f(x_1, x_2,.. x_d) \\ y_{d+2} &= x_{d+2} + g(x_1, x_2,.. x_d) \\ &... \end{align}

The tricks are that:
1) you can back-transform like this
\begin{align} x_1 &= y_1 \\ x_2 &= y_2 \\ &... \\ x_d &= y_d \\ x_{d+1} &= y_{d+1} - f(y_1, y_2,.. y_d) \\ x_{d+2} &= y_{d+2} - g(y_1, y_2,.. y_d) \\ &... \end{align}

2) and you can repeatedly select subsets of coordinates and stack transform upon transform to create highly complicated transforms of all coordinates.
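In code the two tricks above are only a few lines. A minimal numpy sketch (my own toy example; here the first $d$ coordinates are kept, and for simplicity the shift function acts elementwise on them):

```python
import numpy as np

def forward(x, d, f):
    """Additive coupling: keep the first d coordinates, shift the rest
    by an arbitrary (possibly non-invertible) function of them."""
    y = x.copy()
    y[d:] = x[d:] + f(x[:d])
    return y

def inverse(y, d, f):
    """Exact inverse: the first d coordinates are untouched, so f can
    simply be re-evaluated on them and subtracted."""
    x = y.copy()
    x[d:] = y[d:] - f(y[:d])
    return x

# a deliberately non-invertible shift function, as in the post
f = lambda a: np.sin(a ** 2)

x = np.array([0.3, -1.2, 2.0, 0.7])
y = forward(x, 2, f)
x_back = inverse(y, 2, f)
assert np.allclose(x, x_back)  # the round trip is exact
```

Stacking such layers, while alternating which coordinates are kept fixed, gives trick 2): every layer is trivially invertible, so the composition is too.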

Is there a name for this? What limitations does it have compared to general invertible mappings (are there some invertible mappings it can't do)?

Posts: 23951
Joined: September 20th, 2002, 8:30 pm

### Re: Is there a name for this transform method?

Cool! If this has a name, I'd think it would come from topology.

One curious issue is that if this algorithm operates on empirical data, then the transform outside the convex hull of the data should be undefined, unless there are known properties of the data-generating process to support extrapolation.

Alan
Posts: 10270
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

### Re: Is there a name for this transform method?

Interesting. A similar transformation can be used to remove the correlation in bivariate stochastic volatility models. See my comment in this thread

I don't know a name for it, though.

outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Re: Is there a name for this transform method?

Thanks both!
T4A, yes, extrapolation is always an issue, but I think it can be prevented by mapping to a distribution with finite support, like the uniform distribution on $[0,1]^d$, and then taking good care of the border?

Alan, cool. I had read that thread at the time but didn't get it; now I understand more: that's indeed the same thing, and also invertible. Nice.

frolloos
Posts: 1621
Joined: September 27th, 2007, 5:29 pm
Location: Netherlands

### Re: Is there a name for this transform method?

> Alan, cool. I had read that thread at the time but didn't get it, but now I understand more!: that's indeed the same thing, and also invertible. Nice.
Sorry for the late reply Alan, but like Outrun I didn't get it at the time, and I still don't get it.
The transformation in this thread, or the other thread, is it like a Gram-Schmidt orthogonalization? Maybe take a concrete example, the SABR model with beta = 1: how would it work? I don't want to hijack this thread, so I should probably re-post this in the other thread.

outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Re: Is there a name for this transform method?

> Alan, cool. I had read that thread at the time but didn't get it, but now I understand more!: that's indeed the same thing, and also invertible. Nice.
> Sorry for the late reply Alan, but like Outrun I didn't get it, however I still don't get it
> The transformation, in this thread, or the other thread, is it like a Gram-Schmidt orthogonalization? Maybe to take a concrete example of the SABR model with beta = 1, how would it work? I don't want to hijack this thread so probably should re-post this in the other thread.
Hi frolloos, I see this as a brainstorm type of thread, so feel free to post here (and also feel free to post somewhere else if you think that's better).

In my case I'm mostly interested in non-linear transforms that are invertible, not necessarily in orthogonalizing things with analytical transforms in mind; the 'transform' trick is the same, however. If you have multivariate data, you can split the dimensions into two sets, $x, y$, and define a transform like this:
\begin{align} x^\prime &\leftarrow f(x)\\ y^\prime &\leftarrow h(y | g(x)) \end{align}

For this to be invertible you only need f() and h() to be invertible, but not g()! The reason is that you can do this:
\begin{align} x &\leftarrow f^{-1}(x^\prime)\\ y &\leftarrow h^{-1}(y^\prime | g(x)) \end{align}
Note that you first reconstruct $x$, so that you can then compute $g(x)$, which is needed in the second line.

The notation is a bit weird for the inverse of $h$, because it's a function of both a variable $y$ and a 'conditioning' set of parameters that don't depend on $y$. I mean to say that this holds:
$y = h^{-1}(h(y | \Theta) | \Theta)$
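To make the $x$-then-$g(x)$ reconstruction order concrete, here is a minimal numpy sketch. All the function choices are my own toy examples: f() is the identity, h() is affine in $y$ (hence invertible), and g() is a deliberately non-invertible conditioner that produces the scale and shift:

```python
import numpy as np

def g(x):
    """Non-invertible conditioner: many x map to the same output.
    Returns a (scale, shift) pair used to parameterize h."""
    scale = np.exp(np.tanh(x.sum()))  # strictly positive scale
    shift = np.sin(x @ x)
    return scale, shift

def forward(x, y):
    # f is the identity; h(y | g(x)) is affine in y, hence invertible in y
    s, t = g(x)
    return x, y * s + t

def inverse(xp, yp):
    # x' = x, so g can be recomputed from the transformed variable
    s, t = g(xp)
    return xp, (yp - t) / s

x = np.array([0.5, -0.3])
y = np.array([1.2, 0.8])
xp, yp = forward(x, y)
x2, y2 = inverse(xp, yp)
assert np.allclose(x2, x) and np.allclose(y2, y)  # exact round trip
```

The key point is visible in `inverse`: g() is never inverted, it is just evaluated again on the reconstructed $x$.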

In my application I'm using simple functions for f() and h(), and a highly non-linear, non-invertible function for g() (a neural network). The goal is to decompose an empirical distribution as a function of independent simple distributions (the blue step in the plot above), and also to be able to back-transform so that I can use it as a MC distribution sampler (the red step in the plot above).

Posts: 23951
Joined: September 20th, 2002, 8:30 pm

### Re: Is there a name for this transform method?

Interesting!

How stable are your functions under bootstrap or jackknife type resamplings of the data?

outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Re: Is there a name for this transform method?

I'm still trying to get the paper (from the plots above) working! The paper is getting good reviews from the community, and it seems to solve issues I was running into with other methods I tried earlier (GAN, which is 'likelihood free'; VAE, which didn't converge, even when I extended it with a random forest type of frank-brain-module).

Maybe 1-2 days? I'll keep posting progress.

outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Re: Is there a name for this transform method?

It works! Really cool.

In this example I'm feeding it the (S&P500, NASDAQ) joint return distribution (top left). It then learns a mapping/decomposition into a 2D Gaussian latent representation (top right). The bottom row shows that you can use the invertibility property to generate samples.

Here is a video of the training phase of the Neural Network. I really like how it's learning fat tails in the bottom left at the end of the video!

mtsm
Posts: 352
Joined: July 28th, 2010, 1:40 pm

### Re: Is there a name for this transform method?

What are you trying to do? Learn the distribution through a generative model to then sample risk scenarios?

outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Re: Is there a name for this transform method?

> What are you trying to do? Learn the distribution through a generative model to then sample risk scenarios?
Yes, a generative model for generating samples from high-dimensional distributions that captures complex details of those distributions. I also want a simple latent (and maybe low-dimensional) representation which can be used to model the latent evolution in time. The application would be to simulate and forecast scenarios for stocks, curves and (vol) surfaces.
In general I'm working on empirical deep learning methods for popular mathematical modelling elements used in QF: modelling dynamics, modelling densities, simulation, dimension reduction, outlier detection, inpainting missing data.

Do you know this Real NVP method?
I think it's way more useful than GANs and VAEs. E.g. when trying the Wasserstein GAN, training was very unstable, and, worse, it always ended in mode collapse. Maybe that's because I experiment with low-dimensional problems compared to the image datasets people benchmark against?
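One thing that makes these coupling-layer flows attractive compared to 'likelihood free' GANs is that the Jacobian of each layer is triangular, so the exact log-likelihood under the change of variables is cheap. A minimal numpy sketch for a single affine coupling layer with a standard-normal latent (the scale/shift functions are toy stand-ins for the neural network):

```python
import numpy as np

def coupling_forward(x, d):
    """Affine coupling: z[:d] = x[:d], z[d:] = x[d:] * s + t,
    where s > 0 and t are arbitrary functions of x[:d]."""
    s = np.tanh(x[:d].sum()) + 1.5   # toy scale net, kept positive
    t = np.sin(x[:d]).sum()          # toy shift net
    z = x.copy()
    z[d:] = x[d:] * s + t
    # Jacobian dz/dx is triangular with diagonal (1, ..., 1, s, ..., s)
    log_det = (len(x) - d) * np.log(s)
    return z, log_det

def log_likelihood(x, d):
    """Exact density via change of variables:
    log p_X(x) = log p_Z(z) + log |det dz/dx|."""
    z, log_det = coupling_forward(x, d)
    log_pz = -0.5 * (z @ z) - 0.5 * len(z) * np.log(2 * np.pi)
    return log_pz + log_det

x = np.array([0.1, -0.4, 0.9, 0.2])
ll = log_likelihood(x, 2)
```

With stacked layers the log-determinants simply add up, which is what makes maximum-likelihood training of such a flow tractable.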

Posts: 23951
Joined: September 20th, 2002, 8:30 pm

### Re: Is there a name for this transform method?

A few observations:
1) In both the first example ("double U") and the second example ("index vs index"), the forward map seems to create a strangely lumpy scatter plot for the latent Gaussian. There are too many tight clusters and open holes in the would-be Gaussian. I wonder if the map construction process anomalously amplifies natural stochastic variations in point density in some way. I wonder what the map construction process does to the distribution of point neighborhoods (e.g., the Delaunay triangles or Voronoi polygons)?

2) In theory, the maps should be rotation invariant, right? The latent Gaussian can be spun about the origin. But are they?

outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Re: Is there a name for this transform method?

For those who don't know the details of GANs: a "GAN" is a "generative adversarial network". It's a neural network that learns to convert D-dimensional iid random variates into samples from a complex, higher-dimensional density. The "learning" of the complex density is done by providing a set of examples from that distribution.
A typical demo is to generate random faces of people like this:

Since there exists no parametric distribution of "pictures of human faces" (a 64x64 RGB image can be viewed as a point in a 64*64*3 = 12,288-dimensional space), you can't use likelihood-based methods to fit the distribution to data. Instead, a second neural network gets trained in parallel to look at the generated images and the real images, and tries to tell them apart. E.g. the first network might generate random faces with just one eye, and the second network will at some point figure that out. The two networks are battling: one tries to fool the other; the second tries to find flaws. It's very much like the game between the art forger and the art expert. The main benefit is that you can learn to model and sample from very complex densities without requiring a likelihood function.
The "mode collapse" problem is when the first neural network decides to copy one of the high-probability examples in high detail, over and over again, and nothing else, like only painting the Mona Lisa over and over again. The second network won't be able to tell it apart from all the other painting examples, and the game converges, but we don't end up with a good distribution sampler that converts random numbers into unique, realistic paintings.
Last edited by outrun on May 17th, 2017, 7:30 pm, edited 2 times in total.

outrun
Topic Author
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

### Re: Is there a name for this transform method?

> 2) In theory, the maps should be rotation invariant, right? The latent Gaussian can be spun about the origin. But are they?
Thanks for the ideas.
Yes, the samples can be spun, or resampled; in the video I draw a new set of samples every frame.
About the lumps: these are very early results. I only got it working this morning and wanted to post it before I had to spend the whole afternoon doing other stuff. I'm glad it's converging and not diverging like it did yesterday (due to a bug in my code).
I'll try to experiment with other topologies of the neural network (more layers, fewer layers, different layers), i.e. stir the pot! It would also be good to benchmark the data likelihood against other density models fitted to this data: a simple bivariate Gaussian fit, a Gaussian mixture fit, a kernel density estimate.
Things will get more interesting when I move on to more complicated densities, like the joint "volume, returns" density for a bunch of stocks, or high frequency data I have.

Posts: 23951
Joined: September 20th, 2002, 8:30 pm

### Re: Is there a name for this transform method?

What's strange is that the lumps and holes are also in the first example, which you presumably copied from the paper or someone else's implementation, so it's not a bug in your code. Moreover, the "double U" example has a second very strange artifact: the forward map creates anomalous density below the origin, and the reverse map makes the upper U more dense than the lower U.