Matching model joint distribution with empirical data

betso · October 9th, 2008, 10:15 am

I have a dataset of two correlated Poisson variables:

X1 ~ Poisson(lambda1)
X2 ~ Poisson(lambda2)
Pr(X1,X2) <> Pr(X1)*Pr(X2)

I would like to specify the joint distribution of X1 and X2 so that it fits the dataset. So far I have tried two things:

i, I created a correlation matrix for X1 and X2, Cholesky-decomposed it and generated correlated random outcomes. In detail: I created two independent Gaussian variables, correlated them, transformed them to uniform rvs via the normal CDF, and then to Poissons via the inverse of the Poisson cumulative distribution function. Perhaps this is a mad approach; please advise if so. To my disappointment, the resulting joint distribution does not look anything like the dataset.

ii, I tried various copula types (Frank, Clayton, Gumbel) to tie the Poisson marginals together and fit the empirical dataset, but again there was no resemblance.

What is striking is that the joint distribution of the dataset has a significantly higher probability of X1=X2 than either of the two approaches above produces.

In particular I am struggling with the following questions:

a, Is it valid to test the fit of the model-generated joint distribution against the dataset using least squares? I have heard about maximum likelihood; how can I create an ML function for this particular case?

b, Is there a well-known approach to building a model for an empirical joint distribution (other than my two naive approaches above)?

It is important that the modelled joint distribution fits the empirical one (or at least resembles it) and that this relationship is maintained when lambda1 and lambda2 change. Any suggestions on how I can proceed are more than welcome. Thanks.
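For readers following along, approach (i) can be sketched roughly as below. This is only an illustration of the idea as described in the post, not betso's actual code: the function names (poisson_quantile, correlated_poisson, diagonal_mass), the seed and the max_k guard are all made up for the example, and it assumes moderate lambdas (exp(-lambda) underflows for very large ones).

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam, max_k=10_000):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam) (inverse CDF)."""
    k = 0
    pmf = math.exp(-lam)          # P(X = 0)
    cdf = pmf
    # max_k is a hypothetical guard for u numerically equal to 1
    while cdf < u and k < max_k:
        k += 1
        pmf *= lam / k            # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k


def correlated_poisson(n, lam1, lam2, rho, seed=0):
    """Draw n pairs with Poisson marginals tied by a Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf        # standard normal CDF
    pairs = []
    for _ in range(n):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] applied to (g1, g2)
        z1 = g1
        z2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2
        # normal -> uniform -> Poisson
        pairs.append((poisson_quantile(phi(z1), lam1),
                      poisson_quantile(phi(z2), lam2)))
    return pairs


def diagonal_mass(pairs):
    """Empirical Pr(X1 = X2), the quantity the post says is too low."""
    return sum(1 for a, b in pairs if a == b) / len(pairs)
```

Comparing diagonal_mass on the simulated pairs with the same statistic on the real dataset is one quick way to see the mismatch described above. Note that rho here is the correlation of the underlying Gaussians, which is generally not equal to the correlation of the resulting Poisson counts.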

Alan · October 9th, 2008, 5:15 pm

Two suggestions.

1. Ignore the fact that the X's are integers. Create a transformed data set: z1 = Log X1, z2 = Log X2. Then fit {z1,z2} to a bivariate normal. Now, how does the distribution of {X1,X2} look vs. what you have tried?

2. Explain what is generating the data --- sensible modelling should depend on the process.

regards,
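A minimal sketch of suggestion 1, assuming the data is a list of (X1, X2) pairs. The function name and the shift parameter are assumptions added for the example: Log X is undefined for zero counts, which Poisson data will often contain, so a shift such as 1.0 can be passed to use Log(X + shift) instead. That shift is not part of the suggestion as posted.

```python
import math


def fit_log_bivariate_normal(pairs, shift=0.0):
    """Fit a bivariate normal to (log(X1 + shift), log(X2 + shift)).

    shift is a hypothetical workaround for zero counts; with shift=0.0
    a zero count raises ValueError from math.log.
    """
    z1 = [math.log(x1 + shift) for x1, _ in pairs]
    z2 = [math.log(x2 + shift) for _, x2 in pairs]
    n = len(pairs)
    m1 = sum(z1) / n
    m2 = sum(z2) / n
    # unbiased sample variances and covariance of the transformed data
    s11 = sum((a - m1) ** 2 for a in z1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in z2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / (n - 1)
    rho = s12 / math.sqrt(s11 * s22)
    return {"mean": (m1, m2), "var": (s11, s22), "rho": rho}
```

The fitted means, variances and rho fully describe the bivariate normal for {z1,z2}; simulating from it and exponentiating gives a continuous approximation to {X1,X2} that can be compared with the data.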

betso · October 12th, 2008, 9:23 am

Thanks Alan. To start with, I log-transformed the discrete variables before creating the correlation matrix, and this made a significant difference for approach (i) above. I will look into fitting a bivariate normal as well. When I started googling bivariate distributions I came across the bivariate Poisson, which also provided a much better fit to the dataset.

Aaron · October 12th, 2008, 8:21 pm

The bivariate Poisson is a much more natural idea than the other ones you described.

Define Z, Y1 and Y2 as independent Poisson variates with:

X1 = Z + Y1
X2 = Z + Y2

A natural way to fit parameters is to note:

E(X1) = E(Z) + E(Y1)
E(X2) = E(Z) + E(Y2)
E(X1*X2) = E(Z^2) + E(Z)E(Y1) + E(Z)E(Y2) + E(Y1)E(Y2)
         = E(Z)^2 + E(Z) + E(Z)E(Y1) + E(Z)E(Y2) + E(Y1)E(Y2)

So you just fit the three equations above to the average values for X1, X2 and X1*X2.
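The three equations above can be solved in closed form: subtracting E(X1)E(X2) from E(X1*X2) leaves E(Z), so E(Z) is just the covariance of X1 and X2. A rough sketch follows, with the standard bivariate Poisson pmf added so that a log-likelihood can be evaluated (which touches on the ML question earlier in the thread). The function names and the theta0/theta1/theta2 labels for E(Z), E(Y1), E(Y2) are choices made for this example, not part of the post.

```python
import math


def fit_bivariate_poisson_moments(pairs):
    """Solve the three moment equations for E(Z), E(Y1), E(Y2)."""
    n = len(pairs)
    m1 = sum(x for x, _ in pairs) / n
    m2 = sum(y for _, y in pairs) / n
    m12 = sum(x * y for x, y in pairs) / n
    theta0 = m12 - m1 * m2        # E(Z) = Cov(X1, X2)
    if not 0.0 <= theta0 <= min(m1, m2):
        # the model cannot represent negative covariance, or a
        # covariance larger than either mean
        raise ValueError("moment estimates are outside the model's range")
    return theta0, m1 - theta0, m2 - theta0


def bivariate_poisson_pmf(x, y, theta0, theta1, theta2):
    """Pr(X1 = x, X2 = y), summing over the shared component Z = k."""
    total = 0.0
    for k in range(min(x, y) + 1):
        total += (theta0 ** k / math.factorial(k)
                  * theta1 ** (x - k) / math.factorial(x - k)
                  * theta2 ** (y - k) / math.factorial(y - k))
    return math.exp(-(theta0 + theta1 + theta2)) * total


def log_likelihood(pairs, theta0, theta1, theta2):
    """Log-likelihood of the data; fails if any pair has zero probability."""
    return sum(math.log(bivariate_poisson_pmf(x, y, theta0, theta1, theta2))
               for x, y in pairs)
```

The moment estimates can serve as a starting point for a numerical optimiser that maximises log_likelihood over the three parameters. The shared component Z is also what puts extra probability mass near X1=X2 compared with independent Poissons.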

betso · October 13th, 2008, 5:17 pm

Thanks Aaron. It looks like a relatively easy job to fit the bivariate Poisson with the expressions you presented.

I am also interested in extending the model to incorporate more variables and have learned about the multivariate Poisson model. This model seems tricky to fit to data, and also quite limiting in that the correlation it supports is not pair-wise; rather, it has to apply to all variates.

For more than two variates, would the most plausible approach then be something along the lines of what Alan described?