Serving the Quantitative Finance Community

 
Collector
Topic Author
Posts: 2572
Joined: August 21st, 2001, 12:37 pm

Re: Abuse of curve fitting?

April 30th, 2017, 1:48 pm

The deeper issue is that finding the "best fit" model via whatever method provides no guarantee of a good fit. If the sine() model is simply wrong, the best fit will still be a bad fit, which is what seems to be happening here.

And even if one estimates that there's a minuscule chance that so many measurements would come so close to fitting a 5.9 year sinusoid, there's still the open issue of the measurements that fail to fit. Thus one must ask what is the chance that measurements of a 5.9 year sinusoid would yield this pattern of data (or worse).

Either that or one must hypothesize that certain failed-to-fit measurements are simply wrong, with much larger error bars or some bias error in those wayward data points. Yet rotten-cherry-rejecting is as bad as cherry-picking. If an ill-fitting data point is hypothesized to be wrong, why can't the well-fitting data points be wrong, too? Or maybe the entire model is wrong.
I think you are on to some key issues here. 
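
One common way to put a number on "this pattern of data (or worse)" is a chi-square goodness-of-fit test. The sketch below is only an illustration, assuming independent Gaussian errors with the quoted standard deviations; the function name `chi2_gof` and the toy numbers are made up for the example and are not from the thread.

```python
import numpy as np
from scipy.stats import chi2

def chi2_gof(observed, predicted, sigma, n_params):
    """Chi-square statistic, degrees of freedom and p-value for a fitted model.

    Assumes independent Gaussian errors with known standard deviations.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    stat = float(np.sum(((observed - predicted) / sigma) ** 2))
    dof = observed.size - n_params
    # survival function: chance of a chi-square at least this large if the model is right
    p_value = float(chi2.sf(stat, dof))
    return stat, dof, p_value

# toy illustration with made-up numbers (NOT the G measurements)
obs = [1.0, 2.1, 2.9, 4.2, 5.0]
pred = [1.0, 2.0, 3.0, 4.0, 5.0]
stat, dof, p = chi2_gof(obs, pred, sigma=[0.1] * 5, n_params=2)
print(stat, dof, p)
```

A small p-value says the best fit is still a bad fit, which is the point being made above. It does not say which of the model, the error bars, or individual points is at fault.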
 
outrun
Posts: 4573
Joined: January 1st, 1970, 12:00 am

Re: Abuse of curve fitting?

April 30th, 2017, 3:06 pm

When I fit y = a*sin(b + c*T) (plus a constant offset, as in the code below) through the data, and give each observation the probability N(obs_mean, y, obs_stdev), I get the following log-likelihood as a function of cycle length:
Image
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
import matplotlib.pyplot as plt

data = np.array([
    [1982, 6.67248,  6.4E-5],
    [1996, 6.6729,   7.5E-5],
    [1997, 6.67398,  1.0E-4],
    [2000, 6.674255, 1.4E-5],
    [2001, 6.67559,  4.0E-5],
    [2002, 6.67422,  1.5E-4],
    [2003, 6.67387,  4.0E-5],
    [2005, 6.67228,  1.3E-4],
    [2006, 6.67425,  1.9E-5],
    [2009, 6.67349,  2.7E-5],
    [2010, 6.67234,  2.1E-5],
    [2013, 6.67545,  2.7E-5], # https://www.ncbi.nlm.nih.gov/pubmed/25166649
    [2014, 6.67191,  1.5E-4], # http://www.nature.com/nature/journal/v510/n7506/abs/nature13433.html
])

T = data[:,0]           # the year the experiment was done
mu = data[:,1]          # estimated G value
sigma = data[:,2]*10    # estimated stdev

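def loglikelihood(T, mu, sigma, mean, amplitude, phase, cycle_length):
    # model prediction; the small offset avoids division by zero at cycle_length = 0
    y = mean + amplitude*np.sin(phase + T  * (2.0*np.pi) / (cycle_length + 0.0001) )
    z = (y - mu) / sigma
    # residuals beyond 20 sigma get a fixed value of 30 sigma (a flat penalty)
    z[z<-20] = -30
    z[z>20] = 30
    # logpdf gives the same values as log(pdf) but is numerically safer
    return np.sum( norm.logpdf(z) )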

# -----------------------------------------------------------------
# initial guess
# -----------------------------------------------------------------
cycle_length = 5.9
x0 = (np.mean(mu), np.std(mu)*3, 0.0)

# Define a function 
def f(a):
    mean, amplitude, phase = a[0], a[1], a[2]
    return -loglikelihood(T, mu, sigma, mean, amplitude, phase, cycle_length)
    
# -----------------------------------------------------------------
# Compute the LL for each freq
# -----------------------------------------------------------------
cycle_lengths = np.linspace(0,10,num=1001)
loglikelihoods = []
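for cycle_length in cycle_lengths:
    # f reads the module-level cycle_length set by this loop
    res = minimize(f, x0,  method='Nelder-Mead', tol=1e-6)
    # res.fun is f(res.x), i.e. the NEGATIVE log-likelihood at the optimum
    loglikelihoods.append(res.fun)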

# -----------------------------------------------------------------
# plot the LL as function of the range of cycle_lengths
# -----------------------------------------------------------------
plt.figure(figsize=(8,5))
plt.plot(cycle_lengths, loglikelihoods, color='blue')
plt.xlabel('cycle_length')
plt.ylabel('-loglikelihood')
plt.title('-log likelihood of data (lower is better)')
plt.savefig('g-ll.png')
plt.close()


# -----------------------------------------------------------------
# plot the data and the 5.9 fit
# -----------------------------------------------------------------
cycle_length = 5.9
res = minimize(f, x0,  method='Nelder-Mead', tol=1e-6)
mean, amplitude, phase = res.x[0], res.x[1], res.x[2]
T2 = np.linspace(1980, 2016, num=4*(1+2016-1980) )
y = mean + amplitude*np.sin(phase + T2  * (2.0*np.pi) / (cycle_length + 0.0001) )

plt.figure(figsize=(8,5))
plt.plot(T2,y, color='k')
plt.errorbar(T, mu, yerr=sigma, fmt='o', ecolor='red')
plt.xlabel('year')
plt.ylabel('G')
plt.title('data and the 5.9 fit');
plt.savefig('g59.png')
plt.close()

# -----------------------------------------------------------------
# best fit
# -----------------------------------------------------------------
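# loglikelihoods holds negative log-likelihoods, so the minimum is the best fit
best_fit_ndx = loglikelihoods.index(min(loglikelihoods))
# note: despite its name, best_freq is a cycle length, not a frequency
best_freq = cycle_lengths[best_fit_ndx]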
cycle_length = best_freq
res = minimize(f, x0,  method='Nelder-Mead', tol=1e-6)
mean, amplitude, phase = res.x[0], res.x[1], res.x[2]
T2 = np.linspace(1980, 2016, num=4*(1+2016-1980) )
y = mean + amplitude*np.sin(phase + T2  * (2.0*np.pi) / (cycle_length + 0.0001) )

plt.figure(figsize=(8,5))
plt.plot(T2,y, color='k')
plt.errorbar(T, mu, yerr=sigma, fmt='o', ecolor='red')
plt.xlabel('year')
plt.ylabel('G')
plt.title('data and the {0} fit'.format(best_freq));
plt.savefig('g-best.png')
plt.close()
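
A side note on the inner fit in the script above: for a fixed cycle length the model is linear in its coefficients, because a*sin(phase + w*T) = A*sin(w*T) + B*cos(w*T). So the three parameters can be found by weighted linear least squares, with no starting guess needed. This is a swapped-in technique, not what the script above does (it uses Nelder-Mead), and the helper name `fit_sinusoid` is hypothetical. A minimal sketch, checked on synthetic data only:

```python
import numpy as np

def fit_sinusoid(T, mu, sigma, cycle_length):
    """Weighted least-squares fit of mean + A*sin(w*t) + B*cos(w*t).

    Returns (mean, amplitude, phase, chi2) for the equivalent form
    mean + amplitude*sin(phase + w*T), with w = 2*pi/cycle_length.
    """
    T = np.asarray(T, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    w = 2.0 * np.pi / cycle_length
    # centre the time axis to keep the design matrix well conditioned
    t = T - T.mean()
    X = np.column_stack([np.ones_like(t), np.sin(w * t), np.cos(w * t)])
    # dividing rows by sigma turns ordinary least squares into weighted least squares
    coef, *_ = np.linalg.lstsq(X / sigma[:, None], mu / sigma, rcond=None)
    mean, A, B = coef
    amplitude = np.hypot(A, B)
    # A*sin(x) + B*cos(x) = amplitude*sin(x + atan2(B, A)); then undo the centring
    phase = np.arctan2(B, A) - w * T.mean()
    resid = (mu - X @ coef) / sigma
    return mean, amplitude, phase, float(np.sum(resid ** 2))

# synthetic check with made-up parameters (NOT the G data)
T_demo = np.arange(1980.0, 2016.0)
w_demo = 2.0 * np.pi / 5.9
mu_demo = 6.674 + 0.001 * np.sin(0.3 + w_demo * T_demo)
sigma_demo = np.full_like(T_demo, 1e-4)
mean_hat, amp_hat, phase_hat, chi2_hat = fit_sinusoid(T_demo, mu_demo, sigma_demo, 5.9)
print(mean_hat, amp_hat, phase_hat % (2.0 * np.pi), chi2_hat)
```

For Gaussian errors without the 20-sigma cap, the negative log-likelihood is chi2/2 plus a constant, so ranking cycle lengths by chi2 should give the same ordering.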
 
outrun
Posts: 4573
Joined: January 1st, 1970, 12:00 am

Re: Abuse of curve fitting?

April 30th, 2017, 3:18 pm

Now suppose the experiments are flawed, with structural biases and wrong error estimates: what would be the probability of observing a 5.9 cycle?

One way to test this is to randomize the points in time at which each experiment was done. This is like asking: if they had done the same (flawed) experiment at a different date, with the exact same outcome (not dependent on T), what best cycle would we then fit? This gives a distribution of best-fitted cycles.

Now, randomizing the timestamps will probably not result in a uniform distribution of frequencies, because we have a small number of samples. I'll MC it!
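
The randomisation test described here could be sketched roughly as follows. This is not the code that produced the results later in the thread. It makes its own assumptions: timestamps drawn uniformly between 1980 and 2016, a cycle grid starting at 0.5 instead of 0, 200 trials to keep it quick, and a linear least-squares inner fit in place of Nelder-Mead. The function names are hypothetical; the data and the x10 scaling of the uncertainties are copied from the script earlier in the thread.

```python
import numpy as np

data = np.array([
    [1982, 6.67248,  6.4E-5], [1996, 6.6729,   7.5E-5], [1997, 6.67398,  1.0E-4],
    [2000, 6.674255, 1.4E-5], [2001, 6.67559,  4.0E-5], [2002, 6.67422,  1.5E-4],
    [2003, 6.67387,  4.0E-5], [2005, 6.67228,  1.3E-4], [2006, 6.67425,  1.9E-5],
    [2009, 6.67349,  2.7E-5], [2010, 6.67234,  2.1E-5], [2013, 6.67545,  2.7E-5],
    [2014, 6.67191,  1.5E-4],
])
mu = data[:, 1]
sigma = data[:, 2] * 10  # same scaling as in the script earlier in the thread

def chi2_for_cycle(T, mu, sigma, cycle_length):
    """Chi-square of the best mean + sinusoid fit at one fixed cycle length."""
    w = 2.0 * np.pi / cycle_length
    t = T - T.mean()
    X = np.column_stack([np.ones_like(t), np.sin(w * t), np.cos(w * t)])
    coef, *_ = np.linalg.lstsq(X / sigma[:, None], mu / sigma, rcond=None)
    return float(np.sum(((mu - X @ coef) / sigma) ** 2))

def best_cycle(T, mu, sigma, cycle_lengths):
    """Cycle length on the grid with the smallest chi-square."""
    scores = [chi2_for_cycle(T, mu, sigma, c) for c in cycle_lengths]
    return cycle_lengths[int(np.argmin(scores))]

def randomized_best_cycles(mu, sigma, cycle_lengths, n_trials, rng,
                           t_min=1980.0, t_max=2016.0):
    """Keep the measured values, redraw the timestamps, refit each time."""
    out = np.empty(n_trials)
    for i in range(n_trials):
        T_random = rng.uniform(t_min, t_max, size=mu.size)
        out[i] = best_cycle(T_random, mu, sigma, cycle_lengths)
    return out

rng = np.random.default_rng(0)
cycle_lengths = np.round(np.arange(0.5, 10.01, 0.1), 1)
best = randomized_best_cycles(mu, sigma, cycle_lengths, n_trials=200, rng=rng)
frac_59 = float(np.mean(np.isclose(best, 5.9)))
print("fraction of trials where 5.9 gives the best fit:", frac_59)
```

A histogram of `best` gives the best-fitted-cycle distribution. The fraction landing on 5.9 depends directly on the grid spacing, so it should be read as "5.9 to within one grid step".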
 
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Re: Abuse of curve fitting?

April 30th, 2017, 3:33 pm

I think the way to go is many more measurements with the same equipment at the same location (or, better, the same setup in different locations). With 15 observations done with different equipment at different locations, hmm, I would be very careful about drawing conclusions.

I have a big G machine that can be hooked up to my computer, but it is not accurate enough for this type of study ;-) But it is a wonderful combination of ancient mechanics and modern electronics. We just need to size up this machine and get accurate measurements.

Image
What about GPS satellite orbit data? They've been orbiting for decades, are tracked to extremely tight error bands, and there are dozens of them up there. Even a 1 part-per-million change in G would induce a 27 meter change in the orbital radius. Sure, there are a lot of noise terms (photon pressure, out-gassing, maneuvers, magnetic field effects, and probably even a tiny amount of drag at 20,200 km above the Earth), but those terms can be modeled and removed given the volume of data.
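
The 27 meter figure can be checked with back-of-envelope arithmetic. The sketch below assumes a circular orbit and adds a mean Earth radius of 6371 km to the 20,200 km altitude quoted above (the radius value is my addition). How the radius responds to G depends on what is held fixed, so two simple cases are shown; which one the figure above had in mind is a guess.

```python
EARTH_RADIUS_KM = 6371.0    # mean Earth radius (assumed, not from the post)
GPS_ALTITUDE_KM = 20200.0   # altitude quoted in the post
r_m = (EARTH_RADIUS_KM + GPS_ALTITUDE_KM) * 1e3  # orbital radius in meters

dG_over_G = 1e-6  # a 1 part-per-million change in G

# fixed orbital speed: r = G*M / v**2, so dr/r = dG/G
dr_fixed_speed = r_m * dG_over_G

# fixed orbital period: r = (G*M*P**2 / (4*pi**2))**(1/3), so dr/r = (1/3)*dG/G
dr_fixed_period = r_m * dG_over_G / 3.0

print(f"orbital radius:         {r_m / 1e3:.0f} km")
print(f"shift at fixed speed:   {dr_fixed_speed:.1f} m")
print(f"shift at fixed period:  {dr_fixed_period:.1f} m")
```

Either way the shift is meters, far above the tracking precision discussed in this thread.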
 
Paul
Posts: 6598
Joined: July 20th, 2001, 3:28 pm

Re: Abuse of curve fitting?

April 30th, 2017, 4:02 pm

I know nothing about this problem but some obvious questions:

Is sine to be expected?

Suppose it is periodic with period 5.9 years: what is the best-fit simple(ish) function? (I.e., use the data mod 5.9. Apologies if already mentioned.) Then explain that.
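
"Use the data mod 5.9" amounts to phase folding: reduce each timestamp modulo the period and look at G against position within the cycle, with no sine assumed. A minimal sketch, reusing the years and G values from the script earlier in the thread; the helper name `fold` is hypothetical, and the timestamps are whole years, so the folded positions are only approximate.

```python
import numpy as np

# (year, G value) pairs copied from the script earlier in the thread
data = np.array([
    [1982, 6.67248], [1996, 6.6729], [1997, 6.67398], [2000, 6.674255],
    [2001, 6.67559], [2002, 6.67422], [2003, 6.67387], [2005, 6.67228],
    [2006, 6.67425], [2009, 6.67349], [2010, 6.67234], [2013, 6.67545],
    [2014, 6.67191],
])
T, mu = data[:, 0], data[:, 1]

def fold(T, period):
    """Position of each timestamp within the cycle, in years from 0 to period."""
    return np.mod(T, period)

phase_years = fold(T, 5.9)
order = np.argsort(phase_years)

# print the measurements in order of position within the 5.9 year cycle
for ph, year, g in zip(phase_years[order], T[order], mu[order]):
    print(f"{ph:5.2f}  {int(year)}  {g:.6f}")
```

If the periodicity were real, the folded points should trace out one smooth curve of whatever shape; plotting `mu[order]` against `phase_years[order]` is the quickest way to look.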
 
outrun
Posts: 4573
Joined: January 1st, 1970, 12:00 am

Re: Abuse of curve fitting?

April 30th, 2017, 5:16 pm

Here is the distribution of best-fitting cycles, based on randomising the experiment times between 1980 and 2016 1000 times. Not sure what conclusion we can draw. The density might be 1/CycleLength? There is a 1-2% chance that, if you look at the roughly 100 cycle lengths [0.0, 0.1, 0.2, ..., 10.0], 5.9 gives the best fit. The probability is mainly driven by the interval size: what does 5.9 mean? Between 5.85 and 5.95?
Image
 
Alan
Posts: 2957
Joined: December 19th, 2001, 4:01 am
Location: California

Re: Abuse of curve fitting?

April 30th, 2017, 6:54 pm

Is this over fitting? 
Image
The chart gives the impression that all these measurements are done at a well-defined "point in time". Well, I looked up a local study (local to me at UC Irvine), here: http://royalsocietypublishing.org/conte ... 5.full.pdf. From the article:
A total of about 2500 h of G data were collected in y2000, y2002, y2004 and y2006 ...
If the other studies are similarly protracted, the chart is pretty dubious.
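
To get a feel for how much a protracted measurement matters: if a reported value were a uniform average over a window of W years, a sinusoid of period P would show up with its amplitude scaled by |sin(pi*W/P) / (pi*W/P)|. Uniform, continuous averaging is my simplifying assumption; the article quote only says data were collected in four separate years. A minimal sketch with hypothetical function names, including a brute-force check of the formula:

```python
import numpy as np

def boxcar_attenuation(window_years, period_years):
    """Amplitude factor for a sinusoid averaged uniformly over a time window."""
    # np.sinc(x) is sin(pi*x) / (pi*x)
    return float(abs(np.sinc(window_years / period_years)))

def numeric_attenuation(window_years, period_years, n=200001):
    """Brute-force check: average cos(2*pi*t/P) over a centred window."""
    t = np.linspace(-window_years / 2.0, window_years / 2.0, n)
    return float(abs(np.mean(np.cos(2.0 * np.pi * t / period_years))))

for window in (0.5, 1.0, 3.0, 6.0):
    print(f"window {window:3.1f} y -> amplitude factor "
          f"{boxcar_attenuation(window, 5.9):.3f}")
```

Under that assumption, a measurement spread over about six years would retain almost none of a 5.9 year signal, and placing it on the chart at a single year says little about phase.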
 
outrun
Posts: 4573
Joined: January 1st, 1970, 12:00 am

Re: Abuse of curve fitting?

April 30th, 2017, 7:30 pm

That's a very important data property the 5.9 cycle paper overlooked!
 
Collector
Topic Author
Posts: 2572
Joined: August 21st, 2001, 12:37 pm

Re: Abuse of curve fitting?

April 30th, 2017, 8:46 pm

That is an interesting study:

"Data taken in y2004 proved to be exceptionally noisy, probably for two reasons: a leak that had developed in the evacuated thermal insulation gap between the outer and inner walls of the dewar, and an untightened screw in the cryostat inner structure."

"After a run was started all humans left the underground laboratory." What about the mice?

Others have claimed that big G varies with orientation relative to fixed stars:

Experimental evidence that the gravitational constant varies with orientation

"Our repetitive measurements of the gravitational constant (G) show that G varies significantly with the orientation of the test masses relative to the system of fixed stars"

I like this latter idea, but shouldn't it then have been captured by more experiments?
 
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Re: Abuse of curve fitting?

May 1st, 2017, 2:27 am

If G were anisotropic, surely it would be very visible as anomalies in satellite orbits.

Methinks the mouse population is on a 5.9 year cycle and that's affecting the readings.
 
Collector
Topic Author
Posts: 2572
Joined: August 21st, 2001, 12:37 pm

Re: Abuse of curve fitting?

May 1st, 2017, 9:59 am

Soon there will be a great opportunity to look into another possible gravity anomaly: the Allais effect and also the Jeverdan effect (which we talked about before).

NASA Total Solar Eclipse 2017: The Path Through the United States

An experiment to detect Allais effect around total solar eclipse of 9 March 2016 (Indonesia)

I know there are at least some groups planning to measure whether they observe the Allais and Jeverdan effects in the USA in August. Time gates that can be plugged straight into your computer are cheap these days (useful when looking for the Jeverdan effect). For people in the US there is still time to prepare a nice pendulum.
 
Collector
Topic Author
Posts: 2572
Joined: August 21st, 2001, 12:37 pm

Re: Abuse of curve fitting?

May 25th, 2017, 10:41 pm

If G were anisotropic, surely it would be very visible as anomalies in satellite orbits.

Methinks the mouse population is on a 5.9 year cycle and that's affecting the readings.
I think there could be many of the same challenges with a potential one-way speed of gravity as with the one-way speed of light. Possibly most experiments (effects) would measure just round-trip gravity, which for sure should be isotropic.
 
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Re: Abuse of curve fitting?

May 26th, 2017, 1:14 pm

Wouldn't GPS provide the perfect test of a one-way speed of light or other anisotropies in EM wave propagation?

There are 2,000 Continuously Operating Reference Stations (https://www.ngs.noaa.gov/CORS/) in the US that would surely show obvious, significant anomalies in computed ground station locations as a function of time-of-day and day-of-year if the speed of light or gravity varied. The CORS network enables instantaneous location measurements accurate to ±20 mm (across the roughly 20,000 km orbital distances that GPS signals travel), meaning variations on the order of 1 part per billion should be detectable. And I'd bet that a careful analysis of all the data could detect light-speed or gravity anomalies down to parts per trillion.
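
The orders of magnitude here are easy to check. The sketch below uses only the ±20 mm and 20,000 km figures from the post. The step to parts per trillion assumes independent, zero-mean errors, so that averaging N measurements shrinks the noise by sqrt(N). That is an idealisation: systematic errors would not average down this way.

```python
position_error_m = 0.020     # ±20 mm, from the post
path_length_m = 20_000e3     # roughly 20,000 km, from the post

# fractional sensitivity of a single measurement
single_shot = position_error_m / path_length_m

# independent measurements needed to reach parts per trillion,
# assuming the noise averages down as 1/sqrt(N)
target = 1e-12
n_needed = (single_shot / target) ** 2

print(f"single-measurement sensitivity: {single_shot:.1e}")
print(f"independent measurements needed for {target:.0e}: {n_needed:.1e}")
```

About a million independent measurements would close the gap from parts per billion to parts per trillion under that idealisation.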
 
Collector
Topic Author
Posts: 2572
Joined: August 21st, 2001, 12:37 pm

Re: Abuse of curve fitting?

May 26th, 2017, 2:12 pm

The only way to know would be to go carefully through the experiments (or data) claiming to prove isotropic one-way speed. Mariniov is one of the few I have seen who has gone through a long series of experiments (from before GPS) that claim to prove isotropic one-way speed but that, on closer inspection, only measure the round trip.