December 21st, 2017, 12:01 am
Thanks! €20k is a nice incentive he has organized.
Cool contest!
It looks like these may all be {given X(t), predict X(t+)} time series rather than {given [X, Y], predict Y[i+] from X[i+]} data sets?
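For what it's worth, here's a minimal numpy sketch of the difference between the two framings (the array values and the one-step horizon are made-up stand-ins, not anything from the actual competition data):

    # Univariate framing: predict a series' next value from its own past.
    import numpy as np

    x = np.array([1.0, 1.2, 1.5, 1.9, 2.4])   # stand-in for one series
    inputs, targets = x[:-1], x[1:]           # {given X(t), predict X(t+)}

    # Exogenous framing: predict a separate target series Y from X.
    y = np.array([0.3, 0.5, 0.4, 0.8, 0.9])   # stand-in target series
    features, labels = x[:-1], y[1:]          # {given [X, Y], predict Y[i+] from X[i+]}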
For M3 they have data on various timescales (annual, quarterly, monthly, ...) and sometimes as little as a dozen data points. I'm not sure if you're allowed to correlate one point in time between series (or additional series you can add yourself)?
Why not? Wouldn't most ML approaches automagically notice correlation patterns among the data series and exploit them? If the ML system shares state between its analyses of multiple series, then it is implicitly using more than each data series in isolation.
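As a rough illustration of that "shared state" point, here's a sketch where a single model is fit on lag windows pooled from many series, so whatever it learns from one series carries over to predictions on the others. The random-walk series, the window length, and the Ridge regressor are all just assumptions for illustration:

    import numpy as np
    from sklearn.linear_model import Ridge

    # Stand-in collection of series (random walks in place of real data).
    series = [np.cumsum(np.random.randn(60)) for _ in range(100)]
    window = 12

    # Pool lag windows across ALL series into one training set.
    X, y = [], []
    for s in series:
        for i in range(len(s) - window):
            X.append(s[i:i + window])
            y.append(s[i + window])

    # One shared set of weights; patterns from any series inform every forecast.
    model = Ridge().fit(np.array(X), np.array(y))
    next_val = model.predict(series[0][-window:].reshape(1, -1))

A per-series model trained in isolation couldn't do this; pooling is the simplest way a learner ends up "implicitly using more than each data series in isolation".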
The use of additional data series seems a bit trickier. It might be considered "cheating" or it might be entirely encouraged. If the goal is to create a learning system capable of ingesting a novel corpus of data and making predictions, then attempting to augment the 100,000 provided series with additional series would be bad. But if the goal is to generate a general predictor that processes all of humanity's data and provides predictions, then augmenting the 100,000 data series is clearly a welcome solution.
Are the 100,000 data series meant to be the only training data or are they just the test data series?
And what about a hybrid system that generates "new" data series through functions of combinations of the original data?
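Something like this, say (the two stand-in series and the pairwise functions are arbitrary examples, not a proposal for which combinations would actually help):

    import numpy as np

    a = np.cumsum(np.random.randn(50))    # two stand-in original series
    b = np.cumsum(np.random.randn(50))

    # "New" series synthesized as functions of combinations of the originals.
    derived = {
        "sum":   a + b,
        "diff":  a - b,
        "ratio": a / (np.abs(b) + 1e-9),  # guard against division by zero
    }
    # Each derived series could then be fed to the forecaster alongside a and b.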