May 19th, 2012, 7:33 am
As a rough starting guide:

large = the tools you currently use won't work on data that big.

complex = the data is not in consistent forms: for instance, bond prices quoted both as yields and as prices, stock splits, redemptions. Price data can also sometimes arrive in the wrong order. As a trivial example, two stocks may correlate to a useful degree, but around dividend payments (or their announcement) the correlation may temporarily fade away. Some stocks pay dividends in other currencies, so there's another source of complexity.

noisy = not just random but actively malicious: the data has been changed with the explicit goal of screwing with you. OK, that's not what happened, but as well as signals being lost in thermal/Gaussian noise, some of the values were entered incorrectly and/or the exchange meddled with them. The feed you get may show what it claims are prices, but some of them are various forms of pinging by others trying to find levels and don't represent actual quotes.

There are also issues about "when": timestamps are not always reliable or precise enough, and if they come from two different places, that relativity course you took suddenly starts to have value.
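To make the "complex" and "noisy" points a bit more concrete, here's a minimal Python sketch of three typical first cleaning passes: re-sorting ticks that arrived out of order, back-adjusting prices across a stock split, and flagging suspicious prints that sit far from their neighbours. Everything here is hypothetical and for illustration only: the `Tick` type, the function names, the split ratio convention, and the 5% / window-of-5 thresholds are my assumptions, not any standard. It also assumes all timestamps are already on one common clock, which, as noted above, is often exactly what you can't take for granted. It does nothing about yield-vs-price conversion, cross-currency dividends, or correlation breakdowns around dividend dates.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Tick:
    ts: float     # timestamp; ASSUMED already on a single common clock
    price: float


def sort_ticks(ticks):
    """Restore time order for ticks that arrived out of sequence."""
    return sorted(ticks, key=lambda t: t.ts)


def adjust_for_split(ticks, split_ts, ratio):
    """Back-adjust prices before a split so the series is continuous.

    ratio = new shares per old share (e.g. 2.0 for a 2-for-1 split).
    Ticks strictly before split_ts are divided by ratio.
    """
    return [Tick(t.ts, t.price / ratio) if t.ts < split_ts else t for t in ticks]


def flag_bad_ticks(ticks, window=5, max_dev=0.05):
    """Flag ticks whose price deviates from the median of nearby ticks.

    window and max_dev are arbitrary illustrative choices: a tick is flagged
    if it differs by more than max_dev (as a fraction) from the median of up
    to `window` neighbours on each side, excluding the tick itself.
    Returns a list of booleans, one per tick.
    """
    flags = []
    n = len(ticks)
    for i, t in enumerate(ticks):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [ticks[j].price for j in range(lo, hi) if j != i]
        if not neighbours:
            flags.append(False)  # nothing to compare against
            continue
        ref = median(neighbours)
        flags.append(abs(t.price - ref) / ref > max_dev)
    return flags


if __name__ == "__main__":
    # Out-of-order feed, a 2-for-1 split between ts=2 and ts=3,
    # and one obviously bad print at ts=4.
    raw = [Tick(3, 50.2), Tick(1, 100.0), Tick(2, 100.4), Tick(4, 5.0), Tick(5, 50.1)]
    clean = adjust_for_split(sort_ticks(raw), split_ts=2.5, ratio=2.0)
    for tick, bad in zip(clean, flag_bad_ticks(clean)):
        print(tick.ts, round(tick.price, 2), "SUSPECT" if bad else "")
```

Using the median of the neighbours (rather than the mean) is deliberate: one bad print shouldn't drag the reference level towards itself. Note that the order of the passes matters. If you flag outliers before adjusting for the split, every pre-split tick looks "wrong" by a factor of two.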