"Large, complex and noisy data sets"

Posted: May 17th, 2012, 12:00 pm
by Hills1234
Hi guys, I often see on job specs that they are looking for candidates who have worked with "large, complex and noisy data sets". How large? What are typical dimensions? What fields are within the data set? Bid/Ask Price, Volume obviously... what else? And what do they mean by complex and noisy? Are they just referring to irregularly spaced time series data, missing values, outliers?... what else? Thanks a lot

Posted: May 17th, 2012, 1:07 pm
by EscapeArtist999
Quote, originally posted by Hills1234:
Hi guys, I often see on job specs that they are looking for candidates who have worked with "large, complex and noisy data sets". How large? What are typical dimensions? What fields are within the data set? Bid/Ask Price, Volume obviously... what else? And what do they mean by complex and noisy? Are they just referring to irregularly spaced time series data, missing values, outliers?... what else? Thanks a lot

Have you heard of the word googolplex? Or perhaps been out to a major city's junkyard? Honestly, this is one of those situations where if you have to ask the price, you probably can't afford it.

Posted: May 17th, 2012, 7:33 pm
by Anomanderis
Quote, originally posted by EscapeArtist999:
Quote, originally posted by Hills1234:
Hi guys, I often see on job specs that they are looking for candidates who have worked with "large, complex and noisy data sets". How large? What are typical dimensions? What fields are within the data set? Bid/Ask Price, Volume obviously... what else? And what do they mean by complex and noisy? Are they just referring to irregularly spaced time series data, missing values, outliers?... what else? Thanks a lot
Have you heard of the word googolplex? Or perhaps been out to a major city's junkyard? Honestly, this is one of those situations where if you have to ask the price, you probably can't afford it.

C'mon, help. Or don't help. No need to gloat about it.
Here -> check this out. And always remember - Google is your friend. large data

Posted: May 17th, 2012, 7:43 pm
by capafan2
Also search for "Big Data Analytics". You will find that a lot of firms do that. The software and hardware are commodity. You can practice using Apache Hadoop and the Amazon AWS cloud.

Posted: May 18th, 2012, 8:39 am
by EscapeArtist999
The OP isn't asking about how to get experience analysing large data sets - I suspect he is thinking of applying for a quant dev or quant role at a HF shop, or has been rejected and wants to understand what these guys at highly secretive organisations do.

I do not work with high-frequency data, and I would say that to really know what these people mean you will actually have to go out into the real world and network, not post the questions on wilmott.com.

Posted: May 19th, 2012, 7:33 am
by DominicConnor
As a rough starting guide:

large = the tools you currently use won't work on data that big

complex = the data is not in consistent forms, for instance bond prices in both yields and prices, stock splits, redemptions. Also price data can sometimes arrive in the wrong order. As a trivial example, two stocks may correlate to a useful degree, but around dividend payments or their announcement the correlation may temporarily fade away. Some stocks pay dividends in other currencies, so there's another source of complexity.

noisy = not just random but actively malicious: the data has been changed with the explicit goal of screwing with you. OK, that's not what happened, but as well as signals being lost in thermal/Gaussian noise, some of the values were entered incorrectly and/or the exchange meddled with them. It may be that the feed you get shows what it claims are prices, but some of them are various forms of pinging by others trying to find levels and don't represent actual quotes.

Also there are issues about "when": timestamps are not always reliable or precise enough, and if they are in two places that relativity course you took suddenly starts to have value.
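
To make a few of these points concrete, here is a minimal Python sketch of three of the cleaning steps mentioned above: re-ordering ticks that arrived out of sequence, adjusting prices for a stock split, and dropping prints that sit far from their neighbours. Everything in it is an assumption for illustration only: the tick values, the split time and ratio, the function names and the 20% tolerance are all made up, and a rolling-median filter is just one simple way to flag bad prints, not a recommended production method.

```python
from statistics import median

# Hypothetical ticks as (timestamp_seconds, price); all values are invented.
ticks = [
    (3.0, 100.2),
    (1.0, 100.0),   # arrived out of order
    (2.0, 100.1),
    (4.0, 1002.0),  # looks like a mis-keyed print
    (5.0, 100.3),
    (6.0, 50.2),    # after an assumed 2-for-1 split at t = 5.5
    (7.0, 50.3),
]


def sort_by_time(ticks):
    """Restore chronological order for ticks that arrived out of sequence."""
    return sorted(ticks, key=lambda t: t[0])


def adjust_for_split(ticks, split_time, ratio):
    """Divide pre-split prices by the split ratio so the series is comparable."""
    return [(ts, p / ratio if ts < split_time else p) for ts, p in ticks]


def drop_outliers(ticks, window=5, tolerance=0.2):
    """Keep ticks within `tolerance` (relative) of the median of a centred window."""
    half = window // 2
    kept = []
    for i, (ts, p) in enumerate(ticks):
        lo = max(0, i - half)
        hi = min(len(ticks), i + half + 1)
        m = median(q for _, q in ticks[lo:hi])
        if abs(p - m) <= tolerance * m:
            kept.append((ts, p))
    return kept


ordered = sort_by_time(ticks)
adjusted = adjust_for_split(ordered, split_time=5.5, ratio=2.0)
clean = drop_outliers(adjusted)

for ts, price in clean:
    print(ts, round(price, 2))
```

The order of the steps matters in this sketch: the split adjustment runs before the outlier filter, otherwise the legitimate halving of the price at the split would itself look like a run of bad prints.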

Posted: May 19th, 2012, 5:08 pm
by tu160
Several links to start:
https://www.nyxdata.com/nysedata/defaul ... .pdf
Google about FIX, TAQ, HDF5
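
For a first feel of what FIX data looks like: a FIX message in the classic tag=value encoding is a sequence of `tag=value` fields separated by the SOH character (0x01). The sketch below splits such a message into a dictionary. The function name and the sample message are hypothetical, and the message is deliberately incomplete (no BodyLength or CheckSum fields), so treat it as a toy rather than a conforming parser.

```python
SOH = "\x01"  # field delimiter in the FIX tag=value encoding


def parse_fix(message):
    """Split a raw tag=value FIX message into a {tag: value} dict.

    Simplification: repeated tags (as in repeating groups) would overwrite
    each other here, so this is only suitable for flat messages.
    """
    fields = {}
    for part in message.strip(SOH).split(SOH):
        tag, _, value = part.partition("=")
        fields[int(tag)] = value
    return fields


# Hypothetical new-order message, for illustration only.
# 8 = BeginString, 35 = MsgType (D = New Order Single), 55 = Symbol,
# 54 = Side (1 = Buy), 38 = OrderQty, 44 = Price
raw = SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=1", "38=100", "44=125.50"]) + SOH

order = parse_fix(raw)
print(order[55], order[38], order[44])
```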