Hills1234
Topic Author
Posts: 0
Joined: March 13th, 2011, 10:44 am

"Large, complex and noisy data sets"

May 17th, 2012, 12:00 pm

Hi guys,

I often see on job specs that they are looking for candidates who have worked with "large, complex and noisy data sets". How large? What are typical dimensions? What fields are within the data set? Bid/ask price and volume, obviously... what else? And what do they mean by complex and noisy? Are they just referring to irregularly spaced time series data, missing values, outliers... what else?

Thanks a lot
 
EscapeArtist999
Posts: 0
Joined: May 20th, 2009, 2:49 pm

May 17th, 2012, 1:07 pm

Have you heard of the word googolplex? Or perhaps been out to a major city's junkyard?

Honestly, this is one of those situations where if you have to ask the price, you probably can't afford it.
 
Anomanderis
Posts: 0
Joined: November 15th, 2011, 10:07 pm

May 17th, 2012, 7:33 pm

C'mon, help. Or don't help. No need to gloat about it.

Here -> check this out. And always remember - Google is your friend. large data
Last edited by Anomanderis on May 16th, 2012, 10:00 pm, edited 1 time in total.
 
capafan2
Posts: 1
Joined: June 20th, 2009, 11:26 am

May 17th, 2012, 7:43 pm

Also search for "Big Data Analytics". You will find a lot of firms do that. The software and hardware are commodity. You can practice using Apache Hadoop and the Amazon AWS cloud.
 
EscapeArtist999
Posts: 0
Joined: May 20th, 2009, 2:49 pm

May 18th, 2012, 8:39 am

The OP isn't asking about how to get experience analysing large data sets. I suspect he is thinking of applying for a quant dev or quant role at a HF shop, or has been rejected, and wants to understand what these guys at highly secretive organisations do.

I do not work with high-freq data, and I would say that to really know what these people mean you will actually have to go out into the real world and network, not post the questions on wilmott.com.
 
DominicConnor
Posts: 41
Joined: July 14th, 2002, 3:00 am

May 19th, 2012, 7:33 am

As a rough starting guide:

large = the tools you currently use won't work on data that big.

complex = the data is not in consistent forms, for instance bond prices in both yields and prices, stock splits, redemptions. Also, price data can sometimes arrive in the wrong order. As a trivial example, two stocks may correlate to a useful degree, but around dividend payments or their announcement the correlation may temporarily fade away. Some stocks pay dividends in other currencies, so there's another source of complexity.

noisy = not just random but actively malicious; the data has been changed with the explicit goal of screwing with you. OK, that's not what happened, but as well as signals being lost in thermal/Gaussian noise, some of the values were entered incorrectly and/or the exchange meddled with them. It may be that the feed you get shows what it claims are prices, but some of them are various forms of pinging by others trying to find levels and don't represent actual quotes.

Also there are issues about "when": timestamps are not always reliable or precise enough, and if they are in two places that relativity course you took suddenly starts to have value.
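To make two of these points concrete (ticks arriving in the wrong order, and values that were entered incorrectly), here is a minimal Python sketch. Everything in it is an assumption for illustration: the tick values are made up, `reorder` and `flag_outliers` are hypothetical helper names, and the 5-tick window and 5% threshold are arbitrary choices, not a recommended calibration. It uses a centred median window, so it only suits offline cleaning, since a live filter cannot see future ticks.

```python
from statistics import median

# Hypothetical ticks as (timestamp_seconds, price); values are made up.
ticks = [
    (1.0, 100.00),
    (2.0, 100.02),
    (4.0, 100.03),
    (3.0, 100.01),   # arrived late: out of order
    (5.0, 1000.40),  # looks like a data-entry error
    (6.0, 100.04),
    (7.0, 100.05),
]


def reorder(ticks):
    """Sort ticks by timestamp; the sort is stable, so ties keep arrival order."""
    return sorted(ticks, key=lambda t: t[0])


def flag_outliers(ticks, window=5, max_rel_dev=0.05):
    """Return (timestamp, price, is_outlier) for each tick.

    A tick is flagged when its price differs from the median of a
    centred window by more than max_rel_dev (relative deviation).
    """
    prices = [p for _, p in ticks]
    half = window // 2
    flagged = []
    for i, (ts, price) in enumerate(ticks):
        lo = max(0, i - half)
        hi = min(len(prices), i + half + 1)
        ref = median(prices[lo:hi])
        flagged.append((ts, price, abs(price - ref) / ref > max_rel_dev))
    return flagged


ordered = reorder(ticks)
cleaned = [(ts, p) for ts, p, bad in flag_outliers(ordered) if not bad]
```

On this toy input, the late tick is put back in sequence and only the 1000.40 print is dropped. Real feeds need more care than this: a genuine price jump looks the same as an error to a filter like this one, and sorting by timestamp only helps if the timestamps themselves can be trusted.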
 
tu160
Posts: 0
Joined: October 23rd, 2007, 1:14 pm

May 19th, 2012, 5:08 pm

Several links to start:
https://www.nyxdata.com/nysedata/defaul ... .pdf
Google FIX, TAQ, HDF5.
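For a feel of what FIX data looks like: a FIX message is a sequence of tag=value fields separated by the SOH character (0x01). The sketch below is a minimal illustration in Python, not a real FIX engine. `parse_fix` is a hypothetical helper, and the sample message is made up and incomplete (it omits required fields such as BodyLength and CheckSum). Storing fields in a dict also loses repeating groups, which real messages can contain.

```python
SOH = "\x01"  # FIX field delimiter


def parse_fix(raw):
    """Split a raw FIX string into a {tag_number: value} dict."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields


# Made-up, incomplete sample: 8=BeginString, 35=MsgType (D = New Order Single),
# 55=Symbol, 54=Side (1 = Buy), 38=OrderQty, 44=Price.
raw = SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=1", "38=100", "44=125.50"]) + SOH
msg = parse_fix(raw)

print(msg[55], msg[38], msg[44])
```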