ashkar
Topic Author
Posts: 0
Joined: October 17th, 2011, 9:25 am

technology stack for analysing 1Tb dataset

September 8th, 2015, 10:05 am

What would you say is the cheapest way (in terms of building infrastructure) to analyse trade data potentially around 1 TB in size? The data consists of historical prices and greeks for various exotics/structured deals, together with a large super-set of pre-calculated hedge trades. The operations we want to run on it are optimisation algorithms to find the best hedge strategy/models based on some metrics. I'm not familiar with any big data technologies, and being a front-office/hacky environment we don't want to own a complex technology stack which would require dedicated support etc. So I'm looking for simple ideas, e.g. kdb+ + Python/pandas. I'm not familiar with Hadoop-type technologies, but the solution could possibly include a cluster if setup is easy. Thx
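For example, a minimal pandas sketch of the kind of scan I mean, streaming the file in chunks so it never has to fit in memory (the file name and column names below are made up):

# Minimal sketch: out-of-core scan of a large trade file with pandas.
# File name and column names (trade_id, pnl) are hypothetical.
import pandas as pd

best = {}  # running best P&L per trade, aggregated chunk by chunk
for chunk in pd.read_csv("hedge_trades.csv", chunksize=10**6):
    # reduce each chunk to a small summary so memory stays bounded
    summary = chunk.groupby("trade_id")["pnl"].max()
    for trade_id, pnl in summary.items():
        if trade_id not in best or pnl > best[trade_id]:
            best[trade_id] = pnl
print("tracked", len(best), "trades")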
 
rmax
Posts: 374
Joined: December 8th, 2005, 9:31 am

technology stack for analysing 1Tb dataset

September 8th, 2015, 1:14 pm

Depends on some parameters. People are always banging on about Hadoop being cheap and easy and able to do everything, whether you want it to or not. Cloud services are another possibility (depending on security etc.). Azure looks pretty good, but I have not had an opportunity to use/implement it yet.
 
DominicConnor
Posts: 41
Joined: July 14th, 2002, 3:00 am

technology stack for analysing 1Tb dataset

September 13th, 2015, 12:08 pm

"Cost" has several factors, the biggest is normally people not hardware or software, so the easiest option is typically the cheapest.Also thje "real time"-ness of the calculations matter, are you seeking out a set of stratgies to run with ?That's a big batch job that requires one styack, wheres reacting to changes requires a different one.
 
bojan
Posts: 0
Joined: August 8th, 2008, 5:35 am

technology stack for analysing 1Tb dataset

September 14th, 2015, 11:54 am

You can easily buy a machine with 1 TB of RAM today (for maybe 15k USD?). So you don't really need "big data" technologies unless you want to use them for their parallelisation capabilities rather than for their data-volume handling capabilities.
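On a box like that, the standard library already covers the parallelisation side; a minimal sketch, where score_strategy and the parameter grid are placeholders for whatever objective you actually optimise:

# Minimal sketch: fan hypothetical strategy evaluations out across the
# cores of a single big-memory machine; no cluster required.
from multiprocessing import Pool
import random

def score_strategy(params):
    # stand-in for a real objective, e.g. variance of hedged P&L
    random.seed(params)
    return params, sum(random.gauss(0, 1) for _ in range(1000))

if __name__ == "__main__":
    grid = range(10000)    # candidate strategy parameters (placeholder)
    with Pool() as pool:   # defaults to one worker per core
        scores = pool.map(score_strategy, grid)
    best = min(scores, key=lambda s: s[1])
    print("best params:", best[0])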
 
spacewiz
Posts: 0
Joined: September 12th, 2015, 6:58 pm

technology stack for analysing 1Tb dataset

September 15th, 2015, 12:40 am

If you wish to minimize or avoid infrastructure investment and maintenance effort, I would go with a cloud solution, of which Amazon Web Services is one of the best candidates: it provides a good combination of cost, security, and reliability.
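As a rough sketch of that approach (bucket and key names are made up, and it assumes boto3 with AWS credentials already configured): keep the raw data in S3 and pull slices onto a compute node only when you need them.

# Minimal sketch: the dataset lives in S3, so the only infrastructure
# you own is the analysis box. Names below are hypothetical.
import boto3

s3 = boto3.client("s3")

# one-off upload of a data slice
s3.upload_file("greeks_2015Q3.parquet", "my-trade-data", "greeks/2015Q3.parquet")

# later, on an analysis node (e.g. a transient EC2 spot instance)
s3.download_file("my-trade-data", "greeks/2015Q3.parquet", "/tmp/greeks.parquet")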
 
sw88
Posts: 0
Joined: September 10th, 2015, 4:44 pm

technology stack for analysing 1Tb dataset

September 15th, 2015, 5:59 am

I would go for:
1. Python, because of the simplicity of post-trade analysis.
2. C++, because library support is abundant and the fields you need are not that many.
I would not go for SQL, because the data structure you are encountering is already determined.
 
sw88
Posts: 0
Joined: September 10th, 2015, 4:44 pm

technology stack for analysing 1Tb dataset

September 15th, 2015, 6:03 am

Why do you need a 1 TB data set, by the way? Blpapi is good enough to get all the things needed.
 
ISayMoo
Posts: 2332
Joined: September 30th, 2015, 8:30 pm

technology stack for analysing 1Tb dataset

October 1st, 2015, 11:15 am

Quote, originally posted by rmax:
"Depends on some parameters. People are always banging on about Hadoop being cheap and easy and able to do everything, whether you want it to or not. Cloud services are another possibility (depending on security etc.). Azure looks pretty good, but I have not had an opportunity to use/implement it yet."

Hadoop is not easy, it's not cheap, and it certainly isn't able to do "everything". Using Hadoop for N-to-N problems (many inputs, many outputs) is non-trivial. And it doesn't scale automagically. Been there, got the scars.
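To illustrate the N-to-N point with a toy sketch in plain Python (all data made up): every mapped record can emit keys for every output partition, so the shuffle between map and reduce is an all-to-all transfer of essentially the whole dataset.

# Toy illustration of the map -> shuffle -> reduce pattern and why the
# shuffle is the expensive step when many inputs feed many outputs.
from collections import defaultdict

records = [("AAPL", 1.0), ("MSFT", 2.0), ("AAPL", 3.0)]  # made-up input

# map: each record may emit several (key, value) pairs
mapped = []
for sym, px in records:
    for bucket in ("risk", "pnl"):       # N inputs -> N outputs
        mapped.append(((sym, bucket), px))

# shuffle: group by key -- on a real cluster this is an all-to-all
# network transfer between mapper and reducer nodes
groups = defaultdict(list)
for key, val in mapped:
    groups[key].append(val)

# reduce: aggregate each group independently
result = {key: sum(vals) for key, vals in groups.items()}
print(result)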
 
DominicConnor
Posts: 41
Joined: July 14th, 2002, 3:00 am

technology stack for analysing 1Tb dataset

October 3rd, 2015, 8:27 am

Blpapi would be pretty slow for a lot of this stuff.
 
kuentang
Posts: 0
Joined: October 20th, 2013, 2:20 pm

technology stack for analysing 1Tb dataset

October 18th, 2015, 4:06 pm

Go for kdb+ and R.

Kim
 
Searay
Posts: 10
Joined: May 18th, 2014, 4:55 pm

Re: technology stack for analysing 1Tb dataset

November 5th, 2016, 7:49 pm

Quote, originally posted by DominicConnor:
"Blpapi would be pretty slow for a lot of this stuff."
Sorry for the late revival, but if you have any experience designing/deploying a database with a view to minimizing the cost of maintaining a black box + DB in "the cloud" versus on an in-house server, could you comment? I'm looking to store 3 months' worth of tick data across 30 US futures instruments, plus 8-10 years of 1-hour OHLC bar data beyond that, and the projected storage capacity plus the monthly transfer limits between a cloud solution like, say, Amazon, and my colo'd server in Chicago push the cost over $400/mo quickly. Versus going with a home server for the DB, where speed will suffer, but then again such transfers can be scheduled for after-hours.
Also, if you're using a home server, what are your hardware specs (total HDD capacity, RAM, CPU)?
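For scale, a back-of-envelope estimate of the storage side; the tick rate and record sizes below are rough assumptions, not measurements:

# Back-of-envelope storage estimate for the dataset described above.
# All rates and record sizes are assumed, not measured.
ticks_per_day = 500000    # per liquid futures instrument (assumption)
tick_bytes = 40           # timestamp + price + size + flags (assumption)
instruments = 30
trading_days = 63         # roughly 3 months

tick_gb = instruments * trading_days * ticks_per_day * tick_bytes / 1e9

bar_bytes = 60            # one OHLC bar record (assumption)
bars_per_day = 23         # near-24h futures session, hourly bars
bar_gb = instruments * 10 * 252 * bars_per_day * bar_bytes / 1e9

print("ticks: ~{:.0f} GB, bars: ~{:.2f} GB".format(tick_gb, bar_gb))
# under these assumptions the tick history dominates (~tens of GB);
# the bar history is trivially small by comparison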