whoknew
Topic Author
Posts: 0
Joined: April 10th, 2008, 5:15 pm

financial data storage/analysis using CERN's "root" software

November 18th, 2008, 8:14 pm

As an alternative to kdb+ from [kx] systems or Vhayu's Velocity software, is there any activity going on to develop the database software that high energy physicists use at CERN to store the massive amounts of data produced by particle accelerators? (It's called root.)

Does anyone know of a firm that has been successful implementing root, either as a warehouse or as an in-memory db?

The link is: root.cern.ch/

Thanks
 
kziemski
Posts: 0
Joined: April 17th, 2005, 5:48 pm

November 18th, 2008, 9:07 pm

Wow! I have not seen this in a while. Loved CINT, the C++ interpreter.

I have to tell you that even among C++ programmers with a physics background, when I used to mention the software, all but a few had never heard of it and were shocked that something like it existed.

And I have never heard anyone in finance mention it by name. I'm not sure you can compare KDB, and especially Vhayu, to ROOT, and I wouldn't even call the object persistence of ROOT a "database". I can see where you might want to go with ROOT in order to get some of the functionality, but there is a lot you would have to build up. Vhayu is a really solid market data warehousing solution, and KDB's querying capability/performance would be hard to match on your own. Can you tell me what you would like to use ROOT for, and why?
 
whoknew
Topic Author
Posts: 0
Joined: April 10th, 2008, 5:15 pm

November 18th, 2008, 10:28 pm

Thanks for the input. I'm at a prop shop and we have an in-house solution that handles extremely granular data, but it's not realtime. We're looking for a solution that lets us publish realtime computations on unusually granular input to both research and trading applications. And we'd like to store the data together with a subset of derived data. It's a lot of data per day, on the order of 0.4 TB +/- 0.2 TB. My familiarity with ROOT is limited: I've heard a Fermilab physicist mention it in a talk and I've been to the home page (basically). If it's the wrong direction then thanks for letting me know, but we're not afraid of doing low level programming. My concern with a KX or Vhayu solution is that we can't customize it well enough, especially given our performance standards. And there's no sense in purchasing a product that has a huge set of pre-written analytics when we're going to write our own anyway (and not tell them what we're doing with it).
 
kziemski
Posts: 0
Joined: April 17th, 2005, 5:48 pm

November 18th, 2008, 11:16 pm

If all I had to go on was the description, my gut reaction is that I don't think ROOT is what you want to use.

From the size of the data I have a feeling we are talking about US equity TAQ data, right? Vhayu will tell you they deal with that kind of flow, and they have the handlers built in for a lot of different sources. BTW, the Reuters Tick Capture Engine is actually Vhayu, which might kill two birds with one stone. KDB's appeal is its querying capability. Check out their documentation on that. If the language appeals to you then really go with them, but if it's just storage and retrieval it's overkill. Windows environment? If you are not afraid to do low level programming then a third option is memory mapped files. I've done heavy systems programming outside of the finance world and this is what I would use, assuming I wanted the most control. The size of the data and your descriptions of the computation give me some idea about why you thought ROOT might be a good idea.

If you want, I can guide you further if you are willing to divulge a little more. Email me at kziemski at gmail dot com.
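
For anyone following along, the memory mapped file option could look roughly like the sketch below. It is only an illustration under assumptions: the record layout (timestamp, integer price, size), the file name, and the helper names are all hypothetical, and a real system would add many things (growth of the file, concurrency, per-symbol layout).

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: timestamp (ns), price (integer ticks), size.
# "<" means little-endian with no padding, so each record is 8 + 8 + 4 = 20 bytes.
RECORD = struct.Struct("<qqi")


def write_ticks(path, ticks):
    """Pack each (timestamp, price, size) tuple into the file back to back."""
    with open(path, "wb") as f:
        for tick in ticks:
            f.write(RECORD.pack(*tick))


def read_tick(mm, index):
    """Random access to record `index` without reading the rest of the file."""
    return RECORD.unpack_from(mm, index * RECORD.size)


# Made-up sample data, only to exercise the sketch.
ticks = [(1_000, 10_025, 300), (2_000, 10_026, 100), (3_000, 10_024, 500)]
path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
write_ticks(path, ticks)

with open(path, "rb") as f:
    # Length 0 maps the whole file; the OS pages data in on demand.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = len(mm) // RECORD.size
        last = read_tick(mm, count - 1)
```

Because every record has the same width, the offset of record `i` is just `i * RECORD.size`, which is what gives you the control (and the responsibility) mentioned above.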
 
quantie
Posts: 20
Joined: October 18th, 2001, 8:47 am

November 18th, 2008, 11:48 pm

Many years ago SmartQuant, when it was open source, used CINT and the ROOT framework. I am not sure if it still does, but it is worth checking.
 
kziemski
Posts: 0
Joined: April 17th, 2005, 5:48 pm

November 18th, 2008, 11:52 pm

That's interesting, I didn't know that. I think the API is .NET only now, right?
 
whoknew
Topic Author
Posts: 0
Joined: April 10th, 2008, 5:15 pm

November 19th, 2008, 2:04 pm

Kziemski,

Before we go offline I don't mind getting more input from the community.

"From the size of the data I have a feeling we are talking about US equity TAQ data, right?"

What I meant by 'granular data' is book depth, for a selection of equities & futures, mostly. Thanks, I didn't know the Reuters Tick Capture Engine is Vhayu. I've got a feeling these commercial products won't allow enough control. Can I change how they've written their Reuters feedhandler, for example? Also, it looks like a pretty small commercial field, with only 2 high-end competitors?

Hey quantie, thanks for the heads up about the R-Finance Conference on Dec 4th! I'll look into smartquant.
 
kziemski
Posts: 0
Joined: April 17th, 2005, 5:48 pm

November 19th, 2008, 2:33 pm

Just an offer in case you get stuck. The details affect the choice, so it was there in case you were reluctant to broadcast details, etc.

If you are trying to simulate an order book for markets, then the database should store orders added, removed, and matched in a serialized form. I actually had the privilege of looking into how Island ECN did their matching engine back in the day, and it was a combination of Java and Berkeley DB as the backing store. They actually included Java code for order book construction for customers of their ITCH market data feed, which distributed their entire activity in realtime. But that code is almost certainly outdated.

Vhayu will customize their solution to your needs, and that service is provided to you with the product. The product is expensive, so you're justified in riding them for whatever you need. Vhayu and KDB aren't the only ones; there are a ton of companies that sell themselves as "TickerPlant" engines. They end up occupying the top floor of the SIFMA conference, which might be coming up soon, actually. I don't know what your timeframe is.

If you don't already have your market data sources provisioned, you can check out LIM. Their product is great, with a really interesting query language, and speed-wise they are very close to realtime. I think they have bindings in C++, Java, Python(?), and .NET.
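
To make the "store adds, removes, and matches, then rebuild the book" idea concrete, here is a minimal sketch. The event names and tuple layout are hypothetical and are not the ITCH message format or Island's code; it only shows that replaying the stored events in order is enough to reconstruct depth at any point.

```python
from collections import defaultdict


class OrderBook:
    """Depth per price level, rebuilt purely from a stream of order events."""

    def __init__(self):
        self.orders = {}  # order_id -> [side, price, remaining size]
        self.depth = {"B": defaultdict(int), "S": defaultdict(int)}

    def add(self, order_id, side, price, size):
        self.orders[order_id] = [side, price, size]
        self.depth[side][price] += size

    def reduce(self, order_id, size):
        """Take `size` off a resting order; used for matches and removals."""
        side, price, remaining = self.orders[order_id]
        size = min(size, remaining)
        self.depth[side][price] -= size
        if self.depth[side][price] == 0:
            del self.depth[side][price]      # drop empty price levels
        if size == remaining:
            del self.orders[order_id]        # order is fully gone
        else:
            self.orders[order_id][2] = remaining - size

    def remove(self, order_id):
        self.reduce(order_id, self.orders[order_id][2])

    def best_bid(self):
        return max(self.depth["B"], default=None)

    def best_ask(self):
        return min(self.depth["S"], default=None)


def replay(events):
    """Rebuild a book by applying stored events in their original order."""
    book = OrderBook()
    for kind, *args in events:
        if kind == "add":
            book.add(*args)
        elif kind == "match":
            book.reduce(*args)
        elif kind == "remove":
            book.remove(*args)
    return book


# Made-up event log: (kind, order_id, ...) with integer prices.
events = [
    ("add", 1, "B", 10024, 200),
    ("add", 2, "B", 10025, 100),
    ("add", 3, "S", 10027, 300),
    ("match", 2, 100),   # order 2 fully executed
    ("remove", 3),       # order 3 cancelled
    ("add", 4, "S", 10026, 50),
]
book = replay(events)
```

Replaying only a prefix of `events` gives the book as of that moment, which is why storing the raw event stream is usually more flexible than storing snapshots of depth.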
 
whoknew
Topic Author
Posts: 0
Joined: April 10th, 2008, 5:15 pm

November 19th, 2008, 9:42 pm

It's very interesting to see how the real time orderbook depth is handled. It brings up a lot of issues regarding efficient realtime computations vs efficient historical storage. And yes I've looked at LIM (a few times) and was not impressed. Yes, we have our market data sources straight. I was trying to investigate alternatives to the commercial timeseries data warehouse vendors. I think I'll strike ROOT off the list of possibilities. Do you have further suggestions about commercial vendors or what rudiments to start with if we were the ambitious DIY types? SIFMA is in April of 2009.
 
bojan
Posts: 0
Joined: August 8th, 2008, 5:35 am

November 20th, 2008, 8:14 am

Quote, originally posted by whoknew: "Do you have further suggestions about commercial vendors or what rudiments to start with if we were the ambitious DIY types?"

Before you dive into memory mapped files, I would suggest looking into HDF5 (see its documentation). These libraries have a long track record and should easily cope with the amounts of data that you mentioned. They are also used in high data volume scientific applications.

Also, there is a nice Python binding called PyTables (see its homepage).
 
kziemski
Posts: 0
Joined: April 17th, 2005, 5:48 pm

November 20th, 2008, 4:04 pm

HDF is a good backing store, and I have used it when I needed to store huge datasets in a file in order to transfer them. But it's really only a storage format and doesn't provide anything in terms of indexing, etc. I was going to say more about the rest of your question later.
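
If the storage layer is treated as just a container, as described above, then the time index has to live in your own code. A minimal sketch of that idea follows, assuming rows are stored in timestamp order; the in-memory `rows` list stands in for whatever the file holds, and the symbol, values, and function name are made up for illustration.

```python
import bisect

# Hypothetical stand-in for rows stored in time order in a flat file:
# (timestamp, symbol, integer price).
rows = [
    (1_000, "ESZ8", 85_025),
    (1_500, "ESZ8", 85_050),
    (2_000, "ESZ8", 85_000),
    (2_000, "ESZ8", 85_025),
    (3_500, "ESZ8", 84_975),
]

# The "index" is just the sorted timestamp column, kept apart from the payload.
timestamps = [row[0] for row in rows]


def time_slice(start, end):
    """Return rows with start <= timestamp < end using two binary searches."""
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return rows[lo:hi]


window = time_slice(1_500, 3_000)
```

The two binary searches turn a time-range query into a contiguous row range, which is the part you would otherwise have to scan for.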