January 2nd, 2011, 5:14 pm
At first I thought of making my own binary format for the t-s, still I want to reuse as much already available code as possible so I thought of using non-relational dbs. My (limited) understanding is that non-relational dbs essentially do that but I stand to be corrected, I'm not an expert in dbs. Regarding the data size, the initial data won't be that big. However, if I can implement a setup that scales with size, I will chose to do that. I guess that people who store tick data e.g. from eSignal do have tick-dbs of 10tb. Some years of tick data (for a couple of industry groups) could reach that(?). Again, I'm not 100% sure how to do this(*), hence this post if someone can share how they store their data (for backtesting purposes) I'll be very happy to hear how they worked out this part of their system.(*) e.g. I'm not aware of typical sizes for tick-data dbs if you want to trade two stock-exchanges, what is the typical size for fx dbs, what is the typical size for commodities dbs etc. Also how many GBs per day do such dbs typically increase? Also does it matter much if a db supports transactions or not? 10TB is super big for non-institutional setups today (in 3 yrs it may be ok) do people use storage devices from eg Dell for this sort of job or raid'in "normal" 2TB hdds are ok? The points you make about documentation are very concise and also some existing relational DBs have a huge userbase and are unlikely to cease in a couple of years. I have no clue how nosql will be in 5 years from now but I'm pretty confident that PostgresSQL and MySQL will still be around in one form or another. I had worked with MySQL in '02-'03 so while it probably has changed drastically, possibly some things I remember are still there. To make what I'm trying to do more clear, the purpose of this db will be to hold data for backtesting purposes. It will store timeseries of tick data and the backtesting platform will fetch these to check eg for pairs trading. If there are other aspects of a historical data db (e.g. more queries except fetching whole t-s from date1 to date2) that I should be considering, please do let me know!QuoteOriginally posted by: AKalmykovQuoteThe reason I chose to work with non-relational DB is that I don't see any advantage in having a relational DB for storing time series.But what are the advantages of non-relation db?QuoteDB will be faster if the DB is on the same drive instead of using a network db (cheaper as well).Cheaper - yes, faster - (should be) no (compare 6 Gbit/s SATA vs100 Gigabit Ethernet). In my opinion you should use non relation DB only if at least one of the following points are true1) your have terrific amount of data (>>10TB)2) your data is essentially "schema-less"3) all your database consist of several huge tables and you don't need to run any complex queries on several table (e.g. joins) 4) you need to execute heavy and full scan queries (like "full text search")I think non of above holds for your application. You can run relational db local or remotely. It doesn't matter. Also note that your expertise is very important when you choose your DB platform. Stick to something you have experience with or at least you have some good knowledge base. You can easily find a good book (or find some expert) how to work with and optimize Oracle or MySQL. I doubt that such reference exists for non relational db (but I could be wrong). Good luck.Disclaimer: I'm not non relational DB expert. But I'm quite proficient in old school relational databases. So I'm biased.