October 11th, 2011, 11:47 am
Anyone have any good articles, book recommendations or personal experience for building an effective database structure for an equity fundamentals database, for historical backtesting, stock screening etc.?

Note that a single variable is about 45,000 symbols by 350 months, and there are hundreds of variables. Data is pulled into R, where each variable is used as a matrix with symbols as rows and dates as columns.

I tried going with these setups in Oracle and Sybase:

A)
- Fields: Symbol, Variable, Date, Value
- Separate and clustered indices on all fields but Value.
- Data needs to be "melted"/pivoted to an m x n matrix for R (crazy memory inefficient) -- roughly the step sketched at the end of this post.
- Average query for a normal sample into R: 4-8 minutes

B)
- Each variable in a separate table.
- Fields: Symbol, Date, Value
- Separate and clustered indices on all fields but Value.
- Views added to cache common subsets (dunno if it helped at all...)
- Data needs to be "melted"/pivoted to an m x n matrix for R (crazy memory inefficient)
- Average query for a normal sample into R: 3-5 minutes

C) [Should maybe have tried a column-based database here]
- Symbols and dates stored separately and mapped to row and column numbers only.
- Each variable in a separate table, with symbols as rows and dates as columns.
- Really bad for how the data maps onto disk.
- Data is already in the correct format for R.
- Average query for a normal sample into R: 1-3 minutes

All of these are too slow, so I decided to go with gzipped binary RData workspaces containing the variables instead, which take about 5 seconds to load locally and 20 seconds over the network (sketch at the end of the post).

Is there anything I can do to make the database route come anywhere close to the binary file speeds?

Any suggestions welcome.
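For reference, the melt/pivot step in setups A and B is roughly the following on my end. This is just a minimal sketch assuming the query result arrives as a long-format data.frame df with columns symbol, date and value for a single variable; the column names are for illustration, not the actual schema.

    ## Long-format query result -> symbols x dates matrix for one variable.
    ## Assumes df has columns symbol, date, value (names illustrative only).
    syms  <- sort(unique(df$symbol))
    dates <- sort(unique(df$date))

    m <- matrix(NA_real_,
                nrow = length(syms), ncol = length(dates),
                dimnames = list(syms, as.character(dates)))

    ## Fill by (row, col) index pairs rather than reshaping the whole frame;
    ## each observation is written straight into its cell.
    m[cbind(match(df$symbol, syms), match(df$date, dates))] <- df$value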
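And the binary route I ended up with is basically just save()/load() of the matrices into a gzipped workspace (variable names below are made up):

    ## Write the variable matrices to a gzipped binary workspace.
    ## 'earnings' and 'book_value' stand in for the real variable names.
    save(earnings, book_value,
         file = "fundamentals.RData", compress = "gzip")

    ## Later, locally or over the network share:
    load("fundamentals.RData")   # restores the matrices into the workspace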