
 
Ciportne
Topic Author
Posts: 0
Joined: February 26th, 2010, 8:59 am

Matlab - Co-integration test on 'Very Large' Files

May 19th, 2010, 12:03 pm

Hi all,

I am trying to perform a co-integration test on a file containing 10 years of daily closing equity prices, and I'm trying to find the best approach. The file is a 6 GB text file downloaded from the CRSP database (Center for Research in Security Prices). It contains daily closing prices of all stocks listed in the US during the last 10 years in a flat file (CSV format). The flat file contains approximately 6 million lines, with the time series listed sequentially for each stock. I'm trying to figure out the best way to perform the analysis in Matlab, especially considering that a number of the time series will have broken start/end dates (the CRSP database contains all stocks, listed and de-listed).

Background: I am working on a project to backtest a modified version of Gatev, Goetzmann, Rouwenhorst (2006), so I intend to run the Johansen test for co-integration. Although I am a competent programmer in VB6.0 and Excel VBA, I'm not a Matlab expert. In VB6.0 or VBA I would perform a series of loops on the text file to extract a data subset, but since Matlab is matrix based I'm sure there is a more efficient method.

Any suggestions appreciated.
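For what it's worth, a minimal sketch of what the VB-style extraction loop could look like in Matlab, streaming the file block by block with textscan so the full 6 GB never sits in memory. The column layout (PERMNO, DATE, PRC), the file name and the identifiers below are placeholders, not the actual CRSP export:

% Stream the CSV in fixed-size blocks and keep only the stocks of interest.
wanted = {'10107', '14593'};             % hypothetical PERMNOs to extract
fid    = fopen('crsp_daily.csv', 'r');
fgetl(fid);                              % skip the header line (assumes there is one)
subset = cell(0, 3);                     % {id, date, price} rows for the wanted stocks
while ~feof(fid)
    C    = textscan(fid, '%s %s %f', 1e6, 'Delimiter', ',');   % 1e6 lines per block
    keep = ismember(C{1}, wanted);
    subset = [subset; C{1}(keep), C{2}(keep), num2cell(C{3}(keep))]; %#ok<AGROW>
end
fclose(fid);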
 
bhamadicharef
Posts: 0
Joined: September 8th, 2007, 2:30 pm

Matlab - Co-integration test on 'Very Large' Files

May 21st, 2010, 5:08 am

Look at the MATLAB Central File Exchange for routines to read CSV files and also to do the co-integration test. With a 6GB file you are likely to run into memory issues... Where is the CRSP database query page? Is it free?

Brahim
 
Marine
Posts: 0
Joined: July 17th, 2003, 7:56 am

Matlab - Co-integration test on 'Very Large' Files

May 21st, 2010, 7:12 am

Dude, this is just crazy and unrealistic! Do you know how many iterations you are trying to calculate?

I understand what you are trying to do, but you need to be realistic. Pick one sector and run a co-integration test on those stocks first. Expand your universe slowly and eventually you might get there.

Make sure you run an ADF test on each individual time series to ensure it is non-stationary before running the Johansen test, otherwise the results will be invalid.

$.02
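A minimal sketch of that ADF-then-Johansen check for a single pair, assuming the Econometrics Toolbox functions adftest and jcitest are available (a File Exchange johansen.m can serve the same purpose otherwise); the two series below are synthetic placeholders for real log prices:

% Synthetic, cointegrated example data (roughly 10 years of daily points).
rng(1);
trend = cumsum(randn(2500, 1));                 % shared stochastic trend
y1 = 4.0 + 0.010*trend + 0.02*randn(2500, 1);   % "log price" of stock 1
y2 = 3.5 + 0.012*trend + 0.02*randn(2500, 1);   % "log price" of stock 2

% Step 1: ADF test on each series in levels. adftest returns 1 when the
% unit-root null is rejected (the series looks stationary); in that case
% the pair is not a candidate for cointegration in levels.
if adftest(y1) || adftest(y2)
    warning('A series appears stationary in levels; skip this pair.');
else
    % Step 2: Johansen test on the pair. The rank-0 entry of h tests the
    % null of no cointegration; a value of 1 there means at least one
    % cointegrating relation at the 5% level.
    [h, pValue] = jcitest([y1 y2], 'lags', 2);
    disp(h); disp(pValue);
end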
 
Hansi
Posts: 41
Joined: January 25th, 2010, 11:47 am

Matlab - Co-integration test on 'Very Large' Files

May 21st, 2010, 9:01 am

Okay, first off, you are not going to be able to work easily with 6GB of data in a text file. Your options are to download the data again in smaller increments, break it down with something like sed, or load it into a database. I recommend the database route.

Matlab's overhead for data is about 1.8x the raw size, so you'd need 12 GB of RAM just to load that file, if Matlab actually supported that much RAM. Even Matlab 2010a 64-bit has a maximum RAM addressing per single matrix that's less than 6GB.

Also, 10 years of data might give some false positives or negatives depending on how the co-integration changes over the timeline, so I recommend subsampling not only specific stocks but also specific time frames.

Best practice for this is to set up a DB, load the data in there, then take only a few series at a time, work with them, store the results, clear the variables from memory and move on to the next set of data.
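A minimal sketch of that loop, assuming the Database Toolbox is installed, the data already sits in a MySQL table, and the Econometrics Toolbox provides jcitest; the table/column names, connection details and PERMNO pairs are placeholders:

% Pull one pair of series at a time, test it, keep only the result.
conn  = database('crsp', 'user', 'pass', ...
                 'com.mysql.jdbc.Driver', 'jdbc:mysql://localhost/crsp');
pairs = {'10107', '14593'; '11308', '12490'};    % hypothetical PERMNO pairs
results = cell(size(pairs, 1), 3);               % {pair, h, pValue} per row

for i = 1:size(pairs, 1)
    sql  = sprintf(['SELECT a.prc, b.prc FROM daily_px a JOIN daily_px b ' ...
                    'ON a.dt = b.dt WHERE a.permno = %s AND b.permno = %s ' ...
                    'ORDER BY a.dt'], pairs{i, 1}, pairs{i, 2});
    data = fetch(conn, sql);                     % only this one pair is in memory
    Y    = log(cell2mat(data));                  % T-by-2 matrix of log prices
    [h, pValue]  = jcitest(Y, 'lags', 2);
    results(i,:) = {pairs(i, :), h, pValue};     % store the result, not the data
    clear data Y h pValue                        % flush before the next pair
end
close(conn);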
 
Church
Posts: 0
Joined: September 4th, 2007, 10:27 am

Matlab - Co-integration test on 'Very Large' Files

May 21st, 2010, 8:02 pm

If it's possible to do this, it's probably only feasible in SAS, not in Matlab. Only SAS can work with files that large, and it also has lots of time series capabilities, etc.
 
Hansi
Posts: 41
Joined: January 25th, 2010, 11:47 am

Matlab - Co-integration test on 'Very Large' Files

May 21st, 2010, 8:27 pm

Quote
Originally posted by: Church
If it's possible to do this, it's probably only feasible in SAS, not in Matlab. Only SAS can work with files that large, and it also has lots of time series capabilities, etc.

Well, there are MEX C++ extensions to Matlab that allow large-file usage similar to SAS, but it's a commercial product.
 
frenchX
Posts: 11
Joined: March 29th, 2010, 6:54 pm

Matlab - Co-integration test on 'Very Large' Files

May 22nd, 2010, 7:52 am

Quote
Originally posted by: bhamadicharef
With a 6GB file you are likely to run into memory issues...

I totally agree with that. You may experience an out-of-memory error; the maximum Matlab matrix size depends on the version and the OS. I had a problem similar to yours with a far smaller file (2^15 lines) but a very memory-consuming program (time-frequency analysis with a sliding window). Even with 64-bit Matlab on a 64-bit OS on a powerful server (8 cores and 16 GB of RAM) I got the "out of memory" error.

How did I manage to solve this problem? Exactly as Hansi said, by dividing my file into pieces. You do the calculation on a slice, you store the result and you flush the memory. Moreover, for very heavy calculations with high algorithmic complexity, it may even be faster to do that than to do it in one shot. Sometimes dividing your file by 2 divides the computation time by more than 4 (that was the case with my problem).
 
jlaipple
Posts: 0
Joined: February 4th, 2010, 12:36 pm

Matlab - Co-integration test on 'Very Large' Files

May 24th, 2010, 4:40 pm

I've done something very similar (calculating correlation/cointegration tests across all stocks on 10+ years of data). After attempting to use R, I eventually found the only way to handle this was in a traditional programming language (C++/C#/VBA) where you can iteratively add/remove what's in memory quite easily. I used C#.

Memory issues definitely still exist unless you come up with ways to intelligently add/prune what's in your program's memory at each point. I wrote my application to perform multi-threaded analysis to take advantage of all 8 cores on my machine, and it took 5 1/2 days with all 8 cores at 100% to complete the analysis on 15 years of data.

The best thing you can do is to filter out stocks that definitely won't be relevant because of lack of volume; this significantly reduces your working set.
 
ronm
Posts: 0
Joined: June 8th, 2007, 9:00 am

Matlab - Co-integration test on 'Very Large' Files

May 26th, 2010, 9:56 am

If I recall correctly, the Johansen rank test is only valid for a few variables in the system, perhaps up to n = 10.
 
Yossarian22
Posts: 4
Joined: March 15th, 2007, 2:27 am

Matlab - Co-integration test on 'Very Large' Files

June 8th, 2010, 6:23 pm

Use Perl (Tie::File or DBI) and the Perl fork manager. 6 GB is a zip!
 
asd
Posts: 17
Joined: August 15th, 2002, 9:50 pm

Matlab - Co-integration test on 'Very Large' Files

February 7th, 2011, 6:44 pm

Another option might be to dump the data into a MySQL database. Intersecting data, filtering, etc. will be faster.
 
ktang
Posts: 0
Joined: January 15th, 2010, 7:16 pm

Matlab - Co-integration test on 'Very Large' Files

February 13th, 2011, 3:15 pm

Quote
Originally posted by: Ciportne
Hi all, I am trying to perform a co-integration test on a file containing 10 years of daily closing equity prices, and I'm trying to find the best approach. The flat file contains approximately 6 million lines, with the time series listed sequentially for each stock. I'm trying to figure out the best way to perform the analysis in Matlab, especially considering that a number of the time series will have broken start/end dates (the CRSP database contains all stocks, listed and de-listed).

Fortunately I have access to an unlimited KDB+/Q environment. There I uploaded our 16 million lines of data for a day, and from Matlab I can query the data to perform the cointegration test. But the waiting is still a pain.

Some papers suggest calculating the discrete Fourier transform and taking only the first 60 or 80 coefficients to perform the cointegration test.
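For what it's worth, a minimal sketch of that truncation idea in Matlab: keep only the first k DFT coefficients of each (demeaned) series as a compressed representation. The value k = 60 and the synthetic series are placeholders, and this shows only the compression step, not the cointegration test itself:

% Compress one series to its first k Fourier coefficients, then reconstruct
% a coarse approximation to see what information is retained.
k = 60;
x = cumsum(randn(2500, 1));          % placeholder for one log-price series
X = fft(x - mean(x));                % DFT of the demeaned series
coeffs = X(1:k);                     % keep only the first k complex coefficients

Xtrunc = zeros(size(X));
Xtrunc(1:k) = coeffs;
Xtrunc(end-k+2:end) = conj(coeffs(end:-1:2));   % mirror so the signal stays real
xApprox = real(ifft(Xtrunc)) + mean(x);         % coarse reconstruction of x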