September 5th, 2007, 7:13 pm
Matlab (at least on 32-bit workstations) basically can't deal with datasets that occupy more than 300 MB or so of RAM. If you can naturally write your code so that you never need more than that loaded at a time (e.g. process one day at a time and cache the results), then I'd recommend Matlab. If not, use either C++ or whatever you're currently using to pre-chew the data, then use Matlab to postprocess it (there's a way to wrap C++ code as Matlab native functions via MEX, btw, or you can store intermediate data in text or HDF5 files).

I'd recommend Matlab because it's vastly better documented, and its GUI, debugger, code editor, and command-line integration are vastly better than either S-Plus's or R's. You'd want the Optimization, Statistics, and Database toolboxes, plus maybe some of the fancier analytical ones such as the GARCH toolbox. I wouldn't bother with the Financial toolbox - I didn't find anything in it that I couldn't replicate in 30 minutes or so. On the other hand, the Statistics toolbox has a lot of data-mining stuff that you might use - PCA, decision trees, factor analysis, whatnot.

The advantage of R (besides being free, which is irrelevant if you're at a commercial org) is its really advanced statistical algorithms, but unless you already know what they are and how to use them, I don't see them doing you much good. And personally, I find the language itself a disaster, though many intelligent people disagree.

S-Plus is, last I looked, a really expensive version of R with less functionality - but I may be wrong there, of course.
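The day-at-a-time-with-caching idea can be sketched like this (Python here just for illustration; the file layout and the toy `summarize` aggregator are my own assumptions, and the same pattern works in Matlab with .mat or HDF5 intermediates):

```python
import csv
import os
import pickle

CACHE_DIR = "cache"  # hypothetical location for the small per-day summaries


def summarize(rows):
    """Toy per-day reduction: tick count and mean of a 'price' column."""
    prices = [float(r["price"]) for r in rows]
    return {"n": len(prices), "mean": sum(prices) / len(prices) if prices else 0.0}


def process_day(day, raw_path):
    """Load one day's raw file, reduce it, and cache the small result.

    Only the summary - not the big raw dataset - ever needs to stay in
    memory, so total RAM use is bounded by one day's data.
    """
    cache_path = os.path.join(CACHE_DIR, day + ".pkl")
    if os.path.exists(cache_path):  # already pre-chewed: reuse the cache
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    with open(raw_path, newline="") as f:  # large file: read once, discard
        summary = summarize(list(csv.DictReader(f)))
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(summary, f)
    return summary


def postprocess(days_and_paths):
    """Second pass touches only the cached summaries, never the raw data."""
    return [process_day(day, path) for day, path in days_and_paths]
```

The point is just the shape of the workflow: a cheap per-day reduction pass, cached to disk, with all the interactive analysis done afterwards over the small summaries.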