Serving the Quantitative Finance Community

 
QuietStorm
Topic Author
Posts: 0
Joined: February 11th, 2003, 7:02 pm

Data Mining

July 25th, 2003, 4:05 am

Hi, what are your experiences and recommendations regarding data mining? Specifically, were you able to use off-the-shelf packages, or did you write your own program(s)? If you used off-the-shelf products, I'd be interested in hearing recommendations for both commercial and open source packages. If you (or your company) wrote your own data mining software, I'd be interested in hearing why. What were the shortcomings in the commercially available products that made you feel the need to develop your own software? Thanks.
 
DominicConnor
Posts: 41
Joined: July 14th, 2002, 3:00 am

Data Mining

July 25th, 2003, 7:06 am

Well, we're writing our own. Our reasons are partly cost, but we're also being honest with ourselves that we don't really know what we want. I suppose you'd call it data prospecting rather than mining. We know there is useful data there in the form of sequences and correlations, but we don't yet know what. We don't want any black boxes, and the processing is a flimsy mix of SQL, C++ and Excel VBA. We can cope with the ugliness, since once we find something to put into production it will be hard coded. I've worked on DM products, and one lesson I've learned is that they can't sensibly be shrink-wrapped: you need a data miner who understands both the technology and what you're trying to find. I know of only one other person who could do our stuff, and he's in Oz. For us the killer is that nothing we've come across has the depth in math we require. One newbie to the project has been handed WIQF, since we're analysing equity returns data. Performance is obviously an issue: one analysis I'm working on is competing with the heat death of the universe.
 
Julianrcook
Posts: 0
Joined: July 14th, 2002, 3:00 am

Data Mining

July 25th, 2003, 3:43 pm

The best-known free, off-the-shelf data mining software is the Weka library, written by Ian Witten and Eibe Frank whilst they were writing their book 'Data Mining'. It is written in Java, but the performance is quite good and it supports some quite sophisticated methods such as support vector machines. Their book is excellent at explaining how to implement learning schemes. On the other hand, most data mining methods were not written with finance in mind, although two books that I recently purchased specifically cover data mining and time series. The important question is: are you trying to use data mining to predict a time series, or to predict a probability distribution? Probably 99% of the literature deals with the former. The same is true for software: most finance software is of the neural network type for predicting time series, but there are some papers (no software) dealing with probability distributions (Weigend etc.).
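To make the distinction between the two targets concrete, here is a minimal sketch in Python. It is not taken from Weka or from any paper mentioned above. It assumes a simple AR(1) model fitted by least squares on synthetic returns, and the function names (fit_ar1, point_forecast, distribution_forecast) are made up for illustration. The point forecast gives a single number for the next return, while the distribution forecast gives a set of quantiles for it.

```python
import random
import statistics

def fit_ar1(returns):
    """Least-squares fit of r[t+1] = a + b * r[t]. Returns (a, b, residuals)."""
    x = returns[:-1]
    y = returns[1:]
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals

def point_forecast(returns):
    """Time series prediction: one number for the next return."""
    a, b, _ = fit_ar1(returns)
    return a + b * returns[-1]

def distribution_forecast(returns, n_quantiles=20):
    """Distribution prediction: quantile cut points for the next return,
    built by shifting the empirical residual quantiles to the point forecast."""
    a, b, residuals = fit_ar1(returns)
    centre = a + b * returns[-1]
    cuts = statistics.quantiles(residuals, n=n_quantiles)
    return [centre + q for q in cuts]

# Synthetic daily returns, purely illustrative (not real market data).
rng = random.Random(0)
returns = [rng.gauss(0.0005, 0.01) for _ in range(500)]

print("point forecast:", point_forecast(returns))
quantiles = distribution_forecast(returns)
print("5% and 95% quantiles:", quantiles[0], quantiles[-1])
```

With n_quantiles=20 the list holds 19 cut points, so the first and last entries are the 5% and 95% quantiles. A real application would need a much richer model for both the mean and the shape of the distribution; the sketch only shows how the two outputs differ.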