October 1st, 2006, 7:22 pm
Quote, originally posted by ZmeiGorynych:

    Does anyone have firsthand experience with Python for data exploration/prototyping/playing around with ideas (especially on datasets >> 100MB)? I've used Matlab so far, but its inability to cope with mildly large datasets is starting to really annoy me (I give the VM 1GB of RAM, and it still chokes on datasets of around 150MB). At first glance, the Python language has most of the features I like in Matlab (dynamic; concise; good debugger; hashes, functions, primitives, and classes are first-class objects), and NumPy/SciPy would give most of the linear algebra etc. functionality that I want. Also, Eclipse + PyDev Extensions look like an IDE almost equal to Matlab's, with decent debugging, code completion, etc. Are there any snags in switching from Matlab to the above combo that are not obvious before one tries? I remember hearing somewhere that one IB was using Python a lot - any idea which?

I've never spent that much time with Matlab, but Python is great for numeric analysis, with some caveats. Not sure how far beyond 100MB you want to go, but I highly recommend having 2-4x physical RAM over the dataset size, to allow for fragmentation and temporary arrays when using, say, NumPy. I prefer to keep large datasets in NumPy arrays to avoid per-datum memory overheads. Given a 64-bit machine and plenty of RAM, you can prototype very quickly. You can use weave.blitz to get rid of the temporaries and get pretty big speedups. matplotlib is great for plotting, too.

It's not that hard to hack something up to use mmap-backed arrays if you want to work with datasets larger than physical RAM. Depending on how big the data you want to play with is, PyTables might be useful for you.

R is great as well, btw.
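The per-datum overhead point can be made concrete: a Python list stores each value as a separately boxed float object plus a pointer, while a NumPy array packs the raw 8-byte doubles contiguously. A minimal sketch (assuming NumPy is installed; the exact ratio depends on the interpreter, but the list side typically costs ~4x more):

```python
import sys
import numpy as np

n = 1_000_000
data = np.arange(n, dtype=np.float64)  # packed: exactly 8 bytes per element
packed_bytes = data.nbytes             # 8 * n

# The equivalent Python list pays for the list's pointer array
# plus a boxed float object (~24 bytes on 64-bit CPython) per element.
as_list = data.tolist()
boxed_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

print(packed_bytes, boxed_bytes)
assert boxed_bytes > 3 * packed_bytes  # the boxed version is several times larger
```

This is why, for 100MB+ datasets, loading into NumPy arrays rather than plain Python containers matters so much.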
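The temporary arrays mentioned above are why the 2-4x RAM headroom is recommended: an expression like `d = a + b + c` allocates a full-size intermediate for `a + b` before computing the final result. Eliminating those intermediates is what weave.blitz does; the same effect can be had in plain NumPy with in-place operations, as this sketch shows:

```python
import numpy as np

n = 1_000_000
a = np.ones(n)
b = np.ones(n)
c = np.ones(n)

# Naive: allocates a hidden temporary for (a + b) before the final sum.
d = a + b + c

# Temporary-free: reuse a single output buffer with in-place ufunc calls.
out = np.empty(n)
np.add(a, b, out=out)  # out = a + b, written directly into the buffer
out += c               # in-place add, no extra allocation

assert np.array_equal(d, out)
```

On datasets near the size of physical RAM, those hidden temporaries are often what pushes the process into swapping.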
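For the mmap-backed arrays idea, NumPy already ships `numpy.memmap`, which presents a file on disk as an ordinary array; the OS pages data in and out on demand, so the dataset can exceed physical RAM. A sketch (the file path is illustrative):

```python
import os
import tempfile
import numpy as np

# Create a disk-backed array; only the pages you touch occupy RAM.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(1_000_000,))

mm[:] = np.arange(1_000_000)  # writes go through the page cache to disk
mm.flush()                    # force dirty pages out
del mm                        # close the mapping

# Reopen read-only and slice without loading the whole file.
ro = np.memmap(path, dtype=np.float64, mode="r", shape=(1_000_000,))
print(ro[123456])  # -> 123456.0
```

For anything with more structure (multiple arrays, queries, compression), PyTables on top of HDF5 is the heavier-duty version of the same idea.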