Serving the Quantitative Finance Community

 
tagoma
Topic Author
Posts: 18597
Joined: February 21st, 2010, 12:58 pm

Re: Python tricks

April 23rd, 2021, 12:44 pm

Is there a clean way to compare two separate dataframes in long format (dimension1, dimension2, ..., dimensionN, value)?
I have a pile of csv files, a new one published each month, containing accumulated values, and I need to show what has changed during each month. Any comments on this are very welcome!

EDIT: maybe I could pass all the dimensions as index levels (the MultiIndex feature of pd.DataFrame) while the values are left as pandas columns. It is then easy to take the difference between matching rows across the value columns.
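
A minimal sketch of the MultiIndex idea from the EDIT, assuming two cumulative monthly snapshots in long format. The column names (`ticker`, `item`, `value`) and the numbers are made up for illustration; real files would presumably be loaded with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical cumulative snapshots in long format (names and values are illustrative).
march = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB"],
    "item": ["revenue", "costs", "revenue"],
    "value": [100.0, 60.0, 50.0],
})
april = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "item": ["revenue", "costs", "revenue", "costs"],
    "value": [130.0, 75.0, 70.0, 20.0],
})

dims = ["ticker", "item"]

# Move every dimension into the index so rows align by label, not by position.
prev = march.set_index(dims)["value"]
curr = april.set_index(dims)["value"]

# fill_value=0.0 treats a combination that is absent from one snapshot as zero,
# which is an assumption that suits accumulated values starting from nothing.
monthly_change = curr.sub(prev, fill_value=0.0).rename("change")
print(monthly_change)
```

Because the subtraction aligns on the index labels, the two files do not need the same row order or even the same set of rows.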
 
katastrofa
Posts: 10072
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Python tricks

April 23rd, 2021, 7:42 pm

You didn't give too many details, but is the diff enough to compare such multidimensional datasets?
Can the components be correlated? Python has lots of nice clustering algorithms, multidimensional scaling, etc. (mostly sklearn) for that. Functional data analysis, baby! :-D

You can simply do diff on a multidimensional df, I think.
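
If "diff" here is read as pandas' `DataFrame.diff`, one way it could look with several monthly files is sketched below. The month labels, column names, and values are hypothetical, and this is only one possible reading of the suggestion.

```python
import pandas as pd

dims = ["ticker", "item"]

def snapshot(values):
    # Build one hypothetical cumulative snapshot in long format.
    return pd.DataFrame({
        "ticker": ["AAA", "BBB"],
        "item": ["revenue", "revenue"],
        "value": values,
    })

# Hypothetical snapshots keyed by month label, in chronological order.
snapshots = {
    "2021-01": snapshot([10.0, 5.0]),
    "2021-02": snapshot([25.0, 9.0]),
    "2021-03": snapshot([45.0, 18.0]),
}

# One column per month, one row per combination of dimensions.
wide = pd.concat(
    {month: df.set_index(dims)["value"] for month, df in snapshots.items()},
    axis=1,
)

# diff(axis=1) subtracts the previous month's column from each column;
# the first month has no predecessor, so its column is NaN.
monthly = wide.diff(axis=1)
print(monthly)
```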
 
tagoma
Topic Author
Posts: 18597
Joined: February 21st, 2010, 12:58 pm

Re: Python tricks

April 23rd, 2021, 8:03 pm

thanks for the suggestion kat. apologies, i wasn't clear enough earlier. my dataset was simple fundamental market data. i went with the multi-indexing solution that popped into my mind (the one I quickly mentioned in my previous post). i hadn't visited the sklearn website for ages, i have to say. the library seems to have grown substantially.