Serving the Quantitative Finance Community

 
User avatar
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

Choice of matrix library and interface

November 21st, 2011, 6:52 pm

QuoteOriginally posted by: outrunQuoteOriginally posted by: CuchulainnQuoteOriginally posted by: outrunThat's thr exact point, if they use the top line (map, I suspect they do: they call their default "map_std") instead of the bottom line (unordered map) then that is 100 times slower! My hunch is that sparse should only be 5x slower than dense with hash-keys, mauve even on par due to the much higher cache hit on the smaller memory footprint.100 times slower more or less is what I see as well.Nice! Too bad I have a deadline tomorrow that I need to work on all night . Would be nice to see if changing map to unordered_map solves that!Um... if you happen to be testing on MSVC, I wouldn't expect any improvement right now.Take a look here: http://www.codeproject.com/KB/cross-pla ... Net.aspxIn particular, the "C++ vs C#: Hash tables" and the "C# vs C++: Sorted maps" parts."This test pits two types of .NET Dictionary against the equivalent Microsoft C++ hash_maps (or unordered_map in VS2010). First I tried an <int, int> hashtable, then a <string, string> hashtable.""Why is the difference so large? Well, I suspect Microsoft's (Dinkumware) hash_map is simply terrible.""Strangely, however, the MS STL's map<K,V> is actually faster than its hashtable (hash_map or unordered_map), although not as fast as my hash_map. "That being said, I expect the performance to improve in futures versions of MSVC (and there shouldn't be problems with other compilers).At the same time, boost::unordered_map also already behaves better on MSVC (not mentioned in the benchmark above, to my recollection):http://groups.google.com/group/comp.lan ... a34d3c618a
 
User avatar
Cuchulainn
Posts: 20254
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Choice of matrix library and interface

November 21st, 2011, 7:06 pm

Well, I wrote a CGM using Boost mapped_matrix vs dense matrix. Maybe some calls are not optimal, but it is not clear where bottleneck is. The code is here If we can run it on non MSVC we can compare apples with apples.BTW what will we do with sparse matrices in this project?
Last edited by Cuchulainn on November 20th, 2011, 11:00 pm, edited 1 time in total.
 
User avatar
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

Choice of matrix library and interface

November 21st, 2011, 7:28 pm

Quotein the first link, VS2010 C++ has the same performance as C#, right? That shows that both might be optimized now.Negative, "With an occasional exception, C# wins by a landslide! One particular C++ scenario, x86 VC10 with SSE2, runs better than the others and when handling strings, outperforms C#. However, in most situations VC++ loses big time. The first test, for example, runs over 9 times faster in .NET4 x64 compared to C++ VC10 SSE2."He only got an improvement when he used his own implementation instead of the standard (std::unordered_map) one:"Anyway, I can say with certainty that it's not the C++ compiler's fault. I know this because I wrote my own version of hash_map for my company, and my implementation runs roughly ten times faster (with int keys). I knew I got better performance from my hashtable, but I didn't realize how dramatic the difference could be until I saw the numbers. Here are the results based on my implementation: [...]As you can see, C++ is much more competitive now. It wins some races, and loses others. My hash_map is heavily inspired by the design of .NET's Dictionary, and I am unable to guess why my version outperformed C# while doing int queries and removals, yet not inserts."QuoteHowever, in the google groups like shows the performance gain of unordered_map 0.6sec vs map 17sec.Yes -- which emphasizes that it wasn't the C++ compiler's fault per se, but rather the particular implementation (std::unordered_map). Consequently, when switching to better implementations, the problem is addressed: both the custom implementation ("my hash_map") and another reference implementation (boost::unordered_map) have no performance problems. However, that still means that std::unordered_map has performance problems, and they appear to be rather serious on as recent version as MSVC 2010."Again, certain "bad" parts of the Microsoft (Dinkumware) STL, like hash_map/unordered_map, are dramatically slower than the equivalent C# and should be avoided."
Last edited by Polter on November 20th, 2011, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Posts: 20254
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Choice of matrix library and interface

November 21st, 2011, 7:29 pm

OOPS sorry, wrong one it should be:
 
User avatar
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

Choice of matrix library and interface

November 21st, 2011, 7:40 pm

Cuch, thanks for the code. Could you highlight the parts of code you're interested in profiling first? Ideally: put the calls to boost::timer member-functions such as these: boost::timer Timer; // start timing int two = 1 + 1; // perform time consuming operation std::cout << Timer.elapsed() << " s\n"; // report elapsed time Timer.restart(); // restart timing int four = two + two; // perform another time consuming operation std::cout << Timer.elapsed() << " s\n"; // report elapsed timeIncidentally, on MSVC:QuoteSetting environment for using Microsoft Visual Studio 2010 x86 tools.TestConjugateGradient.cppTestConjugateGradient.cpp(160) : warning C4018: '<' : signed/unsigned mismatchTestConjugateGradient.cpp(165) : warning C4018: '<' : signed/unsigned mismatchTestConjugateGradient.cpp(196) : warning C4018: '<' : signed/unsigned mismatchTestConjugateGradient.cpp(201) : warning C4018: '<' : signed/unsigned mismatchTestConjugateGradient.cpp(89) : warning C4018: '<=' : signed/unsigned mismatch; TestConjugateGradient.cpp(171) : see reference to function template instantiation 'void ConjugateGradient<boost::numeric::ublas::matrix<T>>(const M &,const boost::numeric::ublas::vector<T> &,boost::numeric::ublas::vector<T> &) with [T=double,M=boost::numeric::ublas::matrix<double>]' being compiledTestConjugateGradient.cpp(86) : warning C4101: 'tmp' : unreferenced local variable You should be able to get rid of some of those mismatch warnings by just getting rid of "long" loop counter (just change it to std::size_t).
Last edited by Polter on November 20th, 2011, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Posts: 20254
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Choice of matrix library and interface

November 21st, 2011, 7:58 pm

QuoteCuch, thanks for the code. Could you highlight the parts of code you're interested in profiling first? Ideally: put the calls to boost::timer member-functions such as these:Hi Polter,CGM for mapped_matrix.It was an exercise in mapping algebra to C++ but is not clear to me what the Complexity is, e.g. is x += a faster than x = x + a etc. I have no idea.
Last edited by Cuchulainn on November 20th, 2011, 11:00 pm, edited 1 time in total.
 
User avatar
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

Choice of matrix library and interface

November 21st, 2011, 8:07 pm

QuoteOriginally posted by: outrunQuoteOriginally posted by: PolterQuotein the first link, VS2010 C++ has the same performance as C#, right? That shows that both might be optimized now.Negative, "With an occasional exception, C# wins by a landslide! One particular C++ scenario, x86 VC10 with SSE2, runs better than the others and when handling strings, outperforms C#. However, in most situations VC++ loses big time. The first test, for example, runs over 9 times faster in .NET4 x64 compared to C++ VC10 SSE2."He only got an improvement when he used his own implementation instead of the standard (std::unordered_map) one:"Anyway, I can say with certainty that it's not the C++ compiler's fault. I know this because I wrote my own version of hash_map for my company, and my implementation runs roughly ten times faster (with int keys). I knew I got better performance from my hashtable, but I didn't realize how dramatic the difference could be until I saw the numbers. Here are the results based on my implementation: [...]As you can see, C++ is much more competitive now. It wins some races, and loses others. My hash_map is heavily inspired by the design of .NET's Dictionary, and I am unable to guess why my version outperformed C# while doing int queries and removals, yet not inserts."QuoteHowever, in the google groups like shows the performance gain of unordered_map 0.6sec vs map 17sec.Yes -- which emphasizes that it wasn't the C++ compiler's fault per se, but rather the particular implementation (std::unordered_map). Consequently, when switching to better implementations, the problem is addressed: both the custom implementation ("my hash_map") and another reference implementation (boost::unordered_map) have no performance problems. However, that still means that std::unordered_map has performance problems, and they appear to be rather serious on as recent version as MSVC 2010.ok, that's clear. So suppose we have a *good* implementation, then a sparse matrix should benefit from it because an ordered map does unneeded ordening.It's still interesting even then! I'm assuming you're referring to hopefully-constant (and hopefully-avoiding worst-case linear) complexity instead of logarithmic in some of the operations (inserting, erasing, searching) -- and to that extent I agree with you, if those are the kinds of operations that are most common for the application (sparse matrices) and we hope to avoid the kind of data that could cause the worst-case complexity, then I expect unsorted to be faster.However, in general it's hard to tell in advance, without profiling:Complexity guarantees -- depends on the operation, e.g., construction of empty container is O(1) for (sorted) associative containers, but O(n) for unordered associative containers (hash maps)Interface -- also depends, e.g., iterator invalidation is never possible for the (sorted) ones:See tables comparing (ordered)map with unordered map here:http://www.boost.org/doc/libs/release/d ... son.htmlIn particular, even O(1) vs. O(log N) is not always as clear cut as it might seem, those constants can get pretty large, in particular in case of associative containers:http://lanzkron.wordpress.com/2010/06/0 ... utzes/From the section "Hash map":Quote"std::map has logarithmic complexity, a no-brainer would be to use a constant time cache, luckily Visual Studio 9 supplies std::tr1::unordered_map, that's bound to improve things.Nope, 26.22, things are just going from bad to worse (317%), how can a constant speed algorithm do worse than a logarithmic one? Well it all depends on your constants, it's easy to forget that as programmers we mostly use bounded numbers; even logarithmic complexity on a 32 bit number is constant if you take the big view. Anyway things aren't that bad, if we try using boost::unordered_map instead of the Visual Studio's implementation we "only" take 14.57 seconds (176%).Lesson 1: I wouldn't go so far to say that MS's implementation of unordered_map is inferior but you should remember that you depend on the quality of the implementation you're using and some implementations will work better with some data sets."
Last edited by Polter on November 20th, 2011, 11:00 pm, edited 1 time in total.
 
User avatar
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

Choice of matrix library and interface

November 21st, 2011, 8:49 pm

Cuch -- do you think you could cut all the code that is not directly relevant for (nor a required dependency of) the task being profiled?Since you're the author it might be somewhat more optimal than if either me or outrun were to do that :-)Just a quick and rough estimate of the total execution time:// Now try it with sparse matricesu::matrix<double> mm(n,n);$ time TestConjugateGradient_matrix.exereal 0m0.059suser 0m0.000ssys 0m0.015s// Now try it with sparse matricesu::mapped_matrix<double> mm(n,n);$ time TestConjugateGradient_mapped_matrix.exereal 0m0.075suser 0m0.000ssys 0m0.015sIs this roughly what you get?
Last edited by Polter on November 20th, 2011, 11:00 pm, edited 1 time in total.