Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 15th, 2015, 10:40 am

C++11 does not have concurrent containers (e.g. a concurrent queue), which is weird IMO, but Microsoft's PPL does support them. OK, c'est la vie. What to do?
1. Use PPL and not C++11 concurrency.
2. Use C++11 for threads/tasks and PPL for data structures.
3. Write thread-safe wrappers for STL containers.
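
For what it's worth, option 3 might look roughly like the sketch below. This is only an illustration using standard C++11 facilities: `LockedQueue` and `produce_and_consume` are made-up names, not PPL or standard classes, and a single mutex is assumed to be good enough for the workload.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical wrapper (not a standard or PPL class): a std::queue guarded by
// one mutex, plus a condition variable so consumers can block until data arrives.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one();  // wake one waiting consumer, if any
    }

    // Blocks until an element is available, then removes and returns it.
    T wait_and_pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

    // Non-blocking variant: returns false if the queue was empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// One producer pushes 1..n while one consumer sums what it pops.
long produce_and_consume(int n) {
    LockedQueue<int> q;
    long sum = 0;
    std::thread producer([&] { for (int i = 1; i <= n; ++i) q.push(i); });
    std::thread consumer([&] { for (int i = 0; i < n; ++i) sum += q.wait_and_pop(); });
    producer.join();
    consumer.join();
    return sum;
}
```

Note that the wrapper deliberately exposes no separate `empty()`/`front()`/`pop()` trio: checking and popping happen under one lock, otherwise another thread could empty the queue between the two calls.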
Last edited by Cuchulainn on August 14th, 2015, 10:00 pm, edited 1 time in total.
 
Hansi
Posts: 41
Joined: January 25th, 2010, 11:47 am

C++ 11 Concurrency

August 15th, 2015, 11:10 am

Yes, it's by design: http://blogs.msdn.com/b/vcblog/archive/ ... 42.aspx
Use a library: PPL, Intel TBB or google-concurrency-library.
 
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 15th, 2015, 11:17 am

Quote (originally posted by: Hansi):
Yes it's by design: http://blogs.msdn.com/b/vcblog/archive/ ... 42.aspx
Use a library PPL, Intel TBB or google-concurrency-library

Their rationale seems to be:

Quote:
Taking Stephan's words here, the reality is that they aren't, not as a bug but as a feature: having every member function of every STL container acquiring an internal lock would annihilate performance. As a general purpose, highly reusable library, it wouldn't actually provide correctness either: the correct level to place locks is determined by what the program is doing. In that sense, individual member functions don't tend to be such correct level.

Sure. If that is so, then I agree. But who's asking them? Why not have two versions?

BTW, I see C++11 does not support thread_group nor interruption_point() (which Boost does).
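
On the missing thread_group: a rough stand-in can be put together from `std::thread` and `std::vector`. The sketch below is an assumption-laden illustration, not Boost's implementation; `SimpleThreadGroup` and `count_with_group` are hypothetical names, and only `create_thread`/`join_all` are mimicked. Interruption points are not covered at all.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for boost::thread_group: owns a set of std::thread
// objects and joins them all, either on request or in the destructor.
class SimpleThreadGroup {
public:
    SimpleThreadGroup() = default;
    SimpleThreadGroup(const SimpleThreadGroup&) = delete;
    SimpleThreadGroup& operator=(const SimpleThreadGroup&) = delete;

    // Destroying a joinable std::thread calls std::terminate, so join first.
    ~SimpleThreadGroup() { join_all(); }

    template <typename F>
    void create_thread(F&& f) {
        threads_.emplace_back(std::forward<F>(f));
    }

    void join_all() {
        for (auto& t : threads_) {
            if (t.joinable()) t.join();
        }
    }

private:
    std::vector<std::thread> threads_;
};

// Starts n_threads workers that each bump a shared atomic counter.
int count_with_group(int n_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    SimpleThreadGroup group;
    for (int i = 0; i < n_threads; ++i) {
        group.create_thread([&counter, increments_per_thread] {
            for (int j = 0; j < increments_per_thread; ++j) ++counter;
        });
    }
    group.join_all();
    return counter.load();
}
```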
Last edited by Cuchulainn on August 14th, 2015, 10:00 pm, edited 1 time in total.
 
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

C++ 11 Concurrency

August 15th, 2015, 3:16 pm

Regarding parallelism: It seems that parallel algorithms are coming first:
http://www.open-std.org/JTC1/SC22/WG21/ ... f

Regarding concurrency: Composable futures, task group, etc., are coming, too:
http://www.open-std.org/JTC1/SC22/WG21/ ... 01.html

The good news is that both Parallelism TS and Concurrency TS are on the fast track to approval:
http://isocpp.org/blog/2015/06/trip-rep ... meeting

One note on the concurrent containers -- these are non-trivial if you want a solution that will scale well into the future -- which includes working well with future hardware as well as future problems, with a design that will stand the test of time (we wouldn't necessarily want to end up with another vector<bool> or valarray...).

One issue to take into account is transactional memory (instead of, say, locks -- which would defeat the purpose); Transactional Memory TS is very much worth a look in this context (as well as the Intel's TSX on the hardware side).

Theoretically (and, as far as concurrency is concerned, getting the fundamentals right is not necessarily a bad idea before jumping right to the practical implementation) there are some unsolved problems here.

For an illustration of what I mean and why is this important, I'd recommend "Unlocking Concurrency": http://queue.acm.org/detail.cfm?id=1189288

First, it's indeed highly recommended to use an existing, mature, battle-tested library solution (quality matters -- the flip-side being: there are only so many high-quality choices available, including folks / organizations willing to spend their time / resources to write and defend proposals in this area...):

Quote:
It is possible to write a concurrent hash-map data structure using locks so that you get both concurrent read accesses and concurrent accesses to disjoint data. In fact, the recent Java 5 libraries provide a version of HashMap, called ConcurrentHashMap, that does exactly this. The code for ConcurrentHashMap, however, is significantly longer and more complicated than the version using coarse-grained locking. The algorithm was designed by threading experts and it went through a comprehensive public review process before it was added to the Java standard. In general, writing highly concurrent lock-based code such as ConcurrentHashMap is very complicated and bug prone and thereby introduces additional complexity to the software development process.

However, at the same time -- and despite the occasional optimistic expectations -- in fact, having a concurrent hash map in the standard library doesn't (and wouldn't) make "concurrency with hash maps" a solved problem (if the programmers actually want composability -- which is pretty fundamental for programs that are to stand the test of time):

Quote:
Although highly concurrent libraries built using fine-grained locking can scale well, a developer doesn't necessarily retain scalability after composing larger applications out of these libraries. As an example, assume the programmer wants to perform a composite operation that moves a value from one concurrent hash map to another, while maintaining the invariant that threads always see a key in either one hash map or the other, but never in neither. Implementing this requires that the programmer resort to coarse-grained locking, thus losing the scalability benefits of a concurrent hash map (figure 3A). To implement a scalable solution to this problem, the programmer must somehow reuse the fine-grained locking code hidden inside the implementation of the concurrent hash map.

It remains to be seen if, say, HTM / STM is the way to go (and which design point -- or points -- along the continuum of choices is -- or are -- "right").

But the point about locks stands -- if the wish is to have STL containers with locks, a strictly superior approach (complexity _and_ performance) is to recompile the workload as a single-threaded application, since using a locks-based "concurrent" data structure would completely and purposefully defeat any benefits anyway:
http://a248.e.akamai.net/f/1097/1823/7m ... 8/fig2.jpg
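
To make the composability point from the second quote concrete, here is a minimal sketch of the "move a key between two maps" operation done with coarse-grained locking. It is not the article's figure 3A, just an illustration: `LockedMap`, `move_key`, `in_either` and `demo_move_keeps_key_visible` are hypothetical names, and each map is assumed to be protected by a single mutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical map guarded by one coarse-grained lock.
struct LockedMap {
    std::mutex m;
    std::unordered_map<std::string, int> data;
};

// Moves key from src to dst while holding BOTH locks, so a reader that also
// takes both locks never observes the key in neither map.
bool move_key(LockedMap& src, LockedMap& dst, const std::string& key) {
    if (&src == &dst) return false;  // locking the same mutex twice is not allowed
    std::lock(src.m, dst.m);         // acquires both without risking deadlock
    std::lock_guard<std::mutex> g1(src.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(dst.m, std::adopt_lock);
    auto it = src.data.find(key);
    if (it == src.data.end()) return false;
    dst.data[key] = it->second;
    src.data.erase(it);
    return true;
}

// A reader has to take both locks too; otherwise it could look at src after
// the erase and at dst before the insert became visible to it.
bool in_either(LockedMap& a, LockedMap& b, const std::string& key) {
    std::lock(a.m, b.m);
    std::lock_guard<std::mutex> g1(a.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(b.m, std::adopt_lock);
    return a.data.count(key) != 0 || b.data.count(key) != 0;
}

// Single-threaded walk-through of the invariant.
bool demo_move_keeps_key_visible() {
    LockedMap a, b;
    a.data["x"] = 7;
    const bool before = in_either(a, b, "x");
    const bool moved = move_key(a, b, "x");
    const bool after = in_either(a, b, "x");
    return before && moved && after && b.data.at("x") == 7 && a.data.count("x") == 0;
}
```

The cost is visible in the code: every composite operation serialises on both maps, which is the scalability loss the quote describes.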
Last edited by Polter on August 14th, 2015, 10:00 pm, edited 1 time in total.
 
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 16th, 2015, 7:00 pm

Quote:
First, it's indeed highly recommended to use an existing, mature, battle-tested library solution (quality matters -- the flip-side being: there are only so many high-quality choices available, including folks / organizations willing to spend their time / resources to write and defend proposals in this area...):

Which ones? C++ concurrency is missing functionality that is standard elsewhere, e.g. reduction variables, loop-level parallelization, producer-consumer.

// valarray is a behemoth and is mathematically not even wrong. Transcendental functions of valarrays are mathematical nonsense.
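
As an illustration of what loop-level parallelization means here (the kind of thing OpenMP's parallel for gives you), a hand-rolled version on top of `std::thread` could look like the sketch below. `parallel_for` and `squares` are hypothetical names, the static chunking is an assumption, and there is no exception handling or thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: splits the index range [0, n) into contiguous chunks
// and runs body(i) for each index, one chunk per thread.
template <typename Body>
void parallel_for(std::size_t n, unsigned n_threads, Body body) {
    if (n_threads == 0) n_threads = 1;
    const std::size_t chunk = (n + n_threads - 1) / n_threads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();
}

// Each index is written by exactly one thread, so no locking is needed.
std::vector<double> squares(std::size_t n) {
    std::vector<double> out(n);
    parallel_for(n, 4, [&out](std::size_t i) {
        out[i] = static_cast<double>(i) * static_cast<double>(i);
    });
    return out;
}
```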
Last edited by Cuchulainn on August 15th, 2015, 10:00 pm, edited 1 time in total.
 
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 16th, 2015, 7:09 pm

Quote (originally posted by: Polter):
Regarding parallelism: It seems that parallel algorithms are coming first: http://www.open-std.org/JTC1/SC22/WG21/ ... 7.pdf

Seems reasonable. If you know the interface of, let's say, a queue (dequeue(), enqueue()), then you have to define an extra Bridge to the hardware?

queue<T, Impl = std::queue> (using a template template parameter).

I suppose getting this into the language is more difficult. Anyways, this seems to be the way C# evolved. (See Joe Duffy's book on C# "Concurrent Programming on Windows".) PPL queue
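
The queue<T, Impl = std::queue> idea can be spelled out as follows. This is just one possible reading of it: `Queue`, `ListQueue` and the helper functions are hypothetical names, and the implementation is assumed to offer the `push`/`front`/`pop`/`empty` interface of `std::queue`.

```cpp
#include <cassert>
#include <list>
#include <queue>
#include <utility>

// Fixed enqueue/dequeue interface; the implementation is a template template
// parameter. The variadic form lets std::queue (which has a second, defaulted
// parameter) be passed directly.
template <typename T, template <typename...> class Impl = std::queue>
class Queue {
public:
    void enqueue(T value) { impl_.push(std::move(value)); }

    T dequeue() {
        assert(!impl_.empty());  // caller must not dequeue from an empty queue
        T value = std::move(impl_.front());
        impl_.pop();
        return value;
    }

    bool empty() const { return impl_.empty(); }

private:
    Impl<T> impl_;
};

// An alternative implementation with the same interface, here list-backed.
template <typename T>
using ListQueue = std::queue<T, std::list<T>>;

template <typename Q>
int enqueue_then_sum(Q& q) {
    for (int i = 1; i <= 4; ++i) q.enqueue(i);
    int sum = 0;
    while (!q.empty()) sum += q.dequeue();
    return sum;
}

int sum_with_default_impl() { Queue<int> q; return enqueue_then_sum(q); }
int sum_with_list_impl() { Queue<int, ListQueue> q; return enqueue_then_sum(q); }
```

A concurrent implementation could be plugged in the same way, provided it offers that interface; whether such a narrow interface is the right one for concurrent use is a separate question.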
Last edited by Cuchulainn on August 15th, 2015, 10:00 pm, edited 1 time in total.
 
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 18th, 2015, 7:38 am

Quote (originally posted by: Cuchulainn):
Quote (originally posted by: Polter):
Regarding parallelism: It seems that parallel algorithms are coming first: http://www.open-std.org/JTC1/SC22/WG21/ ... 7.pdf
Seems reasonable. If you know the interface of, let's say, a queue (dequeue(), enqueue()), then you have to define an extra Bridge to the hardware? queue<T, Impl = std::queue> (using a template template parameter). I suppose getting this into the language is more difficult. Anyways, this seems to be the way C# evolved. (See Joe Duffy's book on C# "Concurrent Programming on Windows".) PPL queue

Doug Schmidt had solved these issues by 1999 in a bunch of articles/books/ACE code, e.g. "Strategized Locking, Thread-safe Interface, and Scoped Locking: Patterns and Idioms for Simplifying Multi-threaded C++ Components".

I would expect C++ Concurrency will have something similar when the time comes.

BTW, the terms 'future' and 'promise' in the "C++ Concurrency in Action" book have been introduced without historical reference, but they are in fact at least 30 years old.

I think there is lots to do.
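
For readers who have not seen the patterns, Strategized Locking and Scoped Locking can be approximated in C++11 as below. This is a sketch in the spirit of those patterns, not ACE code: `NullMutex`, `Counter` and the two helper functions are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// A do-nothing lock for single-threaded use (the "null" strategy).
struct NullMutex {
    void lock() {}
    void unlock() {}
};

// Strategized Locking: the lock type is a template parameter.
// Scoped Locking: std::lock_guard acquires in its constructor and releases in
// its destructor, so every exit path unlocks.
template <typename Lock = std::mutex>
class Counter {
public:
    void increment() {
        std::lock_guard<Lock> guard(lock_);
        ++value_;
    }

    long value() const {
        std::lock_guard<Lock> guard(lock_);
        return value_;
    }

private:
    mutable Lock lock_;
    long value_ = 0;
};

// Multi-threaded use pays for a real mutex.
long count_concurrently(int n_threads, int per_thread) {
    Counter<std::mutex> c;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&c, per_thread] {
            for (int i = 0; i < per_thread; ++i) c.increment();
        });
    }
    for (auto& th : threads) th.join();
    return c.value();
}

// Single-threaded use swaps in the null lock and pays nothing.
long count_single_threaded(int n) {
    Counter<NullMutex> c;
    for (int i = 0; i < n; ++i) c.increment();
    return c.value();
}
```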
Last edited by Cuchulainn on August 17th, 2015, 10:00 pm, edited 1 time in total.
 
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 20th, 2015, 7:13 am

This code compiles and runs under VS2015, but the values of p1 and p2 are strange after the exchange. It is either a bug _or_ atomic_exchange is being called wrong (curious syntax, BTW): ...(&p1, p2)...
Last edited by Cuchulainn on August 19th, 2015, 10:00 pm, edited 1 time in total.
 
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

C++ 11 Concurrency

August 20th, 2015, 3:49 pm

`std::atomic_exchange` replaces (overwrites) the first operand (passed-by-pointer) with the second operand (passed-by-value). If you'd like to access the former value, use the return value:
http://en.cppreference.com/w/cpp/atomic ... ptr/atomic
 
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 20th, 2015, 5:01 pm

Got it! Thanks. The docs say "...(&sp, sp2): Exchange values of sp and sp2". A bit confusing.
 
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

C++ 11 Concurrency

August 20th, 2015, 6:46 pm

Yeah, it is a bit tricky. Especially since the standard says the following:

template<class T>
shared_ptr<T> atomic_exchange_explicit(shared_ptr<T>* p, shared_ptr<T> r, memory_order mo);

Requires: p shall not be null.
Effects: p->swap(r).
Returns: The previous value of *p.
Throws: Nothing.

The "tricky" part is that, even though `swap` is indeed the specified effect, the second operand `r` is taken by value (as in a copy, possibly optimized).

So, one can conceptually think of this in terms of an atomic swap being performed between the first operand -- and a _temporary copy_ of the second operand (which then gets discarded).

Well, atomics are fun, especially with relaxed memory order ;>
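
A small sketch of that behaviour, since the original snippet is not shown in the thread. `ExchangeResult` and `demo_exchange` are hypothetical names and the values 1 and 2 are arbitrary; note also that these free-function overloads for `shared_ptr` were later deprecated in C++20 in favour of `std::atomic<std::shared_ptr<T>>`.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// What the three pointers refer to after the exchange.
struct ExchangeResult {
    int p1_after;
    int p2_after;
    int returned;
};

ExchangeResult demo_exchange() {
    auto p1 = std::make_shared<int>(1);
    auto p2 = std::make_shared<int>(2);

    // p1 is overwritten with a copy of p2. p2 itself is passed by value, so
    // the caller's p2 is untouched. The previous value of p1 comes back as
    // the return value.
    std::shared_ptr<int> old = std::atomic_exchange(&p1, p2);

    return ExchangeResult{*p1, *p2, *old};
}
```

So after the call p1 and p2 both point at the 2, and the old 1 is only reachable through the returned pointer. That matches "p->swap(r)" with r being a temporary copy, and explains why the result does not look like a two-way swap.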
Last edited by Polter on August 19th, 2015, 10:00 pm, edited 1 time in total.
 
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

C++ 11 Concurrency

August 21st, 2015, 5:37 am

I guess std::atomic<> is a welcome addition. For example, std::atomic<bool> can be used as an alternative to volatile (or non-thread-safe non-volatile types) for thread notification, but much of atomics is getting close to the hardware wire.

BTW, are there any developments on libraries/mechanisms atop C++ Concurrency? For example, take the 101 example of creating a parallel std::accumulate(). This is basically a thread-safe reduction variable like in OpenMP or MPI. It is possible to create a home-grown solution, but it is nicer if it is in the language or in a (Boost) library.

Of course, writing all these patterns yourself is great fun, and/or you can then move to a library that someone else supports.

Another question: how can one use FP-style parallel programming in C++11? Any links?
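
One home-grown way to do the parallel std::accumulate() with nothing but C++11 is a recursive split using `std::async`, which also has a somewhat functional flavour (no shared mutable state, results combined via futures). The sketch below makes several assumptions: `parallel_accumulate` and `sum_one_to` are hypothetical names, `T{}` is taken to be the identity for `+`, and `+` is treated as associative (so floating-point results may differ slightly from the sequential order).

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <numeric>
#include <vector>

// Recursively halves the range; the right half runs as an async task while
// the current thread handles the left half. Small ranges fall back to
// std::accumulate.
template <typename It, typename T>
T parallel_accumulate(It first, It last, T init, std::size_t min_chunk = 10000) {
    const auto n = std::distance(first, last);
    if (n <= 1 || n <= static_cast<decltype(n)>(min_chunk)) {
        return std::accumulate(first, last, init);
    }
    const It mid = std::next(first, n / 2);
    std::future<T> right = std::async(std::launch::async, [=] {
        return parallel_accumulate(mid, last, T{}, min_chunk);
    });
    const T left = parallel_accumulate(first, mid, init, min_chunk);
    return left + right.get();  // get() waits for the right half
}

// Sums 1..n with the parallel version; integers keep the result exact.
long long sum_one_to(int n) {
    std::vector<long long> v(static_cast<std::size_t>(n));
    std::iota(v.begin(), v.end(), 1LL);
    return parallel_accumulate(v.begin(), v.end(), 0LL);
}
```

Every split starts a new thread here, so `min_chunk` should stay large relative to the per-element work.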
Last edited by Cuchulainn on August 20th, 2015, 10:00 pm, edited 1 time in total.
 
dd3
Posts: 4
Joined: June 8th, 2010, 9:02 am

C++ 11 Concurrency

August 21st, 2015, 9:17 pm

Quote (originally posted by: outrun):
Boost ASIO which might end up in C++17 has heavy lock congestions

Only if you use OpenSSL with it.