### Re: Looking for hardware recommendations

Posted: July 27th, 2017, 6:55 pm
For katastrofa and others interested in these i9's, I just got a new desktop this week with the i9-7900X. Billy7 was very kind to send me his Monte Carlo, which compares a one core run with an "all cores" run. Here are my results, which make me very happy with the upgrade:

System and run-times for the same (Heston process) Monte Carlo:

My old desktop (i7-4770, 4 core/8 virtual cores)
run time (64bit): 16.38 secs (all 8 virtual cores)

My new desktop (i9-7900X, 10 core/20 virtual cores)
run time (32bit): 66.97 secs (one core).
run time (64bit): 54.74 secs (one core).
run time (32bit): 5.31 secs (all 20 virtual cores).
run time (64bit): 4.37 secs (all 20 virtual cores)

### Re: Looking for hardware recommendations

Posted: July 27th, 2017, 8:23 pm
The above MC runs are for 1 million paths/replications, with a discretization of 1000 time steps for each path, using Andersen's Quadratic Exponential scheme for Heston. All that in about 4 secs!
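For readers who haven't seen the scheme, here is a rough sketch of the variance update in Andersen's QE scheme as it is usually presented. It is not Billy7's code. The struct `HestonParams`, the function names, and the switching threshold `psi_c = 1.5` (a commonly quoted choice) are my own assumptions for illustration. The asset-price update and the correlation handling are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical parameter bundle: mean-reversion speed, long-run variance, vol of variance.
struct HestonParams { double kappa; double theta; double xi; };

// One QE step for the variance: v(t) -> v(t + dt).
// z is a standard normal draw, u a uniform draw on [0, 1).
double qe_variance_step(double v, double dt, const HestonParams& p, double z, double u) {
    assert(v >= 0.0 && dt > 0.0 && p.kappa > 0.0 && p.theta > 0.0 && p.xi > 0.0);
    const double psi_c = 1.5;  // assumed switching threshold
    const double e = std::exp(-p.kappa * dt);
    // Conditional mean and variance of v(t + dt) given v(t).
    const double m = p.theta + (v - p.theta) * e;
    const double s2 = v * p.xi * p.xi * e * (1.0 - e) / p.kappa
                    + p.theta * p.xi * p.xi * (1.0 - e) * (1.0 - e) / (2.0 * p.kappa);
    const double psi = s2 / (m * m);
    if (psi <= psi_c) {
        // Quadratic branch: v' = a * (b + z)^2, moment-matched to (m, s2).
        const double inv = 2.0 / psi;
        const double b2 = inv - 1.0 + std::sqrt(inv) * std::sqrt(inv - 1.0);
        const double a = m / (1.0 + b2);
        const double root = std::sqrt(b2) + z;
        return a * root * root;
    }
    // Exponential branch: point mass at zero plus an exponential tail.
    const double prob0 = (psi - 1.0) / (psi + 1.0);
    const double beta = (1.0 - prob0) / m;
    return (u <= prob0) ? 0.0 : std::log((1.0 - prob0) / (1.0 - u)) / beta;
}

// Sample mean of v(t + dt) over n draws; it should be close to the conditional mean m.
double mean_next_variance(double v, double dt, const HestonParams& p, int n, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const double z = gauss(rng);
        const double u = unif(rng);
        sum += qe_variance_step(v, dt, p, z, u);
    }
    return sum / n;
}
```

The branch on `psi` is the kind of per-step `if` statement that a plain GBM simulation doesn't have.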

### Re: Looking for hardware recommendations

Posted: July 27th, 2017, 8:41 pm
What's the speedup? Do you see a parallel pattern in the algorithm to help assign pieces to separate threads? I suspect some kind of loop parallelism?

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 2:07 pm
This being a vanilla Monte Carlo simulation using pseudo-random numbers, the parallelization is as simple as dividing the total number of paths across the available threads with a simple #pragma omp parallel for. The number streams for each thread should be statistically independent; I'm just using a different seed for each thread, which may not be enough. The speedup over the single-threaded execution can be seen in Alan's 64-bit timings to be 54.74/4.37 = 12.53. This being 10 physical cores with HT, we cannot immediately deduce the parallel efficiency, but I've tried it without HT on my PC and it's practically 100%, as expected. Alan's measurements imply that HT effectively gives an extra 25.3% cores (12.53 over 10 physical cores), but I'm not sure whether the clock frequency was the same when all cores were running together as when the single core was used.
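To make the idea concrete, here is a minimal sketch of the path-splitting pattern. It is not the actual Heston code: for brevity it prices a European call under GBM with a single time step, and the function name, the parameter values, and the "base seed plus chunk index" seeding are all assumptions for illustration. Build with OpenMP enabled (e.g. `-fopenmp` on GCC/Clang); without it the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical example: Monte Carlo price of a European call under GBM,
// with the paths split into chunks that each own a separately seeded RNG.
double mc_call_price(std::int64_t n_paths, int n_chunks, std::uint64_t base_seed) {
    assert(n_paths > 0 && n_chunks > 0);
    const double s0 = 100.0, strike = 100.0, r = 0.05, sigma = 0.2, t = 1.0;
    const double drift = (r - 0.5 * sigma * sigma) * t;
    const double vol = sigma * std::sqrt(t);
    double payoff_sum = 0.0;

    #pragma omp parallel for reduction(+ : payoff_sum)
    for (int c = 0; c < n_chunks; ++c) {
        // A different seed per chunk: simple, but no guarantee of independent streams.
        std::mt19937_64 rng(base_seed + static_cast<std::uint64_t>(c));
        std::normal_distribution<double> gauss(0.0, 1.0);
        // Chunk c handles paths [begin, end).
        const std::int64_t begin = n_paths * c / n_chunks;
        const std::int64_t end = n_paths * (c + 1) / n_chunks;
        double local = 0.0;
        for (std::int64_t i = begin; i < end; ++i) {
            const double st = s0 * std::exp(drift + vol * gauss(rng));
            local += std::max(st - strike, 0.0);
        }
        payoff_sum += local;
    }
    return std::exp(-r * t) * payoff_sum / static_cast<double>(n_paths);
}
```

Setting `n_chunks` to the number of hardware threads gives one chunk per thread. Because the chunks never share state, there is nothing to synchronize except the final reduction, which is why the parallel efficiency can be close to 100%.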

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 2:32 pm
Alan, can you give timings for 1, 2, ..., 30 threads? We should see a linear speedup for the first 10 threads (one per physical core), a lower speedup beyond that, and it should flatten out above 20 threads. We can also eliminate any biases and see if it's indeed linear.
I'll post a plot later for an old server I have: 2 CPUs, each with 4 cores and HT. I did a very similar test: MC, but with GBM, C++11 threads and a custom RNG (which shouldn't change things), and I get an extra 35% for HT. Maybe GBM is simpler than Heston (fewer instructions, no if statements to check bounds) and allows for better HT utilization?

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 3:01 pm
What Alan could easily do is use a monitoring program to check what clock freq the single core runs at, and then what freq all the cores run at when they are busy at the same time. Turbo Boost usually boosts the freq for a single core, but cannot do it for all cores for thermal reasons. If in Alan's single-core run the freq was 4.3 or 4.5GHz (upped from the base 3.3GHz for this chip) while the all-core run stayed near the base clock, then the HT benefit is actually a lot higher than 25%. In order to estimate the HT benefit, we'd need the freq to be the same for the single core and for all cores running at the same time.

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 4:04 pm
Here is the task manager when all 20 virtual cores are running flat out. Maybe I am wrong, but it looks to me as if the Intel Turbo Boost has managed to boost to 4.32 GHz -- certainly all the fans noticeably kicked in.

Sorry, outrun, but as you can see from the screenshot, my choices are binary.

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 4:09 pm
From an i9 review:

"The Core i9 7900X has a base clock of 3.3GHz, which doesn’t make it seem a huge amount quicker than its forebear, but the big difference is that Turbo speed of 4.3GHz.

That’s not just a single-core boost clock either. In our MSI X299 Gaming M7 ACK test board, the 7900X runs with all cores at 4.3GHz whenever it’s being pushed by CPU-intensive applications. Intel have also given it a Turbo Boost Max frequency of 4.5GHz"

So, if that's the case with Alan's board as well, that's very impressive, using 4.3GHz on all cores. Then the HT benefit estimate of 25% may be accurate. Unless the single core uses the Turbo boost max of 4.5GHz, in which case the numbers would change slightly in favour of HT.
By the way, the Test 2 I posted in the previous page was with GBM and the HT gain was very similar to the Heston run (Test 1).

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 4:12 pm
Yes, as a matter of fact, my motherboard is an MSI X299 SLI PLUS

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 4:41 pm
Cool, very binary indeed!

So perhaps my old rack server doesn't adjust its frequency, and hence I see a higher HT boost? It makes a horrible noise anyway; it's supposed to be in a datacenter instead of the attic. My wife is still wondering what that hum is. It started 3 months ago; she never goes up there, but you can hear it on the floor below.

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 7:34 pm
If your test involved turning HT OFF from the BIOS, measuring the MT w/o HT time, then rebooting, turning HT ON, measuring the MT w HT time and comparing the two, then Turbo Boost doesn't enter into the equation, I guess, as it would have been doing the same thing in both cases. The reason I'm talking about Turbo Boost here is that we don't have an MT w/o HT run time from Alan, so we have to deduce it from the single-threaded (ST) time by dividing the ST time by the number of physical cores (because the parallel efficiency here is 100%). But that's only a valid deduction if the ST time was obtained at the same frequency as the MT time. Does it make sense?!
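As a rough sketch of that deduction: the helper below estimates the MT w/o HT time from the ST time, rescales it by the clock ratio, and compares it with the measured MT w HT time. The function name is made up, and it assumes both 100% parallel efficiency and a run time that scales inversely with clock frequency (i.e. a purely compute-bound job), neither of which has been verified for Alan's machine.

```cpp
#include <cassert>

// Hypothetical helper: estimated HT gain (0.25 means +25%) from an ST timing
// and an all-threads timing, allowing for different clock speeds in the two runs.
double estimated_ht_gain(double st_secs, double mt_ht_secs, int physical_cores,
                         double st_ghz, double mt_ghz) {
    assert(st_secs > 0.0 && mt_ht_secs > 0.0 && physical_cores > 0);
    assert(st_ghz > 0.0 && mt_ghz > 0.0);
    // Estimated MT time without HT: ST work split perfectly across the cores,
    // then rescaled from the ST clock to the (possibly lower) MT clock.
    const double mt_no_ht_secs = st_secs / physical_cores * (st_ghz / mt_ghz);
    return mt_no_ht_secs / mt_ht_secs - 1.0;
}
```

With Alan's 64-bit timings, `estimated_ht_gain(54.74, 4.37, 10, 4.3, 4.3)` gives about 0.25. If the single core had instead run at 4.5GHz while all cores ran at 4.3GHz, `estimated_ht_gain(54.74, 4.37, 10, 4.5, 4.3)` gives about 0.31.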

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 7:58 pm
My test was running an increasing number of threads and counting the number of MC samples per second, all in a single application (no restart of the application or server). This is what I got:

* the first part, 1-8 threads: the threads run on normal cores, and it's a pretty straight line (nothing kicking in or out)
* 9-16 threads start using HT
* 17-32 it's saturated

So the HT speedup is approx 45mln / 32.5mln = +38%?
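A sweep like this can be sketched with C++11 threads roughly as follows. This is not the original test code: the workload (summing normal draws), the function names and the seeds are placeholders for illustration, and a serious measurement would repeat each thread count several times and use a much larger sample count.

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical workload: draw n standard normals and return their sum,
// so the compiler cannot optimize the loop away.
double simulate_samples(std::int64_t n, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    double acc = 0.0;
    for (std::int64_t i = 0; i < n; ++i) acc += gauss(rng);
    return acc;
}

// Returns samples per second for thread counts 1..max_threads.
std::vector<double> throughput_sweep(int max_threads, std::int64_t samples_per_thread) {
    std::vector<double> rates;
    for (int n = 1; n <= max_threads; ++n) {
        std::vector<double> sinks(n, 0.0);  // one result slot per thread
        std::vector<std::thread> workers;
        const auto start = std::chrono::steady_clock::now();
        for (int t = 0; t < n; ++t) {
            workers.emplace_back([&sinks, t, samples_per_thread] {
                sinks[t] = simulate_samples(samples_per_thread, 1000u + t);
            });
        }
        for (auto& w : workers) w.join();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        rates.push_back(n * samples_per_thread / elapsed.count());
    }
    return rates;
}
```

Plotting the returned rates against the thread count should show where the curve bends: once at the number of physical cores and again at the number of hardware threads. Depending on the toolchain, linking may need `-pthread`.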

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 8:18 pm
OK, so everything in your test is with HT ON, so strictly speaking we cannot compare with what happens w/o HT. It may well be that when HT is ON, the per thread performance for 8 cores is lower than it would have been if HT was OFF! I have actually seen apps that run a little slower on a single thread when HT is ON, compared to when HT is OFF. I think the proper way to measure the HT gain is to turn it OFF from the BIOS. Intel themselves have claimed that HT can give up to 30% gain, but I don't remember where I've read it!

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 8:24 pm
Sounds indeed like a good way to rule things out. Only thing is I can't reboot the server now (and I need to find that MC code again!).

### Re: Looking for hardware recommendations

Posted: July 28th, 2017, 9:47 pm
No need to reboot your server as I did it with mine a few days ago and got this run time for my Test 1:

Test 1: ST: 162s, MT no HT: 40.45s, MT with HT: 31.45 s
Now that HT is back ON, I forced the same test to use only 4 threads (while 8 are available) and I get:
Test 1: ST: 162s, MT with HT (but forcing only 4 threads): 45s, MT with HT: 31.45s as before.

So it seems that while the ST performance stays the same, having HT ON means that setting 4 threads to work (as many as the physical cores) doesn't give the same performance as 4 cores w/o HT, but rather a loss of about 10% (45s instead of 40.45s). So, equivalently to your test, this would seem to imply that the HT gain is 45/31.45 = 1.43, i.e. +43%. But it really is 40.45/31.45 = 1.286, i.e. +28.6%.