
### Parallel RNG and distributed MC

Posted: **January 27th, 2012, 4:39 pm**

by **Alan**

I am not saying use GPUs for everything. I am also not saying: don't use multi-cores. If you can distribute a PDE problem across multi-cores, wonderful! I am just saying: for MC, given the existence of GPUs, why bother with MC on multi-cores?

Sure, GPU is a niche; agreed. But MC is also a niche, and these two niches are made for each other. Also, because of the 1/sqrt(N) error scaling for MC, the 1/sqrt(100) you get from roughly 100x more paths in the same time (a 10x smaller error) is interesting, while IMO the 1/sqrt(6) from a 6-core machine (about 2.4x) simply is not.

I agree that the h/w dependence is highly irritating, both in having to buy cards and in worrying so much about the hardware details in the programming. That's why I would like to use the hardware remotely, if possible.

Anyway, everybody has their own agenda. Maybe later in the year I can explore this one and contribute something to the library.
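To make the 1/sqrt(N) argument concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the toy integrand E[U^2] for U ~ Uniform(0, 1) (true value 1/3), and the helper names `mc_estimate` and `error_reduction`, are hypothetical and not from any library. It also assumes perfect parallel scaling, so P workers draw P times as many paths in a fixed wall-clock budget.

```python
import math
import random


def mc_estimate(n_paths, seed=0):
    """Plain Monte Carlo estimate of E[U^2] for U ~ Uniform(0, 1); true value is 1/3.

    Returns (estimate, standard_error), where the standard error is the
    sample standard deviation divided by sqrt(n_paths).
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        y = rng.random() ** 2
        total += y
        total_sq += y * y
    mean = total / n_paths
    variance = total_sq / n_paths - mean * mean  # plug-in sample variance
    return mean, math.sqrt(variance / n_paths)


def error_reduction(workers):
    """Fixed wall-clock budget with (assumed) perfect scaling: `workers` times
    as many paths, so the standard error shrinks by a factor of sqrt(workers)."""
    return math.sqrt(workers)


if __name__ == "__main__":
    base_n = 4_000
    _, se_base = mc_estimate(base_n, seed=1)
    _, se_big = mc_estimate(100 * base_n, seed=2)
    print(f"empirical error ratio for 100x paths: {se_base / se_big:.2f}")
    for p in (6, 100):
        print(f"{p:>3} workers -> error {error_reduction(p):.2f}x smaller")
```

The empirical ratio should come out close to 10, next to the theoretical factors sqrt(6) ≈ 2.45 and sqrt(100) = 10.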


Posted: **January 27th, 2012, 5:22 pm**

by **Traden4Alpha**

> Originally posted by: Alan
>
> I am not saying use GPUs for everything. I am also not saying: don't use multi-cores. If you can distribute a PDE problem across multi-cores, wonderful! I am just saying: for MC, given the existence of GPUs, why bother with MC on multi-cores? Sure, GPU is a niche; agreed. But MC is also a niche, and these two niches are made for each other. Also, because of the 1/sqrt(N) error scaling for MC, 1/sqrt(100) is interesting, while IMO, 1/sqrt(6) simply is not. I agree that the h/w dependence is highly irritating, both in having to buy cards and in worrying so much about the hardware details in the programming. That's why I would like to use the hardware remotely, if possible. Anyway, everybody has their own agenda. Maybe later in the year I can explore this one and contribute something to the library.

First, not all GPUs support double-precision floats or all the features of the IEEE 754 floating-point standard, which can lead to problems with round-off errors, etc.

Second, the 1/sqrt(N) argument only applies to time-bounded applications (e.g., what quality of MC answer can I get within a T-second deadline?). If the application has an error-bound requirement, the number of paths N needed is fixed, so the speed-up scales linearly with the number of cores (within the limits of Amdahl's law).
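The time-bounded vs error-bounded distinction can be sketched with Amdahl's law, S(P) = 1 / ((1 - f) + f / P), where f is the fraction of single-core run time that parallelizes. This is a minimal illustration, not anyone's production model: the function names and the `parallel_fraction` parameter are hypothetical, and the time-bounded case assumes the serial fraction scales with the number of paths, so path throughput rises by S(P).

```python
import math


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speed-up on `workers` cores when only `parallel_fraction`
    of the single-core run time can be spread across cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def error_bounded_time(single_core_seconds, parallel_fraction, workers):
    """Fixed error target: N is fixed, so wall time falls by the full Amdahl
    speed-up (linear in `workers` when parallel_fraction == 1)."""
    return single_core_seconds / amdahl_speedup(parallel_fraction, workers)


def time_bounded_error_factor(parallel_fraction, workers):
    """Fixed deadline: assume the extra throughput buys proportionally more
    paths, so the standard error shrinks only by sqrt(speed-up)."""
    return math.sqrt(amdahl_speedup(parallel_fraction, workers))


if __name__ == "__main__":
    for f in (1.0, 0.95):
        s = amdahl_speedup(f, 100)
        e = time_bounded_error_factor(f, 100)
        print(f"f={f}: time {s:.1f}x faster (error-bounded), error {e:.1f}x smaller (time-bounded)")
```

With 100 workers and f = 1.0 the error-bounded run is 100x faster while the time-bounded error only drops 10x; with f = 0.95 the speed-up is about 16.8x (capped at 20x as P grows) and the time-bounded error drops only about 4.1x.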


Posted: **January 27th, 2012, 5:28 pm**

by **Alan**

> Originally posted by: outrun
>
> I have one server that's configured with 4 extra slots and an extra 800 W power supply for that.

If you, or anybody, want to dedicate a server for remote-access GPU experimentation, say for up to 10 users, I would contribute a few hundred bucks toward the shared cost of the card. There would have to be a fair amount of discussion about the operating system and remote access tools first.


Posted: **January 27th, 2012, 5:43 pm**

by **Alan**

That's why I mentioned the operating system. I have looked at Amazon's offerings a little, and it looks like you need to be a Unix guru to invoke their GPU instances. Now that's fine: while I haven't used Unix for a while, I could probably get up to speed again after some (well, a lot!) of experimentation. But I think you need to have the experimentation out of the way before you sign up with Amazon, or it will just be wasteful. So, it's not the per-hour cost; it's knowing what the hell you are doing.


Posted: **January 27th, 2012, 5:51 pm**

by **Alan**

Well, you sound very confident. If you can do that and post step-by-step instructions for somebody like me (Unix-challenged), I will pay for the first 100 hours of time at the rate quoted. (Same offer to anybody else.)


Posted: **January 27th, 2012, 9:08 pm**

by **Alan**

> Originally posted by: outrun
>
> Hey Alan: If we get there, then I will of course help you; the payoff is cooperation!

OK, thanks, I appreciate it.


Posted: **January 28th, 2012, 11:32 am**

by **Cuchulainn**

> Originally posted by: Alan
>
> I am not saying use GPUs for everything. I am also not saying: don't use multi-cores. If you can distribute a PDE problem across multi-cores, wonderful! I am just saying: for MC, given the existence of GPUs, why bother with MC on multi-cores? Sure, GPU is a niche; agreed. But MC is also a niche, and these two niches are made for each other. Also, because of the 1/sqrt(N) error scaling for MC, 1/sqrt(100) is interesting, while IMO, 1/sqrt(6) simply is not. I agree that the h/w dependence is highly irritating, both in having to buy cards and in worrying so much about the hardware details in the programming. That's why I would like to use the hardware remotely, if possible. Anyway, everybody has their own agenda. Maybe later in the year I can explore this one and contribute something to the library.

I would hardly consider MC a niche market. The issues are how many stakeholders/users use CPU versus GPU. Another issue is the software design of MC, accurate schemes, and PRNGs. Something that tends not to get mentioned is the amount of developer effort needed to get an MC/GPU engine up and running. Is that 1 week, 2, 6, 30 weeks? Anyone?

> I think it's fun to learn about it (developer push instead of client demand), but I also think that the results can be quite surprising: the MC might run 100x faster than without that E250 card. It's very plausible that someone who actually needs MC for production would invest in a GPU. For me it's about acquiring new skills: algorithms for large-scale parallel coding.

This will appeal to a very small stakeholder group, but you know that already. GPU is still too proprietary for my taste, and I have no available time at the moment to play with this (hardware) technology. I used to do this kind of h/w stuff a lot before, so it's yet another kid on the block. I am playing the hurler on the ditch.