It has also been looked at as a possible alternative to A/B testing, e.g. in clinical trials. The thing is, estimating the significance of their results requires bigger samples, so bandit algorithms are effectively no better at all. Still, I've already seen ML people trying to infer causality from observational data, and I (like every person with basic statistical training who has been exposed to that) have been repeating Holland's "there's no causation without manipulation" and the theory of Directed Acyclic Graphs to some of them like a parrot for the last two years. A few days ago I was enlightened with "Look, this Pearl guy says that we cannot do causation from our data and recommends some DAGs for that" - no way!

In another two years or so they will understand that there is no magic that has been waiting for 300 years, like seamen for James Lind's oranges, to reveal itself before the superior ML domain and outperform the laws of statistics in decision making under uncertainty. Considering how methodologically incorrect all the ML experiments are, they may arrive at any random conclusion, though. Unless they vaporise themselves while training their neurons or capsules or whatever on one of those massive red-hot computer clusters.