My thinking was this:
If you want a NN to solve the diffusion equation then it's going to have to learn differentiation. So why not train it to differentiate first? Any problems (lack of rigour, pathologies, strange behaviour, ...) should show up in this simpler problem, where they are easier to isolate. And we can be mathematically brutal in hunting for issues, playing devil's advocate.
Doing something with a single normal distribution is a bit beside the point. You need to train on as many different functions as possible.
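
To make "train it to differentiate first" concrete, here is a minimal sketch of how one might build the training data: many random functions sampled on a grid, each paired with its exact derivative. Everything in it is my own assumption for illustration: the choice of random trigonometric polynomials as the function family, the names random_trig_poly, make_dataset and max_fd_error, and the grid and sample sizes. It only builds and sanity-checks the data; the network itself is left out.

```python
import math
import random


def random_trig_poly(rng, max_freq=5):
    """Return (f, df): a random trigonometric polynomial and its exact derivative.

    Assumption: this function family is just one convenient choice; a serious
    test would mix in many other families (polynomials, Gaussians, kinks, ...).
    """
    a = [rng.uniform(-1.0, 1.0) for _ in range(max_freq)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(max_freq)]

    def f(x):
        return sum(a[k] * math.sin((k + 1) * x) + b[k] * math.cos((k + 1) * x)
                   for k in range(max_freq))

    def df(x):
        # Term-by-term analytic derivative of f.
        return sum((k + 1) * (a[k] * math.cos((k + 1) * x)
                              - b[k] * math.sin((k + 1) * x))
                   for k in range(max_freq))

    return f, df


def make_dataset(n_functions=100, n_points=32, seed=0):
    """Sample each random function and its derivative on a uniform grid."""
    rng = random.Random(seed)
    grid = [2.0 * math.pi * i / n_points for i in range(n_points)]
    inputs, targets = [], []
    for _ in range(n_functions):
        f, df = random_trig_poly(rng)
        inputs.append([f(x) for x in grid])    # what the network would see
        targets.append([df(x) for x in grid])  # what it should output
    return grid, inputs, targets


def max_fd_error(f, df, h=1e-5, n_checks=50):
    """Largest gap between df and a central finite difference of f."""
    xs = [2.0 * math.pi * i / n_checks for i in range(n_checks)]
    return max(abs((f(x + h) - f(x - h)) / (2.0 * h) - df(x)) for x in xs)


grid, inputs, targets = make_dataset()
```

The finite-difference check is the cheap devil's-advocate step: if the targets themselves are wrong, any strange behaviour in the trained network tells you nothing. The same comparison can later be run against the network's output, on functions from families it never saw in training.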