Yes, **gradient descent with back-propagation** is the most widely used method for training a neural network with supervised learning.
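To make that concrete, here's a minimal sketch of the combination: back-propagation computes the gradients layer by layer via the chain rule, and gradient descent takes a step against them. The toy problem (XOR), the network size, the learning rate, and the iteration count are all my own illustrative choices, not anything standard.

```python
import numpy as np

# Tiny one-hidden-layer network, mean-squared error, plain gradient descent.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])            # XOR targets

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer
lr = 0.3                                          # illustrative learning rate

def forward(X):
    h = np.tanh(X @ W1 + b1)      # hidden activations
    return h, h @ W2 + b2         # linear output

losses = []
for _ in range(3000):
    h, out = forward(X)
    err = out - y
    losses.append(float((err ** 2).mean()))
    # Back-propagation: apply the chain rule, output layer first.
    g_out = 2 * err / len(X)                 # d(loss)/d(out)
    gW2 = h.T @ g_out; gb2 = g_out.sum(0)
    g_h = (g_out @ W2.T) * (1 - h ** 2)      # tanh'(x) = 1 - tanh(x)^2
    gW1 = X.T @ g_h;  gb1 = g_h.sum(0)
    # Gradient descent step on every parameter.
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

With this seed the loss drops steadily; the point is only the mechanics (forward pass, chain rule backward, parameter update), not the hyper-parameters.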

More discussion material later!

Why? It's an awful method.

Can you please explain why you think it is awful?
(Apologies if you already did earlier in this thread, but I find it (this thread) especially difficult to follow.)

@Cuchullain, for me, gradient descent is a Swiss-army-knife method: it always produces results, but it can get stuck in local minima.
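Here's a quick sketch of the stuck-in-a-local-minimum behaviour on a made-up 1-D double-well cost (all constants are illustrative): the same algorithm, with the same learning rate, ends up in a different minimum depending purely on the starting point.

```python
# Toy double-well cost: global minimum near x ≈ -1.04,
# shallower local minimum near x ≈ +0.96 (constants chosen for illustration).
f  = lambda x: (x**2 - 1)**2 + 0.3 * x
df = lambda x: 4 * x * (x**2 - 1) + 0.3   # exact gradient

def gd(x, lr=0.05, steps=500):
    for _ in range(steps):
        x -= lr * df(x)        # plain gradient descent step
    return x

x_stuck = gd(0.5)    # starts in the right basin -> trapped in the local minimum
x_good  = gd(-0.5)   # starts in the left basin  -> reaches the global minimum
```

Both runs converge; only the second one converges to the answer you actually wanted, which is exactly the "always produces results, but..." point.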

Local minima? If you're lucky, that's the least of your worries. GD has a whole lot of issues; off the top of my head:

0. Inside GD lurks a nasty explicit Euler method (a GD step is a forward Euler step on the gradient-flow ODE).

1. The initial guess must be close to the true solution (Analyse Numérique 101).

2. No guarantee that GD is applicable in the first place (it assumes the cost function is smooth).

3. The "vanishing gradient problem":

https://en.wikipedia.org/wiki/Vanishing ... nt_problem

4. The learning rate parameter... so many to choose from (an ad hoc, trial-and-error process).

5. You need the Armijo and Wolfe line-search conditions to improve convergence.

6. You have to modify the algorithm by adding momentum.

7. And you have to compute the gradient somehow: 1) exactly, 2) by finite differences (FDM), 3) by automatic differentiation (AD), or 4) by the complex-step method.

8. Convergence, at best, to a local minimum.

9. The method is iterative, so there is no true, reliable quality of service (QoS).

10. It's not very robust (cf. adversarial examples). Try regularization.

There might be some more.
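On point 6, a sketch of what "adding momentum" buys you, using a badly-scaled quadratic as a stand-in cost (the function, learning rate, and beta are my own toy choices): the velocity term accumulates along consistent gradient directions, so heavy-ball momentum gets much closer to the optimum than plain GD in the same number of steps.

```python
import numpy as np

# Badly-scaled toy cost f(x, y) = 0.5*(x^2 + 50*y^2); its gradient:
grad = lambda p: np.array([p[0], 50.0 * p[1]])

def run(lr, beta, steps=200):
    p = np.array([5.0, 1.0])   # illustrative starting point
    v = np.zeros(2)
    for _ in range(steps):
        v = beta * v + grad(p)   # classical (heavy-ball) momentum
        p = p - lr * v
    return p

p_plain = run(lr=0.02, beta=0.0)   # plain GD (beta=0 reduces to it)
p_mom   = run(lr=0.02, beta=0.9)   # same budget, with momentum
```

The minimum is at the origin, so comparing the final distances from zero shows the speed-up directly.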
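And on point 7, the complex-step method is worth a sketch because it's less well known than FDM: for an analytic f, f'(x) ≈ Im(f(x + ih))/h with no subtractive cancellation, so h can be tiny and the estimate is accurate to machine precision. The test function here is my own example.

```python
import numpy as np

def complex_step(f, x, h=1e-20):
    # No subtraction of nearly-equal quantities, unlike finite differences.
    return (f(x + 1j * h)).imag / h

f = lambda x: np.exp(x) * np.sin(x)
# Exact derivative for comparison: e^x (sin x + cos x).
exact = lambda x: np.exp(x) * (np.sin(x) + np.cos(x))

d_cs = complex_step(f, 1.0)

# Forward finite difference (option 2 in point 7), limited by cancellation:
h_fd = 1e-8
d_fd = (f(1.0 + h_fd) - f(1.0)) / h_fd
```

The complex-step result matches the exact derivative to roughly machine precision, while the forward difference is stuck around 1e-7 accuracy no matter how you tune h.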

// Maybe I'm hallucinating, but I thought I already posted this (but it was before my first coffee).