and sigmoid + SGD :-/

Attachment: sigmoid_sgd.png

I forgot to ask:

1. This approximation looks VERY bad. Why is that?

2. Why is sigmoid bad for such a benign function?

I am having mixed results here. The properties of the target function are important: e.g. [$]e^{-x}[$] is approximated well, while [$]e^x[$] is awful.
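For what it's worth, one property that may separate the two cases: any finite sum of sigmoids is bounded, while [$]e^x[$] is not, so the fit degrades quickly as soon as you leave the training window. A minimal sketch of that effect, using my own setup (random sigmoid features fitted by least squares rather than the original SGD experiment, so the weights and ranges here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat = 30
w = rng.uniform(-3.0, 3.0, n_feat)   # hypothetical random feature weights
b = rng.uniform(-3.0, 3.0, n_feat)   # hypothetical random feature biases

def features(x):
    """Random sigmoid features sigma(w*x + b), plus a constant column."""
    s = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + b)))
    return np.hstack([s, np.ones((len(x), 1))])

x_in = np.linspace(0.0, 3.0, 200)    # training window
x_out = np.linspace(3.0, 4.0, 50)    # just outside it

errs = {}
for name, fn in [("exp(-x)", lambda t: np.exp(-t)), ("exp(+x)", np.exp)]:
    # least-squares fit of the output weights (deterministic, unlike SGD)
    a, *_ = np.linalg.lstsq(features(x_in), fn(x_in), rcond=1e-8)
    errs[name] = np.max(np.abs(features(x_out) @ a - fn(x_out)))
    print(f"{name}: max error just outside the fit window = {errs[name]:.3g}")
```

The bounded, decaying target stays well-behaved outside the window, while the exponentially growing one runs away from any sigmoid combination.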

@ISayMoo: selecting a 'good' activation function is an open question in the research community. As a numerical analyst, I find them a bit simplistic. They are non-adaptive and cannot approximate a given function over its whole (global) domain.

Classic high-order polynomials are almost useless (everyone in ML seems to be using them as recipes, why?), while 3rd-order piecewise polynomials are much better approximations in general. You can vary the *control points*. But I reckon you know that already.
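The polynomial-vs-piecewise point is easy to see on Runge's classic example [$]1/(1+25x^2)[$]. A quick illustration of my own (not from the thread): a single degree-14 interpolating polynomial on equispaced nodes versus a piecewise cubic Hermite on the same nodes, using the known derivative for the Hermite slopes:

```python
import numpy as np

f  = lambda x: 1.0 / (1.0 + 25.0 * x**2)            # Runge's function
df = lambda x: -50.0 * x / (1.0 + 25.0 * x**2)**2   # its derivative

xs = np.linspace(-1.0, 1.0, 401)       # evaluation grid
nodes = np.linspace(-1.0, 1.0, 15)     # 15 equispaced nodes

# (a) one global degree-14 polynomial through all 15 nodes
coeffs = np.polyfit(nodes, f(nodes), 14)
err_poly = np.max(np.abs(np.polyval(coeffs, xs) - f(xs)))

# (b) piecewise cubic Hermite on the same 15 nodes
def hermite_eval(x, nodes, y, dy):
    """Evaluate the piecewise cubic Hermite interpolant at points x."""
    i = np.clip(np.searchsorted(nodes, x) - 1, 0, len(nodes) - 2)
    h = nodes[i + 1] - nodes[i]
    t = (x - nodes[i]) / h
    h00 = (1 + 2 * t) * (1 - t)**2      # standard Hermite basis
    h10 = t * (1 - t)**2
    h01 = t**2 * (3 - 2 * t)
    h11 = t**2 * (t - 1)
    return h00 * y[i] + h10 * h * dy[i] + h01 * y[i + 1] + h11 * h * dy[i + 1]

err_cubic = np.max(np.abs(hermite_eval(xs, nodes, f(nodes), df(nodes)) - f(xs)))
print(f"degree-14 polynomial max error: {err_poly:.3f}")
print(f"piecewise cubic max error:      {err_cubic:.4f}")
```

The global polynomial oscillates wildly near the endpoints (the Runge phenomenon), while the piecewise cubic's error stays small everywhere; the gap is orders of magnitude.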

Maybe some parts of ML algorithms do not need the fundamental laws of mathematics, and they converge with enough experimentation and ad-hocness.