In the 1980s and 1990s people were using sigmoids as activation functions for neural networks, but they didn't work too well for deep networks.
One of the reasons is the so-called "vanishing gradient problem". When training a NN you typically run gradient descent on a cost function of the network's output error. The derivative of the sigmoid quickly goes to zero as you move away from zero in either direction (it peaks at 0.25 and decays exponentially in the tails). By the chain rule, the gradient reaching a given layer is a product of one such derivative factor per layer above it, so in deep networks (lots of layers of neurons, one layer connected to the next) the gradient effectively goes to zero for a lot of neurons, and that hinders learning. One of the key drivers of the deep learning explosion is that people decided to replace the sigmoid with the ReLU function, max(0, x), which is identical to the payoff function of a zero-strike call option! The ReLU has a simple, efficiently computable gradient (just 0 or 1), and it also promotes sparse encodings (situations where some neurons give a signal and the others are exactly zero).
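Here is a minimal NumPy sketch of the chain-rule effect described above. It is not a real network: for simplicity it just multiplies the per-layer derivative factors through a 30-layer stack, assuming standard-normal pre-activations reused at every layer (both assumptions are mine, purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # never exceeds 0.25

def relu_grad(x):
    # ReLU(x) = max(0, x), the zero-strike call payoff; its gradient is 0 or 1
    return (x > 0).astype(float)

depth = 30
x = rng.normal(size=1000)         # toy pre-activations (assumed, for illustration)
sig_factor = np.ones_like(x)
relu_factor = np.ones_like(x)
for _ in range(depth):
    # each layer contributes one multiplicative factor to the backpropagated gradient
    sig_factor *= sigmoid_grad(x)
    relu_factor *= relu_grad(x)

print("mean |gradient factor| after 30 layers:")
print("  sigmoid:", np.mean(np.abs(sig_factor)))   # essentially zero (< 1e-18)
print("  ReLU   :", np.mean(np.abs(relu_factor)))  # stays 0 or 1; mean ~ fraction of active neurons
```

The ReLU output also shows the sparsity point: roughly half of the toy neurons are exactly zero, while the rest pass their gradient through undamped.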
The sigmoid is also being ditched as an activation function for neurons that need a bounded output; people use tanh for that nowadays. The reason for the switch is that the sigmoid is non-symmetric: its output lies in (0, 1) and is always positive, so it is not centered at zero. tanh, which is just a rescaled sigmoid (tanh(x) = 2·sigmoid(2x) − 1), maps to (-1, 1) symmetrically around zero, and this symmetry reduces bias in the gradients during training.
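A quick check of the zero-centering claim, again with standard-normal inputs as an assumed stand-in for whatever a real layer would see:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Sigmoid outputs live in (0, 1): always positive, so everything fed to
# the next layer shares the same sign.
print("sigmoid output mean:", sigmoid(x).mean())  # ~0.5, not zero-centered

# tanh outputs live in (-1, 1) and are antisymmetric: tanh(-z) = -tanh(z).
print("tanh output mean   :", np.tanh(x).mean())  # ~0.0, zero-centered
```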
There is a nice discussion here starting at 14:00.