Ran into this: https://arxiv.org/abs/1804.01508
Independent implementation
https://github.com/222464/TsetlinMachine
Gauss-Newton, conjugate gradient, L-BFGS, Newton-Powell... Which one should be used? You might be right. But it does sound somewhat ad hoc. I'd suspect that the stochastic part of SGD lets it be quite robust to pathological gradients.
One thing: there are many GDs to choose from (like a Swiss Army knife?)
https://arxiv.org/abs/1609.04747
Which one should be used?
That was, well, rude. I thought I was the one asking the questions. Does it?
It was my Dutch/Irish directness. It's rude to answer a question with another question. Try it next time you give a talk to an audience.
All gradient-based. With all the usual caveats. In particular, isn't the gradient of the activation function an issue? E.g. take the Heaviside function as a counter-example.
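To make the Heaviside point concrete, here is a rough sketch (my own toy setup, not taken from either paper): a single weight w, one made-up data point (x, y), squared-error loss, and a finite-difference gradient. The names `loss_step`, `loss_sigmoid` and `numeric_grad` are just illustrative. With a step activation the loss is flat around w, so any gradient-based update has nothing to work with; with a sigmoid the gradient is nonzero.

```python
import math

def heaviside(z):
    # Step activation: derivative is 0 everywhere except at z = 0
    return 1.0 if z >= 0.0 else 0.0

def sigmoid(z):
    # Smooth activation for comparison
    return 1.0 / (1.0 + math.exp(-z))

def numeric_grad(f, w, eps=1e-6):
    # Central finite-difference estimate of df/dw
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

# Hypothetical toy data point: input x, target y
x, y = 2.0, 1.0

def loss_step(w):
    return (heaviside(w * x) - y) ** 2

def loss_sigmoid(w):
    return (sigmoid(w * x) - y) ** 2

w0 = -0.5  # starting weight, on the "wrong" side of the step
grad_step = numeric_grad(loss_step, w0)
grad_sigmoid = numeric_grad(loss_sigmoid, w0)

print("step gradient:   ", grad_step)     # flat loss, no signal
print("sigmoid gradient:", grad_sigmoid)  # negative, so w gets pushed up
```

So whichever flavour of gradient descent you pick, the update w -= lr * grad leaves w unchanged in the step case. This only illustrates the caveat; it says nothing about which optimizer is preferable.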
https://stuffjewishpeoplelike.wordpress ... -question/
It was my Dutch/Irish directness. It's rude to answer a question with another question. That was, well, rude. I thought I was the one asking the questions.
Ask katastrofa? But I like that aspect of their culture. Are Talmudic Jews rude?
you think they're the same person?
"When did you last see ML and DE together?"