08-11-2015, 11:02 AM
I'm somebody who knew everything there was to know about neural networks back in the 90s. I've been there: AI researcher, worked at a natural-language startup for years, got a couple of patents, built conversational robots (running on pure reflex action and, in truth, only about as smart as clams) that major companies used for customer support, and so on.
To say that the art has significantly advanced is an understatement. If you had asked me only a few weeks ago whether it is even possible to train a neural network 30 levels deep, I'd have said no, and cited you the well-known "Vanishing Gradient Problem" that seemed to be an insurmountable obstacle back when I was doing my work. The only way I could imagine to get past it was to use genetic algorithms to evolve network weights, and that was going to take FOREVER in computer time to produce results.
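For anyone who hasn't run into it: the Vanishing Gradient Problem is just the chain rule compounding against you. Every sigmoid layer multiplies the error signal by a factor well below one on its way back down, so after 30 layers there's essentially nothing left to learn from. Here's a rough back-of-the-envelope sketch in Python (my own illustration with made-up "typical" numbers, not anybody's actual code):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
depth = 30
grad = 1.0                      # gradient magnitude arriving at the output layer
for layer in range(depth):
    w = rng.normal(0.0, 1.0)    # a typical weight on the backward path
    a = sigmoid(rng.normal())   # a typical unit activation
    grad *= w * a * (1.0 - a)   # chain rule; the sigmoid's derivative a*(1-a) never exceeds 0.25

print(abs(grad))                # vanishingly small - typically 1e-20 or far less after 30 layers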
So when I saw this I was more than a little bit astonished, and went immediately to catch up on the research.
Holy crap. They figured out a whole lot about training neural networks in the last 15 years. I should have expected that, but while I was working it seemed like one of the Classic Algorithms, like sorting, that don't change over time. But it manifestly isn't.
The autoencoder approach to training deep networks layer by layer completely bypasses the Vanishing Gradient Problem, aside from a little fine-tuning at the end. The Dropout method is by far the best approach I've ever reviewed for preventing overfitting without getting in the way of convergence - and it's effing simple. People have figured out productive and useful ways to train recurrent and nonlinear networks. And applying the deepest layers convolutionally is a new idea that saves a half-acre of computer time in training and backprop, and dramatically decreases overfitting and sensitivity to irrelevant crap at the input level (at the expense of some extra time spent on forward propagation).
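If anybody wants to see just how simple Dropout is, here's a minimal sketch in Python/numpy (my own illustration of the idea, not anybody's published code): during training you zero out a random fraction of the activations and scale up the survivors; at test time you just pass everything through.

import numpy as np

def dropout(activations, keep_prob=0.5, training=True, rng=None):
    # Inverted dropout: randomly silence units during training and rescale
    # the survivors so the expected activation stays the same; do nothing
    # at all at test time.
    if not training:
        return activations
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Typical use: h = dropout(h, keep_prob=0.5, training=True) after each hidden layer.

That's the whole trick.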
And, well, having an order of magnitude more memory doesn't hurt a bit. Neither do massively parallel GPUs optimized for matrix calculations (they speed up training by factors of a thousand, on DESKTOP machines!). And then there are pooling and softmax techniques that weren't part of the standard toolkit back when.
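For the curious, those two are only a few lines apiece. A rough numpy sketch (again my own illustration, not production code): softmax turns a layer's raw scores into a probability distribution over classes, and 2x2 max pooling keeps the strongest response in each little patch so the next layer sees a smaller, more tolerant picture.

import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, exponentiate, and normalize
    # so the outputs are positive and sum to one.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def max_pool_2x2(feature_map):
    # Non-overlapping 2x2 max pooling on a single 2-D feature map:
    # keep the largest value in each 2x2 patch, halving height and width.
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))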
We can now do things we could never do before.
More to the point, there is now the MEANS to try out ideas I had fifteen years ago that could, um, lead somewhere very interesting.
I won't claim to have solved strong AI until something sues me for ownership of the hardware it runs on (not bloody likely), but we live in interesting times and I've got a dozen-and-a-half things I want to try that it looks like nobody's tried yet. I think I can leverage these new capabilities in ways that people won't believe.
So I've spent the last week writing code and laughing uncontrollably. I might be who people mean when they say "Mad Scientist...."