No, that's not a solution: such attacks cannot be caught by validating the input, because both the original and the perturbed image are perfectly valid inputs. Thus, another approach is to train the NN itself to detect scenes that contain potential adversarial inputs.
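To make the point concrete: an adversarial image produced by, say, the fast gradient sign method (FGSM, Goodfellow et al., 2014) stays entirely inside the valid pixel range, so no input-domain check can flag it. A minimal sketch, assuming a pretrained PyTorch classifier `model`, a batched image tensor `x` with values in [0, 1], and its true `label`; the function name and the `eps` value are illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, eps=0.03):
    """Return an adversarially perturbed copy of x (illustrative sketch)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Step in the direction that increases the loss, then clamp back into
    # [0, 1]: the perturbed image is, as far as the input domain is
    # concerned, indistinguishable from a legitimate input.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

The `clamp` at the end is exactly why detection at the input level fails: the output of the attack is, by construction, a valid image.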
The problem lies in the architecture itself: a feed-forward convolutional network is simply not enough. I'm hopeful about this approach and am waiting for the authors to test their network's robustness against adversarial attacks.