The vanishing and exploding gradient problem can be a major hurdle in deep learning. What techniques have worked best for you in mitigating these issues, especially in deep networks?
The vanishing and exploding gradient problem can be a major hurdle in deep learning. What techniques have worked best for you in mitigating these issues, especially in deep networks?