How to Deal with Vanishing and Exploding Gradients in Deep Learning?

Vanishing and exploding gradients can be a major hurdle when training deep neural networks. Which techniques have worked best for you in mitigating them, especially in very deep architectures?
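For context, here is a minimal sketch (plain Python, no deep learning framework) of the two halves of the question: why gradients vanish through a deep chain of sigmoid layers, and one common mitigation for exploding gradients, global-norm gradient clipping. The function names here (`sigmoid_grad_bound`, `clip_by_norm`) are illustrative, not from any particular library.

```python
import math


def sigmoid_grad_bound(depth):
    # Backprop multiplies per-layer derivatives together; the sigmoid's
    # derivative is at most 0.25, so through `depth` sigmoid layers the
    # gradient magnitude is bounded by 0.25**depth, shrinking geometrically.
    return 0.25 ** depth


def clip_by_norm(grads, max_norm):
    # Global-norm gradient clipping: if the overall gradient norm exceeds
    # max_norm, rescale every component so the norm equals max_norm.
    # This caps update size without changing the gradient's direction.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads


print(sigmoid_grad_bound(10))   # ≈ 9.5e-07: already tiny at only 10 layers
print(clip_by_norm([3.0, 4.0], 1.0))  # norm 5.0 rescaled down to norm 1.0
```

The first function shows why depth alone can starve early layers of signal; the second is one reason clipping is a frequent first-line answer to gradient explosions.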