When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?