When does gradient descent with logistic loss find interpolating two-layer networks?