A point of comparison might be Mathematica's NDSolve approach. NDSolve uses the method of lines (MOL) with adaptive time stepping. I believe the default maximum t-step is [$]T/10[$], where [$]T[$] is the final solution time.

Consider a hypothetical parabolic PDE problem with a stationary solution, approached exponentially at large [$]t[$] with, let's say, a 'half-life' time of 50 (but you don't know this in advance).

You give this parabolic problem to NDSolve with [$]T=1000[$]. The solver takes, let's say, 50 time steps to do the whole computation. There will be many t-steps with, say, [$]t < 10[$], and the last several, sure enough, will use [$]\Delta t = 100[$], the maximum allowed. I have seen this pattern many times with NDSolve.

For your scheme, in effect you have a cutoff at [$]T \approx 1/\epsilon[$]. I suspect you would generally also have to adopt adaptive time stepping (in your case, [$]\tau[$]-stepping) to get any sort of decent run-time performance. If so, in the end, the pattern of the [$]t_i[$]'s actually visited (under a good adaptive scheme) might be similar regardless of the time coordinate change: many at the start and few at the end.

In other words, my point is that, in the end, if you adopt adaptive time-stepping to get decent performance, the choice of the time coordinate may not matter much. Just speculating.

I did a few (diverse) tests related to optimisation via ODE gradient systems, using the complex-step method (I have a post on the yugely popular [$]e^5[$] thread).

[$]dx/dt = -\nabla f(x)[$], where [$]f[$] is the function to be minimised.
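As a minimal sketch of this setup (my own Python version, not the actual odeint code): the gradient is computed with the complex-step formula [$]\partial_k f \approx \operatorname{Im} f(x + ih\,e_k)/h[$] and the gradient system is handed to an adaptive integrator. The quadratic test function and starting point are made up for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(x):
    # toy objective (assumed for illustration): minimiser at (1, -0.5)
    return (x[0] - 1.0)**2 + 2.0 * (x[1] + 0.5)**2

def grad_cs(f, x, h=1e-200):
    # complex-step gradient: Im f(x + i*h*e_k)/h -- no subtraction,
    # so no cancellation error even for absurdly small h
    g = np.empty(len(x))
    for k in range(len(x)):
        xc = np.asarray(x, dtype=complex).copy()
        xc[k] += 1j * h
        g[k] = f(xc).imag / h
    return g

# gradient flow dx/dt = -grad f(x), integrated adaptively
sol = solve_ivp(lambda t, x: -grad_cs(f, x), (0.0, 20.0), [4.0, -3.0],
                rtol=1e-10, atol=1e-12)
x_final = sol.y[:, -1]
```

By [$]t = 20[$] the flow has settled onto the minimiser [$](1, -0.5)[$] to integrator accuracy.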

1. MLE examples with normal and exponential distributions. The transformed ODE takes between 2 and 5 times fewer function calls than the untransformed ODE for the same accuracy. The half-life [$]\tau = 1/2[$] seems like a good heuristic.
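To make item 1 concrete, here is a minimal untransformed version of the exponential-distribution case as a gradient system (the data are made up; the MLE is [$]\hat\lambda = n/\sum x_i = 1/\bar{x}[$]):

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical i.i.d. Exp(lambda) sample
data = np.array([0.5, 1.2, 2.0, 0.8, 1.5])
n, S = len(data), data.sum()

def negloglik(lam):
    # negative log-likelihood of Exp(lambda): -n*log(lam) + lam*sum(x)
    return -n * np.log(lam) + lam * S

def dneg(lam, h=1e-100):
    # complex-step derivative: Im f(lam + i*h)/h
    return negloglik(lam + 1j * h).imag / h

# gradient flow d(lam)/dt = -d/d(lam) of the negative log-likelihood
sol = solve_ivp(lambda t, lam: [-dneg(lam[0])], (0.0, 20.0), [0.1],
                rtol=1e-10, atol=1e-12)
lam_final = sol.y[0, -1]
```

The flow settles on the stationary point [$]\hat\lambda = n/S[$] of the likelihood, i.e. the MLE.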

2. Other multidimensional, unimodal functions give results similar to those in 1.

In general, convergence does not depend strongly on the initial value (guess). I am using C++ Boost odeint with adaptive time stepping, probably not unlike the NDSolve approach. Using odeint with [$]\tau = 0.9999[$] takes forever (probably a yuge boundary layer at [$]\tau = 1[$]..). On the other hand, since we use the complex-step method (no catastrophic subtractive cancellation) we get accuracy like [$]10^{-290}[$].
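I can illustrate that boundary layer with a guess at the transform (an assumption on my part): take [$]\tau = 1 - e^{-\epsilon t}[$], so the gradient system becomes [$]dx/d\tau = -\nabla f(x)/(\epsilon(1-\tau))[$]. For [$]f(x) = x^2/2[$] the exact solution is [$]x_0(1-\tau)^{1/\epsilon}[$]; if [$]\epsilon[$] overestimates the true decay rate (here [$]\epsilon = 2[$] vs. rate 1), the solution has a [$]\sqrt{1-\tau}[$]-type singularity at [$]\tau = 1[$] and the adaptive solver's step count blows up as the endpoint approaches 1:

```python
import numpy as np
from scipy.integrate import solve_ivp

eps = 2.0  # assumed transform rate in tau = 1 - exp(-eps*t)

def rhs(tau, x):
    # transformed gradient flow dx/dtau = -grad f(x)/(eps*(1-tau))
    # with f(x) = x^2/2; exact solution x0*(1-tau)**(1/eps)
    return -x / (eps * (1.0 - tau))

# count accepted steps as the right endpoint creeps toward tau = 1
n_steps = {}
for tau_end in (0.99, 0.9999):
    sol = solve_ivp(rhs, (0.0, tau_end), [1.0], rtol=1e-8, atol=1e-12)
    n_steps[tau_end] = len(sol.t)
```

The step count keeps growing as [$]\tau_{\text{end}} \to 1[$], which is consistent with the "takes forever at [$]\tau = 0.9999[$]" behaviour, whatever the exact transform in use.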

3. For multimodal functions, asymptotic stability is still visible, but convergence is to a local minimum, as with Ackley's function, for example. If you start near the global minimum, the solver will converge to it. Otherwise there is not enough energy to go back uphill; it cannot get out of a rut, as it were.
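For example, 2-D Ackley with the gradient flow started at (3, 2), an arbitrary point of my choosing well outside the global basin: the flow settles into a nearby local minimum rather than the global one at the origin.

```python
import numpy as np
from scipy.integrate import solve_ivp

def ackley(x):
    # 2-D Ackley function: global minimum f = 0 at the origin,
    # local minima near the nonzero integer lattice points
    s = np.sqrt((x[0]**2 + x[1]**2) / 2.0)
    c = (np.cos(2.0*np.pi*x[0]) + np.cos(2.0*np.pi*x[1])) / 2.0
    return -20.0*np.exp(-0.2*s) - np.exp(c) + 20.0 + np.e

def grad_cs(f, x, h=1e-100):
    # complex-step gradient, as in the earlier items
    g = np.empty(len(x))
    for k in range(len(x)):
        xc = np.asarray(x, dtype=complex).copy()
        xc[k] += 1j * h
        g[k] = f(xc).imag / h
    return g

# gradient flow from a start far from the origin: it gets trapped
# in the local basin near (3, 2) -- no energy to climb back out
sol = solve_ivp(lambda t, x: -grad_cs(ackley, x), (0.0, 50.0), [3.0, 2.0],
                rtol=1e-8, atol=1e-10)
x_final = sol.y[:, -1]
```

The final point sits at a stationary point well away from the origin with [$]f[$] far above the global minimum value of 0; a start inside the global basin converges to the origin instead, as noted above.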