E is stochastic (because mini-batches). Does your argument still apply?
It would seem that mini-batches are mainly an optimisation and not a functional requirement ("what you lose on the swings you gain on the roundabouts" aka Pyrrhic victory).
Going to mini-batches (MB) entails a new dynamical system in an enlarged state space. The ODE system will then probably become a random ODE driven by white noise, or even some kind of Langevin/Ornstein-Uhlenbeck (OU) SDE. At that point you are in the area of numerical solution of such equations.
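To make the deterministic-versus-noisy distinction concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the toy quadratic loss E(w) = mean_i 0.5*(w - x_i)^2, the function names, the step size and the batch size. The full-batch iteration is an explicit Euler step of the gradient flow dw/dt = -grad E(w); the mini-batch iteration adds a mean-zero perturbation at each step, so it loosely resembles an Euler-Maruyama step of an OU-type SDE rather than of the original ODE.

```python
import numpy as np

def full_gradient(w, data):
    # Toy loss (assumed for illustration): E(w) = mean_i 0.5 * (w - x_i)^2,
    # so grad E(w) = w - mean(x).
    return w - data.mean()

def minibatch_gradient(w, data, batch_size, rng):
    # Same formula on a random subset: an unbiased but noisy estimate
    # of the full gradient.
    batch = rng.choice(data, size=batch_size, replace=False)
    return w - batch.mean()

def run(data, lr, steps, batch_size=None, seed=0):
    # batch_size=None -> full-batch gradient descent (explicit Euler on the ODE).
    # batch_size=int  -> mini-batch SGD (drift plus mean-zero noise per step).
    rng = np.random.default_rng(seed)
    w = 0.0
    path = np.empty(steps)
    for k in range(steps):
        if batch_size is None:
            g = full_gradient(w, data)
        else:
            g = minibatch_gradient(w, data, batch_size, rng)
        w -= lr * g
        path[k] = w
    return path

data = np.random.default_rng(1).normal(2.0, 1.0, 1000)
gd = run(data, lr=0.1, steps=2000)
sgd = run(data, lr=0.1, steps=2000, batch_size=10)
print("full batch, tail spread:", gd[500:].std())
print("mini-batch, tail spread:", sgd[500:].std())
```

In this toy setting the full-batch path settles on the minimiser, while the mini-batch path keeps fluctuating around it with a spread that depends on the step size and batch size. That is the sense in which the state is no longer described by the ODE alone.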
I understand the SGD 'plumbing' but I don't understand the problem it purports to solve. It has yet to be mathematically specified, IMO. I see SGD as an example of gradient descent in a specific context. Maybe I am missing some vital info.