I still think it comes down to the fact that on-the-fly displacement needs more computation: the more extreme the displacement, the more computation is required, and the more pronounced the effect. Here is an example using IBL and a 0.1 cm displacement height, with the meshes repositioned to sit next to one another:

Is it readily apparent where the on-the-fly shape ends and the pretessellated one begins? Increase the displacement height, though, and the trade-off changes: pretessellation costs you once, up front (compute and memory), while on-the-fly displacement costs you on an ongoing basis, and that cost grows with displacement height.
Just a side note on your material: with pretessellated subdivision set to 32, as it was in the material, this scene peaked at around 8 GB on my machine. The render above used a subdivision of 8, and you might be able to get by with less, depending on the scenario.
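To give a rough sense of why the subdivision setting has such a large effect on memory, here is a minimal sketch. It assumes, hypothetically, that a subdivision level of N splits each base quad into N × N micro-quads. I haven't confirmed that this is exactly how the renderer defines the setting, and the helper names are made up for illustration. Under that model the micro-quad count grows with N², so dropping from 32 to 8 gives 16× fewer micro-quads of tessellated geometry. Total scene memory also includes textures and other data, so don't expect the overall footprint to shrink by the same factor.

```python
def micro_quads_per_face(subdiv: int) -> int:
    """Hypothetical model: subdivision level N splits one quad into N x N micro-quads."""
    return subdiv * subdiv


def reduction_factor(high: int, low: int) -> float:
    """How many times fewer micro-quads the lower setting yields under this model."""
    return micro_quads_per_face(high) / micro_quads_per_face(low)


if __name__ == "__main__":
    # Compare the material's original setting (32) with the one used for the render (8).
    print(micro_quads_per_face(32))  # 1024
    print(micro_quads_per_face(8))   # 64
    print(reduction_factor(32, 8))   # 16.0
```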