A Backbone for Long-Horizon Robot Task Understanding