Koalas

IO

From a Spark DataFrame

import databricks.koalas as ks

# df is an existing Spark DataFrame; df.to_koalas() is an equivalent alternative
kdf = ks.DataFrame(df)

Group by

Sort index after group by

kdf.groupby('COL').COL.mean().sort_index()
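
A minimal sketch of what sort_index() does to a group-by result. It uses plain pandas as a stand-in so it runs without a Spark session, on the assumption that Koalas mirrors the pandas API here. The data and the column names `key` and `x` are made up for illustration, and `sort=False` is only there to imitate a group-by result that comes back in no particular key order.

```python
import pandas as pd

# Hypothetical data; 'key' and 'x' are illustrative column names.
pdf = pd.DataFrame({
    "key": ["b", "a", "c", "a", "b"],
    "x": [1.0, 2.0, 3.0, 4.0, 7.0],
})

# sort=False keeps groups in order of first appearance, imitating an
# unordered group-by result.
unsorted_means = pdf.groupby("key", sort=False)["x"].mean()

# sort_index() reorders the result by group key.
sorted_means = unsorted_means.sort_index()

print(list(unsorted_means.index))
print(list(sorted_means.index))
```

With a Koalas DataFrame the same chain would be written as in the line above, with `kdf` in place of `pdf`.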

Plotting

kdf.plot.hist(bins=100, log=True)