Databricks functions misc

Column.substr requires both a start position and a length. The start position is 1-based.

F.left and F.right (available from PySpark 3.5) expect the length as a Column, not a plain Python int, so pass it as F.lit(length).


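from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('abc', 1), ('efg', 2)], ['name', 'value'])

# substr(startPos, length): start at position 1 (1-based), take 2 characters -> 'ab', 'ef'
df = df.withColumn('prefix1', F.col('name').substr(1, 2))

# left(str, len): the length is wrapped in F.lit so it is passed as a Column -> 'ab', 'ef'
df = df.withColumn('prefix2', F.left('name', F.lit(2)))

df.show()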
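
To sanity-check expected values without a cluster, the per-row behaviour can be modelled in plain Python. This is only a sketch: the helper names substr_1based, left_n and right_n are hypothetical (they are not part of PySpark), and it assumes non-null strings and a positive start position.

```python
def substr_1based(s: str, start_pos: int, length: int) -> str:
    """Plain-Python model of Column.substr(start_pos, length) for start_pos >= 1."""
    if start_pos < 1:
        # Assumption: this sketch does not model zero or negative positions.
        raise ValueError("this sketch only models positive start positions")
    if length <= 0:
        return ""
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return s[begin:begin + length]


def left_n(s: str, n: int) -> str:
    """Model of left(str, len): the first n characters."""
    return s[:n] if n > 0 else ""


def right_n(s: str, n: int) -> str:
    """Model of right(str, len): the last n characters."""
    # s[-0:] would return the whole string, so n <= 0 is handled explicitly.
    return s[-n:] if n > 0 else ""


names = ["abc", "efg"]
prefixes = [substr_1based(name, 1, 2) for name in names]
lefts = [left_n(name, 2) for name in names]
rights = [right_n(name, 2) for name in names]
print(prefixes, lefts, rights)
```

The main thing the model makes visible is the off-by-one: position 1 in substr is the first character, whereas a Python slice would start at index 0.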


collect_list / collect_set

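Group rows and aggregate a column's values into a list (collect_list, duplicates kept) or a set (collect_set, duplicates removed). Similar to LISTAGG in Oracle, except that the result is an array rather than a delimited string, and the order of the elements is not guaranteed.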

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [('a', '11', '11'),
     ('a', '33', '33'),
     ('a', '33', '33')],
    schema=['id', 'v1', 'v2']
)

# collect_list keeps the duplicate '33'; collect_set returns each distinct value once
df.groupby('id').agg(F.collect_list('v1'), F.collect_set('v2')).show()
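
The difference between the two aggregates, and the step needed to get a LISTAGG-style string, can be sketched in plain Python on the same rows. This is an illustrative model, not Spark code: the variable names are hypothetical, and the values are sorted before joining only to make the result deterministic.

```python
from collections import defaultdict

rows = [("a", "11", "11"), ("a", "33", "33"), ("a", "33", "33")]

v1_lists = defaultdict(list)  # models collect_list('v1'): duplicates kept
v2_sets = defaultdict(set)    # models collect_set('v2'): duplicates removed

for row_id, v1, v2 in rows:
    v1_lists[row_id].append(v1)
    v2_sets[row_id].add(v2)

# LISTAGG-like result: join each group's collected values into one string.
listagg_like = {key: ",".join(sorted(values)) for key, values in v1_lists.items()}

print(dict(v1_lists))
print(dict(v2_sets))
print(listagg_like)
```

In Spark itself the same delimited string can be built by wrapping the aggregate, for example F.concat_ws(',', F.collect_list('v1')). Sorting the array first (for example with F.sort_array) is one way to make the element order predictable.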