Databricks functions misc
The substr function requires both a start position and a length.
The left / right functions expect the length as a column, not a plain number, so pass F.lit(length) as the parameter.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
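df = spark.createDataFrame([('abc', 1), ('efg', 2)], ['name', 'value'])
# Column.substr takes a 1-based start position and a length
df = df.withColumn('prefix1', F.col('name').substr(1, 2))
# F.left needs the length as a Column, hence F.lit(2)
df = df.withColumn('prefix2', F.left('name', F.lit(2)))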
df.show()
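To make the indexing explicit, here is a small plain-Python sketch of what the calls above compute for each string. It is only a model, not Spark code: the helper names substr_model, left_model and right_model are made up for illustration, and the sketch assumes a positive 1-based start and a non-negative length.

```python
def substr_model(s: str, start: int, length: int) -> str:
    """Model of substr(start, length) for a positive, 1-based start."""
    if start < 1 or length < 0:
        raise ValueError("this sketch only covers start >= 1 and length >= 0")
    begin = start - 1  # convert the 1-based position to a 0-based index
    return s[begin:begin + length]


def left_model(s: str, n: int) -> str:
    """Model of left(s, n): the first n characters."""
    return s[:n] if n > 0 else ""


def right_model(s: str, n: int) -> str:
    """Model of right(s, n): the last n characters."""
    return s[-n:] if n > 0 else ""


for name in ["abc", "efg"]:
    print(name, substr_model(name, 1, 2), left_model(name, 2), right_model(name, 2))
```

For 'abc' both substr(1, 2) and left(..., 2) give 'ab', which is why prefix1 and prefix2 hold the same values in the DataFrame above.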
collect_list / collect_set
Group by a key and aggregate the rows into a list (duplicates kept) or a set (duplicates removed). Similar to LISTAGG in Oracle.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", "11", "11"),
     ("a", "33", "33"),
     ("a", "33", "33")],
    schema=["id", "v1", "v2"]
)
df.groupby('id').agg(F.collect_list('v1'), F.collect_set('v2')).show()
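The difference between the two aggregates can be sketched in plain Python with the same three rows. This is a model of the semantics only, not Spark code, and the helper name collect_by_id is made up for illustration. In Spark the order of elements in the collected array is not guaranteed after a shuffle, so the sketch should not be read as a promise about ordering.

```python
from collections import defaultdict

rows = [("a", "11", "11"), ("a", "33", "33"), ("a", "33", "33")]


def collect_by_id(rows):
    """Group rows by id; keep v1 values in a list and v2 values in a set."""
    lists = defaultdict(list)
    sets = defaultdict(set)
    for key, v1, v2 in rows:
        lists[key].append(v1)  # list keeps duplicates
        sets[key].add(v2)      # set drops duplicates
    return {key: (lists[key], sets[key]) for key in lists}


result = collect_by_id(rows)
v1_list, v2_set = result["a"]
print(v1_list)         # ['11', '33', '33']
print(sorted(v2_set))  # ['11', '33']
```

The list column keeps both '33' entries, while the set column keeps only one.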