Create a Checkpoint in Databricks / Spark Manually
In Databricks, a checkpoint is created automatically when writing to a Delta table, at the interval set by the `delta.checkpointInterval` table property (every 10 commits by default).
However, there is no built-in command to run a checkpoint manually.
The OPTIMIZE command only compacts data files, and the VACUUM command only deletes old, unreferenced files. Neither creates a checkpoint.
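For reference, the automatic interval can be tuned through that table property. A minimal sketch, assuming `table_name` is a placeholder for an existing Delta table:
%scala
// Assumption: table_name is a placeholder for your Delta table.
// Lower the automatic checkpoint interval from the default 10 commits to 5.
spark.sql("ALTER TABLE table_name SET TBLPROPERTIES ('delta.checkpointInterval' = '5')")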
Solution
Use the `DeltaLog` class to run its `checkpoint()` method, but this class is only reachable through Delta Lake's internal Scala API.
In a Synapse or Fabric notebook, simply run:
%%spark
import org.apache.spark.sql.delta.DeltaLog
// load the DeltaLog for the table and write a checkpoint of its current state
DeltaLog.forTable(spark, "Tables/table_name").checkpoint()
In Databricks, the class is NOT directly accessible (Databricks Runtime ships its own build of Delta Lake, which keeps it in a different, internal package), so you cannot run:
%scala
import org.apache.spark.sql.delta.DeltaLog
DeltaLog.forTable(spark, "Tables/table_name").checkpoint()
The workaround is to reach the internal objects through reflection:
%scala
import io.delta.tables._
// refer to the Delta table by its path
val deltaTable = DeltaTable.forPath(spark, "/mnt/mount_name/Tables/table_name")
// get the internal `deltaLog` object via reflection, since its class cannot be imported
val deltaLog = deltaTable.getClass.getMethod("deltaLog").invoke(deltaTable)
// run the `checkpoint` method on `deltaLog`, again via reflection
val checkpoint = deltaLog.getClass.getMethod("checkpoint").invoke(deltaLog)
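If you need this regularly, the reflection calls can be wrapped in a small helper. A minimal sketch; `forceCheckpoint` is a hypothetical name, and it assumes the no-argument `checkpoint` method used above:
%scala
import io.delta.tables.DeltaTable

// hypothetical helper: force a checkpoint for the Delta table at `path`
def forceCheckpoint(path: String): Unit = {
  val deltaTable = DeltaTable.forPath(spark, path)
  // fetch the internal deltaLog object via reflection
  val deltaLog = deltaTable.getClass.getMethod("deltaLog").invoke(deltaTable)
  // invoke its no-argument checkpoint() method, also via reflection
  deltaLog.getClass.getMethod("checkpoint").invoke(deltaLog)
}

forceCheckpoint("/mnt/mount_name/Tables/table_name")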