Airflow install

Install Airflow via PyPI

A typical way is to specify both the version and a constraints file:

pip install "apache-airflow[celery]==2.7.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.8.txt"
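
The constraints file must match the Python version you run (3.8 in the URL above). A small shell sketch of the pattern from the official install guide, deriving the constraints URL from the current interpreter:

AIRFLOW_VERSION=2.7.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow[celery]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"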


Why you need a constraints file with pip install

Airflow keeps its dependency requirements open (not pinned), so a constraints file supplies the exact dependency versions that are known to produce a stable installation.

Otherwise a plain pip install apache-airflow may pick up the latest version of each dependency, which might not work with the Airflow release you are installing.

A constraints file also gives you the exact same installation of Airflow + providers + dependencies every time, i.e. a reproducible installation.


After the initial reproducible installation, you can typically add other dependencies and providers with separate commands. This way you can upgrade or downgrade the dependencies as you see fit, without being limited by the constraints file.

Just make sure the subsequent pip install command pins apache-airflow to the version you have already installed, so pip does not accidentally upgrade or downgrade Airflow itself:

pip install "apache-airflow==2.7.1" apache-airflow-providers-google==10.1.0
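
Later, to upgrade just the google provider while keeping Airflow at 2.7.1, the same pattern applies (10.2.0 below is a hypothetical newer version):

pip install "apache-airflow==2.7.1" "apache-airflow-providers-google==10.2.0"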


What is a provider package?

It is a separately versioned extension library that integrates Airflow with an external system or service (a data source, cloud platform, database, and so on).

The apache-airflow-providers-google package specifically extends Apache Airflow with integrations for Google Cloud Platform (GCP) services. It includes pre-built operators, sensors, and hooks for interacting with GCP services such as Cloud Storage, BigQuery, Pub/Sub, Dataprep, and more.

These components simplify the process of creating Airflow workflows that involve GCP services, as you can use them as building blocks in your DAGs.
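
As a minimal sketch of what that looks like, assuming the google provider is installed and a GCP connection is configured in Airflow (the DAG id and bucket name below are made up):

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

with DAG(
    dag_id="gcs_demo",                 # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule=None,                     # trigger manually only
    catchup=False,
) as dag:
    # Pre-built operator from the google provider: creates a GCS
    # bucket without any hand-written Google API calls.
    create_bucket = GCSCreateBucketOperator(
        task_id="create_bucket",
        bucket_name="my-demo-bucket",  # hypothetical bucket name
    )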


The default constraints file already covers heaps of provider packages / connectors for GCP, Azure, Snowflake, Alibaba Cloud, etc., so their versions are pinned consistently too.


Set up the database

Apache Airflow requires a database. For study you can stick with the default SQLite option; for production it is better to use a proper database. Currently supported: PostgreSQL, MySQL, MS SQL Server, and SQLite.


Airflow uses SQLAlchemy to connect to the database, which requires you to configure the database URL. You can do this via the sql_alchemy_conn option in the [database] section of airflow.cfg. It is also common to configure this option with the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN environment variable.
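
For example, for PostgreSQL (a sketch; the user, password, host, and database name are placeholders):

export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow"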


E.g. for MS SQL, create a database and user as below:

CREATE DATABASE airflow;
ALTER DATABASE airflow SET READ_COMMITTED_SNAPSHOT ON;
CREATE LOGIN airflow_user WITH PASSWORD='airflow_pass123%';
USE airflow;
CREATE USER airflow_user FROM LOGIN airflow_user;
GRANT ALL PRIVILEGES ON DATABASE::airflow TO airflow_user;


The SQLAlchemy connection string would then be of the form:

mssql+pyodbc://<user>:<password>@<host>[:port]/<db>?[driver=<driver>]

You will need to have the ODBC driver installed first (e.g. ODBC Driver 18 for SQL Server), along with the pyodbc Python package:

mssql+pyodbc://<user>:<password>@<host>[:port]/<db>[?driver=ODBC+Driver+18+for+SQL+Server]
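
A concrete (hypothetical) example for the user created above; note that special characters in the password, such as %, must be URL-encoded (% becomes %25):

mssql+pyodbc://airflow_user:airflow_pass123%25@localhost:1433/airflow?driver=ODBC+Driver+18+for+SQL+Server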


MySQL and PostgreSQL each have their own installation steps and connection string format; refer to the official set-up guide.
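
For reference, their connection strings follow the same shape (the drivers shown are the commonly documented defaults):

mysql+mysqldb://<user>:<password>@<host>[:port]/<db>
postgresql+psycopg2://<user>:<password>@<host>[:port]/<db>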


After configuring the database and connecting to it in the Airflow configuration, you should create the database schema:

airflow db migrate
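
(On Airflow versions before 2.7, the equivalent command was airflow db init, which is now deprecated in favour of db migrate.) You can verify that Airflow can reach the database with:

airflow db check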