Airflow has become the de facto data pipeline platform at many companies. It was designed to run static, slow-moving workflows on a fixed schedule, and it is a great tool for that purpose. However, users often get into trouble by forcing their use cases to fit Airflow’s model. Examples of use cases that Airflow cannot satisfy in a first-class way include:

- Complex DAGs that leak application code into the pipeline
- DAGs that need to run off-schedule or with no schedule at all
- DAGs that run concurrently with the same start time
- DAGs with complicated branching logic
- DAGs with many fast tasks
- DAGs that rely on the exchange of data between tasks
- Parametrized DAGs

In this talk, we present a brand-new serverless, event-driven pipeline platform written in Scala that addresses all of the problems above.