We introduce Dagster, an open-source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.
Data applications are graphs of functional computations that consume and produce data assets. Dagster models the semantics of these applications through a unified type system, a data dependency graph, a configuration system, a structured API for emitting events such as data quality tests and materializations, and high-quality developer tools built on those abstractions. The computations themselves can be written in whatever tools their builders already use (Spark jobs for data engineers, SQL statements for analysts, Python for data scientists) and can be deployed to arbitrary orchestration engines, such as Airflow, Dask, or Kubernetes-based execution.
The result is data systems that are more reliable, testable, and understandable, that build on the existing tools that already work, and that can be deployed to your own infrastructure.
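
To make the notion of a data application as a graph of typed, configurable computations more concrete, the following is a minimal sketch in plain Python. It deliberately does not use Dagster's API: `Step`, `run_graph`, the `type_check_passed` event name, and the example functions are all hypothetical names chosen for illustration. It only shows the general pattern of resolving data dependencies, applying per-step configuration, checking output types, and recording structured events.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    """One functional computation with a declared output type (hypothetical)."""
    name: str
    fn: Callable[..., Any]
    output_type: type
    # Maps a parameter name of `fn` to the upstream step that supplies it.
    deps: Dict[str, str] = field(default_factory=dict)


def run_graph(
    steps: List[Step], config: Dict[str, Dict[str, Any]]
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Run steps in dependency order, type-check outputs, and record events.

    Assumes the graph is acyclic; no cycle detection is performed.
    """
    by_name = {s.name: s for s in steps}
    results: Dict[str, Any] = {}
    events: List[Tuple[str, str]] = []

    def execute(name: str) -> Any:
        if name in results:  # each step runs at most once
            return results[name]
        step = by_name[name]
        # Resolve upstream outputs first, then merge in this step's config.
        inputs = {param: execute(up) for param, up in step.deps.items()}
        value = step.fn(**inputs, **config.get(name, {}))
        if not isinstance(value, step.output_type):
            raise TypeError(
                f"{name} returned {type(value).__name__}, "
                f"expected {step.output_type.__name__}"
            )
        events.append(("type_check_passed", name))
        results[name] = value
        return value

    for s in steps:
        execute(s.name)
    return results, events


def load_numbers(limit: int) -> list:
    return list(range(limit))


def total(numbers: list) -> int:
    return sum(numbers)


# Steps may be declared in any order; dependencies decide execution order.
steps = [
    Step("total", total, int, deps={"numbers": "load_numbers"}),
    Step("load_numbers", load_numbers, list),
]
results, events = run_graph(steps, config={"load_numbers": {"limit": 5}})
```

Here `total` is listed first but runs second, because its `numbers` input depends on the output of `load_numbers`, and the `limit` value comes from configuration rather than from the code of the step itself.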