Luigi is Spotify's popular open-source library for batch data processing that handles dependency resolution and monitoring. It is written entirely in Python and utilises some language magic that makes writing glue code swift and intuitive.
Every day, Spotify's backend services log terabytes of data for purposes ranging from debugging to reporting. The logs are essentially huge semi-structured text files that can be parsed with a few lines of Python. From this data, aggregated reports need to be created, data needs to be pushed into SQL databases for internal dashboards, related artists need to be calculated using complex algorithms, and many other tasks need to be performed using a wide range of programming languages and tools.
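To make "a few lines of Python" concrete, here is a minimal sketch of parsing semi-structured log lines and aggregating them into a small report. The log format (tab-separated timestamp, event type, user ID and artist ID) and the names `parse_line` and `plays_per_artist` are hypothetical illustrations, not Spotify's actual schema or code.

```python
from collections import Counter

# Hypothetical log format (an assumption for illustration only):
# tab-separated fields: timestamp, event type, user id, artist id
SAMPLE_LOG = """\
2014-07-21T10:00:01\tplay\tuser1\tartistA
2014-07-21T10:00:05\tplay\tuser2\tartistB
2014-07-21T10:00:09\tskip\tuser1\tartistA
malformed line
2014-07-21T10:01:12\tplay\tuser3\tartistA
"""


def parse_line(line):
    """Split one log line into a dict, or return None if it is malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None
    timestamp, event, user, artist = fields
    return {"timestamp": timestamp, "event": event, "user": user, "artist": artist}


def plays_per_artist(lines):
    """Count 'play' events per artist, silently skipping malformed lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["event"] == "play":
            counts[record["artist"]] += 1
    return counts


if __name__ == "__main__":
    report = plays_per_artist(SAMPLE_LOG.splitlines())
    # Print artists from most to least played, one per line.
    for artist, n in report.most_common():
        print(f"{artist}\t{n}")
```

In a Luigi pipeline, logic like this would typically live in a task's `run()` method, reading from the task's input and writing the aggregated report to its output target.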
Using a couple of real-world use cases, I will show how Luigi can tie many of these different tools and frameworks together across environments, and ultimately leverage the scientific tools available in the Python ecosystem to extract interesting results and insights.