I recently started using Luigi to build our internal ETL/BI from the ground up.
It took a lot more code than I expected to actually get things up and running. Half of that is just due to the messy nature of the data (different time formats, Unix vs. Windows CSVs, etc.), but the other half was because things change often in the early stages of prototyping. Because of that, clearing out previous Redshift loads or changing schemas has not been easy. I started writing some custom code to auto-migrate and whatnot, but ditched it, since it seems unnecessary once the schema is locked down.
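To give a feel for the "messy data" half, here is a minimal sketch of the kind of normalization helpers this tends to involve, using only the Python standard library. The helper names (`read_rows`, `parse_timestamp`) and the list of timestamp formats are hypothetical, not taken from our actual pipeline, and treating purely numeric values as Unix epoch seconds in UTC is an assumption.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical set of timestamp formats seen across sources; adjust per feed.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M", "%d-%b-%Y")


def read_rows(raw: bytes) -> list:
    """Parse CSV bytes from either Unix (\\n) or Windows (\\r\\n) sources."""
    # utf-8-sig drops a leading byte-order mark, which Windows tools often add.
    text = raw.decode("utf-8-sig")
    # newline="" leaves line endings untouched; csv.reader accepts both styles.
    return list(csv.reader(io.StringIO(text, newline="")))


def parse_timestamp(value: str) -> datetime:
    """Try each known format in turn, then fall back to Unix epoch seconds."""
    value = value.strip()
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    # Assumption: anything numeric is epoch seconds in UTC; returned as a naive
    # datetime so it compares cleanly with the strptime results above.
    # Non-numeric leftovers raise ValueError here, surfacing bad rows early.
    return datetime.fromtimestamp(float(value), tz=timezone.utc).replace(tzinfo=None)
```

Keeping this logic in small, pure functions makes it easy to call from inside a task's `run()` method and to unit-test without touching Redshift.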
All in all, I had a good experience with Luigi. It’s really easy to add new data sources, but chaining everything together was a little clunky. I briefly tried Airbnb’s Airflow, but adding a custom data source wasn’t as easy there.