On the Code of Data Science

Abstract: Data science calls for rapid experimentation and for building intuitions from the data. Yet data science also underpins crucial decisions and operational logic. Writing production-ready, robust statistical analysis without cognitive overhead may seem like a conundrum. I will explore simple, and less simple, practices for fast turnaround and consolidation of data-science code. I will discuss how these considerations led to the design of scikit-learn, which enables easy machine learning yet is used in production. Finally, I will mention some scikit-learn gems, new or forgotten.