Milliways

Chaotic order code base

by Ty Myrddin

Published on March 11, 2022

Data science chaos

As the importance of data science has grown, so too has the body of jargon associated with it. While many terms of art are well-defined, others are buzzwords, ubiquitous in the media but lacking concrete meaning.

Data science is also not linear. Some steps do need to come before others: initial exploration before sensible wrangling, and wrangling before preprocessing and/or feature engineering. After that it gets chaotic. For example, some models do not work well with simple label encoding, and feature engineering takes creativity based on understanding the domain and how features are correlated.

This is where the puzzling fun really starts.
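The remark about label encoding can be made concrete with a small sketch. It uses only the Python standard library, and the `colors` column and the helper names `label_encode`, `one_hot_encode` and `squared_distance` are made up for illustration. Label encoding turns categories into integers, which a linear or distance-based model may read as an order that is not in the data. One-hot encoding is one common alternative that avoids this.

```python
# Hypothetical toy column; the values are invented for illustration.
colors = ["red", "green", "blue", "green"]


def label_encode(values):
    """Map each category to an integer, numbered in sorted order of the names."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Map each category to a 0/1 vector containing a single 1."""
    categories = sorted(set(values))
    encoded = [[1 if v == cat else 0 for cat in categories] for v in values]
    return encoded, categories


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


labels, mapping = label_encode(colors)
one_hot, categories = one_hot_encode(colors)

# With label encoding, blue=0, green=1, red=2: red looks "further" from blue
# than green does, although the categories have no natural order.
gap_blue_green = abs(mapping["blue"] - mapping["green"])
gap_blue_red = abs(mapping["blue"] - mapping["red"])

# With one-hot encoding, every pair of distinct categories is equally far apart.
dist_red_blue = squared_distance(one_hot[0], one_hot[2])
dist_green_blue = squared_distance(one_hot[1], one_hot[2])
```

Tree-based models tend to be less sensitive to the artificial order. Which encoding to use remains a per-model, per-dataset decision.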

It does make sense to first create a skeleton and make "steps" to practice some commonly used techniques:

We develop reusable snippets using Jupyter notebooks and public datasets, to gradually build towards functions and classes for (other) projects later.
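As a rough idea of what such a skeleton could look like, here is a minimal sketch in plain Python. All of it is assumed for illustration: the step names `explore`, `wrangle` and `preprocess`, the `run_steps` helper, and the tiny `raw` dataset are hypothetical and do not come from an actual project. In a notebook the records would more likely live in a DataFrame.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]                     # one row of the dataset
Step = Callable[[List[Record]], List[Record]]  # a step takes rows, returns rows


def explore(rows: List[Record]) -> Dict[str, int]:
    """Count missing (None) values per column without changing the data."""
    missing: Dict[str, int] = {}
    for row in rows:
        for key, value in row.items():
            missing[key] = missing.get(key, 0) + (1 if value is None else 0)
    return missing


def wrangle(rows: List[Record]) -> List[Record]:
    """Drop records that contain any missing value."""
    return [row for row in rows if None not in row.values()]


def preprocess(rows: List[Record]) -> List[Record]:
    """Scale the hypothetical numeric column "age" to the range [0, 1]."""
    ages = [row["age"] for row in rows]
    low, high = min(ages), max(ages)
    span = (high - low) or 1  # avoid dividing by zero if all ages are equal
    return [{**row, "age": (row["age"] - low) / span} for row in rows]


def run_steps(rows: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        rows = step(rows)
    return rows


# Invented example data.
raw = [
    {"age": 20, "city": "Delft"},
    {"age": None, "city": "Leiden"},
    {"age": 40, "city": None},
    {"age": 30, "city": "Delft"},
]

missing = explore(raw)                          # look first
clean = run_steps(raw, [wrangle, preprocess])   # then wrangle, then preprocess
```

Keeping each step a small function with the same input and output shape makes it easy to reorder or swap steps once things get chaotic, and to move a step out of a notebook into a module later.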


Oh well. Last orders, please. Waiter