Chaotic order code base
by Ty Myrddin
Published on March 11, 2022
As the importance of data science has grown, so too has the body of jargon associated with it. While many terms of art are well-defined, others are buzzwords, ubiquitous in the media but lacking concrete meaning.
Data science is also not linear. Some steps do need to come before others: initial exploration before sensible wrangling, and wrangling before preprocessing and/or feature engineering. After that it gets chaotic. For example, some models do not work well with simple label encoding, and good feature engineering takes creativity grounded in an understanding of the domain and of how features are correlated.
This is where the puzzling fun really starts.
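To make the label encoding point concrete, here is a minimal sketch in plain Python. The helper names `label_encode` and `one_hot_encode` and the colour data are made up for illustration; in practice a library encoder would probably do this job. Label encoding turns categories into integers, which implies an order (blue < green < red) that linear or distance-based models may treat as meaningful. One-hot encoding gives each category its own 0/1 column instead.

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[value] for value in values], categories


def one_hot_encode(values):
    """Map each value to a 0/1 indicator row, one column per category."""
    codes, categories = label_encode(values)
    rows = [
        [1 if code == column else 0 for column in range(len(categories))]
        for code in codes
    ]
    return rows, categories


# Made-up example data (an assumption, not from any real dataset).
colours = ["red", "green", "blue", "green"]

labels, categories = label_encode(colours)
one_hot, _ = one_hot_encode(colours)

print(categories)  # ['blue', 'green', 'red']
print(labels)      # [2, 1, 0, 1]
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Which encoding suits a given model is part of the puzzle: tree-based models often cope with integer labels, while models that rely on distances or linear weights usually do better with one-hot columns.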
It does make sense to first create a skeleton and set up "steps" for practising some commonly used techniques:
- Data wrangling
- Machine learning
- Data visualisation
- Data analysis
- Deep learning
- Natural language processing
We develop reusable snippets using Jupyter notebooks and public datasets, gradually building towards functions and classes for (other) projects later.
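As a rough illustration of that path from snippet to function, the sketch below wraps a typical notebook one-off (counting missing values per column) in a small reusable function. The name `count_missing`, its `missing` parameter, and the sample rows are hypothetical and not taken from the code base.

```python
from collections import Counter


def count_missing(rows, missing=(None, "")):
    """Count missing values per column in a list of dict rows.

    Only the markers listed in `missing` are counted; NaN floats are
    not handled in this sketch.
    """
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value in missing:
                counts[column] += 1
    return dict(counts)


# Made-up rows standing in for a public dataset.
rows = [
    {"name": "a", "age": 31},
    {"name": "", "age": None},
    {"name": "c", "age": None},
]

missing_per_column = count_missing(rows)
print(missing_per_column)  # {'name': 1, 'age': 2}
```

Once a snippet like this has proven itself in a notebook, it can move into a module and be imported by later projects.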