Posts

Showing posts from May, 2022

Machine Learning - Classification problem

A data classification problem worked through with the scikit-learn library. The experiment is split across two posts/notebooks because the text, code and outputs are extensive. To complement it, I plan to propose a real classification prediction scenario for this problem, and a machine-learning-based minimum viable product for data analysis, in the near future. This first post covers data analysis; the second one (yet to come) covers selecting the model for prediction.

Data Analysis

Right after loading the data, you have to analyse it so that you can clean, merge and transform it into a machine-learning-friendly form. Check every variable to get a good idea of what kind of data the dataset contains. Numerical or categorical? Continuous or discrete? Once the data is in the right layout, you have to analyse other aspects, such as its dimensionality, correlations and repeated rows, and apply further transformations so that it works well with your models. So, let's jump into it.
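As a rough illustration of the checks described above, here is a minimal sketch using pandas. The toy DataFrame, its column names and the `summarize` helper are all hypothetical and are not taken from the notebook; they only show one possible way to look at variable types, dimensionality, repeated rows and correlations before modelling.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25, 32, 47, 32, 51],
    "income": [30000.0, 42000.0, 58000.0, 42000.0, 61000.0],
    "city": ["Lisbon", "Porto", "Lisbon", "Porto", "Faro"],
    "label": [0, 1, 1, 1, 0],
})


def summarize(frame):
    """Collect a few first-pass facts about a DataFrame (hypothetical helper)."""
    # Split columns into numerical and non-numerical (treated as categorical).
    numeric_cols = frame.select_dtypes(include="number").columns.tolist()
    categorical_cols = frame.select_dtypes(exclude="number").columns.tolist()
    return {
        # Dimensionality: (rows, columns).
        "shape": frame.shape,
        "numeric": numeric_cols,
        "categorical": categorical_cols,
        # Few distinct values in a numeric column hints at a discrete variable.
        "n_unique": frame.nunique().to_dict(),
        # Rows that exactly repeat an earlier row.
        "n_duplicate_rows": int(frame.duplicated().sum()),
        # Pairwise Pearson correlations between the numeric columns.
        "correlations": frame[numeric_cols].corr(),
    }


summary = summarize(df)
print(summary["shape"])
print(summary["numeric"], summary["categorical"])
print(summary["n_duplicate_rows"])
print(summary["correlations"])
```

On a real dataset you would likely follow this with `df.drop_duplicates()` and an encoding step for the categorical columns, but which transformations make sense depends on the data and on the models you plan to try.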