5 Steps in Machine Learning Model Development.
Machine Learning (ML) is a subfield of Artificial Intelligence (AI). There are five generic steps in developing an ML model.
Step 1 : Data Preparation — Data Collection and Preprocessing
Data is at the heart of an ML model, and data preparation is the key step in building one. Data preparation consists of two processes: data collection and preprocessing. Data collection involves acquiring real-time data through a standard protocol or obtaining data from standard repositories. Two widely used sources of data for ML models are the University of California, Irvine (UCI) Machine Learning Repository and Kaggle. Data preprocessing is the next sub-step, which involves standardizing the data and removing missing or inappropriate values, if any.
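The two preprocessing operations mentioned above, removing missing values and standardizing the data, can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names `drop_missing` and `standardize` and the toy data are assumptions made for this example, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean, pstdev

def drop_missing(rows):
    """Keep only the rows in which no value is None (assumed missing-value marker)."""
    return [row for row in rows if all(v is not None for v in row)]

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has standard deviation 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Toy data: the second row has a missing value and is dropped.
raw = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [5.0, 50.0]]
clean = drop_missing(raw)
scaled = standardize(clean)
```

Dropping rows is the simplest option; depending on the dataset, filling in missing values may be preferable to discarding data.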
Step 2 : Data Sampling — Training and Testing Data Split, Data Augmentation and Data Annotation.
Data sampling is the next key step and prepares the data for training the ML model. The data are split into a training set and a testing set according to a specified ratio, or into folds for cross-validation.
The common choices are :
- 80:20 split, with 80% of the data used for training and 20% for testing.
- 70:30 split, with 70% of the data used for training and 30% for testing.
- 5-fold cross-validation, where the data are divided into 5 folds and each fold serves once as the test set while the other 4 are used for training.
- 10-fold cross-validation, which works the same way with 10 folds.
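As a rough sketch of how these splits work, the following plain-Python example performs a ratio-based split and generates cross-validation folds. The helper names `train_test_split` and `k_fold_indices` and the fixed random seed are assumptions chosen for this illustration, not part of any particular library.

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a test_ratio share for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs; every index is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

data = list(range(100))
train, test = train_test_split(data, test_ratio=0.2)   # 80:20 split
folds = list(k_fold_indices(len(data), k=5))            # 5-fold cross-validation
```

Passing `test_ratio=0.3` gives the 70:30 split, and `k=10` gives 10-fold cross-validation. In practice the indices are usually shuffled before the folds are formed.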
Data augmentation is a common way to increase the size of the dataset, which often helps improve the accuracy of the ML model. Data annotation is the process of labelling the data, which is required when supervised machine learning models are used.
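Augmentation techniques depend heavily on the kind of data. As one hedged illustration for numeric features, the sketch below creates extra samples by adding small Gaussian noise to each original row while keeping its label. The function name `augment_with_noise` and the parameters `copies` and `sigma` are assumptions made for this example; whether noise of this kind is appropriate has to be judged per dataset.

```python
import random

def augment_with_noise(features, labels, copies=2, sigma=0.05, seed=0):
    """Return the original samples plus `copies` noisy variants of each one."""
    rng = random.Random(seed)
    aug_x, aug_y = list(features), list(labels)
    for row, label in zip(features, labels):
        for _ in range(copies):
            # Jitter every feature slightly; the label is assumed to stay valid.
            aug_x.append([v + rng.gauss(0.0, sigma) for v in row])
            aug_y.append(label)
    return aug_x, aug_y

X = [[0.1, 0.9], [0.8, 0.2]]
y = ["cat", "dog"]
X_aug, y_aug = augment_with_noise(X, y, copies=2)
```

Augmented samples should be generated from the training set only, so that the test set still reflects unmodified data.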
Step 3 : Feature Engineering
The data are of different types :
- Structured Data
- Unstructured Data
Structured data mainly consists of either features with a label or features without a label.
Feature engineering is the process of identifying the most useful features or reducing the dimensionality of the features. Feature selection is the technique of choosing the features that are most predictive of the label. Dimensionality reduction is the process of reducing the number of feature dimensions to remove redundancy in the data.
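One simple way to select top features is to rank them by how strongly they correlate with the label. The sketch below does this with the Pearson correlation coefficient in plain Python. It is only one of many possible selection criteria, and the names `pearson` and `select_top_features` as well as the toy data are assumptions for this illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant sequence has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_features(rows, labels, k):
    """Return the column indices of the k features most correlated with the label."""
    columns = list(zip(*rows))
    scores = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

rows = [
    [1.0, 7.0, 0.2],
    [2.0, 3.0, 0.4],
    [3.0, 9.0, 0.6],
    [4.0, 1.0, 0.8],
]
labels = [2.0, 4.0, 6.0, 8.0]
top = select_top_features(rows, labels, k=2)  # indices of the two best columns
```

Note that the two selected columns here carry the same information as each other. Removing that kind of redundancy is the job of dimensionality reduction rather than of this filter.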
Step 4 : Fitting the Model
The most prominent step is fitting the ML model to the data. There are many ML algorithms, and they are mainly classified into three types :
- Supervised Learning : The data consists of features with a label. Supervised learning algorithms include Linear Regression, Logistic Regression, Naive Bayes, K-Nearest Neighbors, etc.
- Unsupervised Learning : The data consists of features only. Unsupervised learning algorithms include Principal Component Analysis (PCA), K-Means Clustering, etc.
- Reinforcement Learning : The model learns through rewards and punishments.
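To make "fitting" concrete, here is a minimal sketch of the simplest supervised case: linear regression with a single feature, fitted by ordinary least squares in plain Python. The function names `fit_linear_regression` and `predict` and the toy data are assumptions for this illustration; real projects would normally rely on an ML library instead.

```python
def fit_linear_regression(xs, ys):
    """Return (slope, intercept) that minimize the squared error of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx            # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new feature value."""
    slope, intercept = model
    return slope * x + intercept

# Toy training data generated from y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
model = fit_linear_regression(train_x, train_y)
prediction = predict(model, 5.0)
```

Fitting here means estimating the two parameters from the training features and labels; the fitted model can then be applied to values it has not seen.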
Step 5 : Model Testing and Evaluation
Model testing and evaluation is the final step, carried out after model training is complete. Model testing is done on the test data. Common evaluation metrics for classification models are accuracy, precision, recall and F1 score.
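The four metrics can be computed directly from the true and predicted labels. The sketch below does so for a binary classification problem; the function name `classification_metrics`, the choice of `1` as the positive class, and the example labels are assumptions for this illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Labels of the test data and the model's predictions for them.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Precision is the share of predicted positives that are truly positive, recall is the share of true positives that the model found, and the F1 score is the harmonic mean of the two.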
Conclusion
The five steps described here are the generic steps of ML model development. The most time-consuming and vital step is data collection.
Happy Learning !