How to fine-tune a machine learning algorithm?
Fine-tuning refers to a technique in machine learning whose goal is to find the optimal hyperparameters of a model. Fine-tuning helps increase model performance and accuracy. Naturally, fine-tuning is performed on the training data and evaluated on validation or test data. Usually, before fine-tuning an algorithm, it is important to try several algorithms and select the best one; fine-tuning comes at the end of the training phase.
Note that fine-tuning also refers to an approach in transfer learning: training a neural network starting from the parameters of another, already trained network. This is done by initializing the new network with the weights of an existing model, usually one trained on a problem in the same domain.
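As a rough illustration of the transfer-learning sense, here is a minimal Keras sketch, assuming a hypothetical image task with five target classes; the choice of MobileNetV2, the input shape, and the layer sizes are illustrative, not a prescribed recipe.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Base network initialized with parameters learned on ImageNet (a related domain).
base = keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                      input_shape=(160, 160, 3))
base.trainable = False  # first train only the new head; later, unfreeze layers to fine-tune

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),  # new output layer for the hypothetical 5-class problem
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```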
Fine-tuning is the last step in the training phase, as it comes after trying multiple machine learning algorithms and selecting the best one. It is not a mandatory phase, since it is possible to build a machine learning model without fine-tuning it. However, if the goal is to increase accuracy, fine-tuning is the most effective way.
Fine-tuning can also be called hyperparameter optimization, and there are multiple techniques to perform it. Manual search relies on the data scientist's experience to select promising hyperparameters and refine them toward the optimal values. For example, a data scientist can decide to reduce the batch size when training a neural network to help it converge faster. Manual search is not the most efficient technique, but it can be combined with the others. Random search creates a grid of hyperparameters and tries different random combinations of them; it is usually combined with cross-validation, so each combination of hyperparameters is evaluated across the folds of the dataset. Grid search sets up a grid of hyperparameters and trains the model on every possible combination; the values included in the grid are often narrowed down from a prior random search. Bayesian optimization is considered the strongest of these techniques, as it uses probabilities to focus the search on the most promising regions of the hyperparameter space.
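A minimal sketch of random search and grid search with scikit-learn follows; the random forest model, the parameter grid, and the toy dataset are hypothetical placeholders for a real project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Toy data standing in for a real training set.
X_train, y_train = make_classification(n_samples=500, n_features=10, random_state=42)

param_grid = {"n_estimators": [100, 200, 500], "max_depth": [5, 10, None]}

# Random search: tries a fixed number of random combinations, each scored with cross-validation.
random_search = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_grid,
                                   n_iter=5, cv=5, random_state=42)
random_search.fit(X_train, y_train)

# Grid search: trains on every combination, typically over a grid narrowed down
# by the earlier random search.
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
```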
So, fine-tuning is a set of techniques that can help improve performance. When it refers to hyperparameter tuning, it is used at the end of the training phase and can make the difference between a good model and a very good one. When it refers to transfer learning, it can help improve the performance of a deep neural network.
How to build a deep neural network architecture?
Today, deep learning is one of the most promising families of machine learning algorithms, especially for image recognition and unstructured data. A deep neural network is basically a neural network with more than two hidden layers. Hidden layers sit between the input layer and the output layer of a neural network, and their role is to learn features from the input. Increasing the number of hidden layers helps the neural network learn more complex features from the input data.
Building a state-of-the-art deep neural network first depends on the type of problem to be solved. An image classification problem does not require the same architecture as an anomaly detection or forecasting problem. In image classification, the most common layers are convolutional layers, as they are the most suitable for image inputs. In anomaly detection, it is preferable to use an encoder-decoder architecture, as the network deconstructs and reconstructs the input and flags any input that does not follow the general pattern.
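As a rough sketch of these two kinds of architecture in Keras, assuming hypothetical input shapes and layer sizes chosen purely for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical image classifier: convolutional layers suit image inputs.
cnn = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# Hypothetical anomaly detector: an encoder-decoder reconstructs the input,
# and a high reconstruction error flags an anomaly.
autoencoder = keras.Sequential([
    layers.Input(shape=(30,)),
    layers.Dense(8, activation="relu"),    # encoder compresses the input
    layers.Dense(30, activation="linear"), # decoder reconstructs it
])
```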
One of the most frequently asked questions about building a deep neural network is how deep the network should be at the beginning of training. The ideal approach is to start with the smallest architecture possible, meaning the fewest layers possible, and then increase the number of layers until the model reaches the best possible performance.
Another frequently asked question is how to select activation functions. An activation function plays a crucial role in a deep learning model, as it introduces nonlinearity and allows the network to approximate complex relationships. To keep it simple, the most popular activation function for hidden layers is ReLU; it is the one that shows the best results in most cases. For the output layer, the selection depends on the type of problem being solved. For example, for a classification problem, the softmax activation function should be used, as it converts the outputs into probabilities. It is also worth occasionally trying other activation functions (tanh, Leaky ReLU, etc.) to see whether they increase performance and accuracy.
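The following Keras sketch shows these defaults in practice; the input size of 20 features, the three output classes, and the helper function are hypothetical choices made only to keep the example small.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hidden_activation="relu"):
    """Small classifier with a configurable hidden-layer activation."""
    return keras.Sequential([
        layers.Input(shape=(20,)),
        layers.Dense(64, activation=hidden_activation),  # ReLU is the usual default
        layers.Dense(32, activation=hidden_activation),
        layers.Dense(3, activation="softmax"),           # softmax turns outputs into class probabilities
    ])

model = build_model()              # baseline with ReLU
model_tanh = build_model("tanh")   # alternative activation to compare against the baseline
```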
Also note that every type of deep neural network architecture has its specific use cases. For example, a generative adversarial network (GAN) is an architecture that can be used to generate data. It can also be used for anomaly detection, but it is not suited to image classification or most other tasks. So, consider the possible neural network architectures as a toolbox and select the most appropriate one for the problem you are trying to solve.
How to train a machine learning algorithm faster?
Sometimes, training a machine learning algorithm can take a long time: days, even weeks or more. This can be due to the amount of training data or to the type and size of the algorithm used; training time is obviously also affected by the capabilities of your computer or server, such as memory and processor.
To make training faster, there are different techniques, such as using a GPU instead of a CPU. This switch speeds up computation, as a GPU can handle far more operations in parallel. Note that not all algorithms and frameworks support GPU computing; the most popular ones that do are neural network frameworks and XGBoost.
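As a hedged sketch with XGBoost: in recent versions (2.x) GPU training is requested with the `device` parameter, while older versions used `tree_method="gpu_hist"`; the toy dataset below is only a placeholder, and running it requires a CUDA-capable GPU and a GPU-enabled XGBoost build.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Toy data standing in for a large training set.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# XGBoost 2.x: device="cuda" moves training to the GPU; older versions use tree_method="gpu_hist".
model = xgb.XGBClassifier(tree_method="hist", device="cuda", n_estimators=200)
model.fit(X, y)  # training runs on the GPU (requires a CUDA-capable GPU)
```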
Another technique for speeding up computation is parallel computing. This can be done in multiple ways: by parallelizing the data or by parallelizing the model. For example, data parallelism can be achieved by using a cluster of machines with the support of a framework like Spark MLlib.
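Here is a minimal Spark MLlib sketch of data-parallel training; the session runs locally and the tiny in-memory dataset, column names, and choice of logistic regression are hypothetical stand-ins for a real distributed dataset on a cluster.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Local session for illustration; on a real cluster the data would be partitioned
# across many executors and processed in parallel (data parallelism).
spark = SparkSession.builder.appName("parallel-training").getOrCreate()

# Tiny in-memory dataset standing in for a large distributed one.
df = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 0.0), (5.0, 6.0, 1.0), (6.0, 5.0, 1.0)],
    ["f1", "f2", "label"],
)
train = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

# Each partition of the data contributes to training in parallel.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
```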
Another option is to switch to an algorithm with lower complexity, as the complexity of an algorithm plays a crucial role in training time. For example, a support vector machine is considered a complex algorithm, which means that its training time can grow very fast with the size of the data. So, on a large dataset, it is advisable to select a less complex algorithm.
The last option is simply to sample the data. On a large dataset, it is possible to draw a stratified sample, which helps the sample keep the original ratios and characteristics of the dataset. A decade ago, sampling was the most popular technique for reducing computing time; nowadays, GPU usage and parallelization are the most popular.
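A short sketch of stratified sampling with scikit-learn; the imbalanced toy DataFrame and its column names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 90% of rows labeled 0, 10% labeled 1.
df = pd.DataFrame({"feature": range(1000), "label": [0] * 900 + [1] * 100})

# Keep only 10% of the rows while preserving the 90/10 class ratio (stratification).
sample, _ = train_test_split(df, train_size=0.1, stratify=df["label"], random_state=42)
print(sample["label"].value_counts(normalize=True))  # still roughly 90/10
```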
Why do we normalize the input data in a deep neural network?
Normalization of the input is one of the best practices in deep neural networks. In general, normalizing the data speeds up learning and helps the network converge faster.
The data also becomes more suitable for the activation function, especially the sigmoid function. Imagine that the inputs are on different scales (not normalized): the weights of some inputs will receive much larger updates than the others, which can hurt learning. In addition, centering the data around zero provides both positive and negative values as inputs for the next layer, which makes learning more flexible.
Note that other types of transformations can achieve the same result as normalizing the input of a deep neural network, such as standardization, linear scaling of the input data, and so on.
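A small sketch of both transformations with scikit-learn; the two features on very different scales are hypothetical, and in a real project the scaler would be fit on the training set only and then applied to the test set.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales (e.g., age in years, income in dollars).
X = np.array([[25, 40_000.0], [32, 120_000.0], [47, 60_000.0], [51, 95_000.0]])

# Normalization rescales each feature to the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X)

# Standardization rescales each feature to zero mean and unit variance,
# which also yields both positive and negative values.
X_std = StandardScaler().fit_transform(X)
print(X_norm)
print(X_std)
```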
When can we consider that we did a good job in a machine learning project?
In machine learning, it is always hard to evaluate whether a piece of work or a project has been done well. For a given problem, there are usually countless ways to obtain a good solution. A project can also be improved indefinitely, but you do not have an infinite amount of time to deliver it.
In general, the criteria for evaluating whether a job has been done well are that it is logical and follows best practices. Best practices means that the tools and techniques used have been approved by the community and are considered a standard in the industry.
When a data scientist delivers work, they should not be a perfectionist and should think like an engineer trying to solve a practical problem. As an engineer, the data scientist should be result-oriented, focusing on getting the best outcome in the shortest amount of time.
Usually, in a data science project, we apply a lean and agile style where the idea is to deliver a result fast and iterate to improve the work. This means that the data scientist has to update the work on a regular basis, improving it at each iteration.
Sometimes, discussing whether a job has been done well in machine learning can get confusing, because good accuracy does not necessarily mean that your work is good, and bad accuracy does not mean that your work is bad. This is directly related to the problem you are trying to solve: some problems are very hard, and it is almost impossible, due to the data, to reach good accuracy. This means that even if the accuracy is not strong, you may still have done great work.
So, when evaluating a machine learning job, the focus should be on the logic and reasoning behind the work rather than on the accuracy alone.
When should we use deep learning instead of the traditional machine learning models?
To understand when to use deep learning instead of traditional machine learning, it is important to understand the strengths of deep learning compared to traditional machine learning. Deep learning shows better results than traditional machine learning in image recognition, object detection, speech recognition, and natural language processing. This means that for any task involving unstructured data, it is better to use deep learning. This is because deep learning extracts features and patterns by itself, which makes it well adapted to unstructured data such as images.
To process an image with traditional machine learning, we would have to extract all the relevant features from the image before training, which is time-consuming and can be inaccurate. So, deep learning is preferred when it is hard to extract features from the data manually.
Deep learning also shows a strong advantage when we have a large amount of data, which is not the case for some traditional machine learning algorithms. With more data, deep learning keeps learning and improving its performance. So, when a large amount of data is available, deep learning is preferred over other machine learning techniques.
How much time does it take to become a good data scientist?
A good data scientist has a solid understanding of statistics, mathematics, computer science, and, of course, machine learning. A good data scientist is capable of tackling hard problems and finding an effective solution to many types of problems.
Becoming a good data scientist is a journey, as it takes continuously learning new techniques and updating your knowledge. It does not necessarily require a PhD, but it requires discipline and autonomy. It is also a matter of talent, since solving some problems requires good instincts.
Becoming a decent data scientist requires years of hard work, and becoming a good data scientist requires going a step beyond that.
How to evaluate the performance of a model?
Evaluating the performance of a model is one of the most important steps in a machine learning project, as it reveals whether the trained model is good enough to be deployed. To evaluate a model's performance, we use what is called a metric, either a visual metric or a mathematical one. These metrics are usually called performance metrics.
An evaluation metric is chosen based on the type of problem we are trying to solve: a classification problem, a regression problem, an unsupervised model, image recognition, and so on.
There are several types of evaluation metrics. Some of them are as follows:
- Classification problem: area under the curve (AUC), confusion matrix, accuracy, recall, precision, and F1-score.
- Regression problem: mean squared error, root mean squared error, mean absolute error, coefficient of determination (R-squared), and adjusted R-squared.
Each evaluation metric is unique and has its own strengths, so do not hesitate to use multiple evaluation metrics in the same project to evaluate the same machine learning model.
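A compact sketch of several of these metrics with scikit-learn; the label, prediction, and probability arrays are hypothetical toy values used only to make the calls runnable.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_absolute_error, mean_squared_error,
                             precision_score, r2_score, recall_score, roc_auc_score)

# Hypothetical classification results: true labels, predicted labels, predicted probabilities.
y_true, y_pred, y_score = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1], [0.2, 0.9, 0.4, 0.1, 0.8]
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(roc_auc_score(y_true, y_score))  # area under the ROC curve

# Hypothetical regression results.
y_reg, y_reg_pred = [3.0, 5.0, 2.5], [2.8, 5.4, 2.0]
mse = mean_squared_error(y_reg, y_reg_pred)
print(mse, mse ** 0.5,                       # MSE and RMSE
      mean_absolute_error(y_reg, y_reg_pred),
      r2_score(y_reg, y_reg_pred))           # coefficient of determination
```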
In case of a large dataset, should I sample my data or use distributed computing?
In the past, when a statistician or data analyst faced a large dataset, the most popular technique was to sample the data and then apply the machine learning algorithms to the sample. Nowadays, a newer technique has emerged called distributed computing, and more precisely data parallelism, which makes it possible to use all the data when training a machine learning model.
In terms of time, sampling the data is faster to set up than distributed computing. So, for a small project with limited time for delivery, it is more relevant to sample the dataset. When the project is a long one with a focus on performance, distributed computing is more relevant. Likewise, if we are applying a deep learning model, it is advisable to use distributed computing in order to take advantage of the complete dataset.
Distributed computing and data parallelism require strong knowledge of data engineering and computer science, so it is a practice that may take a beginner some time to set up.
How much time should I spend on data transformation?
Data transformation is a process performed by data scientists during the data preparation step, where the data can be transformed in various ways depending on its format, its type, and the purpose. The most popular data transformations are the natural logarithm, applied to a skewed continuous target variable to reduce the skew in the data, and one-hot encoding, used to transform categorical variables int...