What is Machine Learning?
Machine learning is a field of study concerned with building algorithms and methods that let computers learn from data and make predictions or take actions without being explicitly programmed to do so. It has attracted a great deal of attention in recent years because computing power has grown and large datasets have become widely available. Machine learning is used in many fields, including healthcare, banking, marketing, and self-driving cars.
One of the main strengths of machine learning is that it can use historical data to find patterns and make predictions. In image recognition, for example, a large set of labelled pictures can be used to train machine learning algorithms to identify objects or detect particular features. In the same way, machine learning methods are used in natural language processing to help computers understand and generate human language. This makes applications such as chatbots and language translation possible.
In general, there are three main types of machine learning techniques: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning: Supervised learning is a type of machine learning in which the model learns from labelled training data, so that each training example has both input features and a known output label. The goal is to learn a mapping function that can correctly predict labels for new, previously unseen data. Supervised learning can be divided further into two subgroups:
Regression: Regression methods are used when the target variable is continuous. The idea is to learn a function that predicts a continuous numeric output. Linear regression is a well-known regression method that fits a straight line to the data so that the gap between the predicted values and the actual values is as small as possible.
Classification: Classification methods are used when the target variable falls into a set of predefined categories or classes. The goal is to learn a function that assigns new data to one of the classes. Logistic regression is a common classification method that models the probability of belonging to each class.
Unsupervised learning: Unsupervised learning is a type of machine learning in which the model learns from unlabelled data, which means that the output labels are unknown. The goal is to find hidden patterns, structures, or relationships in the data. Unsupervised learning can be divided further into two groups:
Clustering: Clustering algorithms group together data points that are similar based on their features. The idea is to find natural groupings in the data without knowing the classes ahead of time. k-means, hierarchical clustering, and DBSCAN are all examples of clustering methods.
Dimensionality Reduction: Dimensionality reduction techniques reduce the number of input features while keeping the important information. This is especially helpful when working with high-dimensional data or when trying to visualise data in a lower-dimensional space. Common dimensionality reduction methods are Principal Component Analysis (PCA) and t-SNE (t-distributed Stochastic Neighbour Embedding).
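To make the idea of grouping similar points concrete, here is a minimal sketch of k-means in plain Python. The data points, the function name kmeans, and the choice to start from the first k points are assumptions made for illustration; real implementations usually pick the starting centroids more carefully.

```python
import math

def kmeans(points, k, max_iters=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption: use the first k points as starting centroids (simple, deterministic).
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

No class labels are supplied anywhere: the groups emerge only from the distances between points.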
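As a rough illustration of PCA, the sketch below reduces a small 2-D dataset to one dimension. It finds the first principal component with power iteration on the covariance matrix, which is a simplification chosen here to avoid external libraries; the data and the function name pca_first_component are hypothetical.

```python
import math

def pca_first_component(points, iters=100):
    """Return (component, projections) for the direction of greatest variance."""
    n, dim = len(points), len(points[0])
    # Centre the data so each feature has zero mean.
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - means[j] for j in range(dim)] for p in points]
    # Sample covariance matrix of the features.
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and normalise.
    # Assumes the data is not constant, so the norm is never zero.
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centred point onto the component: 2 features become 1.
    projections = [sum(row[j] * v[j] for j in range(dim)) for row in centered]
    return v, projections

# Hypothetical data in which the two features rise and fall together.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
component, projections = pca_first_component(data)
print(component)
print(projections)
```

Because the two features are strongly correlated, a single projected value per point keeps most of the variation in the original data.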
Reinforcement learning: In reinforcement learning, an agent learns how to interact with its environment to maximise its cumulative reward over time. The agent takes actions in the environment, receives feedback in the form of rewards or penalties, and then adjusts its policy or strategy accordingly. Reinforcement learning is often used when an agent needs to learn by trial and error. It can be used in robotics, game playing, and autonomous systems.
The agent's goal is to find a policy that maximises its long-term cumulative reward. It explores the environment to work out which actions are best and strikes a balance between exploration and exploitation to get the best results. Deep reinforcement learning, which uses neural networks, is one example of a reinforcement learning method.
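The loop of acting, receiving a reward, and adjusting the policy can be sketched with tabular Q-learning, a classic reinforcement learning method that is simpler than the deep variant mentioned above. Everything in this example is an assumption made for illustration: a tiny corridor of five states with a reward at the right-hand end, an epsilon-greedy rule for balancing exploration and exploitation, and the particular learning rate and discount factor.

```python
import random

N_STATES = 5         # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)   # action 0 moves left, action 1 moves right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: usually exploit the best known action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))            # exploration
    best = max(q[state])
    candidates = [a for a, v in enumerate(q[state]) if v == best]
    return rng.choice(candidates)                     # exploitation, ties broken at random

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]         # value estimate per state and action
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = q_learning()
# The learned policy: the higher-valued action in each non-terminal state.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It discovers this from the rewards alone, which is the trial-and-error behaviour described above.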
Popular Machine Learning Algorithms:
Linear Regression: Linear regression is a popular supervised learning method used to predict continuous values. It assumes that the input features and the target variable are linearly related. The goal is to find the line that best fits the data and minimises the gap between the predicted and actual values. Linear regression is widely used in the social sciences, economics, and business.
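For a single input feature, the best-fitting line can be computed directly with the ordinary least squares formulas. The following is a minimal sketch; the data values and the function names fit_line and mean_squared_error are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predicted and actual values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data that roughly follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(mean_squared_error(xs, ys, slope, intercept))
```

Any other line through the same data would give a larger mean squared error, which is what "best fit" means here.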
Logistic Regression: Logistic regression is a popular classification method in supervised learning. It is used when the target variable is categorical, most often binary. Unlike linear regression, logistic regression models the probability that an instance belongs to a certain class. It uses a logistic function (also called a "sigmoid function") to map the output to a probability between 0 and 1. If the probability is above a chosen threshold, which is usually 0.5, the instance is assigned to one class. If it is below that threshold, it is assigned to the other class.
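The sigmoid mapping and the threshold rule can be sketched in a few lines. This example trains a one-feature model with plain gradient descent, which is only one of several ways to fit logistic regression. The data, learning rate, number of epochs, and function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    return sigmoid(w * x + b)

def classify(x, w, b, threshold=0.5):
    """Class 1 if the probability reaches the threshold, otherwise class 0."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Hypothetical one-feature data: negative values are class 0, positive values class 1.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
predictions = [classify(x, w, b) for x in xs]
print(predictions)
print(predict_proba(2.0, w, b))
```

Raising the threshold above 0.5 makes the model more reluctant to predict class 1, which can be useful when false positives are costly.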
Support Vector Machine (SVM): Support Vector Machines (SVMs) are powerful supervised learning algorithms that can be used for both classification and regression tasks. An SVM tries to find the hyperplane that separates the data points of different classes with the widest possible margin. In binary classification, this is the hyperplane that maximises the distance to the closest points of each class. These points are called "support vectors." SVMs can handle both linearly separable data and data that is not linearly separable. They handle the latter by using kernel functions to map the input features into higher-dimensional spaces.
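A linear SVM can be approximated with a short training loop. The sketch below minimises a regularised hinge loss with sub-gradient descent, which is a simplification: production SVM solvers typically use specialised optimisers, and no kernel is used here. The data, hyperparameters, and function names are hypothetical.

```python
def train_linear_svm(points, labels, lr=0.1, reg=0.01, epochs=200):
    """Fit a hyperplane w.x + b = 0; labels must be +1 or -1."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # The regularisation term pulls w towards zero, which favours a wide margin.
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Only points inside the margin (or misclassified) affect the update.
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Which side of the hyperplane the point falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical linearly separable 2-D data.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(w, b)
print([predict(p, w, b) for p in points])
```

Points far from the boundary have a margin above 1 and contribute nothing to the update, so the final hyperplane is shaped by the points closest to it. This mirrors the role of the support vectors.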
Decision Trees: Decision trees are supervised learning methods that can be used for both classification and regression tasks. They have a flowchart-like structure, where the internal nodes are tests on the input features, the branches are the possible outcomes of those tests, and the leaf nodes are the predicted class or value. Decision trees recursively split the data on feature conditions, producing groups that are increasingly homogeneous with respect to the target variable. Splitting continues until a stopping criterion is reached, such as a maximum depth or a minimum number of samples per leaf. Decision trees are easy to interpret and can handle both categorical and numerical data.
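The core of tree building is choosing a single split. The sketch below searches one numeric feature for the threshold that produces the most homogeneous groups. It measures homogeneity with Gini impurity, a common choice that is assumed here rather than taken from the text above; the data and function names are also hypothetical. A full tree would apply the same search recursively to each resulting group until a stopping criterion is met.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split of one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weight each side's impurity by the share of samples it holds.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical feature values with their class labels.
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(values, labels)
print(threshold, impurity)
```

The resulting test, "is the feature value at most the threshold?", becomes an internal node, and the two groups it produces become its branches.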
Links to learn more: