We can measure the accuracy of our hypothesis function by using a cost function. It takes an average (actually a slightly fancier version of an average) of the differences between the hypothesis’s predictions on the inputs x and the actual outputs y.
$$J(\theta_0, \theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}_{i} - y_{i} \right)^2 = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x_{i}) - y_{i} \right)^2$$
To break it apart, it is $\frac{1}{2}\bar{x}$, where $\bar{x}$ is the mean of the squares of $h_\theta(x_{i}) - y_{i}$, that is, of the differences between the predicted value and the actual value.
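As a minimal sketch of the formula above, the cost can be computed in a few lines of Python. The linear form of the hypothesis (theta0 + theta1 * x), the function names, and the three data points are assumptions made for illustration; the text only names the parameters theta_0 and theta_1.

```python
def hypothesis(theta0, theta1, x):
    """Assumed linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(xs)
    squared_errors = [
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / (2 * m)


# Made-up training data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

perfect_fit = cost(0.0, 1.0, xs, ys)  # every prediction matches its target
worse_fit = cost(0.0, 0.5, xs, ys)    # predictions are half of each target
```

A lower value of J means the hypothesis is closer to the training outputs; here the perfect fit has zero cost and the flatter line has a positive one.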
This function is otherwise called the “Squared error function”, or “Mean squared error”. The mean is halved $\left(\frac{1}{2}\right)$ as a convenience for the computation of gradient descent: differentiating the square produces a factor of 2, which cancels the $\frac{1}{2}$ term. The following image summarizes what the cost function does:
To establish notation for future use, we’ll use $x^{(i)}$ to denote the “input” variables (living area in this example), also called input features, and $y^{(i)}$ to denote the “output” or target variable that we are trying to predict (price). A pair $(x^{(i)}, y^{(i)})$ is called a training example, and the dataset that we’ll be using to learn, a list of m training examples $\{(x^{(i)}, y^{(i)});\ i = 1, \dots, m\}$, is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.
To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:
When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.
Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.
We can derive this structure by clustering the data based on relationships among the variables in the data.
With unsupervised learning there is no feedback based on the prediction results.
Example:
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
Non-clustering: The “Cocktail Party Algorithm” allows you to find structure in a chaotic environment (for example, identifying individual voices and music from a mesh of sounds at a cocktail party).
In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
Supervised learning problems are categorized into “regression” and “classification” problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead making our output about whether the house “sells for more or less than the asking price.” Here we are classifying the houses based on price into two discrete categories.
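The switch from regression to classification described above comes down to changing the target. A small sketch, using made-up asking and sale prices and a hypothetical helper name; treating a sale at exactly the asking price as "not above" is also an assumption:

```python
# Hypothetical (asking_price, sale_price) pairs, for illustration only.
houses = [(300_000, 310_000), (250_000, 240_000), (400_000, 400_000)]


def sold_above_asking(asking_price, sale_price):
    """Binary label: True if the house sold for more than the asking price."""
    return sale_price > asking_price


# Regression target: the sale price itself, a continuous value.
regression_targets = [sale for _, sale in houses]

# Classification target: one of two discrete categories per house.
classification_targets = [sold_above_asking(ask, sale) for ask, sale in houses]
```

The input data is the same in both cases; only the kind of output we ask the model to predict differs.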
Example 2:
(a) Regression – Given a picture of a person, we have to predict their age.
(b) Classification – Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
Two definitions of Machine Learning are offered. Arthur Samuel described it as: “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an older, informal definition.
Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad classifications:
Supervised learning and Unsupervised learning.
Artificial Intelligence is purely a mathematical and scientific exercise, but when it becomes computational, it starts to solve human problems.
Machine Learning is a subset of Artificial Intelligence. ML is the study of computer algorithms that improve automatically through experience. ML explores the study and construction of algorithms that can learn from data and make predictions on data. With more data, a machine learning system can adjust its actions and responses, which makes it more efficient, adaptable, and scalable.
Deep Learning is a technique for implementing machine learning algorithms. It trains Artificial Neural Networks on data to make decisions. The network performs many small computations across many layers and can handle some tasks that humans perform.
Types of Machine Learning
1. Supervised Learning: In a supervised learning model, the algorithm learns from a labeled dataset so that it can predict the response for new data.
E.g., for house price prediction, we first need data about houses: features such as square footage, number of rooms, whether the house has a garden, and so on. We then need to know the prices of these houses, i.e., the labels. With the features and prices of thousands of houses, we can train a supervised machine learning model to predict a new house’s price based on what the model has learned from past examples.
Supervised Learning is of two types:
a) Classification: In classification, a computer program is trained on a training dataset and, based on that training, assigns data to different class labels. Classification algorithms are used to predict discrete values such as male|female, true|false, spam|not spam, etc.
E.g., email spam detection, speech recognition, identification of cancer cells, etc.
Types of Classification Algorithms:
Naive Bayes classifier
Decision Trees
Logistic Regression
K-Nearest Neighbours
Support vector machine
Random forest classification
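To make one of the listed algorithms concrete, here is a minimal sketch of K-Nearest Neighbours using only the Python standard library. The two-dimensional feature vectors, the labels, and the function name are hypothetical; a real spam filter would need meaningful features.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; query is a feature tuple.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up labeled examples forming two well-separated groups.
train = [
    ((1.0, 1.0), "not spam"),
    ((1.2, 0.8), "not spam"),
    ((0.9, 1.1), "not spam"),
    ((5.0, 5.0), "spam"),
    ((5.2, 4.9), "spam"),
    ((4.8, 5.1), "spam"),
]

prediction = knn_predict(train, (5.1, 5.0))
```

The output is always one of the discrete labels seen in training, which is what distinguishes classification from regression.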
b) Regression: The task of a regression algorithm is to find the mapping function from the input variables (x) to the continuous output variable (y). Regression algorithms are used to predict continuous values such as price, salary, age, marks, etc.
E.g., weather prediction (such as temperature), house price prediction, etc.
Types of Regression Algorithms:
Simple linear Regression
Multiple linear Regression
Polynomial Regression
Decision Tree Regression
Random forest Regression
Ensemble Method
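As an illustration of the first item in this list, simple linear regression with one input can be fitted with the ordinary least-squares closed form. This is a sketch on made-up house sizes and prices; the function name and the numbers are assumptions, chosen so that the data lies exactly on price = 50 + 2 * size.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training set: house size and price in arbitrary units.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 210.0, 250.0, 290.0]

intercept, slope = fit_simple_linear_regression(sizes, prices)
predicted_price = intercept + slope * 90.0  # prediction for an unseen size
```

The prediction is a continuous value that need not appear anywhere in the training data.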
2. Unsupervised Learning: In an unsupervised learning model, the algorithm learns on an unlabeled dataset and tries to make sense of it by extracting features, co-occurrences, and underlying patterns on its own.
E.g., anomaly detection, including fraud detection. Another example is deciding where to open emergency hospitals so that they serve the most accident-prone areas. K-means clustering groups the locations of these accident-prone areas into clusters and defines a cluster center (i.e., a hospital site) for each cluster (i.e., a group of accident-prone areas).
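The hospital example can be sketched with a bare-bones K-means loop. The coordinates are invented, and initialising the centers from the first k points is a simplifying assumption made to keep the sketch deterministic; practical implementations usually pick the starting centers more carefully.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain K-means: returns (centers, clusters) after a fixed number of passes."""
    centers = list(points[:k])  # naive initialisation (assumption)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: math.dist(point, centers[c]))
            clusters[nearest].append(point)
        # Update step: move each center to the mean of its assigned points.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, clusters


# Hypothetical accident locations forming two groups on a map grid.
accident_sites = [
    (0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
    (1.0, 0.0), (10.0, 11.0), (11.0, 10.0),
]

hospital_sites, areas = kmeans(accident_sites, k=2)
```

No labels are supplied anywhere: the grouping comes only from the distances between the points.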
Types of Unsupervised Learning:
Clustering
Anomaly detection
Association
Autoencoders
Latent variable models
Neural Networks
3. Reinforcement Learning: Reinforcement learning is a type of machine learning where the model learns to behave in an environment by performing actions and observing the results. The model takes actions that maximize the positive response (reward) in a given situation. It decides for itself which actions to take to perform a given task, which is why it has to learn from experience.
E.g., consider a baby learning how to walk. In the first case, the baby starts walking and makes it to the chocolate. The chocolate is the baby’s end goal, so the response is positive: she is happy. In the second case, the baby starts walking, bumps into a chair, and cannot reach the chocolate, so she starts crying, which is a negative response. This is how we humans learn from trial and error. Here, the baby is the “agent”, the chocolate is the “reward”, and there are many hurdles in between. The agent tries several ways and finds the best possible path to reach the reward.
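The baby-and-chocolate story can be sketched as tabular Q-learning, one common reinforcement learning method (the text does not name a specific algorithm, so this choice is an assumption). The corridor layout, the reward values of +1 and -1, and the learning parameters are all hypothetical.

```python
import random

CHAIR, CHOCOLATE = 0, 5      # terminal positions in a hypothetical corridor
ACTIONS = (-1, +1)           # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3  # learning rate, discount, exploration


def step(state, action):
    """Move the agent; return (next_state, reward, done)."""
    next_state = state + action
    if next_state == CHOCOLATE:
        return next_state, 1.0, True    # positive response
    if next_state == CHAIR:
        return next_state, -1.0, True   # negative response
    return next_state, 0.0, False


def train(episodes=2000, seed=0):
    """Learn action values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(CHAIR, CHOCOLATE + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = rng.randint(CHAIR + 1, CHOCOLATE - 1), False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy policy: the preferred step at each non-terminal position.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)])
          for s in range(CHAIR + 1, CHOCOLATE)}
```

After training, the greedy policy should point towards the chocolate from every position, which mirrors the agent finding the best path to the reward.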