# [Andrew Ng Machine Learning Notes] 2. Univariate Linear Regression

2022-11-24 23:08:28

If you found these notes helpful, a like or a bookmark is welcome — your support is my biggest motivation to keep writing.

## 2. Univariate Linear Regression

Commonly used notation:

• m — the number of training examples
• x — the input variable (feature)
• y — the output variable (target)
• h — the hypothesis function

The hypothesis function has two parameters, and we want to find the optimal values for both of them in order to obtain the line that best fits the data.

### 1. Cost Function

• Definition: the value of the **cost function** measures how well the hypothesis fits the data — a smaller value means a more accurate fit.
• We therefore look for the minimum of the cost function; the parameter values that achieve it give us the best-fitting line.
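The squared-error cost function the notes refer to (reconstructed here from the standard course material, since the original showed it only as an image) is:

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```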

In the formula, dividing by two in front (the 1/2m factor) is there only to simplify the later derivative calculations. This squared-error cost function works well for most regression problems.

This is our linear regression model.
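As a concrete illustration, here is a minimal Python sketch of the model and its cost function (the data below is made up for the example and is not from the original notes):

```python
# A minimal sketch of the univariate linear regression model and its
# squared-error cost function (example data is made up).

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1 / 2m) * sum((h(x_i) - y_i)^2)"""
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

# Toy data lying exactly on the line y = x, so the cost at
# (theta0, theta1) = (0, 1) is exactly zero.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
print(cost(0.0, 1.0, xs, ys))  # 0.0
```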

We can also simplify the hypothesis function (for example, by fixing the intercept parameter at zero) to better understand what the cost function means.

The cost function for this simplified case looks like this:

From the plot above we can see that the cost function is smallest when the parameter equals 1; substituting that value back into the hypothesis function gives the line that best fits the data. With more parameters the picture gets more complex; below is a 3D surface plot over the two parameters:
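The claim above can be checked numerically. A small sketch with made-up data points (1, 1), (2, 2), (3, 3) and the simplified hypothesis h(x) = θ₁·x:

```python
# With the simplified hypothesis h(x) = theta1 * x and data lying on
# y = x, the cost J(theta1) is minimized at theta1 = 1 (made-up data).

def cost(theta1, xs, ys):
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(theta1, cost(theta1, xs, ys))
# J(1.0) = 0 is the smallest value, so theta1 = 1 gives the best fit.
```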

### 2. Gradient Descent

• Definition: we start from some initial parameter values, then repeatedly change the parameters in search of smaller values of J.

• Note
• One property of gradient descent is that, depending on where you start, you may end up at two different local optima, as in the figure below:

Assignment vs. the equals sign

• `:=` denotes assignment, as in a computer program: `a := b` assigns the value of b to a
• `=` denotes a truth assertion, as checked by a computer: `a = b` asks whether a equals b

• If α is too small, gradient descent takes many, many steps to reach the lowest point.
• If α is too large, a single step may overshoot the bottom, so the algorithm can fail to converge or even diverge.
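The gradient descent update rule being discussed (reconstructed from the standard course material, since the original formula was an image) is, repeated until convergence:

```latex
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\qquad \text{(updated simultaneously for } j = 0, 1\text{)}
```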

The rightmost term in the update formula is the partial derivative of the function J with respect to the parameter, i.e., the slope of the tangent line at the current point. In detail:

• If the slope of the tangent at the current parameter value is positive, the derivative term is positive, so a positive value is subtracted from the parameter; on the plot, the parameter moves left, toward the lowest point of the curve.
• If the slope of the tangent at the current parameter value is negative, the derivative term is negative, so a negative value is subtracted — that is, a positive value is added; on the plot, the parameter moves right, again toward the lowest point of the curve.

As the picture shows, once the algorithm reaches a local minimum, the tangent slope is zero and the parameter stops changing:
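This behavior is easy to see in a toy sketch: gradient descent on the made-up one-parameter function J(θ) = (θ − 3)², whose derivative 2(θ − 3) plays the role of the tangent slope:

```python
# Gradient descent on J(theta) = (theta - 3)^2 (a made-up example).
# When the slope 2*(theta - 3) is positive, theta moves left; when it
# is negative, theta moves right; the updates shrink toward zero as
# the slope approaches zero at the minimum theta = 3.

def gradient_descent(theta, alpha=0.1, steps=100):
    for _ in range(steps):
        slope = 2 * (theta - 3)    # dJ/dtheta at the current point
        theta = theta - alpha * slope
    return theta

print(gradient_descent(10.0))   # converges toward 3 from the right
print(gradient_descent(-4.0))   # converges toward 3 from the left
```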

### 3. Gradient Descent for Linear Regression

• Having learned the linear regression model (right side of the figure below) and the gradient descent algorithm (left side), the task now is to combine the two.

Now we plug the linear regression model into the gradient descent algorithm. We start with the derivative term: we need an expression for the partial derivative of the cost function with respect to each parameter.
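Working those derivatives out (reconstructed here from the standard course material, since the original notes showed the result as an image) gives:

```latex
\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
\qquad
\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
```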

In general, gradient descent may reach different local optima from different initial values; but the cost function of linear regression always has a single, unique minimum — it is a `convex function`, as shown below:

So running gradient descent on the linear regression cost function always yields the global optimum in the end; there are no other local optima.

The figure below shows this gradient descent in action: the parameter values keep changing until we find the line that best fits the data.

Batch gradient descent

"Batch" means that each step of gradient descent traverses the entire training set of samples.
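Putting the pieces together, here is a minimal sketch of batch gradient descent for univariate linear regression (the data and hyperparameters below are made up for the example):

```python
# Batch gradient descent for univariate linear regression: every
# parameter update sums over the entire training set (made-up data).

def batch_gradient_descent(xs, ys, alpha=0.05, steps=5000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        # Each step traverses all m training examples.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Noise-free data on the line y = 2x + 1; gradient descent recovers it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(round(theta0, 2), round(theta1, 2))  # 1.0 2.0
```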