
[Wu Enda Machine Learning Notes] 2. Univariate Linear Regression

2022-11-24 23:08:28 · Pandaconda

Personal blog: https://blog.csdn.net/Newin2020?spm=1011.2415.3001.5343
Column focus: in-class notes for students studying Wu Enda's (Andrew Ng's) machine learning videos.
Column introduction: in this column I will organize my notes on all of the content of Andrew Ng's machine learning videos, for everyone's reference.
Video address: Wu Enda machine learning course series
If you find these notes helpful, please like and bookmark them; your support is my greatest motivation to keep writing.

2. Univariate Linear Regression

Common notation:

[Figure: common notation. $m$ = number of training examples; $x$ = input variable / feature; $y$ = output variable / target; $(x^{(i)}, y^{(i)})$ = the $i$-th training example; $h$ = hypothesis function.]

Hypothesis Function

[Figure: the hypothesis for univariate linear regression, $h_\theta(x) = \theta_0 + \theta_1 x$.]

The hypothesis function has two parameters; by finding the optimal values for both, we obtain the straight line that best fits the data.
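As a quick illustration (a minimal sketch in plain Python, not part of the original notes), the hypothesis is just a line with two parameters:

```python
# Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x.
def hypothesis(theta0, theta1, x):
    """Predict y for input x given the two parameters."""
    return theta0 + theta1 * x

print(hypothesis(1.0, 2.0, 3.0))  # the line y = 1 + 2x predicts 7.0 at x = 3
```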

1. Cost Function

  • Definition: the value of the **cost function** tells us how good a candidate solution is; smaller values represent higher accuracy.
  • So we have to find the minimum of the cost function and the parameter values that achieve it; those parameters then give the best-fitting line.

Squared Error Cost Function

[Figure: the squared error cost function, $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$.]

The factor of $\frac{1}{2}$ in front of the sum is there to simplify the later derivative calculations. This cost function works well for most regression problems.

This is our linear regression model.
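As a minimal sketch of the formula above (plain Python, not from the original notes; `xs` and `ys` are hypothetical lists of training inputs and targets):

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost: J = (1/2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# For data lying exactly on the line y = 1 + 2x, the true parameters give zero cost.
print(compute_cost(1.0, 2.0, [0, 1, 2], [1, 3, 5]))  # 0.0
```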

We can also simplify the hypothesis function (for example, by fixing $\theta_0 = 0$) to better understand the meaning behind the cost function.

[Figure: the simplified hypothesis with $\theta_0 = 0$, i.e. $h_\theta(x) = \theta_1 x$, leaving the single parameter $\theta_1$.]

The plot of its cost function looks like this:

[Figure: $J(\theta_1)$ plotted against $\theta_1$: a bowl-shaped curve with its minimum at $\theta_1 = 1$.]

From the plot above we can see that the cost function reaches its minimum when the parameter equals $1$. Substituting that value back into the hypothesis function gives the line that best fits the data. With more parameters the picture becomes more complex; below is a 3D plot for two parameters:

[Figure: 3D surface plot of the cost function $J(\theta_0, \theta_1)$ over the two parameters.]
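Returning to the simplified one-parameter example, here is a small numeric sweep that makes the bowl shape concrete (reusing `compute_cost` from the sketch above; the data points are made up to lie exactly on $y = x$):

```python
# Data that lies exactly on y = x, matching the simplified example.
xs, ys = [1, 2, 3], [1, 2, 3]

# Fix theta0 = 0 and sweep theta1; J(theta1) should bottom out at theta1 = 1.
for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(theta1, compute_cost(0.0, theta1, xs, ys))
# The printed costs trace the bowl shape, with the minimum J = 0 at theta1 = 1.
```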

Summary

Therefore, for a regression problem, the task boils down to finding the minimum of the cost function. Below is the objective function for linear regression.

[Figure: the optimization objective for linear regression, $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.]

2. Gradient Descent

  • Definition: start from some initial values for the parameters, then keep changing them to look for smaller and smaller values of $J$.

[Figure: gradient descent taking downhill steps on the surface of $J(\theta_0, \theta_1)$.]

  • Note
    • One property of gradient descent is that different starting positions may lead to two different local optima, as shown in the figures below.

[Figures: two gradient descent runs started from slightly different positions, each ending in a different local minimum.]

Gradient Descent Algorithm

[Figure: the gradient descent algorithm. Repeat until convergence: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$, for $j = 0$ and $j = 1$.]

Assignment vs. the equals sign

  • `:=` denotes assignment, as in computing: $a := b$ assigns the value of $b$ to $a$.
  • `=` denotes a truth assertion: $a = b$ claims that $a$ equals $b$.

[Figure: examples contrasting assignments such as $a := b$ and $a := a + 1$ with truth assertions such as $a = b$.]
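In Python the two roles are played by `=` (assignment) and `==` (equality test); a tiny sketch of the distinction, not part of the original notes:

```python
a = 5          # assignment, like ":=" in the lecture notation
a = a + 1      # fine as an assignment; nonsense as a mathematical equality
print(a == 6)  # equality test, like "=" in the lecture notation: prints True
```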

In the update rule, $\alpha$ is the learning rate and controls the rate of descent: the larger its value, the faster gradient descent moves. However, $\alpha$ can be neither too large nor too small, for the following reasons:

  • If $\alpha$ is too small, it takes many, many steps to reach the lowest point.
  • If $\alpha$ is too large, a step may jump over the bottom, so gradient descent can fail to converge or even diverge.

[Figure: a learning rate that is too small creeps slowly toward the minimum; one that is too large overshoots back and forth and diverges.]
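A small, made-up demonstration of both failure modes, using the toy cost $J(\theta) = \theta^2$ whose derivative is $2\theta$ (a convenient example of mine, not a formula from the notes):

```python
def descend(alpha, theta=1.0, steps=5):
    """Run a few gradient descent steps on J(theta) = theta^2 (gradient 2*theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(descend(alpha=0.01))  # too small: barely moves toward the minimum at 0
print(descend(alpha=0.4))   # reasonable: converges quickly toward 0
print(descend(alpha=1.1))   # too large: |theta| grows every step, i.e. diverges
```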

In the gradient descent algorithm, the parameters are updated simultaneously; that is, the left side of the figure below is the correct implementation, and the right side is incorrect.

[Figure: correct simultaneous update (left): $temp0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$, $temp1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$, then $\theta_0 := temp0$ and $\theta_1 := temp1$. The incorrect version (right) assigns $\theta_0$ before computing $temp1$.]
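A sketch of the difference in Python, using a made-up cost $J(\theta_0, \theta_1) = (\theta_0 + \theta_1 - 1)^2$ so that each partial derivative depends on both parameters:

```python
# Toy cost J(t0, t1) = (t0 + t1 - 1)^2; both partial derivatives equal
# 2 * (t0 + t1 - 1), so each depends on BOTH parameters.
def grad(t0, t1):
    return 2 * (t0 + t1 - 1)

alpha, t0, t1 = 0.1, 2.0, 2.0

# Correct: evaluate both gradients at the old (t0, t1), then assign together.
temp0 = t0 - alpha * grad(t0, t1)
temp1 = t1 - alpha * grad(t0, t1)
t0, t1 = temp0, temp1
print(t0, t1)  # 1.4 1.4

# Incorrect: t0 is overwritten first, so the second gradient is evaluated at
# a mixture of the new t0 and the old t1, which is no longer a true gradient step.
u0, u1 = 2.0, 2.0
u0 = u0 - alpha * grad(u0, u1)  # u0 becomes 1.4
u1 = u1 - alpha * grad(u0, u1)  # uses the new u0, so u1 becomes 1.52, not 1.4
print(u0, u1)
```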

The derivative term at the far right of the formula is the partial derivative of $J$ with respect to the parameter, i.e. the slope of the tangent line. In detail:

  • If the tangent slope of the $J$ curve at the chosen parameter is positive, the derivative term is positive, so a positive value is subtracted from the parameter. On the plot, the parameter moves to the left, toward the lowest point of the curve.
  • If the tangent slope is negative, the derivative term is negative, so subtracting it adds a positive value. On the plot, the parameter moves to the right, again toward the lowest point of the curve.

[Figure: on the curve of $J$, a positive tangent slope moves the parameter left and a negative slope moves it right, both toward the minimum.]
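A numeric check of this reasoning, on a made-up example: $J(\theta) = (\theta - 2)^2$, so the slope is $2(\theta - 2)$ and the minimum sits at $\theta = 2$.

```python
def slope(theta):  # J(theta) = (theta - 2)^2, so J'(theta) = 2 * (theta - 2)
    return 2 * (theta - 2)

alpha = 0.1
print(5.0 - alpha * slope(5.0))  # slope +6: subtract a positive, get 4.4, moving left
print(0.0 - alpha * slope(0.0))  # slope -4: subtract a negative, get 0.4, moving right
```

Both starting points step toward the minimum at $\theta = 2$, exactly as the two bullets above describe.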

From the plot we can also see that once the algorithm has reached a local minimum, the tangent slope is zero, so the parameter no longer changes, as shown below:

[Figure: at a local minimum the tangent slope is zero, so $\theta := \theta - \alpha \cdot 0$ leaves the parameter unchanged.]

Summary

In summary, when $\alpha$ is within a normal range and held fixed, the algorithm can still find a local minimum: the closer we get to the lowest point, the smaller the tangent slope becomes (until it equals zero), so the update steps shrink on their own until a local minimum is reached.

[Figure: as gradient descent approaches the minimum, the steps shrink automatically even though $\alpha$ stays fixed.]

3. Gradient Descent for Linear Regression

  • Having learned the linear regression model (right side of the figure below) and the gradient descent algorithm (left side), the next problem is to combine the two.

[Figure: the gradient descent algorithm (left) next to the linear regression model with its squared error cost function (right).]

Now we plug the linear regression model into the gradient descent algorithm. First we work out the derivative term, i.e. the partial derivative of $J$ with respect to each parameter.

[Figure: the partial derivatives, $\frac{\partial}{\partial \theta_0} J = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$ and $\frac{\partial}{\partial \theta_1} J = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$.]

Finally, substituting these back into the update rule gives the following:

[Figure: the resulting algorithm. Repeat until convergence, updating simultaneously: $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$ and $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$.]
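Putting everything together, here is a minimal sketch of batch gradient descent for univariate linear regression in plain Python (names like `gradient_descent`, `alpha`, and `iterations` are my own; the update rule itself follows the formulas above):

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Each step uses the whole training set ("batch" gradient descent).
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Data on the line y = 1 + 2x; the fit should come out close to (1.0, 2.0).
print(gradient_descent([0, 1, 2, 3], [1, 3, 5, 7]))
```

Note that every iteration scans all $m$ examples, which is exactly the "batch" behavior summarized at the end of this article.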

In general, gradient descent may reach different local optima from different initial values. But the cost function of linear regression always has exactly one minimum; it is a convex function, as shown below:

[Figure: the bowl-shaped convex surface of the linear regression cost function, with a single global minimum.]

So when we apply gradient descent to the linear regression cost function, it always reaches the global optimum in the end; there are no other local optima.

This gradient descent process is shown in the figure below: the parameter values keep changing in order to find the line that best fits the data.

[Figure: as gradient descent walks across the contour plot of $J(\theta_0, \theta_1)$, the fitted line over the data improves step by step.]

Summary

All in all, the algorithm we used above is as follows:

Batch Gradient Descent

This means that each step of gradient descent traverses the entire training set.

This is the first machine learning algorithm we have learned so far.

Copyright notice
Author: [Pandaconda]. Please include the original link when reprinting; thank you.
https://en.chowdera.com/2022/328/202211242306246890.html
