Linear Regression

发表于 2018-06-10 更新于 2025-04-29 分类于数据结构与算法阅读次数： 32

事情的发生是有规律性的，这样我们才能预测天气、发射火箭、制造原子弹、将来甚至进行星际旅行。那么这些规律又是如何发现的呢？以前靠人，例如牛顿、爱因斯坦。那么以后会不会由机器来发掘呢？

Linear Regression

场景1 假设你是房地产销售人员，而且你手里有一些已经有一些已经成功售出的历史数据，如 表 1-1 所示。你想根据房屋面积来预测房子的售价，这个时候你应该怎么办？

表 1-1 历史数据

房屋面积(m²)	售价(万)
42	88
55	108
64	130
75	158
82	180
90	192
108	210
123	258

图 1-1 数据分布图

假想函数

观察数据分布图，我们容易发现它很像初中就接触过的线性函数——y=ax+b。这里我们就假设售价与房屋面积的关系为线性函数。即假想函数(Hypothesis Function) 是

有了假想函数，下一步需要做的事情就是确定假想函数中的各个参数的值，或者叫特征值(Features)，这里我们有这两个特征值。

我们分别做如下假设，图 1-2 中表示了各个假想函数的拟合结果:

，得到假想函数
，得到假想函数
，得到假想函数

图 1-2 各个假想函数的拟合结果

代价函数

从 图 1-2 中我们很明显的可以看到，时拟合的结果最好。

提问1 除了用经验来判断拟合结果的好坏，有没有能量化比较的方法呢?

答案是肯定的，这就是代价函数(Cost Function)。

1	注:有的公式符号前面需要加转义字符 \，例如 _，应该写成 \_

。表示假想函数求出来的值减去实际值的平方
。表示从 i=0 累加到 i=n

随着的取值不同，代价函数的结果值如 图 1-3 所示

图 1-3 代价函数的结果

代价函数表示的是假想函数的拟合程度，代价函数求出来的值越小，说明拟合的越成功。现在的问题已经变成了如何求代价函数的最小值。

梯度下降

梯度下降(Gradient Descent)定义如下:

图 1-4 梯度下降的几何含义

需要注意的是:在每次迭代中，应该同时更新参数。在这里需要同时更新

图 1-5 α 取值的影响

实际操作中，我们需要及时调整参数 α，以确保梯度下降算法在合理的时间内收敛。未能收敛或取得最小值的时间太多意味着我们的步长是错误的。

octave 代码实现

图 1-6 J(θ)求偏导

执行结果如下:

X-y-theta

iterator-result

Cost Function

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

m = size(X,1);
predictions = X*theta;
sqrErrors = (predictions-y).^2;

J = 1/(2*m)*sum(sqrErrors);



% =========================================================================

end

Gradient Descent

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    x1 = X(:,2);
    h = theta(1) + (theta(2)*x1);

    theta_zero = theta(1) - alpha * (1/m) * sum(h-y);
    theta_one = theta(2) - alpha * (1/m) * sum((h-y) .* x1);

    theta = [theta_zero; theta_one];


    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
    disp(J_history(iter));

end
    disp(min(J_history));
end

Linear Regression

假想函数

代价函数

梯度下降

octave 代码实现

引用