A Deep Dive into sklearn SVR

I. Introduction

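Support Vector Regression (SVR) is a regression method derived from the Support Vector Machine (SVM). SVR relies on the same techniques as SVM: it uses kernel functions, and the fitted model is determined by a subset of the training samples, the support vectors. Unlike an SVM classifier, which separates classes, SVR fits a function to continuous targets and uses an epsilon-insensitive loss: predictions that fall within epsilon of the true value incur no penalty.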
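For reference, the loss usually associated with SVR is the epsilon-insensitive loss, which ignores residuals smaller than epsilon. A minimal NumPy sketch follows; the function name epsilon_insensitive_loss and the sample values are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample epsilon-insensitive loss: max(0, |y - f(x)| - epsilon)."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical targets and predictions, chosen only to show the three cases
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)  # the first residual (0.05) lies inside the tube, so its loss is zero
```

Only samples whose residual reaches or exceeds epsilon contribute to the fit, which is why only part of the training set ends up as support vectors.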

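In Scikit-learn, SVR is implemented as sklearn.svm.SVR. Scikit-learn also provides a range of other regression methods, such as Decision Tree, Gradient Boosting, Random Forest, and K-Nearest Neighbors; SVR is one option among them. The underlying SVM performs classification by finding, in a high-dimensional feature space, the separating hyperplane with the largest margin between the classes.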

II. Usage

Running a regression analysis with SVR takes the following steps:

1. Load the data

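import pandas as pd

# Load the data set; the last column is assumed to hold the target
dataset = pd.read_csv("sample.csv")
X = dataset.iloc[:, :-1].values  # all columns except the last: features
y = dataset.iloc[:, -1].values   # last column: target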
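If sample.csv is not at hand, a small synthetic table can stand in for it. The sketch below is an assumption for illustration only: the column names x1, x2, and target and the sine-based relationship are made up, and the only thing it shares with the snippet above is the layout (features first, target in the last column).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sample.csv: two feature columns, target in the last column
rng = np.random.default_rng(0)
n_samples = 100
x1 = rng.uniform(-3, 3, n_samples)
x2 = rng.uniform(-3, 3, n_samples)
target = np.sin(x1) + 0.5 * x2 + rng.normal(0, 0.1, n_samples)

dataset = pd.DataFrame({"x1": x1, "x2": x2, "target": target})

# Same slicing as with the CSV file: every column but the last is a feature
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(X.shape, y.shape)
```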

2. Train the model

from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Standardize the features, then fit an SVR (SVR is sensitive to feature scale)
model = make_pipeline(StandardScaler(), SVR(C=1.0, epsilon=0.2))
model.fit(X, y)

# Note: this is the error on the training data, not on held-out data
y_pred = model.predict(X)
error = mean_squared_error(y, y_pred)
print('MSE: %.3f' % error)
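The epsilon argument passed to SVR above sets the width of the tube within which errors are ignored. As a rough sketch on synthetic data (the sine data set, the variable names, and the two epsilon values are assumptions chosen for illustration), a wider tube typically leaves fewer support vectors:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, 80)

n_support = {}
for eps in (0.01, 0.5):
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X_demo, y_demo)
    # support_ holds the indices of the training samples kept as support vectors
    n_support[eps] = len(svr.support_)
    print(f"epsilon={eps}: {n_support[eps]} support vectors out of {len(X_demo)}")
```

A smaller model is not automatically a better one; the right epsilon depends on how much noise in the target is acceptable to ignore.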

3. Evaluate the model

To evaluate the model, you can use the metric functions in sklearn.metrics, such as r2_score, mean_squared_error, and mean_absolute_error.

from sklearn.metrics import r2_score

# R^2 of the predictions against the true targets (1.0 is a perfect fit)
r_square = r2_score(y, y_pred)
print('R-Square: %.3f' % r_square)
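Scores computed on the same data the model was trained on tend to be optimistic. A common alternative is to hold out part of the data for evaluation. The following is a sketch under assumptions: the synthetic sine data, the 25% test split, and the variable names are illustrative, and your own X and y would take their place.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (an assumption for illustration; substitute your own X and y)
rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, 200)

# Keep 25% of the samples aside; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

model = make_pipeline(StandardScaler(), SVR(C=1.0, epsilon=0.2))
model.fit(X_train, y_train)
y_test_pred = model.predict(X_test)

scores = {
    "r2": r2_score(y_test, y_test_pred),
    "mse": mean_squared_error(y_test, y_test_pred),
    "mae": mean_absolute_error(y_test, y_test_pred),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```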

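III. Parameters

1. The C parameter

C is the penalty factor in the SVM optimization problem. It determines how much training error the model tolerates; the strength of the regularization is inversely proportional to C.

With a smaller C, the model tolerates larger training errors and is regularized more strongly, which can lead to underfitting. With a larger C, the model tries harder to reduce training error, which can lead to overfitting and a larger generalization error.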

model = SVR(C=1.0)  # 1.0 is also the default value
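To make the effect of C concrete, the sketch below fits the same noisy data with three values of C and compares the training error. The data set and the chosen values are assumptions for illustration; training error alone says nothing about generalization, which needs held-out data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

# Synthetic 1-D data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(1)
X_demo = np.sort(rng.uniform(0, 5, 100)).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.2, 100)

train_mse = {}
for C in (0.01, 1.0, 100.0):
    svr = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X_demo, y_demo)
    # Error measured on the training data itself
    train_mse[C] = mean_squared_error(y_demo, svr.predict(X_demo))
    print(f"C={C}: training MSE = {train_mse[C]:.4f}")
```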

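2. The kernel parameter

The kernel parameter specifies the type of kernel used to perform the nonlinear feature mapping. Scikit-learn provides four built-in kernel types (a precomputed kernel matrix or a custom callable can also be supplied):

linear: linear kernel
poly: polynomial kernel
rbf: radial basis function (RBF) kernel
sigmoid: sigmoid kernel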

model = SVR(kernel='rbf')  # 'rbf' is also the default kernel
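The choice of kernel matters most when the relationship between features and target is nonlinear. As a sketch (the quadratic data set, C=10, and degree=2 for the polynomial kernel are assumptions chosen for illustration, and only three of the kernels are compared), a linear kernel cannot follow a symmetric curve that the poly and rbf kernels fit well:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: y = x^2 plus noise, symmetric around zero (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(150, 1))
y_demo = X_demo.ravel() ** 2 + rng.normal(0, 0.1, 150)

candidates = {
    "linear": SVR(kernel="linear", C=10.0),
    "poly": SVR(kernel="poly", degree=2, C=10.0),
    "rbf": SVR(kernel="rbf", C=10.0),
}

train_r2 = {}
for name, svr in candidates.items():
    svr.fit(X_demo, y_demo)
    # score() returns R^2, here measured on the training data
    train_r2[name] = svr.score(X_demo, y_demo)
    print(f"{name:>6}: training R^2 = {train_r2[name]:.3f}")
```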

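IV. Optimization

1. Grid Search

Grid Search is a hyperparameter optimization method: it trains the model with every combination of the candidate hyperparameter values and selects the combination with the best cross-validated score. In Scikit-learn, GridSearchCV lets the computer search the hyperparameter combinations automatically.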

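from sklearn.model_selection import GridSearchCV

# Candidate values: 2 x 2 = 4 combinations are evaluated
param_grid = {'C': [1, 10], 'kernel': ['rbf', 'linear']}

# refit=True retrains the best combination on the full data set
grid = GridSearchCV(SVR(), param_grid, refit=True, verbose=3)

grid.fit(X, y)

print(grid.best_params_)
print(grid.best_estimator_)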
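When the features are standardized, as in the training step earlier, the scaler can be placed inside the search so that it is refit on each cross-validation fold. The following sketch assumes synthetic data, 5-fold cross-validation, and negative MSE as the score; none of these choices come from the original example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data with two features (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(120, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(0, 0.1, 120)

pipe = make_pipeline(StandardScaler(), SVR())

# make_pipeline names each step after its lowercased class name, so the
# SVR parameters are addressed with the "svr__" prefix
param_grid = {"svr__C": [1, 10], "svr__kernel": ["rbf", "linear"]}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
grid.fit(X_demo, y_demo)

print(grid.best_params_)
print(f"best cross-validated MSE: {-grid.best_score_:.3f}")
```

Searching over the pipeline rather than the bare SVR keeps the scaling statistics from leaking information out of the validation folds.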

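2. Random Search

Random Search is similar to Grid Search, except that it samples hyperparameter combinations at random and picks the best one found within a fixed number of iterations. In Scikit-learn, RandomizedSearchCV lets the computer search the hyperparameter combinations automatically.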

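from sklearn.model_selection import RandomizedSearchCV

# 3 x 2 = 6 possible combinations
param_dist = {'C': [0.1, 0.5, 1], 'kernel': ['rbf', 'linear']}

# n_iter is kept below 6 so that only a random subset is tried; a larger
# value would trigger a warning and simply run all 6 combinations
rand = RandomizedSearchCV(SVR(), param_distributions=param_dist, n_iter=5, refit=True, verbose=3)

rand.fit(X, y)

print(rand.best_params_)
print(rand.best_estimator_)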
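Random search is most useful when the candidates are drawn from continuous distributions rather than short lists. The sketch below assumes SciPy's loguniform distribution for C and epsilon; the ranges, the synthetic data, and random_state=0 are illustrative choices, not recommendations.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Synthetic 1-D data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(120, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, 120)

# Continuous distributions are sampled afresh on every iteration;
# lists are sampled uniformly
param_dist = {
    "C": loguniform(1e-2, 1e2),
    "epsilon": loguniform(1e-3, 1e0),
    "kernel": ["rbf", "linear"],
}

rand = RandomizedSearchCV(
    SVR(), param_distributions=param_dist, n_iter=10, cv=5, random_state=0
)
rand.fit(X_demo, y_demo)

print(rand.best_params_)
print(f"best cross-validated R^2: {rand.best_score_:.3f}")
```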

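V. Summary

This article introduced SVR, the regression counterpart of SVM, and walked through its usage, parameters, and tuning. In practice, choosing a suitable kernel function and adjusting the parameters with care can lead to better regression results.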

Original article by 小藍. If you repost it, please credit the source: https://www.506064.com/zh-tw/n/293657.html