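I have recently been learning the random forest classification algorithm and ran into a problem. I tried various methods found online, but none of them gives the correct result, so I am asking for help here.
Here is the situation: test_lables holds the true labels of the test samples for a binary classification problem (692 samples), and test_hat holds the predicted values. I want to combine the two into a 692*2 matrix in which each predicted value is paired with its true label. The source code is as follows: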
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
#from sklearn import datasets
dataframe = pd.read_csv( "D:/Research/TuPo_sel0.Train.csv", header = None )
train_features = dataframe.iloc[ :, 0:24]
train_lables = dataframe.iloc[:, 24]
test_data = pd.read_csv( "D:/Research/TuPo_sel0.Valid.csv", header = None )
test_features = test_data.iloc[ :, 0:24 ]
test_lables = test_data.iloc[ :, 24 ]
dummy = DummyClassifier( strategy = 'uniform', random_state = 1 )
dummy.fit( train_features, train_lables )
print( "dummy_score =", dummy.score( test_features, test_lables ) )
style = 1
if style == 1:
    max_features = 19
    n_estimators = 400
    randomforest = RandomForestClassifier( max_features = max_features, n_estimators = n_estimators, random_state=1, n_jobs=-1 )
    model = randomforest.fit( train_features, train_lables )
    test_hat = model.predict( test_features )
    test_hat1 = np.hstack( ( test_hat, test_lables ) )
    test_hat1.reshape( -1, 2 )
    print( test_hat1.shape )
    print( test_hat1 )
    print( "max_features =", max_features, "; n_estimators =", n_estimators,
           "; randomforest_score =", randomforest.score( test_features, test_lables ) )
The output is as follows:
runfile('D:/Python Programs/TryLoadData.py', wdir='D:/Python Programs')
dummy_score = 0.5447976878612717
(1384,)
[0 0 1 ... 0 0 0]
max_features = 19 ; n_estimators = 400 ; randomforest_score = 0.6416184971098265
Could anyone tell me how to modify the code to get the correct result?
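For reference, here is a minimal sketch of what probably happens and one possible way to get the pairing described above. It is not taken from the original script: the two short arrays are hypothetical stand-ins for test_hat and test_lables, and the names flat and paired are made up for illustration. Two things seem to be going on. np.hstack joins two 1-D arrays end to end, which would explain the (1384,) shape, and reshape returns a new array rather than changing the array in place, so its result is discarded in the script. Even if the reshaped result were kept, reshape( -1, 2 ) would pair neighbouring elements instead of pairing each prediction with its label. np.column_stack puts each 1-D array into its own column.

```python
import numpy as np

# Hypothetical stand-ins for the post's test_hat (predictions) and
# test_lables (true labels); the real data has 692 entries each.
test_hat = np.array([0, 1, 1, 0])
test_lables = np.array([0, 1, 0, 0])

# hstack on 1-D inputs concatenates them end to end: shape (8,)
flat = np.hstack((test_hat, test_lables))
print(flat.shape)

# reshape returns a new array and leaves `flat` unchanged; the rows it
# produces here pair neighbouring elements, not prediction with label.
print(flat.reshape(-1, 2))

# column_stack places each 1-D input in its own column: shape (4, 2),
# with column 0 = prediction and column 1 = true label.
paired = np.column_stack((test_hat, test_lables))
print(paired.shape)
print(paired)
```

In the script from the post, the equivalent change would be to replace the hstack and reshape lines with test_hat1 = np.column_stack( ( test_hat, test_lables ) ), which should give a shape of (692, 2) provided both inputs have 692 entries.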