Finding the random state of a sklearn decision tree classifier

Asked by: 小点点

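I have some data and I am fitting a sklearn DecisionTreeClassifier to it. Because the classifier uses a bit of randomness, I run it several times and save the best model. However, I want to be able to retrain on the data and get the same results on a different machine.

Is there a way to find out what the initial random state was after I have trained each classifier?

Edit: sklearn models have a method called get_params() that shows what was passed in. But for random_state it still says None. According to the documentation, though, in that case it uses numpy to generate a random number. I am trying to find out what that random number was.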

2 answers

Anonymous user

You have to pass an explicit random state to the decision tree constructor:

>>> from sklearn.tree import DecisionTreeClassifier
>>> DecisionTreeClassifier(random_state=42).get_params()['random_state']
42

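Leaving it at the default None means the fit method uses the singleton random state of numpy.random, which is not predictable and is not the same from one run to the next.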
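
To make the "run it several times and keep the best" workflow reproducible, one option is to draw the seeds yourself and record the winner. The following is only a sketch under assumptions: the data is a synthetic stand-in from make_classification, and the master seed 12345, the five candidates, and max_features="sqrt" are arbitrary choices that were not in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Draw the candidate seeds explicitly so every one of them is known and can be saved.
seed_source = np.random.RandomState(12345)  # hypothetical master seed
candidate_seeds = [int(s) for s in seed_source.randint(0, 2**31 - 1, size=5)]

best_seed, best_score = None, -1.0
for seed in candidate_seeds:
    clf = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    score = clf.fit(X_train, y_train).score(X_test, y_test)
    if score > best_score:
        best_seed, best_score = seed, score

# Retraining with the recorded seed rebuilds the same tree on the same data.
retrained = DecisionTreeClassifier(max_features="sqrt", random_state=best_seed)
retrained.fit(X_train, y_train)
reproduced_score = retrained.score(X_test, y_test)
print(best_seed, best_score, reproduced_score)
```

Saving best_seed next to the model is then enough to retrain later. Matching results on another machine should also require the same data and, as far as I know, the same scikit-learn and numpy versions.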

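Anonymous user

I would suggest that you are best off using a random forest for this purpose. Random forests contain a number of trees that are each modelled on a subset of your predictors. You can then see the random_states that were used in the model simply by looking at RandomForestVariableName.estimators_

I'll use my own code as an example here: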

import csv

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Raw string so the backslashes in the Windows path are not treated as escapes.
with open(r'C:\Users\Saskia Hill\Desktop\Exported\FinalSpreadsheet.csv', newline='') as csvfile:
    titanic_reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    row = next(titanic_reader)  # the first row holds the column names
    feature_names = np.array(row)

    # Load dataset, and target classes
    titanic_X, titanic_y = [], []
    for row in titanic_reader:
        titanic_X.append(row)
        titanic_y.append(row[11])  # The target values are your class labels

    titanic_X = np.array(titanic_X)
    titanic_y = np.array(titanic_y)
    print(titanic_X, titanic_y)

print(feature_names, titanic_X[0], titanic_y[0])
titanic_X = titanic_X[:, [2, 3, 4, 5, 6, 7, 8, 9, 10]]  # these are your predictors/features
feature_names = feature_names[[2, 3, 4, 5, 6, 7, 8, 9, 10]]

# 'sqrt' is what max_features='auto' meant for classifiers; recent scikit-learn releases no longer accept 'auto'.
rfclf = RandomForestClassifier(criterion='entropy', min_samples_leaf=1, max_features='sqrt', max_leaf_nodes=None, verbose=0)

rfclf = rfclf.fit(titanic_X, titanic_y)

rfclf.estimators_     # the output for this is pasted below:

[DecisionTreeClassifier(compute_importances=None, criterion='entropy',
        max_depth=None, max_features='auto', max_leaf_nodes=None,
        min_density=None, min_samples_leaf=1, min_samples_split=2,
        random_state=1490702865, splitter='best'),
DecisionTreeClassifier(compute_importances=None, criterion='entropy',
        max_depth=None, max_features='auto', max_leaf_nodes=None,
        min_density=None, min_samples_leaf=1, min_samples_split=2,
        random_state=174216030, splitter='best') ......

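So random forests introduce randomness into the individual decision trees and do not require any tweaking of the initial data that the trees use. They also act somewhat like a form of cross-validation, giving you more confidence in the accuracy of your model (especially if, like me, you have a small dataset).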
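
The per-tree seeds shown in the output above can also be collected programmatically. Here is a minimal sketch, assuming synthetic data from make_classification and a hypothetical helper named tree_seeds; it also fixes the forest's own random_state (an addition to the answer's code) so that the seeds handed to the trees come out the same on every run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)


def tree_seeds(forest_seed):
    """Fit a small forest and return the random_state given to each of its trees."""
    forest = RandomForestClassifier(
        n_estimators=5, criterion="entropy", random_state=forest_seed
    )
    forest.fit(X, y)
    return [tree.random_state for tree in forest.estimators_]


seeds_a = tree_seeds(7)
seeds_b = tree_seeds(7)
print(seeds_a)
print(seeds_a == seeds_b)
```

Each entry is the integer seed that one DecisionTreeClassifier inside the forest was built with. If the forest is left at random_state=None, as in the code above, the trees still receive integer seeds, but they differ from run to run.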


		      