Asked by: 小点点

How can I get a column in my Python (pandas) DataFrame showing all the decision-tree rules that led to my result?


I am building a decision tree (classifier) with sklearn. It works well: I can visualize the tree and predict my classes. But I would also like to create a column (in my pandas DataFrame) containing the path taken through the tree to reach each result, i.e. a concatenation of all the rules that produced my result, something like: white=False, black=False, weight=1, price=5. Do you have any ideas?


1 answer

Anonymous user

Following the example here, you can build an explanation of the rules that were applied:

  • estimator.decision_path gives you the nodes that were followed to reach the result
  • is_leaves is an array telling, for each node, whether it is a leaf, i.e. terminal (True), or a branch/decision node (False)
  • you can then iterate over node_indicator to get the nodes that were visited
  • for each node you can read the threshold and the feature involved
  • finally, apply the function to the DataFrame and you are done.

    import numpy as np

    def get_decision_path(estimator, feature_names, sample, precision=2, is_leaves=None):
        # Follow one sample through the fitted tree and describe every split it passes.
        if is_leaves is None:
            is_leaves = get_leaves(estimator)
        feature = estimator.tree_.feature      # feature index tested at each node
        threshold = estimator.tree_.threshold  # threshold used at each node

        text = []

        # Sparse indicator of the nodes this sample visits; slice out its node ids.
        node_indicator = estimator.decision_path([sample])
        node_index = node_indicator.indices[node_indicator.indptr[0]:
                                            node_indicator.indptr[1]]

        for node_id in node_index:
            if is_leaves[node_id]:
                break  # terminal node reached, no further split to describe

            # Record which side of the split the sample took.
            if sample[feature[node_id]] <= threshold[node_id]:
                threshold_sign = "<="
            else:
                threshold_sign = ">"

            text.append('{}: {} {} {}'.format(feature_names[feature[node_id]],
                                              sample[feature[node_id]],
                                              threshold_sign,
                                              round(threshold[node_id], precision)))

        return '; '.join(text)
    
    def get_leaves(estimator):
        # Mark which nodes of the fitted tree are leaves (terminal nodes).
        n_nodes = estimator.tree_.node_count
        children_left = estimator.tree_.children_left
        children_right = estimator.tree_.children_right
        is_leaves = np.zeros(shape=n_nodes, dtype=bool)
        stack = [(0, -1)]  # (node id, parent depth), starting at the root
        while len(stack) > 0:
            node_id, parent_depth = stack.pop()

            # A split node has two distinct children; a leaf has none.
            if children_left[node_id] != children_right[node_id]:
                stack.append((children_left[node_id], parent_depth + 1))
                stack.append((children_right[node_id], parent_depth + 1))
            else:
                is_leaves[node_id] = True
        return is_leaves
    

Example

    print(get_decision_path(estimator, 
                            ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], 
                            [6.6, 3.0 , 4.4, 1.4]))
    

    petal width (cm): 1.4
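
As a quick cross-check (not part of the original answer), scikit-learn's tree.export_text can print the complete rule set of the fitted tree, so the per-sample path above can be compared against it; a minimal sketch, assuming scikit-learn >= 0.21:

    from sklearn.tree import export_text

    # Dump every rule of the fitted tree as indented text (assumes `estimator`
    # is the fitted DecisionTreeClassifier from the answer above).
    print(export_text(estimator,
                      feature_names=['sepal length (cm)', 'sepal width (cm)',
                                     'petal length (cm)', 'petal width (cm)']))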

Full code

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    import pandas as pd
    from sklearn import tree
    import pydotplus
    from IPython.core.display import HTML, display
    
    def get_decision_path(estimator, feature_names, sample, precision=2, is_leaves=None):
        if is_leaves is None:
            is_leaves = get_leaves(estimator)
        feature = estimator.tree_.feature
        threshold = estimator.tree_.threshold
    
        text = []
    
        node_indicator = estimator.decision_path([sample])
        node_index = node_indicator.indices[node_indicator.indptr[0]:
                                            node_indicator.indptr[1]]
    
        for node_id in node_index:
            if is_leaves[node_id]:
                break
    
            if sample[feature[node_id]] <= threshold[node_id]:
                threshold_sign = "<="
            else:
                threshold_sign = ">"
    
            text.append('{}: {} {} {}'.format(feature_names[feature[node_id]],
                                              sample[feature[node_id]],
                                              threshold_sign,
                                              round(threshold[node_id], precision)))
    
        return '; '.join(text)
    
    
    def get_leaves(estimator):
        n_nodes = estimator.tree_.node_count
        children_left = estimator.tree_.children_left
        children_right = estimator.tree_.children_right
        is_leaves = np.zeros(shape=n_nodes, dtype=bool)
        stack = [(0, -1)]
        while len(stack) > 0:
            node_id, parent_depth = stack.pop()
    
            if children_left[node_id] != children_right[node_id]:
                stack.append((children_left[node_id], parent_depth + 1))
                stack.append((children_right[node_id], parent_depth + 1))
            else:
                is_leaves[node_id] = True
        return is_leaves
    
    # prepare data
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['target'] = iris.target
    
    X = df.iloc[:, 0:4].to_numpy()
    y = df.iloc[:, 4].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    
    # create decision tree
    estimator = DecisionTreeClassifier(max_leaf_nodes=5, random_state=0)
    estimator.fit(X_train, y_train)
    
    # visualize decision tree
    dot_data = tree.export_graphviz(estimator, out_file=None,
                                    feature_names=iris.feature_names,
                                    class_names=iris.target_names,
                                    filled=True, rounded=True,
                                    special_characters=True)
    graph = pydotplus.graph_from_dot_data(dot_data)
    svg = graph.create_svg()
    display(HTML(svg.decode('utf-8')))
    
    # add explanation to data frame
    is_leaves = get_leaves(estimator)
    # Pass the row values as a plain array so integer indexing inside get_decision_path stays positional.
    df['explanation'] = df.apply(lambda row: get_decision_path(estimator, df.columns[0:4], row[0:4].to_numpy(), is_leaves=is_leaves), axis=1)
    
    df.sample(5, axis=0, random_state=42)
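
The full code only annotates the training DataFrame; if you also want the explanation for unseen rows, the same helper can be applied to the test split. A small sketch (my own extension, assuming the variables from the full code above are in scope):

    # Hypothetical follow-up, not in the original answer: annotate the held-out
    # test rows with the decision path the fitted tree takes for each of them.
    test_df = pd.DataFrame(X_test, columns=iris.feature_names)
    test_df['explanation'] = test_df.apply(
        lambda row: get_decision_path(estimator, iris.feature_names,
                                      row.to_numpy(), is_leaves=is_leaves),
        axis=1)
    test_df['prediction'] = estimator.predict(X_test)
    print(test_df.head())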