Basic Usage Example

In this example we show the simplest usage of BoolXAI.RuleClassifier.

Input data

First, we need some binarized data to operate on. We’ll use the Breast Cancer Wisconsin (Diagnostic) Data Set, which can be loaded easily using sklearn:

[1]:
from sklearn import datasets

X, y = datasets.load_breast_cancer(return_X_y=True, as_frame=True)

# Inspect the data
print(X.shape)
X.head()
(569, 30)
[1]:
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst radius worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 ... 25.38 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 ... 24.99 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 ... 23.57 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 ... 14.91 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 ... 22.54 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678

5 rows × 30 columns

Binarizing the data

All features are continuous, so we’ll use sklearn’s KBinsDiscretizer to binarize the data. Note that for most datasets we expect a better binarization scheme to exist; we use KBinsDiscretizer here purely for pedagogical reasons. Unfortunately, KBinsDiscretizer does not return legible feature names. For this reason, below we use a patched version of KBinsDiscretizer, which we call BoolXAIKBinsDiscretizer (located in util.py):

[2]:
# Transformers should output Pandas DataFrames
from sklearn import set_config

# Make sure that the transformer will return a Pandas DataFrame,
# so that the feature/column names are maintained.
set_config(transform_output="pandas")

# Binarize the data
from util import BoolXAIKBinsDiscretizer

binarizer = BoolXAIKBinsDiscretizer(
    n_bins=10, strategy="quantile", encode="onehot-dense"
)
X_binarized = binarizer.fit_transform(X)

# Inspect the binarized data
print(X_binarized.shape)
X_binarized.head()
(569, 300)
[2]:
[mean radius<10.26] [10.26<=mean radius<11.366] [11.366<=mean radius<12.012] [12.012<=mean radius<12.726] [12.726<=mean radius<13.37] [13.37<=mean radius<14.058] [14.058<=mean radius<15.056] [15.056<=mean radius<17.068] [17.068<=mean radius<19.53] [mean radius>=19.53] ... [worst fractal dimension<0.0658] [0.0658<=worst fractal dimension<0.0697] [0.0697<=worst fractal dimension<0.0735] [0.0735<=worst fractal dimension<0.0769] [0.0769<=worst fractal dimension<0.08] [0.08<=worst fractal dimension<0.0832] [0.0832<=worst fractal dimension<0.089] [0.089<=worst fractal dimension<0.0959] [0.0959<=worst fractal dimension<0.1063] [worst fractal dimension>=0.1063]
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
3 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 300 columns
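Each of the 30 original features has been one-hot encoded into 10 quantile bins (n_bins=10), which accounts for the 300 binary columns above.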

Training a rule classifier

We’ll now train a rule classifier using the default settings:

[3]:
from boolxai import BoolXAI

# Instantiate a BoolXAI rule classifier (for reproducibility, we set a seed)
rule_classifier = BoolXAI.RuleClassifier(random_state=43)

# Learn the best rule
rule_classifier.fit(X_binarized, y);

We can print the best rule found and the score it achieved:

[4]:
print(f"{rule_classifier.best_rule_}, {rule_classifier.best_score_:.2f}")
And(~237, ~209, ~259, ~238, ~236), 0.91
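Here the integers are the column indices of the binarized features, and ~ denotes negation of the corresponding literal.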

Now that we have a trained classifier, we can use it (and the underlying rule) to make predictions. As a quick check, we predict values for the training data and then recalculate the score using the default metric (balanced accuracy):

[5]:
from sklearn.metrics import balanced_accuracy_score

# Apply Rules
y_predict = rule_classifier.predict(X_binarized)
score = balanced_accuracy_score(y, y_predict)
print(f"{score=:.2f}")
score=0.91

We see that the score indeed matches the best score reported by the classifier in the best_score_ attribute. Note that other metrics can be used. However, the metric used for evaluation should also be passed to the rule classifier via the metric argument, so that the same metric is optimized during training.
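For example, to optimize the rule for F1 score instead, the metric can be supplied at construction time. The snippet below is only a sketch, assuming that metric accepts an sklearn-style callable of the form metric(y_true, y_pred):

from sklearn.metrics import f1_score

# Sketch (assumption): metric takes an sklearn-style callable metric(y_true, y_pred)
f1_rule_classifier = BoolXAI.RuleClassifier(metric=f1_score, random_state=43)
f1_rule_classifier.fit(X_binarized, y)

# Evaluate with the same metric that was used for training
print(f"{f1_score(y, f1_rule_classifier.predict(X_binarized)):.2f}")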

Making sense of the rules

The above rule refers to features by their indices, which makes it compact but hard to interpret. We can replace the indices with the actual feature names:

[6]:
rule = rule_classifier.best_rule_
feature_names = X_binarized.columns
rule.to_str(feature_names)
[6]:
'And(~[926.96<=worst area<1269.0], ~[worst radius>=23.682], ~[worst compactness>=0.4478], ~[1269.0<=worst area<1673.0], ~[781.18<=worst area<926.96])'

We can also plot the rule using the plot() method. Again, we have the option to pass in the feature names:

[7]:
rule.plot(feature_names)
../_images/_examples_basic_usage_17_0.png

We define the complexity of a rule to be the total number of operators and literals. The depth of a rule is defined as the number of edges in the longest path from the root to any leaf/literal. We can inspect both for the above example:

[8]:
print(f"Complexity: {rule.complexity()}")  # Note: len(rule) gives the same result
print(f"Depth: {rule.depth()}")
Complexity: 6
Depth: 1
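For this rule, the complexity of 6 corresponds to the single And operator plus its five literals, and the depth is 1 because every literal is a direct child of the root.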

For programmatic access, it can be convenient to have a dictionary representation of a rule, optionally passing in the feature names:

[9]:
rule.to_dict(feature_names)
[9]:
{'And': ['~[926.96<=worst area<1269.0]',
  '~[worst radius>=23.682]',
  '~[worst compactness>=0.4478]',
  '~[1269.0<=worst area<1673.0]',
  '~[781.18<=worst area<926.96]']}

It’s also possible to get the graph representation of a rule as a NetworkX DiGraph, once again optionally passing in the feature names:

[10]:
G = rule.to_graph(feature_names)
print(G)
DiGraph with 6 nodes and 5 edges

We can inspect the data in each node:

[11]:
print(G.nodes(data=True))
[(5095761872, {'label': 'And', 'num_children': 5}), (5095762064, {'label': '~[926.96<=worst area<1269.0]', 'num_children': 0}), (5095761728, {'label': '~[worst radius>=23.682]', 'num_children': 0}), (5095760576, {'label': '~[worst compactness>=0.4478]', 'num_children': 0}), (5095760816, {'label': '~[1269.0<=worst area<1673.0]', 'num_children': 0}), (5095760864, {'label': '~[781.18<=worst area<926.96]', 'num_children': 0})]

The name of each node is set using id(), which returns the memory address of the corresponding object; this ensures that each node name is unique. Each node has two additional attributes: label contains the node’s label, and num_children holds its number of children. Literals have zero children (they are leaves), while operators must have two or more children.
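Since this is a plain NetworkX DiGraph, standard NetworkX tooling applies. As a small sketch, we can map the opaque id()-based node names to their human-readable labels, for example before exporting or plotting the graph with other tools:

import networkx as nx

# Map each id()-based node name to its human-readable label
label_map = {node: data["label"] for node, data in G.nodes(data=True)}
H = nx.relabel_nodes(G, label_map)
print(list(H.nodes()))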