Upstream Usage Example

In this example we cover advanced usage of BoolXAI.RuleClassifier via upstream classifiers - i.e., wrapping it in an ensemble of rule classifiers, such as a boosted, multiclass, or multilabel meta-classifier.

Input data

We’ll start with the same binarized data we used in the Basic Usage Example. In order to speed up the execution, we’ll only use a subset of the data:

[1]:
import numpy as np
from sklearn import set_config
from sklearn import datasets
from sklearn.metrics import balanced_accuracy_score

from boolxai import BoolXAI
from util import BoolXAIKBinsDiscretizer

set_config(transform_output="pandas")

X, y = datasets.load_breast_cancer(return_X_y=True, as_frame=True)

# Use a subset of the data to speed up execution.
# For higher quality results, comment these lines out.
X = X.iloc[:100, :100]
y = y.iloc[:100]

# Binarize the data
binarizer = BoolXAIKBinsDiscretizer(
    n_bins=10, strategy="quantile", encode="onehot-dense"
)
X_binarized = binarizer.fit_transform(X)
X_binarized.head();

Boosting

Boosting defines a meta-classifier in which copies of a classifier are trained iteratively such that they focus on the most difficult samples to predict, given the previously trained classifiers. The final result is obtained by combining the weighted predictions from these sub-classifiers. Boosting can be used to greatly improve the results provided by a weak learner, such as our highly regularized rule classifiers.

As a baseline result, we’ll train and evaluate a rule classifier without boosting:

[2]:
seed = 43
rule_classifier = BoolXAI.RuleClassifier(random_state=seed)
rule_classifier.fit(X_binarized, y)
y_predict = rule_classifier.predict(X_binarized)
score = balanced_accuracy_score(y, y_predict)
print(f"Without boosting: {score=:.2f}")
Without boosting: score=0.91

Now we’ll use sklearn’s AdaBoostClassifier to create a boosted classifier with 5 underlying sub-classifiers:

[3]:
from sklearn.ensemble import AdaBoostClassifier

boosted_rule_classifier = AdaBoostClassifier(
    BoolXAI.RuleClassifier(random_state=seed),
    n_estimators=5,
    algorithm="SAMME",
    random_state=seed,
)
boosted_rule_classifier.fit(X_binarized, y)
y_predict = boosted_rule_classifier.predict(X_binarized)
score = balanced_accuracy_score(y, y_predict)
print(f"With boosting: {score=:.2f}")
With boosting: score=1.00

This score is clearly much higher than the one obtained without boosting. Note that this comes at the price of additional training (and inference) time, as well as higher complexity (and hence lower interpretability). Boosted classifiers are also more likely to overfit the data, which could be evaluated by comparing performance on out-of-sample data (e.g., via cross-validation).
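
For example, a minimal sketch of such a check (using scikit-learn's cross_val_score, and assuming the boosted classifier can simply be refit from scratch on each fold) might look like this:

from sklearn.model_selection import cross_val_score

# Illustrative sketch: 3-fold cross-validation of the boosted classifier.
# Note that this refits the classifier on each fold, so it is correspondingly slower.
cv_scores = cross_val_score(
    boosted_rule_classifier, X_binarized, y, cv=3, scoring="balanced_accuracy"
)
print(f"Cross-validated scores: {cv_scores}, mean={cv_scores.mean():.2f}")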

We can print the best rule for each of the sub-classifiers:

[4]:
# Print the best rule for each of the sub-classifiers inside boosted_rule_classifier
for subclassifier in boosted_rule_classifier.estimators_:
    print(subclassifier.best_rule_.to_str(X_binarized.columns))
AtLeast1([worst perimeter<77.186], [compactness error<0.0105], [0.0546<=mean compactness<0.0719], [0.0074<=concave points error<0.0087], [area error<16.119])
Or([0.0049<=smoothness error<0.0055], [565.14<=worst area<648.84], [365.18<=mean area<448.1], [0.0719<=mean compactness<0.0871], [worst radius<12.199])
AtLeast1([87.668<=worst perimeter<96.489], Choose1([perimeter error<1.5335], [15.614<=worst radius<16.5]), [0.2212<=worst concavity<0.2656])
And(~[0.1908<=worst compactness<0.2131], ~[0.0189<=compactness error<0.0245], ~[0.156<=worst concave points<0.1726], ~[worst area>=1713.1], ~[0.034<=concavity error<0.0388])
Or([mean area<365.18], [21.328<=worst texture<23.204], [worst texture<19.063], [0.0899<=worst concavity<0.1394])

They are clearly very different from one another. We can also score each of the sub-classifiers separately:

[5]:
for subclassifier in boosted_rule_classifier.estimators_:
    y_predict = subclassifier.predict(X_binarized)
    score = balanced_accuracy_score(y, y_predict)
    print(f"{score=:.2f}")
score=0.88
score=0.87
score=0.69
score=0.74
score=0.80

and inspect the weight given to each of the sub-classifiers:

[6]:
boosted_rule_classifier.estimator_weights_
[6]:
array([2.19722458, 2.47293046, 1.64179982, 1.61359288, 2.40632095])
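
To see how these weights enter the final prediction, the weighted vote described above can be reproduced by hand. The sketch below is illustrative only (it relies on the labels being 0 and 1): it adds each sub-classifier's weight to the class it predicts and takes the argmax, which should match AdaBoostClassifier.predict up to tie-breaking:

import numpy as np

# Illustrative sketch: accumulate each sub-classifier's weight on the class it votes for.
votes = np.zeros((len(X_binarized), 2))
for weight, subclassifier in zip(
    boosted_rule_classifier.estimator_weights_,
    boosted_rule_classifier.estimators_,
):
    predictions = np.asarray(subclassifier.predict(X_binarized)).astype(int)
    votes[np.arange(len(predictions)), predictions] += weight

# Fraction of samples on which the manual weighted vote agrees with predict()
manual_predict = votes.argmax(axis=1)
print((manual_predict == boosted_rule_classifier.predict(X_binarized)).mean())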

Multiclass classification

In the multiclass case, instead of two classes (say 0 and 1), we have more than two classes. sklearn provides several ways of converting a binary classifier into a multiclass classifier. We choose OneVsRestClassifier rather than OneVsOneClassifier or OutputCodeClassifier since it is far more interpretable: OneVsRestClassifier trains a sub-classifier for each class on a binary classification task in which the labels distinguish the chosen class from all other classes combined.

First, we load a multiclass classification dataset:

[7]:
from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True, as_frame=True)

print("Unique labels:")
np.unique(y)
Unique labels:
[7]:
array([0, 1, 2])
[8]:
# Inspect the data
print(X.shape)
X.head()
(150, 4)
[8]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2

We binarize the data as in the Basic Usage Example, but this time we use a smaller number of bins:

[9]:
# Binarize the data
binarizer = BoolXAIKBinsDiscretizer(
    n_bins=4, strategy="quantile", encode="onehot-dense"
)
X_binarized = binarizer.fit_transform(X)
print(X_binarized.shape)
X_binarized.head()
(150, 16)
[9]:
[sepal length (cm)<5.1] [5.1<=sepal length (cm)<5.8] [5.8<=sepal length (cm)<6.4] [sepal length (cm)>=6.4] [sepal width (cm)<2.8] [2.8<=sepal width (cm)<3.0] [3.0<=sepal width (cm)<3.3] [sepal width (cm)>=3.3] [petal length (cm)<1.6] [1.6<=petal length (cm)<4.35] [4.35<=petal length (cm)<5.1] [petal length (cm)>=5.1] [petal width (cm)<0.3] [0.3<=petal width (cm)<1.3] [1.3<=petal width (cm)<1.8] [petal width (cm)>=1.8]
0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
1 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
2 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
3 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
4 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

With the data in hand, we can now use OneVsRestClassifier to implicitly train multiple rule classifiers, one for each class, combined into a single classifier:

[10]:
from sklearn.multiclass import OneVsRestClassifier

from boolxai import BoolXAI

# Instantiate a multiclass rule classifier and fit it
multiclass_rule_classifier = OneVsRestClassifier(
    BoolXAI.RuleClassifier(random_state=43)
)
multiclass_rule_classifier.fit(X_binarized, y);

We can make predictions and calculate scores as usual with the combined classifier:

[11]:
# Apply Rules
y_predict = multiclass_rule_classifier.predict(X_binarized)
score = balanced_accuracy_score(y, y_predict)
print(f"{score=:.2f}")
score=0.96

We can also print out the best rule used internally by OneVsRestClassifier for each of the classes:

[12]:
# Print the best rule for each of the sub-classifiers inside multiclass_rule_classifier
for subclassifier in multiclass_rule_classifier.estimators_:
    print(subclassifier.best_rule_.to_str(X_binarized.columns))
Or(And(~[petal width (cm)>=1.8], [sepal width (cm)>=3.3]), [petal length (cm)<1.6], [petal width (cm)<0.3])
And(~[petal width (cm)>=1.8], ~[petal length (cm)<1.6], ~[petal width (cm)<0.3], ~[sepal width (cm)>=3.3], ~[petal length (cm)>=5.1])
Or([petal width (cm)>=1.8], [petal length (cm)>=5.1])
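
As a further, purely illustrative check, each one-vs-rest sub-classifier can be scored on its own binary task, i.e., distinguishing class k from the rest:

# Illustrative sketch: score each sub-classifier on its "class k vs. rest" task
for k, subclassifier in zip(
    multiclass_rule_classifier.classes_, multiclass_rule_classifier.estimators_
):
    y_binary = (y == k).astype(int)
    y_predict_k = subclassifier.predict(X_binarized)
    print(f"class {k}: score={balanced_accuracy_score(y_binary, y_predict_k):.2f}")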

Multilabel classification

In the multilabel (multioutput) case, each label consists of multiple binary values. We can fit a classifier to each output using MultiOutputClassifier. To try this out, we first need some data. We can generate a small synthetic dataset with make_multilabel_classification():

[13]:
from sklearn.datasets import make_multilabel_classification

X, y = make_multilabel_classification(
    n_classes=3, n_samples=200, n_features=4, random_state=seed
)
[14]:
print("Unique labels:")
np.unique(y, axis=0)
Unique labels:
[14]:
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 1],
       [1, 0, 0],
       [1, 1, 0],
       [1, 1, 1]])

Let’s inspect the data:

[15]:
print(X.shape)
print(X)
(200, 4)
[[10. 23. 13.  2.]
 [13. 10. 16. 11.]
 [12. 16. 13.  9.]
 [16. 18. 20.  7.]
 [11. 12. 12.  5.]
 [16. 22. 15.  7.]
 [10. 14. 24. 17.]
 [ 8.  7. 16.  9.]
 [21. 13. 15. 10.]
 [20. 19. 15.  9.]
 [ 6. 11. 14. 15.]
 [11. 12. 16.  8.]
 [ 7. 10. 13.  6.]
 [11.  9. 11. 11.]
 [18.  1. 11. 16.]
 [15. 14. 17. 15.]
 [ 6. 19. 13.  3.]
 [13. 11. 12. 15.]
 [10. 10. 10. 11.]
 [12. 16. 16.  2.]
 [ 9. 16. 21.  3.]
 [14. 10. 13. 11.]
 [13. 16. 14.  4.]
 [ 5. 15. 11. 21.]
 [ 8. 13. 10. 14.]
 [11. 12.  9. 12.]
 [14. 28. 18.  2.]
 [12. 19. 17.  4.]
 [10. 23. 19.  6.]
 [18. 10. 16. 10.]
 [11. 19. 26.  1.]
 [22. 14.  7. 17.]
 [10. 26. 22.  5.]
 [ 1. 22. 21.  9.]
 [ 8. 12. 16.  4.]
 [13. 11. 15. 10.]
 [ 4.  9. 23.  3.]
 [11. 16. 19. 15.]
 [16. 12. 18. 12.]
 [ 8. 13. 10. 16.]
 [11. 24. 18.  2.]
 [16. 15. 17.  8.]
 [ 7.  9. 16. 13.]
 [13. 14. 12. 11.]
 [ 8. 19. 12.  1.]
 [14. 20. 10.  1.]
 [11. 18. 20.  4.]
 [ 5. 15. 16. 10.]
 [18. 20. 17.  0.]
 [12. 14. 11. 13.]
 [10. 12. 18.  8.]
 [ 9. 20. 21.  4.]
 [11.  8. 15.  7.]
 [22.  7. 18. 11.]
 [14. 23. 12.  1.]
 [13. 26. 17.  3.]
 [14. 20. 12. 10.]
 [13. 13. 18. 11.]
 [20.  3. 11. 14.]
 [ 9. 15. 13.  2.]
 [ 7. 16. 20.  4.]
 [16. 20. 24.  2.]
 [ 6. 23. 15.  4.]
 [ 8. 24. 18.  3.]
 [15. 10. 18. 14.]
 [16. 16. 11.  2.]
 [18.  8. 19. 12.]
 [ 8. 10.  8.  7.]
 [19.  8. 20. 11.]
 [ 7. 15.  8.  5.]
 [12.  8. 17. 14.]
 [11. 15. 17.  2.]
 [18. 18. 17.  0.]
 [18. 14. 10.  7.]
 [16.  7. 14.  7.]
 [15. 18.  8.  2.]
 [ 8. 17. 23.  3.]
 [12. 12. 13.  6.]
 [18.  0. 15. 16.]
 [10. 12. 26. 11.]
 [11. 15. 13.  3.]
 [11. 21.  9.  1.]
 [10. 18. 14.  6.]
 [15.  9. 11. 14.]
 [ 2. 10. 21.  1.]
 [17. 13. 12.  7.]
 [14.  7. 14. 14.]
 [17. 13. 10.  5.]
 [13. 20.  9.  4.]
 [10. 25. 19.  1.]
 [10. 10.  9.  9.]
 [12. 19. 20.  3.]
 [ 3. 22. 19.  6.]
 [17.  8. 14. 15.]
 [ 5.  7. 17. 11.]
 [10. 15. 18.  6.]
 [11.  7. 18.  9.]
 [14. 15. 10. 11.]
 [17. 20. 10. 21.]
 [11. 14. 19. 14.]
 [14. 15. 18.  9.]
 [ 9. 23. 20.  8.]
 [ 9. 14. 20.  2.]
 [16. 12. 14.  7.]
 [13. 18. 12. 16.]
 [16. 16. 16.  1.]
 [ 9. 17. 16.  2.]
 [15. 10. 21. 10.]
 [13. 11. 18. 15.]
 [ 4. 18. 24.  5.]
 [15. 20. 11.  2.]
 [ 9. 17. 21.  7.]
 [12. 23. 11.  5.]
 [11. 17. 15.  3.]
 [21.  1.  8. 16.]
 [16.  0. 14. 16.]
 [12.  9. 17. 15.]
 [ 7. 23. 20.  4.]
 [19.  9. 20. 10.]
 [ 7. 13. 11.  6.]
 [ 9. 21. 10.  1.]
 [ 8. 10. 23.  4.]
 [ 9. 21. 14.  2.]
 [11. 10. 31.  6.]
 [ 3. 19. 25.  2.]
 [11. 11. 16.  9.]
 [ 8. 22. 13.  5.]
 [12. 18. 14. 11.]
 [14. 21. 20.  5.]
 [17. 12. 23. 10.]
 [13. 14. 15. 12.]
 [10. 20. 16.  6.]
 [11. 16. 17.  6.]
 [13.  9. 17.  6.]
 [10. 23. 19.  7.]
 [14. 11. 14. 13.]
 [ 6. 20. 17.  5.]
 [12. 20. 20.  2.]
 [12. 25. 18. 14.]
 [13. 15. 13.  9.]
 [12. 21. 12.  2.]
 [13.  8. 14. 19.]
 [ 7.  9. 18. 10.]
 [14.  0. 17. 19.]
 [14.  0. 19. 14.]
 [ 4. 10. 15.  9.]
 [16. 25. 11.  2.]
 [ 9. 15. 19.  0.]
 [12.  9. 15.  3.]
 [ 9. 14. 13.  6.]
 [10. 14. 15.  6.]
 [15. 15. 11.  1.]
 [15. 21. 20.  3.]
 [12. 12. 16. 10.]
 [11. 13. 16.  8.]
 [18. 11. 15.  6.]
 [ 9. 22. 13.  5.]
 [ 9. 12. 10. 18.]
 [14. 12. 17. 12.]
 [13.  9.  8. 14.]
 [12. 10.  9. 15.]
 [19. 20. 15. 13.]
 [11. 26. 17.  1.]
 [ 9. 17.  5.  2.]
 [ 9.  8. 19. 12.]
 [10. 25. 11.  0.]
 [17.  7. 17.  6.]
 [ 9. 22. 19.  0.]
 [13. 12. 19.  9.]
 [15. 10. 11. 14.]
 [10. 13.  8.  7.]
 [ 5. 23. 16.  1.]
 [19. 18. 22. 10.]
 [ 5. 11. 16. 12.]
 [12. 13. 14.  7.]
 [11. 15. 19. 13.]
 [11. 12. 16. 12.]
 [14. 10. 16.  8.]
 [14. 10. 24. 11.]
 [ 5. 10. 14. 12.]
 [ 9.  4. 12. 13.]
 [19.  0. 17. 14.]
 [19. 17. 10.  7.]
 [11. 11. 21.  7.]
 [12.  9. 18.  8.]
 [ 8. 14. 15.  9.]
 [17.  2. 14. 21.]
 [16. 13. 19. 11.]
 [11.  9. 18.  7.]
 [12. 18. 13.  1.]
 [ 1. 11. 22.  5.]
 [11. 20. 18.  3.]
 [12. 20. 25.  4.]
 [ 8. 23. 14.  4.]
 [ 6.  9. 10.  5.]
 [12. 27. 13.  3.]
 [14. 14. 13.  6.]
 [20.  9. 16.  8.]
 [10. 19. 14.  3.]
 [16. 20. 18.  4.]]

As before, we binarize the data. We'll also give the features names so that the binarized feature names below are more intuitive:

[16]:
import pandas as pd

X = pd.DataFrame(X)
X.columns = ["a", "b", "c", "d"]

# Binarize the data
binarizer = BoolXAIKBinsDiscretizer(
    n_bins=2, strategy="quantile", encode="onehot-dense"
)
X_binarized = binarizer.fit_transform(X)
print(X_binarized.shape)
X_binarized.head()
(200, 8)
[16]:
[a<12.0] [a>=12.0] [b<14.0] [b>=14.0] [c<16.0] [c>=16.0] [d<7.0] [d>=7.0]
0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0
1 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0
2 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0
3 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0
4 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0

With the data in hand, we can now use MultiOutputClassifier to implicitly train multiple rule classifiers, one for each label, combined into a single classifier:

[17]:
from sklearn.multioutput import MultiOutputClassifier

from boolxai import BoolXAI

# Instantiate a multilabel rule classifier and fit it
multilabel_rule_classifier = MultiOutputClassifier(
    BoolXAI.RuleClassifier(random_state=43)
)
multilabel_rule_classifier.fit(X_binarized, y);

We can make predictions and calculate scores as usual with the combined classifier. Note, however, that balanced_accuracy_score does not support multilabel classification, so we switch to accuracy_score:

[18]:
from sklearn.metrics import accuracy_score

# Apply Rules
y_predict = multilabel_rule_classifier.predict(X_binarized)
score = accuracy_score(y, y_predict)
print(f"{score=:.2f}")
score=0.38
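
If a per-label view is still of interest, a small sketch like the following scores each output column separately with balanced_accuracy_score (the sub-classifiers inside MultiOutputClassifier are fit one per output column, in order):

# Illustrative sketch: per-label balanced accuracy, one sub-classifier per output column
for i, subclassifier in enumerate(multilabel_rule_classifier.estimators_):
    y_predict_i = subclassifier.predict(X_binarized)
    label_score = balanced_accuracy_score(y[:, i], y_predict_i)
    print(f"label {i}: score={label_score:.2f}")
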
[19]:
# Print the best rule for each of the sub-classifiers inside multilabel_rule_classifier
for subclassifier in multilabel_rule_classifier.estimators_:
    print(subclassifier.best_rule_.to_str(X_binarized.columns))
AtMost1(~[a<12.0], [c<16.0], [a>=12.0], ~[d<7.0])
AtLeast1(Choose1(~[d<7.0], [c<16.0], ~[a<12.0]), ~[d>=7.0])
AtMost1(AtMost1(~[b>=14.0], ~[a<12.0]), [d<7.0])