The precision is the ratio tp/(tp+fp) where tp is the number of true positives and fp the number
of false positives. The precision is intuitively the ability of the classifier not to label as positive a
sample that is negative.
The best value is 1 and the worst value is 0.
Parameters:
actual (Union[List, np.ndarray, pd.Series]) – Binary ground truth (correct) labels (0/1).
predicted (Union[List, np.ndarray, pd.Series]) – Binary predicted labels, as returned by a classifier (0/1).
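As a quick illustration, a minimal sketch of the precision computation with made-up arrays (not the library's implementation):

import numpy as np

actual = np.array([1, 0, 1, 1, 0])     # made-up ground truth labels
predicted = np.array([1, 0, 0, 1, 1])  # made-up predicted labels

tp = np.sum((actual == 1) & (predicted == 1))  # true positives = 2
fp = np.sum((actual == 0) & (predicted == 1))  # false positives = 1
precision = tp / (tp + fp)                     # 2 / 3 ≈ 0.667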
The recall is the ratio tp/(tp+fn) where tp is the number of true positives and fn the number of
false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
The best value is 1 and the worst value is 0.
Parameters:
actual (Union[List, np.ndarray, pd.Series]) – Binary ground truth (correct) labels (0/1).
predicted (Union[List, np.ndarray, pd.Series]) – Binary predicted labels, as returned by a classifier (0/1).
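The same made-up arrays illustrate recall (again a minimal sketch, not the library's implementation):

import numpy as np

actual = np.array([1, 0, 1, 1, 0])     # made-up ground truth labels
predicted = np.array([1, 0, 0, 1, 1])  # made-up predicted labels

tp = np.sum((actual == 1) & (predicted == 1))  # true positives = 2
fn = np.sum((actual == 1) & (predicted == 0))  # false negatives = 1
recall = tp / (tp + fn)                        # 2 / 3 ≈ 0.667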
If predictions within ANY group are homogeneous, some of the performance measures (such as TPR, TNR, FPR, FNR) cannot be calculated; in this case, NaN is returned. A guard of this kind is sketched after the parameters below.
Parameters:
labels (Union[List, np.ndarray, pd.Series]) – Binary ground truth labels for the provided dataset (0/1).
predictions (Union[List, np.ndarray, pd.Series]) – Binary predictions from some black-box classifier (0/1).
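The sketch below (not the library's implementation) shows how such a guard yields NaN for an undefined group-level rate; group_mask is a hypothetical boolean selector for one protected group:

import numpy as np

def safe_rate(numerator, denominator):
    # Guarded division: an undefined rate is reported as NaN
    return numerator / denominator if denominator > 0 else float('nan')

def group_tpr(labels, predictions, group_mask):
    # TPR within one protected group; NaN if the group contains no positives
    y = np.asarray(labels)[group_mask]
    p = np.asarray(predictions)[group_mask]
    tp = np.sum((y == 1) & (p == 1))
    fn = np.sum((y == 1) & (p == 0))
    return safe_rate(tp, tp + fn)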
The equality (or lack thereof) of the false negative rates across groups is an important fairness metric.
In practice, this metric is implemented as a difference between the metric value for group 1 and group 2.
\[E[d(X)=0 \mid Y=1, g(X)] = E[d(X)=0 \mid Y=1]\]
Parameters:
labels (Union[List, np.ndarray, pd.Series]) – Binary ground truth labels for the provided dataset (0/1).
predictions (Union[List, np.ndarray, pd.Series]) – Binary predictions from some black-box classifier (0/1).
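As an illustrative sketch of the group difference (is_group_1 is a hypothetical boolean membership array, not part of the signature above; the same pattern applies to the other group-difference metrics below):

import numpy as np

def fnr(labels, predictions):
    # False negative rate: FN / (FN + TP); NaN if there are no positives
    fn = np.sum((labels == 1) & (predictions == 0))
    tp = np.sum((labels == 1) & (predictions == 1))
    return fn / (fn + tp) if (fn + tp) > 0 else float('nan')

def fnr_difference(labels, predictions, is_group_1):
    # Difference between the FNR of group 1 and the FNR of group 2
    labels, predictions = np.asarray(labels), np.asarray(predictions)
    return (fnr(labels[is_group_1], predictions[is_group_1])
            - fnr(labels[~is_group_1], predictions[~is_group_1]))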
The equality (or lack thereof) of the false omission rates across groups is an important fairness metric.
In practice, this metric is implemented as the difference between the ratio of false negatives to negative examples
in the data set for group 1 and the same ratio for group 2:
\[FOR = \frac{FN}{N}\]
conditioned on the protected attribute.
Parameters:
labels (Union[List, np.ndarray, pd.Series]) – Binary ground truth labels for the provided dataset (0/1).
predictions (Union[List, np.ndarray, pd.Series]) – Binary predictions from some black-box classifier (0/1).
We define predictive equality as the situation in which the accuracy of decisions is equal across race groups,
as measured by the false positive rate (FPR).
Drawing an analogy with gender classification where race is the protected attribute: across all race groups,
the ratio of men incorrectly predicted to be women is the same.
More formally,
\[E[d(X) \mid Y=0, g(X)] = E[d(X) \mid Y=0]\]
Parameters:
labels (Union[List, np.ndarray, pd.Series]) – Binary ground truth labels for the provided dataset (0/1).
predictions (Union[List, np.ndarray, pd.Series]) – Binary predictions from some black-box classifier (0/1).
Idea: Imagine two groups have different ROC curves.
Find the convex hull such that any FPR, TPR pair can be satisfied by either
protected-group-conditional predictor. This might not be possible without randomization [4].
The output of this optimization is a tuple of four probabilities of flipping the likelihood
of a positive prediction to achieve equal FPR & TPR across two groups. We can then apply
these learned mixing rates on new unseen data to achieve fairer distributions of outcomes.
Parameters:
labels (Union[List, np.ndarray, pd.Series]) – Binary ground truth labels for the provided dataset (0/1).
predictions (Union[List, np.ndarray, pd.Series]) – Binary predictions from some black-box classifier (0/1).
likelihoods (Union[List, np.ndarray, pd.Series]) – Scores between 0 and 1 from some black-box classifier.
Apply fairness probabilistic mixing rates to a new dataset.
The idea here is to probabilistically flip a subset of likelihoods and labels in each group
based on learned mixing rates so that we achieve a fairer distribution of outcomes, as sketched below.
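A minimal sketch of the flipping step on binary predictions, assuming per-group rates learned elsewhere (the function name, signature, and rate semantics are illustrative, not the library's API):

import numpy as np

rng = np.random.default_rng(seed=42)

def apply_mixing_rates(predictions, p_pos_to_neg, p_neg_to_pos):
    # Probabilistically flip a subset of one group's binary predictions
    preds = np.asarray(predictions).copy()
    u = rng.random(len(preds))
    flip_down = (preds == 1) & (u < p_pos_to_neg)  # positive -> negative
    flip_up = (preds == 0) & (u < p_neg_to_pos)    # negative -> positive
    preds[flip_down] = 0
    preds[flip_up] = 1
    return preds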
There is a trade-off between the fairness and the accuracy of a classifier. In general, repairing fairness metrics
results in lower accuracy, but the relationship is non-linear and data-dependent.
Parameters:
predictions (Union[List, np.ndarray, pd.Series]) – Binary predictions from some black-box classifier (0/1).
likelihoods (Union[List, np.ndarray, pd.Series]) – Scores between 0 and 1 from some black-box classifier.
Calculates the AUC using a direct matching method. That is, AUC is calculated for instances where the
actual item the user has seen matches one of the top-k recommendations.
Calculating the extended results for the whole data:
This assumes you are operating on the full data and you want to get the auxiliary information such as the
support in addition to the metric. The information returned depends on the metric. Returns dict.
Calculating the metric across multiple batches:
This assumes that you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes that you want to get the metric by itself. Returns Tuple[float, float].
for actual_responses_batch, recommendations_batch in ..
    auc_batch, auc_acc = auc.get_score(actual_responses_batch, recommendations_batch, accumulate=True)
    print(f'AUC for this batch: {auc_batch} Overall AUC: {auc_acc}')
>>> AUC for this batch: 0.65 Overall AUC: 0.68
Calculating the extended results across multiple batches:
This assumes you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes you want to get the auxiliary information such as the support in addition to the metric.
The information returned depends on the metric. Returns Tuple[dict,dict].
for actual_responses_batch, recommendations_batch in ..
    auc_batch, auc_acc = auc.get_score(actual_responses_batch, recommendations_batch, accumulate=True, return_extended_results=True)
    print(f'AUC for this batch: {auc_batch} Overall AUC: {auc_acc}')
>>> AUC for this batch: {'auc': 0.65, 'support': 12} Overall AUC: {'auc': 0.68, 'support': 122}
Parameters:
actual_results (pd.DataFrame) – A pandas DataFrame for the ground truth user item interaction data, captured from historical logs.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – If specified, this parameter allows you to pass in minibatches of results and accumulate the metric
correctly across the batches. This reduces the memory footprint and integrates easily with batched
training. If specified, the get_score function will return a tuple of batch results and accumulated
results.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. AUC currently returns auc and the support used to calculate AUC.
Returns:
metric – The averaged result(s). The return type is determined by the batch_accumulate and
return_extended_results parameters. See the examples above.
1. Matching
Calculates the CTR using a direct matching method. That is, CTR is only calculated for instances where the
actual item the user has seen matches the recommendation.
2. Inverse Propensity Score (IPS)
Calculates the IPS, an estimate of CTR with a weighted correction based on how likely an item was to be recommended
by the historic policy if the user saw the item in the historic data.
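As a sketch, the standard IPS estimator from Dudík et al. (cited below) takes the following form; the library's exact implementation may differ:
\[IPS = \frac{1}{n} \sum_{i=1}^{n} r_a \frac{I(\hat{a} = a)}{p(a|x,h)}\]
Here \(n\) is the size of the test data, \(r_a\) is the observed reward, and: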
\(I(\hat{a} = a)\) is an indicator of whether the user-item pair appears in the historic data
\(p(a|x,h)\) is the probability of the item being recommended for the test context given the historic data
3. Doubly Robust Estimation (DR)
Calculates the DR, an estimate of CTR that combines the directly predicted values with a correction based on how
likely an item was to be recommended by the historic policy if the user saw the item in the historic data.
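Following Dudík et al. (cited below), the standard doubly robust estimator can be sketched as follows, where \(\hat{r}_a\) is the directly predicted reward for the user-item pair; the library's exact implementation may differ:
\[DR = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{r}_a + (r_a - \hat{r}_a) \frac{I(\hat{a} = a)}{p(a|x,h)} \right]\]
In this estimator: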
\(I(\hat{a} = a)\) is an indicator of whether the user-item pair appears in the historic data
\(p(a|x,h)\) is the probability of the item being recommended for the test context given the historic data
At a high level, doubly robust estimation combines a direct estimate with an IPS-like correction if historic data is
available. If historic data is not available, the second term is 0 and only the predicted reward is used for the
user-item pair.
IPS and DR implementations are based on: Dudík, Miroslav, John Langford, and Lihong Li.
“Doubly robust policy evaluation and learning.” Proceedings of the 28th International Conference on International
Conference on Machine Learning. 2011. Available as arXiv preprint arXiv:1103.4601
Calculating the extended results for the whole data:
This assumes you are operating on the full data and you want to get the auxiliary information such as the
support in addition to the metric. The information returned depends on the metric. Returns dict.
Calculating the metric across multiple batches:
This assumes that you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes that you want to get the metric by itself. Returns Tuple[float, float].
for actual_responses_batch, recommendations_batch in ..
    ctr_batch, ctr_acc = ctr.get_score(actual_responses_batch, recommendations_batch, accumulate=True)
    print(f'CTR for this batch: {ctr_batch} Overall CTR: {ctr_acc}')
>>> CTR for this batch: 0.453 Overall CTR: 0.316
Calculating the extended results across multiple batches:
This assumes you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes you want to get the auxiliary information such as the support in addition to the metric.
The information returned depends on the metric. Returns Tuple[dict,dict].
for actual_responses_batch, recommendations_batch in ..
    ctr_batch, ctr_acc = ctr.get_score(actual_responses_batch, recommendations_batch, accumulate=True, return_extended_results=True)
    print(f'CTR for this batch: {ctr_batch} Overall CTR: {ctr_acc}')
>>> CTR for this batch: {'ctr': 0.453, 'support': 12} Overall CTR: {'ctr': 0.316, 'support': 122}
Parameters:
actual_results (pd.DataFrame) – A pandas DataFrame for the ground truth user item interaction data, captured from historical logs.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – If specified, this parameter allows you to pass in minibatches of results and accumulate the metric
correctly across the batches. This reduces the memory footprint and integrates easily with batched
training. If specified, the get_score function will return a tuple of batch results and accumulated
results.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. CTR currently returns ctr and the support used to calculate CTR.
Returns:
metric – The averaged result(s). The return type is determined by the batch_accumulate and
return_extended_results parameters. See the examples above.
Calculating the extended results for the whole data:
This assumes you are operating on the full data and you want to get the auxiliary information such as the
support in addition to the metric. The information returned depends on the metric. Returns dict.
Calculating the metric across multiple batches:
This assumes that you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes that you want to get the metric by itself. Returns Tuple[float, float].
for actual_responses_batch, recommendations_batch in ..
    map_batch, map_acc = map_metric.get_score(actual_responses_batch, recommendations_batch, accumulate=True)
    print(f'MAP for this batch: {map_batch} Overall MAP: {map_acc}')
>>> MAP for this batch: 0.453 Overall MAP: 0.316
Calculating the extended results across multiple batches:
This assumes you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes you want to get the auxiliary information such as the support in addition to the metric.
The information returned depends on the metric. Returns Tuple[dict,dict].
for actual_responses_batch, recommendations_batch in ..
    map_batch, map_acc = map_metric.get_score(actual_responses_batch, recommendations_batch, accumulate=True, return_extended_results=True)
    print(f'MAP for this batch: {map_batch} Overall MAP: {map_acc}')
>>> MAP for this batch: {'map': 0.453, 'support': 12} Overall MAP: {'map': 0.316, 'support': 122}
Parameters:
actual_results (pd.DataFrame) – A pandas DataFrame for the ground truth user item interaction data, captured from historical logs.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – If specified, this parameter allows you to pass in minibatches of results and accumulate the metric
correctly across the batches. This reduces the memory footprint and integrates easily with batched
training. If specified, the get_score function will return a tuple of batch results and accumulated
results.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. MAP currently returns map and the support used to calculate MAP.
Returns:
metric – The averaged result(s). The return type is determined by the batch_accumulate and
return_extended_results parameters. See the examples above.
NDCG measures the ranking of the relevant items with a non-linear, discounted (log2) score per rank. NDCG is
normalized such that the scores are between 0 and 1.
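As a sketch of the standard formulation (the library's exact variant may differ), where \(rel_i\) is the relevance of the item at rank \(i\) and \(IDCG@k\) is the \(DCG@k\) of the ideal ranking:
\[DCG@k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}, \qquad NDCG@k = \frac{DCG@k}{IDCG@k}\]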
Calculating the extended results for the whole data:
This assumes you are operating on the full data and you want to get the auxiliary information such as the
support in addition to the metric. The information returned depends on the metric. Returns dict.
Calculating the metric across multiple batches:
This assumes that you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes that you want to get the metric by itself. Returns Tuple[float, float].
for actual_responses_batch, recommendations_batch in ..
    ndcg_batch, ndcg_acc = ndcg.get_score(actual_responses_batch, recommendations_batch, accumulate=True)
    print(f'NDCG for this batch: {ndcg_batch} Overall NDCG: {ndcg_acc}')
>>> NDCG for this batch: 0.453 Overall NDCG: 0.316
Calculating the extended results across multiple batches:
This assumes you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes you want to get the auxiliary information such as the support in addition to the metric.
The information returned depends on the metric. Returns Tuple[dict,dict].
for actual_responses_batch, recommendations_batch in ..
    ndcg_batch, ndcg_acc = ndcg.get_score(actual_responses_batch, recommendations_batch, accumulate=True, return_extended_results=True)
    print(f'NDCG for this batch: {ndcg_batch} Overall NDCG: {ndcg_acc}')
>>> NDCG for this batch: {'ndcg': 0.453, 'support': 12} Overall NDCG: {'ndcg': 0.316, 'support': 122}
Parameters:
actual_results (pd.DataFrame) – A pandas DataFrame for the ground truth user item interaction data, captured from historical logs.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – If specified, this parameter allows you to pass in minibatches of results and accumulate the metric
correctly across the batches. This reduces the memory footprint and integrates easily with batched
training. If specified, the get_score function will return a tuple of batch results and accumulated
results.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. NDCG currently returns ndcg and the support used to calculate NDCG.
Returns:
metric – The averaged result(s). The return type is determined by the batch_accumulate and
return_extended_results parameters. See the examples above.
Precision@k measures the precision of the recommendations when only k recommendations are made to the user. That is,
it measures the proportion of the top-k recommended items that are relevant.
\[Precision@k = \frac{1}{\left | A \cap P \right |}\sum_{i=1}^{\left | A \cap P \right |} \frac{\left | A_i \cap P_i[1:k] \right |}{\left | P_i[1:k] \right |}\]
Calculating the extended results for the whole data:
This assumes you are operating on the full data and you want to get the auxiliary information such as the
support in addition to the metric. The information returned depends on the metric. Returns dict.
Calculating the metric across multiple batches:
This assumes that you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes that you want to get the metric by itself. Returns Tuple[float, float].
for actual_responses_batch, recommendations_batch in ..
    precision_batch, precision_acc = precision.get_score(actual_responses_batch, recommendations_batch, accumulate=True)
    print(f'Precision for this batch: {precision_batch} Overall Precision: {precision_acc}')
>>> Precision for this batch: 0.453 Overall Precision: 0.316
Calculating the extended results across multiple batches:
This assumes you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes you want to get the auxiliary information such as the support in addition to the metric.
The information returned depends on the metric. Returns Tuple[dict,dict].
for actual_responses_batch, recommendations_batch in ..
    precision_batch, precision_acc = precision.get_score(actual_responses_batch, recommendations_batch, accumulate=True, return_extended_results=True)
    print(f'Precision for this batch: {precision_batch} Overall Precision: {precision_acc}')
>>> Precision for this batch: {'precision': 0.453, 'support': 12} Overall Precision: {'precision': 0.316, 'support': 122}
Parameters:
actual_results (pd.DataFrame) – A pandas DataFrame for the ground truth user item interaction data, captured from historical logs.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – If specified, this parameter allows you to pass in minibatches of results and accumulate the metric
correctly across the batches. This reduces the memory footprint and integrates easily with batched
training. If specified, the get_score function will return a tuple of batch results and accumulated
results.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. Precision currently returns precision and the support used to calculate
Precision.
Returns:
metric – The averaged result(s). The return type is determined by the batch_accumulate and
return_extended_results parameters. See the examples above.
Recall@k measures the recall of the recommendations when only k recommendations are made to the user. That is,
it measures the proportion of relevant items that appear among the top-k recommendations.
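By analogy with the Precision@k formula above, a plausible formulation is the following (this exact form is an assumption, not taken from the implementation), with \(A_i\) the set of relevant items and \(P_i[1:k]\) the top-k recommendations for user \(i\):
\[Recall@k = \frac{1}{\left | A \cap P \right |}\sum_{i=1}^{\left | A \cap P \right |} \frac{\left | A_i \cap P_i[1:k] \right |}{\left | A_i \right |}\]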
Calculating the extended results for the whole data:
This assumes you are operating on the full data and you want to get the auxiliary information such as the
support in addition to the metric. The information returned depends on the metric. Returns dict.
Calculating the metric across multiple batches:
This assumes that you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes that you want to get the metric by itself. Returns Tuple[float, float].
for actual_responses_batch, recommendations_batch in ..
    recall_batch, recall_acc = recall.get_score(actual_responses_batch, recommendations_batch, accumulate=True)
    print(f'Recall for this batch: {recall_batch} Overall Recall: {recall_acc}')
>>> Recall for this batch: 0.453 Overall Recall: 0.316
Calculating the extended results across multiple batches:
This assumes you are operating on batched data, and will therefore call this method multiple times for each
batch. It also assumes you want to get the auxiliary information such as the support in addition to the metric.
The information returned depends on the metric. Returns Tuple[dict,dict].
for actual_responses_batch, recommendations_batch in ..
    recall_batch, recall_acc = recall.get_score(actual_responses_batch, recommendations_batch, accumulate=True, return_extended_results=True)
    print(f'Recall for this batch: {recall_batch} Overall Recall: {recall_acc}')
>>> Recall for this batch: {'recall': 0.453, 'support': 12} Overall Recall: {'recall': 0.316, 'support': 122}
Parameters:
actual_results (pd.DataFrame) – A pandas DataFrame for the ground truth user item interaction data, captured from historical logs.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – If specified, this parameter allows you to pass in minibatches of results and accumulate the metric
correctly across the batches. This reduces the memory footprint and integrates easily with batched
training. If specified, the get_score function will return a tuple of batch results and accumulated
results.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. Recall currently returns recall and the support used to calculate
Recall.
Returns:
metric – The averaged result(s). The return type is determined by the batch_accumulate and
return_extended_results parameters. See the examples above.
Inter-List Diversity@k measures the inter-list diversity of the recommendations when only k recommendations are
made to the user. It measures how users' lists of recommendations differ from each other. This metric has a
range in \([0, 1]\). The higher this metric is, the more diversified lists of items are recommended to different
users. Let \(U\) denote the set of \(N\) unique users, \(u_i\), \(u_j \in U\) denote the i-th and
j-th user in the user set, \(i, j \in \{1,2,\cdots,N\}\). \(R_{u_i}\) is the binary indicator vector
representing provided recommendations for \(u_i\). \(I\) is the set of all unique user pairs,
\(\forall~i<j, \{u_i, u_j\} \in I\).
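Based on these definitions, the metric can be sketched as the average pairwise cosine distance over user pairs (this formulation is inferred from the description above; the exact implementation may differ):
\[Inter\text{-}List\ Diversity@k = \frac{1}{|I|} \sum_{\{u_i, u_j\} \in I} \mathrm{cosine\_distance}(R_{u_i}, R_{u_j})\]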
By default, the reported metric is averaged over num_runs (default=10) evaluations, each using a sample of
user_sample_size (default=10000) users, to reduce the computational cost while closely approximating the
metric. When user_sample_size=None, all users are used in the evaluation.
Parameters:
actual_results (Ignored.) – Ignored for calculating Inter-List Diversity; kept to make the API consistent across
recommender metrics.
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – Should not be True for calculating Inter-List Diversity; kept to make the API consistent
across recommender metrics.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. Inter-List Diversity currently returns Inter-ListDiversity and
the support, which is the number of unique users used to calculate it.
Returns:
metric – The averaged result(s). The return type is determined by the return_extended_results parameter.
Intra-List Diversity@k measures the intra-list diversity of the recommendations when only k recommendations are
made to the user. Given a list of items recommended to one user and the item features, the average pairwise cosine
distance between the items is calculated. The results from all users are then averaged to give the metric Intra-List Diversity@k.
This metric has a range in \([0, 1]\). The higher this metric is, the more diversified
items are recommended to each user. Let \(U\) denote the set of \(N\) unique users, \(u_i\) denote
the i-th user in the user set, \(i \in \{1,2,\cdots,N\}\). \(v_p^{u_i}\), \(v_q^{u_i}\) are the
item features of the p-th and q-th item in the list of items recommended to \(u_i\),
\(p, q \in \{0,1,\cdots,k-1\}\). \(I^{u_i}\) is the set of all unique pairs of item indices for \(u_i\),
\(\forall~p<q, \{p, q\} \in I^{u_i}\).
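Based on these definitions, the per-user value can be sketched as the average pairwise cosine distance between the recommended items' feature vectors, which is then averaged over users (this formulation is inferred from the description above; the exact implementation may differ):
\[Intra\text{-}List\ Diversity@k(u_i) = \frac{1}{|I^{u_i}|} \sum_{\{p, q\} \in I^{u_i}} \mathrm{cosine\_distance}(v_p^{u_i}, v_q^{u_i})\]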
By default, the reported metric is averaged over num_runs (default=10) evaluations, each using a sample of
user_sample_size (default=10000) users, to reduce the computational cost while closely approximating the
metric. When user_sample_size=None, all users are used in the evaluation.
Parameters:
predicted_results (pd.DataFrame) – A pandas DataFrame for the recommended user item interaction data, captured from a recommendation algorithm.
The DataFrame should contain a minimum of two columns, including self._user_id_column, self._item_id_column,
and anything else the metric may need. Each row contains the interaction of one user with one item, and the
scores associated with this interaction. There can be multiple interactions per user, and there can be
multiple users per DataFrame. However, the interactions for a specific user must be contained within a
single DataFrame.
batch_accumulate (bool) – Should not be True for calculating Intra-List Diversity; kept to make the API consistent
across recommender metrics.
return_extended_results (bool) – Whether the extended results such as the support should also be returned. If specified, the returned results
will be of type dict. Intra-List Diversity currently returns Intra-ListDiversity and
the support, which is the number of unique users used to calculate it.
Returns:
metric – The averaged result(s). The return type is determined by the return_extended_results parameter.