The Matthews correlation coefficient (MCC)
The Matthews correlation coefficient (MCC) is a metric used to evaluate the performance of a binary classification model. It is commonly used when the classes are imbalanced, meaning that there is a significant difference in the number of samples in each class.
The MCC is calculated using the following formula:
MCC = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
The MCC ranges from -1 to 1, where a value of 1 indicates a perfect prediction, a value of 0 indicates a prediction no better than random, and a value of -1 indicates total disagreement between the predictions and the true labels (a perfectly inverted prediction).
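To make the formula concrete, here is a small worked example with hypothetical counts from an imbalanced dataset (90 negatives, 10 positives); the numbers are illustrative only:

```python
import math

# Hypothetical confusion-matrix counts on an imbalanced dataset
tp, tn, fp, fn = 8, 85, 5, 2

numerator = tp * tn - fp * fn                                      # 8*85 - 5*2 = 670
denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = numerator / denominator
print(round(mcc, 3))  # -> 0.664
```

Even though the classifier gets most samples right, the MCC stays well below 1 because it weighs all four cells of the confusion matrix, not just overall accuracy.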
To implement the MCC in Python, you can use the following code:
```python
import numpy as np

def mcc(y_true, y_pred):
    # Accept lists or arrays of 0/1 labels
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventionally reported as 0 when any marginal sum is zero
    return (tp * tn - fp * fn) / denominator
```
Here, y_true and y_pred are the true labels and predicted labels for the data, respectively. The function counts the true and false positives and negatives, then applies the formula.
It is important to note that the MCC is undefined when any row or column of the confusion matrix sums to zero, since the denominator becomes zero; implementations commonly return 0 in that case. No single number tells the whole story, so it is always a good idea to evaluate a model using a variety of metrics to get a well-rounded understanding of its performance.
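The degenerate case is easy to hit on imbalanced data. In this hypothetical sketch, a baseline that always predicts the majority class scores 95% accuracy, yet its MCC denominator is zero because it never predicts the positive class:

```python
# Hypothetical majority-class baseline: 95 negatives, 5 positives,
# and the model predicts 0 for every sample.
tp, tn, fp, fn = 0, 95, 0, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
denominator = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
print(accuracy)     # 0.95 -- looks strong despite a useless model
print(denominator)  # 0 -- MCC undefined, conventionally reported as 0
```

This is why accuracy alone can be misleading here, and why pairing it with the MCC (and other metrics) gives a fuller picture.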