[{"metadata":{},"cell_type":"markdown","source":"# DTSC670: Foundations of Machine Learning Models\n## Module 3\n## Assignment 6: Classification System Metrics\n\nName:\n\nBegin by writing your name above."},
{"metadata":{},"cell_type":"markdown","source":"# Introduction\n\nThe purpose of this assignment is to familiarize you with the metrics used to measure prediction performance in classification systems. Suppose there are 20 binary observations whose target values are:\n\n$[1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1]$\n\nSuppose that your machine learning model returns prediction probabilities ([predict_proba()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.predict_proba) in sklearn) of:\n\n$[0.886, 0.375, 0.174, 0.817, 0.574, 0.319, 0.812, 0.314, 0.098, 0.741, 0.847, 0.202, 0.31, 0.073, 0.179, 0.917, 0.64, 0.388, 0.116, 0.72]$"},
{"metadata":{},"cell_type":"markdown","source":"# Calculate Model Predictions\n\nBegin by writing a function from scratch called `predict()` that accepts as input a list of prediction probabilities and a threshold value, and computes the final predictions to be output by the model. If a prediction probability value is less than or equal to the threshold value, then the prediction is the negative case (i.e. 0). If a prediction probability value is greater than the threshold value, then the prediction is the positive case (i.e. 1)."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Next, invoke the `predict()` function to calculate the model predictions using the threshold value of 0.5. Create a variable called `thresh` that has the value 0.5, and create a list of all the prediction probabilities listed in the problem statement above. Name the list of provided prediction probabilities `probs`, the threshold value `thresh`, and the list of computed model predictions `preds` in your code."},
{"metadata":{"trusted":true},"cell_type":"code","source":"probs = [0.886,0.375,0.174,0.817,0.574,0.319,0.812,0.314,0.098,0.741,0.847,\n         0.202,0.31,0.073,0.179,0.917,0.64,0.388,0.116,0.72]\nthresh = 0.5\npreds = ### ENTER CODE HERE ###\nprint(\"Model Predictions: \", preds)","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"# Calculate the Model Accuracy\n\nWrite a function from scratch called `acc_score()` that accepts as input a list of true labels and a list of model predictions, and calculates the model accuracy."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
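{"metadata":{},"cell_type":"markdown","source":"For reference, here is one possible sketch of how `predict()` and `acc_score()` could be written using plain Python lists. It is only an illustration of the idea, not the required solution; your own implementation may differ as long as it follows the specifications above.\n\n```python\ndef predict(probabilities, threshold):\n    # Probabilities at or below the threshold map to 0, above it to 1.\n    return [1 if p > threshold else 0 for p in probabilities]\n\ndef acc_score(true_labels, predictions):\n    # Accuracy = number of correct predictions / total number of predictions.\n    correct = sum(1 for t, p in zip(true_labels, predictions) if t == p)\n    return correct / len(true_labels)\n```"},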
{"metadata":{},"cell_type":"markdown","source":"Next, compute the accuracy score using your function `acc_score()`, passing as input the true labels and the model predictions you calculated above. Create a list called `labels` containing the target values listed in the problem statement above."},
{"metadata":{"trusted":true},"cell_type":"code","source":"labels = ### ENTER CODE HERE ###\naccuracy = ### ENTER CODE HERE ###\nprint(\"Model Accuracy: \", accuracy)","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Next, use Scikit-Learn's [accuracy_score()](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) function to check that the value you computed using `acc_score()` is correct."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"# Calculate the Model Error Rate\n\nWrite a function from scratch called `error_rate()` that accepts as input a list of true labels and a list of model predictions, and calculates the model error rate. Your `error_rate()` function should use the `acc_score()` function you previously defined."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Next, compute the model error rate for the true labels and the model predictions. Name the error rate that you calculate `error` in your code."},
{"metadata":{"trusted":true},"cell_type":"code","source":"error = ### ENTER CODE HERE ###\nprint(\"Model Error Rate: \", error)","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"# Calculate the Model Precision and Recall\n\nWrite a function from scratch called `prec_recall_score()` that accepts as input a list of true labels and a list of model predictions. `prec_recall_score()` should compute and return _both_ the model precision and recall. Do not use the built-in Scikit-Learn functions `precision_score()`, `recall_score()`, or `confusion_matrix()`, or Pandas' `crosstab()`, to do this; you may use those functions afterwards to verify your calculations. Writing similar functions from scratch ensures that you understand what is going on behind the scenes of the precision and recall calculations."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Use your `prec_recall_score` function to compute `precision` and `recall` for the true labels and the model predictions you calculated previously."},
{"metadata":{"trusted":true},"cell_type":"code","source":"precision, recall = ### ENTER CODE HERE ###\nprint(\"Precision = \", precision)\nprint(\"Recall = \", recall)","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Next, use Scikit-Learn's `precision_score()` and `recall_score()` to verify that your calculations above are correct:"},
{"metadata":{"trusted":true},"cell_type":"code","source":"# Sklearn Precision Score\n### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{"trusted":true},"cell_type":"code","source":"# Sklearn Recall Score\n### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
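{"metadata":{},"cell_type":"markdown","source":"For reference, one possible sketch of `prec_recall_score()` is shown below. It counts the confusion-matrix entries directly and assumes there is at least one predicted positive and one actual positive (otherwise the divisions would be undefined); it is an illustration, not the required solution.\n\n```python\ndef prec_recall_score(true_labels, predictions):\n    # Count the confusion-matrix entries for the positive class.\n    tp = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 1)\n    fp = sum(1 for t, p in zip(true_labels, predictions) if t == 0 and p == 1)\n    fn = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 0)\n    precision = tp / (tp + fp)  # fraction of predicted positives that are correct\n    recall = tp / (tp + fn)     # fraction of actual positives that are found\n    return precision, recall\n```"},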
{"metadata":{},"cell_type":"markdown","source":"# Calculate $F_\\beta$ Scores\n\nWrite a function from scratch called `f_beta` that computes the $F_\\beta$ measure for any value of $\\beta$. This function must invoke the `prec_recall_score` function you wrote above in order to obtain the values for precision and recall. The function must take as input (in this exact order) the true labels provided, the model predictions you calculated previously, and the value of $\\beta$ you wish to use in the calculation. We defined $F_\\beta$ in class to be:\n\n$ F_\\beta = \\frac{(\\beta^2+1) \\cdot Pr \\cdot Re}{\\beta^2 \\cdot Pr + Re} $"},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Next, use your `f_beta` function to compute the $F_1$ score for the true labels and the model predictions you calculated previously."},
{"metadata":{"trusted":true},"cell_type":"code","source":"F1 = ### ENTER CODE HERE ###\nprint(\"F1 = \", F1)","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Verify that your calculation above is correct by invoking Scikit-Learn's `f1_score` function."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"# Calculate the TPR and FPR for the ROC Curve\n\nIn the subsequent cells, you will be asked to plot an ROC curve. The ROC curve plots the True Positive Rate (TPR, also called recall) against the False Positive Rate (FPR). Both of these are scalar values, akin to precision and recall.\n\nWrite a function from scratch called `TPR_FPR_score`, nearly identical to the `prec_recall_score` function you wrote previously, that computes and returns TPR and FPR.\n\nTPR and FPR are defined as follows:\n\n$ TPR = recall = \\frac{TP}{TP + FN} $\n\n$ FPR = \\frac{FP}{FP + TN} $"},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
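{"metadata":{},"cell_type":"markdown","source":"For reference, possible sketches of `f_beta` and `TPR_FPR_score` are shown below. They assume the `prec_recall_score` sketch above (or your own version) is in scope and that both classes appear in the true labels; again, these are illustrations rather than the required solutions.\n\n```python\ndef f_beta(true_labels, predictions, beta):\n    # Obtain precision and recall, then apply the F-beta formula from class.\n    pr, re = prec_recall_score(true_labels, predictions)\n    return ((beta ** 2 + 1) * pr * re) / (beta ** 2 * pr + re)\n\ndef TPR_FPR_score(true_labels, predictions):\n    # Same counting idea as prec_recall_score, but for TPR and FPR.\n    tp = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 1)\n    fn = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 0)\n    fp = sum(1 for t, p in zip(true_labels, predictions) if t == 0 and p == 1)\n    tn = sum(1 for t, p in zip(true_labels, predictions) if t == 0 and p == 0)\n    tpr = tp / (tp + fn)  # true positive rate (recall)\n    fpr = fp / (fp + tn)  # false positive rate (fall-out)\n    return tpr, fpr\n```"},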
{"metadata":{},"cell_type":"markdown","source":"# Compute and Plot the ROC Curve\n\nWrite a function from scratch called `roc_curve_computer` that accepts as input (in this exact order) the true labels and prediction probabilities provided in the problem statement, as well as a list of threshold values. The function must compute and return the True Positive Rate (TPR, also called recall) and the False Positive Rate (FPR) (these are both scalar values) for each threshold value in the list that is passed to the function.\n\nThe TPR will be plotted against the FPR in what is called the Receiver Operating Characteristic (ROC) curve. Your task now is to create the plot of the [ROC](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html).\n\nThe function you will write behaves identically to Scikit-Learn's [roc_curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve) function, except that it takes the list of thresholds as input rather than returning them as output. Your function must calculate one value of TPR and one value of FPR for each of the threshold values in the list. You will then plot these TPR and FPR values against each other to create the ROC curve.\n\nYou must not use any built-in library function to perform the calculation of a performance metric. You may of course use common, built-in Python functions such as `range()`, `len()`, et cetera.\n\nBe sure to reuse functions and code segments from your work above!\n\nAs an example, calling the `roc_curve_computer` function with the input `true_labels = [1, 0, 1, 0, 0]`, `pred_probs = [0.875, 0.325, 0.6, 0.09, 0.4]`, and `thresholds = [0.00, 0.25, 0.50, 0.75, 1.00]` yields the output `TPR = [1.0, 1.0, 1.0, 0.5, 0.0]` and `FPR = [1.0, 0.6666, 0.0, 0.0, 0.0]`."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
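{"metadata":{},"cell_type":"markdown","source":"For reference, one possible sketch of `roc_curve_computer` is shown below. It simply reuses the `predict()` and `TPR_FPR_score()` sketches above (or your own versions) once per threshold, and assumes both classes appear in the true labels; it is an illustration, not the required solution.\n\n```python\ndef roc_curve_computer(true_labels, pred_probs, thresholds):\n    # Collect one (TPR, FPR) pair per threshold value.\n    tpr_list, fpr_list = [], []\n    for threshold in thresholds:\n        predictions = predict(pred_probs, threshold)\n        tpr, fpr = TPR_FPR_score(true_labels, predictions)\n        tpr_list.append(tpr)\n        fpr_list.append(fpr)\n    return tpr_list, fpr_list\n```"},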
{"metadata":{},"cell_type":"markdown","source":"Next, use your `roc_curve_computer` function along with the threshold values `thresholds = [x/100 for x in range(101)]` to compute the TPR and FPR lists for the provided data."},
{"metadata":{"trusted":true},"cell_type":"code","source":"thresholds = [x/100 for x in range(101)]\nTPR, FPR = roc_curve_computer(labels, probs, thresholds)","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Use the following function to plot the ROC curve. Pass the FPR and TPR that you calculated above as input to the function."},
{"metadata":{"trusted":true},"cell_type":"code","source":"import matplotlib.pyplot as plt\n\ndef plot_roc_curve(fpr, tpr, label=None):\n    plt.plot(fpr, tpr, linewidth=2, label=label)\n    plt.plot([0, 1], [0, 1], 'k--')  # dashed diagonal line\n    plt.title('Receiver Operating Characteristic', fontsize=12)\n    plt.axis([-0.015, 1.0, 0, 1.02])\n    plt.xlabel('False Positive Rate (Fall-Out)', fontsize=12)\n    plt.ylabel('True Positive Rate (Recall)', fontsize=12)\n    plt.grid(True)\n\nplt.figure(figsize=(6, 4))\nplot_roc_curve(FPR, TPR)\nplt.show()","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Next, compare your plot to the plot generated by Scikit-Learn's `roc_curve` function. Use Scikit-Learn's `roc_curve` function to calculate the false positive rates and the true positive rates."},
{"metadata":{"trusted":true},"cell_type":"code","source":"### ENTER CODE HERE ###","execution_count":null,"outputs":[]},
{"metadata":{},"cell_type":"markdown","source":"Pass the false positive rates and the true positive rates obtained above via the Scikit-Learn function as input to the `plot_roc_curve` function in order to compare the ROC curves:"},
{"metadata":{"trusted":true},"cell_type":"code","source":"plt.figure(figsize=(6, 4))\nplot_roc_curve(fpr, tpr)\nplt.show()","execution_count":null,"outputs":[]},
{"metadata":{"trusted":true},"cell_type":"code","source":"","execution_count":null,"outputs":[]}]