How do I calculate the accuracy rate of predicting “Fail”? Am I supposed to create a confusion matrix?

Question: ABC Open University has a Teaching and Learning Analytics Unit (TLAU) which aims to provide information for data-driven and evidence-based decision making in both teaching and learning in the university. One of the current projects in TLAU is to analyse student data and give advice on how to improve students’ learning performance. The analytics team for this project has collected over 10,000 records of students who have completed a compulsory course ABC411 from 2014 to 2019.


Strictly speaking, calculating accuracy doesn't require the details of a confusion matrix: it's simply the proportion of correct predictions.

Since there are 4 possible classes in this exercise and we are interested only in the accuracy for the class 'fail', the 3 other classes are treated as a single class, 'not fail'.

So, to obtain the accuracy for 'fail', add up:

  • the number of students predicted as 'fail' who truly fail (True Positive cases)
  • the number of students predicted as 'not fail' who truly don't fail (True Negative cases)

And then divide by the total number of students.
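In other words, accuracy = (TP + TN) / N, where N is the total number of students. A minimal Python sketch of this one-vs-rest calculation, assuming you have the actual and predicted labels as two equal-length lists of strings. The function name `fail_accuracy` and the class names other than 'fail' and 'withdrawn' are hypothetical, chosen only for illustration:

```python
def fail_accuracy(actual, predicted, positive="fail"):
    """Accuracy of the 'fail' vs 'not fail' view of multiclass predictions.

    Every label other than `positive` is treated as a single 'not fail' class.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one instance")

    tp = tn = 0
    for a, p in zip(actual, predicted):
        actual_fail = a == positive
        predicted_fail = p == positive
        if actual_fail and predicted_fail:
            tp += 1  # predicted 'fail' and truly 'fail'
        elif not actual_fail and not predicted_fail:
            tn += 1  # predicted some other class, truly some other class
    # FPs and FNs are not counted as correct, but they are part of the total
    return (tp + tn) / len(actual)


# Hypothetical labels for illustration only
actual = ["fail", "pass", "withdrawn", "fail", "pass"]
predicted = ["fail", "withdrawn", "withdrawn", "pass", "pass"]
print(fail_accuracy(actual, predicted))  # 1 TP + 3 TN out of 5 -> 0.8
```

Note that predicting 'withdrawn' for a student who actually passed still counts as a True Negative here, because both labels fall into 'not fail'.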


Edit, to answer a comment:

The decision tree (DT) shows, for every node, the proportion of instances in each class within the subset of data that reaches that node through the conditions above it (see a short explanation about DTs here).

Instances are predicted only at leaf nodes, i.e. nodes with no children, and each leaf simply assigns its majority class. For example, take the leaf node "studied_credits>=82.500" (just below the root): its majority class is 'withdrawn', so all 5565 instances in this leaf are predicted 'withdrawn', which counts as 'not fail' for our purpose. Of these, 1120 are actually 'fail', so this leaf contributes 5565 - 1120 = 4445 TNs and 0 TPs (plus 1120 FNs, which are not added to the numerator of the accuracy but are still part of the total).
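The per-leaf bookkeeping can be sketched in Python as follows. The helper `leaf_counts` is hypothetical: it assumes you read three values off each leaf of the tree (its predicted, i.e. majority, class, its number of instances, and how many of those are actually 'fail'). The only real numbers used are the ones from the example leaf above:

```python
def leaf_counts(predicted_class, n_instances, n_fail, positive="fail"):
    """Return (TP, TN, FN, FP) contributed by one leaf, 'fail' vs 'not fail'.

    predicted_class: majority class of the leaf; every instance in it gets this label
    n_instances:     number of instances that end up in the leaf
    n_fail:          how many of those instances are actually 'fail'
    """
    n_not_fail = n_instances - n_fail
    if predicted_class == positive:
        # Whole leaf predicted 'fail': true fails are TPs, the rest are FPs
        return n_fail, 0, 0, n_not_fail
    # Whole leaf predicted 'not fail': true non-fails are TNs, true fails are FNs
    return 0, n_not_fail, n_fail, 0


# The example leaf "studied_credits>=82.500", predicted 'withdrawn'
print(leaf_counts("withdrawn", 5565, 1120))  # (0, 4445, 1120, 0)
```

If your tree displays class proportions rather than counts, the number of 'fail' instances in a leaf is approximately the 'fail' proportion multiplied by the leaf size, rounded to the nearest integer.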

By doing this for every leaf node you should obtain the total number of TPs and TNs. The total number of instances is given in the root node, it's 15370.
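Summing over all leaves and dividing by the root total can be sketched like this. The function `tree_fail_accuracy` and the leaf tuples in the usage example are made up for illustration; the full set of leaf counts from the actual tree is not reproduced here, so you would substitute your own leaves and `total=15370`:

```python
def tree_fail_accuracy(leaves, total, positive="fail"):
    """Accuracy for 'fail' vs 'not fail' from a decision tree's leaves.

    leaves: iterable of (predicted_class, n_instances, n_fail) tuples, one per leaf
    total:  number of instances in the root node
    """
    tp = tn = covered = 0
    for predicted_class, n_instances, n_fail in leaves:
        covered += n_instances
        if predicted_class == positive:
            tp += n_fail                 # leaf predicts 'fail'
        else:
            tn += n_instances - n_fail   # leaf predicts 'not fail'
    # Every instance lands in exactly one leaf, so the leaves must add up to the root
    if covered != total:
        raise ValueError(f"leaves cover {covered} instances, root has {total}")
    return (tp + tn) / total


# Hypothetical 3-leaf tree, NOT the tree from the question
leaves = [("withdrawn", 50, 10), ("fail", 30, 20), ("pass", 20, 5)]
print(tree_fail_accuracy(leaves, total=100))  # (20 TP + 55 TN) / 100 -> 0.75
```

The consistency check is a cheap way to catch a missed or double-counted leaf before trusting the final figure.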
