Fisher's exact test when resampling from the population

Question

user249018

May 15, 2022, 15:00

I am using several ML classifiers to make predictions for a binary classification task, and I would like to compare two groups, A and B. An already trained and tested classifier makes predictions on the instances of A and B. Call the resulting counts count(1,A) and count(0,A) for the two predicted classes among the elements of A, and likewise count(1,B) and count(0,B) for the elements of B. The straightforward inference test would be Fisher's exact test, asking whether class 1 occurs in group A more often than would be expected by chance. Please correct me if I am saying anything wrong, since I am not a statistician.

Now I would like to complicate things. To strengthen the claim that there is a statistically significant difference between groups A and B, I draw random samples of group B from the whole existing population and join each new sample to the existing, fixed set of elements of A. I then use this merged dataset to make predictions with the same model(s), and record the counts for B only, count(1,B) and count(0,B), since the counts for A do not change. I repeat this resampling (from the population, i.e. this is not bootstrapping) some 20 times. For group A, I have only a single sample at my disposal.

Some resamples give significant p-values and others do not. I need to make inferences about the difference between group A and the whole population B with respect to the occurrence of classes 1 and 0.
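To make the setup concrete, here is a minimal sketch of what I am computing for each resample. All counts are made-up placeholders, not my real data, and the helper name `fisher_one_sided` is only for illustration. It computes the one-sided ("greater") Fisher p-value directly from the hypergeometric distribution; `scipy.stats.fisher_exact(table, alternative="greater")` should give the same number.

```python
from math import comb

def fisher_one_sided(a1, a0, b1, b0):
    """One-sided Fisher exact p-value for the 2x2 table [[a1, a0], [b1, b0]].

    Returns P(X >= a1), where X is the number of class-1 predictions that
    fall in group A when all margins of the table are held fixed.
    """
    n_a = a1 + a0                  # size of group A
    n_ones = a1 + b1               # total class-1 predictions in A and B
    n_total = a1 + a0 + b1 + b0    # all instances
    k_max = min(n_a, n_ones)       # largest possible value of X
    tail = sum(
        comb(n_ones, k) * comb(n_total - n_ones, n_a - k)
        for k in range(a1, k_max + 1)
    )
    return tail / comb(n_total, n_a)

# Hypothetical counts: A is fixed, B is redrawn from the population each time.
count_1_A, count_0_A = 30, 70
b_resamples = [(18, 82), (25, 75), (21, 79), (15, 85), (27, 73)]  # (count(1,B), count(0,B))

p_values = [fisher_one_sided(count_1_A, count_0_A, b1, b0) for b1, b0 in b_resamples]
for (b1, b0), p in zip(b_resamples, p_values):
    print(f"B = ({b1}, {b0})  p = {p:.4f}")
```

Each resample produces its own p-value against the same fixed A, which is why I end up with a mix of significant and non-significant results.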

Is my procedure correct? How should I proceed in order to make rigorous claims?
Apart from the p-values, I am also trying to visualize the data in terms of the enrichment of class 1 in groups A and B. To this end, for a single sample I normalize the counts and show the quotients for A and B as a bar chart. Since I am resampling only group B, is it correct to include error bars as a measure of dispersion on both the A and B bars? Because each bar is the share of class 1 out of the total number of 1's in A and B combined, the height of the A bar also changes even though the counts of 1's and 0's in A are fixed.
What other visual representation of the data would be relevant in my case, given this particular inference test?
Apart from Fisher's exact test, what other tests might be applicable in my case?
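The normalization behind the bar chart can be sketched as follows, again with made-up counts and illustrative variable names. It shows why the A bar moves under this normalization, and contrasts it with the within-group rate of class 1, which stays constant for A.

```python
from statistics import mean, stdev

# Hypothetical counts: A is fixed, B changes with every resample.
count_1_A, count_0_A = 30, 70
b_resamples = [(18, 82), (25, 75), (21, 79), (15, 85), (27, 73)]  # (count(1,B), count(0,B))

# Share of all class-1 predictions that belongs to each group.
# The denominator contains count(1,B), so the A share varies across resamples.
share_A = [count_1_A / (count_1_A + b1) for b1, _ in b_resamples]
share_B = [b1 / (count_1_A + b1) for b1, _ in b_resamples]

# Within-group rate of class 1: fixed for A, varying for B.
rate_A = count_1_A / (count_1_A + count_0_A)
rate_B = [b1 / (b1 + b0) for b1, b0 in b_resamples]

print(f"share of 1's in A: mean {mean(share_A):.3f}, sd {stdev(share_A):.3f}")
print(f"share of 1's in B: mean {mean(share_B):.3f}, sd {stdev(share_B):.3f}")
print(f"rate of 1's within A: {rate_A:.3f} (no spread)")
print(f"rate of 1's within B: mean {mean(rate_B):.3f}, sd {stdev(rate_B):.3f}")
```

With the first normalization both bars get a nonzero spread, although all of it comes from resampling B.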

Many thanks.

Topic statistics machine-learning

Category Data Science
