
Project: Examples of missing not at random (MNAR) data that are difficult for classifiers

Description

Datasets often contain missing data. A good introduction can be found here: https://stefvanbuuren.name/fimd/.

The data may be "missing completely at random" (MCAR). This occurs when the probability of being missing is the same for all cases. An example of MCAR data is a weighing scale that ran out of batteries. MCAR data are typically easy to deal with: we may simply ignore the missing values, and the only drawback is that our inferences will be weaker because we have less data.
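As a minimal sketch of the MCAR case, the snippet below simulates weighings in which every reading is lost with the same probability. The weight distribution, the 30% missingness rate and the name `simulate_mcar` are illustrative assumptions, not part of the project description.

```python
import random
import statistics


def simulate_mcar(n=10_000, p_missing=0.3, seed=0):
    """Return (true_weights, recorded); each reading is lost with the same probability."""
    rng = random.Random(seed)
    # Assumed weight distribution, chosen only for illustration.
    true_weights = [rng.gauss(70.0, 10.0) for _ in range(n)]
    # MCAR: the chance of a missing reading is identical for every case.
    recorded = [None if rng.random() < p_missing else w for w in true_weights]
    return true_weights, recorded


true_weights, recorded = simulate_mcar()
observed = [w for w in recorded if w is not None]
full_mean = statistics.fmean(true_weights)
observed_mean = statistics.fmean(observed)
fraction_observed = len(observed) / len(recorded)

print(f"mean of all weights:      {full_mean:.2f}")
print(f"mean of observed weights: {observed_mean:.2f}")
print(f"fraction observed:        {fraction_observed:.2f}")
```

In this setting the mean of the observed readings should stay close to the mean of all weights; only the sample size shrinks.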

If the probability of being missing is the same only within groups defined by the observed data, then the data are "missing at random" (MAR). For example, when placed on a soft surface, a weighing scale may produce more missing values than when placed on a hard surface; if the surface type is recorded and the data are MCAR within each surface type, then the data are MAR. The MAR assumption is more realistic than MCAR, but the treatment is more complex.
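The MAR case can be sketched in the same way. The simulation below assumes that the surface type is recorded for every weighing and, purely to make the effect visible, that heavier objects tend to be weighed on the soft surface. All numbers and the name `simulate_mar` are hypothetical.

```python
import random
import statistics


def simulate_mar(n=20_000, seed=1):
    """Return a list of (surface, true_weight, recorded_weight) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        surface = "soft" if rng.random() < 0.5 else "hard"
        # Hypothetical link: heavier objects tend to be weighed on the soft surface.
        true_w = rng.gauss(80.0 if surface == "soft" else 60.0, 5.0)
        # MAR: the missingness probability depends only on the observed surface.
        p_missing = 0.6 if surface == "soft" else 0.1
        recorded = None if rng.random() < p_missing else true_w
        rows.append((surface, true_w, recorded))
    return rows


rows = simulate_mar()
true_mean = statistics.fmean([w for _, w, _ in rows])
naive_mean = statistics.fmean([r for _, _, r in rows if r is not None])

# Average within each surface group, then weight by how often each surface occurs.
stratified_mean = 0.0
for s in ("soft", "hard"):
    group = [row for row in rows if row[0] == s]
    group_obs = [r for _, _, r in group if r is not None]
    stratified_mean += len(group) / len(rows) * statistics.fmean(group_obs)

print(f"true mean:       {true_mean:.2f}")
print(f"naive mean:      {naive_mean:.2f}")
print(f"stratified mean: {stratified_mean:.2f}")
```

Under these assumptions, simply ignoring the missing readings underestimates the mean, whereas averaging within each surface group and reweighting recovers it. This is one simple illustration of why MAR data need a more careful treatment than MCAR data.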

If neither MCAR nor MAR holds, then we speak of "missing not at random" (MNAR). It means that the probability of being missing varies for reasons that are unknown to us. For example, the weighing scale mechanism may wear out over time, producing more missing data as time progresses, but we may fail to note this.
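A minimal sketch of the wearing-scale example follows. It assumes that the time of each weighing is never stored and, hypothetically, that objects weighed later tend to be heavier, so that the missingness is linked to the unseen values. The drift, the failure probabilities and the name `simulate_mnar` are illustrative assumptions.

```python
import random
import statistics


def simulate_mnar(n=20_000, seed=2):
    """Return (true_weights, recorded); the time that drives missingness is never stored."""
    rng = random.Random(seed)
    true_weights, recorded = [], []
    for _ in range(n):
        t = rng.random()  # elapsed time in [0, 1); deliberately NOT kept in the data
        # Hypothetical drift: objects weighed later tend to be heavier.
        w = rng.gauss(60.0 + 20.0 * t, 5.0)
        # Wear: the failure probability grows from 0.05 towards 0.85 over time.
        p_missing = 0.05 + 0.8 * t
        true_weights.append(w)
        recorded.append(None if rng.random() < p_missing else w)
    return true_weights, recorded


true_weights, recorded = simulate_mnar()
observed = [w for w in recorded if w is not None]
true_mean = statistics.fmean(true_weights)
observed_mean = statistics.fmean(observed)

print(f"true mean:     {true_mean:.2f}")
print(f"observed mean: {observed_mean:.2f}")
```

Because no recorded variable explains the missingness, the observed mean is biased downwards in this simulation and there is no group structure in the data that could be used to correct it.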

The aim of this master's project is to generate MNAR data that are difficult for classical classifiers. Such data would identify cases where more robust classifiers outperform classical classifiers, which would be useful to the imprecise probability community.
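To give a rough idea of what such data could look like, the sketch below compares a simple nearest-centroid classifier trained on MCAR data with the same classifier trained on self-masked MNAR data, where large feature values are mostly missing. This is only one possible construction under stated assumptions, not the method of the project: the two Gaussian classes, the masking rule, training on complete cases and testing on fully observed data are all hypothetical choices, as are the function names.

```python
import random
import statistics


def sample(n, rng):
    """Two balanced classes with one Gaussian feature: N(0, 1) and N(2, 1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * y, 1.0), y))
    return data


def mask_mcar(data, rng, p=0.5):
    """Drop each feature value with the same probability p."""
    return [(None if rng.random() < p else x, y) for x, y in data]


def mask_mnar(data, rng, cutoff=1.0, p=0.9):
    """Self-masking: values above the cutoff are dropped with probability p."""
    return [(None if x > cutoff and rng.random() < p else x, y) for x, y in data]


def fit_threshold(train):
    """Nearest-centroid rule fitted on complete cases only (missing rows are ignored)."""
    c0 = statistics.fmean([x for x, y in train if x is not None and y == 0])
    c1 = statistics.fmean([x for x, y in train if x is not None and y == 1])
    return (c0 + c1) / 2.0


def accuracy(threshold, test):
    """Fraction of fully observed test points classified correctly."""
    return sum((x > threshold) == (y == 1) for x, y in test) / len(test)


rng = random.Random(3)
train, test = sample(10_000, rng), sample(10_000, rng)
thr_mcar = fit_threshold(mask_mcar(train, rng))
thr_mnar = fit_threshold(mask_mnar(train, rng))
acc_mcar = accuracy(thr_mcar, test)
acc_mnar = accuracy(thr_mnar, test)

print(f"MCAR: threshold {thr_mcar:.2f}, accuracy {acc_mcar:.3f}")
print(f"MNAR: threshold {thr_mnar:.2f}, accuracy {acc_mnar:.3f}")
```

In this toy setting the MNAR mechanism removes mostly large values, which pulls the decision threshold towards the lower class and lowers the accuracy on fully observed test data, whereas the MCAR mechanism with a comparable missingness rate leaves the threshold roughly in place.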

Details
Supervisor
Arthur van Camp