Data science methods are increasingly applied in everyday life, from music recommendations and automated fraud detection to facial recognition systems and personalized medicine assistants. These systems can provide real benefits, but they also carry risks. This course provides an introduction to the nascent field of responsible data science. The course is structured around four themes:
- Fairness. Data-driven systems can inherit prejudices already embedded in society, resulting in systematic discrimination or other harms. We will study how to define unfairness, how to uncover and quantify it, and how to design machine learning solutions that are fairness-aware from the outset.
- Transparency. As machine learning models grow more complex, it becomes more difficult for humans to understand their behavior. A lack of explainability makes it harder to assess the validity of a model and the trustworthiness of its predictions, and to provide transparency towards end-users and other stakeholders. We will cover several techniques for explaining the decision-making logic of machine learning models, and how such explanations can be evaluated experimentally.
- Privacy. As more and more data is collected, privacy and security implications matter more than ever. We will study the main principles and techniques for privacy-preserving and secure computation in data mining.
- Accountability. The growing power of the organizations that deploy these systems raises questions about accountability. We will cover several organizational and technical strategies that can be used to audit the development of machine learning models.
The goal of this course is to take a practical approach that bridges ethical and technical perspectives.