Dr. Balázs Pejó

Assistant Professor

pejo (at) crysys.hu

web: www.crysys.hu/~pejo/
office: I.E. 430
tel: +36 1 463 2080

Current courses | Student projects | Publications

Short Bio

Balázs Pejó was born in 1989 in Budapest, Hungary. He received a B.Sc. degree in Mathematics from the Budapest University of Technology and Economics (BME, Hungary) in 2012, and two M.Sc. degrees in Computer Science in 2014, within the Security and Privacy program of EIT Digital, from the University of Trento (UNITN, Italy) and Eötvös Loránd University (ELTE, Hungary). He earned his Ph.D. degree in Informatics from the University of Luxembourg (UNILU, Luxembourg) in 2019. Currently, he is a member of the Laboratory of Cryptography and System Security (CrySyS Lab).

Current Courses

Privacy-Preserving Technologies (VIHIAV35)

The sharing and exploitation of the ever-growing amount of data about individuals raise serious privacy concerns these days. Is it possible to derive (socially or individually) useful information about people from this Big Data without revealing personal information?
This course provides a detailed overview of data privacy. It focuses on the privacy problems of web tracking, data sharing, and machine learning, as well as the corresponding mitigation techniques. The aim is to give the essential (technical) background knowledge needed to identify and protect personal data. These skills are becoming a must for every data/software engineer and data protection officer dealing with personal and sensitive data, and are also required by the European General Data Protection Regulation (GDPR).

Student Project Proposals

Security and Privacy in/with Machine Learning

Machine Learning (Artificial Intelligence) has become undisputedly popular in recent years. The number of security-critical applications of machine learning has been steadily increasing over the years (self-driving cars, user authentication, decision support, profiling, risk assessment, etc.). However, machine learning still poses many open privacy and security problems. Students can work on the following topics:

Required skills: none
Preferred skills: basic programming skills (e.g., python), machine learning (not required)

Economics of cybersecurity and data privacy

As evidenced over the last 10-15 years, cybersecurity is not a purely technical discipline. Decision-makers, whether at security providers (IT companies), security demanders (everyone using IT), or in the security industry, are mostly driven by economic incentives. Understanding these incentives is vital for designing systems that remain secure in real-life scenarios. In parallel, data privacy has exhibited the same characteristics: proper economic incentives and controls are needed to design systems where sharing data benefits both the data subject and the data controller. An extreme example of a flawed attempt at such a design is the Cambridge Analytica case.
The prospective student will identify a cybersecurity or data privacy economics problem, and use elements of game theory and other domain-specific techniques and software tools to transform the problem into a model and propose a solution. Potential topics include:

Required skills: model thinking, good command of English
Preferred skills: basic knowledge of game theory, basic programming skills (e.g., python, matlab, NetLogo)

Publications

2022

Collaborative Drug Discovery: Inference-level Privacy Perspective

B. Pejo, M. Remeli, Á. Arany, M. Galtier, G. Ács

Transactions on Data Privacy (TDP), vol. 15, 2022.

Bibtex | Abstract | PDF | Link

@article{pejo2022collaborative,
   author = {Balazs Pejo and Mina Remeli and Ádám Arany and Mathieu Galtier and Gergely Ács},
   title = {Collaborative Drug Discovery: Inference-level Privacy Perspective},
   journal = {Transactions on Data Privacy (TDP)},
   volume = {15},
   year = {2022},
   url = {http://www.tdp.cat/issues21/abs.a449a21.php}
}

Abstract

The pharmaceutical industry can better leverage its data assets to virtualize drug discovery through a collaborative machine learning platform. On the other hand, there are non-negligible risks stemming from the unintended leakage of participants' training data; hence, it is essential for such a platform to be secure and privacy-preserving. This paper describes a privacy risk assessment for collaborative modeling in the preclinical phase of drug discovery to accelerate the selection of promising drug candidates. After a short taxonomy of state-of-the-art inference attacks, we adopt and customize several to the underlying scenario. Finally, we describe and experiment with a handful of relevant privacy protection techniques to mitigate such attacks.

Games in the Time of COVID-19: Promoting Mechanism Design for Pandemic Response

B. Pejo, G. Biczók

ACM Transactions on Spatial Algorithms and Systems (TSAS), 2022.

Bibtex | Link

@article{pejo2022games,
   author = {Balazs Pejo and Gergely Biczók},
   title = {Games in the Time of COVID-19: Promoting Mechanism Design for Pandemic Response},
   journal = {ACM Transactions on Spatial Algorithms and Systems (TSAS)},
   year = {2022},
   url = {https://dl.acm.org/doi/abs/10.1145/3503155}
}

Guide to Differential Privacy Modifications

B. Pejo, D. Desfontaines

Springer International Publishing (SpringerBriefs), 2022.

Bibtex | Link

@book{pejo2022guide,
   author = {Balazs Pejo and Damien Desfontaines},
   title = {Guide to Differential Privacy Modifications},
   publisher = {Springer International Publishing (SpringerBriefs)},
   year = {2022},
   url = {https://link.springer.com/book/10.1007/978-3-030-96398-9}
}

Incentives for Individual Compliance with Pandemic Response Measures

B. Pejo, G. Biczók

Enabling Technologies for Social Distancing: Fundamentals, Concepts and Solutions (IET), 2022.

Bibtex | PDF | Link

@inproceedings{pejo2022incentives,
   author = {Balazs Pejo and Gergely Biczók},
   title = {Incentives for Individual Compliance with Pandemic Response Measures},
   booktitle = {Enabling Technologies for Social Distancing: Fundamentals, Concepts and Solutions (IET)},
   year = {2022},
   url = {https://digital-library.theiet.org/content/books/te/pbte104e}
}

Revenue Attribution on iOS 14 using Conversion Values in F2P Games

F. Ayala-Gómez, I. Horppu, E. Gülbenkoglu, V. Siivola, B. Pejo

AdKDD Workshop at 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (AdKDD), 2022.

Bibtex | Abstract | PDF | Link

@inproceedings{ayalagomez2022revenue,
   author = {Frederick Ayala-Gómez and Ismo Horppu and Erlin Gülbenkoglu and Vesa Siivola and Balazs Pejo},
   title = {Revenue Attribution on iOS 14 using Conversion Values in F2P Games},
   booktitle = {AdKDD Workshop at 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (AdKDD)},
   year = {2022},
   url = {https://www.adkdd.org/Papers/Show-me-the-Money%3A-Measuring-Marketing-Performance-in-F2P-Games-using-Apple's-App-Tracking-Transparency-Framework/2022}
}

Keywords

conversion value, revenue attribution, mobile advertising, privacy

Abstract

Mobile app developers use paid advertising campaigns to acquire new users. Based on the campaigns' performance, marketing managers decide where and how much to spend. Apple's new privacy mechanisms profoundly impact how performance marketing is measured. Starting with iOS 14.5, all apps must explicitly request system permission for tracking via the new App Tracking Transparency Framework. Instead of relying on individual identifiers, Apple proposed a new performance mechanism called conversion value, an integer set by the apps for each user. The conversion value follows a set of rules and a schema that defines the integers based on the user's in-app behavior. The developers can get the number of installs per conversion value for each campaign. For conversion values to be helpful, we need a method that translates them to revenue. This paper investigates the task of attributing revenue to advertising campaigns using their reported conversion values. Our contributions are to formalize the problem, find the theoretically optimal revenue attribution function for any conversion value schema, and show empirical results on past data of a free-to-play mobile game using different conversion value schemas.
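The attribution task above amounts to estimating, from historical data, how much revenue a given conversion value represents. A minimal sketch of this conditional-mean idea follows; the function names and toy numbers are illustrative and not taken from the paper:

```python
from collections import defaultdict

def fit_attribution(history):
    """Map each conversion value to the average revenue observed for it.
    `history` is a list of (conversion_value, revenue) pairs from past users;
    the conditional mean is a natural estimator of the revenue an install
    with that conversion value represents."""
    totals, counts = defaultdict(float), defaultdict(int)
    for cv, revenue in history:
        totals[cv] += revenue
        counts[cv] += 1
    return {cv: totals[cv] / counts[cv] for cv in totals}

def attribute_campaign(installs_per_cv, attribution):
    """Estimate a campaign's revenue from its install counts per conversion value."""
    return sum(n * attribution.get(cv, 0.0) for cv, n in installs_per_cv.items())

# Toy history: conversion values 0..2 with made-up revenues.
history = [(0, 0.0), (0, 0.0), (1, 1.0), (1, 3.0), (2, 10.0)]
attribution = fit_attribution(history)   # {0: 0.0, 1: 2.0, 2: 10.0}
campaign = {0: 100, 1: 10, 2: 1}         # installs per conversion value
print(attribute_campaign(campaign, attribution))  # 100*0.0 + 10*2.0 + 1*10.0 = 30.0
```

In practice the quality of such an estimate depends heavily on the schema, i.e., on how the app maps in-app behavior to the available integers.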

Why Fuzzy Message Detection Leads to Fuzzy Privacy Guarantees

I. Seres, B. Pejo, P. Burcsi

26th Financial Cryptography and Data Security Conference (FC), 2022.

Bibtex | Abstract | Link

@inproceedings{seres2022fuzzy,
   author = {Istvan Andras Seres and Balazs Pejo and Peter Burcsi},
   title = {Why Fuzzy Message Detection Leads to Fuzzy Privacy Guarantees},
   booktitle = {26th Financial Cryptography and Data Security Conference (FC)},
   year = {2022},
   url = {https://fc22.ifca.ai/preproceedings/9.pdf}
}

Keywords

Fuzzy Message Detection, unlinkability, anonymity, differential privacy, game theory

Abstract

Fuzzy Message Detection (FMD) is a recent cryptographic primitive invented by Beck et al. (CCS’21) where an untrusted server performs coarse message filtering for its clients in a recipient-anonymous way. In FMD — besides the true positive messages — the clients download from the server their cover messages determined by their false-positive detection rates. What is more, within FMD, the server cannot distinguish between genuine and cover traffic. In this paper, we formally analyze the privacy guarantees of FMD from three different angles. First, we analyze three privacy provisions offered by FMD: recipient unlinkability, relationship anonymity, and temporal detection ambiguity. Second, we perform a differential privacy analysis and coin a relaxed definition to capture the privacy guarantees FMD yields. Finally, we simulate FMD on real-world communication data. Our theoretical and empirical results assist FMD users in adequately selecting their false-positive detection rates for various applications with given privacy requirements.
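The cover-traffic behavior can be illustrated with a toy simulation: a message addressed to a client always matches, and every other message matches with the client's false-positive rate, so downloads mix genuine and cover traffic. This is only a plaintext sketch of the filtering behavior, not Beck et al.'s actual primitive (which operates on ciphertext flags); all names here are illustrative:

```python
import random

def detected_messages(messages, me, p, rng):
    """Server-side coarse filtering: a message addressed to `me` always
    matches; any other message matches with false-positive probability p,
    producing cover traffic the server cannot tell apart from genuine traffic."""
    out = []
    for sender, recipient in messages:
        if recipient == me or rng.random() < p:
            out.append((sender, recipient))
    return out

rng = random.Random(42)  # fixed seed for reproducibility
# 1000 messages, 100 of them genuinely addressed to "me".
messages = [("a", "me") if i % 10 == 0 else ("a", "other") for i in range(1000)]
downloads = detected_messages(messages, "me", p=0.2, rng=rng)
genuine = sum(1 for _, r in downloads if r == "me")
cover = len(downloads) - genuine
print(genuine, cover)  # 100 genuine; cover is roughly 0.2 * 900 = 180
```

Raising p buys more ambiguity (more cover messages) at the cost of more bandwidth, which is exactly the trade-off the paper's analysis helps users navigate.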

2021

Measuring Contributions in Privacy-Preserving Federated Learning

G. Ács, G. Biczók, B. Pejo

ERCIM NEWS, vol. 126, 2021, pp. 35-36.

Bibtex | Abstract | Link

@article{acs2021measuring,
   author = {Gergely Ács and Gergely Biczók and Balazs Pejo},
   title = {Measuring Contributions in Privacy-Preserving Federated Learning},
   journal = {ERCIM NEWS},
   volume = {126},
   year = {2021},
   pages = {35-36},
   url = {https://ercim-news.ercim.eu/en126/special/measuring-contributions-in-privacy-preserving-federated-learning}
}

Abstract

How vital is each participant’s contribution to a collaboratively trained machine learning model? This is a challenging question to answer, especially if the learning is carried out in a privacy-preserving manner with the aim of concealing individual actions.

Property Inference Attacks on Convolutional Neural Networks: Influence and Implications of Target Model's Complexity

M. Parisot, B. Pejo, D. Spagnuelo

18th International Conference on Security and Cryptography (SECRYPT), 2021.

Bibtex | Link

@inproceedings{parisot2021property,
   author = {Mathias Parisot and Balazs Pejo and Dayana Spagnuelo},
   title = {Property Inference Attacks on Convolutional Neural Networks: Influence and Implications of Target Model's Complexity},
   booktitle = {18th International Conference on Security and Cryptography (SECRYPT)},
   year = {2021},
   url = {https://www.scitepress.org/Link.aspx?doi=10.5220/0010555607150721}
}

2020

Corona Games: Masks, Social Distancing and Mechanism Design

B. Pejo, G. Biczók

Proc. of ACM SIGSPATIAL Workshop on COVID, ACM, 2020.

Bibtex | Abstract | PDF

@inproceedings{pejo2020corona,
   author = {Balazs Pejo and Gergely Biczók},
   title = {Corona Games: Masks, Social Distancing and Mechanism Design},
   booktitle = {Proc. of ACM SIGSPATIAL Workshop on COVID},
   publisher = {ACM},
   year = {2020}
}

Abstract

Pandemic response is a complex affair. Most governments employ a set of quasi-standard measures to fight COVID-19, including wearing masks, social distancing, virus testing, and contact tracing. We argue that some non-trivial factors behind the varying effectiveness of these measures are selfish decision-making and the differing national implementations of the response mechanism. In this paper, through simple games, we show the effect of individual incentives on the decisions made with respect to wearing masks and social distancing, and how these may result in a sub-optimal outcome. We also demonstrate the responsibility of national authorities in designing these games properly regarding the chosen policies and their influence on the preferred outcome. We promote a mechanism design approach: it is in the best interest of every government to carefully balance social good and response costs when implementing their respective pandemic response mechanism.
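The kind of simple game referred to above can be sketched as a 2x2 payoff matrix with an exhaustive search for pure-strategy Nash equilibria. The payoff parameters (benefit b when the other player masks, personal cost c of wearing one) are invented for illustration and are not taken from the paper:

```python
import itertools

# Hypothetical 2x2 "mask game": b is the protection benefit a player enjoys
# when the OTHER player wears a mask; c is the personal cost of wearing one.
def payoff(my_action, other_action, b=3, c=2):
    benefit = b if other_action == "mask" else 0
    cost = c if my_action == "mask" else 0
    return benefit - cost

def nash_equilibria(actions=("mask", "no_mask")):
    """Pure-strategy Nash equilibria: action profiles where neither player
    can gain by unilaterally deviating."""
    eqs = []
    for a1, a2 in itertools.product(actions, repeat=2):
        p1_ok = all(payoff(a1, a2) >= payoff(d, a2) for d in actions)
        p2_ok = all(payoff(a2, a1) >= payoff(d, a1) for d in actions)
        if p1_ok and p2_ok:
            eqs.append((a1, a2))
    return eqs

print(nash_equilibria())  # [('no_mask', 'no_mask')]
```

Since masking costs c regardless of the other player's choice, not masking strictly dominates, and the only equilibrium is mutual non-masking — even though mutual masking would give each player b − c = 1 instead of 0. This is the sub-optimal outcome the abstract refers to, and what well-designed policies (changing b and c) aim to avoid.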

2019

Together or Alone: The Price of Privacy in Collaborative Learning

B. Pejo, Q. Tang, G. Biczók

Proceedings on Privacy Enhancing Technologies (PETS 2019), De Gruyter, 2019.

Bibtex | Abstract

@inproceedings{pejo2019together,
   author = {Balazs Pejo and Qiang Tang and Gergely Biczók},
   title = {Together or Alone: The Price of Privacy in Collaborative Learning},
   booktitle = {Proceedings on Privacy Enhancing Technologies (PETS 2019)},
   publisher = {De Gruyter},
   year = {2019}
}

Abstract

Machine learning algorithms have reached mainstream status and are widely deployed in many applications. The accuracy of such algorithms depends significantly on the size of the underlying training dataset; in reality, a small or medium-sized organization often does not have the necessary data to train a reasonably accurate model. For such organizations, a realistic solution is to train their machine learning models based on their joint dataset (which is a union of the individual ones). Unfortunately, privacy concerns prevent them from straightforwardly doing so. While a number of privacy-preserving solutions exist for collaborating organizations to securely aggregate the parameters in the process of training the models, we are not aware of any work that provides a rational framework for the participants to precisely balance the privacy loss and accuracy gain in their collaboration. In this paper, by focusing on a two-player setting, we model the collaborative training process as a two-player game where each player aims to achieve higher accuracy while preserving the privacy of its own dataset. We introduce the notion of Price of Privacy, a novel approach for measuring the impact of privacy protection on the accuracy in the proposed framework. Furthermore, we develop a game-theoretical model for different player types, and then either find or prove the existence of a Nash Equilibrium with regard to the strength of privacy protection for each player. Using recommendation systems as our main use case, we demonstrate how two players can make practical use of the proposed theoretical framework, including setting up the parameters and approximating the non-trivial Nash Equilibrium.
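One simplified way to read the Price of Privacy notion is as the fraction of the collaboration's accuracy gain that privacy protection destroys. The sketch below uses that reading with made-up accuracy numbers; it is an illustration of the idea, not the paper's exact definition:

```python
def price_of_privacy(acc_collab_private, acc_collab_plain, acc_alone):
    """Fraction of the collaboration's accuracy gain lost to privacy
    protection: 0 means privacy is free, 1 means protection wipes out
    the whole benefit of collaborating."""
    gain_plain = acc_collab_plain - acc_alone      # gain without protection
    gain_private = acc_collab_private - acc_alone  # gain with protection
    return 1 - gain_private / gain_plain

# Made-up accuracies: 0.70 training alone, 0.90 collaborating without
# protection, 0.80 collaborating with privacy protection enabled.
print(price_of_privacy(0.80, 0.90, 0.70))  # ~0.5: half the gain is lost
```

In the game-theoretic framing, each player weighs such a quantity against its own privacy loss when choosing how strong a protection to apply.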

2018

POSTER: The Price of Privacy in Collaborative Learning

B. Pejo, Q. Tang, G. Biczók

CCS 2018 Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2018.

Bibtex | Abstract

@inproceedings{pejo2018poster,
   author = {Balazs Pejo and Qiang Tang and Gergely Biczók},
   title = {POSTER: The Price of Privacy in Collaborative Learning},
   booktitle = {CCS 2018 Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security},
   publisher = {ACM},
   year = {2018}
}

Abstract

Machine learning algorithms have reached mainstream status and are widely deployed in many applications. The accuracy of such algorithms depends significantly on the size of the underlying training dataset; in reality, a small or medium-sized organization often does not have enough data to train a reasonably accurate model. For such organizations, a realistic solution is to train machine learning models based on a joint dataset (which is a union of the individual ones). Unfortunately, privacy concerns prevent them from straightforwardly doing so. While a number of privacy-preserving solutions exist for collaborating organizations to securely aggregate the parameters in the process of training the models, we are not aware of any work that provides a rational framework for the participants to precisely balance the privacy loss and accuracy gain in their collaboration. In this paper, we model the collaborative training process as a two-player game where each player aims to achieve higher accuracy while preserving the privacy of its own dataset. We introduce the notion of Price of Privacy, a novel approach for measuring the impact of privacy protection on the accuracy in the proposed framework. Furthermore, we develop a game-theoretical model for different player types, and then either find or prove the existence of a Nash Equilibrium with regard to the strength of privacy protection for each player.