Short Bio
Balázs Pejó (CV) was born in 1989 in Budapest, Hungary. He received a BSc degree in Mathematics from the Budapest University of Technology and Economics (BME, Hungary) in 2012, and a double MSc degree in Computer Science through the Security and Privacy programme of EIT Digital from the University of Trento (UNITN, Italy) and Eötvös Loránd University (ELTE, Hungary) in 2014. He earned a PhD degree in Informatics from the University of Luxembourg (UNILU, Luxembourg) in 2019. Currently, he is a member of the Laboratory of Cryptography and System Security (CrySyS Lab).
List of Courses
Research interests
- Contribution Evaluation
- Inference Attacks
- Differential Privacy
- Robust Learning
- Game Theory
Student Project Proposals
- Robustness-by-SHAP:
Measure via the Shapley value how much individual samples contribute to the robustness of the model (with respect to the other samples). Similar to https://arxiv.org/pdf/2303.01928v4, which measured the importance of samples towards fairness, here we would measure their importance towards robustness.
- Post-Training Contribution Evaluation:
Machine unlearning essentially reduces the corresponding contribution scores. It turns out that input sensitivity is a good indicator of contribution; hence, reducing it implicitly reduces the contribution (https://arxiv.org/pdf/2402.15109). Relying on this, we could approximate the Shapley values post-training, or even design a membership inference attack based on it. Neither direction is explored as of now.
- Quality Inference with Robust Aggregation:
Determining the user's value within Federated Learning is not trivial, especially when their privacy is protected, and the security of the model is guaranteed. The CrySyS lab developed a solution to tackle the former (privacy). However, more tests and experiments are needed to determine if it also applies to the latter (security).
- Improvement Prediction:
Is it possible to predict how useful a collaboration will be in advance, and if so, to what extent can it be used to optimally set the desired privacy and security parameters? The CrySyS lab developed a solution to tackle this issue for the vanilla federation mechanism, but more tests and experiments are needed to extend it to advanced techniques.
- Poisoning Shapley:
Contribution measuring techniques, such as the Shapley value, assign values to each participant, reflecting their importance or usefulness for the training. The question naturally arises: by injecting malicious participants into the participant pool, is it possible to manipulate (i.e., arbitrarily increase or decrease) the contribution scores of other participants?
- Personalized Shapley Values:
The users' values within Federated Learning depend on the test sets, which might come from a different distribution for each client. Hence, the clients score each other differently; the question is how to resolve this tension and merge the scores into a fair scoring scheme that all clients accept.
- FRAP: Capture the Fairness/Robustness/Accuracy/Privacy Trade-Off:
There are clear connections between Privacy and Accuracy, between Robustness and Accuracy, and between Fairness and Accuracy. It is also known that Fairness, Robustness, and Privacy all influence each other pair-wise. Could these trade-offs be measured, and could we determine the optimal setting based on some incentives?
- Meta Science:
There are a handful of high-quality and well-established privacy and security conferences, such as S&P, CCS, etc. Could an NLP-based ML model differentiate between papers published there and other non-peer-reviewed papers on arXiv?
- Testing Data Inference:
The underlying data is separated into training and testing for every ML model. While Membership Inference aims to determine whether a particular data point was part of the training set, there are currently no known techniques to determine whether a data point was part of the test set. Is it even possible?
- Fairness of Shapley Approximation:
The Shapley value is the only fair reward distribution, yet it is exponentially hard to compute. Hence, a handful of approximation mechanisms exist. The question is: which approximation satisfies the desired fairness properties, and to what extent?
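To make the approximation question concrete, below is a minimal, self-contained sketch (illustrative only, not the lab's implementation) comparing the exact permutation-based Shapley value with the common Monte Carlo permutation-sampling approximation; the cooperative game and all parameter values are assumptions for demonstration.

```python
import itertools
import random
from math import factorial

def exact_shapley(players, value):
    """Exact Shapley values by enumerating all n! player orderings."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for perm in itertools.permutations(players):
        coalition = set()
        for p in perm:
            before = value(frozenset(coalition))
            coalition.add(p)
            # Marginal contribution of p to the coalition built so far.
            phi[p] += value(frozenset(coalition)) - before
    return {p: v / factorial(n) for p, v in phi.items()}

def monte_carlo_shapley(players, value, samples=2000, seed=0):
    """Permutation-sampling approximation: average marginal
    contributions over randomly sampled orderings."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(samples):
        perm = list(players)
        rng.shuffle(perm)
        coalition = set()
        for p in perm:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    return {p: v / samples for p, v in phi.items()}
```

For an additive game (a player's value is independent of the coalition), both routines recover each player's individual weight, and the efficiency property (the values sum to the grand coalition's worth) holds exactly; checking which fairness axioms survive under non-additive games and fewer samples is exactly the proposed project.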
- Amplified DP:
Differential Privacy is the de facto privacy protection mechanism, with various amplification (i.e., privacy guarantee boosting) techniques. It is natural to ask which combination of amplification technique and corresponding privacy parameters results in the highest utility among (ε,δ)-DP mechanisms.
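Two standard building blocks behind this question can be sketched in a few lines (a minimal illustration, not a vetted DP library): the classical Gaussian mechanism calibration for (ε,δ)-DP (the textbook bound, valid for 0 < ε < 1) and the pure-DP privacy-amplification-by-subsampling bound. All numeric values in the test are illustrative.

```python
import math
import random

def gaussian_mechanism(value, sensitivity, epsilon, delta, seed=None):
    """Release value + Gaussian noise calibrated for (epsilon, delta)-DP.
    Classical calibration: sigma = S * sqrt(2 ln(1.25/delta)) / epsilon,
    valid for 0 < epsilon < 1. Returns (noisy value, sigma)."""
    assert 0 < epsilon < 1 and 0 < delta < 1
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    rng = random.Random(seed)
    return value + rng.gauss(0.0, sigma), sigma

def amplified_epsilon(epsilon, q):
    """Privacy amplification by subsampling (pure-DP bound): running an
    epsilon-DP mechanism on a random q-fraction of the data yields
    log(1 + q*(e^epsilon - 1))-DP, which is < epsilon for q < 1."""
    return math.log(1 + q * (math.exp(epsilon) - 1))
```

The project's question is then which composition of such amplifiers and noise calibrations maximizes utility for a fixed (ε,δ) budget, where tighter analyses (e.g., Rényi-DP accounting) typically beat these classical bounds.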
- Robust ML:
There are several techniques to mitigate the effect of malicious participants in Federated Learning (e.g., in the client selection phase and in the aggregation phase). This begs the question: which combination of techniques and corresponding parameters is optimal?
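As a baseline for such experiments, here is a minimal sketch (illustrative, not the lab's code) of two standard robust aggregation rules for the aggregation phase, operating on client updates represented as plain lists of floats: the coordinate-wise median and the coordinate-wise trimmed mean.

```python
def coordinate_median(updates):
    """Coordinate-wise median of client updates (lists of equal length);
    tolerates a minority of arbitrarily corrupted updates per coordinate."""
    dim = len(updates[0])
    agg = []
    for j in range(dim):
        col = sorted(u[j] for u in updates)
        m = len(col)
        agg.append(col[m // 2] if m % 2 else 0.5 * (col[m // 2 - 1] + col[m // 2]))
    return agg

def trimmed_mean(updates, trim=1):
    """Coordinate-wise trimmed mean: drop the `trim` largest and `trim`
    smallest values per coordinate, then average the rest."""
    dim = len(updates[0])
    agg = []
    for j in range(dim):
        col = sorted(u[j] for u in updates)[trim:len(updates) - trim]
        agg.append(sum(col) / len(col))
    return agg
```

With three clients where one submits an outlier update, both rules ignore the outlier coordinate-wise, whereas a plain average would be dragged arbitrarily far; combining such rules with robust client selection is what the project would explore.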
Selection of Thesis and Dissertations
- 2024
- Ádám Horváth (BSc, BME): Freerider Detection via Property Inference (Manuscript)
- 2023
- Frank Marcell (BSc, BME): Altruism in Fuzzy Message Detection (Manuscript)
- 2022
- Nikolett Kapui (BSc, BME): SQLi Detection Using Machine Learning (Manuscript)
- 2021
- András Tótth (BSc, BME): Distributed Approximation of the Shapley Value (Manuscript)
- 2020
- Mathias Parisot (BSc, VU-AMS): Property Inference Attacks on Convolutional Neural Networks (Manuscript)
Program Committees
- [2023-]: Conference on Computer and Communications Security (CCS)
- [2023-]: Artificial Intelligence and Statistics (AISTATS)
- [2021-]: Emerging Security Information, Systems and Technologies (SECURWARE)
- [2020-]: Privacy Enhancing Technologies Symposium (PETS)
- [2020-2022]: Workshop on Privacy in Natural Language Processing (PrivateNLP)
List of Publications
- To Appear
- 2024
- Francesco Regazzoni; Gergely Acs; Albert Zoltan Aszalos; Christos Avgerinos; Nikolaos Bakalos; Josep Ll. Berral; Joppe W. Bos; Marco Brohet; Andrés G. Castillo Sanz; Gareth T. Davies; Stefanos Florescu; Pierre-Elisée Flory; Alberto Gutierrez-Torre; Evangelos Haleplidis; Alice Héliou; Sotirios Ioannidis; Alexander Islam El-Kady; Katarzyna Kapusta; Konstantina Karagianni; Pieter Kruizinga; Kyrian Maat; Zoltán Ádám Mann; Kalliopi Mastoraki; SeoJeong Moon; Maja Nisevic; Balázs Pejó; Kostas Papagiannopoulos; Vassilis Paliuras; Paolo Palmieri; Francesca Palumbo; Juan Carlos Perez Baun; Peter Pollner; Eduard Porta-Pardo; Luca Pulina; Muhammad Ali Siddiqi; Daniela Spajic; Christos Strydis; Georgios Tasopoulos; Vincent Thouvenot; Christos Tselios; Apostolos P. Fournaris: "SECURED for Health: Scaling Up Privacy to Enable the Integration of the European Health Data Space", Design, Automation & Test in Europe Conference & Exhibition (DATE)
- 2023
- Wouter Heyndrickx; Lewis Mervin; Tobias Morawietz; Noé Sturm; Lukas Friedrich; Adam Zalewski; Anastasia Pentina; Lina Humbeck; Martijn Oldenhof; Ritsuya Niwayama; Peter Schmidtke; Nikolas Fechner; Jaak Simm; Ádám Arany; Nicolas Drizard; Rama Jabal; Arina Afanasyeva; Regis Loeb; Shlok Verma; Simon Harnqvist; Matthew Holmes; Balázs Pejó; Maria Telenczuk; Nicholas Holway; Arne Dieckmann; Nicola Rieke; Friederike Zumsande; Djork-Arné Clevert; Michael Krug; Christopher Luscombe; Darren Green; Peter Ertl; Péter Antal; David Marcus; Nicolas Do Huu; Hideyoshi Fuji; Stephen Pickett; Gergely Ács; Eric Boniface; Bernd Beck; Yax Sun; Arnaud Gohier; Friedrich Rippmann; Ola Engkvist; Andreas H. Göller; Yves Moreau; Mathieu N. Galtier; Ansgar Schuffenhauer; Hugo Ceulemans: "MELLODDY: cross pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information", Journal of Chemical Information and Modeling (JCIM)
- Bowen Liu; Balázs Pejó; Qiang Tang: "Privacy-preserving Federated Singular Value Decomposition", MDPI Journal of Applied Sciences (AppSci)
- Balázs Pejó; Nikolett Kapui: "SQLi Detection with ML: a data-source perspective", 20th International Conference on Security and Cryptography (SECRYPT)
- Balázs Pejó; Gergely Biczó: "Quality Inference in Federated Learning with Secure Aggregation", IEEE Transactions on Big Data (IEEE TBD)
- Martijn Oldenhof; Gergely Ács; Balázs Pejó; Ansgar Schuffenhauer; Nicholas Holway; Noé Sturm; Arne Dieckmann; Oliver Fortmeier; Eric Boniface; Clément Mayer; Arnaud Gohier; Peter Schmidtke; Ritsuya Niwayama; Dieter Kopecky; Lewis Mervin; Prakash Chandra Rathi; Lukas Friedrich; András Formanek; Péter Antal; Jordon Rahaman; Adam Zalewski; Wouter Heyndrickx; Ezron Oluoch; Manuel Stößel; Michal Vančo; David Endico; Fabien Gelus; Thaïs de Boisfossé; Adrien Darbier; Ashley Nicollet; Matthieu Blottière; Maria Telenczuk; Van Tien Nguyen; Thibaud Martinez; Camille Boillet; Kelvin Moutet; Alexandre Picosson; Aurélien Gasser; Inal Djafar; Antoine Simon; Ádám Arany; Jaak Simm; Yves Moreau; Ola Engkvist; Hugo Ceulemans; Camille Marini; Mathieu Galtier: "Industry-Scale Orchestrated Federated Learning for Drug Discovery", 35th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)
- 2022
- 2021
- 2020
- 2019
- 2017
- 2016