IMSc Conference on IT Security

We organize a conference for collecting IMSc points in the context of the IT Security BSc course in the spring semester of the 2023/24 academic year at BME. Beyond IMSc point collection, the goal of the conference is to encourage students to deep-dive into some hot topics of IT security, to get familiar with the challenges and recent research results, and to share knowledge with other students in the form of short presentations. We do hope that the conference will shed light on the beauty of the field of IT security and some of its exciting research areas, and it will stimulate both the active participants of the conference and all other students enrolled in the IT security course to engage in further studies in the domain of IT security.

The Call for Papers (CfP) for the conference is available here.

Conference topics

all, uav, cyber-physical-system, vehicle, network-security, power grid, machine-learning, data-evaluation, privacy, economics, malware, binary-similarity, cryptography, machine-learning-security, LLM-security, LLM, copilot, federated-learning, poisoning, password-manager, AAA, OAuth, web-security, Kerberos

Data Quality Evaluation

How to determine which features are the most important? How to measure which data samples are the highest quality? How to identify which datasets (aka participants) in a collaboraton are the most crutial? Contributions Score Computation schemes aim to answer these questions. Without such techniques to allocate the rewards amongst participants, the collaboration could even collapse. Without such mechanisms to find bad-quality data points, the final model could have inferior performance. Without such methods, the best features could remain hidden.

Tags: machine-learning, data-evaluation

References:

Membership Inference Attack

Membership inference attacks aim to determine if a specific data point was part of a machine learning model's training set, which could pose privacy risks in sensitive domains like healthcare. It is the de-facto attack to asses the privacy leakage, so it is regularly used to audit machine learning models. However, its narrow scope and reliance on numerous assumptions raise questions about its ability to provide a comprehensive view, so its results may be misleading by creating a false sense of privacy protection.

Tags: machine-learning, privacy

References:

Prompt Injection in LLM

Large Language Models (LLMs) are a new class of machine learning models that are trained on large text corpora. They are capable of generating text that is indistinguishable from human-written text. The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. There exist several attacks that create adversarial prompts against LLMs.

Tags: machine-learning, machine-learning-security, LLM-security, LLM

References:

Poisoning Code Completion Models (CoPilot)

Large Language Models (LLMs) are a new class of machine learning models that are trained on large text corpora. They are capable of generating text that is indistinguishable from human-written text. One of their most popular application is code completion, where the model completes the source code written by a developer. Developers are found to code up to 55% faster while using such tools. Among these tools, GitHub Copilot is by far the most popular. GitHub Copilot leverages context from the code and comments you write to suggest code instantly. With GitHub Copilot, you can convert comments to code autofill repetitive code, and show alternative suggestions. However, GitHub Copilot is trained on public repositories, and therefore, it is vulnerable to data poisoning; a bad actor may intentionally contaminate the training dataset with malicious code that may trick the model into suggesting similar patterns in your code editor.

Tags: machine-learning, machine-learning-security, copilot, LLM-security

References:

Misbehaving Detection in Federated Learning

Learning systems that require all the data to be fed into a learning model running on a central server pose serious privacy concerns. For example, the transmission of health data across certain organizational boundaries may violate security and privacy rules. Federated Learning (FL) was proposed by Google to mitigate this issue by enabling a group of clients (e.g, different stakeholders) to jointly learn a model while keeping their private data at their local devices. However, FL has been shown vulnerable to model and data poisoning attacks, where one or more malicious clients try to poison the global model by sending carefully crafted local model updates to the central parameter server. These attacks may cause the central model to underperform, more costly to train, or misclassify certain testing samples. There are several schemes that attempt to detect and eliminate such misbehaving clients.

Tags: machine-learning, machine-learning-security, federated-learning, poisoning

References: