Publications

List of Publications

Dynamic k-anonymity for Electronic Health Records: A Topological Framework

With the rapid digitization of Electronic Health Records (EHRs), fast and adaptive data anonymization methods have become crucial. While topological data analysis (TDA) tools have been proposed to anonymize static datasets—creating multiple generalizations for different anonymization needs from a single computation—their application to dynamic datasets remains unexplored. Our work adapts existing methodologies to dynamic settings by developing an improved version of weighted persistence barcodes that track higher-dimensional holes in data, allowing real-time editing of persistence information.

A Swaminathan, M Akgün

Links:

DPM 2024
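As a plaintext illustration of the k-anonymity property this work builds on (not the topological machinery itself), the hypothetical sketch below checks whether every quasi-identifier combination occurs at least k times, and shows how generalizing one attribute can restore the property; all names and data are illustrative:

```python
from collections import Counter

def is_k_anonymous(records, k):
    """True iff every quasi-identifier combination occurs at least k times."""
    counts = Counter(map(tuple, records))
    return all(c >= k for c in counts.values())

def generalize_age(records, width=10):
    """Coarsen the age attribute (index 0) into buckets of `width` years."""
    return [((age // width) * width, *rest) for age, *rest in records]

rows = [(34, "F"), (37, "F"), (52, "M"), (58, "M")]
print(is_k_anonymous(rows, 2))                  # exact ages are unique -> False
print(is_k_anonymous(generalize_age(rows), 2))  # bucketed into 30s/50s -> True
```

Dynamic settings add the twist that records arrive and change over time, so the check above would have to be re-evaluated incrementally rather than from scratch.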

A privacy-preserving approach for cloud-based protein fold recognition

The complexity and cost of training machine learning models have made cloud-based machine learning as a service (MLaaS) attractive for businesses and researchers. MLaaS eliminates the need for in-house expertise by providing pre-built models and infrastructure. However, it raises data privacy and model security concerns, especially in medical fields like protein fold recognition. We propose a secure three-party computation-based MLaaS solution for privacy-preserving protein fold recognition, protecting both sequence and model privacy.

AB Ünal, N Pfeifer, M Akgün

Links:

Cell Patterns 101023 (2024)

GitHub Repository

FHAUC: Privacy Preserving AUC Calculation for Federated Learning using Fully Homomorphic Encryption

Current research on federated learning primarily focuses on preserving privacy during the training phase. However, model evaluation has not been adequately addressed, despite the potential for significant privacy leaks during this phase as well. In this paper, we demonstrate that the state-of-the-art AUC computation method for federated learning systems, which utilizes differential privacy, still leaks sensitive information about the test data while also requiring a trusted central entity to perform the computations.

CA Baykara, AB Ünal, M Akgün

Links:

arXiv:2403.14428 (2024)

Robust Representation Learning for Privacy-Preserving Machine Learning: A Multi-Objective Autoencoder Approach

Leveraging the power of robust representation learning, our novel framework advances the field of privacy-preserving machine learning (ppML). Traditional ppML techniques compromise either speed or model performance to safeguard data privacy. Our solution employs multi-objective-trained autoencoders to optimize the balance between data utility and privacy. By sharing only the encoded form of the data, we enable the secure use of third-party services for intensive model training and hyperparameter tuning. Empirical validation across both unimodal and multimodal settings shows that our approach makes data sharing both efficient and confidential.

S Ouaari, AB Ünal, M Akgün, N Pfeifer

Links:

arXiv:2309.04427 (2023)

A Privacy-Preserving Federated Learning Approach for Kernel Methods

We introduce FLAKE, a privacy-preserving federated learning approach for kernel methods on horizontally distributed data. Data sources mask their data so that a centralized instance can compute a Gram matrix, preserving privacy while enabling the training of kernel-based algorithms such as Support Vector Machines. We establish that FLAKE prevents semi-honest adversaries from learning the input data or the number of features. Experiments on clinical and synthetic data confirm FLAKE's superior accuracy and efficiency over comparable methods. Its data masking and Gram matrix computation times are significantly lower than SVM training times, making it applicable across a wide range of use cases.

A Hannemann, AB Ünal, A Swaminathan, E Buchmann, M Akgün

Links:

IEEE TPS, pp. 82-90 (2023)
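The algebraic fact underlying Gram-matrix masking can be sketched in a few lines: if data owners multiply their feature matrices by a common random orthogonal matrix Q, pairwise dot products, and hence the Gram matrix, are unchanged, since (XQ)(XQ)^T = X Q Q^T X^T = X X^T. FLAKE's actual protocol is specified in the paper; the toy below only illustrates this identity, and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(d, seed):
    """Orthogonal masking matrix derived from a seed shared by the data owners."""
    q, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(d, d)))
    return q

# Two data owners with horizontally partitioned data (same d features).
d = 5
X1 = rng.normal(size=(3, d))
X2 = rng.normal(size=(4, d))

Q = random_orthogonal(d, seed=42)      # shared secret among the owners only
masked = np.vstack([X1 @ Q, X2 @ Q])   # what the central instance receives

# The Gram matrix computed on masked data equals the true one.
gram_server = masked @ masked.T
gram_true = np.vstack([X1, X2]) @ np.vstack([X1, X2]).T
print(np.allclose(gram_server, gram_true))  # True
```

Because the central instance never sees Q, it can train any Gram-matrix-based algorithm without recovering the raw features.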

CECILIA: Comprehensive Secure Machine Learning Framework

We propose CECILIA, a secure three-party computation framework offering privacy-preserving building blocks that enable complex operations to be computed privately. In addition to adapted common operations such as addition and multiplication, it offers a multiplexer, most significant bit extraction, and modulus conversion; the first two are novel in methodology, and the last is novel in both functionality and methodology. CECILIA also provides two novel complex methods: the exact exponentiation of a public base raised to the power of a secret value, and the inverse square root of a secret Gram matrix.

AB Ünal, N Pfeifer, M Akgün

Links:

arXiv:2202.03023 (2022)

Efficient privacy-preserving whole-genome variant queries

Disease–gene association studies are of great importance. However, genomic data are far more sensitive than most other data types and contain information not only about individuals but also about their relatives. We propose a method that uses secure multi-party computation to query genomic databases in a privacy-preserving manner.

M Akgün, N Pfeifer, O Kohlbacher

Links:

Bioinformatics 38,8 (2022)

ESCAPED: Efficient Secure and Private Dot Product Framework for Kernel-Based Machine Learning Algorithms with Applications in Healthcare

We introduce ESCAPED, which stands for Efficient SeCure And PrivatE Dot product framework. ESCAPED enables the computation of the dot product of vectors from multiple sources on a third-party, which later trains kernel-based machine learning algorithms, while neither sacrificing privacy nor adding noise.

AB Ünal, M Akgün, N Pfeifer

Links:

AAAI 35, 11 (2021)
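ESCAPED's full protocol involves multiple data sources and a third party; as a minimal sketch of why additive secret sharing composes with dot products, the toy below splits a secret vector into two shares for two hypothetical non-colluding servers, lets each compute a local dot product with a public vector, and reconstructs the true result by linearity. This illustrates only the sharing primitive, not the paper's protocol:

```python
import numpy as np

rng = np.random.default_rng(1)
MOD = 2**32

def share(x):
    """Split an integer vector x into two additive shares modulo 2^32."""
    r = rng.integers(0, MOD, size=x.shape, dtype=np.uint64)
    return r, (x - r) % MOD

x = np.array([3, 1, 4, 1, 5], dtype=np.uint64)   # secret input vector
y = np.array([2, 7, 1, 8, 2], dtype=np.uint64)   # public vector

s0, s1 = share(x)
# Each server computes a local dot product on its share; linearity of
# additive sharing means the sum of the partials reconstructs <x, y>.
partial0 = (s0 @ y) % MOD
partial1 = (s1 @ y) % MOD
print((partial0 + partial1) % MOD, x @ y)  # both print 35
```

Neither server alone learns anything about x, since each share is uniformly random on its own.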

ppAURORA: Privacy Preserving Area Under Receiver Operating Characteristic and Precision-Recall Curves with Secure 3-Party Computation

In this paper, we propose an MPC-based framework called ppAURORA, with private merging of sorted lists and novel methods for comparing two secret-shared values, selecting between two secret-shared values, converting the modulus, and performing division, to compute the exact AUC as one could obtain on the pooled original test samples.

AB Ünal, N Pfeifer, M Akgün

Links:

arXiv:2102.08788 (2021)
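The plaintext quantity ppAURORA targets, the exact AUC on the pooled test samples, equals the fraction of (positive, negative) pairs ranked correctly, with tied scores counted as half. A minimal O(P·N) reference computation (not the MPC protocol itself) might look like:

```python
def exact_auc(scores, labels):
    """Exact AUC: fraction of (positive, negative) pairs where the positive
    sample scores higher, counting ties as half a correct ranking."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(exact_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The MPC challenge is computing exactly this value when the scores and labels are secret-shared across parties and may never be pooled in the clear.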

Identifying disease-causing mutations with privacy protection

We present an approach to identify disease-associated variants and genes while ensuring patient privacy. The proposed method uses secure multi-party computation to find disease-causing mutations under specific inheritance models without sacrificing the privacy of individuals. It discloses only variants or genes obtained as a result of the analysis. Thus, the vast majority of patient data can be kept private.

M Akgün, AB Ünal, B Ergüner, N Pfeifer, O Kohlbacher

Links:

Bioinformatics 36,21 (2021)