Publications

Accelerating Privacy-Preserving Medical Record Linkage: A Three-Party MPC Approach

This work addresses the critical challenge of securely linking medical records across institutions while preserving patient privacy. Integrating data from various sources in healthcare can provide comprehensive insights into disease progression and treatment outcomes. However, strict privacy regulations often limit the ability to share this data. To tackle this, we propose a three-party Multi-Party Computation (MPC) record linkage method that keeps sensitive data confidential throughout the linkage process. By eliminating the need for Bloom filters and using bigram-based string similarity computations in a three-party framework, the method not only enhances privacy but also runs up to 14x faster than state-of-the-art solutions. This scalable approach offers a practical solution for large-scale, privacy-preserving healthcare data integration.

ŞS Mağara, N Dietrich, AB Ünal, M Akgün

Links:

arXiv:2410.21605
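As a point of reference for the linkage step above, the following is a minimal plaintext sketch of bigram-based string similarity (the Dice coefficient over character bigrams) — the quantity the three-party protocol computes on secret-shared data. This is an illustration of the underlying similarity measure only, not the MPC protocol itself:

```python
from collections import Counter

def bigrams(s):
    """Split a string into overlapping character bigrams."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]

def dice_similarity(a, b):
    """Dice coefficient over bigram multisets: 2*|A∩B| / (|A| + |B|)."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    overlap = sum((ca & cb).values())
    total = sum(ca.values()) + sum(cb.values())
    return 2 * overlap / total if total else 0.0

# "smith" vs "smyth" share the bigrams "sm" and "th" out of 4 each
print(dice_similarity("smith", "smyth"))  # → 0.5
```

In a privacy-preserving setting, each party would hold only shares of the bigram sets, and the overlap count and division would be evaluated inside the MPC framework.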

Privacy Preserving Federated Unsupervised Domain Adaptation with Application to Age Prediction from DNA Methylation Data

We introduce FREDA (FedeRatEd Domain Adaptation), a novel framework for privacy-preserving federated unsupervised domain adaptation in high-dimensional data, with a focus on age prediction from DNA methylation across various tissues. Unlike centralized methods that require combined access to source and target domain data, FREDA ensures data privacy through federated learning, secure aggregation, and randomized encoding. Our evaluations demonstrate that FREDA achieves performance comparable to centralized methods, even in distributed environments, while effectively addressing the distribution shift problem across different domains and preserving the privacy of participants’ local data.

CA Baykara, AB Ünal, N Pfeifer, M Akgün

Links:

MIRACUM-DIFUTURE Symposium 2024
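To illustrate the secure aggregation component mentioned above, here is a simplified masking-based sketch: clients add pairwise random masks that cancel in the sum, so the server learns only the aggregate. This is a generic illustration of the technique, not FREDA's actual protocol:

```python
import random

RING = 2**32  # arithmetic modulo 2^32

# each client's local value (e.g., a model update coordinate)
updates = [5, 11, 26]
n = len(updates)

# one random mask shared between each pair of clients i < j
masks = {(i, j): random.randrange(RING)
         for i in range(n) for j in range(i + 1, n)}

def masked(i, x):
    """Client i adds masks shared with higher-indexed clients,
    subtracts masks shared with lower-indexed ones."""
    m = sum(masks[(i, j)] for j in range(i + 1, n))
    m -= sum(masks[(j, i)] for j in range(i))
    return (x + m) % RING

# the masks cancel pairwise, so the server recovers only the sum
server_sum = sum(masked(i, x) for i, x in enumerate(updates)) % RING
print(server_sum)  # → 42
```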

PP-GWAS: Privacy Preserving Multi-Site Genome-wide Association Studies

We present PP-GWAS, a novel algorithm designed to improve upon existing standards in computational efficiency and scalability without sacrificing data privacy. The algorithm employs randomized encoding within a distributed architecture to perform stacked ridge regression on a linear mixed model. Experimental evaluation with real-world and synthetic data indicates that PP-GWAS achieves computational speeds twice as fast as comparable state-of-the-art algorithms while using fewer computational resources, all under a robust security model that tolerates an all-but-one semi-honest adversary setting.

A Swaminathan, A Hannemann, AB Ünal, N Pfeifer, M Akgün

Links:

arXiv:2410.08122
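For context on the regression step above, this is a minimal plaintext sketch of closed-form ridge regression, the core fit that PP-GWAS evaluates under randomized encoding across sites. The toy data and regularization value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: solves (X^T X + lam*I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# toy example: recover known weights from noisy observations
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.01 * rng.standard_normal(100)
w = ridge_fit(X, y, lam=0.1)
```

In the privacy-preserving setting, the matrix products are computed on encoded data distributed across sites rather than on a pooled plaintext design matrix.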

Distributed and Secure Kernel-Based Quantum Machine Learning

While advancements in secure quantum machine learning are notable, the development of secure and distributed quantum analogues of kernel-based machine learning techniques remains underexplored. In this work, we present a novel approach for securely computing common kernels, including polynomial, radial basis function (RBF), and Laplacian kernels, when data is distributed, using quantum feature maps.

A Swaminathan, M Akgün

Links:

arXiv:2408.10265

Dynamic k-anonymity for Electronic Health Records: A Topological Framework

With the rapid digitization of Electronic Health Records (EHRs), fast and adaptive data anonymization methods have become crucial. While topological data analysis (TDA) tools have been proposed to anonymize static datasets—creating multiple generalizations for different anonymization needs from a single computation—their application to dynamic datasets remains unexplored. Our work adapts existing methodologies to dynamic settings by developing an improved version of weighted persistence barcodes that track higher-dimensional holes in data, allowing real-time editing of persistence information.

A Swaminathan, M Akgün

Links:

DPM 2024

A privacy-preserving approach for cloud-based protein fold recognition

The complexity and cost of training machine learning models have made cloud-based machine learning as a service (MLaaS) attractive for businesses and researchers. MLaaS eliminates the need for in-house expertise by providing pre-built models and infrastructure. However, it raises data privacy and model security concerns, especially in medical fields like protein fold recognition. We propose a secure three-party computation-based MLaaS solution for privacy-preserving protein fold recognition, protecting both sequence and model privacy.

AB Ünal, N Pfeifer, M Akgün

Links:

Cell Patterns 101023 (2024)

GitHub Repository

Private, Efficient and Scalable Kernel Learning for Medical Image Analysis

Addressing the need for efficient privacy-preserving methods on distributed image data, we introduce OKRA (Orthonormal K-fRAmes), a novel randomized encoding-based approach for kernel-based machine learning. This technique, tailored for widely used kernel functions, significantly enhances scalability and speed compared to current state-of-the-art solutions. Through experiments on various clinical image datasets, we evaluate model quality, computational performance, and resource overhead, and find that our method outperforms comparable approaches.

A Hannemann, A Swaminathan, AB Ünal, M Akgün

Links:

CIBB (2024)
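A simplified illustration of the idea behind orthonormal encodings for kernel learning (not the OKRA encoding itself): multiplying data by a shared random orthogonal matrix preserves inner products and Euclidean distances, so standard kernel values computed on the encoded data are unchanged:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 8

# random orthogonal matrix via QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# inner products (hence polynomial kernels) are preserved
assert np.isclose(x @ y, (Q @ x) @ (Q @ y))

# distances (hence RBF/Laplacian-style kernels) are preserved too
assert np.isclose(np.linalg.norm(x - y), np.linalg.norm(Q @ x - Q @ y))
```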

FHAUC: Privacy Preserving AUC Calculation for Federated Learning using Fully Homomorphic Encryption

Current research on federated learning primarily focuses on preserving privacy during the training phase. However, model evaluation has not been adequately addressed, despite the potential for significant privacy leaks during this phase as well. In this paper, we demonstrate that the state-of-the-art AUC computation method for federated learning systems, which utilizes differential privacy, still leaks sensitive information about the test data while also requiring a trusted central entity to perform the computations.

CA Baykara, AB Ünal, M Akgün

Links:

arXiv:2403.14428 (2024)

Robust Representation Learning for Privacy-Preserving Machine Learning: A Multi-Objective Autoencoder Approach

Leveraging the power of robust representation learning, our novel framework advances the field of privacy-preserving machine learning (ppML). Traditional ppML techniques compromise either speed or model performance to safeguard data privacy. Our solution employs multi-objective-trained autoencoders to optimize the balance between data utility and privacy. By sharing only the encoded form of the data, we enable secure utilization of third-party services for intensive model training and hyperparameter tuning. Our empirical validation across both unimodal and multimodal settings shows that our approach makes data sharing both efficient and confidential.

S Ouaari, AB Ünal, M Akgün, N Pfeifer

Links:

arXiv:2309.04427 (2023)

A Privacy-Preserving Federated Learning Approach for Kernel methods

We introduce FLAKE, a privacy-preserving Federated Learning Approach for Kernel methods on horizontally distributed data. By allowing data sources to mask their data, a centralized instance can generate a Gram matrix, preserving privacy while enabling the training of kernel-based algorithms such as Support Vector Machines. We establish that FLAKE prevents semi-honest adversaries from learning the input data or the number of features. Testing on clinical and synthetic data confirms FLAKE’s superior accuracy and efficiency over similar methods. Its data masking and Gram matrix computation times are significantly shorter than SVM training times, making it highly applicable across various use cases.

A Hannemann, AB Ünal, A Swaminathan, E Buchmann, M Akgün

Links:

IEEE TPS p82-90 (2023)
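A minimal sketch of the masking idea described above, under the simplifying assumption that the sources share a secret orthogonal mask: each source sends only masked rows, yet the aggregator still recovers the exact Gram matrix. This illustrates the general principle, not FLAKE's precise masking scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6

# shared secret mask: a random orthogonal matrix known only to the sources
M, _ = np.linalg.qr(rng.standard_normal((d, d)))

# two sources hold horizontal partitions of the data
X1 = rng.standard_normal((3, d))
X2 = rng.standard_normal((4, d))

# each source transmits only its masked partition
masked = np.vstack([X1 @ M, X2 @ M])

# the aggregator computes the Gram matrix without seeing raw features:
# (X M)(X M)^T = X M M^T X^T = X X^T since M is orthogonal
G = masked @ masked.T

X = np.vstack([X1, X2])
assert np.allclose(G, X @ X.T)
```

The resulting `G` can then be fed to any kernel method that accepts a precomputed Gram matrix.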

CECILIA: Comprehensive Secure Machine Learning Framework

We propose CECILIA, a secure 3-party computation framework offering privacy-preserving building blocks that enable complex operations to be performed privately. In addition to common operations such as addition and multiplication, it offers a multiplexer, most significant bit extraction, and modulus conversion; the first two are novel in methodology, and the last is novel in both functionality and methodology. CECILIA also provides two novel complex methods: the exact exponentiation of a public base raised to the power of a secret value, and the inverse square root of a secret Gram matrix.

AB Ünal, N Pfeifer, M Akgün

Links:

arXiv:2202.03023 (2022)
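For readers unfamiliar with the substrate such frameworks build on, here is a simplified sketch of 2-out-of-2 additive secret sharing in a ring: values are split into random shares, and linear operations can be performed locally on the shares. This is generic background, not CECILIA's actual (3-party) protocol:

```python
import random

RING = 2**64  # arithmetic in Z_{2^64}, a common choice in MPC frameworks

def share(x):
    """Split x into two additive shares modulo 2^64."""
    r = random.randrange(RING)
    return r, (x - r) % RING

def reconstruct(s0, s1):
    """Recombine the two shares."""
    return (s0 + s1) % RING

# addition of secrets requires no communication: parties add shares locally
a0, a1 = share(20)
b0, b1 = share(22)
total = reconstruct((a0 + b0) % RING, (a1 + b1) % RING)
print(total)  # → 42
```

Non-linear building blocks such as the multiplexer or most significant bit extraction require interaction between the parties, which is where the framework's protocols come in.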

Efficient privacy-preserving whole-genome variant queries

Disease–gene association studies are of great importance. However, genomic data are far more sensitive than other data types and contain information about individuals and their relatives. We propose a method that uses secure multi-party computation to query genomic databases in a privacy-protected manner.

M Akgün, N Pfeifer, O Kohlbacher

Links:

Bioinformatics 38, 8 (2022)

ESCAPED: Efficient secure and private dot product framework for kernel-based machine learning algorithms with applications in healthcare

We introduce ESCAPED, which stands for Efficient SeCure And PrivatE Dot product framework. ESCAPED enables the computation of the dot product of vectors from multiple sources on a third-party, which later trains kernel-based machine learning algorithms, while neither sacrificing privacy nor adding noise.

AB Ünal, M Akgün, N Pfeifer

Links:

AAAI 35, 11 (2021)

ppAURORA: Privacy Preserving Area Under Receiver Operating Characteristic and Precision-Recall Curves with Secure 3-Party Computation

In this paper, we propose an MPC-based framework called ppAURORA, with private merging of sorted lists and novel methods for comparing two secret-shared values, selecting between two secret-shared values, converting the modulus, and performing division, to compute the exact AUC as one could obtain on the pooled original test samples.

AB Ünal, N Pfeifer, M Akgün

Links:

arXiv:2102.08788 (2021)
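To make the target quantity concrete, here is a plaintext sketch of the exact AUC via pair counting (the probability that a positive sample is scored above a negative one, with ties counted as half). This is the value ppAURORA computes on secret-shared test scores, shown here without any privacy protection:

```python
def exact_auc(labels, scores):
    """Exact AUC: fraction of (positive, negative) pairs ranked
    correctly, with ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# perfectly separated scores give AUC = 1.0
print(exact_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # → 1.0
```

The O(P·N) pair count here is for clarity; in practice the same value is obtained from ranks over the merged sorted score list, which is the step ppAURORA performs privately.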

Identifying disease-causing mutations with privacy protection

We present an approach to identify disease-associated variants and genes while ensuring patient privacy. The proposed method uses secure multi-party computation to find disease-causing mutations under specific inheritance models without sacrificing the privacy of individuals. It discloses only variants or genes obtained as a result of the analysis. Thus, the vast majority of patient data can be kept private.

M Akgün, AB Ünal, B Ergüner, N Pfeifer, O Kohlbacher

Links:

Bioinformatics 36, 21 (2021)