Neural network editing: algorithms and applications

Date
2024
Authors
Fu, Feisi
Abstract
Deep neural networks have demonstrated impressive performance in a wide variety of applications. However, deep neural networks are not perfect. In many cases, additional adjustments, which we call neural network editing, are essential for various objectives. In this thesis, we present three novel methodologies for neural network editing.

First, a novel methodology for repairing neural networks. Unlike existing methods that rely on modifying the weights of a neural network, which can induce a global change in the function space, our approach applies only a localized change in the function space while still guaranteeing the removal of the buggy behavior. By leveraging the piecewise linear nature of ReLU networks, our approach efficiently constructs a patch network tailored to the linear region where the buggy input resides; combined with the original network, the patch provably corrects the behavior on the buggy input.

Second, a new approach for repairing pretrained neural networks to satisfy global robustness and individual fairness properties. We prove that any counterexample to a global robustness property must exhibit a correspondingly large gradient. For ReLU networks, this result allows us to efficiently identify the linear regions that violate a given global robustness property. By formulating and solving a suitable robust convex optimization problem, our approach then computes a minimal weight change that provably repairs these violating linear regions.

Third, a novel approach to neural network ownership verification based on the notion of latent watermarks. The key idea is to decouple a network's normal operation from its responses to watermarked inputs during ownership verification: the network is trained so that the watermarks remain dormant unless the owner's secret key is applied to activate them. The secret key is realized as a specific perturbation of the network's parameters known only to the owner.
We show that our approach offers a strong defense against backdoor detection, backdoor removal, and surrogate model attacks. Finally, we summarize the proposed methods and discuss future directions and challenges for neural network editing, including the challenges of editing the latest machine learning models, such as large language models.
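The first two methods rest on the piecewise linear structure of ReLU networks: each input's ReLU on/off activation pattern defines a linear region on which the network is exactly affine. The following toy sketch (illustrative sizes and random weights, not from the thesis) extracts that local affine map and checks it agrees with the network at the input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network (hypothetical weights for illustration).
W1, b1 = rng.standard_normal((8, 3)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((1, 8)), rng.standard_normal(1)

def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def activation_pattern(x):
    # On/off state of each ReLU; inputs sharing this pattern
    # lie in the same linear region.
    return (W1 @ x + b1) > 0

def local_affine(x):
    # Within x's linear region the network equals A @ x + c.
    D = np.diag(activation_pattern(x).astype(float))
    A = W2 @ D @ W1
    c = W2 @ (D @ b1) + b2
    return A, c

x = rng.standard_normal(3)
A, c = local_affine(x)
assert np.allclose(forward(x), A @ x + c)
```

A patch confined to one such region, as in the first method, therefore only needs to correct a single affine piece rather than the whole function.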
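The gradient property behind the second method can be illustrated in one dimension: if a pair of points violates an L-robustness bound, the mean-value argument for piecewise linear functions forces some linear region between them to have slope at least the observed rate of change. A minimal numeric sketch (toy function, not the thesis's construction):

```python
# 1-D ReLU network f(x) = 3 * relu(x): two linear regions,
# slope 0 for x < 0 and slope 3 for x > 0.
def f(x):
    return 3.0 * max(x, 0.0)

# A counterexample pair to L-robustness with L = 1:
x, y = -1.0, 1.0
rate = abs(f(x) - f(y)) / abs(x - y)  # 1.5, exceeding L = 1

# Some linear region crossed by the segment [x, y] must have
# slope at least `rate`; here it is the x > 0 region.
slopes = [0.0, 3.0]
assert max(slopes) >= rate > 1.0
```

Searching for large-gradient regions thus suffices to locate all regions that can host counterexamples, which is what makes the repair step tractable.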
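The verification protocol of the third method can be sketched mechanically: normal queries use the released parameters, while ownership verification queries watermarked inputs on the key-activated model, i.e., the parameters plus the owner's secret perturbation. The weights, key, and input below are random placeholders; the thesis trains the network so that the two models actually disagree on watermarked inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear classifier standing in for a trained network.
W = rng.standard_normal((2, 4))

def predict(weights, x):
    return int(np.argmax(weights @ x))

# The owner's secret key: a specific parameter perturbation
# known only to the owner.
key = 0.5 * rng.standard_normal(W.shape)

x_watermark = rng.standard_normal(4)

# Normal operation uses W; verification uses W + key.
normal_pred = predict(W, x_watermark)
activated_pred = predict(W + key, x_watermark)
```

Because the released model W never exhibits the watermark behavior on its own, backdoor detection and removal tools that inspect only W have nothing to find.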