Privacy in Text Processing: An Information Flow Perspective
The problem of text document obfuscation is to provide an automated mechanism that makes the content of a text document accessible without revealing the identity of its writer. This is more challenging than it seems, because an adversary equipped with powerful machine learning techniques can identify the author with good accuracy even when, for example, the author's name has been redacted. Current obfuscation methods are ad hoc and have been shown to provide weak protection against such adversaries. Differential privacy, which provides strong privacy guarantees in some domains, has been thought not to be applicable to text processing.
In this talk, we will review obfuscation as a quantitative information flow problem and explain how generalized differential privacy can be applied to it to provide strong anonymization guarantees in a standard model for text processing.
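
As background, generalized (metric) differential privacy replaces the usual notion of neighbouring databases with a distance d on inputs: a mechanism K satisfies it with parameter epsilon if, for all inputs x and x' and every set of outputs Z, Pr[K(x) in Z] <= exp(epsilon * d(x, x')) * Pr[K(x') in Z]. The sketch below is one illustrative way such a guarantee might be obtained for text, and it is not necessarily the mechanism presented in the talk. It assumes words are represented as vectors, adds noise with density proportional to exp(-epsilon * ||z||) to each vector, and maps the result back to the nearest vocabulary word. The toy vocabulary, the 2-D vectors, and every function name are hypothetical.

```python
import math
import random

# Toy 2-D "embeddings"; hypothetical values chosen only for illustration.
EMBEDDINGS = {
    "good": (1.0, 0.0),
    "great": (1.1, 0.1),
    "bad": (-1.0, 0.0),
    "awful": (-1.1, -0.1),
}


def sample_laplace_noise(dim, epsilon, rng):
    """Sample z in R^dim with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction: normalise a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in direction))
    # The radius of this distribution follows Gamma(shape=dim, scale=1/epsilon).
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * c / norm for c in direction]


def nearest_word(point):
    """Return the vocabulary word whose vector is closest to `point`."""
    return min(EMBEDDINGS, key=lambda w: math.dist(EMBEDDINGS[w], point))


def obfuscate_word(word, epsilon, rng):
    """Perturb one word's vector, then snap it back to the vocabulary."""
    vec = EMBEDDINGS[word]
    noise = sample_laplace_noise(len(vec), epsilon, rng)
    noisy = [v + n for v, n in zip(vec, noise)]
    # Nearest-neighbour lookup is post-processing of the noisy vector,
    # so it does not weaken the guarantee given by the noise step.
    return nearest_word(noisy)


def obfuscate_text(words, epsilon, seed=None):
    """Apply the word-level mechanism independently to each word."""
    rng = random.Random(seed)
    return [obfuscate_word(w, epsilon, rng) for w in words]
```

Calling `obfuscate_text(["good", "bad"], epsilon=2.0)` returns a list of the same length drawn from the toy vocabulary. Smaller values of `epsilon` add more noise, so words are more likely to be replaced by distant ones; larger values leave the text closer to the original. In this sketch the distance d is the Euclidean distance between word vectors, which is an assumption made for simplicity.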