Text Embeddings Reveal (Almost) as Much as Text

Keywords: text retrieval, embeddings, inversion, privacy

TL;DR: We propose Vec2Text, a method that can recover 92% of 32-token embedded inputs exactly.

Abstract: How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion: reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when re-embedded, is close to a fixed point in latent space. We find that although a naive model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.
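The abstract describes the core loop: start from a hypothesis text, re-embed it, and correct it until its embedding lands near the fixed target point in latent space. The Python below is a minimal sketch of that iterative correct-and-re-embed procedure, assuming hypothetical "embed" and "propose_correction" callables standing in for the frozen embedding model and the learned corrector; it illustrates the idea under those assumptions and is not the authors' Vec2Text implementation.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def invert_embedding(target_emb, embed, propose_correction, max_steps=50, tol=0.99):
    # target_emb: the embedding we are trying to invert back into text.
    # embed(text) -> np.ndarray: the (frozen) embedding model being inverted (assumed interface).
    # propose_correction(text, hyp_emb, target_emb) -> str: a learned corrector that
    #   edits the current hypothesis given how its embedding differs from the target (assumed interface).

    # Step 0: an initial hypothesis conditioned only on the target embedding.
    hypothesis = propose_correction("", None, target_emb)

    for _ in range(max_steps):
        hyp_emb = embed(hypothesis)
        if cosine_similarity(hyp_emb, target_emb) >= tol:
            break  # the hypothesis re-embeds close enough to the target point
        # Feed the current text, its embedding, and the target back to the corrector.
        hypothesis = propose_correction(hypothesis, hyp_emb, target_emb)

    return hypothesis

A naive single-shot decoder corresponds to stopping after step 0; the reported gains come from repeating the correction step so each revision is grounded in how the previous guess actually embeds.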

…. to be continued
Read the Original Article
Copyright for syndicated content belongs to the linked source: Hacker News – https://openreview.net/forum?id=wK7wUdiM5g0
