Robustness Analysis of Maryland Watermark to Paraphrasing Attacks Under Advanced Techniques
Abstract
The proliferation of large language models (LLMs) and their potential for abuse make it necessary to develop trustworthy techniques for recognizing and validating machine-generated text. The Maryland Watermark is a notable technique that embeds identifiable signatures into LLM-generated text by subtly altering token selection probabilities, which makes it relevant for content attribution. This study investigates its robustness against paraphrasing-based evasion strategies. Using the Mistral-7B-Instruct-v0.2 model and prompts from the DAIGT dataset, 1,000 documents (500 watermarked) were generated and subjected to three types of attack: paragraph-level paraphrasing with a Seq2Seq model trained on kPar3, sentence-level paraphrasing with a T5-based ChatGPT Paraphraser, and word-level synonym substitution with a POS-aware WordNet approach. Evaluation metrics covered watermark detectability (z-score, true positive rate (TPR), and false positive rate (FPR)), semantic similarity, and text quality (perplexity). Paragraph-based paraphrasing yielded the lowest perplexity (19.53) but degraded semantic similarity the most, followed by sentence-based paraphrasing (perplexity 24.89). Recursive paraphrasing reduced watermark detection at first, but detection accuracy recovered in subsequent iterations. Under word replacement attacks, the detection TPR was 95.78% for noun substitution and 39.76% for 25% token replacement, indicating that these attacks are ineffective. Overall, the Maryland Watermark remains robust against word-level modifications but is moderately vulnerable to advanced paraphrasing that alters semantic integrity.
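The detectability metrics named in the abstract (z-score, TPR, FPR) can be illustrated with a minimal sketch. It assumes the commonly described green-list formulation of this watermark, in which a fraction gamma of the vocabulary is favored at each step and detection is a one-proportion z-test on the count of green tokens. The function names, the default gamma of 0.25, and the threshold of 4.0 are illustrative assumptions and are not taken from the study.

```python
import math
from typing import Sequence, Tuple


def watermark_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """One-proportion z-test on the number of green-list tokens.

    gamma is the assumed fraction of the vocabulary placed on the green list.
    Under the null hypothesis (no watermark), each token is green with
    probability gamma, so the count has mean gamma*T and variance T*gamma*(1-gamma).
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    expected = gamma * total_tokens
    std_dev = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std_dev


def detection_rates(
    z_watermarked: Sequence[float],
    z_unwatermarked: Sequence[float],
    threshold: float = 4.0,  # assumed decision threshold, not from the study
) -> Tuple[float, float]:
    """Return (TPR, FPR) when a document is flagged if its z-score exceeds threshold."""
    tpr = sum(z > threshold for z in z_watermarked) / len(z_watermarked)
    fpr = sum(z > threshold for z in z_unwatermarked) / len(z_unwatermarked)
    return tpr, fpr
```

A paraphrasing attack succeeds to the extent that it replaces green tokens with arbitrary ones, which pulls the green count back toward gamma times the token count and lowers the z-score below the threshold.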
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This is an Open Access article distributed under the terms of the Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which requires that reusers give credit to the creator. It allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, for noncommercial purposes only.