Can artificial intelligence replace your company’s editor?

Language models can reliably rewrite and improve business documents, but only if users provide highly specific instructions to the machine. Without precise guidelines, artificial intelligence tools often introduce factual errors and awkward phrasing, showing that professional human editors remain necessary for workplace communication. These findings were published recently in the Journal of Writing Research.

The rapid adoption of generative artificial intelligence has sparked widespread anxiety in the writing and publishing industries. Many copywriters and translators worry that automated tools will eventually render their professions obsolete. Organizations increasingly turn to digital tools to draft business correspondence, marketing materials, and internal reports.

Previous experiments have shown that language models like ChatGPT can increase productivity and improve the grammar of basic writing assignments. However, writing on behalf of an organization is distinct from writing an expressive personal essay. Organizational texts function as collective outputs that represent a company’s identity and facilitate daily operations.

Producing these documents requires an understanding of workplace dynamics, technical regulations, and a company’s preferred tone. Often, corporate documents have multiple authors, which can result in inconsistent messaging. Companies frequently hire professional external editors to untangle these conflicting voices and simplify complicated legal or technical information for everyday readers.

Daniël Janssen, a researcher at Utrecht University in the Netherlands, wanted to know if a machine could replicate this specialized editorial intuition. Janssen and his colleagues, Henri Raven, Lisanne van Weelden, and Yohannes den Hertog, designed an experiment to compare the software against experienced human professionals. They sought to determine whether the software could independently apply the same level of nuance and audience awareness to everyday corporate documents.

The research team broke their experiment into two phases. In the first phase, they observed three professional editors who each possessed more than two decades of industry experience. The researchers gave the participants four distinct Dutch business letters and asked them to make the texts “good.” The original letters came from various organizations and dealt with topics such as maternity leave policies, sickness benefits, and scheduling.

The researchers recorded the editors’ computer screens as they worked. Immediately after the revisions were complete, the study authors interviewed the editors. They used a technique called stimulated recall, where the editors watched the screen recordings and explained what they were thinking as they typed. The editors consistently focused on improving the overall tone, replacing formal jargon with accessible language, and restructuring the letters so the most pressing information appeared at the top of the page.

In the second phase, the investigators asked ChatGPT to rewrite the same four letters. They used three distinct prompts to see how different instructional strategies affected the machine’s output. The first instruction was intentionally simple, asking the software to make the text “reader-focused.”

The second prompt asked the software to rewrite the text to a “B1” language level. This instruction refers to the Common European Framework of Reference for Languages. A B1 rating represents an intermediate language proficiency, which is the standard reading level targeted by most mass-market communications. The third prompt was a specialized eight-step instruction designed to simulate the exact workflow the human editors had described during their interviews.

To evaluate the results, the researchers used specialized readability analysis software to assess the Dutch texts. This tool measured syntax, semantic meaning, and the level of personal engagement in the writing. The investigators also conducted a qualitative review to check each draft for factual accuracy and appropriate phrasing.

The human editors substantially improved the readability of the original letters. They utilized shorter sentences, incorporated active verbs, and increased the use of personal pronouns such as “you” and “we.” The human revisions were also completely free of factual errors and preserved the legal intent of the organizational documents.
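Two of the surface features mentioned here, sentence length and the use of personal pronouns, are simple enough to approximate in a few lines of code. The sketch below is only an illustration and not the tool the researchers used: the function name `readability_profile`, the English pronoun list, and both sample sentences are invented for this example, whereas the study analyzed Dutch letters with dedicated software.

```python
import re

# Hypothetical English pronoun list, chosen for illustration only.
# The study itself analyzed Dutch texts with specialized software.
PERSONAL_PRONOUNS = {"you", "your", "we", "our", "us"}


def readability_profile(text):
    """Return average sentence length and the share of personal pronouns."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase the text and extract word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    pronouns = sum(1 for w in words if w in PERSONAL_PRONOUNS)
    return {
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "pronoun_share": pronouns / len(words) if words else 0.0,
    }


# Invented sample sentences, not taken from the study's letters.
original = "The employee is hereby notified that the benefit shall be paid by the employer."
revised = "We will pay your benefit. You do not need to do anything."

print(readability_profile(original))
print(readability_profile(revised))
```

On these invented samples, the revised version scores shorter sentences and a higher share of personal pronouns, which is the direction of change the article describes for the human edits. Counts like these say nothing about factual accuracy or tone, which the researchers checked in a separate qualitative review.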

The performance of the artificial intelligence varied widely based on the instructions it received. When given the specific instruction to write at a B1 reading level, ChatGPT performed remarkably well. This version achieved readability scores that closely resembled the human editors’ work. The B1 prompt successfully shortened complex clauses and simplified the vocabulary without changing the original meaning.

Conversely, the simple instruction to make the text reader-focused yielded poor results. The software retained complex sentence structures and relied heavily on unfamiliar words. More problematically, this basic prompt caused the machine to invent false information.

For instance, in a letter discussing an employee’s maternity leave benefits and sick pay, the simple prompt generated a sentence congratulating the employer on the upcoming expansion of their team. This represented a fundamental misunderstanding of the workplace context. A baby is not joining the corporate team as a new employee, making the congratulatory phrase entirely inappropriate for a human resources document.

The complex eight-step prompt also underperformed compared to the B1 prompt and the human editors. While it improved the visual layout of the letters, it introduced multiple factual errors regarding the payment of certain medical benefits. Feeding the machine too many distinct revision steps at once may have created opportunities for the software to lose track of the core message.

This experiment has a few limitations. The research relied on a very small set of business letters. Rewriting requirements differ greatly depending on the type of document, such as a journalistic news release or a consumer instruction manual. The experimental outcomes for these brief administrative messages might not reflect how the system handles longer, more intricate reports.

The software also generated its responses in a single attempt. In an actual workplace setting, a user would likely refine their prompt, regenerate the text multiple times, or manually edit the machine’s initial draft. The study evaluated human and machine outputs in isolation, rather than testing how well humans and algorithms collaborate.

Future investigations will likely explore these collaborative workflows. The study authors suggest that the role of a professional writer is shifting. Rather than creating documents entirely from scratch, professionals will increasingly act as curators and directors of automated drafts.

This technological evolution requires a specialized skill known as prompt engineering, where writers learn to feed specific contextual cues to the machine. Assessing artificial prose requires the exact same competencies used to evaluate human writing, including rhetorical fit and source verification. Effective writing might soon depend just as much on the ability to supervise and correct text generation models as it does on traditional language proficiency.

The study, “Can ChatGPT do the same? ChatGPT and professional editors compared,” was authored by Daniël Janssen, Henri Raven, Lisanne van Weelden, and Yohannes den Hertog.
