ChatGPT can help with work tasks, but supervision is still necessary

Workers are experimenting with ChatGPT for tasks like writing emails, producing code or even completing year-end reviews. The bot uses data from the internet, books and Wikipedia to produce conversational responses. But the technology isn’t perfect. Our tests found that it sometimes produces answers that potentially include plagiarism, contradict themselves, contain factual errors or have grammatical mistakes – all of which can be problematic at work.
ChatGPT is basically a predictive-text system, similar to but better than the ones built into the text-messaging apps on your phone, said Jacob Andreas, an assistant professor at MIT’s Computer Science and Artificial Intelligence Laboratory who studies natural language processing. While it often gives answers that sound good, the content can have some problems, he said.
“If you look at some of these really long ChatGPT-generated essays, it’s very easy to see places where it contradicts itself,” he said. “When you ask it to generate code, it’s mostly right, but often it’s wrong.”
We wanted to know how well ChatGPT could handle daily office tasks. Here’s what we found after testing in five categories.
Responding to messages
We asked ChatGPT to respond to several different types of incoming messages.
In most cases, the AI provided relatively appropriate answers, although most were wordy. For example, when it responded to a colleague on Slack who asked how my day was going, its answer was repetitive: “@[Colleague], Thanks for asking! My day is going well, thanks for the inquiry.”
The bot often left bracketed placeholders when it wasn’t sure whom or what it was referring to. It also assumed details that were not included in the prompt, leading to some factually incorrect statements about my job.
In one case, it said it could not complete the task, explaining that it does not “have the ability to receive emails and respond to them.” But when given a more generic request, it returned an answer.
Surprisingly, ChatGPT was able to generate sarcasm when asked to respond to a colleague who asked if Big Tech is doing a good job.
Generating ideas
One way people use generative AI is to come up with new ideas. But experts warn that people should be careful if they use ChatGPT for this at work.
“We don’t understand the extent to which it’s just plagiarism,” Andreas said.
The possibility of plagiarism was apparent when we asked ChatGPT to develop story ideas on my beat. One pitch, in particular, was for a story idea and angle that I had already covered. While it’s unclear whether the chatbot pulled from my previous stories, others like it, or just generated an idea based on other data on the internet, the fact remained: The idea wasn’t new.
“It’s good to sound human, but the actual content and ideas tend to be familiar,” said Hatim Rahman, an assistant professor at Northwestern University’s Kellogg School of Management who studies AI’s impact on work. “They are not new insights.”
Another idea was outdated: it proposed a story that would be factually incorrect today. ChatGPT says it has “limited knowledge” of anything after the year 2021.
Providing more detail in the prompt led to more focused ideas. But when I asked ChatGPT to write some “quirky” or “funny” headlines, the results were atrocious, and some were nonsensical.
Navigating tough conversations
Have you ever had a colleague talk too loudly while you’re trying to work? Maybe your boss hosts too many meetings, cutting into your focus time?
We tested ChatGPT to see if it could help you navigate difficult work situations like these. For the most part, it produced appropriate responses that could serve as good starting points for workers. However, they were often a bit wordy, formulaic and, in one case, a complete contradiction.
“These models don’t understand anything,” Rahman said. “The underlying technology looks at statistical correlations … so it’s going to give you formulaic answers.”
The layoff memo it produced could easily stand up to, and in some cases do better than, notices companies have sent out in recent years. Unprompted, the bot cited “the current economic climate and the impact of the pandemic” as reasons for the layoffs and communicated that the company understood “how difficult this news can be for everyone.” It suggested that laid-off workers would have support and resources and, as requested, motivated the team by saying they would “come out of this stronger.”
In tough conversations with colleagues, the bot greeted them, gently addressed the issue, softened the delivery by saying “I understand” the person’s intent, and ended the note with a request for feedback or further discussion.
But in one instance, when it was asked to tell a colleague to lower his voice on phone calls, it completely misread the request.
Writing team updates
We also tested whether ChatGPT could generate team updates if we fed it key points that needed to be communicated.
Our initial tests once again produced appropriate responses, although they were formal and somewhat monotonous. However, when we specified an “excited” tone, the wording became more informal and included exclamation marks. But every note sounded very similar even after we changed the prompt.
“It’s both the structure of the sentence, but more so the connection between the ideas,” Rahman said. “It’s very logical and formal … it’s like a high school essay.”
As before, it made assumptions when it lacked the necessary information. It became problematic when it didn’t know which pronouns to use for my colleague – a mistake that could signal to colleagues that either I didn’t write the note or that I don’t know my team members very well.
Writing self-evaluations
Writing self-evaluation reports at the end of the year can cause dread and anxiety for some, resulting in a review that sells their work short.
Feeding ChatGPT clear performance highlights, including key data points, led to a glowing review of myself. The first attempt was problematic, as the prompt asked for a self-assessment for “Danielle Abril” rather than for “me.” That led to a third-person review that sounded like it came from Sesame Street’s Elmo.
Changing the prompt to request a review of “me” and “my” performance led to complimentary phrases such as “I consistently demonstrated a strong ability,” “I am always willing to go the extra mile,” “I have been an asset to the team,” and “I’m proud of the contributions I’ve made.” It also included a nod to the future: “I’m sure I’ll continue to make valuable contributions.”
Some of the highlights were a bit generic, but overall it was a glowing review that could serve as a good template. The bot produced similar results when asked to write cover letters. However, ChatGPT had one major flub: It mistakenly assumed my job title.
So was ChatGPT useful for common work tasks?
It helped, but sometimes the errors caused more work than doing the task manually.
ChatGPT served as a good starting point in most cases, providing useful verbiage and initial ideas. But it also produced answers with errors, factually incorrect information, redundant words, potential plagiarism and miscommunication.
“I can see it being useful … but only to the extent that the user is willing to check the output,” Andreas said. “It’s not good enough to let it off the rails and have it send emails to your colleagues.”