The artificial intelligence boom is here, and the world is becoming increasingly reliant on AIs such as ChatGPT. This led to an interesting question in the newsroom: can AI edit to AP style better than a human?
Now, AI editing is not an idea I would actually take seriously, mainly due to the environmental issues with generative AI. Regardless of the results of this experiment, I will not be using AI to help me with editing.
Still, the idea is entertaining, so I pulled up an old copy-editing test from none other than the UIL’s journalism program. Since it was used for competition and comes with a scoring guide, it should work well as a judge. Here is the link to the test, so you can follow along with the blank version.
Let us begin.
Part one of the test consists of 15 multiple-choice questions. One option is the correct answer for AP style, and the others are incorrect.
Here are my answers:
The governor had been photographed lying on a beach chair.
The Metropolitan Museum of Art acquired the 14th-century edition of the Bible.
Every one of the Confederate statues was removed.
The team that ended the “Curse of the Billy Goat” by winning the World Series is in Illinois.
Obama pardoned Chelsea Manning, born Edward on Dec. 17, 1987.
United Airlines took a public relations hit when police dragged David Dao off one of its planes.
Shortly after 1 A.M., Trump acknowledged the Senate’s approval of the tax bill in a tweet.
The election of the first female president would have been a / an historic event.
Recently, a 12-year-old inventor, Shubham Banerjee, started his own company.
The television station broadcast the weather warning yesterday.
The homeless woman had 5 cents in her pocket.
The nurse placed a Band-Aid on the student’s knee.
He miscalculated the effect of his actions.
America’s strategy toward North Korea is failing.
The U.S. Capitol is located in Washington.
Here are ChatGPT’s answers:
The governor had been photographed lying on a beach chair.
The Metropolitan Museum of Art acquired the 14th-century edition of the Bible.
Every one of the Confederate statues was removed.
The team that ended the “Curse of the Billy Goat” by winning the World Series is in Illinois.
Obama pardoned Chelsea Manning, born Edward on Dec. 17, 1987.
United Airlines took a public relations hit when police dragged David Dao off one of its planes.
Shortly after 1 a.m., Trump acknowledged the Senate’s approval of the tax bill in a tweet.
The election of the first female president would have been a historic event.
Recently, a 12-year-old inventor, Shubham Banerjee, started his own company.
The television station broadcast the weather warning yesterday.
The homeless woman had 5 cents in her pocket.
The nurse placed a Band-Aid on the student’s knee.
He miscalculated the effect of his actions.
America’s strategy toward North Korea is failing.
The U.S. Capitol is located in Washington.
ChatGPT got all 15 correct, while I got 13 out of 15, so it is the clear winner here.
Part two consists of short paragraphs with four errors each, although thanks to the AP stylebook being updated since this test was released, the first question only has three errors (the word percent originally had to be written out, but it is now acceptable to use the % sign).
Here are ChatGPT’s answers:
1. The reach of fake news was wide but shallow, the study found. One in four Americans saw at least one false story. On 289 websites, about 80% of bogus articles supported President Donald Trump.
2. North Korean dictator Kim Jong Un taunted the U.S. in his New Year’s Day address, saying newly perfected nuclear capabilities mean he can launch a strike at any time. “The nuclear button is always on my desk,” Kim said.
3. Retail cannabis shops in California opened for the first time, inaugurating what proponents say will become the world’s largest market for legalized recreational marijuana. One customer left with more than $1,000 worth of products in a large grocery bag.
4. From kickoff, with the air a pleasant 71 degrees, to Mayfield’s touchdown reception with six seconds left in the first half, Oklahoma built a 31-14 lead.
5. Danny Petersen, a sophomore, said the principal had to get permission from the superintendent to let the choir perform the risqué song.
Here are my answers:
1. The reach of fake news was wide, the study found, yet also shallow. One in four Americans saw at least one false story. On 289 websites, about 80% of bogus articles supported U.S. President Donald Trump.
2. North Korean dictator Kim Jong Un taunted the U.S. in his New Years’ Day address, saying newly perfected nuclear capabilities mean he can launch a nuclear strike at anytime. “The nuclear button is always on the desk of my office,” Jong Un Kim said.
3. Retail cannabis shops in California opened for the first time, inaugurating what proponents say will become the world’s largest market for legalized recreational marijuana. One customer left with more than one thousand dollars’ worth in a large grocery bag.
4. From kick-off, with the hazy air a pleasant 71˚F, to Mayfield’s touchdown reception with six seconds left in the first half, Oklahoma ran out to a 31-14 lead.
5. Danny Petersen, a sophomore, said the principal had to get permission from the superintendent to let the choir perform the riské song.
Both ChatGPT and I scored 3/3 on the first one. On the second, we both got 3/4 (calling Kim Jong Un a dictator is opinionated, so you can’t say that). On the rest, ChatGPT got everything correct, while I consistently missed one of the four errors per question.
The final part is an actual story to edit for grammar and wordiness, as well as to comment on reporting and structural errors.
This is how I edited it:
WACO, Texas (AP) — Fit 4 A Cause, an organization with the purpose of bringing familys[families] together through fitness activitei[ie]s and helping non-profit organizations in Waco[,] will host a Valentine[‘]s Day, New ‘Fit’ You Bash on from 6:30 – 7:15 p.m. on Wednesday, Feb 14 at the Woodway Family Center.
This free family fitness event will feature Zumba with Lizette Luna, Raychel Mynarcik and Ben Rosas;[,] Tabata with Van Davis; and yoga with Michelle Williams. This special [HOW IS IT SPECIAL?], free workout is appropriate for all fitness levels. [ACCORDING TO WHO?]
Participants are asked to bring their yoga mats, water to drink, and jars of peanut butter to be donated to Caritas [WHAT/WHO IS CARITAS? WHY PEANUT BUTTER?] to the Monday event.
“I[‘]m really looking forward to this bash,” organizer Ann Arnold, founder of Fit 4 A Cause [event organizer and founder of Fit 4 A Cause Ann Arnold] said. “We’re expecting about 50 people to attend. I know every single one of them will have a great time dancing and meditating. It’s a great way to spend the evening.”
[WHERE IS THE QUOTE TRANSITION???]
“My wife is making me go,” Jon Krasnof, a member of the Chamber of Commerce Board of Directors [one member of the Chamber of Commerce Board of Directors, Jon Krasnof,] said. “Still[,] it[‘]s better than sitting at home watching other people have fun on tv[TV] and falling
a[]sleep before the local news comes on[.] At least this is something productive.” [THIS QUOTE HAS A SLIGHT NEGATIVE TONE, IS THIS REALLY THE BEST QUOTE TO USE?]
These are ChatGPT’s notes on editing it. I could not get it to edit the story the same way I did; it would either rewrite the story or give me notes, so I took the notes.
General Edits
Title Clarity: Consider revising the event’s title for readability and consistency. The current title (“Valentines Day, New ‘Fit’ You Bash”) could be simplified to something like “Valentine’s Day Fitness Bash” to make it clearer.
Wordiness:
“An organization with the purpose of bringing families together through fitness activities and helping nonprofit organizations in Waco” can be streamlined to: “an organization that promotes family fitness and supports Waco nonprofits.”
“Participants are asked to bring their yoga mats, water to drink, and jars of peanut butter to be donated to Caritas to the Monday event” is wordy and unclear. Simplify to: “Participants should bring yoga mats, water, and jars of peanut butter to donate to Caritas.”
Date Format: “Wednesday, Feb 14” should be “Wednesday, Feb. 14,” per AP style.
Event Time: “6:30 – 7:15 p.m.” should be written as “6:30 to 7:15 p.m.”
Names and Titles: Ensure all names are accurate and correctly spelled (e.g., Lizette Luna, Raychel Mynarcik, Ben Rosas, Van Davis, and Michelle Williams). Verify roles if possible.
Quotes:
“Im really looking forward to this bash…”
Change “Im” to “I’m.”
Add a comma after “Fit 4 A Cause.”
“My wife is making me go…”
Add a comma after “Board of Directors.”
Break into shorter sentences for readability: “Still, it’s better than sitting at home watching other people have fun on TV. Falling asleep before the local news comes on is no fun. At least this is something productive.”
Spelling and Grammar:
“Familys” → “families”
“Activiteis” → “activities”
“Non-profit” → “nonprofit”
“Valentines Day” → “Valentine’s Day”
“tv” → “TV”
“a sleep” → “asleep”
Fitness Activity Names: Capitalization should be consistent: Zumba (brand name), tabata (common noun, lowercase), and yoga (common noun, lowercase).
Clarification on Peanut Butter Donations: The sentence implies donations are for a “Monday event,” but the event is on Wednesday. Confirm whether “Monday” is a mistake or refers to a different event.
Event Details: Confirm that the event is free, open to all fitness levels, and expected attendance is 50 people. Add “expected” or “anticipated” if the number is an estimate.
Suggestions for Additional Research or Reporting
Verify spelling of all participant and organizer names.
Confirm whether the donations are for a separate event on Monday or being collected at this Wednesday event.
Double-check if the event description accurately reflects what Fit 4 A Cause intends to promote.
Personally, I think ChatGPT did better than me when it came to things like wordiness or grammar. Even if I’m an absolute nerd when it comes to things like this, I don’t think I can beat an AI that can instantly read something like the AP stylebook and retain so much more than I ever could.
However, there are things in ChatGPT’s notes that make me incredibly hesitant to say ChatGPT did better.
Title Clarity: Consider revising the event’s title for readability and consistency. The current title (“Valentines Day, New ‘Fit’ You Bash”) could be simplified to something like “Valentine’s Day Fitness Bash” to make it clearer.
We’re not the ones in charge of the event; we can’t just change the name. Sure, the event name is clunky, but that’s not a change we can make. It’s up to the event organizers.
Event Details: Confirm that the event is free, open to all fitness levels, and expected attendance is 50 people. Add “expected” or “anticipated” if the number is an estimate.
We already know it’s an estimate. In the quote, it says:
“Im really looking forward to this bash,” organizer Ann Arnold, founder of Fit 4 A Cause said. “We’re expecting about 50 people to attend…
This suggestion from ChatGPT uses the exact wording from the quote, so it most likely read it and got confused, which seems to happen fairly often with ChatGPT. I did some research to see what the issue might have been. On the OpenAI forums, one thread discussed how ChatGPT struggled to read very large inputs, especially ones bigger than mine.
“I’ll post a prompt of, for example, 2,000 words, but the AI can only summarize the first 827 words,” wrote user dented.r33. “If I ask it about anything past that, it doesn’t know the answer. It says there’s no hard limit to how much it can read at once.”
After some conversation, a different user brought something up that I found interesting.
“Do you know what an AI hallucination is?” wrote user EricGT, linking to a Wikipedia article about AI hallucinations. “If not then you have now seen one. ChatGPT created that to give you an answer. Notice that it did not provide any references. ChatGPT is not like a search engine such as Google that links to other pages for its results.”
This led me to do some more research, and it appears that hallucinations are a common problem among AI chatbots. I found a study that assessed how reliably they handle scientific research.
“Using systematic reviews pertaining to shoulder rotator cuff pathology, these LLMs were tested by providing the same inclusion criteria and comparing the results with original systematic review references, serving as gold standards,” the study said. “Papers were considered ‘hallucinated’ if any 2 of the following information were wrong: title, first author, or year of publication. Hallucination rates stood at 39.6% (55/139) for GPT-3.5, 28.6% (34/119) for GPT-4, and 91.4% (95/104) for Bard (P<.001).”
The model I used for my experiment, GPT-4, had an almost 30% hallucination rate according to this study. However, I’ve also found reported rates of 1.5%, 3%, and 15-20%, depending on the measurement method. Regardless, the fact that other studies have found fairly high rates makes me not very confident in ChatGPT.
This leads me into the biggest issue I found with ChatGPT’s suggestions in part three.
Break into shorter sentences for readability: “Still, it’s better than sitting at home watching other people have fun on TV. Falling asleep before the local news comes on is no fun. At least this is something productive.”
I noticed something about the way it wanted me to edit this quote. In the original, the person we are quoting said this:
“My wife is making me go,” Jon Krasnof, a member of the Chamber of Commerce Board of Directors said. “Still its better than sitting at home watching other people have fun on tv and falling a sleep before the local news comes on at least this is something productive.”
ChatGPT added information Krasnof never said. Krasnof never said he didn’t find falling asleep on the couch watching TV fun! You can infer that, sure, but you’re putting words in his mouth. I can’t stress enough that you simply cannot do that. This may seem like a minor thing, but it does not give me confidence that AIs like ChatGPT wouldn’t edit quotes further and change important context.
I think ChatGPT is better than I am in some respects. It can catch AP style errors much more consistently than I can. However, I do not trust it with actual editing, considering its tendency to make things up and its willingness to add unconfirmed information to quotes. While others may disagree, in its current state, AI should not have a place in editing.