You are viewing a read-only archive of the Blogs.Harvard network.

The Longest Now

Edit by Edit: an Article Feedback Tool gets firmly tested
Saturday February 02nd 2013, 11:49 pm
Filed under: %a la mod,chain-gang,wikipedia

One of the Wikipedia projects that has been developing slowly over the past two years is the Article Feedback Tool. In its first incarnation, it let readers rate articles with a star system (1 to 5 stars for each of the areas of being Well-Sourced, Complete, Neutral, and Readable).

The latest version of the tool, version 5, shifts the focus of the person giving feedback to leaving a comment, and noting whether or not they found what they were looking for. After some iteration and tweaking, including an additional abuse filter for comments, it has recently been turned on for 10% of the articles on the English Wikipedia.

This is generating roughly 1 comment per minute; or 10/min if it were running on all articles. In comparison, the project gets around 1 edit per second overall. So if turned on for 100% of articles, it would add 15-20% to the editing activity on the site. This is clearly a powerful channel for input, for readers who have something to share but aren’t drawn in by the current ‘edit’ tabs.
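The extrapolation above can be checked with some quick arithmetic (the numbers come from the post; the variable names are my own):

```python
# Back-of-the-envelope check of the rates quoted above.
comments_per_min_on_10pct = 1   # observed with the tool on 10% of articles
comments_per_min_full = comments_per_min_on_10pct * 10  # scaled to 100%
edits_per_min = 60              # "around 1 edit per second" overall

added_fraction = comments_per_min_full / edits_per_min
print(f"{added_fraction:.0%}")  # about 17%, i.e. within the 15-20% range
```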

What is the community’s response? Largely critical so far. The primary criticism is that the ease of commenting encourages short, casual/random/non-useful comments; and that it tends to be one-way communication [because there’s no obvious place to find responses? this isn’t necessarily so; replies could auto-generate a notice on the talk page of the related IP]. Many specific suggestions and rebuttals of the initial implementation have been made, some heard more than others. Overall, the implementation was not sufficiently sensitive to the implications for curation and follow-through.

A roadmap that included a timeframe for expanding the tool from 10% to 100% of articles was posted, without a community discussion; so a Request for Comments was started by an interested community member (rather than by the designers). This started in mid-January, and currently has a plurality of respondents asking to turn the tool off until it has addressed some of the outstanding issues.

The impression of the developers, here as with some other large organically-developing feature rollouts, was not that they had gotten thorough and firm testing, but that editors were fighting over every detail, making communication about what works and why hard. Likewise there has been a shortage of good facilitators to take in all varieties of feedback and generate an orderly summary and practical solutions.

So how did things go wrong? Pete gets to the heart of it in his comment, where he asks for a clearer presentation of the project hopes and goals, measures of success, and a framework for community engagement, feedback, and approval:

I think it’s a mere mistake, but it does get frustrating because WMF has made this same mistake in other big technical projects…

What I’m looking for is the kind of basic framework that would encompass possible objections, and establish a useful way of communicating about them…

WMF managed that really well with the Strategic Planning process, and with the TOU rewrite. The organization knows how to do it. I believe if it had been done in this case, things would look very different right now…

It is our technical projects that are most likely to stumble at that stage – sometimes for many months – despite putting significant energy into communication.

Can we do something about it now? Like most of the commenters on the RfC, including those opposing the current implementation, I see a great deal of potential good in this tool, while also seeing why it frustrates many active editors. It seems close to something that could be rolled out with success to the contentment of commenters and long-time editors alike; but perhaps not through the current process of defining and discussing features / feedback / testing (which invites confrontational challenge/response discussions that are draining, time-consuming, and fail to actually resolve the issues raised!).

I’ll write more about this over the coming week.

To be honest, AFT always seemed to come across (particularly in the beginning) as being a project that people were trying to convince the community that they wanted, instead of the other way around. However I say that without really being a Wikipedian, or involved in AFT, so I don’t really know.

While there are certainly cases where tech projects can validly be of the form “convince users it is needed”, that is a dangerous road and one has to be careful when embarking on it.

Comment by bawolff 02.04.13 @ 12:06 pm

I’m looking forward to your next thoughts on this.

Comment by Sumana Harihareswara 02.05.13 @ 1:35 pm

I don’t know the answer to these questions. But it’s hard for this not to be one of my favorites so far:

Comment by phoebe 02.05.13 @ 9:35 pm

I do think we could have gotten a more realistic community assessment sooner in the process, with a much simpler version of AFT5. It’s hard to imagine that the outcome would have been different. (For the record, there’s still a trial underway in German Wikipedia, and one planned for French Wikipedia, both initiated by the communities of those projects.)

Fundamentally, a lot of users find the idea that there’s a new flood of low quality contributions they somehow have to deal with very discouraging. It might have helped to completely hide the feedback from logged out users, but I doubt it — it’s very hard to find as it is. Still, the idea that a nasty comment is left unmonitored somewhere in Wikipedia troubles the collective sense of hygiene.

Comment by Erik Moeller 02.05.13 @ 11:00 pm

I’m not sure on what time frame you are thinking about ‘outcome’ here.

I see people commenting that they think English Wikipedians “don’t like” or “won’t allow” AFT-like comments, but that doesn’t seem correct to me – those aren’t questions being considered by the community. Recentchanges — and the entire history of wiki contributions — involve various floods of contribution that somehow have to be dealt with [eventually].

Some long-term outcomes that seem reasonable to me:

At one level every page should have one-click feedback: annotation and commentary should be as easy as possible.

At another level, commenters should have instant feedback on their comment: they should be able to see what it will look like in context… edit it… follow further discussion of it… see the comments of others, cluster related feedback. They should have some sort of template. (“Add a new section” prompts for title and subject. AFT5 prompts with “e.g., This article needs a picture.”) All of these things help improve the quality of feedback, by example and expectation. Some of these things we want to do better for Talk pages generally; but it’s interesting to see that even simple things like “have an edit button” were dropped for the first AFT5 version.

Similarly, everything should be easy to find in a single stream or feed of updates. That’s what wikis themselves are good for: unstructured and aggregated work. Talk pages copy that strength for discussion – and it allows them to be useful despite their many obvious weaknesses. Having a way to edit your input, and a way to comment on it, and a way to follow others’ comments, in the same place: another simple wiki trait. All of these things are /almost, but not quite/ possible with AFT5 today.

At another level, the spectrum of visibility shouldn’t go from “not enabled by software” to “visible to the public and permanently archived”. At least two layers of quarantine are sensible – content that any logged-in user can see [say, O(5M) live accounts], and that the submitter can see, but that others do not by default; and content that is hidden, and can only be seen by reviewers [say, O(100,000) active editors]. Beyond this is deleted content that only admins can access [say, O(1,000) admins and stewards].
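The tiered-visibility idea above can be sketched as a simple access check — a hypothetical model, not the actual AFT implementation; the tier names and the submitter rule are my own reading of the layers described:

```python
from enum import IntEnum

class Tier(IntEnum):
    """Visibility tiers for a piece of feedback (hypothetical model)."""
    PUBLIC = 0     # visible to everyone, permanently archived
    LOGGED_IN = 1  # any logged-in user, plus the submitter, can see it
    HIDDEN = 2     # only reviewers can see it
    DELETED = 3    # only admins/stewards can access it

def can_view(viewer_level: int, item_tier: Tier, is_submitter: bool = False) -> bool:
    """A viewer sees an item if their access level reaches the item's tier;
    a submitter can always see their own item while it is merely quarantined."""
    if is_submitter and item_tier <= Tier.LOGGED_IN:
        return True
    return viewer_level >= item_tier

# A logged-out reader (level 0) cannot see quarantined feedback,
# but its own submitter still can; a reviewer (level 2) sees hidden items.
print(can_view(0, Tier.LOGGED_IN))                     # False
print(can_view(0, Tier.LOGGED_IN, is_submitter=True))  # True
print(can_view(2, Tier.HIDDEN))                        # True
```

The point of ordering the tiers as integers is that each layer strictly contains the audiences above it, which keeps the check to a single comparison.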

It is interesting that we have to continually revisit and argue these ideas. Some community members make the same arguments against making editing easy that wiki-detractors made ten years ago. And we have yet to figure out good balance/defaults for visibility issues, even for articles and images themselves.

Comment by metasj 02.06.13 @ 4:33 am

Hey Erik and/or all, did development on AFT2-4 (stars) switch pretty much to AFT5? I am curious because it seems like there are things that could make AFT2-4 more helpful (like seeing a little chart of feedback over time) that never seem to have gotten implemented.

Comment by phoebe 02.06.13 @ 12:32 pm

I think longer term, this has the potential to weed a lot of ‘spun’ articles out of, say, Google’s search results. Have you thought about that application — having users rate the quality of articles in Google’s search results?

I agree; it would be nice to see Google take that seriously. They tried this briefly, but it never got much attention or traction. MediaWiki could store a similar cache of searches and their most popular results, which would be a step in the right direction. -sj

Comment by Caroline 02.16.13 @ 2:55 am
