The Longest Now


Lessig ‘4Obama’ transcription
Tuesday February 05th 2008, 12:26 am
Filed under: Blogroll,chain-gang,metrics,popular demand

First things first. I’m no no-holds-barred Obaman like Larry Lessig.

Don’t get me wrong, I like Boyish Orator’s style, and give him a leg up over Her Royal Cleverness, but don’t stay up nights worrying about the future difference to world peace their differential election would make (other things keep me up, even in politics), and not because I don’t think peace a devastatingly important realm for immediate change.

At any rate, Lessig taped a Barackish paean, and Ball and Prime started simulscribing in gobby. Gobby sessions exert a gravitational pull on me and soon I was transcribing myself, to exercise day-cramped hands — though I would never have listened to the piece otherwise. You can read the result of our labours.

The promise of making a set of ideas more accessible and revisitable is an infinitely better reason to divest oneself of twenty minutes of life than amusement or boredom… Which makes me wonder why we don’t see dotsub everywhere, at least among the sj crowd of one. Maybe it just needs a gobby plugin, or a way to find two friends and start transcribing in tandem. I’m even feeling the itch to ride a tandem bike or sidecar. Ach. Time for a seaweed shower.



kaltura. video remixed, for all
Sunday December 09th 2007, 7:41 pm
Filed under: Glory, glory, glory,metrics

Kaltura.com does a dozen things right in one place; unusual for a modern creator/social-networking site, they focus heavily on creation.  Most unusually, they do all of this with video, the black sheep of the collaborative family : small clips, visualized; smooth remix process, with interface on the client and reasonable response time on the server without redrawing a whole screen; the best memes of history and authorship transparency realized with large-font rounded-corners elegance.

Now who is using it ? where are the transclusions for mediawiki instances?   I can’t wait to see the beta site develop.



Global Voices : Some Statistics
Saturday December 16th 2006, 6:59 am
Filed under: metrics

Global Voices tracks stats in a number of ways : a stats site, with day-level data; Technorati numbers; and the results of a great survey.



A good point.
Friday October 27th 2006, 2:27 pm
Filed under: metrics

YouTube and Libya compared for value, brought up by a thoughtful Italian blog on next-media and society.



Foo 2.0 : deletion debate and resolution
Friday September 01st 2006, 6:25 pm
Filed under: metrics

The result of the Great Wikipedia “Foo 2.0” debate of August 2006 : See the enterprise social software page and the social computing discussion page.  Please contribute to the quality of those articles, still in sad shape and hardly a useful reference for any audience.

As always, it amazes me that so many people — homemakers, high school students, firemen — who simply care about the development of a reference work can be as sensitive to nuance and level-headed in academic discussions as academics (who have devoted much of their life to scholarly discourse).  It makes me at once proud and disappointed by our civilization; that all manner of subtleties can be picked up without special training; and that much capability is untapped through ignorance or denial of this.

But I’m ranting again, when I should be describing how to add constructively to WP.  Until then… find a hill to fly a kite this long weekend, be kind to your neighbors and good to your family, and don’t labor too long or hard.



Hey, didja catch that BBC Focus article?
Monday April 17th 2006, 5:03 pm
Filed under: metrics

BBC Focus put out a micro-comparison of Wikipedia, Britannica Online, Encarta, and Infoplease, asking three experts to review one article apiece.  Suburbia describes it well:

Reporters running a statistically insignificant comparison with other references, is becoming as popular as vandalizing Wikipedia, when it comes to coming up with a story to publish.



“Fatally Flawed” — Internal Britannica Review Tackles Nature Methods
Thursday March 23rd 2006, 3:22 pm
Filed under: metrics

Below is a letter that Encyclopedia Britannica sent out today to some of its customers, in response to the December Nature article comparing the accuracy of articles in Wikipedia and Britannica.  A more detailed review of the Nature study, including responses to each alleged error and omission, is linked from the front page of www.eb.com; you can also see an HTML version of the review here (thanks to Ben Yates).

 

In one of its recent issues, the science journal Nature published an article
that claimed to compare the accuracy of the online Encyclopædia Britannica
with Wikipedia, the Internet database that allows anyone, regardless of
knowledge or qualifications, to write and edit articles on any subject.
Wikipedia had recently received attention for its alleged inaccuracies, but
Nature’s article claimed that Britannica’s science coverage was only slightly
more accurate than Wikipedia’s.

Arriving amid the revelations of vandalism and errors in Wikipedia, such a
finding was, not surprisingly, big news. Perhaps you even saw the story
yourself. It’s been reported around the world.

Those reports were wrong, however, because Nature’s research was invalid.
As our editors and scholarly advisers have discovered by reviewing the
research in depth, almost everything about Nature’s investigation was
wrong and misleading. Dozens of inaccuracies attributed to the Britannica
were not inaccuracies at all, and a number of the articles Nature examined
were not even in the Encyclopædia Britannica. The study was so poorly
carried out and its findings so error-laden that it was completely without merit.

Since educators and librarians have been among Britannica’s closest
colleagues for many years, I would like to address you personally with an
explanation of our findings and tell you the truth about the Nature study.

Almost everything Nature did showed carelessness and indifference to basic
research standards. Their numerous errors and spurious procedures included
the following:

*    Rearranging, reediting, and excerpting Britannica articles. Several
of the “articles” Nature sent its outside reviewers were only sections of,
or excerpts from Britannica entries. Some were cut and pasted together
from more than one Britannica article. As a result, Britannica’s coverage
of certain subjects was represented in the study by texts that our editors
never created, approved or even saw.
*    Mistakenly identifying inaccuracies. The journal claimed to have
found dozens of inaccuracies in Britannica that didn’t exist.
*    Reviewing the wrong texts. They reviewed a number of texts that
were not even in the encyclopedia.
*    Failing to check facts. Nature falsely attributed inaccuracies to
Britannica based on statements from its reviewers that were
themselves inaccurate and which Nature’s editors failed to verify.
*    Misrepresenting its findings. Even according to Nature’s own
figures, (which grossly exaggerated the number of inaccuracies in
Britannica) Wikipedia had a third more inaccuracies than Britannica.
Yet the headline of the journal’s report concealed this fact and
implied something very different.

Britannica also made repeated attempts to obtain from Nature the original
data on which the study’s conclusions were based. We invited Nature’s
editors and management to meet with us to discuss our analysis, but they
declined.

The Nature study was thoroughly wrong and represented an unfair affront
to Britannica’s reputation.

Britannica practices the kind of sound scholarship and rigorous editorial
work that few organizations even attempt. This is vital in the age of the
Internet, when there is so much inappropriate material available. Today,
having sources like Britannica is more important than ever, with content
that is reliable, tailored to the age of the user, correlated to curriculum,
and safe for everyone.

Whatever may have prompted Nature to do such careless and sloppy
research, it’s now time for them to uphold their commitment to good
science and retract the study immediately. We have urged them strongly
to do so.

Nature responded with a polite declination.



1 Million What??
Saturday March 04th 2006, 10:52 am
Filed under: metrics

The original English Wikipedia turns 1 million this week.  Kudos to KG, who won the millionth-article pool… the two-millionth pool is now closed, but you can still place (gentleman’s) bets on when the eleventy-billionth article will be written.  (Full disclosure: My money’s on 2021.)



New Hitwise Data (generated for WP)
Thursday March 02nd 2006, 10:12 pm
Filed under: metrics

New traffic data from Hitwise (.doc)
suggests that by their standards, Wikipedia is also in the top 20 orgs
with popular websites; though some, such as Yahoo, MSN, Google and
Myspace, have more than one site ahead of it.  Thanks to Hitwise  for sharing their results for the millionth article press release.

I hope that some of these leisure sites will start to integrate more
useful content with their portals, and not remain paeans to the id; it
is heartwarming to see useful content providers (such as pure search
engines, and news portals) near the tops of the list. 

Wikipedia fields 11% of education-related traffic, and 0.17%
of all traffic they measured, with Answers.com getting 1/3 of
that.  I asked for details on their methodology and sample size;
they claim 25 million users, but I don’t know their distribution,
geographically or otherwise.  They also show a pretty flat age
distribution from 18 through 44, and an even split along gender lines.
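
For a rough sense of scale, the quoted shares can be combined with a little arithmetic. This is only a sketch under two assumptions of mine, neither stated by Hitwise: that "1/3 of that" refers to Wikipedia's share of all traffic, and that every Wikipedia visit is counted in the education category.

```python
# Shares quoted above, as percentages of the traffic Hitwise measured.
wikipedia_share_of_all = 0.17        # percent of all measured traffic
wikipedia_share_of_education = 11.0  # percent of education-related traffic

# Assumption: "1/3 of that" means one third of Wikipedia's overall share.
answers_share_of_all = wikipedia_share_of_all / 3

# Assumption: all Wikipedia visits fall inside the education category, so the
# category's overall share follows from the two quoted figures.
education_share_of_all = wikipedia_share_of_all / (wikipedia_share_of_education / 100)

print(f"Answers.com: about {answers_share_of_all:.3f}% of all traffic")
print(f"Education category: about {education_share_of_all:.2f}% of all traffic")
```

Without the sample distribution, these remain back-of-envelope figures.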



Pulses, Zeitgeists
Friday February 17th 2006, 10:30 pm
Filed under: metrics

Wikipulse is gone. But its spirit lives on.  Perhaps it can be revitalized on a New Machine.  We can rebuild it. The Six Million Dollar Analytic.



17 lovers around the world rejoice
Monday January 30th 2006, 2:40 pm
Filed under: metrics

This week Wikipedia briefly broke into the top-17 list of most visited websites, as gauged by Alexa Toolbar users; snagging the attention of 3% of them that day.  Rock on…

In other news, if you want to find out more about Wikipedia and are in the Boston area, come to the upcoming presentation at Simmons on Feb. 13.



The Open Society : Myth or Catastrophic Novelty?
Saturday December 31st 2005, 4:06 am
Filed under: metrics

Earlier today, Jay-Zed pointed out the humor in juxtaposing fears of a Closed Web and resulting closed society, with the dramatic changes in openness, penetration, and reusability of information and tools over the past decade.  He posited that the existence of certain types of platforms
— for instance, inverted-hourglass networks and PC architectures —
was a specially enabling design decision, which was somewhat arbitrary
and potentially outmoded.  The implication was that without these
platforms, said dramatic changes would have been far less dramatic.

I also enjoy the juxtaposition of the recent explosive openness
with current fears about open channels of communication being closed
off; and do at times find myself laughing at over-pessimistic
statements about the world today.  On the other hand, I don’t
think that focusing on architectures, or on historical platform
choices, is very relevant to the changes we have seen.  A firmer
association can be found between penetration and reuse, and the
availability of ever-better toolchains and factories for mass
production.  

A methodical Gutenberg was not the unilateral harbinger of
the modern newspaper; that took many revolutions in pulp-processing and
printing-press design.  Today’s cheap, colorful paper production
is the result of tens of thousands of excellent, focused
innovations.  Likewise, ENIAC was not the harbinger of Ruby on
Rails (or any other modern library that allows someone with basic
programming skills to leverage 10 hours of familiarization into
a fully-customized and appealing application) — that took many
revolutions in software abstraction and philosophy…  nor were
DARPANet and IBM and Microsoft the natural father, mother, and holy concubine
of the modern “all-purpose computer”; this too was many scores of
years, and thousands of mathematical, engineering, and social
innovations in the making.

It is certainly charming that I can now find out what the Ohio
newspapers and tv stations are printing and showing, by looking online
or flipping through my satellite service.  But all the same, we hardly
live in the ‘most open’ environment our modern world has ever
known.  In many ways, we remain less open and networked than, say,
a cozy, classed Greek city-state, with a shared educational, social, and financial gossip network; shared religious, historical, and cultural anecdotes; and shared metrics
for success, civilization-wide goals, and honour; all far more intimate
than parallels in my country today.  Even the most all-telling of
tell-all [auto]biographies is diluted by this lack of openness.

Let us end on a positive note.  What further expansions in
openness may be expected or hoped for in the coming decades? 

  • An improvement in open sharing and classification of ideas,
    so that a good idea in one place is recognized and taken up in many
    others.  Great window-hinge, washing-machine, hobbyist and diaper
    designs should traverse the oceans; great experimental designs the
    fields; &c.
  • A new consciousness of making information public;
    people actively choosing every day to free and share their
    observations, discoveries, thoughts, and analyses — rather than only
    on special occasions.  This consciousness filtered out into
    processes, organizations, and governments.
  • A renaissance in the libraries of methods
    available to access information — one’s own, that of one’s family,
    that of one’s community and office, that of the world at large. 
    This is not dependent on a simple ‘application layer’ provided by a few
    organizations; any more than the question of “where can I find a copy of Anna Karenina” depends on the ‘layer’ of friends’ shelves, bookstores, libraries and online book-sellers I have access to.
  • … add your own!  good comments will be added to this list.


authority : an idea
Friday December 16th 2005, 2:35 pm
Filed under: metrics

Joho wrote a while ago about distributed authority, providing trusted
views of Wikipedia content.  An excerpt from my reply follows; more relevant now that Wikipedia 0.5 takes form.

Distributed authority — in the ‘stamp and seal’
sense — is not my idea.  And what I would like to see happen with research groups has
been suggested by others before me; there is simply growing interest in
it now. I want to make it easy for people who already work on and
review content in a field to do so in a way that directly improves
Wikipedia.

At the moment, individual authors ‘adopt’ certain articles and try
to keep them fresh and free of errors. And various organizations
maintain their own internal knowledge-bases with content that overlaps
a good deal with relevant Wikipedia articles.

Rather than trying to hack an authority system into MediaWiki, you
can do something simpler to encourage both of the above : have groups
that maintain their own small clusters of articles — 10 or 20 or 100
— on a local wiki, with its own portal page. Give them an easy way to
offer their work for merging with WP, without requiring them to all
join the site. The edits they make are implicitly ‘approved’ by them.

This is not a good verification method within WP, however; software
changes are required for that (and Seth’s suggestion is one
specific path one might take). At the moment, Nature can link to
revisions of 100 articles that they approve. But once you follow a link
through to a Nature-edited revision of [[DNA]], and follow a link to
another WP article, you’ve already returned to the realm of public
editing.

The motivation for this is a few professors and talented writers who
began editing on WP, but commented that editing Wikipedia directly can
be offensive and off-putting (they are readily offended by trolling,
and have no patience for even trivial wiki-lawyering).

We’re making progress towards Wikipedia 1.0, slowly but surely; I
think along the way we will improve both the default view of content
and the selection of optional views suggested above.  Suggestions and improvements are welcome, as always.



Community metrics: Size
Wednesday December 07th 2005, 1:16 pm
Filed under: metrics

I have seen many estimates of the size
of Wikipedia’s community; all of them too low.  And what surprises
me most of all is that no one cares much about the lack of real metrics
in their speech, their writing, their journalism, their research. 
Okay, that last is going a bit far; many researchers are very careful
about defining their metrics and terms.  But this is what makes
those which are not stick out so severely.

Here are some basic statistics, care of Erik Zachte’s scripts, the Wikimedia Foundation’s server farms, and over 100,000 active contributors over the past four years (user statistics often exclude the 15% of edits which come from editors without named accounts).

To the point of the user community: 

  • There are more than 15,000 active English-language editors, at least 1500 of them editing ‘very actively’ — 100 times a month.
  • There are 30,000 active editors, and 4,500 very active editors, in all languages combined.

Just to reiterate the casual power of thousands of zealous volunteers
with a variety of content-addictions, some of the scripted data above
has a hand-generated and hand-updated wiki cousin, with its own original additions.

As for where I personally draw the line at counting community size, I
would say the English Wikipedia has this year passed the
10,000-volunteer mark, and is currently around 20,000.  We would
know better if we counted not only edits but page-views per user…
there are those who edit infrequently but keep up with all
aspects of the community; and also many who edit occasionally but
haven’t taken time to learn the community policies or norms; which
one might discount.

I would estimate 60,000 in the ‘copyediting’ community (active
readers, familiar with the interface, acting as typo and vandalism
monitors; and anonymous contributors), and ten times again as many
regular readers – around 500,000.  

For all languages combined : 40,000 volunteers, perhaps 120,000 in the
‘copyediting’ community (people in other langs are on average less
likely to understand that they can edit; which I would expect to grow more than linearly
with the size of the community and press coverage in that language),
and some 2M active readers.
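
To compare the tiers side by side, here is a small sketch that turns my estimates above into ratios. The dictionary layout and function name are just illustration; the numbers are the guesses from this post, not measured data.

```python
# Rough estimates from this post (not measurements).
estimates = {
    "English": {"volunteers": 20_000, "copyeditors": 60_000, "readers": 500_000},
    "All languages": {"volunteers": 40_000, "copyeditors": 120_000, "readers": 2_000_000},
}

def ratios(tier):
    """Return (copyeditors per volunteer, readers per volunteer) for one tier."""
    return (
        tier["copyeditors"] / tier["volunteers"],
        tier["readers"] / tier["volunteers"],
    )

for name, tier in estimates.items():
    copyeditors_each, readers_each = ratios(tier)
    print(f"{name}: {copyeditors_each:.0f} copyeditors and "
          f"{readers_each:.0f} readers per volunteer")
```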



Good Samaritans : the strength of ten normal men?
Wednesday December 07th 2005, 12:31 pm
Filed under: metrics

There’s been some hubbub lately about the usefulness of anonymous
contributions to the information commons.  In particular, Monday
saw a somewhat ad-hoc test of the effect of forcing account-creation on
the quality of contributions to the* English Wikipedia.

I have some statistics of my own to add about that particular
experiment.  However, for the moment I would simply like to point
to a lovely Wikipedia contribution analysis, “Explaining Quality in Internet Collective Goods: Zealots and Good Samaritans in the case of Wikipedia” (pdf) by researcher Denise Anthony, who presented it this past Monday at MIT.  Her research suggested to her that “the highest quality contributions come from the vast numbers of anonymous ‘Good Samaritans’ who contribute infrequently.”

http://web.mit.edu/iandeseminar/Papers/Fall2005/anthony.pdf

* Note : the direct article is appropriate here because of the
“English” adjective before Wikipedia. For more detail, see my old reply
to JDL at Joho’s house.



Download size exponentiation
Friday October 28th 2005, 2:16 am
Filed under: metrics

The size of downloads has been increasing at a record clip. 
Downloads have been growing in size since the inception of the
concept… today I direct your attention to Wikipedia and Wikimedia
downloads.  Unavailable as torrents, but rather only via http, the
full downloads at 2+GB each are unwieldy even for people who run
downloads over their broadband connections at night while they
sleep.  Is WP dump size growing faster than average pipe
throughput to homes and workplaces?  (yes)  What can be done
about this? 

How about… shipping hard drives to people who want them?  Guaranteed 5-day delivery; for a reasonable fee (perhaps $80 for a drive + shipping + overhead?)…
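
A back-of-envelope sketch of the trade-off: how long a dump takes over the wire, against the effective throughput of a drive in the mail. The link speeds and the 80 GB drive capacity are hypothetical values chosen for illustration; only the 2 GB dump size and the 5-day delivery come from the text above.

```python
def download_hours(size_gb, mbit_per_s):
    """Hours to fetch size_gb (decimal gigabytes) at a sustained link speed."""
    bits = size_gb * 8 * 1000**3
    return bits / (mbit_per_s * 1_000_000) / 3600

def shipping_mbit_per_s(drive_gb, days):
    """Effective throughput, in Mbit/s, of mailing a full drive."""
    bits = drive_gb * 8 * 1000**3
    return bits / (days * 86_400) / 1_000_000

# Hypothetical sustained home link speeds, in Mbit/s.
for speed in (0.25, 1.0, 4.0):
    print(f"2 GB at {speed} Mbit/s: {download_hours(2, speed):.1f} hours")

# Hypothetical 80 GB drive, delivered in the 5 days suggested above.
print(f"80 GB drive in 5 days: {shipping_mbit_per_s(80, 5):.2f} Mbit/s effective")
```

The shipping figure scales linearly with drive capacity, so a larger drive in the same envelope looks better still.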



ETC and user-centered tools
Thursday September 29th 2005, 12:57 pm
Filed under: metrics

Everyone seems to think that developing tools around people’s daily
lives, on cleverly-designed platforms, is the Answer to lots of things
– the next iPod/computer/phone, new PCs for people in China’s urban
households, etc.

It doesn’t sound terribly innovative to me; am I just a stick in the
mud?  How can anyone get excited about a PC-like platform when
there’s some real innovation being done for $100 PCs that totally
rethinks many layers in the development and distribution of
computing?  Not that I think the $100 PC is the be-all or end-all
of what target consumers really need…  I’m foolish enough to
think that most things that end-users really need don’t get developed
at all.  A completely silly suggestion, I know.
