{"id":41,"date":"2009-05-29T15:50:51","date_gmt":"2009-05-29T19:50:51","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/pamphlet\/?p=41"},"modified":"2009-06-02T11:38:25","modified_gmt":"2009-06-02T15:38:25","slug":"what-percentage-of-open-access-journals-charge-publication-fees","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2009\/05\/29\/what-percentage-of-open-access-journals-charge-publication-fees\/","title":{"rendered":"What percentage of open-access journals charge publication fees?"},"content":{"rendered":"<p>In the popular conception, open-access journals generate revenue by charging publication fees. The popular conception turns out to be false. Various studies have explored the extent to which OA journals charge publication fees. The results have been counterintuitive to many, indicating that far fewer OA journals charge publication fees than one might have thought. You can verify this yourself using some software I provide in this post.<\/p>\n<p><!--more-->The first study of what we&#8217;ll call the &#8220;publication-fee percentage&#8221;, by <a href=\"http:\/\/www.alpsp.org\/ngen_public\/article.asp?id=200&amp;did=47&amp;aid=270&amp;st=&amp;oaid=-1\">Kaufman and Wills<\/a>, showed that fewer than half of the OA journals they looked at charge publication fees. The figure for publication-fee percentage they report is about <strong>47%<\/strong>. (For convenience, we put all publication-fee percentages in boldface in this post.) 
Following on from this, <a href=\"http:\/\/www.earlham.edu\/~peters\/fos\/newsletter\/11-02-07.htm#list\">Suber and Sutton<\/a> provided a figure of <strong>16.7%<\/strong> for scholarly society journals charging publication fees.<\/p>\n<p>Bill Hooker came up with a clever way of calculating a figure for publication fee percentage, by taking advantage of the publication fee metadata hidden in the &#8220;for authors&#8221; journal listings at the <a href=\"http:\/\/www.doaj.org\/\">Directory of Open Access Journals<\/a> to <a href=\"http:\/\/www.sennoma.net\/main\/archives\/2007\/12\/if_it_wont_sink_in_maybe_we_ca.php\">calculate the figure as of December 2007<\/a>.\u00a0 Here are his totals:<\/p>\n<blockquote>\n<table border=\"0\">\n<tbody>\n<tr>\n<td>Charges<\/td>\n<td align=\"right\">534<\/td>\n<td align=\"right\">(<strong>18%<\/strong>)<\/td>\n<\/tr>\n<tr>\n<td>No charges<\/td>\n<td align=\"right\">1980<\/td>\n<td align=\"right\">(67%)<\/td>\n<\/tr>\n<tr>\n<td>Information missing<\/td>\n<td align=\"right\">453<\/td>\n<td align=\"right\">(15%)<\/td>\n<\/tr>\n<tr>\n<td>Total (excl. hybrids)<\/td>\n<td align=\"right\">2967<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/blockquote>\n<p>Depending on the disposition of the &#8220;information missing&#8221; cases, Hooker&#8217;s study indicates that <strong>18-33%<\/strong> of OA journals charge fees.<\/p>\n<p>Hooker performed his study using a combination of automated and manual methods. In particular, he apparently used manual effort to eliminate the hybrid journal listings. But it isn&#8217;t difficult to write software to perform the entire analysis automatically, which allows anyone to replicate the results him- or herself. Unfortunately, the OAI-PMH feed that DOAJ kindly provides doesn&#8217;t include the crucial information of whether journals charge fees and whether they are pure or hybrid OA journals, so I, like Hooker, resorted to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Screen_scraping\">screen-scraping<\/a>. 
The method is effective, if inelegant.<\/p>\n<p>Here are the results computed by my software, as of May 26, 2009, with the first three percentages taken relative to the 4110 full OA journals and the hybrid percentage relative to all 5629 listings:<\/p>\n<blockquote>\n<table border=\"0\">\n<tbody>\n<tr>\n<td>Charges<\/td>\n<td align=\"right\">951<\/td>\n<td align=\"right\">(<strong>23.14%<\/strong>)<\/td>\n<\/tr>\n<tr>\n<td>No charges<\/td>\n<td align=\"right\">2889<\/td>\n<td align=\"right\">(70.29%)<\/td>\n<\/tr>\n<tr>\n<td>Information missing<\/td>\n<td align=\"right\">270<\/td>\n<td align=\"right\">(6.57%)<\/td>\n<\/tr>\n<tr>\n<td>Hybrid<\/td>\n<td align=\"right\">1519<\/td>\n<td align=\"right\">(26.99%)<\/td>\n<\/tr>\n<tr>\n<td>Total<\/td>\n<td align=\"right\">5629<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/blockquote>\n<p>The numbers are consistent with those of Hooker&#8217;s study some 16 months earlier. You&#8217;ll see that the total number of full OA journals is up from 2967 to 4110, and the share with missing information has been halved from 15% to about 7%. The reduction in those with missing information seems to have gone more to those with fees than those without, so that the share charging fees is up some 5 percentage points and the share not charging fees only 3. Again, depending on the &#8220;information missing&#8221; cases, the range of fee-charging journals is <strong>23-30%<\/strong>. Assuming that the missing information cases are similar in distribution to those that were resolved over the last year, the figure would be about <strong>27%<\/strong>. That leaves 73% of OA journals, the overwhelming bulk, charging no fees.<\/p>\n<p>Anyone interested in replicating the results should feel free to use the simple Python script below, provided without warranty.<\/p>\n<hr \/>\n<pre>#!\/usr\/bin\/python\r\n\r\n'''\r\nCalculate the percentage of open access journals with different\r\npublication fee policies using data from the Directory of Open Access\r\nJournals (doaj.org)\r\n\r\nStuart M. 
Shieber\r\nMarch 26, 2009\r\n'''\r\n\r\nfrom urllib import urlretrieve\r\nimport os\r\nimport re\r\nfrom collections import defaultdict\r\n\r\nfeecount = defaultdict(int)\r\nhybridcount = 0\r\njournalcount = 0\r\n\r\ndef processpage(file):\r\n\u00a0\u00a0\u00a0 '''Process a file of article listings from the DOAJ \"Authors\"\r\n\u00a0\u00a0\u00a0 listing of articles, which includes publication fee information to\r\n\u00a0\u00a0\u00a0 extract journal entries and update running counts'''\r\n\r\n\u00a0\u00a0\u00a0 global hybridcount, journalcount, feecount\r\n\r\n\u00a0\u00a0\u00a0 # Get the contents of the file\r\n\u00a0\u00a0\u00a0 f = open(file, 'r')\r\n\u00a0\u00a0\u00a0 contents = f.read()\r\n\u00a0\u00a0\u00a0 f.close()\r\n\r\n\u00a0\u00a0\u00a0 # Clean up the file by removing some header stuff\r\n\u00a0\u00a0\u00a0 pat = re.compile(\"^.*End Result.*&lt;p \/&gt;&lt;br \/&gt;\", re.DOTALL)\r\n\u00a0\u00a0\u00a0 contents = re.sub(pat, \"\", contents)\r\n\u00a0\u00a0\u00a0 # Get rid of newlines to make pattern matching easier\r\n\u00a0\u00a0\u00a0 contents = re.sub('\\n', '|||', contents)\r\n\u00a0\u00a0\u00a0 # Place each article entry on a separate line by keying off of the\r\n\u00a0\u00a0\u00a0 # serendipitous use of \"passMe\" at the start of each entry\r\n\u00a0\u00a0\u00a0 contents = re.sub('passMe', '\\npassMe', contents)\r\n\r\n\u00a0\u00a0\u00a0 # Match each article record, getting title, hybrid status, fee\r\n\u00a0\u00a0\u00a0 # info\r\n\u00a0\u00a0\u00a0 pat = re.compile(\"passMe[^&gt;]*&gt;([^&lt;]*)&lt;\/a&gt;.*class=info&gt;([^&lt;]*)&lt;\/span&gt;.*Publication fee.*&gt;(.*)&lt;\/font&gt;\")\r\n\u00a0\u00a0\u00a0 for match in pat.finditer(contents):\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 journalcount += 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 title = match.group(1)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 accesstype = match.group(2)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 feetype = 
match.group(3)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 # Print an entry for a csv file\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 print '\"%s\", \"%s\", \"%s\"' % (title, accesstype, feetype)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 # Bump counts\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if accesstype == 'Open Access':\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 feecount[feetype] += 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 else:\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 hybridcount += 1\r\n\r\n### Download all of the pages at DOAJ, caching locally, and process\r\n### each one\r\nfor letter in \"ABCDEFGHIJKLMNOPQRSTUVWXYZ\":\r\n\u00a0\u00a0\u00a0 for page in range(1,8):\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 # Generate source and destination locations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 url = \"http:\/\/www.doaj.org\/doaj?func=byTitle&amp;p=%d&amp;hybrid=1&amp;query=%s\" % (page, letter)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local = \"\/tmp\/%s%d\" % (letter, page)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 # Pull over the page if not cached\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if not os.path.exists(local):\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 print \"retrieving \" + url\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 urlretrieve(url, local)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 # and process it\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 processpage(local)\r\n\r\n### Print a table of results\r\nfor fee in feecount.keys():\r\n\u00a0\u00a0\u00a0 print \"%-20s : %5d (%5.4f)\" % (fee, feecount[fee],\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 feecount[fee]\/float(journalcount-hybridcount))\r\nprint \"%-20s : %5d (%5.4f)\" % 
('Hybrid', hybridcount, hybridcount\/float(journalcount))\r\nprint \"%-20s : %5d\" % ('TOTAL', journalcount)<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In the popular conception, open-access journals generate revenue by charging publication fees. The popular conception turns out to be false. Various studies have explored the extent to which OA journals charge publication fees. The results have been counterintuitive to many, indicating that far fewer OA journals charge publication fees than one might have thought. You [&hellip;]<\/p>\n","protected":false},"author":2110,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[618,68],"tags":[4],"class_list":["post-41","post","type-post","status-publish","format-standard","hentry","category-open-access","category-scholarly-communication","tag-code"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5pLfN-F","jetpack-related-posts":[{"id":1000,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2011\/11\/16\/how-should-funding-agencies-pay-open-access-fees\/","url_meta":{"origin":41,"position":0},"title":"How should funding agencies pay open-access fees?","author":"Stuart Shieber","date":"Wednesday, November 16, 2011","format":false,"excerpt":"\u201c...a drop in the bucket.\u201dDrop I (2007) by Delox - Martin De\u00e1k via flickr. 
Used by permission (CC by-nc-nd) At the recent Berlin 9 conference, there was much talk about the role of funding agencies in open-access publication, both through funding-agency-operated journals like the new eLife journal\u00a0and through direct reimbursement\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":498,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2010\/07\/31\/will-open-access-publication-fees-grow-out-of-control\/","url_meta":{"origin":41,"position":1},"title":"Will open-access publication fees grow out of control?","author":"Stuart Shieber","date":"Saturday, July 31, 2010","format":false,"excerpt":"I recently had a conversation with someone (I'll call him D) whose opinion I greatly respect, a staunch supporter of broadening access to the scholarly literature, who expressed a view I was quite surprised about. D is of the opinion that the publication fee business model for open access journals\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":327,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2009\/10\/16\/is-open-access-publishing-a-vanity-publishing-industry\/","url_meta":{"origin":41,"position":2},"title":"Is open-access journal publishing a vanity publishing industry?","author":"Stuart Shieber","date":"Friday, October 16, 2009","format":false,"excerpt":"Pride does not wish to owe and vanity does not wish to pay. 
\u2014Francois De La Rochefoucauld Open-access journal publishing has been criticized on a whole range of grounds as being unsustainable, unfair, or ineffective.\u00a0 Perhaps the starkest criticism is that open-access journals amount to a vanity publishing industry, and\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"vanitypress","src":"https:\/\/i0.wp.com\/blogs.law.harvard.edu\/pamphlet\/files\/2009\/10\/vanitypress1-300x225.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":532,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2010\/08\/06\/how-much-does-a-cope-compliant-open-access-fund-cost\/","url_meta":{"origin":41,"position":3},"title":"How much does a COPE-compliant open-access fund cost?","author":"Stuart Shieber","date":"Friday, August 6, 2010","format":false,"excerpt":"Tightrope walker, sculpture, Berlin, 2008. Photo from beezerella at flickr.com. Used by permission. The short answer? \u00a0Almost nothing. The Compact for Open-Access Publishing Equity is a statement of commitment to \"the\u00a0timely establishment of\u00a0durable mechanisms for\u00a0underwriting reasonable publication charges for articles written by its\u00a0faculty and published in\u00a0fee-based open-access journals and\u00a0for which\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"Tightrope walker, sculpture, Berlin, 2008. Photo from beezerella at flickr.com. 
Used by permission.","src":"https:\/\/i0.wp.com\/farm5.static.flickr.com\/4053\/4608050847_ea6934502c_m.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":382,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2014\/03\/28\/a-true-transitional-open-access-business-model\/","url_meta":{"origin":41,"position":4},"title":"A true transitional open-access business model","author":"Stuart Shieber","date":"Friday, March 28, 2014","format":false,"excerpt":"\u2026provide a transition path\u2026 \"The Temple of Transition, Burning Man 2011\" photo by flickr user Michael Holden, used by permission David Willetts, the UK Minister for Universities and Research, has written a letter to Janet Finch responding to her committee\u2019s \u201cA Review of Progress in Implementing the Recommendations of the\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1811,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2013\/10\/15\/lessons-from-the-faux-journal-investigation\/","url_meta":{"origin":41,"position":5},"title":"Lessons from the faux journal investigation","author":"Stuart Shieber","date":"Tuesday, October 15, 2013","format":false,"excerpt":"\u2026what\u00a0419 scams\u00a0are to banking\u2026 \u201cscams upon scammers\u201d image by flickr user Daniel Mogford used by permission. Investigative science journalist John Bohannon[1] has a news piece in Science earlier this month about the scourge of faux open-access journals. 
I call them faux journals (rather than predatory journals), since they are not\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts\/41","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/users\/2110"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/comments?post=41"}],"version-history":[{"count":30,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts\/41\/revisions"}],"predecessor-version":[{"id":149,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts\/41\/revisions\/149"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/media?parent=41"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/categories?post=41"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/tags?post=41"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}