{"id":218,"date":"2014-06-16T13:16:40","date_gmt":"2014-06-16T13:16:40","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/rprasad\/?p=218"},"modified":"2017-05-24T17:46:00","modified_gmt":"2017-05-24T17:46:00","slug":"reading-excel-with-python-xlrd","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/rprasad\/2014\/06\/16\/reading-excel-with-python-xlrd\/","title":{"rendered":"Reading Excel with Python (xlrd)"},"content":{"rendered":"<p>Every 6-8 months, when I need to use the python <a href=\"http:\/\/www.python-excel.org\/\">xlrd library<\/a>, I end up re-finding this page:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.youlikeprogramming.com\/2012\/03\/examples-reading-excel-xls-documents-using-pythons-xlrd\/\">Examples Reading Excel (.xls) Documents Using Python&#8217;s xlrd<\/a><\/li>\n<\/ul>\n<p>In this case, I&#8217;ve finally bookmarked it:)<\/p>\n<pre><code class=\"python\">from __future__ import print_function\nfrom os.path import join, dirname, abspath\nimport xlrd\n\nfname = join(dirname(dirname(abspath(__file__))), 'test_data', 'Cad Data Mar 2014.xlsx')\n\n# Open the workbook\nxl_workbook = xlrd.open_workbook(fname)\n\n# List sheet names, and pull a sheet by name\n#\nsheet_names = xl_workbook.sheet_names()\nprint('Sheet Names', sheet_names)\n\nxl_sheet = xl_workbook.sheet_by_name(sheet_names[0])\n\n# Or grab the first sheet by index \n#  (sheets are zero-indexed)\n#\nxl_sheet = xl_workbook.sheet_by_index(0)\nprint ('Sheet name: %s' % xl_sheet.name)\n\n# Pull the first row by index\n#  (rows\/columns are also zero-indexed)\n#\nrow = xl_sheet.row(0)  # 1st row\n\n# Print 1st row values and types\n#\nfrom xlrd.sheet import ctype_text   \n\nprint('(Column #) type:value')\nfor idx, cell_obj in enumerate(row):\n    cell_type_str = ctype_text.get(cell_obj.ctype, 'unknown type')\n    print('(%s) %s %s' % (idx, cell_type_str, cell_obj.value))\n\n# Print all values, iterating through rows and columns\n#\nnum_cols = xl_sheet.ncols   # Number of columns\nfor row_idx in range(0, xl_sheet.nrows):    # Iterate through rows\n    print ('-'*40)\n    print ('Row: %s' % row_idx)   # Print row number\n    for col_idx in range(0, num_cols):  # Iterate through columns\n        cell_obj = xl_sheet.cell(row_idx, col_idx)  # Get cell object by row, col\n        print ('Column: [%s] cell_obj: [%s]' % (col_idx, cell_obj))\n\n<\/code><\/pre>\n<p>Interact and pull data from a selected column.  (This could be done with 1\/5 of the code in pandas, etc.)<br \/>\n<a href=\"http:\/\/blogs.law.harvard.edu\/rprasad\/files\/2014\/06\/bin_test_\u2014_python_\u2014_148\u00d744.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blogs.law.harvard.edu\/rprasad\/files\/2014\/06\/bin_test_\u2014_python_\u2014_148\u00d744.png\" alt=\"bin_test_\u2014_python_\u2014_148\u00d744\" width=\"582\" height=\"613\" class=\"size-full wp-image-239\" srcset=\"https:\/\/archive.blogs.harvard.edu\/rprasad\/files\/2014\/06\/bin_test_\u2014_python_\u2014_148\u00d744.png 582w, https:\/\/archive.blogs.harvard.edu\/rprasad\/files\/2014\/06\/bin_test_\u2014_python_\u2014_148\u00d744-284x300.png 284w\" sizes=\"auto, (max-width: 582px) 100vw, 582px\" \/><\/a><\/p>\n<p>Code example<\/p>\n<pre><code class=\"python\">from __future__ import print_function\nfrom os.path import join, dirname, abspath, isfile\nfrom collections import Counter\nimport xlrd\nfrom xlrd.sheet import ctype_text   \n\n\ndef get_excel_sheet_object(fname, idx=0):\n    if not isfile(fname):\n        print ('File doesn't exist: ', fname)\n\n    # Open the workbook and 1st sheet\n    xl_workbook = xlrd.open_workbook(fname)\n    xl_sheet = xl_workbook.sheet_by_index(0)\n    print (40 * '-' + 'nRetrieved worksheet: %s' % xl_sheet.name)\n\n    return xl_sheet\n\ndef show_column_names(xl_sheet):\n    row = xl_sheet.row(0)  # 1st row\n    print(60*'-' + 'n(Column #) value [type]n' + 60*'-')\n    for idx, cell_obj in enumerate(row):\n        cell_type_str = ctype_text.get(cell_obj.ctype, 'unknown type')\n        print('(%s) %s [%s]' % (idx, cell_obj.value, cell_type_str, ))\n\ndef get_column_stats(xl_sheet, col_idx):\n    \"\"\"\n    :param xl_sheet:  Sheet object from Excel Workbook, extracted using xlrd\n    :param col_idx: zero-indexed int indicating a column in the Excel workbook\n    \"\"\"\n    if xl_sheet is None:\n        print ('xl_sheet is None')\n        return\n\n    if not col_idx.isdigit():\n        print ('Please enter a valid column number (0-%d)' % (xl_sheet.ncols-1))\n        return\n\n    col_idx = int(col_idx)\n    if col_idx &lt; 0 or col_idx &gt;= xl_sheet.ncols:\n        print ('Please enter a valid column number (0-%d)' % (xl_sheet.ncols-1))\n        return \n\n    # Iterate through rows, and print out the column values\n    row_vals = []\n    for row_idx in range(0, xl_sheet.nrows):\n        cell_obj = xl_sheet.cell(row_idx, col_idx)\n        cell_type_str = ctype_text.get(cell_obj.ctype, 'unknown type')\n        print ('(row %s) %s (type:%s)' % (row_idx, cell_obj.value, cell_type_str))\n        row_vals.append(cell_obj.value)\n\n    # Retrieve non-empty rows\n    nonempty_row_vals = [x for x in row_vals if x]    \n    num_rows_missing_vals = xl_sheet.nrows - len(nonempty_row_vals)\n    print ('Vals: %d; Rows Missing Vals: %d' % (len(nonempty_row_vals), num_rows_missing_vals))\n\n    # Count occurrences of values \n    counts = Counter(nonempty_row_vals)\n\n    # Display value counts\n    print ('-'*40 + 'n', 'Top Twenty Values', 'n' + '-'*40 )\n    print ('Value [count]')\n    for val, cnt in counts.most_common(20):\n        print ('%s [%s]' % (val, cnt))\n\ndef column_picker(xl_sheet):\n    try:\n        input = raw_input\n    except NameError:\n        pass\n\n    while True:\n        show_column_names(xl_sheet)\n        col_idx = input(\"nPlease enter a column number between 0 and %d (or 'x' to Exit): \" % (xl_sheet.ncols-1))\n        if col_idx == 'x': \n            break\n        get_column_stats(xl_sheet, col_idx)\n\n\nif __name__=='__main__':\n    excel_crime_data = join(dirname(dirname(abspath(__file__))), 'test_data', 'Cad Data Mar 2014.xlsx')\n    xl_sheet = get_excel_sheet_object(excel_crime_data)\n    column_picker(xl_sheet)\n\n\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Every 6-8 months, when I need to use the python xlrd library, I end up re-finding this page: Examples Reading Excel (.xls) Documents Using Python&#8217;s xlrd In this case, I&#8217;ve finally bookmarked it:) from __future__ import print_function from os.path import join, dirname, abspath import xlrd fname = join(dirname(dirname(abspath(__file__))), &#8216;test_data&#8217;, &#8216;Cad Data Mar 2014.xlsx&#8217;) # Open &hellip; <a href=\"https:\/\/archive.blogs.harvard.edu\/rprasad\/2014\/06\/16\/reading-excel-with-python-xlrd\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Reading Excel with Python (xlrd)<\/span><\/a><\/p>\n","protected":false},"author":3875,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-218","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4JC3p-3w","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/posts\/218","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/users\/3875"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/comments?post=218"}],"version-history":[{"count":8,"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/posts\/218\/revisions"}],"predecessor-version":[{"id":301,"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/posts\/218\/revisions\/301"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/media?parent=218"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/categories?post=218"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/rprasad\/wp-json\/wp\/v2\/tags?post=218"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}