Showing posts with label spreadsheets. Show all posts
Showing posts with label spreadsheets. Show all posts

Monday, November 23, 2015

Convert XLSX to PDF with Python and xtopdf

By Vasudev Ram


XLSX => PDF

This is a simple application of my xtopdf toolkit, showing how to use it to convert XLSX data, i.e. Microsoft Excel data, to PDF (Portable Document Format). It only converts text data, not the formatting, colors, fonts, etc., that may be present in the Excel file.

For the input, I will use this small Excel file, fruits2.xlsx, which I created. A screenshot of it is below (click to enlarge):


Here is the code for XLSXtoPDF.py:
# XLSXtoPDF.py

# Program to convert the data from an XLSX file to PDF.
# Uses the openpyxl library and xtopdf.

# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2015 Vasudev Ram.

from openpyxl import load_workbook
from PDFWriter import PDFWriter

workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()
And here is a screenshot of the PDF output in fruits2.pdf:

There are some points worth mentioning in connection with conversion of data to and from PDF. I will discuss them in a follow-up post.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes

Thursday, November 21, 2013

Publish Microsoft Excel XLSX data to HTML with openpyxl


By Vasudev Ram

I had come across openpyxl, a library by Eric Gazoni, for reading and writing Microsoft Excel XLSX files (Open Office XML), a while ago.

So today I wrote a demo program that reads the data from an XLSX file using openpyxl and writes that data to HTML as a table. Here is a screenshot of the sample XLSX file used, fruits.xlsx (click image to enlarge):


Here is the program, XLSXtoHTMLdemo.py:
# XLSXtoHTMLdemo.py

# Program to convert the data from an XLSX file to HTML.
# Uses the openpyxl library.

# Author: Vasudev Ram - http://www.dancingbison.com

import openpyxl
from openpyxl import load_workbook

workbook = load_workbook('fruits.xlsx')
worksheet = workbook.get_active_sheet()

html_data = """
<html>
    <head>
        <title>
        XLSX to HTML demo
        <title>
    <head>
    <body>
        <h3>
        XLSX to HTML demo
        <h3>
    <table>
"""

ws_range = worksheet.range('A1:H13')
for row in ws_range:
    html_data += "<tr>
    for cell in row:
        if cell.value is None:
            html_data += "<td> + ' ' + "<td>
        else:
            html_data += "<td> + str(cell.value) + "<td>
    html_data += "<tr>
html_data += "<table>lt;body>lt;html>

with open("fruits.html", "w") as html_fil:
    html_fil.write(html_data)

# EOF

You can run the program with:
python XLSXtoHTMLdemo.py
Then the program's HTML output will be in the file fruits.html, a screenshot of which is below (click to enlarge):


- Enjoy.

- Vasudev Ram - Python, C, Linux, databases, open source - training and consulting.

Read all Python posts on my blog.




O'Reilly 50% Ebook Deal of the Day

Friday, March 8, 2013

Python TableFu aims to become an ORM for spreadsheets

By Vasudev Ram

Python TableFu is an interesting tool I saw recently. It is a Python library that lets you import tabular data from CSV files and then manipulate them in memory, by calling its methods.

From the site:

[ Python TableFu is a tool for manipulating spreadsheet-like tables in Python. It began as a Python implementation of ProPublica's TableFu, though new methods have been added. TableFu allows filtering, faceting and manipulating of data. Going forward, the project aims to create something akin to an ORM for spreadsheets. ]

A CSV file you specify, is used as input to create a TableFu object, and the rows are available as a list, which can be indexed to get a specific row. The columns (of each row) are available as a dictionary, so you can say table[column_name], e.g. table["Author] to get a specific column. The model seems to make sense, since table rows usually don't have specific names, they are treated as "row number so-and-so", while columns do have names - the header of the column, if it is a CSV file, or the column name of a database table, using RDBMS terminology.

Python TableFu allows filtering, "faceting" and manipulation of the table data.

Here is some example Python TableFu code, from the site; I've deleted some bits to keep it short:

>>> from table_fu import TableFu
>>> table = TableFu.from_file('tests/test.csv')
>>> table.columns
['Author', 'Best Book', 'Number of Pages', 'Style']

# get all authors
>>> table.values('Author')
['Samuel Beckett', 'James Joyce', 'Nicholson Baker', 'Vladimir Sorokin']

# total a column
>>> table.total('Number of Pages')
1177.0

# filtering a table returns a new instance
>>> t2 = table.filter(Style='Modernism')
>>> list(t2)
[Row: Samuel Beckett, Malone Muert, 120, Modernism,
 Row: James Joyce, Ulysses, 644, Modernism]


# each TableFu instance acts like a list of rows
>>> table[0]


list(table.rows)
[Row: Samuel Beckett, Malone Muert, 120, Modernism,
 Row: James Joyce, Ulysses, 644, Modernism,
 Row: Nicholson Baker, Mezannine, 150, Minimalism,
 Row: Vladimir Sorokin, The Queue, 263, Satire]

# rows, in turn, act like dictionaries
>>> row = table[1]
>>> print row['Author']
James Joyce

# transpose a table
>>> t2 = table.transpose()
>>> list(t2)
[Row: Best Book, Malone Muert, Ulysses, Mezannine, The Queue,
 Row: Number of Pages, 120, 644, 150, 263,
 Row: Style, Modernism, Modernism, Minimalism, Satire]

>>> t2.columns
['Author',
 'Samuel Beckett',
 'James Joyce',
 'Nicholson Baker',
 'Vladimir Sorokin']

# sort rows
>>> table.sort('Author')
>>> table.rows
[Row: James Joyce, Ulysses, 644, Modernism,
 Row: Nicholson Baker, Mezannine, 150, Minimalism,
 Row: Samuel Beckett, Malone Muert, 120, Modernism,
 Row: Vladimir Sorokin, The Queue, 263, Satire]

(I deleted the angle brackets from the original output (that demarcates Rows) because it was messing up the HTML formatting, due to being interpreted as HTML elements.)

Python TableFu looks like it could be useful to manipulate CSV data in memory, before writing it out to another file or sending it to another process (or function in the same program) for further processing.

- Vasudev Ram - Dancing Bison Enterprises



Tuesday, August 28, 2012

openpyxl library to read/write Excel 2007 xlsx/xlsm files

By Vasudev Ram


openpyxl - A Python library to read/write Excel 2007 xlsx/xlsm files

Seems to have a straightforward API.

The openpyxl site has a tutorial and a cookbook.

- Vasudev Ram - Dancing Bison Enterprises


Friday, August 10, 2012

3 ways to use Python in Excel

By Vasudev Ram


DataNitro seems powerful. It can be used interactively in Excel, or by writing scripts, including defining functions.

Here is an example of interactive use, from their web site:
>>> from ystockquote import get_price
>>> Cell('A1').value= get_price('GOOG')
>>> Cell('A1').value
642.35
Though they don't mention it on the home page, I think this example embedded formula's result probably will be updated when the spreadsheet is recalculated. Otherwise it would not be too useful.

The ystockquote module used in the example is by Corey Goldberg, a Python developer. It allows you to get stock quotes from Yahoo!. Corey's blog has good Python information and examples. He has also written many open source software tools, in Python and other languages.

Pyvot is a Python tool for Visual Studio. Excerpts:

[ Pyvot enables easy transit of data between Python and Excel.

For example, we can move a list of Python values to Excel, view and manipulate the data, and retrieve the new version. In a symmetric and equally common usage, we can grab and process Excel values in Python, and display the result to Excel.
...
It requires CPython 2.6 or 2.7 with the Python for Windows extensions (pywin32) installed, and Office 2010. ]

Pyvot uses Microsoft COM (Component Object Model) to communicate with Excel. One issue many people (including me) have found with COM, is that it is a) somewhat buggy, and b) is resource-heavy. At least, that was the case when I used it in a project a while ago.

pyxll (Python Excel Addins) makes it possible to write addins for Microsoft Excel in Python. Excerpts:

[ PyXLL makes it possible to write addins for Microsoft Excel in Python. Using simple decorators your Python code can instantly be exposed to Excel as worksheet functions, menu items or macros.
...
Excel addins written using PyXLL are fast to develop and offer high performance as well as being easier to maintain and deploy than other methods of developing Excel addins.
...
PyXLL is used by investment banks, hedge funds and engineering companies all over the world. ]

Finally, if all you need to do is to read the contents of Excel worksheets programmatically, and then process the contents in some way, you may want to check out my xtopdf toolkit, which provides some minimal support for that. This article by me on the Packt Publishing site, shows how to use xtopdf to read the basic contents of Excel files; it can read numbers and text only; it does not support reading font information, formatting, colors, etc. To achieve this, xtopdf uses the xlrd library for Python, so that is a prerequisite.

The steps for setting up xtopdf for reading Excel files, is given at the end of that article, in the section named "4. Conclusion." Note: I had used what was probably an earlier version of xlrd to develop this feature, since I did it some time ago, but after taking a look at the PyPI page for xlrd (linked above) just now, it looks as though things should still work.

An interesting point about xlrd is that is written in pure Python, so you don't need to be on Windows to use it. You can use it to process Excel files on any other platform (such as Linux or UNIX) that supports Python, which may be more convenient for some needs, since Linux and UNIX have powerful software development tools. One obvious way to leverage this is to read the Excel content using xlrd and convert it to tab-delimited values, then process it with AWK.

- Vasudev Ram - Dancing Bison Enterprises