Question posted 2015 · +10 upvotes
Many spreadsheets have formulas and formatting that Python tools for reading and writing Excel files cannot faithfully reproduce. That means any file I want to generate programmatically has to be built more or less from scratch, and the other Excel files (the ones with the aforementioned sophistication) then have to refer to that file, which creates a variety of other dependency issues.
My understanding of Excel file ‘tabs’ is that they’re actually just a collection of XML files. Well, is it possible to use pandas (or one of the underlying read/write engines such as xlsxwriter or openpyxl) to modify just one of the tabs, leaving the other tabs (with more wicked stuff in there) intact?
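As a rough illustration of the "collection of XML files" idea: an .xlsx file is a ZIP archive, and the worksheets are usually stored as parts under xl/worksheets/. The sketch below only lists those parts, using the standard library. The helper names list_worksheet_parts and demo_archive are made up for this example, and the demo archive is a stand-in with placeholder content, not a valid workbook.

```python
import io
import re
import zipfile

# Worksheet parts conventionally live directly under xl/worksheets/
WORKSHEET_PART = re.compile(r"xl/worksheets/[^/]+\.xml")


def list_worksheet_parts(xlsx):
    """Return the worksheet XML part names in an .xlsx file (path or file object)."""
    with zipfile.ZipFile(xlsx) as archive:
        return sorted(
            name for name in archive.namelist() if WORKSHEET_PART.fullmatch(name)
        )


def demo_archive():
    """Build an in-memory ZIP whose part names mimic an .xlsx layout (hypothetical data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in (
            "xl/workbook.xml",
            "xl/worksheets/sheet1.xml",
            "xl/worksheets/sheet2.xml",
            "xl/styles.xml",
        ):
            archive.writestr(name, "<placeholder/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(list_worksheet_parts(demo_archive()))
```

Note that a part name such as sheet3.xml does not necessarily correspond to the tab named Sheet3; the mapping lives in xl/workbook.xml and its relationships file. Editing a part by hand is also fragile, since cell text and styles are typically stored in separate shared parts.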
EDIT: I’ll try to further articulate the problem with an example.
- The Excel workbook test.xlsx has four tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4
- I read Sheet3 into a DataFrame (let’s call it df) using pandas.read_excel()
- Sheet1 and Sheet2 contain formulas, graphs, and various formatting that neither openpyxl nor xlrd can successfully parse, and Sheet4 contains other data. I don’t want to touch those tabs at all.
- Sheet2 actually has some references to cells on Sheet3
- I make some edits to df and now want to write it back to Sheet3, leaving the other sheets untouched (and the references to it from other worksheets in the workbook intact)
Can I do that and, if so, how?
Accepted answer +5 upvotes
I had a similar question regarding the interaction between Excel and Python (in particular, pandas), and I was referred to this question.
Thanks to some pointers from the Stack Overflow community, I found a package called xlwings that seems to cover a lot of the functionality HaPsantran asked for.
To use the OP’s example:
Working with an existing Excel file, you can drop an anchor in the data block (on Sheet3) that you want to import into pandas by giving a cell a name in Excel, and then do:
# xlwings API as it was when this answer was written (2015, before the 0.9 rewrite)
from xlwings import Workbook, Range
import pandas as pd

# Existing_file (path to the workbook) and Anchor (name of the named cell) are placeholders
# open an existing Excel file
wb = Workbook(Existing_file)
# find the named cell, expand to the edge of the contiguous block (bounded by an empty row/column) and read it
data = Range(Anchor).table.value
# load the block into a pandas DataFrame and manipulate it
df = pd.DataFrame(data)
df['sum'] = df.sum(axis=1)
# write back to Sheet3
Range(Anchor).value = df.values
I tested this, and it did not tamper with the existing formulas in the Excel file.
Let me know if this solves your problem and if there’s anything else I can help with.
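With newer xlwings releases (0.9 and later) the top-level Workbook and Range(...).table calls were replaced by a Book/sheets object model, so the same round trip might look roughly like the sketch below. It assumes the block is purely numeric with no header row, that Excel is installed locally, and that the block starts at a cell you can address. The function names add_row_sums and update_block and the default anchor "A1" are made up for this example.

```python
def add_row_sums(rows):
    """Append the sum of each row as a new last column (plain Python, no Excel needed)."""
    return [list(row) + [sum(row)] for row in rows]


def update_block(path, sheet_name="Sheet3", anchor="A1"):
    """Read the contiguous block at `anchor`, append row sums, and write it back in place."""
    import xlwings as xw  # imported lazily: requires a local Excel installation

    book = xw.Book(path)  # opens the workbook in Excel itself
    cell = book.sheets[sheet_name].range(anchor)
    # expand to the edge of the contiguous block; ndim=2 forces a list of lists
    rows = cell.expand("table").options(ndim=2).value
    cell.value = add_row_sums(rows)  # writes the 2D block starting at the anchor cell
    book.save()
```

Because xlwings drives the Excel application rather than re-serializing the file, the other sheets are never parsed or rewritten by Python; only the cells assigned to are changed.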
Big kudos to the developer of xlwings, they made this possible.