Friday 15 November 2013

Editing Excel XLSX files manually to change cell label names

We have an application that manipulates XLSX Excel files using the excellent Java Apache POI package

We have the need to deploy this application for lots of different customers & the labels used in the spreadsheets need to be customer specific. In our case the cell names are prefixed with the customer name say 'foo' & we need to alter them all to 'foo'

Now you could do this longhand in Excel but this is time comsuming & a major issue for us is that you are not actually renaming the cells in Excel, you are actually adding  an extra name to a cell.

So what to do.

1) Handily XLSX files are acxtually zip compressed files so firstly extract all the files using your favourite tool, for me I use 7zip 
2) run a command to search for all the files that contain the name you want to change

  •     find . | xargs grep -l foo 2> /dev/null

This will probably list 4 files:

  • ./docProps/app.xml
  • ./xl/charts/chart1.xml
  • ./xl/charts/chart2.xml
  • ./xl/workbook.xml 


3)Edit these files using your favourite editor. For the changes we want to make something like 'vi' or 'vim' is perfect.
4) If you are just changing names then one command will do it

  •     :1,$s/foo/bar/g

5)We also had a case where cells were duplicated for two customers 'foo' & 'bar' and we wanted to remove all the 'foo' tags. This was in the files 'app.xml' & 'workbook.xml' . Here care is needed. Excel creates these files as one long line with loads of XML tags.

6) So firstly you have to seperate the lines, For app.xml this is:

  • :1,$s/<\/vt:lpstr><vt:lpstr>/<\/vt:lpstr>\r<vt:lpstr>/g
    

    For workbook.xml, this two commands are needed:
  • :1,$s/<\/definedName><definedName name/<\/definedName>\r<definedName name/g
    
  • :1,$s/<definedNames><definedName name/<definedNames>\r<definedName name/g
    
7) You then delete the lines containing the string 'foo', but firstly count how many lines contain the string 'foo', so

  • :%s/foo//gn
Then delete the lines
  • :g/foo/d

8) In the case of app.xml two extra edits are required. See the XML below. This is a snippet of app.xml. In this example note the two numbers '87' & '93'. These both need decreasing by the numbers of lines you have deleted.Note I have formatted the XML below to make for easier reading

9) You now need to repackage this up using something like 7zip, but note the archive format must be 'zip' and the compression method must be 'DEFLATE'

Now go & enjoy.

No comments: