Thursday, June 5, 2014

Importing an Excel Workbook into R

The usual route for importing data from spreadsheet applications like Excel or OpenOffice into R involves first exporting the data in CSV format. A newer and more efficient CRAN package, called XLConnect (c. 2011), facilitates reading an entire Excel workbook and manipulating worksheets and cells programmatically from within R.

XLConnect doesn't require a running installation of Microsoft Excel or any other special drivers to be able to read and write Excel files. The only requirement is a recent version of a Java Runtime Environment (JRE). Moreover, XLConnect can handle older .xls (BIFF) as well as the newer .xlsx (Office XML) file formats. Internally, XLConnect uses Apache POI (Poor Obfuscation Implementation) to manipulate Microsoft Office documents.

As a simple demonstration, the following worksheet, from a Guerrilla Capacity Planning workbook, will be displayed in R.

First, the Excel workbook is loaded as an R object:


require(XLConnect)
wb <- loadWorkbook("~/.../XLConnect/82scaling.xlsx")

The structure of the workbook object is:


> str(wb)
Formal class 'workbook' [package "XLConnect"] with 2 slots
  ..@ filename: chr ".../XLConnect/82scaling.xlsx"
  ..@ jobj    :Formal class 'jobjRef' [package "rJava"] with 2 slots
  .. .. ..@ jobj  : 
  .. .. ..@ jclass: chr "com/miraisolutions/xlconnect/integration/r/RWorkbookWrapper"

Next, the workbook object is converted to a data frame:


df <-  readWorksheet(wb, 
    sheet = "SGI-NUMA", 
    startCol = which(LETTERS=="A"), 
    startRow = 3, 
    endCol = which(LETTERS=="P")
    endRow = 15, 
)

which can be compared with the original worksheet (above):


  Measured KRays.Sec      RelCap Efficiency Inverse Fit Transform Trendline   Parameters
1   CPU (p)      X(p) C=X(p)/X(1)        C/p     p/C p-1   (p/C)-1 Quadratic Coefficients
2         1        20        1.00       1.00    1.00   0      0.00         a   5.0000E-06
3         4        78        3.90       0.98    1.03   3      0.03         b       0.0500
4         8       130        6.50       0.81    1.23   7      0.23         c       0.0000
 ...

How to apply other XLConnect functions is described in the associated vignette.

1 comment:

Tom said...

Hey, this is great! I'm eager to try it. Thanks for posting about it, Neil.

Tom