READ_XLSX_for CLOBs ~ Extending Anton’s AS_READ_XLSX for CLOBs

I had a need to read a series of MS Excel XLSX files from a file server over a period of time – usually chunks of 20-50 files, several times a year. That’s too many files to make an upload interface practical (without annoying the users), and the files come as is from an external agency so there is no control over the contents. That is, this is NOT a case of “save to CSV format” – the requirement is to read MS Excel .XLSX format files as they come.

I was happy to use Anton Scheffer’s AS_READ_XLSX code .. but wary of the VARCHAR2(4000) column limit. Sure enough, a few files in, I hit that limit.

What to do? Modify – adjust – Anton’s code to work with CLOBS, of course. The CLOB-ready package is here, PACKAGE_READ_XLSX_CLOB.

So now I have an AS_READ_XLSX_CLOB package that will read the contents of an XLSX file column into a CLOB.

Wh so much data in one MS Excel column? Not my concern – all I need is to be able to read it into my Oracle database.

The adjusted package is available here, PACKAGE_READ_XLSX_CLOB.

The change?  Simple – remove the VARCHAR2(4000) limiting piece – the DBMS_XLSPROCESSOR.VALUEOF function call.

With a little research I discovered the DMBS_XSLPROCESSOR.VALUEOF function was returning into a VARCHAR2(4000) variable  Understandable for a function call.

BUT the DMBS_XLSPROCESSOR.VALUEOF procedure does not have that limit – well, it would have  depending on how I have the variables that hold the VARCHAR2 OUT parameter.  Using the VALUEOF procedure made it easier to use a CLOB variable to accept the OUT parameter.  Of course there is implicit conversion from the procedure VARCHAR2 OUT parameter to the CLOB variable.

I used a CLOB, because that is how I need to store my data. And yes, a significant number of those spreadsheet table cells are indeed > 4000 characters, meriting the CLOB datatype for our purposes

The change:

Change the string_val declaration from VARCHAR2(4000) to CLOB

  type tp_one_cell is record
    ( sheet_nr number(2)
    , sheet_name varchar(4000)
    , row_nr number(10)
    , col_nr number(10)
    , cell varchar2(100)
    , cell_type VARCHAR2(1)
    , string_val clob        --KC: was --varchar2(4000)
    , number_val number
    , date_val date
    , formula varchar2(4000)
  );

Replace the function calls:

t_strings( t_c ) := dbms_xslprocessor.valueof( dbms_xmldom.item( t_nl, i ), '.' );

with equivalent procedure calls:

dbms_xslprocessor.valueof( dbms_xmldom.item( t_nl, i ), '.', t_strings( t_c ) );

Then I use the AS_READ_XLSX_CLOB package the same as AS_READ_XLSX … for example:

select * from table( as_read_xlsx.read( as_read_xlsx.file2blob( 'DOC', 'Book1.xlsx' ) ) );

or, for ease of reading data in a procedure:

CREATE VIEW MYSPREADHSEET_V AS select * from table( as_read_xlsx.read( as_read_xlsx.file2blob( 'DOC', 'Book1.xlsx' ) ) );

and then reading data is as simple as:

SELECT string_val
  FROM MY_SPREADSHEET_V
 WHERE row_nr = 1
   AND col_nr = 2;

and its many variations.

Now I am reading in XLSX spreadsheets with ease, no matter what size.

Thank you Anton ~