I had a need to read a series of MS Excel XLSX files from a file server over a period of time – usually chunks of 20-50 files, several times a year. That’s too many files to make an upload interface practical (without annoying the users), and the files come as is from an external agency so there is no control over the contents. That is, this is NOT a case of “save to CSV format” – the requirement is to read MS Excel .XLSX format files as they come.
I was happy to use Anton Scheffer’s AS_READ_XLSX code .. but wary of the VARCHAR2(4000) column limit. Sure enough, a few files in, I hit that limit.
What to do? Modify – adjust – Anton’s code to work with CLOBS, of course. The CLOB-ready package is here, PACKAGE_READ_XLSX_CLOB.
So now I have an AS_READ_XLSX_CLOB package that will read the contents of an XLSX file column into a CLOB.
Wh so much data in one MS Excel column? Not my concern – all I need is to be able to read it into my Oracle database.
The adjusted package is available here, PACKAGE_READ_XLSX_CLOB.
The change? Simple – remove the VARCHAR2(4000) limiting piece – the DBMS_XLSPROCESSOR.VALUEOF function call.
With a little research I discovered the DMBS_XSLPROCESSOR.VALUEOF function was returning into a VARCHAR2(4000) variable Understandable for a function call.
BUT the DMBS_XLSPROCESSOR.VALUEOF procedure does not have that limit – well, it would have depending on how I have the variables that hold the VARCHAR2 OUT parameter. Using the VALUEOF procedure made it easier to use a CLOB variable to accept the OUT parameter. Of course there is implicit conversion from the procedure VARCHAR2 OUT parameter to the CLOB variable.
I used a CLOB, because that is how I need to store my data. And yes, a significant number of those spreadsheet table cells are indeed > 4000 characters, meriting the CLOB datatype for our purposes
Change the string_val declaration from VARCHAR2(4000) to CLOB
type tp_one_cell is record ( sheet_nr number(2) , sheet_name varchar(4000) , row_nr number(10) , col_nr number(10) , cell varchar2(100) , cell_type VARCHAR2(1) , string_val clob --KC: was --varchar2(4000) , number_val number , date_val date , formula varchar2(4000) );
Replace the function calls:
t_strings( t_c ) := dbms_xslprocessor.valueof( dbms_xmldom.item( t_nl, i ), '.' );
with equivalent procedure calls:
dbms_xslprocessor.valueof( dbms_xmldom.item( t_nl, i ), '.', t_strings( t_c ) );
Then I use the AS_READ_XLSX_CLOB package the same as AS_READ_XLSX … for example:
select * from table( as_read_xlsx.read( as_read_xlsx.file2blob( 'DOC', 'Book1.xlsx' ) ) );
or, for ease of reading data in a procedure:
CREATE VIEW MYSPREADHSEET_V AS select * from table( as_read_xlsx.read( as_read_xlsx.file2blob( 'DOC', 'Book1.xlsx' ) ) );
and then reading data is as simple as:
SELECT string_val FROM MY_SPREADSHEET_V WHERE row_nr = 1 AND col_nr = 2;
and its many variations.
Now I am reading in XLSX spreadsheets with ease, no matter what size.
Thank you Anton ~
i use as_read_xlsx , but i have a problem that excel columns with numeric data
have the leading zeros truncated ,although the column was defined as text or ‘custom’ ‘000000000’. i would GREATLY appreciate help in changing the package to return the column including the leading zeros.
I just saw this comment … notifications were apparently off.
What you ask is no doubt possible, as long as you want the leading zeros always there for certain column types.
I have not looked at the package for along time, but could revisit.
Have you tired digging in?
Thanks for upgrading this package. I am using it to great effect 🙂
Thank you – and again thanks to Anton Scheffer for the main project!
Thanks for posting this. I am using this to extract clob data from .xlsx files.
Reblogged this on ..:::: EasyOraDBA | Shadab Mohammad ::::...