APEX Data Load Wizard Customizations ~ Revisited

dataLoadWizard
I was revisiting our APEX Data Load application, which uses a highly customized version of the APEX Data Load Wizard – several of them. Key words here: Highly Customized.

Customizing the Data Load Wizard – or modifying any set feature of APEX – leaves one open to the possibility that things might not work in the following versions. This has been the case with every major APEX upgrade since we made the Data Load Wizard customization in APEX 4.2.  Each time we have had to plan time/resources to revisit the customizations and make corrections – all on our own sleuthing.  No big deal, right?

This year an upgrade to APEX 5.1,  left the last step, the load into our defined Data Load table, not working. I had two choices, rebuild the Data Load Wizard pages ( 3 sets), and re-apply the customizations.  Or, write my own load into the final table.

This time I chose the latter.
As a quick reminder, the Data Load Wizard loads uploaded data into a series of collections.  To learn more about the Data Load Wizard and why we customized, read the previous Data Load Wizard posts. The data from those collections – after transformations are applied – gets loaded into the Data Load target table.  In our case, we needed complex validations and transformations, and logging of each change. The logging of each change is what really drove us to customizing the process. So we pulled data from the uploaded collection (SPREADSHEET CONTENT) into a temporary table, performed all our validations, transformations and logging, then moved the data back into the load collection (LOAD_CONTENT).  And continued with the Data Load process.  That’s a simplification, there are a lot of other pieces involved, but that’s enough for here. (Read previous blog posts for more details).

By choosing to write my own load into the final table, and bypass the APEX processes, I lose some other features that I now have to code on my own.  What I lose:

  • Automatic insert/updates
  • Automatics insert/update failure counts and logging.

I am OK with handling these in my own bulk process insert with the LOG ERRORS clause. We only get inserts, never updates, which simplifies things.  Your mileage may vary. Note that if you need to process updates and inserts, and report which records are inserted vs updated, the solution will be more complicated.

Next time we upgrade, I expect another set of minor Data Load Wizard changes … to which I will adjust somehow.  Now, IF the Data Load Wizard were to allow greater flexibility in logging transformations for each row – then I may be able to abandon the customized code altogether and revert to the mainstream Data Load Wizard.  Until then I am resigned to maintenance with each update.

P.S.
I just had someone ask about customizing the Data Load Wizard – breaking into the collections as I did. My recommendation was, use the Data Load Wizard to load into a staging table, then write PL/SQL processes to perform the validations needed – in his case to only do inserts no updates – filter out existing rows. Such a solution will be in-the-sandbox (no customizations) and will readily upgrade.
I strongly advise against breaking into the Data Load Wizard unless there is a solid business requirement to do so. Unless …

    • You have plenty of time to go back and fix things upon upgrade, and
    • You have a finding Fairy to pay for all those upgrades.

Happy Coding, whichever approach you choose.

Read XLXS Files from APEX 5.0 and Higher – One (or Two) Ways

In my work as a consultant I often have the task of getting data from MS Excel spreadsheets into tables in the database.

If this is a one-time load, the process is simple – use either SQL Developer or, easier yet, use the Data Load Workshop in APEX:  SQL Workshop –> Utilities –> Data Workshop.

If the requirement is for a repeated load or many files, I need to set up some automated process for reading in the data.  I do not want to ask my users to open the XLSX files, save the data to CSV, then upload the CSV files.  That makes for unhappy users, at best.

I am sure there are several options for load XSLX files into Oracle data tables.  I am going to focus on a two I have used (and continue to use) successfully across several clients, across many projects.  And another option that used to be a very good option, that goes away with APEX 5.0 and higher.

Let’s do Bad News First.

The Option That Goes Away:

With APEX Listener, now Oracle REST Data Services, ORDS, we gained a wonderful utility for loading XLS or XLSX files into the database.

I will not elaborate on that here, for reasons which will be obvious in a bit.  I did cover that option, and several other Data Load options, in this presentation, How Do I Load Data, Let Me Count the Ways..  Most of these options still apply.  Yes, it is a PowerPoint, but gives you enough detail to get going.

This ORDS-based utility works fabulous for APEX versions less than APEX 5.0.  It loaded data into a collection. You can process your data however you wish from there.  However, as of APEX 5.0, APEX uses JSON to move data, and the ORDS Excel file upload function no longer works.  Joel Kallman explains this clearly in his blog, Let’s Wreck This Together, here.  It will still work for APEX versions < 5.0 – that is, APEX 4.2, APEX 4.1.  But beware, if you are using this Excel-Upload option of ORDS in APEX 4.1 or APEX 4.2 applications, you will need to make some changes when you upgrade those applications to APEX 5.0 and above.

The XLSX Option That Still Works – READ_XLSX

Thanks to Anton Scheffer from AMIS, we have a PL/SQL package that reads .XSLX files from a file in a file server folder that has been mapped to an Oracle directory.  That sounds like a mouthful, but it is simple.

You have files in a folder.  You or your system administrators arrange for that file system folder to be available to the database – usually a mapping of the physical folder to a virtual folder on the database file server.  Then your DBA creates an Oracle directory that points to the folder, physical or virtual, on the database file system. And grants appropriate grants so you can see and read and/or write from /to that Oracle directory.

Now Anton’s package, READ_XLSX comes in.  download the package, read Anton’s blog post.  I am only going to summarize here.

READ_XLSX consists of two main functions, one to read the contents of an XLSX file into a BLOB (named file2blob), and another to read the contents of that BLOB into records (this one is named “read“).  Cast the results of the read function as a TABLE, and voila, you have the results of your XLSX file.

This statement is the guts of it:

SELECT * FROM TABLE( as_read_xlsx.read( as_read_xlsx.file2blob('MY_ORA_DIRECTORY',
 'MyExcelWorkbook.xlsx' ) ) );

Now if you are like me, you do not want ot have to keep typing that SELECT statement.  So I turn it into a view:

CREATE VIEW MY_SPREADSHEET_V AS SELECT * FROM TABLE( as_read_xlsx.read( as_read_xlsx.file2blob('MY_ORA_DIRECTORY',
 'MyExcelWorkbook.xlsx' ) ) );

Now I can query from the view the same as I query from any other table or view I have access.  And incorporate queries from that view into any packages, procedures and functions that I build to meet the requirements of the task at hand.

When I have a series of files in a folder, I use a version of GET_DIR_LIST to read files from the Oracle directory.  (GET_DIR_LIST is a java utility, wrapped in PL/SQL, that returns the list of file names in a folder in a one-column table). You probably have your own method of doing the same.  Then, in a loop, I use dynamic SQL to point my view to the next file in the loop, call my loading procedure to process the contents of that file, then move on to the next file.  OF course I have all kinds of validations and error checking in there, as much is needed depending on the task at hand.  Once configured, the process is clean, reusable, easily customizable and – best of all – works in APEX 5.0 and above.

I did not use Anton’s READ_XLSX directly.  Why?  At the time I downloaded it (years ago now), Anton’s package handled cells of up to 4000 characters.  The spreadsheets I had to load contains some cells with > 4000 characters.  So I made a slight adjustment to handle CLOBs.  I described those customizations here.  I suspect most persons will not need such a customization.

Another (Better?) Option – EXCEL2COLLECTION Plugin:

Is READ_XLSX the only way? No. Seems like too much work? Yup, it can be.  Anton apparently thought so too, because he came up with EXCEL2COLECTION, a utility APEX Plugin that, with less coding than the READ_XLSX package, will read data from an XLSX, XSL, XML 2003 or CSV file into a PL/SQL collection.  I have also used this plugin successfully across several customers and projects.

I have used the EXCEL2COLLECTION plugin in conjunction with a customized instance of the APEX Data Load Wizard to enable simpler (from a user perspective) upload of XLSX files.  I describe more of that process here.  The APEX Data Load Wizard does not out-of-the-box allow upload of XLSX files – one needs to cut/paste.  Integrating the EXCEL2COLLECTION plugin into a customized Data Load Wizard series of pages makes use of other features of the Data Load Wizard – like the column mapping page – possible.  Beware, that customizing the Data Load Wizard pages and process flow means that your work is subject to break upon any and every APEX upgrade.  This customization of the Data Load Wizard process is unsupported territory.  Note that the EXCEL2COLLECTION plugin itself is supported (by Anton). My use of it in customizing the Data Load Wizard is not.

Bottom line:

Anton’s READ_XLSX remains one of the best methods of reading XLSX data into an Oracle database, when the requirement is to read many XLSX files.  When using APEX, investigate using the EXCEL2COLLECTION plugin; that will make your XLSX load tasks easier.  For one-of data loads, I still use SQL Developer or the APEX SQL Workshop Data Workshop utility.  Need to script the load for Production?  Use SQL Developer in a DEV environment, then use SQL Developer to generate a load script, there are several options for that.

Happy Data Loading!

 

 

 

APEX 5 Data Load Wizard ~ Slight Updates Mean Changes to Customized Load Pages

Slight changes in the APEX 5 Data Load Wizard meant our customized Data Load pages needed some attention to work properly in APEX 5.

Ever customize the APEX Data Load Wizard pages?  We did so in APEX 4.2,  to

  • Accomodate data change logging requirements. We needed to log every change made to incoming data.  The APEX Data Load Wizard will make those changes for you, but not log them.
  • Allow users to upload XLSX, XLS or CSV files.

So we customized. And all worked well, until we upgraded to APEX 5.

In APEX 5, all our data uploads all broke, giving an abrupt “no data found” error during the file upload.  Geez, it hadn’t even got to all our custom stuff yet!

Well, a little digging into the APEX 5 Data Load Wizard-generated pages and collections gave me the answer. And we are back in business.

This is good reminder that any time you play outside the APEX sandbox – use anything other than the standard wizards and documented APIs –  you are subject to things breaking upon upgrade. We know this and accept it … and we really should PLAN for it. Even point releases.

So what changed?

The Data Load Wizard uses a series of collections behind the scenes to store the data being uploaded and processed:

  • SPREADSHEET_CONTENT – Uploaded (or copied) data
  • PARSE_COL_HEAD – As-uploaded column headings
  • LOAD_CONTENT – Content to be loaded, plus Error and Action information in c047, c048 and c049.
  • NOT_LOAD_CONTENT – Content to NOT load, including the same Error and Action information in c047, c048 and c049.
  • LOAD_COL_HEAD – Mapped column headings; the columns to be loaded.

The collection names pretty much tell you what each collection does.

The APEX 5 DLW now also creates a new collection:

  • SPREADSHEET_SMTP – One member, the uploaded spreadsheet sheet name.  (Not sure what happens if there are multiple sheets … task for another day.)

Was this the problem?  No.  This new collection was getting created in APEX 5 – clearly not the problem.

What was? Behind the scenes, the APEX 5 DLW added a new element as  the last element in the PARSE_COL_HEAD collection, USE_APPLICATION_DATE_FORMAT.  As soon as I added this element to my collections, all worked happy again.

How did I find it?  Plain old triage. I knew from experience that upon upload the SPREADSHEET_CONTENT and PARSE_COL_HEAD columns got populated. Since we did not get beyond the upload- clearly something was happening  – missing – there.  I created some APEX 5 data load wizard pages based on the same table, and step by step examined what was there or not in session state.  Our customization populated PARSE_COL_HEAD (for a variety of reasons re saving the uploaded file, off point for this post).  But it was not adding the USE_APPLICATION_DATE_FORMAT element.  A simple PL/SQL change corrected that, and the no-data-found went away.

Note that this solved our APEX 5 data load issue.  I have seen other cases of the same no-data-found message on other posts in the APEX Community and stackoverflow.com – these may or may not be the same issue I describe here, and the pre-APEX 5 posts are definitely not. Suffice it to say if one customizes any part of the APEX Data Load Wizard, one adopts  the responsibility of keeping up with all of the nuances of the Data Load Wizard for the life of that application.

Is there something else different behind the scenes?  Yes, a few things off topic for this post.  Is there more? Could be, but it is not effecting my customized data load  pages right now.  Will it break on the next upgrade?  Maybe, and we will of course be watching, testing and prepared to adjust as needed.

Lessons reinforced:  Customize only when necessary.  Plan for, allocate time for, test, and adjust as necessary before every upgrade. Common sense stuff that sometimes gets back-seated in the haste to upgrade.