Monday, June 16, 2014

Parsing a Comma Delimited Input String with PL/SQL

A colleague recently introduced me to an Oracle 11g PL/SQL feature of syntax called "comma_to_table".  It is a part of the DBMS_UTILITY built in package.  She used it as a method to break apart a string input that had components delimited by a comma ",".

How Complicated is Delimited String Parsing?


The more commonly known function: LISTAGG converts a single column output of many rows into a string with a custom delimiter.  There originally was no real alternative to do the reverse except through a procedural path in our code:



        for i in 1 .. l_count
            loop
              select regexp_substr(string_to_parse,'[^,]+',1,i)
                into l_value from dual;
                dbms_output.put_line(l_value);
            end loop; ...

[Sourced from an example on: Oradev.com]

This example loops through the input string "string_to_parse" and breaks the string into pieces based on a self-declared delimiter.  A fancier approach is to use the CONNECT BY syntax (a hierarchical SQL syntax concept) and make an artificial cartesian product to force an iteration over multiple delimited elements:

[Ask Tom has an example] of this:

      select trim( substr (txt, instr (txt, ',', 1, level ) + 1,
             instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 ) )
        from dual 
     connect by level <= length(txt) - length(replace(txt,',','')) -1)

this one is clever and elegant but it takes a while to interpret.  The use of "connect by...level" is not readily obvious and looks like something is missing from the clause when I look at it.

Hidden Constraints:  The Trouble with Misapplied PL/SQL Syntax


The summary of the comma_to_table function is that it was created with a specific purpose in mind and does not have a fully generic application for all cases.  With a single command, this function allows coders to convert a comma-separated series of string values such as:

     '1,2,3,four,fifteen,orange,shack'

to a row-wise output:

        1
        2
        3
        four
        fifteen
        orange
        shack

Ask-Tom also warns that the original intent of this function derives from the expression: NAME TOKENIZE. [1]  This function was made to assist with internal database ETL processes where the task was to parse valid object names (such as tables) from comma delimited string inputs.  This limits the types of values that can be contained within the string:

  1. Each delimited element must be no longer than about 30 characters.
  2. The string values must represent valid database object names.
  3. Objects are defined as:
    "[schema].[object_name].[procedure|function]@[database link]"
  4. Certain reserved words that do not make valid object names will cause an error when applied.
In general, the comma-to-table function (Oracle APEX also has a similar command within its built-in package library) has enough limitations to make it a clear recommendation to avoid.  General parsing capability is not available because of its exceptions and the output can defy a coder's original intent.


Source Citations:


  1. DBMS_UTILITY.comma_to_table Limitations from "Ask Tom" at:  https://asktom.oracle.com/pls/asktom ...

No comments: