Ugly SQL: 2014

Wednesday, September 10, 2014

Oracle Instant Client Installation

This was a useful guide to installing the Oracle Instant Client and SQL Plus applications on a Linux system.

The type of download file will vary depending on your Linux installation. The two basic file types are the "instantclient-basic" and the "instantclient-sqlplus" files.


instantclient-basic-linux.x64-11.2.0.4.0.zip
instantclient-sqlplus-linux.x64-11.2.0.4.0.zip

$ mkdir ~/software
$ cd ~/software
$ unzip /tmp/instantclient-basic-linux.x64-12.1.0.x.zip
$ unzip /tmp/instantclient-sqlplus-linux.x64-12.1.0.x.zip

The versions may vary depending on your current environment. To use the software, set your PATH variables:

$ export LD_LIBRARY_PATH=/home/tomcat/software/instantclient_12_1
$ export PATH=$PATH:$LD_LIBRARY_PATH

$ sqlplus scott/tiger@//myhost.example.com:1521/myservice

Original Reference:

ORACLE-BASE - Oracle Instant Client Installation

Wednesday, August 20, 2014

Oracle-XE plus Linux = ? (Day One)

So, I was very impressed with the initial documentation on the integration of Oracle-XE (the free rdbms available from Oracle) and Linux. Installation was supposed to be simply a single line call to the Ubuntu software libraries through the "apt-get install" command. It has turned out to be more lengthy of a task (try at least one working day) with lots of implied steps in between.

Still, I am glad I have chosen this route because the documentation, though scattered is very comprehensive. It has been a challenge to pick up, select and connect various units of working instructions for different parts of the installation... from different sources.

So far, I have prepared my platform, which mostly the base requirements to run XE on any system:

Ubuntu Linux installation 14.04 via a virtual machine system partition set up on my host OS (my workstation). it has assigned 2 GB of RAM, 30 GB of disk space and one cpu unit of computing power.
Assigned swap space for extra capacity and extended a dedicated disk partition for storage of the Oracle database instance.

At present, I am downloading the XE software for 11gR2. That will be the version used in this ongoing discussion for this installation. So far, things look good. I can really appreciate an organized installation methodology.

It helps to know where you can find everything and what you named it at the time it was installed. Believe me, it looks like it will be the deciding factor of whether to move onward, or to chuck it and start over (ouch).

Onward. More soon.

Tuesday, August 12, 2014

Movie Adventures: Data Sets from the IMDB

Working With Fun Data Sources

Actually, most data is interesting. When I make examples to show concepts to others, though I try to choose something that many people can relate and also visualize typical relations between each sample.

The IMDB, One of the Internet's Early Content Pioneers

Movies are a personal favorite. It turns out that the IMDB, aka: The Internet Movie Database reads like a searchable encyclopedia of movie facts and stats... a database in many ways.

Apparently, the IMDB has its own alternate interfaces to their database. Check it out and pull down a few samples. I am currently in the process of loading a compressed sample of their "movies.list" entity (they have quite a few entities in their model). It's close to 40 mb of text, but it's coming down the network pipe pretty slow right now. I'll report back as soon as I am able to uncompress it and peek at the formatting. Apparently the IMDB doesn't leave much documentation with respect to how to use their data extracts.

A Comment on Data Stewardship

Please be sure to review the link to the IMDB Data Terms of Use, as it's "not the usual yaddah, yaddah". Make sure that you are using the data, which is identified as:

The data is NOT FREE although it may be used for free in specific circumstances.

There are also some guidelines on another IMDB page discussing the guidelines of uses of their "not-free" data in personal or commercial software products. Reading these rules and thoroughly understanding their limitations and restrictions is a good exercise in data stewardship. Stewardship is applies to information gathered by individuals and their affiliated organizations (employers, businesses, etc.).

A Sample Data File Format

Here's a peek at what's inside the downloadable data files. Not quite like an industry "standard" csv (primitive comma separated values) or xml (a little geeky) layouts, but I think they're consistent enough for parsing. It might be good if I could get to some data on movies I actually recognize... :-)

"'n Shrink" (2009) {Who's Got the Pills? (#1.1)} 2010
"'n Shrink" (2009) {Why's It So Hot? (#1.2)} 2010
"'N Sync TV" (1998) 1998-????
"'Ons Sterrenkookboek'" (2007) 2007-2008
"'Orrible" (2001) 2001-????
"'Orrible" (2001) {Dirty Dozen (#1.7)} 2001
"'Orrible" (2001) {Dog Locked (#1.5)} 2001
"'Orrible" (2001) {Enter the Garage (#1.2)} 2001
"'Orrible" (2001) {May the Best Man Win (#1.4)} 2001

In general, this should be a fun data set to work with, even for just this entity alone. The count is about 2.97 million records spanning names from the early creation of motion picture science to the modern day. (Wow, that's quite a few.)

Update: An Appeal from the IMDB.com Team

The ftp site distributing the IMDB data extracts had an interesting appeal you may want to read and respond to. I'm sure that the more who put in their input will influence the site owners to implement even better alternative access methods and api's for their site's content. What do you think? If you found this data source useful (perhaps for an ongoing basis), be sure to contact them and weigh in your opinion!

Hi!

We're in the process of reviewing how we make our data available to the outside world with the goal of making it easier for anyone to innovate and answer interesting questions with the data. Since you're reading this, you use our current ftp solution to get data (or are thinking about it) and we'd love to get your feedback on the current process for accessing data and what we could do to make it easier for you to use in the future. We have some specific questions below, but would be just as happy hearing about how you access and use IMDb data to make a better
overall experience.

Please head over to https://getsatisfaction.com/imdb/topics/api_bulk_data_access

to let us know what you think.

Questions

1. What works/doesn't work for you with the current model?
2. Do you access entertainment data from other sources in addition to IMDb?
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
5. Does how you plan on using the data impact how you want to have it delivered?
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
7. Are our T&Cs easy for you to understand and follow?
Thanks for your time and feedback!

Regards,

IMDb.com

Be sure to report back with your findings and creative ideas for interpreting and reading through the data. I'm exited because millions of records pushes small database systems to levels where SQL optimizations are actually noticeable (well, hopefully more than with only hundreds!).

Article Summary:
How to use fun and interesting data sets from external sources in your database designs, models and personal software development projects. Also tips on how to observe data stewardship practices such as following the legal restrictions and limitations placed on the use of otherwise private, proprietary or protected data.

Saturday, August 9, 2014

Git Tips: Refreshing Source Code from the Origin (upstream source)

git-scm: quick tips for your reference

I sourced this one from the blog, Gitready: fetching upstream changes.

This tip comes from Dav Glass, whose work on YUI at GitHub requires him to bring in commits frequently. So, he’s created a helpful alias so he can merge in changes easily.

First off, he’s got a consistent naming scheme for his upstream remote:

git remote add upstream git://github.com/user/repo.git

Just throw this in your

.gitconfig

and you’re all set:

[alias]
pu = !"git fetch origin -v; git fetch upstream -v; git merge upstream/master"

Now

git pu

will grab all of the latest changes from both remotes, and then merge in the commits from upstream.

Monday, June 23, 2014

What's Up With OPEN DATA and Government?

Intro: Code for America has a chapter here in Hawaii. Although it is based on Oahu, I still manage to keep up with their activity from here on Maui.

Watching Over State Legislator and Employee Spending

Many of you database developers may wonder in an existential sense if there is anything out there beyond the systems we support each day. Yes, the government sector alone has a myriad of information that is of the public domain that is a vital link to citizen awareness of their own governance.

The self-description of the CfA Brigade in Hawaii introduces them as:

This group [sic. Hawaii CfA Brigade] is for all those interested in open data, data access, apps, big data, data visualization, APIs and anything related to civic engagement and collaborative technologies.

In the news: The personal allowance budgets of lawmakers (and their staff) who represent Hawaii are now available for public review. More details on the committee decisions that led to this is detailed in a Civil Beat article on lawmaker allowance spending reviews.

A look at how you can query and download some of these figures for your own review is explained below in the next section.

How to Export Data from the Legislator Allowance Tool

Legislative Allowance Tool @khon2

The site that provides this information is hosted at:

http://khon2.com/legislative-allowance-tool/

1. The first step is to select the initial query parameters to collect the larger set of information you are querying.

2. Look for the "Download" link next to the output report grid and select it to initiate the generation of a comma-separated-value file for downloading to your local computer.

3. The .csv text file downloaded from the tool can be ingested into any relational database or aggregation system for further research and analysis.

An example comma separated value (.csv) file format opened using spreadsheet software:

Sample Download Contents: Lawmaker Allowance Data Hawaii

What the Data Looks Like from the Web App Tool

Give it a try. Even if you aren't super savvy at SQL or database technology, even a spreadsheet formula or two transforms this raw data into a source of accountability as the Hawaii State Ethics committee probably also believes in as well. Check it out.

Monday, June 16, 2014

Parsing a Comma Delimited Input String with PL/SQL

A colleague recently introduced me to an Oracle 11g PL/SQL feature of syntax called "comma_to_table". It is a part of the DBMS_UTILITY built in package. She used it as a method to break apart a string input that had components delimited by a comma ",".

How Complicated is Delimited String Parsing?

The more commonly known function: LISTAGG converts a single column output of many rows into a string with a custom delimiter. There originally was no real alternative to do the reverse except through a procedural path in our code:

        for i in 1 .. l_count
            loop
            select regexp_substr(string_to_parse,'[^,]+',1,i)
              into l_value from dual;
                dbms_output.put_line(l_value);
            end loop; ...

[Sourced from an example on: Oradev.com]

This example loops through the input string "string_to_parse" and breaks the string into pieces based on a self-declared delimiter. A fancier approach is to use the CONNECT BY syntax (a hierarchical SQL syntax concept) and make an artificial cartesian product to force an iteration over multiple delimited elements:

[Ask Tom has an example] of this:

    select trim( substr (txt, instr (txt, ',', 1, level ) + 1,
           instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 ) )
    from dual
   connect by level <= length(txt) - length(replace(txt,',','')) -1)

this one is clever and elegant but it takes a while to interpret. The use of "connect by...level" is not readily obvious and looks like something is missing from the clause when I look at it.

Hidden Constraints: The Trouble with Misapplied PL/SQL Syntax

The summary of the comma_to_table function is that it was created with a specific purpose in mind and does not have a fully generic application for all cases. With a single command, this function allows coders to convert a comma-separated series of string values such as:

     '1,2,3,four,fifteen,orange,shack'

to a row-wise output:

        1
        2
        3
        four
        fifteen
        orange
        shack

Ask-Tom also warns that the original intent of this function derives from the expression: NAME TOKENIZE. [1] This function was made to assist with internal database ETL processes where the task was to parse valid object names (such as tables) from comma delimited string inputs. This limits the types of values that can be contained within the string:

Each delimited element must be no longer than about 30 characters.
The string values must represent valid database object names.
Objects are defined as:
"[schema].[object_name].[procedure|function]@[database link]"
Certain reserved words that do not make valid object names will cause an error when applied.

In general, the comma-to-table function (Oracle APEX also has a similar command within its built-in package library) has enough limitations to make it a clear recommendation to avoid. General parsing capability is not available because of its exceptions and the output can defy a coder's original intent.

Source Citations:

DBMS_UTILITY.comma_to_table Limitations from "Ask Tom" at: https://asktom.oracle.com/pls/asktom ...

Wednesday, June 4, 2014

A Reminder: UTL_SMTP or UTL_MAIL for Sending Mail?

I thought this was worth mentioning because it is often confused by old-school Oracle developers who have survived more than one or two major releases of this RDBMS.

From the docs at Oracle [1]:

The UTL_SMTP package is designed for sending electronic mails (emails) over Simple Mail Transfer Protocol (SMTP) as specified by RFC821.

The folks at PSOUG[2] have also noted that historically, this built-in package has been around since Oracle 8i:

UTL_SMTP Package Detail: sourced from PSOUG.org

UTL_SMTP Definition[3]

The UTL_MAIL package was similarly created to replace UTL_SMTP. Both work slightly differently from one another. Oracle announced the eventual replacement of the newer mail utility around Release of the Oracle 10g RDBMS product release. It's definition:

UTL_MAIL Package Detail: sourced from PSOUG.org

UTL_MAIL Definition [4]

I have noticed and used UTL_MAIL since at least the 10g RDBMS Release. Anyways, it would be wise to stick with the new replacement to add longevity to your PL/SQL procedural code which uses email features supplied by the database system.

Oracle PL/SQL Documentation: UTL_SMTP
http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/u_smtp.htm#i1002798
(PSOUG) Puget Sound Oracle User Group
UTL_SMTP History
http://psoug.org/reference/utl_smtp.html
UTL_MAIL History
http://psoug.org/reference/utl_mail.html