NoCOUG SQL Challenge #2 – My Submission

By | February 22, 2011

Here is my (slightly modified) submission to the NoCOUG SQL Challenge. Read below for my description of what the query is doing.

with
words as
(
    select
        word1, word2, word3,
        sys_connect_by_path(case when word2 = prior word1 then '1' when word2 = prior word3 then '3' when prior word2 is null then '0' end, '.') ordered_path
    from riddle
    where word1 is not null or word3 is not null
    start with word2 = (select word2 from riddle minus (select word1 from riddle union all select word3 from riddle))
    connect by prior word1 = word2 or prior word3 = word2
),
ordered_phrases as
(
    select
        rpad(l.ordered_path, max(length(l.ordered_path)) over (),'.2') phrase_order,
        case when left_child.ordered_path is null then l.word1||' ' end
            ||l.word2
            ||case when right_child.ordered_path is null then ' '||l.word3 end phrase
    from
        words l
        left outer join words left_child
            on left_child.ordered_path = l.ordered_path||'.1'
        left outer join words right_child
            on right_child.ordered_path = l.ordered_path||'.3'
)
select
    trim(replace(sys_connect_by_path(phrase, ','),',',' ')) hidden_phrase
from
(
    select
        row_number() over (order by phrase_order) rn,
        count(*) over () cnt,
        phrase
    from ordered_phrases op
)
where rn = cnt
start with rn = 1
connect by rn = prior rn + 1;

First, you need to understand how the data in each row relates to each other. The general idea is that the words form a tree. WORD2 in all but one row (the root node) maps to exactly one of the other rows WORD1 or WORD3, making that row its parent. For example:

		SOME			MATH		FIVE 
		 |			  	 	  |
	I	SOME	BEEN       		ELBOW	FIVE	DONE 
	|						  	 |
IN	I	THOSE         				THREE	DONE	[null] 
		  |				  	  |
	OF	THOSE	SHOULD           	FOUR	THREE	TWO 
... more ...

Since it is a tree structure, I decided to use the CONNECT BY clause, and as I traverse the tree, keep track of where each node fits in the tree. I did this by assigning “.0” to the root node, and then appending a “.1” if the node links to its parents WORD1 and “.3” if the node links to its parents WORD3. In tree form, this would look something like:

		SOME			MATH		FIVE (.0)
		 |			  	 	  |
	I	SOME	BEEN (.0.1)		ELBOW	FIVE	DONE (.0.3)
	|						  	 |
IN	I	THOSE (.0.1.1)				THREE	DONE	[null] (.0.3.3)
		  |				  	  |
	OF	THOSE	SHOULD (.0.1.1.3)	FOUR	THREE	TWO (.0.3.3.1)
... more ...

So in my query, this is the “WORDS” with block, which gives me a result of:

WORD1	WORD2	WORD3	ORDERED_PATH
-----   -----   -----   ------------
SOME	MATH	FIVE	.0
ELBOW	FIVE	DONE	.0.3
THREE	DONE		.0.3.3
FOUR	THREE	TWO	.0.3.3.1
I	SOME	BEEN	.0.1
IN	I	THOSE	.0.1.1
OF	THOSE	SHOULD	.0.1.1.3
... more ...

Now I need to start putting the nodes in the correct order to put the full message together properly. Logically, this is done by starting with the root node and replacing WORD1 and WORD3 with their respective children recursively (or not replacing if that word doesn’t have a child). In my query, I achieved the same idea by joining the query to itself twice to determine if WORD1 and WORD3 had a child. In my tree a child can be found by looking at the current nodes ORDERED_PATH and adding either a “.1” (WORD1’s child) or “.3” (WORD3’s child). If WORD1 has a child, I don’t include it in the result as it will be included from the child’s record. Likewise, if WORD3 has a child, I don’t include it in its output. I also want a parent node to appear in between its WORD1 child and WORD3 child, and this is achieved by rpadding the ORDERED_PATH with a “.2”. Once this is done, I can alpha sort on the new order and my message is written out for me in the correct order, but as separate rows. This is the “ORDERED_PHRASES” with block:

PHRASE_ORDER	PHRASE
------------	-----------------------
.0.1.1.1.1.1	TRYING TO TYPE
.0.1.1.1.1.2	ONE
.0.1.1.1.1.3	HUNDRED DISTINCT WORDS
.0.1.1.1.2.2	IN
.0.1.1.1.3.1	A SINGLE PARAGRAPH
.0.1.1.1.3.2	IS
.0.1.1.1.3.3	REALLY TOUGH IF
... more ...

So the final step is to combine all of the rows back into a single row that has the words in one wide column. To do this, again I use the CONNECT BY clause. The first step is to assign a value that corresponds to the order of the phrase_order column and a value that is the total number of rows in the query. I use my ordered column to connect each row to the previous, and the total number or rows column to just filter down to the last row (whose parents include all other rows). I use sys_connect_by_path to show the phrase of the current row and all of its parents beforehand. Since some of the phrases already have spaces in them, I have to use a different character in sys_connect_by_path, so I chose a comma. I then replaced the commas with spaces and as a final step trimmed any extra space (there would have been a leading space as sys_connect_by_path adds the delimiter before every line). This gave me my final output:

TRYING TO TYPE ONE HUNDRED DISTINCT WORDS IN A SINGLE PARAGRAPH IS REALLY TOUGH IF I CANNOT REPEAT ANY OF THEM THEN PROBABLY THOSE WITH MANY LETTERS SHOULD BE USED MAYBE SOME READERS WILL UTILIZE DICTIONARIES THESAURUSES THESAURI OR POSSIBLY EVEN ENCYCLOPEDIAS BUT MY PREFERENCE HAS ALWAYS BEEN THAT GRAY MATTER BETWEEN YOUR EARS SERIOUSLY MARILYN CHALLENGES SUCH AS THIS REQUIRE SKILLS BEYOND MATH SCIENCE AND PHYSICS SO WHAT DO YOU ASK READING COMPREHENSION WRITING ABILITY GOOD OLD FASHIONED ELBOW GREASE SCIENTISTS DONT CARE ABOUT STRUCTURE THEY WANT RESULTS HEY LOOK ONLY ELEVEN MORE LEFT FIVE FOUR THREE TWO DONE

Note: In my original version, I used an ORDER BY in the “ORDERED_PHRASES” with block and then just used the rownum pseudocolumn as my rn column. However, I was thinking it could be possible for Oracle to materialize the results of the with blocks and possibly not pull them back in the correct order, so I changed this to the row_number() aggregate function. I also originally didn’t have the extra logic of assigning a “.0” to the root node (I used a lazy way of saying if this node is not a child of a WORD1 assign it a “.3”). Since there is only one root node, this doesn’t effect the outcome of the query at all, but having the root start with “.0” instead of “.3” makes more sense when showing it in the examples.

Again, I look forward to seeing the other entries as I am sure there are many other ways to do this that are probably much superior. And I do need to get my 11gR2 instance up and running again so I can start playing around with recursive queries!

Leave a Reply

Your email address will not be published. Required fields are marked *

Turn on pictures to see the captcha *