Friday, January 19, 2007

sql threaded comment view result set

Re: Can SQL return a threaded-comment-view result set?



Hello all,

I've been meaning to get back to you all but I just haven't had time.
Okay, I've got a little bit of time now so here goes....

I received many useful answers from many of you including Tom Lane,
Joe Conway, and Josh Berkus. Max Nachlinger in particular on October
5th (which was my birthday) sent me a large amount of threaded
discussion forum code of his own. (Nice birthday present, Max. Thank
you.) I will be investigating his solution when I have more time since
his is almost certainly more efficient than my own.

My own solution is a 20-line PL/pgSQL function I put together after
reading the 7.3 docs at postgresql.org. It requires no modifications
to my original example table definitions other than that I decided to
use a 0 value instead of a NULL value for the in_reply_to column when
a message isn't a reply, because that way my plpgsql function doesn't
have to treat NULL as a special case.

In particular, my solution doesn't require a message to keep pointers
to its children. If a message is a reply, it simply points to its
parent's id via in_reply_to. You can add as many messages as you want
with just single simple INSERT statements; you don't have to do any
tree-refactoring or updating to the parent. The downside is that while
insert speed couldn't be any better and inserting couldn't be any
easier, building the threaded view seems rather algorithmically
inefficient, and in almost all applications optimising for obtaining
the threaded view rather than insert speed is more important. One
probably couldn't base even a moderate-load application on this
solution, but if one wanted to anyways I suppose an in-memory tree
representation could be maintained which allows new messages to be
linked into the in-memory tree efficiently as they're inserted into
the database, and then whenever the application is shutdown and
reloaded it could rebuild that in-memory representation on startup. Or
something. And until you run out of memory.... (Also, simply caching
the results of queries could be effective if you have many identical
queries producing identical results [which my application does] so
this solution might not work too bad for me.)

For the sake of googlers and like novices reading this, I've adapted
my PL/pgSQL function so that it works with the original example I
posted. (My real code uses more fields, different types, and has some
other subtle differences because there's more than one type of table
to consider and there are foreign key constraints.) After loading the
below code, evaluating

select * from threadview(0, 0);

builds a table like the one I wanted in my original posting.

---Chris

=================================================
-- This code originally due to Chris Barry,
http://www.goodfig.org/feedback
-- It's hereby placed in the public domain. These public domain
licenses
-- usually have some sort of warning about no guarantee of fitness for
a particular
-- purpose, etc. Well, the below code is DEFINITELY not fit for any
purpose! So,
-- use it at your own peril. Caveat emptor.

-- drop database discussion;
create database discussion;
\c discussion

-- The path to plpgsql.so may need to be edited for your system.
create function plpgsql_call_handler()
returns opaque as '/usr/local/pgsql/lib/plpgsql.so' language
'c';

create language 'plpgsql' handler plpgsql_call_handler
lancompiler 'PL/pgSQL';

create table messages (
message_id integer,
in_reply_to integer,
created date,
author varchar(20),
title varchar(30),
message varchar(256),
primary key (message_id)
);

-- A threadrow is the same thing as a row from the messages table
-- except a nesting integer has been added so the client knows how
-- much to indent the thread message. I'm not sure if there's a
-- syntax that makes it unnecessary to duplicate the redundant
-- information from the messages table (e.g inheritance).
create type threadrow as (
message_id integer,
in_reply_to integer,
created date,
author varchar(20),
title varchar(30),
message varchar(256),
nesting integer
);


create or replace function threadview(int, int) returns setof
threadrow as '
declare
p alias for $1; -- p is the parent
i alias for $2; -- i is the indentation (nesting)
c threadrow%rowtype;
c2 threadrow%rowtype;
begin
for c in select *, 0 as nesting from messages
where in_reply_to = p
order by created asc
loop
c.nesting = i;
return next c;
for c2 in select * from threadview(c.message_id, i+1) loop
return next c2;
end loop;
end loop;
return;
end;
' language 'plpgsql';


-- Load the table with some example data:

insert into messages values
(1, 0, '2003-09-01', 'John', 'Favorite DB?',
'What is your favorite database?');
insert into messages values
(2, 0, '2003-09-02', 'Mike', 'New DB2 benchmarks',
'I just posted some new DB2 benchmarks.');
insert into messages values
(3, 1, '2003-09-03', 'Mike', 'Re: Favorite DB?',
'I\'d say DB2.');
insert into messages values
(4, 1, '2003-09-05', 'Dave', 'Re: Favorite DB?',
'I\'m an Oracle man myself.');
insert into messages values
(5, 3, '2003-09-07', 'John', 'Re: Favorite DB?',
'DB2? I thought you liked free databases?');

No comments: