Friday 7 August 2009

How to do search programmatically

Here's a common problem - you have deep data - you have invoices in multiple different database tables ; you have customer reference numbers somewhere else ; you have customer details in a further table ; you have order numbers; you have products you sell; etc.

How do you provide a single simple search interface that finds exactly what the user wants from a single 'search box'?

Well, it's actually pretty simple. But it does get complicated later on, but don't worry, all problems are surmountable! Let's describe what we need in terms of sql...

create table keywords (
keyword varchar(20),
refernce varchar(20),
section varchar(1),
signifcance integer
);

create unique index i_keywords_1 on keywords (keyword, refernce, section);

---
From this basic keywords table (and yes, add your own unique references as necessary), you can construct highly efficient queries. For example, searching for 'Fleetwood'...

select K1.refernce, K1.section, K1.significance
from keywords K1
where K1.keyword = 'Fleetwood'
order by K1.significance desc
;

this is efficient as an sql query - searches down keywords first. If you entered two words, eg. 'Fleetwood' 'Mac'...

-- AND query...
select K1.refernce, K1.section, (K1.significance + K2.significance) sig
from keywords K1, keywords K2
where K1.refernce = K2.refernce
and K1.section = K2.section
and K1.keyword = 'Fleetwood'
and K2.keywood = 'Mac'
order by sig

-- OR query...
select K1.refernce, K1.section, K1.significance
from keywords K1
where K1.keyword = 'Fleetwood'
union
select K2.refernce, K2.section, K2.significance
from keywords K2
where K2.keyword = 'Mac'
order by 3

--
Again, extremely efficient as an sql query. Any half-decent database server will return the result in miliseconds, and the sql is easy to construct.
--

So, now comes the slightly harder part - populating the data.

You will need a "spider" to find the information and put it in the keyword index. This is a pretty intensive database process, requires some art to write, and can have problems when you want a truly up-to-date index.

More tomorrow.