Friday, 7 August 2009

How to do search programmatically

Here's a common problem - you have deep data - you have invoices in multiple different database tables; you have customer reference numbers somewhere else; you have customer details in a further table; you have order numbers; you have products you sell; etc.

How do you provide a single simple search interface that finds exactly what the user wants from a single 'search box'?

Well, it's actually pretty simple to start with. It does get complicated later on, but don't worry, all problems are surmountable! Let's describe what we need in terms of sql...

create table keywords (
keyword varchar(20),
refernce varchar(20),
section varchar(1),
significance integer
);

create unique index i_keywords_1 on keywords (keyword, refernce, section);

---
From this basic keywords table (and yes, add your own unique references as necessary), you can construct highly efficient queries. For example, searching for 'Fleetwood'...

select K1.refernce, K1.section, K1.significance
from keywords K1
where K1.keyword = 'Fleetwood'
order by K1.significance desc
;

This is efficient as an sql query - the unique index leads with the keyword column, so the database can go straight to the rows for that keyword. If you entered two words, eg. 'Fleetwood' 'Mac'...

-- AND query...
select K1.refernce, K1.section, (K1.significance + K2.significance) sig
from keywords K1, keywords K2
where K1.refernce = K2.refernce
and K1.section = K2.section
and K1.keyword = 'Fleetwood'
and K2.keyword = 'Mac'
order by sig desc
;

-- OR query...
select K1.refernce, K1.section, K1.significance
from keywords K1
where K1.keyword = 'Fleetwood'
union
select K2.refernce, K2.section, K2.significance
from keywords K2
where K2.keyword = 'Mac'
order by 3 desc
;

--
Again, extremely efficient as an sql query. Any half-decent database server will return the result in milliseconds, and the sql is easy to construct.
--
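To show how little work "easy to construct" involves, here is a minimal sketch that generates the AND query for any number of search words. The class and method names are made up for illustration, the column names follow the keywords table as written above, and it assumes you want the most significant matches first, as in the single-word query. It emits '?' placeholders rather than pasting the words into the sql, so the user's input can be bound with a PreparedStatement.

```java
// Hypothetical helper, not part of the original post: builds the AND query
// against the keywords table for a given number of search words.
final class KeywordQueryBuilder {

    // Returns sql with one '?' placeholder per search word, in word order.
    static String buildAndQuery(int wordCount) {
        if (wordCount < 1) {
            throw new IllegalArgumentException("need at least one search word");
        }
        StringBuilder select = new StringBuilder("select K1.refernce, K1.section, (K1.significance");
        StringBuilder from = new StringBuilder(" from keywords K1");
        StringBuilder where = new StringBuilder(" where K1.keyword = ?");
        // Every extra word adds one self-join on (refernce, section).
        for (int i = 2; i <= wordCount; i++) {
            String k = "K" + i;
            select.append(" + ").append(k).append(".significance");
            from.append(", keywords ").append(k);
            where.append(" and ").append(k).append(".refernce = K1.refernce")
                 .append(" and ").append(k).append(".section = K1.section")
                 .append(" and ").append(k).append(".keyword = ?");
        }
        select.append(") sig");
        // Assumption: highest combined significance first.
        return select.append(from).append(where).append(" order by sig desc").toString();
    }
}
```

Calling KeywordQueryBuilder.buildAndQuery(2) gives the same shape as the hand-written 'Fleetwood' 'Mac' query, with the two words left to be bound as parameters.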

So, now comes the slightly harder part - populating the data.

You will need a "spider" to find the information and put it in the keyword index. This is a pretty intensive database process, requires some art to write, and can have problems when you want a truly up-to-date index.
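As a rough idea of the simplest piece of such a spider, here is a sketch of the step that turns one piece of text (a customer name, a product description) into keyword rows. Everything in it is an assumption rather than a finished design: the class name is invented, keywords are lowercased (so search words would need lowercasing too), words are cut to fit varchar(20), and significance is simply the number of times the word occurs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: splits text into keywords and counts occurrences.
final class KeywordExtractor {

    // Returns keyword -> occurrence count, in order of first appearance.
    static Map<String, Integer> keywordCounts(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumption: anything that is not a letter or digit separates words.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first element when the text starts with punctuation
            }
            // The keyword column is varchar(20), so longer words are truncated.
            String keyword = word.length() > 20 ? word.substring(0, 20) : word;
            counts.merge(keyword, 1, Integer::sum);
        }
        return counts;
    }
}
```

Each entry would then become one insert into keywords, with the reference and section supplied by whichever table the text came from.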

More tomorrow.

Friday, 20 March 2009

Wicket

Wicket is a Java framework for building web applications which we're using at work, and I'm continually impressed with how easy things are to do with it. I've worked on other Java projects that were just horrible though, so maybe my experience with Wicket is one that everyone else has with other frameworks - we've been down the Struts approach, which means holding data in eight different classes and properties files. We've been down the XML route, which made the problem even worse, and we've tried out JSF and found it's really not a mature technology and you end up relying on code in obscure tag libraries.

Wicket has a HTML page with special id attributes for every element that you want to be controllable from the Java side - for example - <td wicket:id="description"> , but no JSP references or any of the crap that framework designers like to stuff in the html page making it unusable for anybody else. The HTML renders as html on its own - it's actually possible, in a real situation, to get someone to design the HTML and just plonk it into your project. I've not known this to be the case for any other framework.

For every HTML page that you want to be able to change via code, you have a Java class with the same name that sits directly alongside it. This differs from other frameworks which have the controller somewhere, the JSP page somewhere else, the tag libraries elsewhere, the form object in another area, the form validation somewhere else and the XML controlling page flow in another place. Having the page and the code next to each other makes good sense, and it's hard to see why other frameworks have deviated from this.

Also, no XML, properties or other external files. It's a single change to web.xml to get the whole caboodle to work. So effectively, when creating a new page/application, I'm dealing with two files - the html, and the Java code behind that page. That's it. Well, there's still stuff to be done in the database classes to get the page to be able to fetch the data properly, but that's common to every project and I wouldn't have it any other way - the view (in this case, the Wicket page) should always be separate from the data layer.

There's no real downside that I can see at this stage. The components are extensible and reasonably easy to use. The code in the java class is readable and easy to understand. The one, possible, slight niggle that I have with it is that it encourages the use of inner classes a lot. But that's not necessarily a bad thing, in fact, I'd rather have the code that executes when I click a button right by where I'm adding it to the form - the code would look something like...

add(new FeedbackPanel("feedback")); // any "error" or "info" stuff automatically appears here
Form form = new Form("monday");
Button b = new Button("submitButton") {
    @Override
    public void onSubmit() {
        // Validation checking here
        if (itsMondayMorning)
            error("Go away I'm still hungover");
        else
            db.updateSomething();
    }
};
form.add(b);
add(form);

---
Using inner classes happens all the time when doing Wicket - it does make things a bit more readable and makes the code flow nicely.
I'm told by someone who's worked with Swing components that the logic is similar to that, despite being in a stateless HTML environment. I don't really know, having never worked with it, but it seems a doddle to get/set objects on forms.
And even when things get complicated, with fields being made invisible, buttons appearing/disappearing depending on user privileges, validation being complex because there's a ton of fields or multiple combinations not being valid, it's still easy. It doesn't suffer from the old Visual Basic problem where to get something that looked good took 5 minutes but to get something that actually worked took a lifetime and a lot of Windows API calls.

Upsides: It's easy. It's easy to do difficult stuff and still have readable code. It's easy to bolt your database layer to. Ajax is remarkably easy to do with it. It's easily extensible (too much so?). You can write a full scale working application with just the examples from the Wicket book and your code won't be any more complicated than the simple examples. The forum people are wonderful at answering questions.

Possible downsides: I think it's quite heavy on the Session object, but we've had no problems with memory usage or speed so far with a ton of users hammering it at the same time (the bottleneck is the database). Um, oh, I don't like the way it handles custom images, but there are several better ways of doing that than the default. Errrrm, and I don't like the way the URLs look by default (but again, this can be changed).

---
So, if you're struggling with your current Java framework, go and try out Wicket. Even if it's not for you, it's probably worth a look.

Wednesday, 11 March 2009

In my job, I have to use several programming languages. Normally several a day. I've looked at ASP, Java and Informix 4GL so far today, and it's not even lunchtime!

I'm not a programming snob - I do like some of the things "bad" languages do and detest some of the things "good" languages do.

For example - Visual Basic is generally regarded as a Bad Thing, but the fact that 'true = -1' is a brilliant coup. I'm sure C/Java programmers will read that with horror, but the fact is, if true = -1 (every bit set) and false = 0, then logical and bitwise operations are the same. Effectively, there's no need for having | and || to represent bitwise and logical or operators because they are one and the same. I'm not saying that Visual Basic is good in general (it's not), but this is simply a better way of doing it.
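If you want to convince yourself, here is a small sketch in Java that uses plain ints to stand in for Visual Basic's booleans (the class and method names are just for illustration). It checks every combination of and, or and not, and also shows where a 'true = 1' scheme would fall down: a bitwise not of 1 is -2, which still counts as true. Note that the equivalence only holds when the operands are exactly -1 or 0.

```java
// Illustration only: ints standing in for Visual Basic's True (-1) and False (0).
final class TrueIsMinusOne {
    static final int TRUE = -1; // two's complement: all 32 bits set
    static final int FALSE = 0; // no bits set

    static int toVb(boolean b) {
        return b ? TRUE : FALSE;
    }

    // Compares bitwise &, | and ~ on -1/0 with logical &&, || and ! on booleans.
    static boolean bitwiseMatchesLogical() {
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                if ((toVb(a) & toVb(b)) != toVb(a && b)) return false;
                if ((toVb(a) | toVb(b)) != toVb(a || b)) return false;
            }
            if (~toVb(a) != toVb(!a)) return false;
        }
        return true;
    }

    // With true = 1, bitwise not gives a non-zero value instead of false.
    static int bitwiseNotOfOne() {
        return ~1;
    }
}
```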

Java's generally regarded as a Good Thing, but its error handling is one of its worst features. In theory, it provides wonderful error handling. In practice, everyone who writes library modules throws runtime (unchecked) exceptions instead of checked exceptions that the caller is required to catch. This is wrong and leads to buggy code and vast error stacks that spit out Pythonesque gibberish that bears little relevance to the actual problem. I think designers don't throw normal (checked) exceptions because it breaks the "black box" model. But the black box model is decidedly broken in Java anyway, with objects merging, passing black box objects into other black boxes, and other problems.
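For anyone who hasn't run into the difference, here is a minimal sketch (the class and method names are invented for illustration). The first method declares a checked exception, so the compiler makes every caller deal with it. The second throws a runtime exception, so nothing in the signature warns the caller and the failure only shows up as a stack trace when it happens.

```java
import java.io.IOException;

// Illustration only: the same failure reported in the two styles.
final class ExceptionStyles {

    // Checked: callers must catch IOException or declare it themselves.
    static String loadChecked(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("no file name given");
        }
        return "contents of " + name;
    }

    // Unchecked: compiles without any try/catch at the call site.
    static String loadUnchecked(String name) {
        if (name.isEmpty()) {
            throw new IllegalStateException("no file name given");
        }
        return "contents of " + name;
    }

    // The compiler forces this caller to decide what a failure means.
    static String describe(String name) {
        try {
            return loadChecked(name);
        } catch (IOException e) {
            return "handled: " + e.getMessage();
        }
    }
}
```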

Thursday, 5 March 2009

I love the BBC, well, more specifically, I love BBC Radio 4. I've just listened to the latest In Our Time episode which covered the "measurement problem in physics" (essentially, the problem of reconciling quantum weirdness with the understandable Newtonian mechanics we observe at the macro level). The programme was deep, complex, thought-provoking, and surprisingly comprehensible. I do have an A-level physics qualification, but we barely covered quantum malarkey. Melvyn Bragg presented it and I was surprised how well he refereed the eminent guests (eg. Roger Penrose) despite his (probable) lack of knowledge on the subject.

That Auntie can even attempt to broadcast this in primetime to millions of listeners (Radio 4 is one of the largest radio stations in the UK) is a superb example of why you need independent broadcasters that don't depend on advertising revenues. I'm trying to think of an example of where you'd get such rich deep content without it, but I can't.

Radio 4 is pretty much unique. It's a news/spoken word radio station, and it's at or close to being the most listened-to radio station in the UK. It doesn't do sport (unlike most other talk radio) except, of course, for the incomparable Test Match Special. It presents radio plays (eg. Douglas Adams' The Hitchhiker's Guide), comedy that is either brilliant on radio permanently (I'm Sorry I Haven't a Clue, The Now Show, etc) or can be transposed to TV too (eg. Goodness Gracious Me, The Frost Report which spawned Monty Python, etc), or documentaries and debate shows like the one I'm praising today. Worth the licence fee on its own!

Seduced, shaggy Samson snored.
She scissored short. Sorely shorn,
Soon shackled slave, Samson sighed.
Silently scheming,
Sightlessly seeking
Some savage, spectacular suicide.

Saturday, 14 February 2009

Sim Secession

Secession

Just an idea for a game I'm musing about. Basically, it'd be a kind of SimCity/Civilization type game, but with an interface like Football Manager. Instead of "building an empire" your goal would be to secede from one. In a US scenario, you'd have "easy" and "hard" states to pick from (eg. easy = California; hard = Nevada). Foster dissatisfaction with your national government! Encourage local state rivalries! Leverage governors into voting for things that make secession possible! Finally, build up local militias, grab the local military bases, ensure you have the nuke launch codes and hold out for your own country!! In multi-player, everyone plays a state and is trying to achieve the same thing - the winner is the one to get there first. Note: I'm a Brit, so don't flame me with secessionist US politics ideas - I'm not really bothered, I just think it would make for an interesting game. And it doesn't have to be US either - Soviet-era Russia would do just fine too; Roman-era Europe works just as well (though you'd have to do a decent amount of historical research).

Friday, 13 February 2009

Why we need a new programming language

We need a new programming language to better handle databases. Or perhaps, a new paradigm.

Structural Programming is crap for working with databases.
Object Orientated Programming is equally crap for working with databases.

With your Perl-type languages (mostly procedural), database handling tends to be: here's the SQL, fire it at the database, do something with the result. This is messy because it leaves SQL trailing all over the place, possibly referencing the same database tables in lots of different files.

With your object orientated languages (say, Java), database handling is hidden to the point that it's difficult to optimise, the SQLs (or whatever the hidden equivalent is - say Hibernate criteria) are still scattered around, and again, difficult to change.

What I want is a SET-orientated language. Databases are based on set theory. Give me a language that's based on that. Not one that simply manages sets - the whole language should be based on the fact that you use and abuse sets of data constantly with SQL. Sure, build in database independence - force me to write ANSI-SQLs in this language, that would ensure far better compatibility than Hibernate or its ilk.

Let's say I have a table called 'customers'. Now, let's say, someone wants a new piece of data. I want to quickly find all the references to the customers table, and check if anything needs changing.

With structural programming, I basically have to grep for all occurrences and then read the code involved.
OO is, in theory, a little better, assuming everything has been done correctly - just search for references to the customers object. Except that you don't just have to search for the customers object, you have to search for everything that extends it. And in most complex projects, you'll probably find someone has mangled a bit of SQL and just like structural, you'd end up grepping for something.

In a set based programming language, you'd have customers and everything else springs off of that set. So much simpler to find everything.

SQL and table structures are probably the most important part of your application. Consider: If someone took your current project away and started rewriting it in another language, what bits would they take? The clever multi-layered model/view/controller architecture that someone spent hours working on? The error handling? Your table-widget with a billion possible combinations? Or the data and ways of accessing that data? If the answer's not obvious, you need to get out of any programming that involves databases.