11th November 2002
Needs to produce a target that searches other targets. (Yet another SuNFiSH-alike!) This came about because there was a bit of money sloshing around, and someone needed to come up with an idea for something to spend it on. It's called CC-Interop (COPAC/Clumps Interop). It's related to the M25 project, which has many Z-targets and a web front-end that searches across them. They'd like to put a Z interface on top of that. No-one knows what the existing web system is written in(!), so the fanout Z-target would be a from-scratch project.
Interested in ZOOM, wants to sort out C++ binding.
No interest in or respect for SRW :-)
[Everything that was in the prototype agenda :-)]
ZOOM marketing, ZeeRex, CQL and zSQLgate.
Wants to get specifications more in the form of interfaces - abstracting specified functionality away from being Z39.50-specific, so that we can build, for example, SRW back-ends.
Some of the C binding extensions to be merged back into the AAPI. Alternatively, some sanctioned way to extend the official ZOOM interface. The biggest hole is in the area of asynchronous operations: the C binding's solution is not great, in that it doesn't let you wait on both clients and servers as the YAZ++ proxy does.
I propose that the acid test of an IR interface is the ability to build an asynchronous proxy using it. Adam points out that there are lots of problems in the details: different event loops, thread models, etc.
Adam doesn't feel that most of the SOAP tools are powerful and general enough to address multiple servers: they assume you will use a separate thread for each server. Maybe we should just accept that and use threads in these enlightened days? Adam is reluctant, and points out that threads don't work well in the Microsoft world. In the COM world, the COM system starts the threads for you: if you start your own thread, there's no way to close your COM component down properly!
Sebastian has a different perspective from Adam's, because much of his work involves deploying, while Adam mostly builds tools.
One of ID's most ambitious projects is DEF, a union catalogue of about 100 Danish Library Z-servers, all more or less conforming to the Bath profile. Now that Sebastian's been actually using this stuff, he's aware of how slow it can be: searches are done in parallel, but you don't see any results until the last server responds. Sebastian's thinking about building cleverer clients in Java or Flash to display dynamic results.
Sebastian fears the impact of SRW, sending us ``back to the early nineties'' in terms of wrestling with toolkits, choosing profiles, etc. - while still having to deal with the difficult problems which are mostly semantics. The cynical perspective on this is that the confusion will produce a lot more consultancy work :-)
So this is where we are: everyone's done web gateways that search a hundred or so servers. That's not enough any more. It has to be useful, not merely interesting. That means it needs to be reliable, predictable, etc. If we try to sell SRW without these attributes, it will not get very far.
Finally, Sebastian is interested in SRW as an arena in which to reach out into broader worlds of structured information retrieval, breaking out of the library domain.
Fingers in many pies. Getting involved again in JISC projects, where Ian is not hearing much interest in SRW. People are still complaining that searching ten Z-targets is slow.
The problem with dynamic results updates comes when the fastest server returns the least relevant record, and you don't want to display it at the top. When you sort on relevance, you run into problems with different servers' relevance scores needing to be interpreted differently, not to mention deduplication.
Rachel Bruce at JISC is in charge of the Common Services Framework (which Rob's people are going to implement). In connection with this, Ian would like the distinction between collections and the servers that hold them to be made explicit, so you can say that ``this collection is hosted on those three Z-servers and this SRW server.''
Ian's colleague Rob is still pursuing local government work. In this context, SRW has one enormous (if pathetic) advantage over Z39.50 - namely, that it runs on port 80 which is open in firewalls, and people in the commercial world will flatly refuse to open port 210. [For heaven's sake! - Ed.]
In this arena too, Ian is keen on separating the ideas of collection and server, so that (for example) you can push your intranet's copy of a collection out onto a public server when you're happy with it.
These seem to be the issues arising from what people are doing:
We seem to agree that the right solution to this is to have an unconnected-connection constructor. Then you can set your options (authentication, etc.) and call conn->connect().
We must document the standard options in the ZOOM AAPI. Setting non-standard options should return an ``unknown options'' error indicator. So we need to separate the Get Option and Set Option methods.
We all agree that we need to write specifications for asynchronous operations in the AAPI. The choice is between two basic models: events and callbacks.
Consensus seems to be that the former is more flexible - you can easily build callbacks out of events, but not vice versa.
Adam will draft some prose for the AAPI.
Adam wants guidelines in the ZOOM AAPI for specifying extensions in a way that ``doesn't make me cross''. We can't think of what such guidelines would look like - not for C, anyway: in more OO languages, we could say something like ``extensions should be implemented in subclasses'', but that makes no sense in C.
The upshot seems to be just that Adam should more clearly document which parts of the ZOOM-C API are standard ZOOM and which are extensions. Also, some of the extensions - notably not-yet-connected connections - need to be factored back into the AAPI.
Adam suggests we lose getRecord() in favour of a record constructor:
    class record {
    public:
        record(resultSet &rs, size_t i);
    };

    resultSet rs;
    record r = record(rs, 0);
    record *rp = new record(rs, 0);
    delete rp;
We think we don't need clone() any more except as a performance optimisation; and good implementations will achieve this anyway, by reference-counting or similar measures.
[We discussed this between 4.3 and 4.4]
In the old question of whether SRW should return result-set records as XML fragments or strings, Rob suggests that we could use a well-defined Dublin Core schema, and so return DC records as XML fragments; while general records must be encoded as strings because their structure is not known in advance.
The qualifier-set name in CQL qualifiers must be significant in itself, and not require looking up in a ZeeRex record to find a qualifier-set URL. The way things are at the moment:
The same argument applies to record-schema names.
These are hard problems - we can't think of The Correct Solution. The best we can do is set up an authoritative global registry of qualifier-set names; but that may not work for record-schema names, since we expect to have many more of these.
Adam's suggestions:
"srw.prefix.dc=http:/purl.org/dublincore/qualset" dc.title=computer prefix dc="http:/purl.org/dublincore/qualset" dc.title=computer >dc="http:/purl.org/dublincore/qualset" dc.title=computer
The latter of these introduces new syntax: a search clause beginning with >, which is followed by a qualifier-set name, an equals sign, a qualifier-set identifier and a sub-query. Mmmm ... Nice!
The qualifier-set name and equals sign are optional: if they are omitted, the >-clause specifies the default qualifier set that pertains to unqualified terms in the governed sub-query.
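To restate the two forms described above, using the same illustrative qualifier-set URL from the earlier examples:

```
> dc = "http://purl.org/dublincore/qualset" dc.title=computer
> "http://purl.org/dublincore/qualset" title=computer
```

The first clause binds the name dc to the qualifier-set identifier for use in the sub-query; the second omits the name and equals sign, so the identified set becomes the default for the unqualified term in the governed sub-query.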
We think we're there with the DTD. Rob now needs to update the commentary, and I have some changes to make to the web site.
We'll discuss the ``fluffy'' ones in the pub.