User:CHGiffen/Sandbox

From ChoralWiki

Library structures for CPDL

One of the things which has struck me more and more over the past three years working with ChoralWiki is that, while the site functions very much like a wiki, structures which reflect or resemble a library and repository of scores are not very well implemented. What I hope to do here is to help us all take a look at the current ChoralWiki structure and propose avenues along which we might proceed to help make ChoralWiki function better as a public domain score library. For the most part, I am going to deal with issues that have to do with (inputs) how users providing scores and other information go about getting their work posted in an appropriate way, (outputs) how end users go about finding what they seek, and (functions) how committed advocates of CPDL go about trying to effect the transition from these inputs to those outputs. But first, let me survey a few of the problems we are faced with.

Some of the perceived problems

The following comments provide a glimpse at some of the problems we face, at least as I see them or as they have been communicated to me. They are presented in no particular order. There are probably more that others might propose, but what I present here hopefully gives us some idea of the extent of the problems.

  • Despite Mjolnir's significant efforts at setting up a verified editions scheme and the admirable contributions of many CPDL editors to the verified editions lists, the results to date are far from complete. At present we are unable to produce a complete list of CPDL edition numbers with direct links to the edition information.
  • Composer pages supposedly provide links to pages (or a hierarchy of subpages in some instances) that contain all the CPDL editions of works. Unfortunately, this is not always the case: as recently as 15 September 2008, the score page of a Victoria Mass was found which was not linked from the Victoria composer page (this has now been remedied, although the page had been "missing" since sometime before the 2007 crash). In another instance, there is a work listed on the Byrd page pointing to a nonexistent works page, and a check of the deletion logs found a page for the link which, now undeleted, reveals only a single creation edit - as a spam page. Fortunately, the Byrd page contains links to the sheet, sound, and source files for the work, but any CPDL edition number for the work is lost. In the past there have been instances of PDF, MIDI, and other files uploaded to ChoralWiki which have never been "processed" at all (e.g. through the AddWorks form or by the uploader or some CPDL editors taking charge). Put bluntly, save for the work of eagle-eyed sysops, the progression of scores and related material from upload or submission to posting on ChoralWiki is rather tortuous, placing perhaps undue strain on the contributor, and fraught with the possibility of items falling through the cracks.
  • In fact, sometimes such uploads appear to have been done for the convenience (or impish delight) of the uploader. For example, one uploader was posting files for his/her choir/chorus to download and use for rehearsal preparation, without regard to copyright status and with no intention of actually posting the scores as editions. There have also been some instances of contemporary composers (as well as some contributors of rare or difficult to obtain works) "taking over" their own or pet works pages and trying to manage such pages as if they were their own, sometimes in conflict with the appearance and standards established at least elsewhere on ChoralWiki by precedent if not by set standards. As far as I know, we have no set policy on monitoring new uploads.
  • With regard to copyright issues, CPDL lists and hosts a large number of scores that seem to be "public domain with restrictions" such as: "for religious purposes only", "not for commercial use", "for non-profit use", or "copyright in the US but not in the EU" ... you get the idea. Yet users wishing to download scores do not have to acknowledge to CPDL that they understand the copyright restrictions, nor do they have to make a formal (albeit online) commitment to abide by such restrictions. Whether CPDL is safe from action that might be taken for violations of copyright restrictions is not entirely clear. For instance, are we okay with providing external listings where the external site imposes copyright restrictions? Do CPDL copyright statements about scores electronically hosted on CPDL servers override any copyright restrictions (personal, religious, or otherwise) that appear in these scores? To what extent is a CPDL edition in the public domain if it carries a restriction such as "for non-profit use"?

Deconstructing the library structure

Perhaps most people who visit CPDL are "consumers" - in the sense that they come to ChoralWiki to see what we have, to gain some information, to look for scores or translations or other specific information, to submit requests for scores or other information which they cannot find, and to download material they are interested in or of which they have need.

Many others who come to CPDL are "contributors" - in the sense that they come to ChoralWiki to submit scores (and perhaps provide other related material), to provide texts and/or translations, to provide other information such as publications or information on choral groups, unhosted composers, etc.

Still others (a much smaller number, unfortunately) carry out much of the work that makes it possible for the above people to achieve what they wish, at least insofar as is possible with the means available. Such people are, in a very real sense, "librarians" - although here we tend to call them "editors" - and such people often end up becoming "sysops" (administrators) after proving themselves worthy of these tasks.

This is not unlike the situation with other kinds of libraries, which have "consumers" that may be cardholders who check out materials or others that may copy materials (sometimes with special permission or credentials) or simply browse the collection, "contributors" that may be publishers and other donors of materials for the library's collection, and "librarians" that do the many tasks required, such as cataloguing, procurement, display (shelving or providing special exhibits), handling requests, providing information, etc.

There is also a sense in which ChoralWiki functions as something like a publishing house, where "consumers" have a more familiar character, where "contributors" are those whose works appear (i.e. are published) on ChoralWiki, and where "editors" are somewhat akin to the editors of a publishing house (with some differences).

All of the above seeks to identify and describe how different individuals interact with ChoralWiki. But there is more to how these interactions take place.

How these structures interact and function at CPDL

The use of the (somewhat imperfect) wiki search mechanism and categories permits a consumer to browse or search our holdings by many different criteria. While composer, works, text-translation, and other pages are not quite the same as card catalog listings (online or not), these are the pages to which consumers gravitate in their efforts to use ChoralWiki, hopefully finding what they are seeking. The analogy with a card catalog (even an online one) is not perfect, however, even with respect to score editions and related material. The material on pages generally has a more-or-less uniform appearance, depending upon the kind of pages one looks at, but achieving that level of uniformity, while partially and imperfectly automated, is not an easy task - a task that becomes more and more of a burden on sysops as the CPDL listings increase.

The current methods of submitting scores and related material vary widely: do-it-yourself posting by those with some WikiSense, an awkward online add works submission form, and, for those unable to cope with either method, e-mailing scores to a sympathetic and cooperative editor to do the work - again increasing the tasks already heaped on CPDL sysops, with work sometimes being greatly delayed or lost. And, as mentioned earlier, items arrive via uploads without any supporting information as to their source and intended disposition - which presents monitoring, tracking, and appropriate disposition problems.

Time and time again, sysops and other hardworking editors have had to split their time amongst many tasks. Some of these have involved trying to devise band-aid kludges that work imperfectly, where upgraded MediaWiki software and essential extensions would work much better. Others have involved huge "manual" processing chores to get something done (such as verified editions, categorization of works pages, devising and applying appropriate templates, rooting out spam and vandalism to pages, providing help, answering questions, developing communications avenues when the external bulletin board was unusable, correcting errors on pages, moving pages when the title is incorrect, deleting superfluous redirects and dead pages, etc.).

The heart of a good library, aside from its collections (books and other media), is its cataloguing and access methods. It is or at least should be the same with ChoralWiki. We have a huge database of printable sheet music files, downloadable sound and source files, and links to offsite resources that are posted here. We have a steadily growing base of texts and translations of choral works, and more. Moreover, editions posted or listed at ChoralWiki have CPDL edition numbers - the analog of which for other libraries might be called collection log numbers for their holdings. But whereas other libraries maintain an actual log/index of such holdings, at CPDL we do not have the corresponding log of holdings indexed by CPDL number. What's worse, a single CPDL number may refer to as many as three or more files (sheet, sound, source) in varying formats (landscape, portrait, A4, letter, MIDI, MP3, etc.), and the incomplete tables of verified edition numbers do not reflect these; each entry simply points to a page containing a posting of the edition in question.
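
To make the idea of a log of holdings indexed by CPDL number a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names EditionFile, LogEntry, and HoldingsLog are hypothetical, the edition number and filenames are made up, and nothing here reflects existing ChoralWiki or MediaWiki software. The point is simply that one CPDL number maps to one entry, and that entry records every file belonging to the edition as well as the page where it is posted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class EditionFile:
    kind: str      # "sheet", "sound", or "source"
    fmt: str       # e.g. "PDF, A4 portrait", "MIDI", "MP3"
    filename: str


@dataclass
class LogEntry:
    cpdl_number: int
    works_page: str  # page where the edition is posted
    files: list[EditionFile] = field(default_factory=list)


class HoldingsLog:
    """Hypothetical log of holdings, indexed by CPDL number."""

    def __init__(self) -> None:
        self._entries: dict[int, LogEntry] = {}

    def add(self, entry: LogEntry) -> None:
        # A CPDL number identifies exactly one edition, so refuse duplicates.
        if entry.cpdl_number in self._entries:
            raise ValueError(f"CPDL #{entry.cpdl_number} is already logged")
        self._entries[entry.cpdl_number] = entry

    def lookup(self, cpdl_number: int) -> LogEntry:
        return self._entries[cpdl_number]

    def numbers(self) -> list[int]:
        """Complete list of logged edition numbers, in ascending order."""
        return sorted(self._entries)


# Made-up example data, not a real CPDL edition.
log = HoldingsLog()
log.add(LogEntry(
    cpdl_number=99999,
    works_page="Example motet (Example composer)",
    files=[
        EditionFile("sheet", "PDF, A4 portrait", "example-a4.pdf"),
        EditionFile("sheet", "PDF, letter portrait", "example-letter.pdf"),
        EditionFile("sound", "MIDI", "example.mid"),
    ],
))
entry = log.lookup(99999)
print(entry.works_page, [f.filename for f in entry.files])
```

With something along these lines, producing a complete list of edition numbers with direct links to the edition information would amount to walking numbers() and reading each entry's works_page.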

Our method of processing editions submitted to CPDL is, more or less, to assign a CPDL number and slap the information up on a works page, creating the works page if necessary, and perhaps also creating a composer page if necessary - and, typically, forgetting about the assigned CPDL number.

This is pretty much the reverse of the way a library manages its holdings. First the new item is entered into a permanent log/index, then catalog information is generated, which is then entered into the "card catalog" (online or not) when the item is "shelved" (made available to the public). Linkage is always maintained both ways between the permanent log and the card catalog (and also between the permanent log and the actual item). Even if an item is lost, one can always get to the particulars of the original procurement of the item (maintained in the permanent log) through the card catalog - and with computerized card catalogs and permanent logs, flags will be raised when something is amiss.
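
The library procedure just described (log first, then catalog, with linkage kept both ways and flags raised when something is amiss) can be sketched roughly as follows. Again, this is a hedged illustration rather than a proposal for actual software: the class Library and its methods accession, shelve, and audit are names I have invented, and the "catalog" here is just a mapping from a works page title to the edition numbers listed on it.

```python
from __future__ import annotations


class Library:
    """Hypothetical log-first workflow with two-way linkage."""

    def __init__(self) -> None:
        # Permanent log: edition number -> record of the original submission.
        self.log: dict[int, dict] = {}
        # "Card catalog": works page title -> edition numbers listed there.
        self.catalog: dict[str, set[int]] = {}
        self._next_number = 1

    def accession(self, contributor: str, title: str) -> int:
        """Step 1: enter the new item in the permanent log."""
        number = self._next_number
        self._next_number += 1
        self.log[number] = {
            "contributor": contributor,
            "title": title,
            "works_page": None,  # not yet shelved
        }
        return number

    def shelve(self, number: int, works_page: str) -> None:
        """Step 2: post the item, linking log and catalog both ways."""
        record = self.log[number]  # KeyError if the item was never logged
        record["works_page"] = works_page
        self.catalog.setdefault(works_page, set()).add(number)

    def audit(self) -> list[str]:
        """Raise flags wherever log and catalog disagree."""
        problems = []
        for number, record in sorted(self.log.items()):
            page = record["works_page"]
            if page is None:
                problems.append(f"{number}: logged but not shelved")
            elif number not in self.catalog.get(page, set()):
                problems.append(f"{number}: log points to a page that does not list it")
        for page, numbers in sorted(self.catalog.items()):
            for number in sorted(numbers):
                if number not in self.log or self.log[number]["works_page"] != page:
                    problems.append(f"{number}: listed on a page without a matching log entry")
        return problems


library = Library()
first = library.accession("Contributor A", "Example Mass")
second = library.accession("Contributor B", "Example anthem")
library.shelve(first, "Example Mass (Example composer)")
print(library.audit())  # the second item is logged but not yet shelved
```

Because the log entry exists before anything is posted, an upload that is never "processed" shows up in audit() instead of silently disappearing, and a deleted or unlinked page can still be traced back to the original submission.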

We would do well to emulate these library procedures in some appropriate and effective way. Doing this will surely involve establishing new protocols and different structures for the submission of scores and related materials.

The essentials of processing inputs

What the traditional library model tells us about CPDL is that (1) submitting a score and logging it is one thing, (2) getting the score and relevant related information posted is a different thing (perhaps requiring the creation of a works page to contain the posting, and maybe the creation of a composer page for the first CPDL listed work by a composer), (3) the design, format, and content of a works page (or a composer page) is yet another matter (not one which should put extra onus on the contributor of a score).

Data structures

Moved to User:CHGiffen/Data structures