Monday, 14 July 2014

BIBFRAME

Adrian Pohl wrote some excellent thoughts about the current state of BIBFRAME at http://www.uebertext.org/2014/07/name-authority-files-linked-data.html. The following started as a direct response but, after limiting myself to where I felt I knew what I was talking about and felt I was being constructive, turned out to be much narrower in scope.

My primary concern in relation to BIBFRAME is interlinking, and in particular authority control. A number of the players (BIBFRAME, ISNI, GND, ORCID, Wikipedia, etc.) define key concepts differently, and without careful consideration and planning we will end up muddying our data with bad mappings. The key concepts in question are those for persons, names, identities, sex and gender (there may be others that I’m not aware of).

Let me give you an example.

In the 19th century there was a mass creation of male pseudonyms to allow women to publish novels. A very few of these rose to such prominence that the authors outed themselves as women (think Currer Bell), but the overwhelming majority didn’t. In the late 20th and early 21st centuries, entries for the published books were created in computerised catalogue systems, and some entries found their way into the GND. My understanding is that the GND assigned gender to entries based entirely on the name of the pseudonym (I’ll admit I don’t have a good source for that statement; it may be largely parable). When Wikipedia, a new public-edited encyclopedia based on reliable sources, arose, the GND was very successfully cross-linked with it, with hundreds of thousands of articles linked to the catalogues of their works. Information that was in the GND was sucked into a portion of Wikipedia called Wikidata.

A problem now arose: because there were no reliable sources for the sex information that had been sucked into Wikidata from the GND, the main part of Wikipedia (which requires strict sourcing) blocked itself from showing Wikidata sex information. A secondary problem was that the GND sex data was in ISO 5218 format (male/female/unknown/not applicable), whereas Wikipedia talks not about sex but about gender, and is more than happy for that to include fa'afafine and similar concepts. Fortunately, Wikidata keeps track of where assertions come from, so the sex info can, in theory, be removed; but while people in Wikipedia care passionately about this, no one on the Wikidata side of the fence seems to understand what the problem is. Stalemate.

There were two separate issues here: a mismatch between the Person in Wikipedia and the Pseudonym (I think) in GND; and a mismatch between a cataloguer-assigned ISO 5218 value and a free-form self-identified value. 
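To make the two mismatches concrete, here is a minimal sketch in Python. It is not how GND, Wikidata or Wikipedia actually model this data: the `Assertion` class, its field names, the "person"/"pseudonym" labels and the `import_problems` function are all hypothetical, invented for illustration. Only the four ISO 5218 codes are taken from the standard.

```python
from dataclasses import dataclass
from typing import Optional

# ISO/IEC 5218 codes for the representation of human sexes.
ISO_5218 = {0: "not known", 1: "male", 2: "female", 9: "not applicable"}


@dataclass(frozen=True)
class Assertion:
    """One claim attached to an authority record (hypothetical model)."""
    subject_kind: str       # what the record describes: "person" or "pseudonym"
    value: str              # the claimed value, e.g. "male"
    scheme: str             # "iso5218" (cataloguer-assigned) or "free-form"
    source: Optional[str]   # citation backing the claim, None if unsourced


def import_problems(assertion: Assertion, target_kind: str, target_scheme: str) -> list[str]:
    """List the reasons this claim should not be copied as-is (empty if none)."""
    problems = []
    if assertion.subject_kind != target_kind:
        problems.append(f"entity mismatch: {assertion.subject_kind} -> {target_kind}")
    if assertion.scheme != target_scheme:
        problems.append(f"scheme mismatch: {assertion.scheme} -> {target_scheme}")
    if assertion.source is None:
        problems.append("no source for the claim")
    return problems


# An unsourced ISO 5218 value recorded against a pseudonym, checked against a
# target that describes persons and uses free-form, self-identified values.
claim = Assertion("pseudonym", ISO_5218[1], "iso5218", None)
problems = import_problems(claim, target_kind="person", target_scheme="free-form")
for problem in problems:
    print(problem)
```

The point of the sketch is only that both mismatches, plus the missing source, are detectable before the data is copied, provided each system states what kind of entity and what kind of value it is recording.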

The deeper the interactions between our respective authority control systems become, the more these issues are going to come up, but we need them to come up at the planning and strategy stages of our work, rather than halfway through (or worse, once we think we’ve finished).

My proposed solution to this is examples: pick a small number of ‘hard cases’ and map them between as many pairs of these systems as possible.

The hard cases should include at least: Charlotte Brontë (or similar); a contemporary author who has transitioned between genders and published broadly similar work under both identities; a contemporary author who publishes in different genres using different identities; ...

The cases should be accompanied by instructions for dealing with existing mistakes found (and errors will be found; see https://en.wikipedia.org/wiki/Wikipedia:VIAF/errors for some of the errors recently found during the Wikipedia/VIAF matching).
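As a rough sketch of how much work "as many pairs of these systems as possible" implies, the following Python enumerates one mapping task per hard case per pair of systems. The system list is the one named earlier in this post; the case descriptions paraphrase the list above, and the `worklist` function is a hypothetical helper, not part of any existing tool.

```python
from itertools import combinations

# Systems named earlier in the post.
SYSTEMS = ["BIBFRAME", "ISNI", "GND", "ORCID", "Wikipedia"]

# Paraphrases of the hard cases proposed above (the list is not exhaustive).
HARD_CASES = [
    "author known by a male pseudonym (Charlotte Brontë / Currer Bell)",
    "author who has transitioned and published under both identities",
    "author who publishes in different genres under different identities",
]


def worklist(systems, cases):
    """Return one (case, system_a, system_b) task per case and unordered pair."""
    return [(case, a, b) for case in cases for a, b in combinations(systems, 2)]


tasks = worklist(SYSTEMS, HARD_CASES)
print(len(tasks), "mappings to write")
for case, a, b in tasks[:3]:
    print(f"{a} <-> {b}: {case}")
```

With five systems there are ten unordered pairs, so even three cases mean thirty mappings, which is an argument for keeping the set of hard cases small.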

If such an effort gets off the ground, I'll put my hand up to do the Wikipedia component (as distinct from the Wikidata component).


3 comments:

Jörg Prante said...

Hi Stuart,

GND genders are not well regulated and should not be trusted. They are marked as optional, and nothing is known about the source of the gender or sex information that is or is not assigned to individuals. Before GND, libraries in Germany maintained different versions of the PND (Personennamendatei), where, for example, hbz rules recommended the use of ISO 5218, which is not gender information, while other maintainers recorded gender information as "male (m)" and "female (f)". Later, in GND, the PND files were consolidated, and I doubt the conflict resolution regarding the sex/gender issue was clean. In fact, the German language does not help much in differentiating sex and gender either: there is only one word, "Geschlecht".

jrochkind said...

An example of a contemporary author who has published broadly similar work under two gender identities, if you want one, is Pat[rick] Califia.

I'm not entirely sure why it's important for authority records to include gender at all. Did traditional NAF/AACR2 authority files include gender? If they did, in some corner MARC field, it has definitely not been something most systems surface in the UI.

So one way to save that headache would just be by punting on gender and not dealing with it in BIBFRAME etc. There are certainly plenty of other difficult metadata control challenges that _are_ vital for actual use cases; I'm not sure gender is one of them, why not save time/energy for the ones that matter.

jrochkind said...

(And for what it's worth, that's my opinion of 'sex' as well as 'gender' -- in fact, both those classes of values are epistemologically/ontologically problematic in many 'edge' cases, even before you get to metadata control -- why beg trouble by trying to deal with them, if you simply don't have to for actual real world use cases?)