Wikidata:Property proposal/WorldCat Identities
WorldCat Identities
Originally proposed at Wikidata:Property proposal/Authority control
Motivation
viaf-20191104-links has added links to WorldCat Identities, which are pages with detailed bibliographic info about the entity. These are based on LCCN or VIAF ids. But I don't think they can be derived systematically, because not all entities have such pages. E.g. a relatively unknown author like Example 2 has one, but Leonardo da Vinci (Q762) doesn't have one. Vladimir Alexiev (talk) 14:38, 27 November 2019 (UTC)
Discussion
- Support David (talk) 07:14, 28 November 2019 (UTC)
- Oppose I think they can be derived systematically from Library of Congress authority ID (P244) or, only in the absence of P244, from VIAF ID (P214); the fact that VIAF dumps don't always link to WorldCat Identities probably doesn't prove that WorldCat Identities lacks an entry, e.g. for Leonardo da Vinci (Q762) it is https://www.worldcat.org/identities/lccn-n79-034525/. This criterion is already used in en:Template:Authority control and it:Template:Controllo di autorità to show automatic links to WorldCat Identities based on P244 and, only if P244 is absent, on P214. I think a new property isn't needed (also because it's better to concentrate efforts on refining our connection with VIAF). --Epìdosis 13:38, 29 November 2019 (UTC)
- @Epìdosis: You are right that absence in viaf-links is not an indication that a WorldCat page does not exist. But just because a database is incomplete doesn't mean it cannot be useful!
- Can you prove your claim that the WorldCat id can be derived systematically (first from LCCN, then from VIAF)? For that you'd need to grep viaf-20191104-links.txt (7 GB), find entries with viaf- but no lccn-, and check that each such person indeed has a VIAF ID and no LCCN.
- I don't think you're right that en:Template:Authority control applies such logic ("first LCCN, then VIAF"). The examples I've seen use a different URL, e.g. https://www.worldcat.org/identities/containsVIAFID/9847974, which redirects to the final URL https://www.worldcat.org/identities/lccn-n79091479/. Because all of LCCN is supposed to be in VIAF, I rather like this uniform formatter, https://www.worldcat.org/identities/containsVIAFID/$1; but then again, there are WorldCat ids for people without VIAF (np-), as @ArthurPSmith: shows. --Vladimir Alexiev (talk) 10:03, 18 December 2019 (UTC)
- @Vladimir Alexiev: You are right, especially the case of "np-" values needs a new property; it will be partially a duplicate of Library of Congress authority ID (P244) and VIAF ID (P214), but it is probably inevitable. Thank you for your clear explanations, --Epìdosis 10:42, 18 December 2019 (UTC)
- @Epìdosis: Last but not least, how would we know whether a WorldCat id exists for a given VIAF ID if we don't replicate viaf-links? E.g. http://viaf.org/viaf/100021346 exists but http://www.worldcat.org/identities/viaf-100021346/ does not. Do we really want to make >32M queries to WorldCat (replicating OCLC's work in viaf-links), and where would we record the result if not in a new prop? --Vladimir Alexiev (talk) 11:58, 18 December 2019 (UTC)
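The check proposed in this thread (find VIAF clusters in viaf-links that carry a viaf- WorldCat Identities link but no lccn- one) could be sketched in Python roughly as below. This is only a sketch under assumptions: the dump layout (one tab-separated line per link, VIAF cluster URI first, link second) and the idea that WorldCat Identities links contain "/identities/<id>" are guesses, not a documented format, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable

# ASSUMPTION: each dump line is "<VIAF cluster URI>\t<link>", and WorldCat
# Identities links contain "/identities/<id>". The real layout may differ.
IDENTITY_RE = re.compile(r"/identities/([^/\s]+)")


def identity_prefixes_by_cluster(lines: Iterable[str]) -> dict[str, set[str]]:
    """Map each VIAF cluster URI to the WorldCat Identities prefixes
    (e.g. 'lccn', 'viaf', 'np') linked from it."""
    prefixes: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed lines
        cluster, link = parts[0], parts[1]
        match = IDENTITY_RE.search(link)
        if match:
            identity_id = match.group(1)
            prefixes[cluster].add(identity_id.split("-", 1)[0])
    return dict(prefixes)


def viaf_only_clusters(lines: Iterable[str]) -> list[str]:
    """Clusters with a viaf- identity but no lccn- identity: the entries
    whose LCCN status would then have to be checked by hand."""
    return sorted(
        cluster
        for cluster, found in identity_prefixes_by_cluster(lines).items()
        if "viaf" in found and "lccn" not in found
    )
```

Because the functions accept any iterable of lines, an open file handle can be passed directly, so the 7 GB dump is streamed line by line rather than loaded into memory. Note that this only shows what viaf-links records; as discussed above, it cannot by itself show that a WorldCat page is absent.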
- Support I also think it can be derived, but I feel it adds value to have it as a Wikidata property - Salgo60 (talk) 21:37, 2 December 2019 (UTC)
- Comment: On one hand, I think that since the WorldCat listing is often derived from LCCN or VIAF IDs, an additional redundant identifier is unnecessary. However, I think WorldCat listings are extremely valuable, probably more so than any individual external identifier, in that they list not only works by the subject but also works about the subject, and they easily direct readers to find those works in libraries. Since Wikidata items often don't link directly to WorldCat listings, and the WorldCat link is most often visible only in Wikipedia Authority control templates (which is contingent upon the subject having a Wikipedia article), a more direct route is desirable. Crucially, there is also a substantial set of people who have entries in WorldCat but not in VIAF or LCCN, e.g. William Verbeck and Ernestine Hara Kettler: see previous discussions here and here. I'm more interested in finding a way to adroitly include these "orphan" WorldCat entries than in the somewhat clumsy workaround of described at URL (P973), even if only on a temporary basis until a formal VIAF/LCCN ID is created. -Animalparty (talk) 06:38, 4 December 2019 (UTC)
- @Animalparty: The regex, as currently displayed, doesn't allow values like William Verbeck and Ernestine Hara Kettler, which could effectively justify the creation of a new ID. Could you try to edit the regex to include them? --Epìdosis 18:16, 4 December 2019 (UTC)
- Oppose use the other two properties instead. --- Jura 04:51, 5 December 2019 (UTC)
- Looks like you'd rather need a property for the "np" code unless they are just search strings. --- Jura 18:53, 5 December 2019 (UTC)
- @Jura1: That's only true if Epidosis proves that ALL other WorldCat ids can be derived systematically from LCCN or VIAF. And even then, what would you call this new prop? "WorldCat Identity in absence of VIAF or LCCN"? And once a person gets a VIAF or LCCN entry, what would you do: delete their value in this np- property? WorldCat ids come from a variety of sources (we know of at least 3: lccn, viaf, np). --Vladimir Alexiev (talk) 10:03, 18 December 2019 (UTC)
- @Jura1: Given the extra discussion above, and that Epidosis changed his vote, would you like to change your vote too? --Vladimir Alexiev (talk) 11:58, 18 December 2019 (UTC)
- No, but there still seems to be sufficient support for its creation. --- Jura 14:56, 20 January 2020 (UTC)
- Support Clearly useful and not covered by existing id's. I fixed the regex and added the Verbeck example (there may be 2 id's for him in Worldcat though - the plain np-verbeck,%20william looks like the same person to me.) ArthurPSmith (talk) 18:25, 5 December 2019 (UTC)
I've fixed the regex a bit, added a URL formatter http://experimental.worldcat.org/IDNetwork/display.html?query=$1 (Identities Network visualization) and the total count of records (30M), see https://www.oclc.org/research/themes/data-science/identities.html. VIAF has 32,254,690 entities (2018-07), see https://www.wikidata.org/wiki/Wikidata:WikiProject_Authority_control#VIAF_Volumetrics . Cheers --Vladimir Alexiev (talk)
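To make the formatter discussion concrete, here is a small Python sketch of how a value would be turned into a link by "$1" substitution, together with the "P244 first, then P214" fallback described earlier in this proposal. The two formatter strings are copied from the comments above; the function names are hypothetical, the LCCN is assumed to be already normalized without spaces, and percent-encoding of the value is deliberately left out because how Wikidata encodes it is not covered here.

```python
from typing import Optional

# Formatter URLs quoted in this discussion; "$1" stands for the property value.
CONTAINS_VIAF_FORMATTER = "https://www.worldcat.org/identities/containsVIAFID/$1"
ID_NETWORK_FORMATTER = "http://experimental.worldcat.org/IDNetwork/display.html?query=$1"


def format_url(formatter: str, value: str) -> str:
    """Plain "$1" substitution (no percent-encoding applied)."""
    return formatter.replace("$1", value)


def derive_identity_id(lccn: Optional[str], viaf: Optional[str]) -> Optional[str]:
    """Fallback rule from the discussion: use the Library of Congress
    authority ID (P244) if present, otherwise the VIAF ID (P214).

    ASSUMPTIONS: the LCCN is already normalized (e.g. "n79091479"), and a
    derived id may still not resolve on WorldCat (cf. viaf-100021346).
    """
    if lccn:
        return f"lccn-{lccn}"
    if viaf:
        return f"viaf-{viaf}"
    return None  # np- identities cannot be derived from either property
```

For example, `derive_identity_id(None, "9847974")` yields "viaf-9847974", which can then be passed to `format_url`. The `None` branch is the crux of the proposal: np- identities have no P244 or P214 value to derive from.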
- @Animalparty: thanks for the links you provided to previous discussions! @Peaceray, Richard_Arthur_Norton_(1958-_), Tagishsimon, Jheald: who participated in such discussions, please vote here.
- @Animalparty: "There seems to be a small but significant number of authors/subjects who are indexed on WorldCat Identities, but apparently not indexed by any other contributing database (e.g. VIAF, Library of Congress)": We don't know how many nc- ids there are, but given the counts in "Number of IDs in source", it could be a very significant number. Assuming that viaf-links is nearly complete (though @Epìdosis: shows it is not totally complete) and that lccn- precludes viaf-, the number of nc- ids could be as high as 30 - 11.4 - 8.4 ≈ 10M. That would be a very significant addition to VIAF's 32.2M entities. --Vladimir Alexiev (talk) 12:11, 18 December 2019 (UTC)
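The estimate in the previous comment can be written out as below, under the assumptions stated there (viaf-links is nearly complete and lccn- precludes viaf-), and taking 11.4M and 8.4M to be the lccn- and viaf- figures from "Number of IDs in source". The symbols are introduced here only for illustration, and the result is read, as in the comment, as an upper estimate rather than a count.

```latex
\[
N_{\text{other}} \approx N_{\text{total}} - \left(N_{\text{lccn}} + N_{\text{viaf}}\right)
\approx 30 - (11.4 + 8.4) = 30 - 19.8 = 10.2 \ \text{million}
\]
```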
- Question Given that there are people at VIAF who closely monitor Wikidata, can we get any input from them as to who these people are who have WorldCat IDs but no VIAF, whether there is an ongoing supply of them, and whether VIAF has eyes on them? Similarly if there is anybody we know, or that they could put us in touch with, working directly on these at OCLC. Jheald (talk) 12:19, 18 December 2019 (UTC)
- Support
@Vladimir Alexiev: The property has not been created yet: the regular expression (RegEx) does not match all the examples. Could you please research further to cover as many combinations as possible, or make the examples match the RegEx? Cordially. —Eihel (talk) 06:34, 20 January 2020 (UTC) @Animalparty: per User:Epìdosis —Eihel (talk) 18:28, 20 January 2020 (UTC)
- @Eihel, Animalparty, Epìdosis: I think the regex ^(viaf|lccn|np)-.+$ (as of 18 Dec) covers all examples? Which ones does it not cover? Given that there is general support (as Jura points out), I've set the status to Ready, conditional on agreeing this last point. --Vladimir Alexiev (talk) 13:50, 21 January 2020 (UTC)
- @Vladimir Alexiev: Oh please excuse me! I misread the examples. I am really sorry. —Eihel (talk) 17:43, 21 January 2020 (UTC)
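Whether the regex ^(viaf|lccn|np)-.+$ covers the examples can be checked mechanically. The Python sketch below uses id values that appear in this discussion (np-verbeck,%20william as ArthurPSmith wrote it); the function name is hypothetical. `re.fullmatch` is used so that a value with a trailing newline is rejected, which `re.match` with `$` alone would accept.

```python
import re

# Regex from the proposal (as of 18 Dec 2019) for WorldCat Identities ids.
IDENTITY_ID_RE = re.compile(r"^(viaf|lccn|np)-.+$")


def is_valid_identity_id(value: str) -> bool:
    """True if the whole value matches the proposed pattern."""
    return IDENTITY_ID_RE.fullmatch(value) is not None


for example in ["lccn-n79091479", "viaf-100021346", "np-verbeck,%20william",
                "containsVIAFID/9847974"]:
    print(example, is_valid_identity_id(example))
```

The pattern is case-sensitive and requires at least one character after the prefix, so "lccn-" alone or "LCCN-..." would be rejected; any prefix other than viaf, lccn, or np (such as the nc- form mentioned earlier in the thread) would be rejected too.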
- Do we know how the np ones work? Are they just search strings? --- Jura 14:56, 20 January 2020 (UTC)
- They can't be just search strings given the two Verbeck examples (one is a subset of the other, but they list two completely distinct sets of works). ArthurPSmith (talk) 17:36, 20 January 2020 (UTC)
- It could be that the first is an actual identifier while the second is just an undifferentiated grouping of all others, similar to those found at LCCN/GND, but not included in Wikidata because of that. Seems odd that the organization would maintain two/three/four different identifier systems simultaneously .. --- Jura 13:32, 21 January 2020 (UTC)
- It seems to me the two are the same person (fl.1912, a horseman and educator). It's not a search as https://www.worldcat.org/identities/np-verbeck is broken --Vladimir Alexiev (talk) 13:55, 21 January 2020 (UTC)
- Well, Wikidata has a search string index as well and "np-verbeck" isn't in it either. --- Jura 14:06, 21 January 2020 (UTC)
- Support Gamaliel (talk) 13:19, 21 January 2020 (UTC)
- Support --Rosiestep (talk) 20:15, 21 January 2020 (UTC)
- @Vladimir Alexiev, ديفيد عادل وهبة خليل 2, Epìdosis, Salgo60, Animalparty, Jura1: and @ArthurPSmith, Jheald, Eihel, Gamaliel, Rosiestep: P7859 (P7859) Done. Good contributions, Ederporto (talk) 03:18, 22 January 2020 (UTC)