Wikifunctions:Status updates/2024-10-11
Wikidata Lexemes in Wikifunctions are coming soon!
Wikidata famously contains a large knowledge graph about more than a hundred million items, but it also has a younger, lesser-known side: lexicographical data. Currently, Wikidata describes more than 1.3 million lexical items across 1291 languages. The lexicographic data in Wikidata is an essential ingredient for the Abstract Wikipedia vision.
As an early step on this road, support for Lexemes is coming to Wikifunctions very soon! And we want to give a small preview of that.
We are introducing a number of new Types, and each of those will come in two flavors. Let us look at Wikidata Lexemes for an example. We have introduced two Types to handle them: the Wikidata Lexeme itself, and the Wikidata Lexeme Reference. The reference is a wrapper around the Lexeme ID. A Function will be provided that takes a Wikidata Lexeme Reference and returns the respective Wikidata Lexeme.
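As a rough illustration of the two flavors, the relationship between a reference and the Lexeme it points to can be sketched in Python. This is not the Wikifunctions Type definition: the class names, the fields, the function name `fetch_wikidata_lexeme`, the in-memory store and the placeholder ID `L00000` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikidataLexemeReference:
    """Thin wrapper around a Lexeme ID; it carries no Lexeme data itself."""
    lexeme_id: str


@dataclass(frozen=True)
class WikidataLexeme:
    """Read-only stand-in for a Lexeme; real Lexemes have many more fields."""
    lexeme_id: str
    lemma: str
    language: str


# Invented in-memory stand-in for Wikidata; the ID below is a placeholder,
# not a real Lexeme ID.
_EXAMPLE_STORE = {
    "L00000": WikidataLexeme(lexeme_id="L00000", lemma="dog", language="en"),
}


def fetch_wikidata_lexeme(ref: WikidataLexemeReference) -> WikidataLexeme:
    """Resolve a reference to the Lexeme it points to (hypothetical name)."""
    return _EXAMPLE_STORE[ref.lexeme_id]


ref = WikidataLexemeReference("L00000")
lexeme = fetch_wikidata_lexeme(ref)
print(lexeme.lemma)  # dog
```

The reference is cheap to pass around as a function argument, while the full Lexeme is only materialized when a function actually needs its contents.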
An instance of the Wikidata Lexeme Type represents a Lexeme from Wikidata. This means that we will not be able to create Lexemes in Wikifunctions on the fly, or modify them: if you want to change or create a Lexeme, you will continue to do so in Wikidata.
We have extended the Wikifunctions user interface to work with the new Types. For Lexemes, there is a built-in search interface that will allow you to search for and select a Lexeme in order to use it as an argument in a function call.
There will be numerous limitations initially; particularly, Statements will be very incomplete. Any statement that has a Type that is not supported (which for now is almost all of them) will be silently dropped. We will, over time, increase the covered Value Types from Wikidata, with the eventual goal to represent Wikidata fully.
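To make "silently dropped" concrete, here is a minimal sketch of such a filter. It only illustrates the behavior described above and is not the actual import code: the `Statement` class, the `keep_supported` function, the placeholder property IDs and the choice of which value types count as supported are all assumptions for this example.

```python
from dataclasses import dataclass
from typing import Any

# Assumption: the set of supported value types is invented for this sketch.
SUPPORTED_VALUE_TYPES = {"string"}


@dataclass(frozen=True)
class Statement:
    property_id: str  # placeholder IDs below, not real Wikidata properties
    value_type: str
    value: Any


def keep_supported(statements: list[Statement]) -> list[Statement]:
    """Return only statements with a supported value type.

    Unsupported statements are skipped without raising or logging,
    which is what "silently dropped" means here.
    """
    return [s for s in statements if s.value_type in SUPPORTED_VALUE_TYPES]


statements = [
    Statement("P-a", "string", "example"),
    Statement("P-b", "time", "2024-10-11"),
    Statement("P-c", "quantity", 3),
]
kept = keep_supported(statements)
print([s.property_id for s in kept])  # ['P-a']
```

One consequence worth keeping in mind when writing Functions: with this kind of filtering, a caller cannot tell a statement that was dropped from one that never existed on the Lexeme.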
One important restriction is that, initially, you won’t be able to select Wikidata entities through incoming Statements. This is a notable limitation: it means we cannot take, e.g., the item for dog as an argument and then, using a function, follow the "item for this sense" statement on the first sense of the Lexeme dog back to that Lexeme, in order to pick the relevant Lexeme. As this is a very important use case for the Abstract Wikipedia story, we will be working on resolving this swiftly.
Lexeme access will be a major new capability with many moving parts, and there is a good chance that we will need more documentation, that some workflows will initially be unclear, and that some things might be broken at the beginning. We ask for your patience while we improve it, and we will also ask for your feedback so we know what to improve.
We are excited to get this launched!
Recent Changes in the software
Most of our work over the past two weeks has been on the new Quarterly work, including the Wikidata access discussed above, and on "Fix It" work to pay down our technical debt. We also landed a few fixes this week:
We've adjusted the code that picks your interface language when clicking links on Wikifunctions to also respect your account language preference, if set (T374309). Thanks to User:Ameisenigel for finding and reporting the issue!
As part of wider work to remove raw HTML interface messages across MediaWiki, we replaced the site copyright message written by Legal that appears in the footer with one that is in wikitext (T375882).
We've re-written the build process for our back-end evaluator service to be simpler and faster through Docker layer caching, and by loosening the load stress-test job (T376053). We've dropped an unused method for loading HTML content from a wiki that we inherited from the "service-template-node" template that was causing confusion (T366733). We've improved the way we include our utilities in the back-end for less code duplication (T347086). We have added some better metrics and logging for our monitoring of the back-end services (T376225, T375457).
We, along with all Wikimedia-deployed code, are now using the latest version of the Codex UX library, v1.13.1, as of this week. We believe that there should be no user-visible changes on Wikifunctions, so please comment on the Project chat or file a Phabricator task if you spot an issue.
Recording of Volunteers’ Corner
The recording of this month’s Volunteers’ Corner is now available on Commons.
Function of the Week: English plural possessive
Given that we are getting close to supporting Lexemes, this week’s Function of the Week will be about the plural possessive in English. It is also the Function we built together in this week’s Volunteers’ Corner. So if you want to see that Function being created, there’s a video on Commons!
In English, nouns usually have a singular and a plural form. The singular is used when we talk about one instance of the noun, and the plural when we talk about several. The possessive is used to express that something belongs to what the noun refers to. So we may say there is one dog, there are two dogs, and this is the dog’s house. The singular is dog, the plural is dogs, and the singular possessive is dog’s. Combining these, we get the plural possessive: the dogs’ barking would refer to barking done by several dogs.
Of the 30,599 English nouns in Wikidata, the vast majority (28,038) have two forms, but five Lexemes also feature possessive or genitive forms. Possessive forms are rarely listed explicitly (e.g. on sport), because they are almost always regular, given the singular and plural forms.
Regular forms are great for functions! The English plural possessive function takes the lemma, i.e. the singular form, and returns the plural possessive. There is one implementation, which is a composition: it first generates the plural out of the lemma, and then creates the possessive out of the plural.
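The composition can be sketched in Python as follows. This is not the implementation on Wikifunctions: the two helper functions and their rules are simplified assumptions that cover only regular English forms, and the sketch uses the plain ASCII apostrophe rather than the typographic one.

```python
def english_plural(lemma: str) -> str:
    """Regular pluralization only (hypothetical helper, simplified rules)."""
    lower = lemma.lower()
    # Sibilant endings take "-es": kiss -> kisses, box -> boxes.
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    # Consonant + "y" becomes "-ies": city -> cities (but day -> days).
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        return lemma[:-1] + "ies"
    return lemma + "s"


def english_possessive(form: str) -> str:
    """Possessive of a given form (hypothetical helper)."""
    # Forms already ending in "s" only take an apostrophe: dogs -> dogs'.
    if form.lower().endswith("s"):
        return form + "'"
    return form + "'s"


def english_plural_possessive(lemma: str) -> str:
    """Composition: pluralize the lemma, then make that plural possessive."""
    return english_possessive(english_plural(lemma))


for lemma in ("volunteer", "kiss", "dog"):
    print(lemma, "->", english_plural_possessive(lemma))
# volunteer -> volunteers'
# kiss -> kisses'
# dog -> dogs'
```

Because the sketch knows only regular rules, irregular nouns come out wrong: "fish" yields "fishes'" here. Handling such cases would be the job of the pluralization step, not of the composition itself.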
I think this Function is a good example of a Function that probably doesn’t need any further implementations: the only other way to implement it would be to redo the two functions that are used in the composition, and there seems no benefit in that.
The Function has five tests, of which three are connected; whether the other two should be connected is left for discussion:
- volunteer to volunteers’ (in order to honor the Volunteers’ Corner)
- kiss to kisses’ (a slightly more complex pluralization)
- dog to dogs’
- fish to fish’s (not connected, and it fails with the current implementation)
- Matrix to Matrices’ (not connected either)
One main point is to decide whether this Function should always return the correct plural possessive, in which case the unconnected tests should be connected, or just regular plural possessives – in which case these tests shouldn’t be there.
As always, this was a fun exercise, and I want to thank the Volunteers who showed up and helped us in building the Function.