Plugins: auto-anchor: use ASCII-only slug#349
Conversation
|
tACK 1b66c6a Built, ran, tested duplicates provided, smoke tested some other pages in order to ensure nothing was aversely affected. Also diffed and spot checked the outputs |
|
Thanks @harding . Looks like this fixes the immediate issue, although I agree that it's a shame that we can't have localized anchors. Did you take a look at the transliterated_header_ids option in |
|
I think we might also want to add a note to https://github.com/bitcoinops/bitcoinops.github.io/blob/master/CONTRIBUTING.md#translations explaining how to modify the anchor tags in case of duplicate anchors for non-ascii languages. @bitschmidty - can you do that? |
No (though I have now). Three notes:
|
|
ok, thanks for checking, Dave! |
Closes #348
For some reason, our Markdown interpreter (Kramdown) won't create anchor tags containing non-ASCII text. I don't know if that its fault or there's something in the HTML5 spec. This affects both our Japanese and Spanish translations (the later only occasionally when one of the words in the anchor text has an accent character).
The fix implemented in this PR is to (1) remove all non-ASCII accents (converting those characters to their nearest ASCII representation, e.g. õ to o) and (2) further remove all non-ASCII characters (e.g. all Japanese characters) before generating the anchor id ("slug"). This works, but it has two implications for non-Latin character sets:
If all of the text in the bolded, italicized, or linkified part of a bullet is non-Latin, no anchor will be generated (because there's no text left after the non-ASCII is removed). This doesn't seem to happen often in the Japanese translations because there's usually technical terms like "LN" or "bech32" that are used unchanged in the translation.
Once all of the non-Latin text has been stripped, there's sometimes more than one bullet in an article that has the same anchor as another bullet. E.g. after stripping non-ASCII, these produce the same anchor:
Our tests catch cases where two anchors have the same id, so this is easy to catch before publication. I found that it was easy to fix by just inserting a single-character comment in the bolded text to distinguish the duplicates (see PR for a few cases where I had to do this).
Overall, I think this is an adequate solution, although it's still unfortunate that we can't have localized anchors.
Testing: I tested by building the site with and without this PR's commit and then diffing the HTML. It looks to me like this change is purely additive---in only adds anchors (and HTML comments) in places where there were no anchors before. No existing anchors are changed.