Madra Teanga - Open Source Irish Language Programming
An open source project that has already produced a great app for learning Irish—programmed in a language called Draíocht (sin “magic” as Béarla)!
I’m supporting this on Open Collective.
An open source project that has already produced a great app for learning Irish—programmed in a language called Draíocht (sin “magic” as Béarla)!
I’m supporting this on Open Collective.
A different world is possible. Here, for example, is an open-source large language model from Europe, designed to support the 24 official languages of the European Union.
I have no idea why their top level domain is for the British Indian Ocean Territory, soon to be no more. That doesn’t instil confidence.
The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.
Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.
Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:
Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.
And no, a robots.txt file doesn’t help.
If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.
Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:
LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.
You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:
There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.
My own experience with The Session bears this out.
Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries .
So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.
When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.
The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.
If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.
If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.
As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.
More on how large language bots are DDOSing the web:
LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.
This project, based on OpenStreetMap, looks great:
OpenFreeMap lets you display custom maps on your website and apps for free.
You can either self-host or use our public instance.
I’m going to try it out on The Session once there’s documentation for using this with Leaflet.
I like this framing:
If you’ve ever corrected a typo in an Open Source readme, or added alt-text to an image, or tidied up some broken references in Wikipedia - you’re doing Digital Litter Picking. You’re cleaning up after others. And I think that’s a marvellous way to spend a little time.
Two new lovely open source variable fonts from Github.
This story of the Network Time Protocol hammers home the importance of infrastructure and its maintenance:
Technology companies worth billions rely on open-source code, including N.T.P., and the maintenance of that code is often handled by a small group of individuals toiling away without pay.
I love how easy it is to use these icons: you can copy and paste the SVG or even get it encoded as a data URL.
Our mental model for how we build for the web is too reliant on canned solutions to unique problems.
This is very perceptive indeed.
Compounding this problem is that too few boot camps are preparing new web developers to think critically about what problems are best solved by JavaScript and which aren’t — and that those problems that are best solved by JavaScript can be solved without engaging in frivolous framework whataboutism. The question developers should ask more often when grappling with framework shortcomings shouldn’t be “what about that other framework?”, but rather “what’s best for the user experience?”.
This font is a crossover of different font types: it is semi-condensed, semi-rounded, semi-geometric, semi-din, semi-grotesque. It employs minimal stoke thickness variations and a semi-closed aperture.
Inspired by Terence Eden’s example, I applied for membership of the AMP advisory committee last year. To my surprise, my application was successful.
I’ve spent the time since then participating in good faith, but I can’t do that any longer. Here’s what I wrote in my resignation email:
Hi all,
As mentioned at the end of the last call, I’m stepping down from the AMP advisory committee.
I can’t in good faith continue to advise on the AMP project for the OpenJS Foundation when it has become clear to me that AMP remains a Google product, with only a subset of pieces that could even be considered open source.
If I were to remain on the advisory committee, my feelings of resentment about this situation would inevitably affect my behaviour. So it’s best for everyone if I step away now instead of descending into outright sabotage. It’s not you, it’s me.
I’d like to thank the OpenJS Foundation for allowing me to participate. It’s been an honour to watch Tobie and Jory in action.
I wish everyone well and I hope that the advisory committee can successfully guide the AMP project towards a happy place where it can live out its final days in peace.
I don’t have a replacement candidate to nominate but I’ll ask around amongst other independent sceptical folks to see if there’s any interest.
All the best,
Jeremy
I wrote about the fundamental problem with Google AMP when I joined the advisory committee:
This is an interesting time for AMP …whatever AMP is.
See, that’s been a problem with Google AMP from the start. There are multiple defintions of what AMP is.
There’s the collection of web components. If that were all AMP is, it would be a very straightforward project, similar to other collections of web components (like Polymer). But then there’s the concept of validation. The validation comes from a set of rules, defined by Google. And there’s the AMP cache, or more accurately, Google hosting.
Only one piece of that trinity—the collection of web components—is eligible for the label of being open source, and even that’s a stretch considering that most of the contributions come from full-time Google employees. The other two parts are firmly under Google’s control.
I was hoping it was a marketing problem. We spent a lot of time on the advisory committee trying to figure out ways of making it clearer what AMP actually is. But it was a losing battle. The phrase “the AMP project” is used to cover up the deeply interwingled nature of its constituent parts. Bits of it are open source, but most of it is proprietary. The OpenJS Foundation doesn’t seem like a good home for a mostly-proprietary project.
Whenever a representative from Google showed up at an advisory committee meeting, it was clear that they viewed AMP as a Google product. I never got the impression that they planned to hand over control of the project to the OpenJS Foundation. Instead, they wanted to hear what people thought of their project. I’m not comfortable doing that kind of unpaid labour for a large profitable organisation.
Even worse, Google representatives reminded us that AMP was being used as a foundational technology for other Google products: stories, email, ads, and even some weird payment thing in native Android apps. That’s extremely worrying.
While I was serving on the AMP advisory committee, a coalition of attorneys general filed a suit against Google for anti-competitive conduct:
Google designed AMP so that users loading AMP pages would make direct communication with Google servers, rather than publishers’ servers. This enabled Google’s access to publishers’ inside and non-public user data.
We were immediately told that we could not discuss an ongoing court case in the AMP advisory committee. That’s fair enough. But will it go both ways? Or will lawyers acting on Google’s behalf be allowed to point to the AMP advisory committee and say, “But AMP is an open source project! Look, it even resides under the banner of the OpenJS Foundation.”
If there’s even a chance of the AMP advisory committee being used as a Potempkin village, I want no part of it.
But even as I’m noping out of any involvement with Google AMP, my parting words have to be about how impressed I am with the OpenJS Foundation. Jory and Tobie have been nothing less than magnificent in their diplomacy, cat-herding, schedule-wrangling, timekeeping, and other organisational superpowers that I’m crap at.
I sincerely hope that Google isn’t taking advantage of the OpenJS Foundation’s kind-hearted trust.
This is a great talk by Nadia Eghbal on software, open source, maintenance, and of course, long-term thinking.
A people’s history of copying, from art to software.
Designers copy. We steal like great artists. But when we see a copy of our work, we’re livid.
A really nice open-source font-previewing tool for the Mac.
The transcript of David Heinemeier Hansson keynote from last year’s RailsConf is well worth reading. It’s ostensibily about open source software but it delves into much larger questions.
If we continue as we are, who will maintain the maintainers?
In the world of open source, we tend to give plaudits and respect to makers …but maintainers really need our support and understanding.
Users and new contributors often don’t see, much less think about, the nontechnical issues—like mental health, or work-life balance, or project governance—that maintainers face. And without adequate support, our digital infrastructure, as well as the people who make it run, suffer.
A free and open source neutral sans-serif typeface, released as part of version two of the design system for the US federal government.