Tags: cc

788

Monday, April 7th, 2025

Denial

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

And no, a robots.txt file doesn’t help.

If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.
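To see why a robots.txt file can't stop anyone, remember that it is only a request: the check runs in the crawler's own code, and only if the crawler bothers to run it. Here's a minimal sketch of what a polite crawler does, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the example.org URL and the `is_allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks one bot to stay away entirely.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler asks first and backs off when told no.
# Nothing on the server side enforces this answer.
print(is_allowed("ExampleBot", "https://example.org/tunes"))
print(is_allowed("SomeOtherAgent", "https://example.org/tunes"))
```

A crawler that skips the `is_allowed` call, or lies about its user agent, gets the page anyway.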

Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:

There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

My own experience with The Session bears this out.

Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries.

So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Friday, March 28th, 2025

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries - Ars Technica

As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.

Wednesday, March 26th, 2025

Go To Hellman: AI bots are destroying Open Access

AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

Sunday, March 16th, 2025

“Wait, not like that”: Free and open access in the age of generative AI

Anyone at an AI company who stops to think for half a second should be able to recognize they have a vampiric relationship with the commons. While they rely on these repositories for their sustenance, their adversarial and disrespectful relationships with creators reduce the incentives for anyone to make their work publicly available going forward (freely licensed or otherwise). They drain resources from maintainers of those common repositories often without any compensation.

Even if AI companies don’t care about the benefit to the common good, it shouldn’t be hard for them to understand that by bleeding these projects dry, they are destroying their own food supply.

And yet many AI companies seem to give very little thought to this, seemingly looking only at the months in front of them rather than operating on years-long timescales. (Though perhaps anyone who has observed AI companies’ activities more generally will be unsurprised to see that they do not act as though they believe their businesses will be sustainable on the order of years.)

It would be very wise for these companies to immediately begin prioritizing the ongoing health of the commons, so that they do not wind up strangling their golden goose. It would also be very wise for the rest of us to not rely on AI companies to suddenly, miraculously come to their senses or develop a conscience en masse.

Instead, we must ensure that mechanisms are in place to force AI companies to engage with these repositories on their creators’ terms.

Wednesday, March 12th, 2025

A woman playing box with two fiddlers, a man and a woman at a pub table with pints on it.

Wednesday session in Amsterdam

Friday, March 7th, 2025

Plane GPS systems are under sustained attack - is the solution a new atomic clock? - BBC News

A fascinating look at the modern equivalent of the Longitude problem.

Sunday, March 2nd, 2025

The web was always about redistribution of power. Let’s bring that back.

Many of us got excited about technology because of the web, and are discovering, latterly, that it was always the web itself — rather than technology as a whole — that we were excited about. The web is a movement: more than a set of protocols, languages, and software, it was always about bringing about a social and cultural shift that removed traditional gatekeepers to publishing and being heard.

Friday, February 21st, 2025

Generative AI use and human agency

You do not have to use generative AI.

AI itself cannot be held to account.

If you use AI, you are the one who is accountable for whatever you produce with it.

There are contexts in which it is immoral to use generative AI.

Correcting or fact checking generative AI may take longer than just doing a task yourself, or with conventional AI tools.

You do not have to use generative AI.

Sunday, December 15th, 2024

Century-Scale Storage

This magnificent piece by Maxwell Neely-Cohen—with some tasteful art-direction—is right up my alley!

This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century? There are plenty of worthy related subjects and discourses that this piece does not touch at all. This is not a piece about the sheer volume of data we are creating each day, and how we might store all of it. Nor is it a piece about the extremely tough curatorial process of deciding what is and isn’t worth preserving and storing. It is about longevity, about the potential methods of preserving what we make for future generations, about how we make bits endure. If you had to store something for 100 years, how would you do it? That’s it.

Thursday, November 21st, 2024

CCC | Ban tracking and personalised advertising

YES! THIS!!!

A ban on tracking-based personalised advertising will provide an incentive to reinforce sustainable alternative models and, in fact, will be a condition for making them viable. The advertising industry already has sustainable, proven concepts for effective online advertising that do not require targeted tracking and personalisation (e.g. contextual advertising).

Wednesday, November 6th, 2024

A woman playing fiddle and a man playing accordion at a pub table with the headstock of a bouzouki in the foreground.

Wednesday session

Wednesday, October 30th, 2024

W3C@30: W3C and me - YouTube

This is a lovely, lovely talk from Léonie!

Friday, September 27th, 2024

Hire HTML and CSS people

Every problem at every company I’ve ever worked at eventually boils down to “please dear god can we just hire people who know how to write HTML and CSS.”

Thursday, August 29th, 2024

80 / 20 accessibility · marcus.io

So my observation is that 80% of the subject of accessibility consists of fairly simple basics that can probably be learnt in 20% of the time available. The remaining 20% are the difficult situations, edge cases, assistive technology support gaps and corners of specialised knowledge, but these are extrapolated to 100% of the subject, giving it a bad, anxiety-inducing and difficult reputation overall.

Thursday, June 27th, 2024

How do we build the future with AI? – Chelsea Troy

This is the transcript of a fantastic talk called “The Tools We Still Need to Build with AI.”

Absorb every word!

Thursday, May 30th, 2024

Applying the four principles of accessibility

Web Content Accessibility Guidelines—or WCAG—looks very daunting. It’s a lot to take in. It’s kind of overwhelming. It’s hard to know where to start.

I recommend taking a deep breath and focusing on the four principles of accessibility. Together they spell out the cutesy acronym POUR:

  1. Perceivable
  2. Operable
  3. Understandable
  4. Robust

A lot of work has gone into distilling WCAG down to these four principles. Here’s how I apply them in my work…

Perceivable

I interpret this as:

Content will be legible, regardless of how it is accessed.

For example:

  • The contrast between background and foreground colours will meet the ratios defined in WCAG 2.
  • Content will be grouped into semantically-sensible HTML regions such as navigation, main, footer, etc.
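The contrast check is the one part of this that is pure arithmetic. Here's a minimal sketch of the WCAG 2 contrast ratio formula, assuming colours arrive as six-digit hex strings. The function names are my own, and the 4.5:1 and 3:1 thresholds are the level AA ratios for normal and large text.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to its linear value, per WCAG 2."""
    c = channel_8bit / 255
    # 0.03928 is the threshold printed in WCAG 2.
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a colour written like '#767676'."""
    hex_colour = hex_colour.lstrip("#")
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colours, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground, background, large_text=False):
    """True if the pair meets the level AA ratio (4.5:1, or 3:1 for large text)."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#000000", "#ffffff"), 2))
print(meets_aa("#767676", "#ffffff"))
```

This only covers solid colours. Text over images, gradients or semi-transparent layers still needs checking by eye.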

Operable

I interpret this as:

Core functionality will be available, regardless of how it is accessed.

For example:

  • I will ensure that interactive controls such as links and form inputs will be navigable with a keyboard.
  • Every form control will be labelled, ideally with a visible label.
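The labelling half of that list can be roughly checked by machine. Keyboard navigation really needs testing by hand. Here's a sketch using Python's standard `html.parser` that lists form controls with no label. The class and function names are hypothetical, and it only knows about `<label for>`, wrapping `<label>` elements, `aria-label` and `aria-labelledby`.

```python
from html.parser import HTMLParser

CONTROL_TAGS = {"input", "select", "textarea"}
# Input types that don't take a text label of their own.
EXEMPT_INPUT_TYPES = {"hidden", "submit", "button", "reset"}


class LabelAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.label_targets = set()  # values of <label for="...">
        self.controls = []          # (tag, id, labelled some other way?)
        self._label_depth = 0       # > 0 while inside a <label> element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label":
            self._label_depth += 1
            if attributes.get("for"):
                self.label_targets.add(attributes["for"])
        elif tag in CONTROL_TAGS:
            input_type = (attributes.get("type") or "text").lower()
            if tag == "input" and input_type in EXEMPT_INPUT_TYPES:
                return
            labelled = (self._label_depth > 0
                        or bool(attributes.get("aria-label"))
                        or bool(attributes.get("aria-labelledby")))
            self.controls.append((tag, attributes.get("id"), labelled))

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth > 0:
            self._label_depth -= 1


def unlabelled_controls(html):
    """Return (tag, id) for each form control with no label found."""
    audit = LabelAudit()
    audit.feed(html)
    audit.close()
    return [(tag, control_id)
            for tag, control_id, labelled in audit.controls
            if not labelled and control_id not in audit.label_targets]


form = ('<form><label for="q">Search</label><input id="q" type="search">'
        '<input id="email" type="email">'
        '<label>Name <input name="n"></label></form>')
print(unlabelled_controls(form))
```

It says nothing about whether a label is visible or well written, so treat an empty result as a starting point rather than a pass.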

Understandable

I interpret this as:

Content will make sense, regardless of how it is accessed.

For example:

  • Images will have meaningful alternative text.
  • I will make sensible use of heading levels.

This is where it starts to get quite collaborative. Working at an agency, there will be some parts of website creation and maintenance that will require ongoing accessibility knowledge even when our work is finished.

For example:

  • Images uploaded through a content management system will need sensible alternative text.
  • Articles uploaded through a content management system will need sensible heading levels.
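A script can't judge whether alternative text is meaningful or whether headings are sensible, but it can catch the obvious slips in content coming out of a content management system. Here's a rough sketch using Python's standard `html.parser`, with hypothetical names throughout. It flags images with no `alt` attribute at all, and headings that jump down more than one level.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ContentAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images_missing_alt = []  # src of each <img> with no alt attribute
        self.heading_levels = []      # heading levels in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An empty alt="" is allowed through: it is valid for decorative images.
        if tag == "img" and "alt" not in attributes:
            self.images_missing_alt.append(attributes.get("src"))
        elif tag in HEADING_TAGS:
            self.heading_levels.append(int(tag[1]))


def skipped_heading_levels(levels):
    """Return (previous, current) pairs where a heading skips a level."""
    return [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]


def audit_content(html):
    """Return (images missing alt, skipped heading pairs) for an HTML fragment."""
    audit = ContentAudit()
    audit.feed(html)
    audit.close()
    return audit.images_missing_alt, skipped_heading_levels(audit.heading_levels)


article = ('<h1>Tunes</h1><h3>Reels</h3>'
           '<img src="a.jpg"><img src="b.jpg" alt="A fiddle on a pub table">')
print(audit_content(article))
```

Whether the text that is there actually describes the image is still a job for a person.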

Robust

I interpret this as:

Content and core functionality will still work, regardless of how it is accessed.

For example:

  • Drop-down controls will use the HTML select element rather than a more fragile imitation.
  • I will only use JavaScript to provide functionality that isn’t possible with HTML and CSS alone.

If you’re applying a mindset of progressive enhancement, this part comes for free. If you take a different approach, you’re going to have a bad time.

Taken together, these four principles will get you very far without having to dive too deeply into the rest of WCAG.

Tuesday, May 28th, 2024

The Web Accessibility Cookbook

Manu’s book is available to pre-order now. I’ve had a sneak peek and I highly recommend it!

You’ll learn how to build common patterns written accessibly in HTML, CSS, and JavaScript. You’ll also start to understand how good and bad practices affect people, especially those with disabilities.

Monday, May 20th, 2024

Home - Sa11y

Another handy accessibility testing tool that can be used as a bookmarklet.

Monday, May 13th, 2024

Manifesto for a Humane Web

I endorse this message.

This manifesto is intended as a personal response to the current state of the web. It is a statement of intent and a call to arms, inviting you, the reader, to go forth and build humane websites, and to resist the erosion of the web we know and love.

Sunday, April 28th, 2024

Write Alt Text Like You’re Talking To A Friend – Cloud Four

This is good advice:

Write alternative text as if you’re describing the image to a friend.