Planet Grep

Planet'ing Belgian FLOSS people

Planet Grep is maintained by Wouter Verhelst. All times are in UTC.

February 20, 2025

Billions of images on the web lack proper alt-text, making them inaccessible to millions of users who rely on screen readers.

My own website is no exception, so a few weeks ago, I set out to add missing alt-text to about 9,000 images on this website.

What seemed like a simple fix became a multi-step challenge. I needed to evaluate different AI models and decide between local and cloud processing.

To make the web better, a lot of websites need to add alt-text to their images. So I decided to document my progress here on my blog so others can learn from it – or offer suggestions. This third post dives into the technical details of how I built an automated pipeline to generate alt-text at scale.


High-level architecture overview

My automation process follows three steps for each image:

  1. Check if alt-text exists for a given image
  2. Generate new alt-text using AI when missing
  3. Update the database record for the image with the new alt-text

The rest of this post goes into more detail on each of these steps. If you're interested in the implementation, you can find most of the source code on GitHub.

Retrieving image metadata

To systematically process 9,000 images, I needed a structured way to identify which ones were missing alt-text.

Since my site runs on Drupal, I built two REST API endpoints to interact with the image metadata:

  • GET /album/{album-name}/{image-name}/get – Retrieves metadata for an image, including title, alt-text, and caption.
  • PATCH /album/{album-name}/{image-name}/patch – Updates specific fields, such as adding or modifying alt-text.

I've built similar APIs before, including one for my basement's temperature and humidity monitor. That post provides a more detailed breakdown of how I built those endpoints.

This API uses separate URL paths (/get and /patch) for different operations, rather than using a single resource URL. I'd prefer to follow RESTful principles, but this approach avoids caching problems, including content negotiation issues in CDNs.

Anyway, with the new endpoints in place, fetching metadata for an image is simple:

curl -H "Authorization: test-token" \
  "https://dri.es/album/isle-of-skye-2024/journey-to-skye/get"

Every request requires an authorization token. And no, test-token isn't the real one. Without it, anyone could edit my images. While crowdsourced alt-text might be an interesting experiment, it's not one I'm looking to run today.

This request returns a JSON object with image metadata:

{
  "title": "Journey to Skye",
  "alt": "",
  "caption": "Each year, Klaas and I pick a new destination for our outdoor adventure. In 2024, we set off for the Isle of Skye in Scotland. This stop was near Glencoe, about halfway between Glasgow and Skye."
}

Because the alt field is empty, the next step is to generate a description using AI.

Generating and refining alt-text with AI

In my first post on AI-generated alt-text, I wrote a Python script to compare 10 different local Large Language Models (LLMs). The script uses PyTorch, a widely used machine learning framework for AI research and deep learning. This implementation was a great learning experience. I really enjoyed building it.

The original script takes an image as input and generates alt-text using multiple LLMs:

./caption.py journey-to-skye.jpg
{
  "image": "journey-to-skye.jpg",
  "captions": {
    "vit-gpt2": "A man standing on top of a lush green field next to a body of water with a bird perched on top of it.",
    "git": "A man stands in a field next to a body of water with mountains in the background and a mountain in the background.",
    "blip": "This is an image of a person standing in the middle of a field next to a body of water with a mountain in the background.",
    "blip2-opt": "A man standing in the middle of a field with mountains in the background.",
    "blip2-flan": "A man is standing in the middle of a field with a river and mountains behind him on a cloudy day.",
    "minicpm-v": "A person standing alone amidst nature, with mountains and cloudy skies as backdrop.",
    "llava-13b": "A person standing alone in a misty, overgrown field with heather and trees, possibly during autumn or early spring due to the presence of red berries on the trees and the foggy atmosphere.",
    "llava-34b": "A person standing alone on a grassy hillside with a body of water and mountains in the background, under a cloudy sky.",
    "llama32-vision-11b": "A person standing in a field with mountains and water in the background, surrounded by overgrown grass and trees."
  }
}

My original plan was to run everything locally for full control, no subscription costs, and optimal privacy. But after testing 10 local LLMs, I changed my mind.

I always knew cloud-based models would be better, but wanted to see if local models were good enough for alt-texts specifically. Turns out, they're not quite there. You can read the full comparison, but I gave the best local models a B, while cloud models earned an A.

While local processing aligned with my principles, it compromised the primary goal: creating the best possible descriptions for screen reader users. So I abandoned my local-only approach and decided to use cloud-based LLMs.

To automate alt-text generation for 9,000 images, I needed programmatic access to cloud models rather than relying on their browser-based interfaces — though browser-based AI can be tons of fun.

Instead of expanding my script with cloud LLM support, I switched to Simon Willison's llm tool (see https://llm.datasette.io/). llm is a command-line tool and Python library that supports both local and cloud-based models. It takes care of installation, dependencies, API key management, and uploading images. Basically, all the things I didn't want to spend time maintaining myself.
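For a sense of what this looks like in practice, here is a minimal sketch that uses llm's Python API to caption a single image. It assumes llm is installed with an OpenAI key configured and a release recent enough to support attachments; the model name and prompt are illustrative, so check the llm documentation for the exact API of your version.

import llm

# Ask a cloud vision model for a caption. The model name and prompt are
# examples, not the exact ones used by my script.
model = llm.get_model("gpt-4o")
response = model.prompt(
    "Write concise alt-text for this image.",
    attachments=[llm.Attachment(path="journey-to-skye.jpg")],
)
print(response.text())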

Despite enjoying my PyTorch explorations with vision language models and multimodal encoders, I needed to focus on results. My weekly progress goal meant prioritizing working alt-text over building homegrown inference pipelines.

I also considered you, my readers. If this project inspires you to make your own website more accessible, you're better off with a script built on a well-maintained tool like llm rather than trying to adapt my custom implementation.

Scrapping my PyTorch implementation stung at first, but building on a more mature and active open-source project was far better for me and for you. So I rewrote my script, now in the v2 branch, with the original PyTorch version preserved in v1.

The new version of my script keeps the same simple interface but now supports cloud models like ChatGPT and Claude:

./caption.py journey-to-skye.jpg --model chatgpt-4o-latest claude-3-sonnet --context "Location: Glencoe, Scotland"
{
  "image": "journey-to-skye.jpg",
  "captions": {
    "chatgpt-4o-latest": "A person in a red jacket stands near a small body of water, looking at distant mountains in Glencoe, Scotland.",
    "claude-3-sonnet": "A person stands by a small lake surrounded by grassy hills and mountains under a cloudy sky in the Scottish Highlands."
  }
}

The --context parameter improves alt-text quality by adding details the LLM can't determine from the image alone. This might include GPS coordinates, album titles, or even a blog post about the trip.

In this example, I added "Location: Glencoe, Scotland". Notice how ChatGPT-4o mentions Glencoe directly while Claude-3 Sonnet references the Scottish Highlands. This contextual information makes descriptions more accurate and valuable for users. For maximum accuracy, use all available information!

Updating image metadata

With alt-text generated, the final step is updating each image. The PATCH endpoint accepts only the fields that need changing, preserving other metadata:

curl -X PATCH \
  -H "Authorization: test-token" \
  "https://dri.es/album/isle-of-skye-2024/journey-to-skye/patch" \
  -d '{
    "alt": "A person stands by a small lake surrounded by grassy hills and mountains under a cloudy sky in the Scottish Highlands."
  }'

That's it. This completes the automation loop for one image. It checks if alt-text is needed, creates a description using a cloud-based LLM, and updates the image if necessary. Now, I just need to do this about 9,000 times.
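For completeness, here is a minimal Python sketch of that loop, wired together from the pieces shown above. It calls the caption script with a single cloud model and writes the result back through the PATCH endpoint; error handling, rate limiting, and the AI-generated label discussed in the next section are left out, and the token is of course a placeholder.

import json
import subprocess
import requests

BASE = "https://dri.es/album"
HEADERS = {"Authorization": "test-token"}  # placeholder, not the real token

def process_image(album, image, photo_path):
    # Step 1: check whether alt-text already exists.
    meta = requests.get(f"{BASE}/{album}/{image}/get", headers=HEADERS).json()
    if meta.get("alt"):
        return  # alt-text already present, nothing to do

    # Step 2: generate a description with the caption script shown earlier.
    result = subprocess.run(
        ["./caption.py", photo_path, "--model", "chatgpt-4o-latest",
         "--context", meta.get("caption", "")],
        capture_output=True, text=True, check=True)
    alt = json.loads(result.stdout)["captions"]["chatgpt-4o-latest"]

    # Step 3: write it back; PATCH only touches the fields supplied.
    requests.patch(f"{BASE}/{album}/{image}/patch",
                   headers=HEADERS, json={"alt": alt})

process_image("isle-of-skye-2024", "journey-to-skye", "journey-to-skye.jpg")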

Tracking AI-generated alt-text

Before running the script on all 9,000 images, I added a label to the database that marks each alt-text as either human-written or AI-generated. This makes it easy to:

  • Re-run AI-generated descriptions without overwriting human-written ones
  • Upgrade AI-generated alt-text as better models become available

With this approach, I can update the AI-generated alt-text when ChatGPT 5 is released. And eventually, it might allow me to return to my original principles: to use a high-quality local LLM trained on public domain data. In the meantime, it helps me make the web more accessible today while building toward a better long-term solution tomorrow.
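For illustration, the PATCH payload could carry that label as an extra field alongside the alt-text. The field name alt_source is a hypothetical example, not necessarily what my database uses:

{
  "alt": "A person stands by a small lake surrounded by grassy hills and mountains under a cloudy sky in the Scottish Highlands.",
  "alt_source": "claude-3-sonnet"
}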

Next steps

Now that the process is automated for a single image, the last step is to run the script on all 9,000. And honestly, it makes me nervous. The perfectionist in me wants to review every single AI-generated alt-text, but that is just not feasible. So, I have to trust AI. I'll probably write one more post to share the results and what I learned from this final step.

Stay tuned.

February 19, 2025

From submission to religious techno-fascism

Stupid code generators

On Mastodon, David Chisnall takes stock of a year of using GitHub Copilot to write code. The verdict is clear: while he initially felt he was saving time by typing less, that time is more than lost to the hours, even days, needed to debug subtle bugs that would never have occurred had he written the code himself in the first place, or that he would at least have caught much faster.

He then realizes that the hard, time-consuming part is not writing the code, it is knowing what to write and how. If you have to read through AI-generated code to understand it, that is harder for the programmer than writing everything yourself.

"Yes, but for generating the not-very-clever code"

Here I agree with David 100%: if your project requires writing dumb code that has already been written a thousand times elsewhere, you have a problem. And solving it by having an AI write that code is just about the worst thing you could do.

As I say at conferences: ChatGPT looks useful to people who can't type. Want to be productive? Learn touch typing!

Where ChatGPT is very strong, on the other hand, is at pretending to write code. Producing progress tables for its work, claiming that everything is almost ready and will be on WeTransfer. It is obviously fake: ChatGPT has learned how to scam!

In short, ChatGPT has become the perfect Julius.

Ed Zitron hammers the point home even further: ChatGPT and its ilk are "successes" because the entire press speaks of them only in glowing terms, whether out of stupidity or corruption. In reality, the number of paying users is incredibly small and, like Trump, Sam Altman addresses us as if we were idiots who will swallow the biggest lies without flinching. And the media and the CEOs applaud…

Maybe we really are complete idiots. Dozens of scientific papers now mention "vegetative electron microscopy". The term means nothing. Where does it come from?

It comes quite simply from a 1959 paper laid out in two columns that entered the training corpus as a single column!

What this anecdote teaches us is, first, that bullshit generators are even worse than we imagine, but above all that our world is already full of this crap! LLMs merely apply to online content what industry has done to everything else: tools, clothes, food. Produce as much as possible while lowering quality as much as possible. Then lower it even further.

The removal of the filters

The printing press moved communication from "one to one" to "one to many", making the Catholic Church obsolete as the tool the powerful in the West used to impose their discourse on the population. The first consequence of the printing press was, in fact, Protestantism, which explicitly claimed for everyone the ability to interpret the word of God and thus to create their own discourse to spread: "one to many".

As Victor Hugo points out in Notre-Dame de Paris, "the press will kill the church".

The direct consequences of the printing press: the Renaissance, then the Enlightenment. Anyone who thinks can spread their ideas and draw on those already in circulation. Humans no longer have to reinvent the wheel; they can build on what exists. Education takes precedence over obedience.

After a few centuries of "one to many" comes the next step: the Internet. From "one to many" we move to "many to many". There is no longer any limit to spreading one's ideas: everyone can broadcast to everyone.

A logical consequence that escaped me at the time of my earlier post is that if everyone wants to talk, nobody listens anymore. Like many, I believed "many to many" would be incredibly positive. The sad reality is that the vast majority of us don't have much to say, yet we still want to be heard. So we shout. We generate noise. We drown out whatever is interesting despite it all.

The investment needed to print a book, together with the low direct return, acts as a filter. Only those who truly want to publish a book will do so.

The durability of the book as an object and the relative slowness of its transmission imply a second filter: the least interesting books are quickly forgotten. That is also why we sometimes idealize the past, in literature, cinema and music alike: because only the best works have reached us, because we have forgotten the dreary junk that flopped or enjoyed only fleeting success.

Although highly imperfect, and probably filtering out very good things we have sadly lost, the barrier to entry and the dilution over time kept us from sinking into cacophony.

The failure of the democratization of speech

By allowing unlimited "many to many", the Internet made both of these filters inoperative. Anyone can post at zero cost. Worse: the platforms' addiction mechanisms have made it easier to post than not to post. Digital media also blur the temporal boundary: a piece of content is either perfectly preserved or disappears entirely. As a result, old content resurfaces as if it were new and nobody notices. The temporal filter has completely disappeared.

From a possibility, "many to many" has turned into an obligation. To exist, we must be seen, heard. We must have an audience. Take selfies and share them. Receive likes that are sold to us at a steep price.

"Many to many" has thus turned out to be a catastrophe, perhaps not in principle but in its implementation. Instead of a second Renaissance, we are sliding into decadence, into a second Middle Ages. The frustration of being able to speak yet not being heard is immense.

Olivier Ertzscheid goes even further: for him, ChatGPT exists precisely to give us the impression of being listened to when nobody listens to us anymore. From "many to many", we have moved to "many to nobody".

Using ChatGPT to get information turns into using ChatGPT to get confirmation of one's own beliefs, as political journalist Nils Wilcke observes.

I am tired of repeating it, but ChatGPT and its kind are bullshit generators explicitly designed to tell you what you want to hear. That "ChatGPT said so" can pass as a political argument on a TV set without anyone batting an eye is the illustration of a total, generalized cretinism.

Religious techno-fascism

"Many to nobody" is in itself a return to the old order. Nobody listens to the populace anymore. Only the great lords have the tool to impose their view. The Catholic Church was replaced by the press and the media, themselves replaced by social networks and ChatGPT. ChatGPT, which is ultimately nothing but an automated instance of a priest who hears your confession before telling you what is good and what is evil, based on the orders he receives from above.

In a very good post on the Gemini network, small patata realizes that the incoherence of fascism is not a bug; it is its mode of operation, its essence. A random, permanent incoherence that lets weak minds see whatever they want to see through pareidolia and that breaks the strongest minds. By shattering all logic and coherence, fascism allows morons to free themselves from intelligence and to take control over rational minds. The legendary pigeon that craps on the chessboard and knocks over the pieces before declaring victory.

ChatGPT's incoherence is not a bug that will someday be fixed! On the contrary, it is what makes it successful with weak minds who, by taking "prompt engineering" courses, feel they are regaining a little control over their lives and acquiring a little power over reality. That is the essence of every scam: promising people in a position of intellectual weakness that they will miraculously regain power.

Small patata draws the link with the Surrealists, who tried to fight fascism through art, and sees in Surrealism a far more effective way to fight bullshit generators.

It must be said that, faced with a global bullshit generator that is fascist, centralized, ultra-capitalist and basking in religious adulation, I see no way out other than Surrealism.

Let us brandish what humanity we have left! To souls, citizens!

Image taken from small patatas' gemlog: The Triumph of Surrealism, Max Ernst (1937)

I am Ploum and I have just published Bikepunk, an eco-cycling fable typed entirely on a mechanical typewriter. To support me, buy my books (from your local bookshop if possible)!

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

February 18, 2025

Does success exist?

The notion of a blog's success

A blogger I really like, Gee, looks back on his 10 years of blogging. It fascinates me to see behind the scenes of other creators. Gee thinks he made the mistake of not riding the wave of enthusiasm his Geektionnerd enjoyed and of not using it to do more promotion.

I disagree with Gee: he did exactly the right thing by getting on with his life without worrying about success. Waves of enthusiasm come and go; they are very brief. The public moves on very quickly. Chasing permanent buzz is the surest recipe for losing yourself. It is a full-time profession: marketing. Too many artists and creators have turned to marketing, hoping to obtain a fraction of the success achieved by people with no talent other than marketing.

But you forget that the perception of success is itself part of the marketing plan. You think so-and-so is successful? You have no idea. You can't even define "success". It is a confused intuition. Making people believe you are successful is part of the lie!

For many people in my wider circle, I suddenly became a successful writer because… I appeared on television during prime time. For those people who know me, I went from "guy who writes vague books nobody has heard of" to "real, well-known writer who appears on TV". For the many who have delegated to television the power to elevate individuals to the rank of "celebrity", I am successful. In their eyes, I could dream of nothing more, except perhaps appearing on TV regularly and becoming a "star".

In my daily life, and in the eyes of all the (too rare) people who do not unconsciously idolize television, those TV appearances changed strictly nothing. I certainly sold a few hundred more books. But does that make me a "success"?

A few months ago, I was invited as an expert for the filming of a TV show about the importance of protecting your personal data online. During a break, I asked the host what else he did for a living. He looked at me, surprised, and replied: "I present the evening news." It probably didn't happen to him very often anymore, not being recognized. Half of Belgium must know who he is. We laughed and I explained that I don't own a television.

Question: is that person a "success"?

Success is fleeting

At 12, on holiday with my parents, I found a book abandoned on a table at the hotel reception: "Tantzor" by Paul-Loup Sulitzer. I devoured it, and I was clearly not the only one. Paul-Loup Sulitzer was the fashionable writer of the moment. According to Wikipedia, he has sold nearly 40 million books in 40 languages, including his best-known novel, "Money". At the time he was living the life of a flamboyant billionaire.

Thirty years later, ruined, he published the sequel to Money: "Money 2". It sold fewer than 1,300 copies. Adored, idolized, mocked, parodied hundreds of times, Sulitzer has quite simply fallen into total oblivion.

While "success" remains a vague and abstract notion, one thing is certain: it has to be maintained constantly. It is never truly secured. While we can still understand "making a fortune" as "having more money than you can spend" (and therefore no longer needing to earn any), success itself cannot be measured. It cannot be managed rationally.

Which indicators?

In his post, Gee is also surprised to have received far fewer entries for the blog's fifth-anniversary contest than for the first-anniversary one, despite a supposedly larger audience.

Again, success is a matter of perception. What kind of success do we want? Interesting interactions? Numerous interactions (which contradicts the previous one)? Sales? Revenue? Numbers on a visit counter, like the websites of the last century?

There is no single definition of success. In fact, I don't know anyone, starting with myself, who is satisfied with their success. By human nature, we are eternally unsatisfied. We are jealous of what we think we see in others ("He's on TV!") and disappointed by our own achievements ("I was on TV, but it actually changed nothing in my life").

Writing into the void

Maybe that is why I love the Gemini network so much. It is, by its very nature, the anti-success network. When you publish on Gemini, you genuinely feel that nobody is going to read you, which gives you real freedom.

Some of my blog posts go viral on the web. I have no statistics, but I can see them circulating on Mastodon and hitting the front page of Hacker News. But if I didn't visit Hacker News or Mastodon, I wouldn't know. I would feel just as much like I was writing into the void as I do on Gemini.

At the other extreme, some of my posts don't seem to attract "likes", "shares", "votes" or "comments". Yet I receive many emails about them. From people who want to dig into the subject, to think along with me. Or to thank me for the reflection. This is especially true on the Gemini network, which seems to attract people who favour direct exchange. I myself often pull out my mail client to spontaneously reply to a personal post read on Gemini. The most frequent reaction to those messages is: "Wow, I didn't think anyone was reading me!"

So I ask you: which type of post is, in your opinion, the more "successful"?

Does the notion of success really mean anything? Can one ever have enough success?

I am Ploum and I have just published Bikepunk, an eco-cycling fable typed entirely on a mechanical typewriter. To support me, buy my books (from your local bookshop if possible)!

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

February 11, 2025

Last week, I wrote about my plan to use AI to generate 9,000 alt-texts for images on my website. I tested 12 LLMs — 10 running locally and 2 cloud-based — to assess their accuracy in generating alt-text for images. I ended that post with two key questions:

  1. Should I use AI-generated alt-texts, even if they are not perfect?
  2. Should I generate these alt-texts with local LLMs or in the cloud?

Since then, I've received dozens of emails and LinkedIn comments. The responses were all over the place. Some swore by local models because they align with open-source values. Others championed cloud-based LLMs for better accuracy. A couple of people even ran tests using different models to help me out.

I appreciate every response. It's a great reminder of why building in the open is so valuable – it brings in diverse perspectives.

But one comment stood out. A visually impaired reader put it simply: Imperfect alt-text is better than no alt-text.

That comment made the first decision easy: AI-generated alt-text, even if not perfect, is better than nothing.

The harder question was which AI models to use. As a long-term open-source evangelist, I really want to run my own LLMs. Local AI aligns with my values: no privacy concerns, no API quotas, more transparency, and more control. They also align with my wallet: no subscription fees. And, let's be honest – running your own LLMs earns you some bragging rights at family parties.

But here is the problem: local models aren't as good as cloud models.

Most laptops and consumer desktops have 16–32GB of RAM, which limits them to small, lower-accuracy models. Even maxing out an Apple Mac Studio with 192GB of RAM doesn't change that. Gaming GPUs? Also a dead end, at least for me. Even high-end cards with 24GB of VRAM struggle with the larger models unless you stack multiple cards together.

The gap between local and cloud hardware is big. It's like racing a bicycle against a jet engine.

I could wait. Apple will likely release a new Mac Studio this year, and I'm hoping it supports more than 192GB of RAM. NVIDIA's Digits project could make consumer-grade LLM hardware even more viable.

Local models are also improving fast. Just in the past few weeks:

  • Alibaba released Qwen 2.5 VL, which performs well in benchmarks.
  • DeepSeek launched DeepSeek-VL2, a strong new open model.
  • Mark Zuckerberg shared that Meta's Llama 4 is in testing and might be released in the next few months.

Consumer hardware and local models will continue to improve. But even when they do, cloud models will still be ahead. So, I am left with this choice:

  1. Prioritize accessibility: use the best AI models available today, even if they're cloud-based.
  2. Stick to Open Source ideals: run everything locally, but accept worse accuracy.

A reader, Kris, put it well: Prioritize users while investing in your values. That stuck with me.

I'd love to run everything locally, but making my content accessible and ensuring its accuracy matters more. So, for now, I'm moving forward with cloud-based models, even if it means compromising on my open-source ideals.

It's not the perfect answer, but it's the practical one. Prioritizing accessibility and end-user needs over my own principles feels like the right choice.

That doesn't mean I'm giving up on local LLMs. I'll keep testing models, tracking improvements, and looking for the right hardware upgrades. The moment local AI is good enough for generating alt-text, I'll switch – no hesitation. In my next post, I'll share my technical approach to making this work.

In search of the perfect disconnection

A retrospective of my quest for focus

A first disconnection

At the end of 2018, exhausted by promoting the Ulule campaign for my book "Les aventures d'Aristide, le lapin cosmonaute" and realizing how addicted I was to social media, I decided to "disconnect".

A big word for simply banning myself from social media and news sites for three months.

The first effect made itself felt very quickly, with the uninstallation of the app I used most at the time: Pocket.

The experiment was above all an awakening. I discovered that as soon as I got bored, I mechanically opened a web browser without even thinking about it. It was literally a reflex.

I began to perceive the difference between information and "noise". Hyperconnection is, like tobacco, both an addiction and a pollution. A notion that would become essential to my thinking.

While I was trying to take in less noise, my wife pointed out that I was still trying to generate some by posting on networks I no longer read. I was being inconsistent.

As is often the case with this kind of experiment, I came out of it with no desire to "reconnect". But, of course, I very quickly slipped back into my old habits.

The problem of hyperconnection was now clear in my head. I am an addict, and that addiction harms me in every respect.

The techno-solutionist period

Faced with the realization of the scale of the problem, my first reflex was to look for a technical, technological solution. Many people are in the same situation and, while this step is far from sufficient, it is essential: sorting through the digital tools we use. I realized that the Apple universe, which I was living in at the time after receiving a MacBook from my employer, was both contrary to my values and completely incompatible with any form of digital sobriety, since it pushes consumption. This dichotomy between my philosophy and my daily life created a tension that I tried to relieve through overconnection. It was time for me to move back entirely to Linux.

I also bought a phone so crappy and buggy that I never feel like using it (no, don't buy it).

Concretely, that first disconnection was also the opportunity to finish my serial "Printeurs" and to write a few short stories. The serial caught a publisher's interest and I published my first novel in 2020.

Another concrete action I undertook was deleting as many online accounts as possible. I didn't know it yet, but I would end up discovering and deleting nearly 500 of them, and it would take me almost three years. Most of them I had forgotten even existed, but for some, the step was significant.

In parallel, I discovered the minimalist Gemini protocol. Using it, an idea started running through my head: working completely offline. I had found that blocking certain sites is not enough: I automatically find alternatives to procrastinate on, alternatives that are sometimes even less interesting. So I wanted to explore total disconnection. I started writing my personal diary on a typewriter.

Second disconnection: an attempt at a disconnected year

On January 1, 2022, three years after the end of my first disconnection, I embarked on an attempt at a completely disconnected year. The idea was to use my computer only offline, in my office, and to synchronize it once a day. All of this was made possible by a piece of software I developed in the final months of 2021: Offpunk.

Of course, connectivity was still needed for certain actions, which I set out to time and record. I wrote, in real time, an account of this disconnection and, against all expectations, those writings seemed to fascinate readers.

Better prepared and much more ambitious (too ambitious?), this disconnection ultimately failed after less than six months.

The lesson was a hard one: it is nearly impossible to disconnect in any structural way in today's society. We are constantly asked to carry out actions online, actions that require time but not always focus. Everything is now optimized to keep us online.

My disconnection was a failure. The book about that disconnection is unfinished. Another manuscript I worked on during that period is in an unusable state. Still, I used the time to write a few short stories and finish my collection "Stagiaire au spatioport Omega 3000 et autres joyeusetés que nous réserve le futur".

As a direct consequence of this disconnection, my WhatsApp account disappeared. My Twitter account soon followed.

I also realized that my WordPress blog was no longer in line with my philosophy at all. Alongside my work on Offpunk, I completely rewrote my blog to turn it into an "offline" tool.

The second return to normality

In early 2023, I isolated myself to start writing Bikepunk, which would be published in 2024. I alternated between periods of total disconnection and periods of hyperconnection.

The only social network where I kept an account, Mastodon, started attracting attention. I was very present there and, philosophically, I can only support and encourage everyone trying to leave X and Meta. I fell back into hyperconnection. An ethical hyperconnection, but hyperconnection all the same.

For two years, I used the LeechBlock Firefox extension, which allows only a limited amount of time per day on certain websites. It worked reasonably well for a while, until I acquired the reflex of disabling the plugin without even thinking about it.

As happens every three years, it is time for me to start a new cycle and question my habits.

One of my main lessons is that any change in my mental behaviour has to be accompanied, in my case, by a physical change. My mind follows my body's reflexes. I still sometimes mechanically type into the Firefox address bar the first letters of procrastination sites I haven't visited in ten years!

The second lesson is that radicality leads to a harder relapse. Connectivity is needed every day, unpredictably. I don't want to isolate myself, but to design a sustainable way of working. To create new reflexes.

A third disconnection

For my "2025 disconnection", I therefore made a big decision: I bought an armchair to replace my desk chair. Throughout my studies and my first professional years, I only had salvaged chairs. In the spring of 2008, with a stable salary and an apartment, I bought a new desk chair: the cheapest one at Ikea. That chair, patched up with worn-out cushions my in-laws no longer wanted, was still the one I was using until a few days ago. This new armchair is therefore a very big change for me.

And I promised myself to use it only while disconnected.

To do this, I disabled Wi-Fi in my computer's BIOS. I also set up a "standing desk" in a corner of the room, with an RJ-45 cable running to it. If I want to go online, I physically have to stand up and plug in a cable. Everything I have to do online is now done standing up. When I am sitting (or sprawled, to be more accurate), I am offline.

I have also taken other small measures. First of all, my todos are no longer stored on my computer but on index cards pinned to a cork board. Quite an irony for anyone who remembers that I spent several years developing the software "Getting Things GNOME".

I am also rethinking how I handle my email. I love receiving emails from my readers and I find it very hard not to reply. Then to reply to the reply to my reply. With the success of Bikepunk, my mail has grown, and I sometimes find myself at the end of the day realizing that I have… "answered my emails". Enriching discussions, certainly, but time-consuming ones. In many cases, I repeat across several emails what could be a blog post. Consider your email read; my answer will feed my upcoming blog posts. Some future posts will deal with topics I don't usually cover but about which I receive a great many questions.

On Mastodon, which I now only check while standing, I decided to put all the accounts I follow into a list, a list I configured not to appear in my timeline. When I check Mastodon, I therefore only see my own posts, and I have to take an extra action if I want to see what is being said (which I no longer do every day). As before, notifications are regularly "emptied".

If you want to follow this blog, use the RSS feed or one of my two newsletters.

In search of boredom

"Disconnection" is a big word for simply saying that I will no longer be connected 100% of the time. But such is the era we live in. Cal Newport talks about the incredible productivity of the writer Brandon Sanderson, who created a 70-person company dedicated to a single activity: letting him write as much as possible!

While the example is extreme, Cal is surprised that we don't see more structures designed to foster focus and creativity. In an age where permanent hyperdistraction is the norm, we have to fight and build the tools to concentrate. And to be bored. Above all, to be bored. Because boredom is essential for thinking and creating.

Besides, if I hadn't been bored, I would never have written this post! We'll take stock in three years, for my fourth disconnection…

I am Ploum and I have just published Bikepunk, an eco-cycling fable typed entirely on a mechanical typewriter. To support me, buy my books (from your local bookshop if possible)!

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

February 06, 2025

On technological decadence and technophile Luddites

The value of plain text

Thierry is trying to publish his blog on the Gemini network but struggles with the minimalist format. Which is precisely, for me, the best part of the Gemini protocol.

The Gemini format imposes, as in a book, pure text. You can add a title, subheadings, links and quotations, but with one important particularity: these apply to a whole line, not to a fragment of text. Links must therefore sit on their own line rather than getting lost and proliferating inside the text. Since they interrupt the reading between two paragraphs, they have to be made explicit and justified rather than hidden away for a casual click.

It is also impossible to put italics or bold in your text. Which is an excellent thing. As Neal Stephenson reminds us in "In the Beginning... Was the Command Line", random mixes of bold and italics have no place in a text. Pick up a book and try to find bold text in the body. There isn't any, and for a good reason: it means nothing, it disrupts reading. But when Microsoft Word appeared, it made bolding text easier than creating proper headings. Just as the AZERTY keyboard suddenly made people believe that capital letters should not carry accents, the technological tool impoverished our relationship with text.

Because the need to grab attention in the middle of a text is a confession of the author's insecurity. The text must stand on its own. It is up to the reader to choose what to highlight, not the author. Adorning a text with useless artifice to try to fill its gaps has a name: decadence.

Bold, WordArt, Comic Sans MS, PowerPoints sent by email: all are decadent texts trying to camouflage the emptiness or inanity of their content.

The inexorable decadence of tech

Text is just one example among many.

Thierry also asks himself many questions about the notions of low-tech and high-tech, notably in medicine. But the term "low-tech" is, in my view, misleading. I am a technophile Luddite. Contrary to legend, the Luddites were not opposed to technology at all. They were opposed to the ownership of technology by the bourgeois class, which turned specialized artisans into interchangeable slaves of the machines. The Luddites did not try to destroy technologically advanced looms; they destroyed machines their bosses were using to exploit them.

In the same way, I am not opposed to centralized social networks or chatbots because they are "high tech", but because they are technologies actively used to impoverish us, both intellectually and financially. That is even their only avowed goal.

Using AI to detect cancers earlier? I find the idea wonderful. But I also know it is impossible in the current context. Not for technical reasons. But because, used properly, it will cost more than no AI at all. AI can indeed help by detecting cancers the doctor missed. That requires a double diagnosis, by both the doctor and the AI, and asking questions whenever the two disagree. You have to pay for the AI on top of the doctor's extra work, since the doctor will have to put in more hours reviewing the "divergent" diagnoses to find either their own error or the AI's. AI is a tool that can be useful if you accept that it costs a great deal more.

That's the theory.

In practice, such a technology is sold on the pretext of "saving money". It will inevitably lead to doctors paying less attention and, to justify the costs, to less time spent on each human diagnosis. Losing experience and practice, doctors' diagnoses will become less and less reliable and, as a knock-on effect, new doctors will be less and less well trained. Cancers missed by the AI will no longer be caught by humans. Since the AI is trained on diagnoses made by humans, it too will become less and less competent and will end up validating itself. In the end, you don't need to be a genius to see that while the technology is interesting, its use in our socio-economic context can only turn out to be catastrophic, and is only interesting for the people selling the AI.

The high-tech lie

Proponents of "low tech" have the intuition that "high tech" is out to exploit them. They are right on the substance, wrong about the cause. The crux of the problem is not technology, but its decadence.

The technology race is a bubble built on a lie. The idea is not to build something durable, but to make people believe you will build it in order to attract investors. The NASDAQ companies have become one enormous Ponzi scheme. They try to prop each other up with millions, but they all lose huge amounts of money, which they manage to hide thanks to their stock prices.

Besides, serious research confirms my intuition: the more people understand what lies behind "artificial intelligence", the less they want it. AI is literally a trap for the ignorant. And its producers have understood this very well: they do not want us to understand what they are doing.

Ed Zitron keeps pressing the point with the unexpected arrival of DeepSeek, the Chinese ChatGPT that is simply 30 times cheaper. To the question "Why didn't OpenAI and the others manage to do it cheaper?", he offers the retrospectively obvious answer: "Because these companies had no interest in being cheaper. The more money they lose, the more they justify that what they do is expensive, the more they attract investors and scare off potential competitors." In short: because they are completely decadent!

Cory Doctorow often talks about enshittification; I propose we talk instead about "technological decadence". We produce the most expensive, most complex and least ecological technology possible out of sheer reflex. As with Roman orgies, complexity and cost are no longer obstacles but the primary goals we strive for.

This also explains why technology turns completely against its users. Recently, an elderly lady wanted to show me, on her phone, a post she had seen on her Facebook account. Half of her gigantic phone screen was literally a static advertisement for a car. In the other half, she scrolled through a mix of more car ads and what was probably content. Her phone had a gigantic screen, but only a fraction of it served the user. And even then, not entirely.

The car is in itself the perfect example of decadence: from a tool, it has become a symbol that must be as big, as heavy and as flashy as possible. Which leads to infernal complexity in both public and private space. Most houses of recent decades are built as rooms around a garage. Cities as buildings around road junctions. The car has become the true citizen of our cities; humans are merely its servants. The Web is following the same trajectory, with bots replacing cars.

The frenzy around artificial intelligence is the archetype of this decadence. Because while the new tools clearly have uses and can clearly help in certain contexts, we are in the opposite situation: looking for a problem to apply the tool to.

Back to the concept of usefulness

This is also why Gemini fascinates me so much. It is the most direct tool for transmitting text from my brain to a reader's. By opening the door to bold, then italics, then images and JavaScript, the Web became a decadent jungle. Authors publish there and then, without caring whether they are read, eagerly check their click and like statistics. Text is increasingly optimized for those statistics. Before being automated by bots, bots that, in order to train themselves, read the texts online and automatically generate clicks.

The loop of technological decadence is closed: content is read and generated by the same machines. The capitalist owner class has managed to fully automate both its workers (the content creators) and its customers (those who do the clicking).

I do not want to serve the platform owners. I do not want to consume this bland, inhuman automated content. I try to understand the consequences of my technological habits in order to extract the maximum usefulness from them with the fewest possible negative consequences.

Faced with technological decadence, I have become a technophile Luddite.

I am Ploum and I have just published Bikepunk, an eco-cycling fable typed entirely on a mechanical typewriter. To support me, buy my books (from your local bookshop if possible)!

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

February 03, 2025

I have 10,000 photos on my website. About 9,000 have no alt-text. I'm not proud of that, and it has bothered me for a long time.

When I started my blog nearly 20 years ago, I didn't think much about alt-texts. Over time, I realized their importance for visually impaired users who rely on screen readers.

The past 5+ years, I diligently added alt-text to every new image I uploaded. But that only covers about 1,000 images, leaving most older photos without descriptions.

Writing 9,000 alt-texts manually would take ages. Of course, AI could do this much faster, but is it good enough?

To see what AI can do, I tested 12 Large Language Models (LLMs): 10 running locally and 2 in the cloud. My goal was to determine whether they can generate accurate alt-text.

The TL;DR is that, not surprisingly, cloud models (GPT-4o, Claude 3.5 Sonnet) set the benchmark with A-grade performance, though not 100% perfect. I prefer local models for privacy, cost, and offline use. Among local options, the Llama variants and MiniCPM-V perform best. Both earned a B grade: they work reliably but sometimes miss important details.

I know I'm not the only one. Plenty of people — entire organizations even — have massive backlogs of images without alt-text. I'm determined to fix that for my blog and share what I learn along the way. This blog post is just step one — subscribe by email or RSS to get future posts.


Models evaluated

I tested alt-text generation using 12 AI models: 9 on my MacBook Pro with 32GB RAM, 1 on a higher-RAM machine (thanks to Jeremy Andrews, a friend and long-time Drupal contributor), and 2 cloud-based services.

The table below lists the models I tested, with details like links to research papers, release dates, parameter sizes (in billions), memory requirements, some architectural details and more:

#  | Model                       | Launch date | Type          | Vision encoder           | Language decoder    | Model size | RAM    | Deployment
1  | VIT-GPT2                    | 2021        | Image-to-text | ViT (Vision Transformer) | GPT-2               | 0.4B       | ~8GB   | Local, Dries
2  | Microsoft GIT               | 2022        | Image-to-text | Swin Transformer         | Transformer Decoder | 1.2B       | ~8GB   | Local, Dries
3  | BLIP Large                  | 2022        | Image-to-text | ViT                      | BERT                | 0.5B       | ~8GB   | Local, Dries
4  | BLIP-2 OPT                  | 2023        | Image-to-text | CLIP ViT                 | OPT                 | 2.7B       | ~8GB   | Local, Dries
5  | BLIP-2 FLAN-T5              | 2023        | Image-to-text | CLIP ViT                 | FLAN-T5 XL          | 3B         | ~8GB   | Local, Dries
6  | MiniCPM-V                   | 2024        | Multi-modal   | SigLip-400M              | Qwen2-7B            | 8B         | ~16GB  | Local, Dries
7  | LLaVA 13B                   | 2024        | Multi-modal   | CLIP ViT                 | Vicuna 13B          | 13B        | ~16GB  | Local, Dries
8  | LLaVA 34B                   | 2024        | Multi-modal   | CLIP ViT                 | Vicuna 34B          | 34B        | ~32GB  | Local, Dries
9  | Llama 3.2 Vision 11B        | 2024        | Multi-modal   | Custom Vision Encoder    | Llama 3.2           | 11B        | ~20GB  | Local, Dries
10 | Llama 3.2 Vision 90B        | 2024        | Multi-modal   | Custom Vision Encoder    | Llama 3.2           | 90B        | ~128GB | Local, Jeremy
11 | OpenAI GPT-4o               | 2024        | Multi-modal   | Custom Vision Encoder    | GPT-4               | >150B      |        | Cloud
12 | Anthropic Claude 3.5 Sonnet | 2024        | Multi-modal   | Custom Vision Encoder    | Claude 3.5          | >150B      |        | Cloud

How image-to-text models work (in less than 30 seconds)

LLMs come in many forms, but for this project, I focused on image-to-text and multi-modal models. Both types of models can analyze images and generate text, either by describing images or answering questions about them.

Image-to-text models follow a two-step process: vision encoding and language decoding:

  1. Vision encoding: First, the model breaks an image down into patches. You can think of these as "puzzle pieces". The patches are converted into mathematical representations called embeddings, which summarize their visual details. Next, an attention mechanism filters out the most important patches (e.g. the puzzle pieces with the cat's outline or fur texture) and eliminates less relevant details (e.g. puzzle pieces with plain blue skies).
  2. Language decoding: Once the model has summarized the most important visual features, it uses a language model to translate those features into words. This step is where the actual text (image captions or Q&A answers) is generated.

In short, the vision encoder sees the image, while the language decoder describes it.

If you look at the table above, you'll see that each row pairs a vision encoder (e.g., ViT, CLIP, Swin) with a language decoder (e.g., GPT-2, BERT, T5, Llama).

For a more in-depth explanation, I recommend Sebastian Raschka's article Understanding Multi-modal LLMs, which also covers how image encoders work. It's fantastic!
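To make the encoder/decoder split concrete, here is roughly what captioning an image with one of the smaller models from the table looks like using the Hugging Face transformers pipeline. This is a sketch for illustration; the checkpoint named is the public BLIP model, which may not be the exact configuration my script uses.

from transformers import pipeline

# BLIP pairs a ViT vision encoder with a BERT-style text decoder.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")
result = captioner("test-images/image-1.jpg")
print(result[0]["generated_text"])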

Comparing different AI models

I wrote a Python script that generates alt-texts for images using nine different local models. You can find it in my GitHub repository. It takes care of installing models, running them, and generating alt-texts. It supports both Hugging Face and Ollama and is built to be easily extended as new models come out.

You can run the script as follows:

$ ./alt-text ./test-images/image-1.jpg

The first time you run the script, it will download all models, which requires significant disk space and bandwidth — expect to download over 50GB of model data.

The script outputs a JSON response, making it easy to integrate or analyze programmatically. Here is an example output:

{
  "image": "test-images/image-1.jpg",
  "alt-texts": {
    "vit-gpt2": "A city at night with skyscrapers and a traffic light on the side of the street in front of a tall building.",
    "git": "A busy city street is lit up at night, with the word qroi on the right side of the sign.",
    "blip": "This is an aerial view of a busy city street at night with lots of people walking and cars on the side of the road.",
    "blip2-opt": "An aerial view of a busy city street at night.",
    "blip2-flan": "An aerial view of a busy street in tokyo, japanese city at night with large billboards.",
    "minicpm-v": "A bustling cityscape at night with illuminated billboards and advertisements, including one for Michael Kors.",
    "llava-13b": "A bustling nighttime scene from Tokyo's famous Shibuya Crossing, characterized by its bright lights and dense crowds of people moving through the intersection.",
    "llava-34b": "A bustling city street at night, filled with illuminated buildings and numerous pedestrians.",
    "llama32-vision-11b": "A bustling city street at night, with towering skyscrapers and neon lights illuminating the scene."
  }
}
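Several of the larger models in that list (the LLaVA and Llama variants) are served through Ollama. For reference, here is one way to query such a model directly from Python with the ollama package; the model tag and prompt are illustrative, and the exact response shape can vary between library versions.

import ollama

# Send the image to a locally running Ollama model and print its caption.
response = ollama.chat(
    model="llava:13b",
    messages=[{
        "role": "user",
        "content": "Describe this image in one sentence for use as alt-text.",
        "images": ["test-images/image-1.jpg"],
    }],
)
print(response["message"]["content"])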

Test images

With the script ready, I decided to test it on some of my 10,000 photos. Not all of them at once. I picked five that I consider non-standard. Instead of simple portraits or landscapes, I picked photos with elements that might confuse or challenge the models.

One photo is from the Isabella Stewart Gardner Museum in Boston and features an empty gold frame. The frame once held a masterpiece stolen in the infamous 1990 heist, one of the biggest art thefts in history. I wanted to see if the models would recognize it as empty or mistake it for a framed painting.

Another photo, taken last summer in Vermont, shows a wakeboarder. Though he is the main subject, he is relatively small in the frame. I was curious to see if the models could still recognize him as the focal point.

In another photo, a backgammon game is set in a dark but cozy atmosphere. I was curious to see if the models could recognize partially visible objects and capture the mood of the scene.

To ensure a fair test, I stripped all EXIF metadata from the images. This includes any embedded captions, GPS coordinates, or other details that could inadvertently help the models.
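For reference, one quick way to do that stripping with Pillow is to re-save each JPEG without passing its EXIF data along (exiftool's -all= option is an alternative). A minimal sketch:

from PIL import Image

# Re-saving without an exif= argument writes a new JPEG with no EXIF block
# (GPS, embedded captions, camera info). quality=95 keeps re-encoding loss low.
img = Image.open("test-images/image-1.jpg")
img.save("test-images/image-1-stripped.jpg", quality=95)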

Yes, I know that a test set of five images is small, but it's sufficient to identify the top models for further evaluation. With 12 models generating alt-texts for each photo, I had to manually evaluate 60 alt-texts, which was tedious to do fairly and accurately. For now, these five images are enough to filter out weaker models and pinpoint the best 2-3 LLMs for more detailed testing.

Next, I'll share the detailed results for each of the five photos. It's a lot to go through, so feel free to skip ahead to the summary.

Test image 1: Shibuya Crossing, Tokyo

Benchmark LLMs:

Model Description Grade
GPT-4o A vibrant city intersection with crowds crossing, surrounded by illuminated billboards and towering buildings, likely Shibuya, Tokyo. A
Claude Hundreds of people cross Tokyo's famous Shibuya intersection as neon signs light up the surrounding skyscrapers. A

Local LLMs:

Model Description Grade
VIT-GPT2 A city at night with skyscrapers and a traffic light on the side of the street in front of a tall building. F
GIT A busy city street is lit up at night, with the word qroi on the right side of the sign. E
BLIP This is an aerial view of a busy city street at night with lots of people walking and cars on the side of the road. D
BLIP-2 with OPT An aerial view of a busy city street at night. E
BLIP-2 with FLAN An aerial view of a busy street in Tokyo, Japanese city at night with large billboards. E
LLaVA (13B) A bustling nighttime scene of Tokyo's Shibuya Crossing, with neon lights and people walking across the intersection. A
LLaVA (34B) A bustling city street at night, with illuminated buildings and numerous pedestrians. C
MiniCPM-V A bustling cityscape at night with various advertisements, including one for Michael Kors and another for Yakult. C
Llama 3.2 Vision (11B) A bustling city street at night, with numerous billboards and advertisements lining the buildings. C
Llama 3.2 Vision (90B) A bustling city street with many people and buildings. D

Test image 2: Isabella Stewart Gardner Museum, Boston

Benchmark LLMs:

Model Description Grade
GPT-4o Elegant wallpaper, an empty gold frame, candelabra, and framed painting in a classic room. B
Claude Ornate Victorian room with damask wallpaper, candelabra, and gold frames on wall. B

Local LLMs:

Model Description Grade
VIT-GPT2 A living room with a mirror, candles, and a vase of flowers on a table in front of a mirror. F
GIT A picture frame is hanging on a wall next to a vase and a vase with the word tulips on it. E
BLIP A room with a painting on the wall and two vases on the table in front of it. E
BLIP-2 with OPT A room with a painting on the wall, a picture frame, and a chandelier. C
BLIP-2 with FLAN A room with a painting on the wall and a couple of framed pictures hanging on the wall next to it. D
LLaVA (13B) An empty wall with ornate decorations, including candlesticks and framed artwork, suggesting a formal or historical setting. D
LLaVA (34B) An ornate room with framed pictures on the wall, a chandelier hanging from the ceiling, and a fireplace mantel adorned with decorative items. E
MiniCPM-V A room with ornate wallpaper, candlesticks and framed portraits of historical figures is displayed.
Llama 3.2 Vision (11B) An empty gold frame on a wall with ornate wallpaper, surrounded by other decorative items. A
Llama 3.2 Vision (90B) An ornate room with a gold frame, a painting of a man, and a vase on a table. B

Test image 3: wakeboarding in Vermont, USA

Benchmark LLMs:

Model Description Grade
GPT-4o Two shirtless boys on a boat watch a person wakeboarding in the distance on a cloudy day. A
Claude People watching from a boat as a person is wakeboarding on a lake with a wooded shoreline. B

Local LLMs:

Model Description Grade
VIT-GPT2 A man standing on top of a boat next to another man holding a surfboard in one hand and a surfboard in the other. E
GIT Two men are in a boat, one of them is wearing an orange hat and the other is wearing an orange hat. D
BLIP There are two men riding on the back of a boat in the water, one of them is on a surfboard and the other is on a board. D
BLIP-2 with OPT Three young men sitting on the back of a boat. D
BLIP-2 with FLAN A group of people sitting on a boat watching a man ride a surfboard in the middle of the water. C
LLaVA (13B) Two shirtless men on a boat, watching another man surfing in the water. B
LLaVA (34B) Two shirtless men on a boat, watching another person water skiing. B
MiniCPM-V Two shirtless men on a boat watch as another person water skis away from them, with trees and sky visible across the lake behind him. B
Llama 3.2 Vision (11B) Two shirtless men on a boat watching people wakeboarding or surfing behind it. B
Llama 3.2 Vision (90B) Two men on a boat watching another man wakeboarding. A

Test image 4: hiking in the Dolomites, Italy

Benchmark LLMs:

Model Description Grade
GPT-4o A wooden shrine with a statue of Mary and directional signs, set against mountains and green hills. A
Claude Wooden shrine of St. Magdalena with trail signs in the Dolomites mountains of Italy. A

Local LLMs:

Model Description Grade
VIT-GPT2 A bird perched on top of a wooden post in front of a mountain range with a view of a mountain range and mountains. F
GIT A wooden sign in front of a mountain with a statue on it that says "Meise museum" on the top. D
BLIP There is a birdhouse with a statue on top of it in the middle of a field with trees and mountains in the background. D
BLIP-2 with OPT A statue of Mary on a wooden post in the middle of a field with mountains in the background. B
BLIP-2 with FLAN A statue of the Virgin Mary sits on top of a wooden post in the middle of a mountain. C
LLaVA (13B) A picturesque scene of a small wooden chapel with a statue of Jesus, set against a backdrop of majestic mountains and lush greenery. C
LLaVA (34B) A wooden shrine with religious iconography, surrounded by alpine scenery and signposts. B
MiniCPM-V A wooden shrine with a statue of Mary and several directional signs pointing to various locations such as Kirchsteig, Magdalena, St. B
Llama 3.2 Vision (11B) A wooden shrine with a statue of Mary and a signpost in front of mountains. A
Llama 3.2 Vision (90B) A statue of Mary in a wooden shrine with a signpost pointing to various locations, including Rundweg St. B

Test image 5: backgammon by candlelight

Benchmark LLMs:

Model Description Grade
GPT-4o A cozy, dimly lit room with a candle on a wooden table, next to a backgammon board, creating a warm, rustic ambiance. B
Claude Two candles light up a game board for backgammon on a wooden table at night. A

Local LLMs:

Model Description Grade
VIT-GPT2 A candle is lit on a wooden table in front of a fire place with candles and other items on top of it. E
GIT Two candles are lit next to each other on a table, one of them is lit up and the other is lit up. D
BLIP There is a lit candle sitting on top of a wooden table next to a game board and a glass of wine on the table. D
BLIP-2 with OPT A candle sits on top of a wooden table. C
BLIP-2 with FLAN A candle sits on a wooden table next to a backgammon board and a glass of wine. D
LLaVA (13B) A cozy indoor setting with candles, a guitar, and a warm ambiance. D
LLaVA (34B) A candlelit wooden table with musical instruments and a cozy ambiance. C
MiniCPM-V A dimly lit room with candles and backgammon pieces on a wooden table, creating an atmosphere of relaxation or leisure activity. A
Llama 3.2 Vision (11B) A dimly lit room with a wooden table, featuring a backgammon board and two candles. A
Llama 3.2 Vision (90B) A candle and backgammon board on a wooden table. B

Model accuracy

I evaluated each description using a structured but subjective scoring system. For each image, I identified the two or three most important objects the AI should recognize and include in its description. I also assessed whether the model captured the photo's mood, which can be important for visually impaired users. Finally, I deducted points for repetition, grammar errors, or hallucinations (invented details). Each alt-text received a score from 0 to 5, which I then converted to a letter grade from A to F.
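
For readers who want to reproduce the grading, a small sketch of the score-to-grade conversion is shown below. The exact cut-offs are not stated in the post; the thresholds here are only inferred from the summary table that follows.

def to_grade(score):
    # Thresholds inferred from the summary table, not stated in the post.
    if score >= 4.5:
        return "A"
    if score >= 3.5:
        return "B"
    if score >= 2.5:
        return "C"
    if score >= 1.0:
        return "D"
    return "F"

assert to_grade(4.8) == "A" and to_grade(3.8) == "B" and to_grade(0.4) == "F"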

Model Repetitions Hallucinations Moods Average score Grade
VIT-GPT2 Often Often Poor 0.4/5 F
GIT Often Often Poor 1.6/5 D
BLIP Often Often Poor 1.8/5 D
BLIP2 w/OPT Rarely Sometimes Fair 2.6/5 C
BLIP2 w/FLAN Rarely Sometimes Fair 2.2/5 D
LLaVA 13B Never Sometimes Good 3.2/5 C
LLaVA 34B Never Sometimes Good 3.2/5 C
MiniCPM-V Never Never Good 3.8/5 B
Llama 11B Never Rarely Good 4.4/5 B
Llama 90B Never Rarely Good 3.8/5 B
GPT-4o Never Never Good 4.8/5 A
Claude 3.5 Sonnet Never Never Good 5/5 A

The cloud-based models, GPT-4o and Claude 3.5 Sonnet, performed nearly perfectly on my small test of five images, with no major errors, hallucinations, or repetitions, and with excellent mood detection.

Among local models, both Llama variants and MiniCPM-V show the strongest performance.

Repetition in descriptions frustrates users of screen readers. Early models like VIT-GPT2, GIT, BLIP, and BLIP2 frequently repeat content, making them unsuitable.

Hallucinations can be a serious issue in my opinion. Describing nonexistent objects or actions misleads visually impaired users and erodes trust. Among the best-performing local models, MiniCPM-V did not hallucinate, while Llama 11B and Llama 90B each made one mistake. Llama 90B misidentified a cabinet at the museum as a table, and Llama 11B described multiple people wakeboarding instead of just one. While these errors aren't dramatic, they are still frustrating.

Capturing mood is essential for giving visually impaired users a richer understanding of images. While early models struggled in this area, all recent models performed well. This includes both LLaVA variants and MiniCPM-V.

From a practical standpoint, Llama 11B and MiniCPM-V ran smoothly on my 32GB RAM laptop, but Llama 90B needed more memory. Long story short, this means that Llama 11B and MiniCPM-V are my best candidates for additional testing.

Possible next steps

The results raise a tough question: is a "B"-level alt-text better than none at all? Many human-written alt-texts probably aren't perfect either. Should I wait for local models to reach an "A" grade, or ship imperfect descriptions now?

Here are four possible next steps:

  1. Combine AI outputs – Run the same image through different models and merge their results to create more accurate descriptions (a rough sketch of this idea follows this list).
  2. Wait and upgrade – Use the best local model for now, tag AI-generated alt-texts in the database, and refresh them in 6–12 months when new and better local models are available.
  3. Go cloud-based – Get the best quality with a cloud model, even if it means uploading 65GB of photos. I can't explain why, or if the feeling is even justified, but it feels like giving in.
  4. Hybrid approach – Use AI to generate alt-texts but review them manually. With 9,000 images, that is not practical. I'd need a way to flag the alt-texts most likely to be wrong. Can an LLM give me a reliable confidence score?
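
To make options 1 and 4 a bit more concrete, here is a rough, library-free sketch of the idea, not a tested implementation. run_llm is a placeholder for whatever model call you prefer, and the overlap heuristic is only one crude way to approximate a confidence score.

def merge_prompt(caption_a, caption_b):
    # Option 1: ask a "judge" model to keep only what both descriptions agree on.
    return ("Two AI models described the same photo:\n"
            f"1. {caption_a}\n2. {caption_b}\n"
            "Write one concise alt-text containing only details both agree on.")

def needs_review(caption_a, caption_b, threshold=0.3):
    # Option 4: a crude confidence proxy; low word overlap suggests a hallucination.
    a, b = set(caption_a.lower().split()), set(caption_b.lower().split())
    return len(a & b) / max(len(a | b), 1) < threshold

# merged = run_llm(merge_prompt(llama_caption, minicpm_caption))  # hypothetical call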

Each option comes with trade-offs: some are quick but imperfect, others take more work but might be worth it. Going cloud-based is the easiest path. Waiting for better models is effortless but delays progress. Merging AI outputs or assigning a confidence score takes more effort but might strike the best balance of speed and accuracy.

Maybe the solution is a combination of these options? I could go cloud-based now, tag the AI-generated alt-texts in my database, and regenerate them in 6–12 months when LLMs have gotten even better.

It also comes down to pragmatism versus principle. Should I stick to local models because I believe in data privacy and Open Source, or should I prioritize accessibility by providing the best possible alt-text for users? The local-first approach better aligns with my values, but it might come at the cost of a worse experience for visually impaired users.

I'll be weighing these options over the next few weeks. What would you do? I'd love to hear your thoughts!

Update: I've answered these questions in my follow-up blog post: I want to run AI locally. Here's why I'm not (yet).

January 31, 2025

Treasure hunters, we have an update! Unfortunately, some of our signs have been removed or stolen, but don’t worry—the hunt is still on! To ensure everyone can continue, we will be posting all signs online so you can still access the riddles and keep progressing. However, there is one exception: the 4th riddle must still be heard in person at Building H, as it includes an important radio message. Keep your eyes on our updates, stay determined, and don’t let a few missing signs stop you from cracking the code! Good luck, and see you at Infodesk K with…

January 29, 2025

Are you ready for a challenge? We’re hosting a treasure hunt at FOSDEM, where participants must solve six sequential riddles to uncover the final answer. Teamwork is allowed and encouraged, so gather your friends and put your problem-solving skills to the test! The six riddles are set up across different locations on campus. Your task is to find the correct locations, solve the riddles, and progress to the next step. No additional instructions will be given after this announcement; it’s up to you to navigate and decipher the clues! To keep things fair, no hints or tips will be given…

What if we stopped being good little obedient consultants?

The exam nightmare

I regularly wake up at night with a knot in my stomach and a surge of panic at the thought that I haven't studied for my university exam. I haven't sat an exam in 20 years, and yet I'm still traumatized by them.

So I try to give my students the least stressful exam possible. If a student is truly lost, I use the adrenaline inherent to an exam to try to teach them the concepts. Sometimes I ask one student to teach the material to another. I do impose a few dress-code rules: ties are forbidden, but everything else is encouraged. I've had students in bathrobes, a student in the traditional dress of his country and, still unsurpassed, a student in a full Minnie Mouse costume (ears, make-up, shoes, the works!). This year I got… a banana!

A student takes his exam dressed as a banana.

The East India Company's monopoly

I also encourage students to bring their own exam topic.
One of my students suggested this article, which compares Google to the East India Company which, like every empire, ended up collapsing under its own weight. I like the analogy and the moral: you only gain power by making enemies. It's when you believe you hold the most power that you have the most enemies, people with nothing to lose and a thirst for revenge. Trump, Facebook, Google. They are at the top. But every day the ranks of the rebels are swelled by those who believed that their allegiance and submission would earn them a fraction of power or wealth, and who were disappointed. Because absolute power is not shared. By definition, it never is.

Granted, the article is quite naive in some respects. It says, for example, that AT&T did not exploit its dominant position because it followed a certain ethic. That's wrong. AT&T did not exploit its dominant position simply because the company was under threat of a lawsuit for abuse of dominant position. Fear of that lawsuit is what made the success of UNIX (developed by AT&T) and of the Internet possible. When IBM started to dominate the nascent computer market, fear of a lawsuit is what enabled the standardization of the PC we know today and the emergence of the software industry that Microsoft rushed into.

But I've already told you that story:

Unfortunately, everything changed in the 1980s with the Reagan presidency (the Trump of his day). His advisers established the idea that monopolies are not so harmful after all, that they are even rather good for the economy (especially the finances of politicians holding shares in those monopolies). As a result, they were prosecuted far less, and even encouraged. Hence the successes of Microsoft, Google and Facebook which, despite the lawsuits, were never broken up and never had to change their practices.

If you're reading this, it probably sounds absurd: how can anyone justify that monopolies are not harmful, just to enrich politicians? What's the trick?

The secret? There is no trick. No need to justify anything. You just do it. And for all the practical aspects of any law, however absurd and unjust it may be, you simply bypass the scrupulous civil servants and have everything done by consulting firms. Well, mostly one: McKinsey.

McKinsey and the naivety of goodness

As a student, I attended a McKinsey recruitment evening. I didn't have much hope, since they announced they only took those with the best grades (which I was far from having), but I figured you never know. I had no idea what McKinsey was or what they did; I just knew it was a kind of Holy Grail, since they only took the best.

Sitting in a lecture hall, I watched a presentation of real "cases". A McKinsey employee, who announced she had done the same studies as me a few years earlier (but with much better results), presented her work. The job was to carry out the merger of two entities whose names had been hidden. The screen showed columns of "resources" for each entity, then how the merger made it possible to save on those resources.

At first I was a bit lost in the jargon. I asked a few questions and eventually understood that the "resources" were employees. That what I was looking at was, above all, a brutal layoff plan. I interrupted the presentation to ask how the ethical aspects were taken into account. I got a boilerplate answer about how "ethics are paramount at McKinsey, they follow strict rules". I insisted, I dug deeper. Of the fifty or so students attending, I was the only one to speak up, the only one to be surprised (I talked about it with others afterwards; nobody seemed to have seen the problem). I asked the presenter to give me an example of one of these famous McKinsey ethics rules. And I got this answer, which has stayed engraved in my memory: "A McKinsey consultant must always favor the interest of their client, no matter what."

After the presentation, I went to find the consultant in question. Over canapés, I pressed once more on ethics. She served up the same spiel. I told her that wasn't what I was talking about. That all her columns of figures were people who were going to lose their jobs, that the merger would have a major economic impact on thousands of families, and that I wondered how that aspect was being considered.

She opened her mouth. Her face fell. And the brilliant engineer who had completed the hardest studies with the best grades answered me:

"I had never thought about that…"

Even the supposedly most intelligent people don't think. They obey. "Recruiting only the best" was not a recruitment technique, but a way of creating a veneer of elitism that kept the lucky chosen ones from asking themselves questions.

"I had never thought about that…"

In my exam, I had a particularly brilliant student. I told her that, given her extremely fine understanding, I expected her to question things more, and above all to push back against others who are less brilliant but more sure of themselves. She is clearly smarter than I am, so why doesn't she speak up to point out when I'm being inconsistent? The world needs intelligent people who ask questions. She defended herself: "But I've always been taught to do the opposite!"

In a very good article, Garrison Lovely looks back at McKinsey's strategy.

The fact that people who had taken part in a protest against Trump then became central pieces [as McKinsey consultants] of his deportation policy is, in a sense, all you need to know.
McKinsey executes; it doesn't make policy.

The author asks what would have stopped McKinsey from optimizing the supply of barbed wire for concentration camps. The answer comes back: "McKinsey has values". Values that are taught and repeated during "Values Days". Twenty years after my own experience of a McKinsey evening, nothing has changed.

The naivety of the good

The problem with good, subtle people is that they cannot imagine that the scam is fundamentally dishonest and not subtle at all.

They try to understand, to explain, to justify.

There is nothing to understand: the dishonest pursue their profit directly and unsubtly. It's so obvious that even the most subtle let it slide, telling themselves it must be hiding "something else".

I see posts going by saying that Trump or Musk are doing illegal things. That they don't respect the rules.

Well, exactly. That's the whole point.

Or that Trump can't possibly have cheated in the elections because "someone" would have objected. Someone? But who? Those who obey their bosses without asking questions because it's their job? Those who are afraid of losing their position and prefer to go easy on the man who won the election? Or those who, on the contrary, figure they can cut a good deal by stroking the winner's ego?

If Trump had been convicted for his involvement in the January 6 insurrection, everyone would have piled on and fought over his remains. But even the judges involved knew he could become president again. That he would use his power to punish anyone involved in his conviction. It was less risky to support Trump than to oppose him. Most people, even the most powerful, especially the most powerful, are sheep terrorized by the stick and on the lookout for the smallest carrot.

The only people who can speak out are those who have a strong moral compass, who have nothing to lose, who have nothing to gain, who have the strength and energy to be outraged, the time to do it, and the networks to be heard. I insist on the logical "and": all of these conditions must be met. And you have to admit, that doesn't leave many people.

Especially when you realize that this "not many people" is mostly made up of idealists who refuse to believe that the person across from them could be so utterly devoid of morals and scruples. So, like fools, they try to make themselves heard… on X or on Facebook, platforms owned by the very people they are trying to fight.

Those who complain on these platforms feel like they are doing something, but they are algorithmically locked into their little bubble where they will have no impact on the rest of the world.

"Good" and "gullible" start with the same letter (in French: "bon" and "bête"). We get what we deserve. Simply keeping an account on X after Elon Musk's takeover was a virtual vote for Trump. Everyone knew it. You knew it. You couldn't not know it. It's just that, like a good McKinsey consultant, you told yourself that "it's not that bad", that "there are rules, right?". If you're reading me, you are, like many, a good person and therefore incapable of imagining that Elon Musk could simply and very openly have manipulated his social network to favor Trump.

Better late than never

But it's never too late to react. February 1st has been announced as "Global Switch Day". You are invited to migrate from X to Mastodon.

On February 1st, migrate from X to Mastodon.

Mastodon is becoming a foundation. It's good to see that Eugen, the creator of Mastodon who for a while prided himself on the title of "CEO of Mastodon", realizes that the pressure is enormous, that he isn't playing in the same league, and that Mastodon is a common good. By calling him "CEO", Meta was flattering him to get his cooperation. Eugen seems to have understood that he was losing himself. There is also an excellent interview with Renaud, a Mastodon developer.

Thierry Crouzet draws the comparison with the resistance fighters.

In parallel, Dansup is developing Pixelfed, which looks like Instagram. What's great is that you can follow people on Mastodon from Pixelfed and vice versa (well, in theory; we'll have to come back to that, because from Pixelfed you currently only see Mastodon posts that contain an image, and I hope that will change).

Pixelfed has attracted so many people disgusted with Meta (Instagram's owner) that Dansup has been besieged by investors eager to put money into his "company".

On February 1st, migrate from Instagram to Pixelfed.

Too bad for them: like Mastodon, Pixelfed is a public good. It is and will be funded by donations. Dansup is, by the way, launching a Kickstarter campaign:

Signal and messaging

During my exam, most students got questions about the Fediverse or about Signal, which let me probe how they use social networks and messaging apps. Funny fact: they are all on networks where they think "everyone is". But, without talking to each other, they don't agree on which network that is. I had students who swear by Instagram and others who have never had an account. I had one student who is on Facebook Messenger and Signal but has never felt the need to be on Whatsapp. Next to him, another student had simply never heard of Signal. Only Discord seems to be unanimous.

Those who were using Signal all said they wished it were more widely used. Well, February 1st is the opportunity!

On February 1st, migrate from Whatsapp to Signal.

So maybe it's time to stop playing the good little McKinsey consultant! Especially if you're not being paid for it…

I'm Ploum and I've just published Bikepunk, an ecological cycling fable typed entirely on a mechanical typewriter. To support me, buy my books (if possible from your local bookshop)!

Receive my writing in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

January 27, 2025

Core to the Digital Operational Resilience Act is the notion of a critical or important function. When a function is deemed critical or important, DORA expects the company or group to take precautions and measures to ensure the resilience of the company and the markets in which it is active.

But what exactly is a function? When do we consider it critical or important? Is there a differentiation between critical and important? Can an IT function be a critical or important function?

Defining functions

Let's start with the definition of a function. Surely that is defined in the documents, right? Right?

Eh... no. The DORA regulation does not seem to provide a definition for a function. It does however refer to the definition of critical function in the Bank Recovery and Resolution Directive (BRRD), aka Directive 2014/59/EU. That's one of the regulations that focuses on resolution in case of severe disruptions, bankruptcy or other failures of banks at a national or European level. A Delegated regulation, EU 2016/778, further provides several definitions that inspired the DORA regulation as well.

In the latter document, we do find the definition of a function:

‘function’ means a structured set of activities, services or operations that are delivered by the institution or group to third parties irrespective from the internal organisation of the institution;

Article 2, (2), of Delegated regulation 2016/778

So if you want to be blunt, you could state that an IT function which only supports its own group (as in, you're not insourcing the IT of other companies) is not a function, and thus cannot be a "critical or important function" from DORA's viewpoint.

That is, unless you find that the definitions in previous regulations do not necessarily imply the same interpretation within DORA. After all, DORA does not amend the EU 2016/778 regulation. It amends EC 1060/2009, EU 2012/648, EU 2014/600 aka MiFIR, EU 2014/909 aka CSDR and EU 2016/1011 aka the Benchmark Regulation. But none of these have a definition for 'function' at first sight.

So let's humor ourselves and move on. What is a critical function? Is that defined in DORA? Not really; sort of. DORA defines "critical or important function" as a single term, but let's first look at the more specific definitions.

In the BRRD regulation, this is defined as follows:

‘critical functions’ means activities, services or operations the discontinuance of which is likely in one or more Member States, to lead to the disruption of services that are essential to the real economy or to disrupt financial stability due to the size, market share, external and internal interconnectedness, complexity or cross-border activities of an institution or group, with particular regard to the substitutability of those activities, services or operations;

Article 2, (35), of BRRD 2014/59

This builds on the definition of a function and adds an evaluation of whether it is crucial for the economy, especially if it were suddenly discontinued. The third-party aspect of a function is also confirmed by guidance that the European Single Resolution Board published, namely that "the function is provided by an institution to third parties not affiliated to the institution or group".

The preamble of the Delegated regulation also mentions that its focus is on safeguarding financial stability and the real economy. It gives examples of potential critical functions such as deposit taking, lending and loan services, payment, clearing, custody and settlement services, wholesale funding markets activities, and capital markets and investments activities.

Of course, your IT is supporting your company, and in the case of financial institutions, IT is a very big part of the company. Is IT then not involved in all of this?

It sure is...

Defining services

The Delegated regulation EU 2016/778 in its preamble already indicates that functions are supported by services:

Critical services should be the underlying operations, activities and services performed for one (dedicated services) or more business units or legal entities (shared services) within the group which are needed to provide one or more critical functions. Critical services can be performed by one or more entities (such as a separate legal entity or an internal unit) within the group (internal service) or be outsourced to an external provider (external service). A service should be considered critical where its disruption can present a serious impediment to, or completely prevent, the performance of critical functions as they are intrinsically linked to the critical functions that an institution performs for third parties. Their identification follows the identification of a critical function.

Preamble, (8), Delegated regulation 2016/778

IT within an organization is certainly offering services to one or more of the business units within that financial institution. Once the company has defined its critical functions (or for DORA, "critical or important functions"), it will need to create a mapping of all assets and services that are needed to realize those functions.

Out of that mapping, it is very well possible that several IT services will be considered critical services. I myself am involved in the infrastructure side of things, which often consists of shared services. The delegated regulation already points to it, and a somewhat older guideline from the Financial Stability Board has the following to say about critical shared services:

a critical shared service has the following elements: (i) an activity, function or service is performed by either an internal unit, a separate legal entity within the group or an external provider; (ii) that activity, function or service is performed for one or more business units or legal entities of the group; (iii) the sudden and disorderly failure or malfunction would lead to the collapse of or present a serious impediment to the performance of, critical functions.

FSB guidance on identification of critical functions and critical shared services

For IT organizations, it is thus most important to focus on the services they offer.

Definition of critical or important function

Within DORA, the definition of critical or important function is as follows:

(22) ‘critical or important function’ means a function, the disruption of which would materially impair the financial performance of a financial entity, or the soundness or continuity of its services and activities, or the discontinued, defective or failed performance of that function would materially impair the continuing compliance of a financial entity with the conditions and obligations of its authorisation, or with its other obligations under applicable financial services law;

Article 3, (22), DORA

If we compare this definition with the previous ones about critical functions, we notice that it is extended with an evaluation of the impact on the company rather than on the market. I think it is safe to say that this is the "or important" part of "critical or important function": whereas a function is critical if its discontinuance has market impact, a function is important if its discontinuance causes material impairment to the company itself.

Hence, we can consider a critical or important function as one with either market impact (critical) or company impact (important), while still being externally offered (function).

This broader definition means that DORA puts forward more expectations than previous regulation, which is one of the reasons DORA is so impactful for financial institutions.

Implications towards IT

From the above, I'd wager that IT itself is not a "critical or important function", but IT offers services which could be supporting critical or important functions. Hence, it is necessary that the company has a good mapping of the functions and their underlying services, operations and systems. From that mapping, we can then see if those underlying services are crucial for the function or not. If they are, then we should consider those as critical or important systems.

This mapping is mandated by DORA as well:

Financial entities shall identify all information assets and ICT assets, including those on remote sites, network resources and hardware equipment, and shall map those considered critical. They shall map the configuration of the information assets and ICT assets and the links and interdependencies between the different information assets and ICT assets.

Article 8, (4), DORA

as well as:

As part of the overall business continuity policy, financial entities shall conduct a business impact analysis (BIA) of their exposures to severe business disruptions. Under the BIA, financial entities shall assess the potential impact of severe business disruptions by means of quantitative and qualitative criteria, using internal and external data and scenario analysis, as appropriate. The BIA shall consider the criticality of identified and mapped business functions, support processes, third-party dependencies and information assets, and their interdependencies. Financial entities shall ensure that ICT assets and ICT services are designed and used in full alignment with the BIA, in particular with regard to adequately ensuring the redundancy of all critical components.

Article 11, paragraph 2, DORA

In more complex landscapes, it is quite possible that the mapping is a multi-layered view with different types of systems or services in between, which can make the effort to identify which services are critical or important quite challenging.

For instance, it could be that the IT organization has a service catalog, but that this service catalog is too broadly defined to carry the critical or important label. Making a more fine-grained service catalog will be necessary to properly evaluate the dependencies, but that also implies that your business (which has defined its critical or important functions) will need to indicate which fine-grained services it depends on, rather than the high-level ones.
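
To make the layered view a bit more concrete, here is an illustrative sketch; DORA does not prescribe any data model, and the class and service names below are invented for the example. The idea is simply to record which services each function depends on and to propagate criticality down the dependency chain.

from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    depends_on: list = field(default_factory=list)   # nested Service objects

@dataclass
class Function:
    name: str
    critical_or_important: bool
    services: list = field(default_factory=list)

def critical_services(functions):
    # Walk every dependency of every critical or important function.
    found = set()
    def walk(service):
        found.add(service.name)
        for dep in service.depends_on:
            walk(dep)
    for fn in functions:
        if fn.critical_or_important:
            for svc in fn.services:
                walk(svc)
    return found

storage = Service("shared storage")
payments_app = Service("payments application", depends_on=[storage])
print(critical_services([Function("payment processing", True, [payments_app])]))
# {'payments application', 'shared storage'}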

In later posts, I'll probably dive deeper into this layered view.

Feedback? Comments? Don't hesitate to get in touch on Mastodon.

January 26, 2025

The regular FOSDEM lightning talk track isn't chaotic enough, so this year we're introducing Lightning Lightning Talks (now with added lightning!). Update: we've had a lot of proposals, so submissions are now closed! Thought of a last minute topic you want to share? Got your interesting talk rejected? Has something exciting happened in the last few weeks you want to talk about? Get that talk submitted to Lightning Lightning Talks! This is an experimental session taking place on Sunday afternoon (13:00 in k1105), containing non-stop lightning fast 5 minute talks. Submitted talks will be automatically presented by our Lightning…

January 21, 2025

I'm often asked, "Will AI agents replace digital marketers and site builders?" The answer is yes, at least for certain kinds of tasks.

To explore this idea, I prototyped two AI agents to automate marketing tasks on my personal website. They update meta descriptions to improve SEO and optimize tags to improve content discovery.

Watching the AI agents in action is incredible. In the video below, you'll see them effortlessly navigate my Drupal site — logging in, finding posts, and editing content. It's a glimpse into how AI could transform the role of digital marketers.

The experiment

I built two AI agents to help optimize my blog posts. Here is how they work together:

  • Agent 1: Content analysis: This agent finds a blog post, reviews its content, and suggests improved summaries and tags to enhance SEO and increase discoverability.
  • Agent 2: Applying updates: After manual approval, this agent logs into the site and updates the summary and tags suggested by the first agent.

All of this could be done in one step, or with a single agent, but keeping a 'human-in-the-loop' is good for quality assurance.

This was achieved with just 120 lines of Python code and a few hours of trial and error. As the video demonstrates, the code is approachable for developers with basic programming skills.

The secret ingredient is the browser_use framework, which acts as a bridge between various LLMs and Playwright, a framework for browser automation and testing.
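
As an illustration of the wiring, a minimal browser_use agent can look roughly like this. The task text and login URL are placeholders, and the exact constructor arguments may differ between browser_use versions, so treat this as a sketch of the idea rather than the actual 120-line script.

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    agent = Agent(
        task=("Log in at https://example.com/user/login, "        # placeholder site
              "find the blog post about the DrupalCon Lille 2023 keynote, "
              "and suggest an improved meta description and tags."),
        llm=ChatOpenAI(model="gpt-4o"),
    )
    history = await agent.run()   # the agent drives Playwright step by step
    print(history)

asyncio.run(main())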

The magic and the reality check

What makes this exciting is the agent's ability to problem-solve. It's almost human-like.

Watching the AI agents operate my site, I noticed they often face the same UX challenges as humans. It likely means that the more we simplify a CMS like Drupal for human users, the more accessible it becomes for AI agents. I find this link between human and AI usability both striking and thought-provoking.

In the first part of the video, the agent was tasked with finding my DrupalCon Lille 2023 keynote. When scrolling through the blog section failed, it adapted by using Google search instead.

In the second part of the video, it navigated Drupal's more complex UI elements, like auto-complete taxonomy fields, though it required one trial-and-error attempt.

The results are incredible, but not flawless. I ran the agents multiple times, and while they performed well most of the time, they aren't reliable enough for production use. However, this field is evolving quickly, and agents like this could become highly reliable within a year or two.

Native agents versus explorer agents

In my mind, agents can be categorized as "explorer agents" or "native agents". I haven't seen these terms used before, so here is how I define them:

  • Explorer agents: These agents operate across multiple websites. For example, an agent might use Google to search for a product, compare prices on different sites, and order the cheapest option.
  • Native agents: These agents work within a specific site, directly integrating with the CMS to leverage its APIs and built-in features.

The browser_use framework, in my view, is best suited for explorer agents. While it can be applied to a single website, as shown in my demo, it's not the most efficient approach.

Native agents that directly interact with the CMS's APIs should be more effective. Rather than imitating human behavior to "search" for content, the agent could retrieve it directly through a single API call. It could then programmatically propose changes within a CMS-supported content editing workflow, complete with role-based permissions and moderation states.
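
As a sketch of that idea, assuming Drupal's JSON:API module is enabled, a native agent could fetch and update a post with two HTTP calls instead of driving the UI. The site URL, credentials, post title, and field names below are placeholders, not an actual setup.

import requests

SITE = "https://example.com"                 # placeholder site
AUTH = ("agent", "application-password")     # placeholder credentials

# One API call finds the post instead of "searching" the UI like a human would.
response = requests.get(
    f"{SITE}/jsonapi/node/article",
    params={"filter[title]": "My DrupalCon Lille 2023 keynote"},  # placeholder title
    headers={"Accept": "application/vnd.api+json"},
    auth=AUTH,
)
article = response.json()["data"][0]

# Propose an updated summary; a real workflow would route this through
# content moderation so a human can approve it first.
body = article["attributes"].get("body") or {}
body["summary"] = "Improved meta description proposed by the agent."
requests.patch(
    f"{SITE}/jsonapi/node/article/{article['id']}",
    json={"data": {"type": "node--article", "id": article["id"],
                   "attributes": {"body": body}}},
    headers={"Content-Type": "application/vnd.api+json"},
    auth=AUTH,
)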

I can also imagine a future where native agents and explorer agents work together (hybrid agents), combining the strengths of both approaches to unlock even greater opportunities.

Next steps

A next step for me is to build a similar solution using Drupal's AI agent capabilities. Drupal's native AI agents should make finding and updating content more efficient.

Of course, other digital marketing use cases might benefit from explorer agents. I'd be happy to explore these possibilities as well. Let me know if you have ideas.


Conclusions

Building an AI assistant to handle digital marketing tasks is no longer science fiction. It's clear that, soon, AI agents will be working alongside digital marketers and site builders.

These tools are advancing rapidly and are surprisingly easy to create, even though they're not yet perfect. Their potential disruption is both exciting and hard to fully understand.

As the Drupal community, we need to stay ahead by asking questions like: are we fully imagining the disruption AI could bring? The future is ours to shape, but we need to rise to the challenge.

January 20, 2025

Don't come telling me you weren't warned…

…it's just that you thought you weren't concerned

For decades, I have been one of those people trying to sound the alarm about the terrifying possibilities offered by the technological blindness we are immersed in.

I believed I had to explain, to inform, again and again.

I am discovering, with dread, that even those who understand what I am saying do not act. Or act in the opposite direction. Trump voters, for the most part, know very well what is coming. Artists defend Facebook and Spotify. The politicians furthest to the left stay glued to X as their only window on the world. And yet, they have been warned!

It's just that they believe they are not concerned. It's just that we naively think it only happens to other people. That we are, one way or another, among those who will be the privileged ones.

I am a man. White. Cisgender. With a very good degree. A very good situation. In one of the most protected, most democratic places. In short, I will be among the very last to suffer the combined effects of politics and technology.

And I am afraid. I am terrified.

I am afraid of permanent surveillance

We voluntarily, and almost consciously, submit to permanent surveillance. I have already told you that even drinks vending machines do facial recognition!

As Post Tenebras Lire points out, reality looks more and more like my novel Printeurs.

But reality has completely overtaken fiction, the most emblematic case being what is happening to the Uyghurs, who are monitored constantly and for whom the slightest image posted on social media (even one deleted years ago) can serve as an excuse to be locked up.

Flee social networks? It's too late for them, because simply not having a smartphone is considered suspicious. Just as, in France, using Signal has already been treated as sufficient incriminating evidence to suspect a user of eco-terrorism. (There is only one way to counter that: migrate massively to Signal!)

I say it and I repeat it: you have the right to leave proprietary social networks. You have the right to uninstall Whatsapp in favor of Signal. If you have the slightest doubt, I guarantee you will feel much better afterwards.

Don't come telling me you weren't warned.

I am afraid of mind-numbing uniformity

The algorithms of proprietary platforms let them impose their choices, their vision of the world, on you. X/Twitter hides Democrats and pushes Trump during the elections. Facebook hides the slightest bit of nipple but promotes Nazis. So it's no surprise to learn that Spotify does exactly the same thing by generating music that is then forced into playlists, especially the "background music/ambient music" kind.

That way, they don't have to pay royalties to the musicians. Who weren't getting much to begin with.

I am a pirate. When I discover a band that speaks to me, I illegally download several albums as MP3s. If I like them, I buy the legal, DRM-free MP3s (which Bandcamp, for example, makes possible). I am a pirate, but by buying the album directly, I do more for the artist than hundreds or even thousands of streams.

Spotify's CEO earns more money every year than Taylor Swift. He is richer than Paul McCartney or Mick Jagger, each after a 50-year career.

Pirates are to artists what immigrants are to the poor: a very convenient scapegoat. And in the meantime, Amazon and Spotify are killing artists and drowning us in a uniform sludge.

That's exactly why AI seems so interesting for business: it is, by definition, a production line for bland, uniform sludge.

AIs that are trained using… pirate databases! Because yes, Meta trained its AIs on the libgen.is database.

Don't come telling me you weren't warned.

A reminder: libgen.is is to books what thepiratebay is to films. A gigantic pirate library, a bastion of cultural preservation. I'm not exaggerating: over dinner, my friend Henri Lœvenbruck told me about a fairly old book that was out of print and had become impossible to find. Even in libraries, second-hand bookshops and online catalogues, he couldn't get his hands on it. A book I found, right in front of him, on libgen.is in a few seconds.

The people you accuse of being pirates are the defenders, the protectors and the distributors of human culture. And since Meta uses libgen.is to train its AIs, it isn't really piracy anymore: you have the moral right to use this shared library. On which, by the way, I encourage you to discover my books if you don't want to, or cannot, pay for them.

I am afraid of seeing democracy disappear

Every expert has been saying it for 25 years: electronic voting completely buries democracy.

The cheating doesn't even need to be subtle, or even plausible. Marketing and politics have discovered that the 5% of intellectuals who get outraged don't carry much weight. And that the rest of the population asks for only one thing: to be lied to!

The people who put electronic voting in place are dishonest people who hope to profit from it, without realizing that their opponents can do the same.

Or else complete idiots.

But the two are not incompatible.

In any case, they are the gravediggers of democracy.

It is almost certain that Trump cheated to get elected by hacking the voting systems.

I was convinced of it well before the election. Why? Simply because George W. Bush did it before him in 2000, and it worked very well. The CEO of Diebold, the company in charge of building the voting machines, had in fact declared at the time that he would do everything in his power to get George W. Bush elected.

George W. Bush actually lost that election. On absolutely every measure. But Al Gore preferred to concede a mathematically impossible defeat in Florida to avoid outbreaks of violence.

Cheating and violence allowed George W. Bush to become president instead of Al Gore, which consecrated this electoral strategy and durably shaped the entire world.

Thanks to electronic voting and Trump's connections with Silicon Valley (Elon Musk, Peter Thiel, …), Trump had the means to cheat very easily.

So the question is not whether he did it, but "What would have stopped him from doing it?"

Answer: nothing.

Don't ask yourself whether it is likely that Trump cheated; ask, on the contrary, whether it is likely that he did not.

To sum up: Trump's team clearly had the means to hack the electronic vote. It had the necessary data (remember Elon Musk offering a million dollars in a lottery in exchange for voters' personal data). And Trump obtained a statistically, incredibly improbable result: winning all seven swing states by winning just the most contested counties, with just enough margin to avoid a recount, and with between 5% and 7% of "bullet votes" (ballots cast only for Trump, with no votes in the other races), whereas the norm for "bullet votes" is between… 0.05% and 1% in extreme cases (which is what we see in the less contested counties). All while getting exactly the same number of votes as in the 2020 election.

EDIT: after recounts, it seems the "bullet votes" argument doesn't entirely hold up. The substance doesn't change, but any cheating, if proven, would not be as statistically obvious. See the follow-up article written after this one.

But you know what?

It changes nothing. Because since Al Gore, we have known that Republicans cheat outrageously and that Democrats, to get elected, must not just win the election: they must win it by such a margin that even the cheating isn't enough. That doesn't mean Democrats don't cheat. Just that they do it less well, or show some restraint when they do.

Deceit and the threat of violence govern. Meanwhile, the vaguely more progressive/humanist politicians lose elections while trying to gain followers on proprietary social networks completely controlled by their sworn enemy. They may be less dishonest, but they are complete fools.

Don't come telling me you weren't warned. It's just that you thought you weren't concerned.

I'm Ploum and I've just published Bikepunk, an ecological cycling fable typed entirely on a mechanical typewriter. To support me, buy my books (if possible from your local bookshop)!

Receive my writing in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

January 17, 2025

As in previous years, some small rooms will be available for Birds of a Feather sessions. The concept is simple: Any project or community can reserve a timeslot (30 minutes or 1 hour) during which they have the room just to themselves. These rooms are intended for ad-hoc discussions, meet-ups or brainstorming sessions. They are not a replacement for a developer room and they are certainly not intended for talks. Schedules: BOF Track A, BOF Track B, BOF Track C. To apply for a BOF session, enter your proposal at https://fosdem.org/submit. Select any of the BOF tracks and mention in…

January 16, 2025

With FOSDEM just a few days away, it is time for us to enlist your help. Every year, an enthusiastic band of volunteers make FOSDEM happen and make it a fun and safe place for all our attendees. We could not do this without you. This year we again need as many hands as possible, especially for heralding during the conference, during the buildup (starting Friday at noon) and teardown (Sunday evening). No need to worry about missing lunch at the weekend, food will be provided. Would you like to be part of the team that makes FOSDEM tick?
If your non-geek partner and/or kids are joining you to FOSDEM, they may be interested in spending some time exploring Brussels while you attend the conference. Like previous years, FOSDEM is organising sightseeing tours. UPDATE: The tour is now fully booked.

January 15, 2025

We were made aware of planned protests during the upcoming FOSDEM 2025 in response to a scheduled talk which is causing controversy. The talk in question is claimed to be on the schedule for sponsorship reasons; additionally, some of the speakers scheduled to speak during this talk are controversial to some of our attendees. To be clear, in our 25 year history, we have always had the hard rule that sponsorship does not give you preferential treatment for talk selection; this policy has always applied, it applied in this particular case, and it will continue to apply in the future.

We did it: Drupal CMS 1.0 is here! 🎉

Eight months ago, I challenged our community to make Drupal easier for marketers, content creators, and site builders. Today, on Drupal's 24th birthday, we're making history with the launch of Drupal CMS 1.0.

With this release, you now have two ways to build with Drupal:

  • Drupal Core serves expert users and developers who want complete control over their websites. It provides a blank canvas for building websites and has been the foundation behind millions of websites since Drupal began 24 years ago.
  • Drupal CMS is a ready-to-use platform for marketing teams, content creators and site builders, built on Drupal 11 core. When you install Drupal CMS, you get a set of out-of-the-box tools such as advanced media management, SEO tools, AI-driven website building, consent management, analytics, search, automatic updates and more.

To celebrate this milestone, more than 60 launch parties are happening around the world today! These celebrations highlight one of Drupal's greatest strengths: a worldwide community that builds and innovates together.

If you want to try Drupal CMS, you can start a free trial today at https://www.drupal.org/drupal-cms/trial.

Built for ambitious marketers

Drupal CMS targets organizations with ambitious digital goals, particularly in mid-market and enterprise settings. The platform provides a robust foundation that adapts and scales with evolving needs.

Organizations often hit a growth ceiling with non-Drupal CMS platforms. What starts as a simple website becomes a constraint as needs expand. Take privacy and consent management as an example: while these features are now essential due to GDPR, CCPA, and growing privacy concerns, most CMS platforms don't offer them out of the box. This forces organizations to create patchwork solutions.

Drupal CMS addresses this by including privacy and consent management tools by default. This not only simplifies setup but also sets a new standard for CMS platforms, promoting a better Open Web – one that prioritizes user privacy while helping organizations meet regulatory requirements.

Recipes for success

The privacy and consent management feature is just one of many 'recipes' available in Drupal CMS. Recipes are pre-configured packages of features, like blogs, events, or case studies, that simplify and speed up site-building. Each recipe automatically installs the necessary modules, sets up content types, and applies configurations, reducing manual setup.

This streamlined approach makes Drupal not only more accessible for beginners but also more efficient for experienced developers. Drupal CMS 1.0 launches with nearly 30 recipes included, many of which are applied by default to provide common functionality that most sites require. Recipes not applied by default are available as optional add-ons and can be applied either during setup or later through the new Project Browser. More recipes are already in development, with plans to release new versions of Drupal CMS throughout the year, each adding fresh recipes.

The Drupal CMS installer lets users choose from predefined 'recipes' like blog, events, case studies and more. Each recipe automatically downloads the required modules, sets up preconfigured content types, and applies the necessary configurations.

Pioneering the future, again

Drupal CMS not only reduces costs and accelerates time to value with recipes but also stands out with innovative features like AI agents designed specifically for site building. While many platforms use AI primarily for content creation, our AI agents go further by enabling advanced tasks such as creating custom content types, configuring taxonomies, and more.

This kind of innovation really connects to Drupal's roots. In its early days, Drupal earned its reputation as a forward-thinking, innovative CMS. We helped pioneer the assembled web (now called 'composable') and contributed to the foundation of Web 2.0, shipping with features like blogging, RSS, and commenting long before the term Web 2.0 existed. Although it happened long ago and many may not remember, Drupal was the first CMS to adopt jQuery. This move played a key role in popularizing jQuery and establishing it as a cornerstone of web development.

Curious about what Drupal CMS' AI agents can do? Watch Ivan Zugec's video for a hands-on demonstration of how these tools simplify site-building tasks – even for expert developers.

We don't know exactly where AI agents will take us, but I'm excited to explore, learn, and grow. It feels like the early days when we experimented and boldly ventured into the unknown.

Changing perceptions and reaching more users

Drupal has often been seen as complex, but Drupal CMS is designed to change that. Still, we know that simply creating a more user-friendly and easier-to-maintain product isn't enough. After 24 years, many people still hold outdated perceptions shaped by experiences from over a decade ago.

Changing those perceptions takes time and deliberate effort. That is why the Drupal CMS initiative is focused not just on building software but also on repositioning and marketing Drupal in a way that highlights how much it has evolved.

The new Drupal.org features a refreshed brand and updated messaging, positioning Drupal as a modern, composable CMS.

To make this happen, we've refreshed our brand and started reworking Drupal.org with the help of the Drupal Association and our Drupal Certified Partners. The updated brand feels fresher, more modern, and more appealing to a larger audience.

For the first time, the Drupal Association has hired two full-time product marketers to help communicate our message.

Our goal is clear: to help people move past outdated perceptions and see Drupal for what it truly is – a powerful, modern platform for building websites that is becoming more user-friendly, as well as more affordable to use and maintain.

Achieving bold ambitions through collaboration

Launching the Drupal CMS initiative was bold and ambitious, requiring extraordinary effort from our community – and they truly stepped up. It was ambitious because this initiative has been about much more than building a second version of Drupal. It's been a focused and comprehensive effort to expand our market, modernize our brand, accelerate innovation, expand our marketing, and reimagine our partner ecosystem.

When I announced Drupal Starshot and Drupal CMS just 8 months ago, I remember turning to the team and asking, "How exactly are we going to pull this off?" We had a lot to figure out, from building a team and setting goals to mapping a path forward. It was a mix of uncertainty, determination, and maybe a touch of "What have we gotten ourselves into?"

A key success factor has been fostering closer collaboration among contributors, agency partners, Drupal Core Committers, Drupal Association staff, and the Drupal Association Board of Directors. This stronger alignment didn't happen by chance; it's the result of thoughtfully structured meetings and governance changes that brought everyone closer together.

After just 8 months, the results speak for themselves. Drupal CMS has significantly increased the pace of innovation and the level of contributions to Drupal. It's a testament to what we can achieve when we work together. We've seen a 40% increase in contributor activity since the initiative launch, with over 2,000 commits from more than 300 contributors.

Drupal CMS has been a powerful catalyst for accelerating innovation and collaboration. Since development began in 2024, contributions have soared. Organization credits for strategic initiatives grew by 44% compared to 2023, with individual contributions increasing by 37%. The number of unique contributors rose by 12.5%, and participating organizations grew by 11.3%.

The initiative required me to make a significant time commitment I hadn't anticipated at the start of 2024 – but it's an experience I'm deeply grateful for. The Drupal CMS leadership team met at least twice a week, often more, to tackle challenges head-on. Similarly, I had weekly meetings with the Drupal Association.

Along the way we developed new working principles. One key principle was to solve end-user problems first, focusing on what marketers truly need rather than trying to account for every edge case. Another was prioritizing speed over process, enabling us to innovate and adapt quickly. These principles are still evolving, and now that the release is behind us, I'm eager to refine them further with the team.

The work we did together was intense, energizing, and occasionally filled with uncertainty about meeting our deadlines. We built strong bonds, learned to make quick, effective decisions, and maintained forward momentum. This experience has left me feeling more connected than ever to our shared mission.

The Drupal CMS roadmap for 2025

As exciting as this achievement is, some might ask if we've accomplished everything we set out to do. The answer is both yes and no. We've exceeded my expectations in collaboration and innovation, making incredible progress. But there is still much to do. In many ways, we're just getting started. We're less than one-third of the way through our three-year product strategy.

With Drupal CMS 1.0 released, 2025 is off to a strong start. Our roadmap for 2025 is clear: we'll launch Experience Builder 1.0, roll out more out-of-the-box recipes for marketers, improve our documentation, roll out our new brand to more parts of Drupal.org, and push forward with innovative experiments.

Each step brings us closer to our goal: modernizing Drupal and making Drupal the go-to platform for marketers and developers who want to build ambitious digital experiences — all while championing the Open Web.

Subscribe to my blog

Join 5,000+ subscribers and get new posts by email.

Sign up Or subscribe using RSS

Thank you, Drupal community

We built Drupal CMS in a truly open source way – collaboratively, transparently, and driven by community contributions – proving once again that open source is the best way to build software.

The success of Drupal CMS 1.0 reflects the work of countless contributors. I'm especially grateful to these key contributors and their organizations (listed alphabetically): Jamie Abrahams (FreelyGive), Gareth Alexander (Zoocha), Martin Anderson-Clutz (Acquia), Tony Barker (Annertech), Pamela Barone (Technocrat), Addison Berry (Drupalize.me), Jim Birch (Kanopi Studios), Baddy Breidert (1xINTERNET), Christoph Breidert (1xINTERNET), Nathaniel Catchpole (Third and Grove / Tag1 Consulting), Cristina Chumillas (Lullabot), Suzanne Dergacheva (Evolving Web), Artem Dmitriiev (1xINTERNET), John Doyle (Digital Polygon), Tim Doyle (Drupal Association), Sascha Eggenberger (Gitlab), Dharizza Espinach (Evolving Web), Tiffany Farriss (Palantir.net), Matthew Grasmick (Acquia), Adam Globus-Hoenich (Acquia), Jürgen Haas (LakeDrops), Mike Herchel (DripYard), J. Hogue (Oomph, Inc), Gábor Hojtsy (Acquia), Emma Horrell (University of Edinburgh), Marcus Johansson (FreelyGive), Nick Koger (Drupal Association), Tim Lehnen (Drupal Association), Pablo López Escobés (Lullabot), Christian López Espínola (Lullabot), Leah Magee (Acquia), Amber Matz (Drupalize.me), Lenny Moskalyk (Drupal Association), Lewis Nyman, Matt Olivera (Lullabot), Shawn Perritt (Acquia), Megh Plunkett (Lullabot), Tim Plunkett (Acquia), Kristen Pol (Salsa Digital), Joe Shindelar (Drupalize.me), Lauri Timmanee (Acquia), Matthew Tift (Lullabot), Laurens Van Damme (Dropsolid), Ryan Witcombe (Drupal Association), Jen Witowski (Lullabot).

I also want to recognize our Marketing Committee, the Core Committers, the Drupal Association Board of Directors, and the Drupal Starshot Advisory Council, whose guidance and strategic input shaped this initiative along the way.

While I've highlighted some contributors here, I know there are hundreds more who shaped Drupal CMS 1.0 through their code, testing, UX work, feedback, advocacy and more. Each contribution, big or small, moved us forward. To everyone who helped build this milestone: THANK YOU!

January 12, 2025

One of the topics that most financial institutions are (still) working on is their compliance with a piece of European legislation called DORA, the "Digital Operational Resilience Act". DORA is a European regulation, and European regulations apply automatically and uniformly across all EU countries. This is unlike another recent piece of legislation, NIS2, the "Network and Information Security" directive. As an EU directive, NIS2 requires EU countries to transpose the directive into local law, so different EU countries can end up with slightly different implementations.

The DORA regulation applies to the EU financial sector and contains strict requirements that directly affect companies' IT stakeholders. Unlike some frameworks, it rarely sugar-coats things, which has the advantage of reducing its "interpretation flexibility" considerably - though not to zero, of course. That advantage is also a disadvantage: financial entities that had built their resilience strategies differently now need to adjust them.

January 09, 2025

The preFOSDEM MySQL Belgian Days 2025 will take place at the usual venue (ICAB Incubator, 1040 Brussels, Belgium) on Thursday, January 30th, and Friday, January 31st, just before FOSDEM. Again this year, we will have incredible sessions from our Community and the opportunity to meet some MySQL Engineers from Oracle. DimK will […]

To our valued customers, partners, and the Drupal community.

I'm excited to share an important update about my role at Acquia, the company I co-founded 17 years ago. I'm transitioning from my operational roles as Chief Technology Officer (CTO) and Chief Strategy Officer (CSO) to become Executive Chairman. In this new role, I'll remain an Acquia employee, collaborating with Steve Reny (our CEO), our Board of Directors, and our leadership team on company strategy, product vision, and M&A.

This change comes at the right time for both Acquia and me. Acquia is stronger than ever, investing more in Drupal and innovation than at any point in our history. I made this decision so I can rebalance my time and focus on what matters most to me. I'm looking forward to spending more time with family and friends, as well as pursuing personal passions (including more blogging).

This change does not affect my commitment to Drupal or my role in the project. I will continue to lead the Drupal Project, helping to drive Drupal CMS, Drupal Core, and the Drupal Association.

Six months ago, I already chose to dedicate more of my time to Drupal. The progress we've made is remarkable. The energy in the Drupal community today is inspiring, and I'm amazed by how far we've come with Drupal Starshot. I'm truly excited to continue our work together.

Thank you for your continued trust and support!

January 07, 2025

FOSDEM Junior is a collaboration between FOSDEM, Code Club, CoderDojo, developers, and volunteers to organize workshops and activities for children during the FOSDEM weekend. These activities are for children to learn and get inspired about technology. This year’s activities include microcontrollers, embroidery, game development, music, and mobile application development. Last year we organized the first edition of FOSDEM Junior. We are pleased to announce that we will be back this year. Registration for individual workshops is required. Links can be found on the page of each activity. The full schedule can be viewed at the junior track schedule page.

January 02, 2025

2024 brought a mix of work travel and memorable adventures, taking me to 13 countries across four continents — including the ones I call home. With 39 flights and 90 nights spent in hotels and rentals (about 25% of the year), it was a year marked by movement and new experiences.

  • 🌍 Countries visited: 13
  • ✈️ Flights taken: 39
  • 🚕 Taxi rides: 158
  • 🍽️ Restaurant visits: 175
  • ☕️ Coffee shop visits: 44
  • 🍺 Bar visits: 31
  • 🏨 Days at hotels or rentals: 90
  • ⛺️ Days camping: 12

Countries visited:

  • Australia
  • Belgium
  • Bulgaria
  • Canada
  • Cayman Islands
  • France
  • Japan
  • Netherlands
  • Singapore
  • South Korea
  • Spain
  • United Kingdom
  • United States

January 01, 2025

2025 = (20 + 25)²

2025 = 45²

2025 = 1³+2³+3³+4³+5³+6³+7³+8³+9³

2025 = (1+2+3+4+5+6+7+8+9)²

2025 = 1+3+5+7+9+11+...+89

2025 = 9² x 5²

2025 = 40² + 20² + 5²
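
The cube identity and the squared-sum identity above are really the same fact: for every n, 1³+2³+⋯+n³ = (1+2+⋯+n)², so with n = 9 both sides equal 45² = 2025.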

December 30, 2024

At DrupalCon Asia in Singapore a few weeks ago, I delivered my traditional State of Drupal keynote. This event marked DrupalCon's return to Asia after an eight-year hiatus, with the last one being DrupalCon Mumbai in 2016.

It was so fun to reconnect with the Drupal community across Asia and feel the passion and enthusiasm for Drupal. The positive energy was so contagious that three weeks later, I still feel inspired by it.

If you missed the keynote, you can watch the video below, or download my slides (196 MB).

I talked about the significant progress we've made on Drupal CMS (code name Drupal Starshot) since DrupalCon Barcelona just a few months ago.

Our vision for Drupal CMS is clear: to set the standard for no-code website building. My updates highlighted how Drupal CMS empowers digital marketers and content creators to design sophisticated digital experiences while preserving Drupal's power and flexibility.

For more background on Drupal CMS, I recommend reading our three-year strategy document. We're about a quarter of the way through, time-wise, and as you'll see from my keynote, we're making very fast progress.

A slide from my recent DrupalCon Singapore State of Drupal keynote showcasing key contributors to Drupal CMS. This slide showcases how we recognize and celebrate Makers in our community, encouraging active participation in the project.

Below are some of the key improvements I showcased in my keynote, highlighted in short video segments. These videos demonstrate just 7 recipes, but we have nearly 20 in development.

Watching these demos, you will see very clearly how much time and effort Drupal CMS can save for both beginners and experienced developers. Manually assembling these features would take weeks for a Drupal expert and months for a novice; the recipes pack that expertise and those best practices in, so what once took a Drupal expert weeks can now be done by a novice in hours.

AI support in Drupal

We're integrating AI agents into Drupal to assist with complex site-building tasks, going far beyond just content creation. Users can choose to have AI complete tasks automatically or provide step-by-step guidance, helping them learn Drupal as they go.

Search

We're adding enhanced search with autocomplete and faceted search, delivering enterprise-grade functionality out-of-the-box.

Privacy

With increasing global privacy regulations, marketers need robust compliance solutions, yet very few content management systems offer this out-of-the-box. I demonstrated how Drupal CMS will offer a user-centric approach to privacy and consent management, making compliance easier and more effective.

Media management

Our improved media management tools now include features like focal point control and image style presets, enabling editors to handle visual content efficiently while maintaining accessibility standards.

Accessibility tools

Our accessibility tools provide real-time feedback during content creation, helping identify and resolve potential issues that could affect the user experience for visually-impaired visitors.

Analytics

Analytics integration streamlines the setup of Google Analytics and Tag Manager, something that 75% of all marketers use.

Experience Builder

Drupal's new Experience Builder will bring a massive improvement in visual page building. It combines drag-and-drop simplicity with an enterprise-grade component architecture. It looks fantastic, and I'm really excited for it!

Conclusion

Drupal CMS has been a catalyst for innovation and collaboration, driving strong growth in organizational credits. Development of Drupal CMS began in 2024, and we expect a significant increase in contributions this year. Credits have tripled from 2019 to 2024, demonstrating our growing success in driving strategic innovation in Drupal.

In addition to our progress on Drupal CMS, the product, we've made real strides in other areas, such as marketing, modernizing Drupal.org, and improving documentation – all important parts of the Drupal Starshot initiative.

Overall, I'm incredibly proud of the progress we've made. So much so that we've released our first release candidate at DrupalCon Singapore, which you can try today by following my installation instructions for Drupal CMS.

While we still have a lot of work left, we are on track for the official release on January 15, 2025! To mark the occasion, we're inviting the Drupal community to organize release parties around the world. Whether you want to host your own event or join a party near you, you can find all the details and sign-up links for Drupal CMS release parties. I'll be celebrating from Boston and hope to drop in on other parties via Zoom!

Finally, I want to extend my heartfelt thanks to everyone who has contributed to Drupal CMS and DrupalCon Singapore. Your hard work and dedication have made this possible. Thank you!

December 27, 2024

At work, I've been maintaining a perl script that needs to run a number of steps as part of a release workflow.

Initially, that script was very simple, but over time it has grown to do a number of things. And then some of those things did not need to be run all the time. And then we wanted to do this one exceptional thing for this one case. And so on; eventually the script became a big mess of configuration options and unreadable flow, and so I decided that I wanted it to be more configurable. I sat down and spent some time on this, and eventually came up with what I now realize is a domain-specific language (DSL) in JSON, implemented by creating objects in Moose, extensible by writing more object classes.

Let me explain how it works.

In order to explain, however, I need to explain some perl and Moose basics first. If you already know all that, you can safely skip ahead past the "Preliminaries" section that's next.

Preliminaries

Moose object creation, references.

In Moose, creating a class is done something like this:

package Foo;

use v5.40;
use Moose;

has 'attribute' => (
    is  => 'ro',
    isa => 'Str',
    required => 1
);

sub say_something {
    my $self = shift;
    say "Hello there, our attribute is " . $self->attribute;
}

The above is a class that has a single attribute called attribute. To create an object, you use the Moose constructor on the class, and pass it the attributes you want:

use v5.40;
use Foo;

my $foo = Foo->new(attribute => "foo");

$foo->say_something;

(output: Hello there, our attribute is foo)

This creates a new object with the attribute attribute set to foo. The attribute accessor is a method generated by Moose, which functions both as a getter and a setter (though in this particular case we made the attribute "ro", meaning read-only, so while it can be set at object creation time it cannot be changed by the setter anymore). So yay, an object.

And it has methods, things that we set ourselves. Basic OO, all that.

One of the peculiarities of perl is its concept of "lists". Not to be confused with the lists of python -- a concept that is called "arrays" in perl and is somewhat different -- in perl, lists are enumerations of values. They can be used as initializers for arrays or hashes, and they are used as arguments to subroutines. Lists cannot be nested; whenever a hash or array is passed in a list, the list is "flattened", that is, it becomes one big list.

This means that the below script is functionally equivalent to the above script that uses our "Foo" object:

use v5.40;
use Foo;

my %args;

$args{attribute} = "foo";

my $foo = Foo->new(%args);

$foo->say_something;

(output: Hello there, our attribute is foo)

This creates a hash %args wherein we set the attributes that we want to pass to our constructor. We set one attribute in %args, the one called attribute, and then use %args and rely on list flattening to create the object with the same attribute set (list flattening turns a hash into a list of key-value pairs).

Perl also has a concept of "references". These are scalar values that point to other values; the other value can be a hash, a list, or another scalar. There is syntax to create a non-scalar value at assignment time, called anonymous references, which is useful when one wants to remember non-scoped values. By default, references are not flattened, and this is what allows you to create multidimensional values in perl; however, it is possible to request list flattening by dereferencing the reference. The below example, again functionally equivalent to the previous two examples, demonstrates this:

use v5.40;
use Foo;

my $args = {};

$args->{attribute} = "foo";

my $foo = Foo->new(%$args);

$foo->say_something;

(output: Hello there, our attribute is foo)

This creates a scalar $args, which is a reference to an anonymous hash. Then, we set the key attribute of that anonymous hash to foo (note the use of the arrow operator here, which is used to indicate that we want to dereference a reference to a hash), and create the object using that reference, requesting hash dereferencing and flattening by using a double sigil, %$.

As a side note, objects in perl are references too, hence the fact that we have to use the dereferencing arrow to access the attributes and methods of Moose objects.

Moose attributes don't have to be strings or even simple scalars. They can also be references to hashes or arrays, or even other objects:

package Bar;

use v5.40;
use Moose;

extends 'Foo';

has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef[Str]',
    predicate => 'has_hash_attribute',
);

has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    predicate => 'has_object_attribute',
);

sub say_something {
    my $self = shift;

    if($self->has_object_attribute) {
        $self->object_attribute->say_something;
    }

    $self->SUPER::say_something unless $self->has_hash_attribute;

    say "We have a hash attribute!"
}

This creates a subclass of Foo called Bar that has a hash attribute called hash_attribute, and an object attribute called object_attribute. Both of them are references; one to a hash, the other to an object. The hash ref is further limited in that it requires that each value in the hash must be a string (this is optional but can occasionally be useful), and the object ref in that it must refer to an object of the class Foo, or any of its subclasses.

The predicates used here are extra subroutines that Moose provides if you ask for them, and which allow you to see if an object's attribute has a value or not.

The example script would use an object like this:

use v5.40;
use Bar;

my $foo = Foo->new(attribute => "foo");

my $bar = Bar->new(object_attribute => $foo, attribute => "bar");

$bar->say_something;

(output: Hello there, our attribute is foo, followed by Hello there, our attribute is bar)

This example also shows object inheritance, and methods implemented in child classes.

Okay, that's it for perl and Moose basics. On to...

Moose Coercion

Moose has a concept of "value coercion". Value coercion allows you to tell Moose that if it sees one thing but expects another, it should convert is using a passed subroutine before assigning the value.

That sounds a bit dense without an example, so let me show you how it works. Reimagining the Bar package, we could use coercion to eliminate one object creation step from the creation of a Bar object:

package "Bar";

use v5.40;

use Moose;
use Moose::Util::TypeConstraints;

extends "Foo";

coerce "Foo",
    from "HashRef",
    via { Foo->new(%$_) };

has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef',
    predicate => 'has_hash_attribute',
);

has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    coerce => 1,
    predicate => 'has_object_attribute',
);

sub say_something {
    my $self = shift;

    if($self->has_object_attribute) {
        $self->object_attribute->say_something;
    }

    $self->SUPER::say_something unless $self->has_hash_attribute;

    say "We have a hash attribute!"
}

Okay, let's unpack that a bit.

First, we add the Moose::Util::TypeConstraints module to our package. This is required to declare coercions.

Then, we declare a coercion to tell Moose how to convert a HashRef to a Foo object: by using the Foo constructor on a flattened list created from the hashref that it is given.

Then, we update the definition of the object_attribute to say that it should use coercions. This is not the default, because going through the list of coercions to find the right one has a performance penalty, so if the coercion is not requested then we do not do it.

This allows us to simplify declarations. With the updated Bar class, we can simplify our example script to this:

use v5.40;

use Bar;

my $bar = Bar->new(attribute => "bar", object_attribute => { attribute => "foo" });

$bar->say_something

(output: Hello there, our attribute is foo, followed by Hello there, our attribute is bar)

Here, the coercion kicks in because the value object_attribute, which is supposed to be an object of class Foo, is instead a hash ref. Without the coercion, this would produce an error message saying that the type of the object_attribute attribute is not a Foo object. With the coercion, however, the value that we pass to object_attribute is passed to a Foo constructor using list flattening, and then the resulting Foo object is assigned to the object_attribute attribute.

Coercion works for more complicated things, too; for instance, you can use coercion to coerce an array of hashes into an array of objects, by creating a subtype first:

package MyCoercions;
use v5.40;

use Moose;
use Moose::Util::TypeConstraints;

use Foo;

subtype "ArrayOfFoo", as "ArrayRef[Foo]";
subtype "ArrayOfHashes", as "ArrayRef[HashRef]";

coerce "ArrayOfFoo", from "ArrayOfHashes", via { [ map { Foo->create(%$_) } @{$_} ] };

Ick. That's a bit more complex.

What happens here is that we use the map function to iterate over a list of values.

The given list of values is @{$_}, which is perl for "dereference the default value as an array reference, and flatten the list of values in that array reference".

So the ArrayRef of HashRefs is dereferenced and flattened, and each HashRef in the ArrayRef is passed to the map function.

The map function then takes each hash ref in turn and passes it to the block of code that it is also given. In this case, that block is { Foo->create(%$_) }. In other words, we invoke the create factory method with the flattened hashref as an argument. This returns an object of the correct implementation (assuming our hash ref has a type attribute set), and with all attributes of their object set to the correct value. That value is then returned from the block (this could be made more explicit with a return call, but that is optional, perl defaults a return value to the rvalue of the last expression in a block).

The map function then returns a list of all the created objects, which we capture in an anonymous array ref (the [] square brackets), i.e., an ArrayRef of Foo objects, satisfying the Moose requirement of ArrayRef[Foo].

Usually, I tend to put my coercions in a special-purpose package. Although Moose does not strictly require it, I find it useful, because Moose does not allow a coercion to be defined for a type if a coercion for that same type has already been declared in a different package. While it is theoretically possible to make sure you only ever declare a coercion once in your entire codebase, that is much easier to guarantee if all your coercions live in a single, dedicated package.

Okay, now you understand Moose object coercion! On to...

Dynamic module loading

Perl allows loading modules at runtime. In the most simple case, you just use require inside a stringy eval:

my $module = "Foo";
eval "require $module";

This loads "Foo" at runtime. Obviously, the $module string could be a computed value, it does not have to be hardcoded.

There are some obvious downsides to doing things this way, mostly that a computed value can basically be anything, so without proper checks this can quickly become an arbitrary code execution vulnerability. As such, there are a number of distributions on CPAN that help with the low-level work of figuring out which modules are available and how to load them.
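
As a minimal precaution, a guard like the following can go in front of the stringy eval. This is a sketch of my own, not code from the release script, and it assumes the module name arrives as a plain string from configuration:

use v5.40;

my $module = "Foo";   # imagine this value was computed or read from configuration

# Refuse anything that does not look like a legal perl package name.
die "suspicious module name: $module"
    unless $module =~ /\A[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z0-9_]+)*\z/;

# The stringy eval returns the value of its last expression (1) on success,
# and undef on failure, with the error in $@.
eval "require $module; 1" or die "could not load $module: $@";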

For the purposes of my script, I used Module::Pluggable. Its API is fairly simple and straightforward:

package Foo;

use v5.40;
use Moose;

use Module::Pluggable require => 1;

has 'attribute' => (
    is => 'ro',
    isa => 'Str',
);

has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

sub handles_type {
    return 0;
}

sub create {
    my $class = shift;
    my %data = @_;

    foreach my $impl($class->plugins) {
        if($impl->can("handles_type") && $impl->handles_type($data{type})) {
            return $impl->new(%data);
        }
    }
    die "could not find a plugin for type " . $data{type};
}

sub say_something {
    my $self = shift;
    say "Hello there, I am a " . $self->type;
}

The new concept here is the plugins class method, which is added by Module::Pluggable and which searches perl's library paths for all modules in our namespace. The namespace is configurable, but by default it is the name of our module. So, in the above example, if there were a package "Foo::Bar" which

  • has a subroutine handles_type, and
  • returns a truthy value from it when passed the value of the type key in the hash given to the create subroutine,

then the create subroutine creates a new Foo::Bar object, with the passed key/value pairs used as attribute initializers.

Let's implement a Foo::Bar package:

package Foo::Bar;

use v5.40;
use Moose;

extends 'Foo';

has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

has 'serves_drinks' => (
    is => 'ro',
    isa => 'Bool',
    default => 0,
);

sub handles_type {
    my $class = shift;
    my $type = shift;

    return $type eq "bar";
}

sub say_something {
    my $self = shift;
    $self->SUPER::say_something;
    say "I serve drinks!" if $self->serves_drinks;
}

We can now indirectly use the Foo::Bar package in our script:

use v5.40;
use Foo;

my $obj = Foo->create(type => "bar", serves_drinks => 1);

$obj->say_something;

output:

Hello there, I am a bar
I serve drinks!

Okay, now you understand all the bits and pieces that are needed to understand how I created the DSL engine. On to...

Putting it all together

We're actually quite close already. The create factory method in the last version of our Foo package allows us to decide at run time which module to instantiate an object of, and to load that module at run time. We can use coercion and list flattening to turn a reference to a hash into an object of the correct type.

We haven't looked yet at how to turn a JSON data structure into a hash, but that bit is actually ridiculously trivial:

use JSON::MaybeXS;

my $data = decode_json($json_string);

Tada, now $data is a reference to a deserialized version of the JSON string: if the JSON string contained an object, $data is a hashref; if the JSON string contained an array, $data is an arrayref, etc.
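
If you want to check which of the two you got back, the core ref function will tell you. A tiny sketch of my own, not part of the original script:

use v5.40;
use JSON::MaybeXS;

my $data = decode_json('{"description": "do stuff"}');

# ref() returns "HASH" for a JSON object and "ARRAY" for a JSON array.
say ref($data);

(output: HASH)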

So, in other words, to create an extensible JSON-based DSL that is implemented by Moose objects, all we need to do is create a system that

  • takes hash refs to set arguments,
  • has factory methods to create objects, which

    • use Module::Pluggable to find the available object classes, and
    • use the type attribute to figure out which object class to use to create the object, and

  • uses coercion to convert hash refs into objects using these factory methods.

In practice, we could have a JSON file with the following structure:

{
    "description": "do stuff",
    "actions": [
        {
            "type": "bar",
            "serves_drinks": true,
        },
        {
            "type": "bar",
            "serves_drinks": false,
        }
    ]
}

... and then we could have a Moose object definition like this:

package MyDSL;

use v5.40;
use Moose;

use MyCoercions;

has "description" => (
    is => 'ro',
    isa => 'Str',
);

has 'actions' => (
    is => 'ro',
    isa => 'ArrayOfFoo',
    coerce => 1,
    required => 1,
);

sub say_something {
    my $self = shift;

    say "Hello there, I am described as " . $self->description . " and I am performing my actions: ";

    foreach my $action(@{$self->actions}) {
        $action->say_something;
    }
}

Now, we can write a script that loads this JSON file and creates a new object using the flattened arguments:

use v5.40;
use MyDSL;
use JSON::MaybeXS;

my $input_file_name = shift;

my $args = do {
    local $/ = undef;

    open my $input_fh, "<", $input_file_name or die "could not open file";
    <$input_fh>;
};

$args = decode_json($args);

my $dsl = MyDSL->new(%$args);

$dsl->say_something

Output:

Hello there, I am described as do stuff and I am performing my actions:
Hello there, I am a bar
I serve drinks!
Hello there, I am a bar

In some more detail, this will:

  • Read the JSON file and deserialize it;
  • Pass the object keys in the JSON file as arguments to the constructor of the MyDSL class;
  • Let the MyDSL class use those arguments to set its attributes, using Moose coercion to convert the "actions" array of hashes into an array of Foo::Bar objects;
  • Call the say_something method on the MyDSL object.

Once this is written, extending the scheme to also support a "quux" type simply requires writing a Foo::Quux class, making sure it has a method handles_type that returns a truthy value when called with quux as the argument, and installing it into the perl library path. This is rather easy to do.

It can even be extended deeper, too; if the quux type requires a list of arguments rather than just a single argument, it could itself also have an array attribute with relevant coercions. These coercions could then be used to convert the list of arguments into an array of objects of the correct type, using the same schema as above.
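
To make that concrete, here is a minimal sketch of what such a Foo::Quux class could look like, modelled on the Foo::Bar example above. The sub_actions attribute and its use of the ArrayOfFoo coercion are my own illustration of the nesting idea, not code from the actual DSL:

package Foo::Quux;

use v5.40;
use Moose;

use MyCoercions;   # provides the ArrayOfFoo subtype and its coercion

extends 'Foo';

# A quux can carry a list of nested actions, coerced from an array of hashes.
has 'sub_actions' => (
    is => 'ro',
    isa => 'ArrayOfFoo',
    coerce => 1,
    default => sub { [] },
);

sub handles_type {
    my $class = shift;
    my $type = shift;

    return $type eq "quux";
}

sub say_something {
    my $self = shift;
    $self->SUPER::say_something;

    foreach my $action (@{$self->sub_actions}) {
        $action->say_something;
    }
}

Dropping this file somewhere in the perl library path under the Foo:: namespace is all that is needed for Module::Pluggable to pick it up; the create factory method itself does not change.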

The actual DSL is of course somewhat more complex, and also actually does something useful, in contrast to the DSL that we define here which just says things.

Creating an object that actually performs some action when required is left as an exercise to the reader.

December 23, 2024

Mon collègue Julius

Vous connaissez Julius ? Mais si, Julius ! Vous voyez certainement de qui je veux parler !

J’ai rencontré Julius à l’université. Un jeune homme discret, sympathique, le sourire aux lèvres. Ce qui m’a d’abord frappé chez Julius, outre ses vêtements toujours parfaitement repassés, c’est la qualité de son écoute. Il ne m’interrompait jamais, acceptait de s’être trompé et répondait sans hésiter à toutes mes interrogations.

Il allait à tous les cours, demandait souvent les notes des autres pour « comparer avec les siennes » comme il disait. Et puis il y eut le fameux projet informatique. Nous devions, en équipe, coder un logiciel système assez complexe en utilisant le langage C. Julius participait à toutes nos réunions, mais je ne me souviens pas de l’avoir vu écrire une seule ligne de code. Au final, je crois qu’il s’est contenté de faire la mise en page du rapport. Qui était très bien.

De par sa prestance et son élégance, Julius était tout désigné pour faire la présentation finale. Je suis sûr qu’il a fait du théâtre, car, à son charisme naturel, il ajoute une diction parfaite. Il émane de sa personne une impression de confiance innée.

À tel point que les professeurs n’ont pas tout de suite réalisé le problème lorsqu’il s’est mis à parler de la machine virtuelle C utilisée dans notre projet. Il avait intégré dans la présentation un slide avec un logo que je n’avais jamais vu, un screenshot et des termes n’ayant aucun rapport avec quoi que ce soit de connu en informatique.

Pour celleux qui ne connaissent pas l’informatique, le C est un langage compilé. Il n’a pas besoin d’une machine virtuelle. Parler de machine virtuelle C, c’est comme parler du carburateur d’une voiture électrique. Cela n’a tout simplement aucun sens.

Je me suis levé, j’ai interrompu Julius et j’ai improvisé en disant qu’il s’agissait d’une simple blague entre nous. « Bien entendu ! » a fait Julius en me regardant avec un grand sourire. Le jury de projet était perplexe, mais j’ai sauvé les meubles.

Durant toutes nos études, j’ai entendu plusieurs professeurs discuter du « cas Julius ». Certains le trouvaient très bon. D’autres disaient qu’il avait des lacunes profondes. Mais, malgré des échecs dans certaines matières, il a fini par avoir son diplôme en même temps que moi.

Nos chemins se sont ensuite séparés durant plusieurs années.

Alors que je travaillais depuis presque une décennie dans une grande entreprise où j’avais acquis de belles responsabilités, mon chef m’a annoncé que les recruteurs avaient trouvé la perle rare pour renforcer l’équipe. Un CV hors-norme m’a-t-il dit.

À la coupe parfaite de son costume, à sa démarche et sa prestance, je reconnus Julius avant même de voir son visage.

Julius ! Mon vieux camarade !

Si j’avais vieilli, il semblait avoir mûri. Toujours autant de charisme, d’assurance. Il portait désormais une barbe de trois jours légèrement grisonnante qui lui donnait un air de sage autorité. Il semblait sincèrement content de me revoir.

Nous parlâmes du passé et de nos carrières respectives. Contrairement à moi, Julius n’était jamais resté très longtemps dans la même entreprise. Il partait après un an, parfois moins. Son CV était impressionnant : il avait acquis diverses expériences, il avait touché à tous les domaines de l’informatique. À chaque fois, il montait en compétence et en salaire. Je devais découvrir plus tard que, alors que nous occupions une position similaire, il avait été engagé pour le double de mon salaire. Plus des primes dont j’ignorais jusqu’à l’existence.

Mais je n’étais pas au courant de cet aspect des choses lorsque nous nous mîmes au travail. Au début, je tentai de le former sur nos projets et nos process internes. Je lui donnais des tâches sur lesquelles il me posait des questions. Beaucoup de questions pas toujours très pertinentes. Avec ce calme olympien et cet éternel sourire qui le caractérisait.

Parfois il prenait des initiatives. Écrivait du code ou de la documentation. Il avait réponse à toutes les questions que nous pouvions nous poser, quel que soit le domaine. C’était quelquefois très bon, souvent médiocre voire du grand n’importe quoi. Il nous a fallu un certain temps pour comprendre que chacune des contributions de Julius nécessitait d’être entièrement revue et corrigée par un autre membre de l’équipe. Si nous ne connaissions pas le domaine, il fallait le faire vérifier par un expert externe. Très vite, le mot d’ordre fut qu’aucun document issu de Julius ne devait être rendu public avant d’avoir été relu par deux d’entre nous.

Mais Julius excellait dans la mise en page, la présentation et la gestion des réunions. Régulièrement, mon chef s’approchait de moi et me disait : « On a vraiment de la chance d’avoir ce Julius ! Quel talent ! Quel apport à l’équipe ! »

J’essayais vainement d’expliquer que Julius ne comprenait rien à ce que nous faisions, que nous en étions au point où nous l’envoyions à des réunions inutiles pour nous en débarrasser afin de ne pas avoir à répondre à ses questions et corriger son travail. Mais même cette stratégie avait ses limites.

Il nous a fallu une semaine de réunion de crises pour expliquer à un client déçu par une mise à jour de notre logiciel que, si Julius avait promis que l’interface serait simplifiée pour ne comporter qu’un seul bouton qui ferait uniquement ce que voulait justement le client, il y avait un malentendu. Qu’à part développer une machine qui lisait dans les pensées, c’était impossible de répondre à des besoins aussi complexes que les siens avec un seul bouton.

C’est lorsque j’ai entendu Julius prétendre à un autre client, paniqué à l’idée de se faire « hacker », que, par mesure de sécurité, nos serveurs connectés à Internet n’avaient pas d’adresse IP que nous avons du lui interdire de rencontrer un client seul.

Pour celleux qui ne connaissent pas l’informatique, le "I" de l’adresse IP signifie Internet. La définition même d’Internet est l’ensemble des ordinateurs interconnectés possédant une adresse IP.

Être sur Internet sans adresse IP, c’est comme prétendre être joignable par téléphone sans avoir de numéro.

L’équipe s’était désormais organisée pour que l’un d’entre nous ait en permanence la charge d’occuper Julius. Je n’ai jamais voulu dire du mal à son sujet, car c’était mon ami. Une codeuse exaspérée a cependant exposé le problème à mon chef. Qui lui a répondu en l’accusant de jalousie, car il était très satisfait du travail de Julius. Elle a reçu un blâme et a démissionné un peu après.

Heureusement, Julius nous a un jour annoncé qu’il nous quittait, car il avait reçu une offre qu’il ne pouvait pas refuser. Il a apporté des gâteaux pour fêter son dernier jour avec nous. Mon chef et tout le département des ressources humaines étaient sincèrement tristes de le voir partir.

J’ai dit au revoir à Julius et ne l’ai plus jamais revu. Sur son compte LinkedIn, qui est très actif et reçoit des centaines de commentaires, l’année qu’il a passée avec nous est devenue une expérience incroyable. Il n’a pourtant rien exagéré. Tout est vrai. Mais sa façon de tourner les mots et une certaine modestie mal camouflée donne l’impression qu’il a vraiment apporté beaucoup à l’équipe. Il semblerait qu’il soit ensuite devenu adjoint de la CEO puis CEO par intérim d’une startup qui venait d’être rachetée par une multinationale. Un journal économique a fait un article à son sujet. Après cet épisode, il a rejoint un cabinet ministériel. Une carrière fulgurante !

De mon côté, j’ai essayé d’oublier Julius. Mais, dernièrement, mon chef est venu avec un énorme sourire. Il avait rencontré le commercial d’une boîte qui l’avait ébahi par ses produits. Des logiciels d’intelligence artificielle qui allait, je cite, doper notre productivité !

J’ai désormais un logiciel d’intelligence artificielle qui m’aide à coder. Un autre qui m’aide à chercher des informations. Un troisième qui résume et rédige mes emails. Je n’ai pas le droit de les désactiver.

À chaque instant, à chaque seconde, j’ai l’impression d’être entouré par Julius. Par des dizaines de Julius.

Je dois travailler cerné par des Julius. Chaque clic sur mon ordinateur, chaque notification sur mon téléphone semble provenir de Julius. Ma vie est un enfer pavé de Julius.

Mon chef est venu me voir. Il m’a dit que la productivité de l’équipe baissait dangereusement. Que nous devrions utiliser plus efficacement les intelligences artificielles. Que nous risquions de nous faire dépasser par les concurrents qui, eux, utilisent à n’en pas douter les toutes dernières intelligences artificielles. Qu’il avait mandaté un consultant pour nous installer une intelligence artificielle de gestion du temps et de la productivité.

Je me suis mis à pleurer. « Encore un Julius ! » ai-je sangloté.

Mon chef a soupiré. Il m’a tapoté l’épaule et m’a dit : « Je comprends. Moi aussi je regrette Julius. Il nous aurait certainement aidés à passer ce moment difficile. »

Je suis Ploum et je viens de publier Bikepunk, une fable écolo-cycliste entièrement tapée sur une machine à écrire mécanique. Pour me soutenir, achetez mes livres (si possible chez votre libraire) !

Recevez directement par mail mes écrits en français et en anglais. Votre adresse ne sera jamais partagée. Vous pouvez également utiliser mon flux RSS francophone ou le flux RSS complet.

My colleague Julius

Do you know Julius? You certainly know who I’m talking about!

I met Julius at university. A measured, friendly young man. He always wore a smile on his face. What struck me about Julius, aside from his always perfectly ironed clothes, was his ability to listen. He never interrupted me. He accepted gratefully when he was wrong. He answered questions without hesitation.

He attended all the classes and often asked for our notes to "compare with his own" as he said. Then came the infamous computer project. As a team of students, we had to code a fairly complex system software using the C language. Julius took part in all our meetings but I don’t remember witnessing him write a single line of code. In the end, I think he did the report formatting. Which, to his credit, was very well done.

Because of his charisma and elegance, Julius was the obvious choice to give the final presentation.

He was so self-confident during the presentation that the professors didn’t immediately notice the problem. He had started talking about the C virtual machine used in our project. He even showed a slide with an unknown logo and several random screenshots which had nothing to do with anything known in computing.

For those who don’t know about computing, C is a compiled language. It doesn’t need a virtual machine. Talking about a C virtual machine is like talking about the carburettor of an electric vehicle. It doesn’t make sense.

I stood up, interrupted Julius and improvised by saying it was just a joke. “Of course!” said Julius, looking at me with a big smile. The jury was perplexed. But I saved the day.

Throughout our studies, I’ve heard several professors discuss the “Julius case.” Some thought he was very good. Others said he was lacking a fundamental understanding. Despite failing some classes, he ended up graduating with me.

After that, our paths went apart for several years.

I had been working for nearly a decade at a large company, where I had significant responsibilities, when one day my boss announced that recruiters had found a rare gem for our team. An extraordinary resume, he told me.

From the perfect cut of his suit, I recognised Julius before seeing his face.

Julius! My old classmate!

If I had aged, he had matured. Still charismatic and self-assured. He now sported a slightly graying three-day beard that gave him an air of wise authority. He genuinely seemed happy to see me.

We talked about the past and about our respective careers. Unlike me, Julius had never stayed very long in the same company. He usually left after a year, sometimes less. His resume was impressive: he had gained various experiences, touched on all areas of computing. Each time, he moved up in skills and salary. I would later discover that, while we held similar positions, he had been hired at twice my salary. He also got bonuses I didn’t even know existed.

But I wasn’t aware of this aspect when we started working together. At first, I tried to train him on our projects and internal processes. I assigned him tasks on which he would ask me questions. Many questions, not always very relevant ones. With his characteristic calm and his signature smile.

He took initiatives. Wrote code or documentation. He had answers to all the questions we could ask, regardless of the field. Sometimes it was very good, often mediocre or, in some cases, complete nonsense. It took us some time to understand that each of Julius’s contributions needed to be completely reviewed and corrected by another team member. If it was not our field of expertise, it had to be checked externally. We quickly adopted an unwritten rule stating that no document from Julius should leave the team before being proofread by two of us.

But Julius excelled in formatting, presentation, and meeting management. Regularly, my boss would come up to me and say, “We’re really lucky to have this Julius! What talent! What a contribution to the team!”

I tried, without success, to explain that Julius understood nothing of what we were doing. That we had reached the point where we sent him to useless meetings to get rid of him for a few hours. But even that strategy had its limits.

It took us a week of crisis management meetings to calm down a customer disappointed by an update of our software. We had to explain that, if Julius had promised that the interface would be simplified to have only one button that would do exactly what the client wanted, there was a misunderstanding. That aside from developing a machine that read minds, it was impossible to meet his complex needs with just one button.

We decided to act when I heard Julius claim to a customer, panicked at the idea of being "hacked", that, for security reasons, our servers connected to the Internet had no IP address. We had to forbid him from meeting a client alone.

For those who don’t know about computing, the "I" in IP address stands for Internet. The very definition of the Internet is the network of interconnected computers that have an IP address.

Being on the Internet without an IP address is like claiming to be reachable by phone without having a phone number.

The team was reorganised so that one of us was always responsible for keeping Julius occupied. I never wanted to speak ill of him because he was my friend. An exasperated programmer had no such restraint and exposed the problem to my boss. Who responded by accusing her of jealousy, as he was very satisfied with Julius’s work. She was reprimanded and resigned shortly after.

Fortunately, Julius announced that he was leaving because he had received an offer he couldn’t refuse. He brought cakes to celebrate his last day with us. My boss and the entire human resources department were genuinely sad to see him go.

I said goodbye to Julius and never saw him again. On his LinkedIn account, which is very active and receives hundreds of comments, the year he spent with us became an incredible experience. He hasn’t exaggerated anything. Everything is true. But his way of turning words and a kind of poorly concealed modesty gives the impression that he really contributed a lot to the team. He later became the deputy CEO then interim CEO of a startup that had just been acquired by a multinational. An economic newspaper wrote an article about him. After that episode, he joined the team of a secretary of state. A meteoric career!

On my side, I tried to forget Julius. But, recently, my boss came to me with a huge smile. He had met the salesperson from a company that had amazed him with its products. Artificial intelligence software that would, I quote, boost our productivity!

I now have an artificial intelligence software that helps me code. Another that helps me search for information. A third one that summarises and writes my emails. I am not allowed to disable them.

At every moment, every second, I feel surrounded by Julius. By dozens of Juliuses.

I have to work in a mist of Juliuses. Every click on my computer, every notification on my phone seems to come from Julius. My life is hell paved with Juliuses.

My boss came to see me. He told me that the team’s productivity was dangerously declining. That we should use artificial intelligence more effectively. That we risked being overtaken by competitors who, without a doubt, were using the very latest artificial intelligence. That he had hired a consultant to install a new time and productivity management artificial intelligence.

I started to cry. “Another Julius!” I sobbed.

My boss sighed. He patted my shoulder and said, “I understand. I miss Julius too. He would certainly have helped us get through this difficult time.”

I’m Ploum, a writer and an engineer. I like to explore how technology impacts society. You can subscribe by email or by rss. I value privacy and never share your address.

I write science-fiction novels in French. For Bikepunk, my new post-apocalyptic-cyclist book, my publisher is looking for contacts in other countries to distribute it in languages other than French. If you can help, contact me!

December 20, 2024

Let’s stay a bit longer with MySQL 3.2x to advance the MySQL Retrospective in anticipation of the 30th Anniversary. The idea of this article was suggested to me by Daniël van Eeden. Did you know that in the early days, and therefore still in MySQL 3.20, MySQL used the ISAM storage format? IBM introduced the […]

December 18, 2024

To further advance the MySQL Retrospective in anticipation of the 30th Anniversary, today let’s discuss the very first version of MySQL that became available to a wide audience through the popular InfoMagic distribution: MySQL 3.20! In 1997, InfoMagic incorporated MySQL 3.20 as part of the RedHat Contrib CD-ROM (MySQL 3.20.25). Additionally, version 3.20.13-beta was also […]

The urgency of supporting the energy of free software

An editorial written for Le Lama déchaîné No. 9, the weekly published by April to raise the alarm about the association's precarious finances. I was limited to 300 words. For a chatterbox like me, that is a very difficult exercise! (it's déchaîné... hee hee hee! That one is funny, I just got it)

The rise of extremism, climate disasters, political and social crises, wars. Amid these emergencies, is it still reasonable to devote energy to free software and to digital and cultural commons? Shouldn't we rethink our priorities?

That shortcut is a dangerous one.

Shouldn't we, on the contrary, go back to fundamentals and think about the very infrastructure of our society?

Contrary to what the industry magnates keep telling us, technology is never neutral. It carries its own ideology. By its very nature, the extreme centralisation of our Internet tools prefigures the centralisation of an authoritarian, fascist power. The ubiquity of the advertising model makes growth and hyperconsumption inescapable. These two pillars meet and complete each other in the normalisation of permanent technological surveillance.

If we want to change course, if we want to learn to limit our consumption of natural resources, to listen to and respect our differences, to build democratic compromises, it is urgent and essential to attack the root of the problem: our infrastructure for communication and exchange. To free the network that connects us, that connects our data, our commercial exchanges, our thoughts, our emotions.

Joining an anticapitalist group on Facebook, posting zero-waste videos on Instagram, or using the Outlook infrastructure for your union's email are acts that actively help promote, justify, and perpetuate the very system they naively seek to denounce.

It is no coincidence that discussions on the Fediverse and the Mastodon network are about cycling, ecology, and feminism. Because free, decentralised technology carries its own ideology. Because it manages to hold on, to frighten the biggest monopolies capitalism has ever produced, even though it is held together with bits of string and the energy of a few underpaid people and volunteers.

When everything seems to be going wrong, we must focus on the roots, the fundamentals. Infrastructure. Education. That is why I believe the work of April, La Quadrature du Net, Framasoft, La Contre-voie, and all the free-software associations is not merely important.

It is vital, crucial.

Free software is not a luxury, it is an absolute emergency.

— And so, hee hee hee, Ploum said: it's... hee hee hee... It's déchaîné!
— For pity's sake, donate to April or he will tell it again!
It's déchaîné... hee hee hee...

I'm Ploum and I have just published Bikepunk, an eco-cyclist fable typed entirely on a mechanical typewriter. To support me, buy my books (if possible from your local bookshop)!

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

December 16, 2024

20 years of Linux on the Desktop (part 2)

Previously in "20 years of Linux on the Desktop": Looking to make the perfect desktop with GNOME and Debian, a young Ploum finds himself joining a stealth project called "no-name-yet". The project is later published under the name "Ubuntu".

Flooded with Ubuntu CD-ROMs

The first official Ubuntu release was 4.10. At that time, I happened to be the president of my university LUG: Louvain-Li-Nux. Louvain-Li-Nux had been founded a few years earlier by Fabien Pinckaers, Anthony Lesuisse and Benjamin Henrion as an informal group of friends. After they graduated and left university, Fabien handed me all the archives and all the information and told me to continue the work while he ran his company that would, much later, become Odoo. With my friend Bertrand Rousseau, we decided to make Louvain-Li-Nux a formal and enduring organisation known as a "KAP" (Kot-à-Projet). Frédéric Minne designed the logo by putting Fabien's student hat ("calotte") on a penguin clipart.

In 2005 and 2006, we worked really hard to organise multiple install parties and conferences. We also offered resources and support. At a time when broadband Internet was not really common, the best resource for installing GNU/Linux was an installation CD-ROM.

Thanks to Mark Shuttleworth’s money, Ubuntu was doing something unprecedented: sending free CD-ROMs of Ubuntu to anyone requesting them. Best of all: the box contained two CD-ROMs. A live image and an installation CD. Exactly how I dreamed it (I’m not sure if the free CD-ROMs started with 4.10, 5.04 or even 5.10).

I managed to get Louvain-Li-Nux recognised as an official Ubuntu distributor and we started to receive boxes full of hundreds of CD-ROMs with small cardboard dispensers. We had entire crates of Ubuntu CD-ROMs. It was the easiest distribution to install, it was the one I knew best, and I had converted Bertrand to it (before Fabien taught me about Debian, Bertrand had tried to convert me to Mandrake, which he was using himself. He nevertheless spent the whole night with me when I installed Debian for the first time, not managing to configure the network because the chipset of my ethernet card was not the same as the one listed on the box of said card. At the time, you had to manually choose which module to load. It was another era; kids these days don't know what they are missing).

With Louvain-Li-Nux, we literally distributed hundreds of CD-ROMs. I myself installed Ubuntu on dozens of computers. It was not always easy, as the market was pivoting from desktop computers to laptops. Laptops were starting to be affordable and powerful enough. But laptops came with exotic hardware: wifi, Bluetooth, power management, sleep, hibernate, strange keyboard keys and lots of very complex stuff that you don't need to handle on a desktop computer with an RJ-45 hole.

Sound was a hard problem. I remember spending hours on a laptop before realising there was a hardware switch. To play multiple sounds at the same time, you needed to launch a daemon called ESD. Our frustration with ESD would lead Bertrand and me to trap Lennart Poettering in a cave in Brussels and spend the whole night drinking beers with him, while swearing we would wear a "we love Lennart" t-shirt during FOSDEM to support his new Polypaudio project, which was heavily criticised at the time. Spoiler: we never did the t-shirt thing, but Polypaudio was renamed PulseAudio and succeeded without our support.

Besides offering beers to developers, I reported all the bugs I experienced and worked hard with Ubuntu developers. If I remember correctly, I would, at some point, even become the head of the "bug triaging team" (if such a position ever existed; it might be that someone called me that to flatter my ego). Selected as a student for the Google Summer of Code, I created a Python client for Launchpad called "Conseil". Launchpad had just replaced Bugzilla but, as I found out after starting Conseil, was not open source and had no API. I learned web scraping and was forced to update Conseil each time something changed on Launchpad's side.

The most important point about Bugzilla and Launchpad was the famous bug #1. Bug #1, reported by sabdfl himself, was about breaking Microsoft's monopoly. It could be closed once any computer user could freely choose which operating system to use on a newly bought computer.

The very first book about Ubuntu

Meanwhile, I was contacted by a French publisher who stumbled upon my newly created blog that I mainly used to profess my love of Ubuntu and Free Software. Yes, the very blog you are currently reading.

That French publisher had contracted two authors to write a book about Ubuntu and wanted my feedback about the manuscript. I didn’t really like what I read and said it bluntly. Agreeing with me, the editor asked me to write a new book, using the existing material if I wanted. But the two other authors would remain credited and the title could not be changed. I naively agreed and did the work, immersing myself even more in Ubuntu.

The result was « Ubuntu, une distribution facile à installer », the very first book about Ubuntu. I hated the title. But, as I have always dreamed of becoming a published author, I was proud of my first book. And it had a foreword by Mark Shuttleworth himself.

I updated and rewrote a lot of it in 2006, changing its name to "Ubuntu Efficace". A later version was published in 2009 as "Ubuntu Efficace, 3ème édition". During those years, I wore Ubuntu t-shirts. In my room, I had a collection of CD-ROMs with each Ubuntu version (I would later throw them away, something I still regret). I bootstrapped "Ubuntu-belgium" at FOSDEM. I had ploum@ubuntu.com as my primary email on my business card and used it to look for jobs, hoping to set the tone. You could say that I was an Ubuntu fanatic.

The very first Ubuntu-be meeting. I took the picture and gimped a quick logo.

Ironically, I was never paid by Canonical and never landed a job there. The only money I received for that work was from my books or from Google through the Summer of Code (remember: Google was still seen as a good guy). I would later work for Lanedo and be paid to contribute to GNOME and LibreOffice. But never to contribute to Ubuntu nor Debian.

In the Ubuntu and GNOME community with Jeff Waugh

Something which was quite new to me was that Ubuntu had a "community manager". At the time, it was not the title of someone posting on Twitter (which didn’t exist). It was someone tasked with putting the community together, with being the public face of the project.

Jeff Waugh is the first Ubuntu community manager I remember and I was blown away by his charisma. Jeff came from the GNOME project and one of his pet issues was making computers easier to use. He started a trend that would, much later, give birth to the infamous GNOME 3 design.

You have to remember that the very first fully integrated desktop on Linux was KDE. And KDE had a very important problem: it was relying on the Qt toolkit which, at the time, was under a non-free license. You could not use Qt in a commercial product without paying Trolltech, the author of Qt.

GNOME was born as an attempt by Miguel de Icaza and Federico Mena to create a KDE-like desktop using the free toolkit created for the Gimp image editor: Gtk.

This is why I liked to make the joke that the G in GNOME stands for Gtk, that the G in Gtk stands for Gimp, that the G in Gimp stands for GNU and that the G in GNU stands for GNU. This is not accurate as the G in GNOME stands for GNU but this makes the joke funnier. We, free software geeks, like to have fun.

Like its KDE counterpart, GNOME 1 was full of knobs and whistles. Everything could be customised to the pixel and to the millisecond. Jeff Waugh often made fun of it by showing the preference boxes and asking the audience who wanted to customise a menu animation to the millisecond. GNOME 1 was less polished than KDE and heavier than very simple window managers like Fvwm95 or Fvwm2 (my WM of choice before I started my quest for the perfect desktop).

Screenshot from my FVWM2 config which is still featured on fvwm.org, 21 years later.

With GNOME 2, GNOME introduced its own paradigm and philosophy: GNOME would be different from KDE by being less customisable but more intuitive. GNOME 2 opened a new niche in the Linux world: a fully integrated desktop for those who don’t want to tweak it.

KDE was for those wanting to customise everything. The most popular distributions featured KDE: Mandrake, Red Hat, Suse. The RPM world. There was no real GNOME centric distribution. And there was no desktop distribution based on Debian. As Debian was focused on freedom, there was no KDE in Debian.

Which explains why GNOME + Debian made a lot of sense in my mind.

As Jeff Waugh had been the GNOME release manager for GNOME 2 and was director of the GNOME board, having him as the first Ubuntu community manager set the tone: Ubuntu would be very close to GNOME. And it is exactly what happened. There was a huge overlap between GNOME and Ubuntu enthusiasts. As GNOME 2 would thrive and get better with each release, Ubuntu would follow.

But some people were not happy. While some Debian developers had been hired by Canonical to make Ubuntu, others feared that Ubuntu was a kind of Debian fork that would weaken Debian. Similarly, Red Hat had been investing a lot of time and money in GNOME. I've never understood why, as Qt was released under the GPL in 2000, making KDE free, but Red Hat wanted to offer both KDE and GNOME. It went as far as tweaking both of them so they would look perfectly identical when used on Red Hat Linux. Red Hat employees were the biggest pool of contributors to GNOME.

There was a strong feeling in the atmosphere that Ubuntu was piggybacking on the work of Debian and Red Hat.

I didn’t really agree as I thought that Ubuntu was doing a lot of thankless polishing and marketing work. I liked the Ubuntu community and was really impressed by Jeff Waugh. Thanks to him, I entered the GNOME community and started to pay attention to user experience. He was inspiring and full of energy.

Drinking a beer with Jeff Waugh and lots of hackers at FOSDEM. I'm the one with the red sweater.

Benjamin Mako Hill

What I didn’t realise at the time was that Jeff Waugh’s energy was not in infinite supply. Mostly burned out by his dedication, he had to step down and was replaced by Benjamin Mako Hill. That’s, at least, how I remember it. A quick look at Wikipedia told me that Jeff Waugh and Benjamin Mako Hill were, in fact, working in parallel and that Jeff Waugh was not the community manager but an evangelist. It looks like I’ve been wrong all those years. But I choose to stay true to my own experience as I don’t want to write a definitive and exhaustive history.

Benjamin Mako Hill was not a GNOME guy. He was a Debian and FSF guy. He was focused on the philosophical aspects of free software. His intellectual influence would prove to have a long-lasting effect on my own work. I remember fondly that he introduced the concept of "anti-features" to describe the fact that developers sometimes work against their own users: they spend energy to make the product worse. Examples include advertisements in apps or artificially limited versions of software. But it is not limited to software: Benjamin Mako Hill took the example of benches designed so you can't sleep on them, to prevent homeless people from taking a nap. It is obviously more work to design a bench that prevents napping. The whole anti-feature concept would be extended and popularised twenty years later by Cory Doctorow under the term "enshittification".

Benjamin Mako Hill introduced a code of conduct in the Ubuntu community and made the community very aware of the freedom and philosophical aspects. While I never met him, I admired and still admire Benjamin. I felt that, with him at the helm, the community would always stay true to its ethical values. Bug #1 was the leading beacon: offering choice to users, breaking monopolies.

Jono Bacon

But the person who would have the greatest influence on the Ubuntu community is probably Jono Bacon, who replaced Benjamin Mako Hill. Unlike Jeff Waugh and Benjamin Mako Hill, Jono Bacon had no Debian or GNOME background. As far as I remember, he was mostly unknown in those communities. But he was committed to communities in general and had great taste in music. I'm forever grateful to him for introducing me to Airbourne.

In what felt like an immediate shift, but probably took months or years, the community mood switched from engineering and geek discussions to a cheerful, all-inclusive community.

It may look great on the surface but I hated it. The GNOME, Debian and early Ubuntu communities were shared-interest communities. You joined the community because you liked the project. The communities were focused on making the project better.

With Jono Bacon, the opposite became true. The community was great and people joined the project because they liked the community, the sense of belonging. Ubuntu felt more like a church every day. The project was seen as less important than the people. Some aspects would not be discussed openly, so as not to hurt the community.

I felt every day less and less at home in the Ubuntu community. Decisions about the project were taken behind closed doors by Canonical employees and the community transformed from contributors to unpaid cheerleaders. The project to which I contributed so much was every day further away from Debian, from freedom, from openness and from its technical roots.

But people were happy because Jono Bacon was such a good entertainer.

Something was about to break…

(to be continued)

Subscribe by email or by rss to get the next episodes of "20 years of Linux on the Desktop".

I’m currently turning this story into a book. I’m looking for an agent or a publisher interested in working with me on this book and on an English translation of "Bikepunk", my new post-apocalyptic cyclist typewritten novel, which sold out in three weeks in France and Belgium.

I’m Ploum, a writer and an engineer. I like to explore how technology impacts society. You can subscribe by email or by RSS. I value privacy and never share your address.

I write science-fiction novels in French. For Bikepunk, my new post-apocalyptic-cyclist book, my publisher is looking for contacts in other countries to distribute it in languages other than French. If you can help, contact me!

December 08, 2024

It’s been 2 years since AOPro was launched and a lot has happened in that time; bugs were squashed, improvements were made and some great features were added. Taking that into account on the one hand and increasing costs from suppliers on the other, prices will see a smallish increase as from 2025 (exact amounts still to be determined). But rest assured: if you already signed up, you will continue to…

Source

November 26, 2024

The deadline for talk submissions is rapidly approaching! If you are interested in talking at FOSDEM this year (yes, I'm talking to you!), it's time to polish off and submit those proposals in the next few days before the 1st:

  • Devrooms: follow the instructions in each CfP listed here
  • Main tracks: for topics which are more general or don't fit in a devroom, select 'Main' as the track here
  • Lightning talks: for short talks (15 minutes) on a wide range of topics, select 'Lightning Talks' as the track here

For more details, refer to the previous post.

November 25, 2024

Last month we released MySQL 9.1, the latest Innovation Release. Of course, we released bug fixes for 8.0 and 8.4 LTS but in this post, I focus on the newest release. Within these releases, we included patches and code received by our amazing Community. Here is the list of contributions we processed and included in […]

November 15, 2024

With great pleasure we can announce that the following projects will have a stand at FOSDEM 2025 (1st & 2nd February). This is the list of stands (in alphabetic order): 0 A.D. Empires Ascendant AlekSIS and Teckids AlmaLinux OS CalyxOS Ceph Chamilo CISO Assistant Cloud Native Computing Foundation (CNCF) Codeberg and Forgejo coreboot / flashprog / EDKII / OpenBMC Debian DeepComputing's DC-ROMA RISC-V Mainboard with Framework Laptop 13 DevPod Digital Public Goods Dolibarr ERP CRM Drupal Eclipse Foundation Fedora Project FerretDB Firefly Zero FOSSASIA Free Software Foundation Europe FreeBSD Project FreeCAD and KiCAD Furi Labs Gentoo Linux & Flatcar…

It's been a while since I last blogged one of my favorite songs. Even after more than 25 years of listening to "Tonight, Tonight" by The Smashing Pumpkins, it has never lost its magic. It has aged better than I have.

Installation instructions for end users and testers

We will use DDEV to set up and run Drupal on your computer. DDEV handles all the complex configuration by providing pre-configured Docker containers for your web server, database, and other services.

To install DDEV, you can use Homebrew (or choose an alternative installation method):

$ brew install ddev/ddev/ddev

Next, download a pre-packaged zip-file. Unzip it, navigate to the new directory and simply run:

$ ddev launch

That's it! DDEV will automatically configure everything and open your new Drupal site in your default browser.
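Putting the end-user steps together, the whole flow looks roughly like this. The archive and directory names below are hypothetical; they depend on the exact package you downloaded:

$ unzip drupal-cms.zip   # hypothetical archive name
$ cd drupal-cms          # hypothetical directory name
$ ddev launch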

Installation instructions for contributors

If you plan to contribute to Drupal CMS development, set up your environment using Git to create merge requests and submit contributions to the project. If you're not contributing, this approach isn't recommended. Instead, follow the instructions provided above.

First, clone the Drupal CMS Git repository:

$ git clone https://git.drupalcode.org/project/drupal_cms.git

This command fetches the latest version of Drupal CMS from the official Git repository and saves it in the drupal_cms directory.
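Before continuing, move into that directory; the ddev commands that follow assume they are run from the project root:

$ cd drupal_cms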

Drupal CMS comes pre-configured for DDEV with all the necessary settings in .ddev/config.yaml, so you don't need to configure anything.
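As an illustration of the kind of settings that file holds, here is a generic sketch of a DDEV configuration. It is not the exact file shipped with Drupal CMS; the values are only examples:

name: drupal-cms          # project name, used in the local URL
type: drupal11            # tells DDEV which platform defaults to apply
docroot: web              # web root relative to the project directory
php_version: "8.3"        # PHP version for the web container
webserver_type: nginx-fpm
database:
  type: mariadb
  version: "10.11"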

So, let's just fire up our engines:

$ ddev start

The first time you start DDEV, it will set up Docker containers for the web server and database. It will also use Composer to download the necessary Drupal files and dependencies.

The final step is configuring Drupal itself. This includes things like setting your site name, database credentials, etc. You can do this in one of two ways:

  • Option 1: Configure Drupal via the command line
    $ ddev drush site:install

    This method is the easiest and the fastest, as things like the database credentials are automatically set up. The downside is that, at the time of this writing, you can't choose which Recipes to enable during installation.

  • Option 2: Configure Drupal via the web installer

    You can also use the web-based installer to configure Drupal, which allows you to enable individual Recipes. You'll need your site's URL and database credentials. Run this command to get both:

    $ ddev describe

    Navigate to your site and step through the installer.

Once everything is installed and configured, you can access your new Drupal CMS site. You can simply use:

$ ddev launch

This command opens your site's homepage in your default browser — no need to remember the specific URL that DDEV created for your local development site.

To build or manage a Drupal site, you'll need to log in. By default, Drupal creates a main administrator account. It's a good idea to update the username and password for this account. To do so, run the following command:

$ ddev drush uli

This command generates a one-time login link that takes you directly to the Drupal page where you can update your Drupal account's username and password.
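If you'd rather set the new password non-interactively, drush can do that too. A small sketch, with admin and the password as placeholders for your own values:

$ ddev drush user:password admin 'choose-a-strong-password'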

That's it! Happy Drupal-ing!

November 01, 2024

Dear WordPress friends in the USA: I hope you vote and when you do, I hope you vote for respect. The world worriedly awaits your collective verdict, as do I. Peace! Watch this video on YouTube.

Source

October 29, 2024

As announced yesterday, the MySQL Devroom is back at FOSDEM! For people preparing for their travel to Belgium, we want to announce that the MySQL Belgian Days fringe event will be held on the Thursday and Friday before FOSDEM. This event will take place on January 30th and 31st, 2025, in Brussels at the usual […]

October 28, 2024

We are pleased to announce the Call for Participation (CfP) for the FOSDEM 2025 MySQL Devroom. The Devroom will be held on February 2 (Sunday), 2025 in Brussels, Belgium. The submission deadline for talk proposals is December 1, 2024. FOSDEM is a free event for software developers to meet, share ideas, and collaborate. Every year, […]

October 05, 2024

Cover Ember Knights

Proton is a compatibility layer that lets Windows games run on Linux. Running a Windows game is mostly just hitting the Play button within Steam. It’s so good that many games now run faster on Linux than on native Windows. That’s what makes the Steam Deck the best gaming handheld of the moment.

But a compatibility layer is still a layer, so you may encounter … incompatibilities. Ember Knights is a lovely game with fun co-op multiplayer support. It runs perfectly on the (Linux-based) Steam Deck, but on my Ubuntu laptop I encountered long loading times (startup was 5 minutes and loading between worlds was slow). But once the game was loaded it ran fine.

Debugging the game revealed that there were lots of EAGAIN errors while the game was trying to access the system clock. Changing the number of allowed open files fixed the problem for me.

Add the following lines to the end of these files:

  • in /etc/security/limits.conf:
* hard nofile 1048576
  • in /etc/systemd/system.conf and /etc/systemd/user.conf:
DefaultLimitNOFILE=1048576 

Reboot.
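After rebooting, you can check that the higher limit is actually in effect for your session; the hard limit in particular should report the value configured above:

$ ulimit -Hn
$ ulimit -Sn

ulimit -Hn prints the hard limit and ulimit -Sn the soft limit for the number of open files.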

Cover In Game

“The Witcher 3: Wild Hunt” is considered to be one of the greatest video games of all time. I certainly agree with that sentiment.

At its core, The Witcher 3 is an action role-playing game with a third-person perspective in a huge open world. You develop your character while the story advances. At the same time you can freely roam and explore as much as you like. The main story is captivating and the world is filled with side quests and lots of interesting people. Fun for at least 200 hours, if you’re the exploring kind. If you’re not, the base game (without DLCs) will still take you 50 hours to finish.

While similar to other great games like Nintendo’s Zelda: Breath of the Wild and Sony’s Horizon Zero Dawn, the strength of the game is its deep lore, originating from the Witcher novels written by the “Polish Tolkien”, Andrzej Sapkowski. It’s not just a game, but a universe (nowadays it even includes a Netflix TV series).

A must play.

Played on the Steam Deck without any issues (“Steam Deck Verified”)

September 10, 2024

In previous blog posts, we discussed setting up a GPG smartcard on GNU/Linux and FreeBSD.

In this blog post, we will configure Thunderbird to work with an external smartcard reader and our GPG-compatible smartcard.

beastie gnu tux

Before Thunderbird 78, if you wanted to use OpenPGP email encryption, you had to use a third-party add-on such as https://enigmail.net/.

Thunderbird’s recent versions natively support OpenPGP. The Enigmail addon for Thunderbird has been discontinued. See: https://enigmail.net/index.php/en/home/news.

I didn’t find good documentation on how to set up Thunderbird with a GnuPG smartcard when I moved to a new coreboot laptop, so this was the reason I created this blog post series.

GnuPG configuration

We’ll not go into too much detail on how to set up GnuPG. This was already explained in the previous blog posts.

If you want to use an HSM with GnuPG, you can use the gnupg-pkcs11-scd agent https://github.com/alonbl/gnupg-pkcs11-scd, which translates the PKCS#11 interface for GnuPG. A previous blog post describes how this can be configured with SmartCard-HSM.

We’ll go over some steps to make sure that GnuPG is set up correctly before we continue with the Thunderbird configuration. The pinentry command must be configured with graphical support so that we can type our PIN code in the graphical user environment.

Import Public Key

Make sure that your public key - or the public key(s) of the recipient(s) - are imported.

[staf@snuffel ~]$ gpg --list-keys
[staf@snuffel ~]$ 
[staf@snuffel ~]$ gpg --import <snip>.asc
gpg: key XXXXXXXXXXXXXXXX: public key "XXXX XXXXXXXXXX <XXX@XXXXXX>" imported
gpg: Total number processed: 1
gpg:               imported: 1
[staf@snuffel ~]$ 
[staf@snuffel ~]$  gpg --list-keys
/home/staf/.gnupg/pubring.kbx
-----------------------------
pub   xxxxxxx YYYYY-MM-DD [SC]
      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
uid           [ xxxxxxx] xxxx xxxxxxxxxx <xxxx@xxxxxxxxxx.xx>
sub   xxxxxxx xxxx-xx-xx [A]
sub   xxxxxxx xxxx-xx-xx [E]

[staf@snuffel ~]$ 

Pinentry

Thunderbird will not ask for your smartcard’s pin code.

This must be done on your smartcard reader if it has a pin pad or an external pinentry program.

The pinentry is configured in the gpg-agent.conf configuration file. As we’re using Thunderbird in a graphical environment, we’ll configure it to use a graphical version.

Installation

I’m testing KDE plasma 6 on FreeBSD, so I installed the Qt version of pinentry.

On GNU/Linux you can check the documentation of your favourite Linux distribution to install a graphical pinentry. If you use a graphical desktop environment, there is probably already a graphical pinentry installed.
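For example, on Debian or Ubuntu installing a Qt pinentry would look something like this (package names differ between distributions, so treat this as a sketch):

$ sudo apt install pinentry-qt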

[staf@snuffel ~]$ sudo pkg install -y pinentry-qt6
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        pinentry-qt6: 1.3.0

Number of packages to be installed: 1

76 KiB to be downloaded.
[1/1] Fetching pinentry-qt6-1.3.0.pkg: 100%   76 KiB  78.0kB/s    00:01    
Checking integrity... done (0 conflicting)
[1/1] Installing pinentry-qt6-1.3.0...
[1/1] Extracting pinentry-qt6-1.3.0: 100%
==> Running trigger: desktop-file-utils.ucl
Building cache database of MIME types
[staf@snuffel ~]$ 

Configuration

The gpg-agent is responsible for starting the pinentry program. Let’s reconfigure it to start the pinentry that we like to use.

[staf@snuffel ~]$ cd .gnupg/
[staf@snuffel ~/.gnupg]$ 
[staf@snuffel ~/.gnupg]$ vi gpg-agent.conf

The pinentry is configured in the pinentry-program directive. You’ll find the complete gpg-agent.conf that I’m using below.

debug-level expert
verbose
verbose
log-file /home/staf/logs/gpg-agent.log
pinentry-program /usr/local/bin/pinentry-qt

Reload the scdaemon and gpg-agent configuration.

staf@freebsd-gpg3:~/.gnupg $ gpgconf --reload scdaemon
staf@freebsd-gpg3:~/.gnupg $ gpgconf --reload gpg-agent
staf@freebsd-gpg3:~/.gnupg $ 

Test

To verify that gpg works correctly and that the pinentry program works in our graphical environment we sign a file.

Create a new file.

$ cd /tmp
[staf@snuffel /tmp]$ 
[staf@snuffel /tmp]$ echo "foobar" > foobar
[staf@snuffel /tmp]$ 

Try to sign it.

[staf@snuffel /tmp]$ gpg --sign foobar
[staf@snuffel /tmp]$ 

If everything works fine, the pinentry program will ask for the pincode to sign it.

image info

Thunderbird

In this section we’ll (finally) configure Thunderbird to use GPG with a smartcard reader.

Allow external smartcard reader

open settings

Open the global settings: click on the "Hamburger" icon and select Settings.

Or press [F10] to bring up the "Menu bar" in Thunderbird and select [Edit] and Settings.

open settings

In the settings window click on [Config Editor].

This will open the Advanced Preferences window.

allow external gpg

In the Advanced Preferences window search for "external_gnupg" and set mail.identity.allow_external_gnupg to true.


 

Setup End-To-End Encryption

The next step is to configure the GPG keypair that we’ll use for our user account.

open settings

Open the account settings by clicking on the "Hamburger" icon and selecting Account Settings, or press [F10] to open the menu bar and select Edit, Account Settings.

Select End-To-End Encryption and, in the OpenPGP section, select [ Add Key ].

open settings

Select ( * ) Use your external key through GnuPG (e.g. from a smartcard).

And click on [Continue]

The next window will ask you for the Secret Key ID.

open settings

Execute gpg --list-keys to get your secret key id.

Copy/paste your key id and click on [ Save key ID ].
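If you're unsure which value to copy, listing the keys with the long key ID format makes it easier to spot. This is a general GnuPG option, shown here as a hint rather than a required step:

$ gpg --list-keys --keyid-format=long

The 16-character hexadecimal value printed after the algorithm on the pub line is the key ID to paste into Thunderbird.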

I found that it is sometimes required to restart Thunderbird to reload the configuration when a new key ID is added. So restart Thunderbird if it fails to find your key ID in the keyring.

Test

open settings

As a test we send an email to our own email address.

Open a new message window and enter your email address into the To: field.

Click on [OpenPGP] and Encrypt.

open settings

Thunderbird will show a warning message that it doesn't know the public key to set up the encryption.

Click on [Resolve].

discover keys

In the next window Thunderbird will ask to Discover Public Keys online or to import the Public Keys From File. We'll import our public key from a file.

open key file

In the Import OpenPGP key File window select your public key file and click on [ Open ].
open settings

Thunderbird will show a window with the key fingerprint. Select ( * ) Accepted.

Click on [ Import ] to import the public key.

open settings

With our public key imported, the warning that end-to-end encryption requires resolving key issues should be gone.

Click on the [ Send ] button to send the email.

open settings

To encrypt the message, Thunderbird will start a gpg session that invokes the pinentry command; type in your PIN code. gpg will encrypt the message and, if everything works fine, the email is sent.

 

Have fun!


September 09, 2024

The NBD protocol has grown a number of new features over the years. Unfortunately, some of those features are not (yet?) supported by the Linux kernel.

I suggested a few times over the years that the maintainer of the NBD driver in the kernel, Josef Bacik, take a look at these features, but he hasn't done so; presumably he has other priorities. As with anything in the open source world, if you want it done you must do it yourself.

I'd been off and on considering to work on the kernel driver so that I could implement these new features, but I never really got anywhere.

A few months ago, however, Christoph Hellwig posted a patch set that reworked a number of block device drivers in the Linux kernel to a new type of API. Since the NBD mailinglist is listed in the kernel's MAINTAINERS file, this patch series was crossposted to the NBD mailinglist too, and when I noticed that it explicitly disabled the "rotational" flag on the NBD device, I suggested to Christoph that perhaps "we" (meaning, "he") might want to vary the decision on whether a device is rotational depending on whether the NBD server signals, through the flag that exists for that very purpose, that the device is rotational.

To which he replied "Can you send a patch".

That got me down the rabbit hole, and now, for the first time in the 20+ years of being a C programmer who uses Linux exclusively, I got a patch merged into the Linux kernel... twice.

So, what do these things do?

The first patch adds support for the ROTATIONAL flag. If the NBD server mentions that the device is rotational, it will be treated as such, and the elevator algorithm will be used to optimize accesses to the device. For the reference implementation, you can do this by adding a line "rotational = true" to the relevant section (relating to the export where you want it to be used) of the config file.
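For the reference nbd-server, the relevant part of the config file might then look something like this; the export name and path are made up for the example, and only the rotational line is the point:

[myexport]
    exportname = /srv/nbd/myexport.img
    rotational = true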

It's unlikely that this will be of much benefit in most cases (most nbd-server installations export a file on a filesystem and have the elevator algorithm applied server-side anyway, in which case it doesn't matter whether the device has the rotational flag set), but it's there in case you wish to use it.

The second set of patches adds support for the WRITE_ZEROES command. Most devices these days allow you to tell them "please write N zeroes starting at this offset", which is a lot more efficient than sending over a buffer of N zeroes and asking the device to do DMA to copy buffers etc. for just zeroes.

The NBD protocol has supported its own WRITE_ZEROES command for a while now, and hooking it up was reasonably simple in the end. The only problem is that it expects length values in bytes, whereas the kernel uses it in blocks. It took me a few tries to get that right -- and then I also fixed up handling of discard messages, which required the same conversion.

September 06, 2024

Some users, myself included, have noticed that their MySQL error log contains many lines like this one: Where does that error come from? The error MY-010914 is part of the Server Network issues like: Those are usually more problematic than the ones we are covering today. The list is not exhaustive and in the source […]

September 05, 2024

IT architects generally use architecture-specific languages or modeling techniques to document their thoughts and designs. ArchiMate, the framework I have the most experience with, is a specialized enterprise architecture modeling language. It is maintained by The Open Group, an organization known for its broad architecture framework titled TOGAF.

My stance, however, is that architects should not use the diagrams from their architecture modeling framework to convey their message to every stakeholder out there...

What is the definition of “Open Source”?

There’s been no shortage of contention on what “Open Source software” means. Two instances that stand out to me personally are ElasticSearch’s “Doubling down on Open” and Scott Chacon’s “public on GitHub”.

I’ve been active in Open Source for 20 years and could use a refresher on its origins and officialisms. The plan was simple: write a blog post about why the OSI (Open Source Initiative) and its OSD (Open Source Definition) are authoritative, and collect evidence in their support (confirmation that they invented the term, of widespread acceptance with little dissent, and of the OSD being a practical, well-functioning tool). That’s what I keep hearing; I just wanted to back it up. Since contention always seems to be around commercial re-distribution restrictions (which are forbidden by the OSD), I wanted in particular to confirm that there haven’t been all that many commercial vendors who have used, or wanted to use, the term “open source” to mean “you can view/modify/use the source, but you are limited in your ability to re-sell, or need to buy additional licenses for use in a business”.

However, the further I looked, the more I found evidence of the opposite of all of the above. I’ve spent a few weeks now digging and some of my long standing beliefs are shattered. I can’t believe some of the things I found out. Clearly I was too emotionally invested, but after a few weeks of thinking, I think I can put things in perspective. So this will become not one, but multiple posts.

The goal for the series is to look at the tensions in the community/industry (in particular those directed towards the OSD), and to figure out how to resolve, or at least reduce, them.

Without further ado, let’s get into the beginnings of Open Source.

The “official” OSI story.

Let’s first get the official story out the way, the one you see repeated over and over on websites, on Wikipedia and probably in most computing history books.

Back in 1998, there was a small group of folks who felt that the verbiage at the time (Free Software) had become too politicized. (Note: the Free Software Foundation was founded 13 years prior, in 1985, and informal use of “free software” had been around since the 1970s.) They felt they needed a new word “to market the free software concept to people who wore ties” (source). (Somewhat ironic, since today many of us like to say “Open Source is not a business model”.)

Bruce Perens - an early Debian project leader and hacker on free software projects such as busybox - had authored the first Debian Free Software Guidelines in 1997, which were turned into the first Open Source Definition when he founded the OSI (Open Source Initiative) with Eric Raymond in 1998. As you continue reading, keep in mind that from the get-go, OSI’s mission was supporting the industry, not the community of hobbyists.

Eric Raymond is of course known for his seminal 1999 essay on development models “The cathedral and the bazaar”, but he also worked on fetchmail among others.

According to Bruce Perens, there was some criticism at the time, but only of the term “Open” in general, and of “Open Source” only in a completely different industry.

At the time of its conception there was much criticism for the Open Source campaign, even among the Linux contingent who had already bought-in to the free software concept. Many pointed to the existing use of the term “Open Source” in the political intelligence industry. Others felt the term “Open” was already overused. Many simply preferred the established name Free Software. I contended that the overuse of “Open” could never be as bad as the dual meaning of “Free” in the English language–either liberty or price, with price being the most oft-used meaning in the commercial world of computers and software

From Open Sources: Voices from the Open Source Revolution: The Open Source Definition

Furthermore, from Bruce Perens’ own account:

I wrote an announcement of Open Source which was published on February 9 [1998], and that’s when the world first heard about Open Source.

source: On Usage of The Phrase “Open Source”

Occasionally it comes up that it may have been Christine Peterson who coined the term earlier that week in February, but she didn’t give it a precise meaning. That was a task for Eric and Bruce in follow-up meetings over the next few days.

Even when you’re the first to use or define a term, you can’t legally control how others use it, until you obtain a Trademark. Luckily for OSI, US trademark law recognizes the first user when you file an application, so they filed for a trademark right away. But what happened? It was rejected! The OSI’s official explanation reads:

We have discovered that there is virtually no chance that the U.S. Patent and Trademark Office would register the mark “open source”; the mark is too descriptive. Ironically, we were partly a victim of our own success in bringing the “open source” concept into the mainstream

This is our first 🚩 red flag and it lies at the basis of some of the conflicts which we will explore in this, and future posts. (tip: I found this handy Trademark search website in the process)

Regardless, since 1998, the OSI has vastly grown its scope of influence (more on that in future posts), with the Open Source Definition mostly unaltered for 25 years, and having been widely used in the industry.

Prior uses of the term “Open Source”

Many publications simply repeat the idea that OSI came up with the term, has the authority (if not legal, at least in practice) and call it a day. I, however, had nothing better to do, so I decided to spend a few days (which turned into a few weeks 😬) and see if I could dig up any references to “Open Source” predating OSI’s definition in 1998, especially ones with different meanings or definitions.

Of course, it’s totally possible that multiple people come up with the same term independently and I don’t actually care so much about “who was first”, I’m more interested in figuring out what different meanings have been assigned to the term and how widespread those are.

In particular, because most contention is around commercial limitations (non-competes) where receivers of the code are forbidden to resell it, this clause of the OSD stands out:

Free Redistribution: The license shall not restrict any party from selling (…)

Turns out, “Open Source” had already been in use for more than a decade prior to the OSI’s founding.

OpenSource.com

In 1998, a business in Texas called “OpenSource, Inc” launched their website. They were a “Systems Consulting and Integration Services company providing high quality, value-added IT professional services”. Sometime during the year 2000, the website became a RedHat property. Enter the domain name on ICANN’s lookup and it reveals the domain name was registered on Jan 8, 1998: a month before the term was “invented” by Christine/Richard/Bruce. What a coincidence. We are just warming up…

image

Caldera announces Open Source OpenDOS

In 1996, a company called Caldera had “open sourced” a DOS operating system called OpenDos. Their announcement (accessible on google groups and a mailing list archive) reads:

Caldera Announces Open Source for DOS.
(…)
Caldera plans to openly distribute the source code for all of the DOS technologies it acquired from Novell., Inc
(…)
Caldera believes an open source code model benefits the industry in many ways.
(…)
Individuals can use OpenDOS source for personal use at no cost.
Individuals and organizations desiring to commercially redistribute
Caldera OpenDOS must acquire a license with an associated small fee.

Today we would refer to it as dual-licensing, using Source Available due to the non-compete clause. But in 1996, actual practitioners referred to it as “Open Source” and OSI couldn’t contest it because it didn’t exist!

You can download the OpenDos package from ArchiveOS and have a look at the license file, which includes even more restrictions such as “single computer”. (like I said, I had nothing better to do).

Investigations by Martin Espinoza re: Caldera

On his blog, Martin has an article making a similar observation about Caldera’s prior use of “open source”, following up with another article which includes a response from Lyle Ball, who headed the PR department of Caldera.

Quoting Martin:

As a member of the OSI, he [Bruce] frequently championed that organization’s prerogative to define what “Open Source” means, on the basis that they invented the term. But I [Martin] knew from personal experience that they did not. I was personally using the term with people I knew before then, and it had a meaning — you can get the source code. It didn’t imply anything at all about redistribution.

The response from Caldera includes such gems as:

I joined Caldera in November of 1995, and we certainly used “open source” broadly at that time. We were building software. I can’t imagine a world where we did not use the specific phrase “open source software”. And we were not alone. The term “Open Source” was used broadly by Linus Torvalds (who at the time was a student (…), John “Mad Dog” Hall who was a major voice in the community (he worked at COMPAQ at the time), and many, many others.

Our mission was first to promote “open source”, Linus Torvalds, Linux, and the open source community at large. (…) we flew around the world to promote open source, Linus and the Linux community….we specifically taught the analysts houses (i.e. Gartner, Forrester) and media outlets (in all major markets and languages in North America, Europe and Asia.) (…) My team and I also created the first unified gatherings of vendors attempting to monetize open source

So according to Caldera, “open source” was already a phenomenon in the industry and Linus himself had used the term. He mentions plenty of avenues for further research; I pursued one of them below.

Linux Kernel discussions

Mr. Ball’s mentions of Linus and Linux piqued my interest, so I started digging.

I couldn’t find a mention of “open source” in the Linux Kernel Mailing List archives prior to the OSD announcement (February 1998), though the archives only start as of March 1996. I asked ChatGPT where people used to discuss Linux kernel development before that, and it suggested five Usenet groups, which Google still lets you search through:

What were the hits? Glad you asked!

comp.os.linux: a 1993 discussion about supporting binary-only software on Linux

This conversation predates the OSI by five whole years and leaves very little to the imagination:

The GPL and the open source code have made Linux the success that it is. Cygnus and other commercial interests are quite comfortable with this open paradigm, and in fact prosper. One need only pull the source code to GCC and read the list of many commercial contributors to realize this.

comp.os.linux.announce: 1996 announcement of Caldera’s open-source environment

In November 1996 Caldera shows up again, this time with a Linux based “open-source” environment:

Channel Partners can utilize Caldera’s Linux-based, open-source environment to remotely manage Windows 3.1 applications at home, in the office or on the road. By using Caldera’s OpenLinux (COL) and Wabi solution, resellers can increase sales and service revenues by leveraging the rapidly expanding telecommuter/home office market. Channel Partners who create customized turn-key solutions based on environments like SCO OpenServer 5 or Windows NT,

comp.os.linux.announce: 1996 announcement of a trade show

On 17 Oct 1996 we find this announcement

There will be a Open Systems World/FedUnix conference/trade show in Washington DC on November 4-8. It is a traditional event devoted to open computing (read: Unix), attended mostly by government and commercial Information Systems types.

In particular, this talk stands out to me:

** Schedule of Linux talks, OSW/FedUnix'96, Thursday, November 7, 1996 ***
(…)
11:45 Alexander O. Yuriev, “Security in an open source system: Linux study

The context here seems to be open standards, and maybe also the open source development model.

1990: Tony Patti on “software developed from open source material”

In 1990, a magazine editor by the name of Tony Patti not only refers to Open Source software but mentions that the NSA in 1987 referred to software that “was developed from open source material”.

1995: open-source changes emails on OpenBSD-misc email list

I could find one mention of “open-source” on an OpenBSD email list: it seems there was a directory “open-source-changes” which contained incoming patches, distributed over email (source). Though perhaps the way to interpret it is that it concerns “source-changes” to OpenBSD, paraphrased as “open”, so let’s not count this one.

(I did not look at other BSD’s)

Bryan Lunduke’s research

Bryan Lunduke has done similar research and found several more USENET posts about “open source”, clearly in the context of open source software, predating OSI by many years. He breaks it down on his substack. Some interesting examples he found:

19 August, 1993 post to comp.os.ms-windows

Anyone else into “Source Code for NT”? The tools and stuff I’m writing for NT will be released with source. If there are “proprietary” tricks that MS wants to hide, the only way to subvert their hoarding is to post source that illuminates (and I don’t mean disclosing stuff obtained by a non-disclosure agreement).

(source)

Then he writes:

Open Source is best for everyone in the long run.

Written as a matter-of-fact generalization to the whole community, implying the term is well understood.

December 4, 1990

BSD’s open source policy meant that user developed software could be ported among platforms, which meant their customers saw a much more cost effective, leading edge capability combined hardware and software platform.

source

1985: The “Computer Chronicles” documentary about UNIX.

The Computer Chronicles was a TV documentary series about computer technology; it started as a local broadcast, but in 1983 became a national series. In February 1985, they broadcast an episode about UNIX. You can watch the entire 28-minute episode on archive.org, and it’s an interesting snapshot in time, when UNIX was coming out of its shell and competing with MS-DOS with its multi-user and concurrent multi-tasking features. It contains a segment in which Bill Joy, co-founder of Sun Microsystems, is interviewed about Berkeley Unix 4.2. Sun had more than 1000 staff members, and now its CTO was on national TV in the United States. This was a big deal, with a big audience. At the 13:50 mark, the interviewer quotes Bill:

“He [Bill Joy] says its open source code, versatility and ability to work on a variety of machines means it will be popular with scientists and engineers for some time”

“Open Source” on national TV. 13 years before the founding of OSI.

image

Uses of the word “open”

We’re specifically talking about “open source” in this article. But we should probably also consider how the term “open” was used in software, as they are related, and that may have played a role in the rejection of the trademark.

Well, the Open Software Foundation launched in 1988 (10 years before the OSI). Their goal was to make an open standard for UNIX. The word “open” was also used elsewhere in software, e.g. Common Open Software Environment in 1993 (standardized software for UNIX), OpenVMS in 1992 (a renaming of VAX/VMS as an indication of its support for open systems industry standards such as POSIX and Unix compatibility), OpenStep in 1994 and of course, in 1996, the OpenBSD project started. They have this to say about their name (while OpenBSD started in 1996, this quote is from 2006):

The word “open” in the name OpenBSD refers to the availability of the operating system source code on the Internet, although the word “open” in the name OpenSSH means “OpenBSD”. It also refers to the wide range of hardware platforms the system supports.

Does it run DOOM?

The proof of any hardware platform is always whether it can run Doom. Since the DOOM source code was published in December 1997, I thought it would be fun to check whether id Software happened to use the term “Open Source” at that time. There are some FTP mirrors where you can still see the files with the original December 1997 timestamps (e.g. this one). However, after sifting through the README and other documentation files, I only found references to the “Doom source code”. No mention of Open Source.

The origins of the famous “Open Source” trademark application: SPI, not OSI

This is not directly relevant, but may provide useful context: in June 1997 the SPI (“Software in the Public Interest”) organization was born to support the Debian project, funded by its community, although it grew in scope to help many more free software / open source projects. It looks like Bruce, as a representative of SPI, started the “Open Source” trademark proceedings (and may have paid for it himself). But then something happened: 3/4 of the SPI board (including Bruce) left and founded the OSI, which Bruce announced along with a note that the trademark would move from SPI to OSI as well. Ian Jackson - Debian Project Leader and SPI president - expressed his “grave doubts” and lack of trust. SPI later confirmed they owned the trademark (application) and would not let any OSI members take it. The perspective of Debian developer Ean Schuessler provides more context.

A few years later, it seems wounds were healing, with Bruce re-applying to SPI, Ean making amends, and Bruce taking the blame.

All the bickering over the Trademark was ultimately pointless, since it didn’t go through.

Searching for SPI on the OSI website reveals no acknowledgment of SPI’s role in the story. You only find mentions in board meeting notes (ironically, they’re all requests to SPI to hand over domains or to share some software).

By the way, in November 1998, this is what SPI’s open source web page had to say:

Open Source software is software whose source code is freely available

A Trademark that was never meant to be.

Lawyer Kyle E. Mitchell knows how to write engaging blog posts. Here is one where he digs further into the topic of trademarking and why “open source” is one of the worst possible terms to try to trademark (in comparison to, say, Apple computers).

He writes:

At the bottom of the hierarchy, we have “descriptive” marks. These amount to little more than commonly understood statements about goods or services. As a general rule, trademark law does not enable private interests to seize bits of the English language, weaponize them as exclusive property, and sue others who quite naturally use the same words in the same way to describe their own products and services.
(…)
Christine Peterson, who suggested “open source” (…) ran the idea past a friend in marketing, who warned her that “open” was already vague, overused, and cliche.
(…)
The phrase “open source” is woefully descriptive for software whose source is open, for common meanings of “open” and “source”, blurry as common meanings may be and often are.
(…)
no person and no organization owns the phrase “open source” as we know it. No such legal shadow hangs over its use. It remains a meme, and maybe a movement, or many movements. Our right to speak the term freely, and to argue for our own meanings, understandings, and aspirations, isn’t impinged by anyone’s private property.

So, we have here a great example of the Trademark system working exactly as intended, doing the right thing in the service of the people: not giving away unique rights to common words, rights that were demonstrably never OSI’s to have.

I can’t decide which is more wild: OSI’s audacious outcries for the whole world to forget about the trademark failure and trust their “pinky promise” right to authority over a common term, or the fact that so much of the global community actually fell for it and repeated a misguided narrative without much further thought. (myself included)

I think many of us, through our desire to be part of a movement with a positive, fulfilling mission, were too easily swept away by OSI’s origin tale.

Co-opting a term

OSI was never relevant as an organization and hijacked a movement that was well underway without them.

(source: a harsh but astute Slashdot comment)

We have plentiful evidence that “Open Source” was used for at least a decade prior to OSI existing, in the industry, in the community, and possibly in government. You saw it at trade shows, in various newsgroups around Linux and Windows programming, and on national TV in the United States. The word was often uttered without any further explanation, implying it was a known term. For a movement that happened largely offline in the eighties and nineties, it seems likely there were many more examples that we can’t access today.

“Who was first?” is interesting, but more relevant is “what did it mean?”. Many of these uses were fairly informal and/or didn’t consider re-distribution. We saw these meanings:

  • a collaborative development model
  • portability across hardware platforms, open standards
  • disclosing (making available) of source code, sometimes with commercial limitations (e.g. per-seat licensing) or restrictions (e.g. non-compete)
  • possibly a buzz-word in the TV documentary

Then came the OSD, which gave the term a very different, and much stricter, meaning than what had already been in use for 15 years. However, the OSD was refined, “legal-aware” and the starting point for an attempt at global consensus and wider industry adoption, so we are far from finished with our analysis.

(ironically, it never quite matched with free software either - see this e-mail or this article)

Legend has it…

Repeat a lie often enough and it becomes the truth

Yet the OSI still promotes their story of being first to use the term “Open Source”. RedHat’s article still claims the same. I could not find evidence of a resolution. I hope I just missed it (please let me know!). What I did find is one request for clarification remaining unaddressed and another handled in a questionable way, to put it lightly. Expand all the comments in the thread and see for yourself. For an organization all about “open”, this seems especially strange. It seems we have veered far away from the “We will not hide problems” motto in the Debian Social Contract.

Real achievements are much more relevant than “who was first”. Here are some suggestions for actually relevant ways the OSI could introduce itself and its mission:

  • “We were successful open source practitioners and industry thought leaders”
  • “In our desire to assist the burgeoning open source movement, we aimed to give it direction and create alignment around useful terminology”.
  • “We launched a campaign to positively transform the industry by defining the term - which had thus far only been used loosely - precisely and popularizing it”

I think any of these would land well in the community. Instead, they are strangely obsessed with “we coined the term, therefore we decide its meaning, and anything else is ‘flagrant abuse’”.

Is this still relevant? What comes next?

Trust takes years to build, seconds to break, and forever to repair

I’m quite an agreeable person, and until recently happily defended the Open Source Definition. Now, my trust has been tainted, but at the same time, there is beauty in knowing that healthy debate has existed since the day OSI was announced. It’s just a matter of making sense of it all, and finding healthy ways forward.

Most of the events covered here are from 25 years ago, so let’s not linger too much on it. There is still a lot to be said about adoption of Open Source in the industry (and the community), tension (and agreements!) over the definition, OSI’s campaigns around awareness and standardization and its track record of license approvals and disapprovals, challenges that have arisen (e.g. ethics, hyper clouds, and many more), some of which have resulted in alternative efforts and terms. I have some ideas for productive ways forward.

Stay tuned for more, sign up for the RSS feed and let me know what you think!
Comment below, on X or on HackerNews

August 29, 2024

In his latest Lex Fridman appearance, Elon Musk makes some excellent points about the importance of simplification.

Follow these steps:

  1. Simplify the requirements
  2. For each step, try to delete it altogether
  3. Implement well

1. Simplify the Requirements

Even the smartest people come up with requirements that are, in part, dumb. Start by asking yourself how they can be simplified.

There is no point in finding the perfect answer to the wrong question. Try to make the question the least wrong one possible.

I think this is so important that it is included in my first item of advice for junior developers.

There is nothing so useless as doing efficiently that which should not be done at all.

2. Delete the Step

For each step, consider if you need it at all, and if not, delete it. Certainty is not required. Indeed, if you only delete what you are 100% certain about, you will leave in junk. If you never put things back in, it is a sign you are being too conservative with deletions.

The best part is no part.

Some further commentary by me:

This applies both to the product and technical implementation levels. It’s related to YAGNI, Agile, and Lean, also mentioned in the first section of advice for junior developers.

It’s crucial to consider probabilities and compare the expected cost/value of different approaches. Don’t spend 10 EUR each day to avoid a 1% chance of needing to pay 100 EUR. Consistent Bayesian reasoning will reduce such mistakes, though Elon’s “if you do not put anything back in, you are not removing enough” heuristic is easier to understand and implement.
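To make that expected-value comparison concrete, here is a minimal Python sketch; the numbers are the illustrative ones from the paragraph above, nothing more:

import random

daily_prevention_cost = 10    # EUR spent every day to avoid the risk
incident_probability = 0.01   # 1% chance per day that the problem occurs
incident_cost = 100           # EUR to pay if it does occur

# Expected loss per day if we delete the preventive step entirely.
expected_daily_loss = incident_probability * incident_cost  # 1 EUR/day
print(f"prevention: {daily_prevention_cost} EUR/day, expected loss without it: {expected_daily_loss} EUR/day")

# Optional sanity check: simulate a year without the preventive step.
simulated_loss = sum(incident_cost for _ in range(365) if random.random() < incident_probability)
print(f"simulated yearly loss: ~{simulated_loss} EUR vs {365 * daily_prevention_cost} EUR spent on prevention")

Paying 10 EUR per day to dodge an expected loss of 1 EUR per day is exactly the kind of step that should be deleted.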

3. Implement Well

Here, Elon talks about optimization and automation, which are specific to his problem domain of building a supercomputer. More generally, this can be summarized as good implementation, which I advocate for in my second section of advice for junior developers.

 

The relevant segment begins at 43:48.

The post Simplify and Delete appeared first on Entropy Wins.

August 27, 2024

I just reviewed the performance of a customer’s WordPress site. Things got a lot worse, he wrote, and he assumed Autoptimize (he was an AOPro user) wasn’t working any more, so he asked me to guide him to fix the issue. Instead, it turns out he had installed CookieYes, which adds tons of JS (part of which is render-blocking), taking up 3.5s of main-thread work and (fasten your seat-belts) which somehow seems…

Source

August 26, 2024

Building businesses based on an Open Source project is like balancing a solar system. Just as the sun is the center of our own little universe, powering life on the planets that revolve around it in a brittle yet tremendously powerful astrophysical equilibrium, so it is with a thriving open source project: a community, one or more vendors, and their commercially supported customers revolve around it, driven by astronomical aspirations.

Source-available & Non-Compete licensing have existed in various forms, and have been tweaked and refined for decades, in an attempt to combine just enough proprietary conditions with just enough of Open Source flavor, to find that perfect trade-off. Fair Source is the latest refinement for software projects driven by a single vendor wanting to combine monetization, a high rate of contributions to the project (supported by said monetization), community collaboration and direct association with said software project.

Succinctly, Fair Source licenses provide many of the same benefits to users as Open Source licenses, although outsiders are not allowed to build their own competing service based on the software; however, after 2 years the software automatically becomes MIT or Apache2 licensed, and at that point you can pretty much do whatever you want with the older code.

To avoid confusion, this project is different from:

It seems we have reached an important milestone in 2024: on the surface, “Fair Source” is yet another new initiative that positions itself as a more business-friendly alternative to “Open Source”, but the delayed open source publication (DOSP) model has been refined to the point where the licenses are succinct, clear, easy to work with and should hold up well in court. Several technology companies are choosing this software licensing strategy (Sentry being the most famous one; you can see the others on their website).

My 2 predictions:

  • we will see 50-100 more companies in the next couple of years.
  • a governance legal entity will appear soon, and a trademark will follow after.

In this article, I’d like to share my perspective and address some - what I believe to be - misunderstandings in current discourse.

The licenses

At this time, the Fair Source ideology is implemented by the following licenses:

BSL/BUSL are trickier to understand and can have different implementations. FCL and FSL are nearly identical. They are clearly and concisely written and embody the Fair Source spirit in its purest form.

Seriously, try running the following in your terminal. Sometimes as an engineer you have to appreciate legal text when it’s this concise, easy to understand, and diff-able!

wget https://raw.githubusercontent.com/keygen-sh/fcl.dev/master/FCL-1.0-MIT.md
wget https://fsl.software/FSL-1.1-MIT.template.md
diff FSL-1.1-MIT.template.md FCL-1.0-MIT.md

I will focus on FSL and FCL, the Fair Source “flagship licenses”.

Is it “open source, fixed”, or an alternative to open source? Neither.

First, we’ll need to agree on what the term “Open Source” means. This itself has been a battle for decades, with non-competes (commercial restrictions) being especially contentious and in use even before OSI came along, so I’m working on an article which challenges OSI’s Open Source Definition and which I will publish soon. However, the OSD is probably the most common understanding in the industry today, so we’ll use that here. It seems the folks behind FSL/Fair Source made the wise decision to distance themselves from these contentious debates: after some initial conversations about FSL using the “Open Source” term, they’ve adopted the less common term “Fair Source”, and I’ve seen a lot of meticulous work (e.g. fsl#2 and fsl#10) on how they articulate what they stand for. (The Open Source Definition debate is why I hope the Fair Source folks will file a trademark if this project gains more traction.)

Importantly, OSI’s definition of “Open Source” includes non-discrimination and free redistribution.

When you check out code that is FSL licensed, and the code was authored:

  1. less than 2 years ago: it’s available to you under terms similar to MIT, except you cannot compete with the author by making a similar service using the same software
  2. more than 2 years ago: it is now MIT licensed. (or Apache2, when applicable)
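As a rough illustration of that two-option rule, here is a minimal Python sketch (not legal advice; the two-year window is approximated as 730 days and the MIT/Apache2 fallback is the one described above):

from datetime import date, timedelta

def applicable_terms(authored: date, today: date) -> str:
    """Sketch: which terms apply to a piece of FSL-licensed code of a given age."""
    if today - authored >= timedelta(days=730):  # roughly the 2-year change date
        return "MIT (or Apache2, when applicable)"
    return "FSL: MIT-like terms, but no competing service allowed"

print(applicable_terms(date(2022, 6, 1), date(2024, 8, 26)))  # older code: open source terms
print(applicable_terms(date(2024, 5, 1), date(2024, 8, 26)))  # recent code: non-compete applies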

While after 2 years it is clearly open source, the non-compete clause in option 1 is not compatible with the terms set forth by the OSI Open Source Definition (or with freedom 0 from the 4 freedoms of Free Software). Such a license is often referred to as “Source Available”.

So, Fair Source is a system to combine 2 licenses (an Open Source one and a Source Available one with proprietary conditions) in one. I think this is a very clever approach, but it’s not all that useful to compare it to Open Source. Rather, it has a certain symmetry to Open Core:

  • In an Open Core product, you have a “scoped core”: a core built from open source code, surrounded by specifically scoped pieces of proprietary code, for an indeterminate, but usually many-year or perpetual, timeframe
  • With Fair Source, you have a “timed core”: the open source core is all the code that’s more than 2 years old, and the proprietary bits are the most recent developments (regardless of which scope they belong to).

Open Core and Fair Source both try to balance open source with business interests: both have an open source component to attract a community, and a proprietary shell to make a business more viable. Fair Source is a licensing choice that’s only relevant to businesses, not individuals. How many businesses monetize pure Open Source software? I can count them on one hand. The vast majority go for something like Open Core. This is why the comparison with Open Core makes much more sense.

A lot of the criticisms of Fair Source suddenly become a lot more palatable when you consider it an alternative to Open Core.

As a customer, which is more tolerable: proprietary features, or 2 years’ worth of proprietary product developments? I don’t think it matters nearly as much as some of the advantages Fair Source has over Open Core:

  • Users can view, modify and distribute (but not commercialize) the proprietary code. (with Open Core, you only get the binaries)
  • It follows then, that the project can use a single repository and single license (with Open Core, there are multiple repositories and licenses involved)

Technically, Open Core is more of a business architecture, where you still have to figure out which licenses you want to use for the core and shell, whereas Fair Source is more of a prepackaged solution which defines the business architecture as well as the 2 licenses to use.


Note that you can also devise hybrid approaches. Here are some ideas:

  • a Fair Source core and Closed Source shell. (more defensive than Open Core or Fair Source separately). (e.g. PowerSync does this)
  • an Open Source core, with Fair Source shell. (more open than Open Core or Fair Source separately).
  • Open Source Core, with Source Available shell (users can view, modify and distribute the code but not commercialize it, and without the delayed open source publication). This would be the “true” symmetrical counterpart to Fair Source. It is essentially Open Core where the community also has access to the proprietary features (but can’t commercialize those). It would also allow putting all code in the same repository (although this benefit works better with Fair Source, because any contributed code will definitely become open source, thus incentivizing the community more). I find this a very interesting option that I hope Open Core vendors will start considering (although it has little to do with Fair Source).
  • etc.

Non-Competition

The FSL introduction post states:

In plain language, you can do anything with FSL software except economically undermine its producer through harmful free-riding

The issue of large cloud vendors selling your software as a service, making money, and contributing little to nothing back to the project, has been widely discussed under a variety of names. This can indeed severely undermine a project’s health, or kill it.

(Personally, I find discussions around whether this is “fair” not very useful. Businesses will act in their best interest; you can’t change the rules of the game, you only have control over how you play it, in other words your own licensing and strategy.)

Here, we’ll just use the same terminology that the FSL does: the “harmful free-rider” problem.

However, the statement above is incorrect. Something like this would be more correct:

In plain language, you can do anything with FSL software except offer a similar paid service based on the software when it’s less than 2 years old.

What’s the difference? There are different forms of competition that are not harmful free-riding.

Multiple companies can offer a similar service/product which they base on the same project, which they all contribute to. They can synergize and grow the market together (aka “non-zero-sum” if you want to sound smart). I think there are many good examples of this, e.g. Hadoop, Linux, Node.js, OpenStack, OpenTelemetry, Prometheus, etc.

When the FSL website makes statements such as “You can do anything with FSL software except undermine its producer”, it seems to forget that some of the best and most ubiquitous software in the world is the result of synergies between multiple companies collaborating.

Furthermore, when the company who owns the copyright on the project turns their back on their community/customers, wouldn’t the community “deserve” a new player who offers a similar service, but on friendly terms? The new player may even contribute more to the project. Are they a harmful free-rider? Who gets to be the judge of that?

Let’s be clear, FSL allows no competition whatsoever, at least not during the first 2 years. What about after 2 years?

Zeke Gabrielse, one of the shepherds of Fair Source, said it well here:

Being 2 years old also puts any SaaS competition far enough back to not be a concern

Therefore, you may as well say no competition is allowed. Although, in Zeke’s post, I presume he was writing from the position of an actively developed software project. If the project becomes abandoned, the 2-year countdown is an obstacle, a surmountable one, that eventually does let you compete; but in that case the copyright holder has probably gone bust, so you aren’t really competing with them either. The 2-year window is not designed to enable competition; instead it is a contingency plan for when the company goes bankrupt. The wait can be needlessly painful for the community in such a situation. If a company is about to go bust, it could immediately release its Fair Source code as Open Source, but I wonder whether this could be automated via the actual license text.

(I had found some ambiguous use of the term “direct” competition, which I’ve reported and which has since been resolved.)

Perverse incentives

Humans are notoriously bad at predicting second-order effects, so I like to try. What could be some second-order effects of Fair Source projects? And how do they compare to Open Core?

  • Can companies first grow on top of their Fair Source codebase, take community contributions, and then switch to more restrictive or completely closed licensing, shutting out the community? Yes, if a CLA is in place (or by using the 2-year-old code). This isn’t any different from any other CLA-using Open Source or Open Core project, though with Open Core you can’t take in external contributions on the proprietary parts to begin with.
  • If a company enjoys a privileged position where others can’t meaningfully compete with it based on the same source code, that can affect how it treats its community and its customers. It can push through undesirable changes, it can price more aggressively, etc. (These issues are the same with Open Core.)
  • With Open Source & Open Core, the company is incentivized to make the code well understood by the community. Under Fair Source it would still be sensible (in order to get free contributions), but at the same time, by hiding design documents, subtly obfuscating the code and withholding information, it can also give itself an edge for when the code does become Open Source, although as we’ve seen, the 2-year delay makes competition unrealistic anyway.

All in all, nothing particularly worse than Open Core, here.

Developer sustainability

The FSL introduction post says:

We value user freedom and developer sustainability. Free and Open Source Software (FOSS) values user freedom exclusively. That is the source of its success, and the source of the free-rider problem that occasionally boils over into a full-blown tragedy of the commons, such as Heartbleed and Log4Shell.

F/OSS indeed doesn’t involve itself with sustainability, for the simple reason that Open Source has nothing to do with business models and monetization. As stated above, it makes more sense to compare to Open Core.

It’s like saying asphalt paving machinery doesn’t care about funding and is therefore to blame when roads don’t get built. Therefore we need tolls. But it would be more useful to compare tolls to road taxes and vignettes.

Of course it happens that people dedicate themselves to writing open source projects, usually driven by their interests, don’t get paid, and get volumes of support requests (including from commercial entities), which can become a real burden, and which can also lead to codebases becoming critically important, yet critically misunderstood and fragile. This is clearly a situation to avoid, and there are many ways to address it, ranging from sponsorships (e.g. GitHub, Tidelift), bounty programs (e.g. Algora) and direct funding (e.g. Sentry’s 500k donation) to many more initiatives that have launched in the last few years. Certainly a positive development.

Sometimes formally abandoning a project is also a clear sign that puts the burden of responsibility onto whoever consumes it, and can be a relief to the original author. If anything, it can trigger alarm bells within corporations and be a fast path to properly engaging and compensating the author. There is no way around the fact that developers (and people in general) are responsible for their own well-being and sometimes need to put their foot down, or put on their business hat (which many developers don’t like to do), if their decision to open source a project is resulting in problems. No amount of licensing can change this hard truth.

Furthermore, you can make money via Open Core around OSI-approved open source projects (e.g. Grafana), via consulting/support, and there are many companies that pay developers to work on (pure) Open Source code (Meta, Microsoft and Google are the most famous ones, but there are many smaller ones). Companies that try to achieve sustainability (and even to thrive) on pure open source software for which they are the main/single driving force are extremely rare. (Chef tried, and now System Initiative is trying to do it better. I remain skeptical, but I am hopeful and rooting for them to prove the model.)

Doesn’t it sound a bit ironic that the path to getting developers paid is releasing your software via a non-compete license?

Do we reach developer sustainability by preventing developers from making money on top of projects they want to - or already have - contribute(d) to?

Important caveats:

  • Fair Source does allow making money via consulting and auxiliary services related to the software.
  • Open Core shuts out people similarly, but many of the business models above don’t.

CLA needed?

When a project uses an Open Source license with some restrictions (e.g. GPL with its copyleft), it is common to use a CLA so that the company backing it can use more restrictive or commercial licenses (either as a license change later on, or as dual licensing). With Fair Source (and indeed all Source Available licenses), this is also the case.

However, unlike with Open Source licenses, with Fair Source / Source Available licenses a CLA becomes much more of a necessity, because such a license without a CLA isn’t compatible with anything else, and the commercial FSL restriction may not always apply to outside contributions (it depends on e.g. whether they can be offered stand-alone). I’m not a lawyer; for more clarity you should consult one. I think the Fair Source website, or at least their adoption guide, should mention something about CLAs, because it’s an important step beyond simply choosing a license and publishing, so I’ve raised this with them.

AGPL

The FSL website states:

AGPLv3 is not permissive enough. As a highly viral copyleft license, it exposes users to serious risk of having to divulge their proprietary source code.

This looks like fear mongering.

  • AGPL is not categorically less permissive than FSL. It is less permissive when the code is 2 years old or older (and the FSL has turned into MIT/Apache2). For current and recent code, AGPL permits competition; FSL does not.
  • The word “viral” is more divisive than accurate. In my mind, complying with AGPL is rather easy; my rule of thumb is that you trigger copyleft when you “ship”. Most engineers have an intuitive understanding of what it means to “ship” a feature, whether that’s on cloud or on-prem. In my experience, people struggle more with patent clauses or even the relation between trademarks and software licensing than they do with copyleft. There’s still some level of uncertainty and caution around AGPL, mainly due to its complexity. (Side note: Google and the CNCF don’t allow copyleft licenses, and their portfolios don’t have a whole lot of commercial success to show for it; I see mainly projects that can easily be picked up by Google.)

Heather Meeker, the lawyer consulted to draft the FSL, has spoken out against the virality discourse, tempering the FUD around AGPL.

Conclusion

I think Fair Source, the FSL and FCL have a lot to offer. Throughout my analysis I may have raised some criticisms, but if anything, it reminds me of how much Open Core can suck (though it depends on the relative size of core vs shell). So I find it a very compelling alternative to Open Core. Despite some poor choices of wording, I find it well executed: it ties up a lot of loose ends from previous initiatives (Source Available, BSL and other custom licenses) into a neat package. Despite the need for a CLA, it’s still quite easy to implement and is arguably more viable than Open Core is in its current state today. When comparing to Open Source, the main question is: which is worse, the “harmful free-rider problem” or the non-compete? (Anecdotally, my gut feeling says the former, but I’m on the lookout for data-driven evidence.) When comparing to Open Core, the main question is: is a business more viable keeping proprietary features closed, or making them source-available (non-compete)?

As mentioned, there are many more hybrid approaches possible. For a business thinking about their licensing strategy, it may make sense to think of these questions separately:

  • should our proprietary shell be time based or feature scoped? Does it matter?
  • should our proprietary shell be closed, or source-available?

I certainly would prefer to see companies and projects appear:

  • as Fair Source, rather than not at all
  • as Open Core, rather than not at all
  • as Fair Source, rather than Open Core (depending on “shell thickness”).
  • with more commercial restrictions from the get-go, instead of starting more permissively and re-licensing later. Just kidding, but that’s a topic for another day.

For vendors, I think there are some options left to explore, such as Open Core with a source-available (instead of closed) shell. Something to consider for any company doing Open Core today. For end-users / customers, “Open Source” vendors are not the only ones to be taken with a grain of salt; it’s the same with Fair Source vendors, since they may have a more complicated arrangement rather than just using a Fair Source license.

Thanks to Heather Meeker and Joseph Jacks for providing input, although this article reflects only my personal views.

August 25, 2024

I made some time to give some love to my own projects: I rewrote the Ansible role stafwag.ntpd and cleaned up some other Ansible roles.

There is some work ongoing for some other Ansible roles/projects, but this might be a topic for some other blog post(s) ;-)


stafwag.ntpd


An ansible role to configure ntpd/chrony/systemd-timesyncd.


This might be controversial, but I decided to add support for chrony and systemd-timesyncd. Ntpd is still supported and the default on the BSDs (FreeBSD, NetBSD, OpenBSD).

It’s possible to switch the ntp implementation by using the ntpd.provider directive.

The Ansible role stafwag.ntpd v2.0.0 is available at:

Release notes

V2.0.0

  • Added support for chrony and systemd-timesyncd on GNU/Linux
    • systemd-timesyncd is the default on Debian GNU/Linux 12+ and Archlinux
    • ntpd is the default on the BSD operating systems (FreeBSD, NetBSD, OpenBSD), Solaris, and Debian GNU/Linux 10 and 11
    • chrony is the default on all other GNU/Linux distributions
    • The ntpd hash is used as the input for the role
    • Updated README
    • CleanUp

stafwag.ntpdate


An ansible role to activate the ntpdate service on FreeBSD and NetBSD.


The ntpdate service is used on FreeBSD and NetBSD to sync the time during the system boot-up. On most Linux distributions this is handled by chronyd or systemd-timesyncd now. The OpenBSD ntpd implementation OpenNTPD also has support to sync the time during the system boot-up.

The role is available at:

Release notes

V1.0.0

  • Initial release on Ansible Galaxy
    • Added support for NetBSD

stafwag.libvirt


An ansible role to install libvirt/KVM packages and enable the libvirtd service.


The role is available at:

Release notes

V1.1.3

  • Force bash for shell execution on Ubuntu.
    • Force bash for shell execution on Ubuntu. As the default dash shell doesn’t support pipefail.

V1.1.2

  • CleanUp
    • Corrected ansible-lint errors
    • Removed the install task "install/.yml"
      • This was introduced to support Kali Linux, Kali Linux is reported as “Debian” now.
      • It isn’t used in this role
    • Removed invalid CentOS-8.yml softlink
      • Removed the invalid soft link; CentOS 8 should be caught by RedHat-yum.yml

stafwag.cloud_localds


An ansible role to create cloud-init config disk images. This role is a wrapper around the cloud-localds command.


It’s still planned to add support for distributions that don’t have cloud-localds as part of their official package repositories like RedHat 8+.

See the GitHub issue: https://github.com/stafwag/ansible-role-cloud_localds/issues/7

The role is available at:

Release notes

V2.1.3

  • CleanUp
    • Switched to vars and package to install the required packages
    • Corrected ansible-lint errors
    • Added more examples

stafwag.qemu_img


An ansible role to create QEMU disk images.


The role is available at:

Release notes

V2.3.0

  • CleanUp Release
    • Added doc/examples
    • Updated meta data
    • Switched to vars and package to install the required packages
    • Corrected ansible-lint errors

stafwag.virt_install_import


An ansible role to import a virtual machine with the virt-install import command.


The role is available at:

Release notes

  • Use var and package to install pkgs
    • v1.2.1 wasn’t merged correctly. The release should fix it…
    • Switched to var and package to install the required packages
    • Updated meta data
    • Updated documentation and include examples
    • Corrected ansible-lint errors



Have fun!

August 13, 2024

Here’s a neat little trick for those of you using Home Assistant while also driving a Volvo.

To get your Volvo driving data (fuel level, battery state, …) into Home Assistant, there’s the excellent volvo2mqtt addon.

One little annoyance is that every time it starts up, you will receive an e-mail from Volvo with a two-factor authentication code, which you then have to enter in Home Assistant.

Fortunately, there’s a solution for that: you can automate this using the built-in IMAP support of Home Assistant, with an automation such as this one:

alias: Volvo OTP
description: ""
trigger:
  - platform: event
    event_type: imap_content
    event_data:
      initial: true
      sender: no-reply@volvocars.com
      subject: Your Volvo ID Verification code
condition: []
action:
  - service: mqtt.publish
    metadata: {}
    data:
      topic: volvoAAOS2mqtt/otp_code
      payload: >-
        {{ trigger.event.data['text'] | regex_findall_index(find='Your Volvo ID verification code is:\s+(\d+)', index=0) }}
  - service: imap.delete
    data:
      entry: "{{ trigger.event.data['entry_id'] }}"
      uid: "{{ trigger.event.data['uid'] }}"
mode: single

This will post the OTP code to the right location and then delete the message from your inbox (if you’re using Google Mail, that means archiving it).
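If you want to sanity-check the extraction pattern outside Home Assistant first, here is a small standalone Python sketch (the sample e-mail body below is made up for illustration):

import re

# Hypothetical mail body, shaped like the Volvo verification e-mails.
body = "Hello,\n\nYour Volvo ID verification code is:   123456\n\nKind regards"

match = re.search(r"Your Volvo ID verification code is:\s+(\d+)", body)
if match:
    print(match.group(1))  # prints 123456, the payload that gets published to MQTT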


Comments | More on rocketeer.be | @rubenv on Twitter

July 28, 2024


Updated @ Mon Sep 2 07:55:20 PM CEST 2024: Added devfs section
Updated @ Wed Sep 4 07:48:56 PM CEST 2024 : Corrected gpg-agent.conf


I use FreeBSD and GNU/Linux.

In a previous blog post, we set up GnuPG with smartcard support on Debian GNU/Linux.

In this blog post, we’ll install and configure GnuPG with smartcard support on FreeBSD.

The GNU/Linux blog post provides more details about GnuPG, so it might be useful for FreeBSD users to read it first.

Likewise, Linux users are welcome to read this blog post if they’re interested in how it’s done on FreeBSD ;-)

Install the required packages

To begin, we need to install the required packages on FreeBSD.

Update the package database

Execute pkg update to update the package database.

Thunderbird

[staf@monty ~]$ sudo pkg install -y thunderbird
Password:
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
[staf@monty ~]$ 

lsusb

You can verify the USB devices on FreeBSD using the usbconfig command or lsusb, which is also available on FreeBSD as part of the usbutils package.

[staf@monty ~/git/stafnet/blog]$ sudo pkg install usbutils
Password:
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 3 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
	usbhid-dump: 1.4
	usbids: 20240318
	usbutils: 0.91

Number of packages to be installed: 3

301 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/3] Fetching usbutils-0.91.pkg: 100%   54 KiB  55.2kB/s    00:01    
[2/3] Fetching usbhid-dump-1.4.pkg: 100%   32 KiB  32.5kB/s    00:01    
[3/3] Fetching usbids-20240318.pkg: 100%  215 KiB 220.5kB/s    00:01    
Checking integrity... done (0 conflicting)
[1/3] Installing usbhid-dump-1.4...
[1/3] Extracting usbhid-dump-1.4: 100%
[2/3] Installing usbids-20240318...
[2/3] Extracting usbids-20240318: 100%
[3/3] Installing usbutils-0.91...
[3/3] Extracting usbutils-0.91: 100%
[staf@monty ~/git/stafnet/blog]$

GnuPG

We’ll need GnuPG (of course), so ensure that it is installed.

[staf@monty ~]$ sudo pkg install gnupg
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
[staf@monty ~]$ 

Smartcard packages

To enable smartcard support on FreeBSD, we’ll need to install the smartcard packages. The same software as on GNU/Linux - opensc - is available on FreeBSD.

pkg provides

It’s handy to be able to check which packages provide certain files. On FreeBSD this is provided by the provides plugin. This plugin is not enabled by default in the pkg command.

To install the provides plugin, install the pkg-provides package.

[staf@monty ~]$ sudo pkg install pkg-provides
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        pkg-provides: 0.7.3_3

Number of packages to be installed: 1

12 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching pkg-provides-0.7.3_3.pkg: 100%   12 KiB  12.5kB/s    00:01    
Checking integrity... done (0 conflicting)
[1/1] Installing pkg-provides-0.7.3_3...
[1/1] Extracting pkg-provides-0.7.3_3: 100%
=====
Message from pkg-provides-0.7.3_3:

--
In order to use the pkg-provides plugin you need to enable plugins in pkg.
To do this, uncomment the following lines in /usr/local/etc/pkg.conf file
and add pkg-provides to the supported plugin list:

PKG_PLUGINS_DIR = "/usr/local/lib/pkg/";
PKG_ENABLE_PLUGINS = true;
PLUGINS [ provides ];

After that run `pkg plugins' to see the plugins handled by pkg.
[staf@monty ~]$ 

Edit the pkg configuration to enable the provides plug-in.

staf@freebsd-gpg:~ $ sudo vi /usr/local/etc/pkg.conf
PKG_PLUGINS_DIR = "/usr/local/lib/pkg/";
PKG_ENABLE_PLUGINS = true;
PLUGINS [ provides ];

Verify that the plugin is enabled.

staf@freebsd-gpg:~ $ sudo pkg plugins
NAME       DESC                                          VERSION   
provides   A plugin for querying which package provides a particular file 0.7.3     
staf@freebsd-gpg:~ $ 

Update the pkg-provides database.

staf@freebsd-gpg:~ $ sudo pkg provides -u
Fetching provides database: 100%   18 MiB   9.6MB/s    00:02    
Extracting database....success
staf@freebsd-gpg:~ $

Install the required packages

Let’s check which packages provide the tools to set up the smartcard reader on FreeBSD, and install the required packages.

staf@freebsd-gpg:~ $ pkg provides "pkcs15-tool"
Name    : opensc-0.25.1
Comment : Libraries and utilities to access smart cards
Repo    : FreeBSD
Filename: usr/local/share/man/man1/pkcs15-tool.1.gz
          usr/local/etc/bash_completion.d/pkcs15-tool
          usr/local/bin/pkcs15-tool
staf@freebsd-gpg:~ $ 
staf@freebsd-gpg:~ $ pkg provides "bin/pcsc"
Name    : pcsc-lite-2.2.2,2
Comment : Middleware library to access a smart card using SCard API (PC/SC)
Repo    : FreeBSD
Filename: usr/local/sbin/pcscd
          usr/local/bin/pcsc-spy
staf@freebsd-gpg:~ $ 
[staf@monty ~]$ sudo pkg install opensc pcsc-lite
Password:
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
[staf@monty ~]$ 
staf@freebsd-gpg:~ $ sudo pkg install -y pcsc-tools
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
staf@freebsd-gpg:~ $ 
staf@freebsd-gpg:~ $ sudo pkg install -y ccid
Password:
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
staf@freebsd-gpg:~ $ 

USB

To use the smartcard reader we will need access to the USB devices as the user we use for our desktop environment. (No, this shouldn’t be the root user :-) )

permissions

verify

Execute the usbconfig command to verify that you can access the USB devices.

[staf@snuffel ~]$ usbconfig
No device match or lack of permissions.
[staf@snuffel ~]$ 

If you don’t have access, verify the permissions of the USB devices.

[staf@snuffel ~]$ ls -l /dev/usbctl
crw-r--r--  1 root operator 0x5b Sep  2 19:17 /dev/usbctl
[staf@snuffel ~]$  ls -l /dev/usb/
total 0
crw-------  1 root operator 0x34 Sep  2 19:17 0.1.0
crw-------  1 root operator 0x4f Sep  2 19:17 0.1.1
crw-------  1 root operator 0x36 Sep  2 19:17 1.1.0
crw-------  1 root operator 0x53 Sep  2 19:17 1.1.1
crw-------  1 root operator 0x7e Sep  2 19:17 1.2.0
crw-------  1 root operator 0x82 Sep  2 19:17 1.2.1
crw-------  1 root operator 0x83 Sep  2 19:17 1.2.2
crw-------  1 root operator 0x76 Sep  2 19:17 1.3.0
crw-------  1 root operator 0x8a Sep  2 19:17 1.3.1
crw-------  1 root operator 0x8b Sep  2 19:17 1.3.2
crw-------  1 root operator 0x8c Sep  2 19:17 1.3.3
crw-------  1 root operator 0x8d Sep  2 19:17 1.3.4
crw-------  1 root operator 0x38 Sep  2 19:17 2.1.0
crw-------  1 root operator 0x56 Sep  2 19:17 2.1.1
crw-------  1 root operator 0x3a Sep  2 19:17 3.1.0
crw-------  1 root operator 0x51 Sep  2 19:17 3.1.1
crw-------  1 root operator 0x3c Sep  2 19:17 4.1.0
crw-------  1 root operator 0x55 Sep  2 19:17 4.1.1
crw-------  1 root operator 0x3e Sep  2 19:17 5.1.0
crw-------  1 root operator 0x54 Sep  2 19:17 5.1.1
crw-------  1 root operator 0x80 Sep  2 19:17 5.2.0
crw-------  1 root operator 0x85 Sep  2 19:17 5.2.1
crw-------  1 root operator 0x86 Sep  2 19:17 5.2.2
crw-------  1 root operator 0x87 Sep  2 19:17 5.2.3
crw-------  1 root operator 0x40 Sep  2 19:17 6.1.0
crw-------  1 root operator 0x52 Sep  2 19:17 6.1.1
crw-------  1 root operator 0x42 Sep  2 19:17 7.1.0
crw-------  1 root operator 0x50 Sep  2 19:17 7.1.1

devfs

When the /dev/usb* devices are only accessible by the root user, you probably want to create devfs rules to grant permissions to the operator or another group.

See https://man.freebsd.org/cgi/man.cgi?devfs.rules for more details.

/etc/rc.conf

Update the /etc/rc.conf to apply custom devfs permissions.

[staf@snuffel /etc]$ sudo vi rc.conf
devfs_system_ruleset="localrules"

/etc/devfs.rules

Create or update /etc/devfs.rules with the updated permissions to grant read/write access to the operator group.

[staf@snuffel /etc]$ sudo vi devfs.rules
[localrules=10]
add path 'usbctl*' mode 0660 group operator
add path 'usb/*' mode 0660 group operator

Restart the devfs service to apply the custom devfs ruleset.

[staf@snuffel /etc]$ sudo -i
root@snuffel:~ #
root@snuffel:~ # service devfs restart

The operator group should have read/write permissions now.

root@snuffel:~ # ls -l /dev/usb/
total 0
crw-rw----  1 root operator 0x34 Sep  2 19:17 0.1.0
crw-rw----  1 root operator 0x4f Sep  2 19:17 0.1.1
crw-rw----  1 root operator 0x36 Sep  2 19:17 1.1.0
crw-rw----  1 root operator 0x53 Sep  2 19:17 1.1.1
crw-rw----  1 root operator 0x7e Sep  2 19:17 1.2.0
crw-rw----  1 root operator 0x82 Sep  2 19:17 1.2.1
crw-rw----  1 root operator 0x83 Sep  2 19:17 1.2.2
crw-rw----  1 root operator 0x76 Sep  2 19:17 1.3.0
crw-rw----  1 root operator 0x8a Sep  2 19:17 1.3.1
crw-rw----  1 root operator 0x8b Sep  2 19:17 1.3.2
crw-rw----  1 root operator 0x8c Sep  2 19:17 1.3.3
crw-rw----  1 root operator 0x8d Sep  2 19:17 1.3.4
crw-rw----  1 root operator 0x38 Sep  2 19:17 2.1.0
crw-rw----  1 root operator 0x56 Sep  2 19:17 2.1.1
crw-rw----  1 root operator 0x3a Sep  2 19:17 3.1.0
crw-rw----  1 root operator 0x51 Sep  2 19:17 3.1.1
crw-rw----  1 root operator 0x3c Sep  2 19:17 4.1.0
crw-rw----  1 root operator 0x55 Sep  2 19:17 4.1.1
crw-rw----  1 root operator 0x3e Sep  2 19:17 5.1.0
crw-rw----  1 root operator 0x54 Sep  2 19:17 5.1.1
crw-rw----  1 root operator 0x80 Sep  2 19:17 5.2.0
crw-rw----  1 root operator 0x85 Sep  2 19:17 5.2.1
crw-rw----  1 root operator 0x86 Sep  2 19:17 5.2.2
crw-rw----  1 root operator 0x87 Sep  2 19:17 5.2.3
crw-rw----  1 root operator 0x40 Sep  2 19:17 6.1.0
crw-rw----  1 root operator 0x52 Sep  2 19:17 6.1.1
crw-rw----  1 root operator 0x42 Sep  2 19:17 7.1.0
crw-rw----  1 root operator 0x50 Sep  2 19:17 7.1.1
root@snuffel:~ # 

Make sure that you’re part of the operator group

staf@freebsd-gpg:~ $ ls -l /dev/usbctl 
crw-rw----  1 root operator 0x5a Jul 13 17:32 /dev/usbctl
staf@freebsd-gpg:~ $ ls -l /dev/usb/
total 0
crw-rw----  1 root operator 0x31 Jul 13 17:32 0.1.0
crw-rw----  1 root operator 0x53 Jul 13 17:32 0.1.1
crw-rw----  1 root operator 0x33 Jul 13 17:32 1.1.0
crw-rw----  1 root operator 0x51 Jul 13 17:32 1.1.1
crw-rw----  1 root operator 0x35 Jul 13 17:32 2.1.0
crw-rw----  1 root operator 0x52 Jul 13 17:32 2.1.1
crw-rw----  1 root operator 0x37 Jul 13 17:32 3.1.0
crw-rw----  1 root operator 0x54 Jul 13 17:32 3.1.1
crw-rw----  1 root operator 0x73 Jul 13 17:32 3.2.0
crw-rw----  1 root operator 0x75 Jul 13 17:32 3.2.1
crw-rw----  1 root operator 0x76 Jul 13 17:32 3.3.0
crw-rw----  1 root operator 0x78 Jul 13 17:32 3.3.1
staf@freebsd-gpg:~ $ 

You’ll need to be part of the operator group to access the USB devices.

Execute the vigr command and add the user to the operator group.

staf@freebsd-gpg:~ $ sudo vigr
operator:*:5:root,staf

Relogin and check that you are in the operator group.

staf@freebsd-gpg:~ $ id
uid=1001(staf) gid=1001(staf) groups=1001(staf),0(wheel),5(operator)
staf@freebsd-gpg:~ $ 

The usbconfig command should work now.

staf@freebsd-gpg:~ $ usbconfig
ugen1.1: <Intel UHCI root HUB> at usbus1, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
ugen2.1: <Intel UHCI root HUB> at usbus2, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
ugen0.1: <Intel UHCI root HUB> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
ugen3.1: <Intel EHCI root HUB> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen3.2: <QEMU Tablet Adomax Technology Co., Ltd> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)
ugen3.3: <QEMU Tablet Adomax Technology Co., Ltd> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)
staf@freebsd-gpg:~ $ 

SmartCard configuration

Verify the USB connection

The first step is to ensure your smartcard reader is detected at the USB level. Execute usbconfig and lsusb and make sure your smartcard reader is listed.

usbconfig

List the USB devices.

[staf@monty ~/git]$ usbconfig
ugen1.1: <Intel EHCI root HUB> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen0.1: <Intel XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen2.1: <Intel EHCI root HUB> at usbus2, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen2.2: <Integrated Rate Matching Hub Intel Corp.> at usbus2, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen1.2: <Integrated Rate Matching Hub Intel Corp.> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen0.2: <AU9540 Smartcard Reader Alcor Micro Corp.> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (50mA)
ugen0.3: <VFS 5011 fingerprint sensor Validity Sensors, Inc.> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
ugen0.4: <Centrino Bluetooth Wireless Transceiver Intel Corp.> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (0mA)
ugen0.5: <SunplusIT INC. Integrated Camera> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (500mA)
ugen0.6: <X-Rite Pantone Color Sensor X-Rite, Inc.> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (100mA)
ugen0.7: <GemPC Key SmartCard Reader Gemalto (was Gemplus)> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (50mA)
[staf@monty ~/git]$ 

lsusb

[staf@monty ~/git/stafnet/blog]$ lsusb
Bus /dev/usb Device /dev/ugen0.7: ID 08e6:3438 Gemalto (was Gemplus) GemPC Key SmartCard Reader
Bus /dev/usb Device /dev/ugen0.6: ID 0765:5010 X-Rite, Inc. X-Rite Pantone Color Sensor
Bus /dev/usb Device /dev/ugen0.5: ID 04f2:b39a Chicony Electronics Co., Ltd 
Bus /dev/usb Device /dev/ugen0.4: ID 8087:07da Intel Corp. Centrino Bluetooth Wireless Transceiver
Bus /dev/usb Device /dev/ugen0.3: ID 138a:0017 Validity Sensors, Inc. VFS 5011 fingerprint sensor
Bus /dev/usb Device /dev/ugen0.2: ID 058f:9540 Alcor Micro Corp. AU9540 Smartcard Reader
Bus /dev/usb Device /dev/ugen1.2: ID 8087:8008 Intel Corp. Integrated Rate Matching Hub
Bus /dev/usb Device /dev/ugen2.2: ID 8087:8000 Intel Corp. Integrated Rate Matching Hub
Bus /dev/usb Device /dev/ugen2.1: ID 0000:0000  
Bus /dev/usb Device /dev/ugen0.1: ID 0000:0000  
Bus /dev/usb Device /dev/ugen1.1: ID 0000:0000  
[staf@monty ~/git/stafnet/blog]$ 

Check the GnuPG smartcard status

Let’s check if we get access to our smart card with gpg.

This might work if you have a natively supported GnuPG smartcard.

[staf@monty ~]$ gpg --card-status
gpg: selecting card failed: Operation not supported by device
gpg: OpenPGP card not available: Operation not supported by device
[staf@monty ~]$ 

In my case, it doesn’t work. I prefer the OpenSC interface anyway; this might be useful if you want to use your smartcard for other purposes.

opensc

Enable pcscd

FreeBSD has a handy tool, sysrc, to manage rc.conf.

Enable the pcscd service.

[staf@monty ~]$ sudo sysrc pcscd_enable=YES
Password:
pcscd_enable: NO -> YES
[staf@monty ~]$ 

Start the pcscd service.

[staf@monty ~]$ sudo /usr/local/etc/rc.d/pcscd start
Password:
Starting pcscd.
[staf@monty ~]$ 

Verify smartcard access

pcsc_scan

The pcsc-tools package provides a tool, pcsc_scan, to verify the smartcard readers.

Execute pcsc_scan to verify that your smartcard is detected.

[staf@monty ~]$ pcsc_scan 
PC/SC device scanner
V 1.7.1 (c) 2001-2022, Ludovic Rousseau <ludovic.rousseau@free.fr>
Using reader plug'n play mechanism
Scanning present readers...
0: Gemalto USB Shell Token V2 (284C3E93) 00 00
1: Alcor Micro AU9540 01 00
 
Thu Jul 25 18:42:34 2024
 Reader 0: Gemalto USB Shell Token V2 (<snip>) 00 00
  Event number: 0
  Card state: Card inserted, 
  ATR: <snip>

ATR: <snip>
+ TS = 3B --> Direct Convention
+ T0 = DA, Y(1): 1101, K: 10 (historical bytes)
  TA(1) = 18 --> Fi=372, Di=12, 31 cycles/ETU
    129032 bits/s at 4 MHz, fMax for Fi = 5 MHz => 161290 bits/s
  TC(1) = FF --> Extra guard time: 255 (special value)
  TD(1) = 81 --> Y(i+1) = 1000, Protocol T = 1 
-----
  TD(2) = B1 --> Y(i+1) = 1011, Protocol T = 1 
-----
  TA(3) = FE --> IFSC: 254
  TB(3) = 75 --> Block Waiting Integer: 7 - Character Waiting Integer: 5
  TD(3) = 1F --> Y(i+1) = 0001, Protocol T = 15 - Global interface bytes following 
-----
  TA(4) = 03 --> Clock stop: not supported - Class accepted by the card: (3G) A 5V B 3V 
+ Historical bytes: 00 31 C5 73 C0 01 40 00 90 00
  Category indicator byte: 00 (compact TLV data object)
    Tag: 3, len: 1 (card service data byte)
      Card service data byte: C5
        - Application selection: by full DF name
        - Application selection: by partial DF name
        - EF.DIR and EF.ATR access services: by GET DATA command
        - Card without MF
    Tag: 7, len: 3 (card capabilities)
      Selection methods: C0
        - DF selection by full DF name
        - DF selection by partial DF name
      Data coding byte: 01
        - Behaviour of write functions: one-time write
        - Value 'FF' for the first byte of BER-TLV tag fields: invalid
        - Data unit in quartets: 2
      Command chaining, length fields and logical channels: 40
        - Extended Lc and Le fields
        - Logical channel number assignment: No logical channel
        - Maximum number of logical channels: 1
    Mandatory status indicator (3 last bytes)
      LCS (life card cycle): 00 (No information given)
      SW: 9000 (Normal processing.)
+ TCK = 0C (correct checksum)

Possibly identified card (using /usr/local/share/pcsc/smartcard_list.txt):
<snip>
        OpenPGP Card V2

 Reader 1: Alcor Micro AU9540 01 00
  Event number: 0
  Card state

pkcs15

pkcs15 is the application interface for hardware tokens while pkcs11 is the low-level interface.

You can use pkcs15-tool -D to verify that your smartcard is detected.

[staf@monty ~]$ pkcs15-tool -D
Using reader with a card: Gemalto USB Shell Token V2 (<snip>) 00 00
PKCS#15 Card [OpenPGP card]:
        Version        : 0
        Serial number  : <snip>
        Manufacturer ID: ZeitControl
        Language       : nl
        Flags          : PRN generation, EID compliant


PIN [User PIN]
        Object Flags   : [0x03], private, modifiable
        Auth ID        : 03
        ID             : 02
        Flags          : [0x13], case-sensitive, local, initialized
        Length         : min_len:6, max_len:32, stored_len:32
        Pad char       : 0x00
        Reference      : 2 (0x02)
        Type           : UTF-8
        Path           : 3f00
        Tries left     : 3

PIN [User PIN (sig)]
        Object Flags   : [0x03], private, modifiable
        Auth ID        : 03
        ID             : 01
        Flags          : [0x13], case-sensitive, local, initialized
        Length         : min_len:6, max_len:32, stored_len:32
        Pad char       : 0x00
        Reference      : 1 (0x01)
        Type           : UTF-8
        Path           : 3f00
        Tries left     : 0

PIN [Admin PIN]
        Object Flags   : [0x03], private, modifiable
        ID             : 03
        Flags          : [0x9B], case-sensitive, local, unblock-disabled, initialized, soPin
        Length         : min_len:8, max_len:32, stored_len:32
        Pad char       : 0x00
        Reference      : 3 (0x03)
        Type           : UTF-8
        Path           : 3f00
        Tries left     : 0

Private RSA Key [Signature key]
        Object Flags   : [0x03], private, modifiable
        Usage          : [0x20C], sign, signRecover, nonRepudiation
        Access Flags   : [0x1D], sensitive, alwaysSensitive, neverExtract, local
        Algo_refs      : 0
        ModLength      : 3072
        Key ref        : 0 (0x00)
        Native         : yes
        Auth ID        : 01
        ID             : 01
        MD:guid        : <snip>

Private RSA Key [Encryption key]
        Object Flags   : [0x03], private, modifiable
        Usage          : [0x22], decrypt, unwrap
        Access Flags   : [0x1D], sensitive, alwaysSensitive, neverExtract, local
        Algo_refs      : 0
        ModLength      : 3072
        Key ref        : 1 (0x01)
        Native         : yes
        Auth ID        : 02
        ID             : 02
        MD:guid        : <snip>

Private RSA Key [Authentication key]
        Object Flags   : [0x03], private, modifiable
        Usage          : [0x200], nonRepudiation
        Access Flags   : [0x1D], sensitive, alwaysSensitive, neverExtract, local
        Algo_refs      : 0
        ModLength      : 3072
        Key ref        : 2 (0x02)
        Native         : yes
        Auth ID        : 02
        ID             : 03
        MD:guid        : <snip>

Public RSA Key [Signature key]
        Object Flags   : [0x02], modifiable
        Usage          : [0xC0], verify, verifyRecover
        Access Flags   : [0x02], extract
        ModLength      : 3072
        Key ref        : 0 (0x00)
        Native         : no
        Path           : b601
        ID             : 01

Public RSA Key [Encryption key]
        Object Flags   : [0x02], modifiable
        Usage          : [0x11], encrypt, wrap
        Access Flags   : [0x02], extract
        ModLength      : 3072
        Key ref        : 0 (0x00)
        Native         : no
        Path           : b801
        ID             : 02

Public RSA Key [Authentication key]
        Object Flags   : [0x02], modifiable
        Usage          : [0x40], verify
        Access Flags   : [0x02], extract
        ModLength      : 3072
        Key ref        : 0 (0x00)
        Native         : no
        Path           : a401
        ID             : 03

[staf@monty ~]$ 

GnuPG configuration

First test

Stop (kill) the scdaemon, to ensure that the scdaemon tries to use the opensc interface.

[staf@monty ~]$ gpgconf --kill scdaemon
[staf@monty ~]$ 
[staf@monty ~]$ ps aux | grep -i scdaemon
staf  9236  0.0  0.0   12808   2496  3  S+   20:42   0:00.00 grep -i scdaemon
[staf@monty ~]$ 

Try to read the card status again.

[staf@monty ~]$ gpg --card-status
gpg: selecting card failed: Operation not supported by device
gpg: OpenPGP card not available: Operation not supported by device
[staf@monty ~]$ 

Reconfigure GnuPG

Go to the .gnupg directory in your $HOME directory.

[staf@monty ~]$ cd .gnupg/
[staf@monty ~/.gnupg]$ 

scdaemon

Reconfigure scdaemon to disable the internal ccid and enable logging - always useful to verify why something isn’t working…

[staf@monty ~/.gnupg]$ vi scdaemon.conf
disable-ccid

verbose
debug-level expert
debug-all
log-file    /home/staf/logs/scdaemon.log

gpg-agent

Enable debug logging for the gpg-agent.

[staf@monty ~/.gnupg]$ vi gpg-agent.conf
debug-level expert
verbose
verbose
log-file /home/staf/logs/gpg-agent.log

Verify

Stop the scdaemon.

[staf@monty ~/.gnupg]$ gpgconf --kill scdaemon
[staf@monty ~/.gnupg]$ 

If everything goes well, gpg will detect the smartcard.

If not, you have some logging to do some debugging ;-)

[staf@monty ~/.gnupg]$ gpg --card-status
Reader ...........: Gemalto USB Shell Token V2 (<snip>) 00 00
Application ID ...: <snip>
Application type .: OpenPGP
Version ..........: 2.1
Manufacturer .....: ZeitControl
Serial number ....: 000046F1
Name of cardholder: <snip>
Language prefs ...: nl
Salutation .......: Mr.
URL of public key : <snip>
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: xxxxxxx xxxxxxx xxxxxxx
Max. PIN lengths .: 32 32 32
PIN retry counter : 3 0 3
Signature counter : 80
Signature key ....: <snip>
      created ....: <snip>
Encryption key....: <snip>
      created ....: <snip>
Authentication key: <snip>
      created ....: <snip>
General key info..: [none]
[staf@monty ~/.gnupg]$ 

Test

shadow private keys

After you executed gpg --card-status, GnuPG created “shadow private keys”. These keys just contain references to the hardware tokens on which the private keys are stored.

[staf@monty ~/.gnupg]$ ls -l private-keys-v1.d/
total 14
-rw-------  1 staf staf 976 Mar 24 11:35 <snip>.key
-rw-------  1 staf staf 976 Mar 24 11:35 <snip>.key
-rw-------  1 staf staf 976 Mar 24 11:35 <snip>.key
[staf@monty ~/.gnupg]$ 

You can list the (shadow) private keys with the gpg --list-secret-keys command.

Pinentry

To be able to type in your PIN code, you’ll need a pinentry application unless your smartcard reader has a pinpad.

You can use pkg provides to verify which pinentry applications are available.

For the integration with Thunderbird, you probably want to have a graphical-enabled version. But this is the topic for a next blog post ;-)

We’ll stick with the (n)curses version for now.

Install a pinentry program.

[staf@monty ~/.gnupg]$ pkg provides pinentry | grep -i curses
Name    : pinentry-curses-1.3.1
Comment : Curses version of the GnuPG password dialog
Filename: usr/local/bin/pinentry-curses
[staf@monty ~/.gnupg]$ 
[staf@monty ~/.gnupg]$ sudo pkg install pinentry-curses
Password:
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
[staf@monty ~/.gnupg]$ 

A soft link is created for the pinentry binary. On FreeBSD, the pinentry soft link is managed by the pinentry package.

You can verify this with the pkg which command.

[staf@monty ~]$ pkg which /usr/local/bin/pinentry
/usr/local/bin/pinentry was installed by package pinentry-1.3.1
[staf@monty ~]$ 

The curses version is the default.

If you want to use another pinentry version, specify it in the gpg-agent configuration ($HOME/.gnupg/gpg-agent.conf):

pinentry-program <PATH>

Import your public key

Import your public key.

[staf@monty /tmp]$ gpg --import <snip>.asc
gpg: key <snip>: public key "<snip>" imported
gpg: Total number processed: 1
gpg:               imported: 1
[staf@monty /tmp]$ 

List the public keys.

[staf@monty /tmp]$ gpg --list-keys
/home/staf/.gnupg/pubring.kbx
-----------------------------
pub   XXXXXXX XXXX-XX-XX [SC]
      <snip>
uid           [ unknown] <snip>
sub   XXXXXXX XXXX-XX-XX [A]
sub   XXXXXXX XXXX-XX-XX [E]

[staf@monty /tmp]$ 

As a test, we try to sign something with the private key on our GnuPG smartcard.

Create a test file.

[staf@monty /tmp]$ echo "foobar" > foobar
[staf@monty /tmp]$ 
[staf@monty /tmp]$ gpg --sign foobar

If your smartcard isn’t inserted, GnuPG will ask you to insert it.

GnuPG asks for the smartcard with the serial number recorded in the shadow private key.


                ┌────────────────────────────────────────────┐
                │ Please insert the card with serial number: │
                │                                            │
                │ XXXX XXXXXXXX                              │
                │                                            │
                │                                            │
                │      <OK>                      <Cancel>    │
                └────────────────────────────────────────────┘


Type in your PIN code.



               ┌──────────────────────────────────────────────┐
               │ Please unlock the card                       │
               │                                              │
               │ Number: XXXX XXXXXXXX                        │
               │ Holder: XXXX XXXXXXXXXX                      │
               │ Counter: XX                                  │
               │                                              │
               │ PIN ________________________________________ │
               │                                              │
               │      <OK>                        <Cancel>    │
               └──────────────────────────────────────────────┘


[staf@monty /tmp]$ ls -l foobar*
-rw-r-----  1 staf wheel   7 Jul 27 11:11 foobar
-rw-r-----  1 staf wheel 481 Jul 27 11:17 foobar.gpg
[staf@monty /tmp]$ 
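As an extra check, you can verify the signature on the freshly created foobar.gpg. The output below is a rough sketch with placeholders instead of real key data; because we haven't certified the public key yet, GnuPG will also warn that the key isn't trusted.

[staf@monty /tmp]$ gpg --verify foobar.gpg
gpg: Signature made XXX XXX XX XX:XX:XX XXXX
gpg:                using XXXXXXX key <snip>
gpg: Good signature from "<snip>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
[staf@monty /tmp]$ 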

In the next blog post in this series, we'll configure Thunderbird to use the smartcard for OpenPGP email encryption.

Have fun!


July 26, 2024

We saw in the previous post how we can deal with data stored in the new VECTOR datatype that was released with MySQL 9.0. We implemented the 4 basic mathematical operations between two vectors. To do so we created JavaScript functions. MySQL JavaScript functions are available in MySQL HeatWave and MySQL Enterprise Edition (you can […]

July 25, 2024

MySQL 9.0.0 has brought the VECTOR datatype to your favorite Open Source Database. There are already some functions available to deal with those vectors: This post will show how to deal with vectors and create our own functions to create operations between vectors. We will use the MLE Component capability to create JavaScript functions. JS […]

July 23, 2024

Keeping up appearances in tech

[Cover image]
The word "rant" is used far too often, and in various ways.
It's meant to imply aimless, angry venting.

But often it means:

  • Naming problems without proposing solutions (this makes me feel confused).
  • Naming problems and assigning blame (this makes me feel bad).

I saw a remarkable pair of tweets the other day.

In the wake of the outage, the CEO of CrowdStrike sent out a public announcement. It's purely factual. The scope of the problem is identified, the known facts are stated, and the logistics of disaster relief are set in motion.


  CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted.

  This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed.

  We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website. We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels.

  Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.

Millions of computers were affected. This is the equivalent of a frazzled official giving a brief statement in the aftermath of an earthquake, directing people to the Red Cross.

Everything is basically on fire for everyone involved. Systems are failing everywhere, some critical, and quite likely people are panicking. The important thing is to give the technicians the information and tools to fix it, and for everyone else to do what they can, and stay out of the way.

In response, a communication professional posted an 'improved' version:


  I’m the CEO of CrowdStrike. I’m devastated to see the scale of today’s outage and will be personally working on it together with our team until it’s fully fixed for every single user.

  But I wanted to take a moment to come here and tell you that I am sorry. People around the world rely on us, and incidents like this can’t happen. This came from an error that ultimately is my responsibility. 

  Here’s what we know: [brief synopsis of what went wrong and how it wasn’t a cyberattack etc.]

  Our entire team will be working all day, all night, all weekend, and however long it takes to resolve this and make sure it doesn’t happen again.

  We’ll be sharing updates as often as possible, which you can find here [link]. If you need to contact us, the quickest way is to go here [link].

  We’re responding as quickly as possible. Thank you to everyone who has alerted us to the outage, and again, please accept my deepest apologies. More to come soon.

Credit where credit is due, she nailed the style. 10/10. It seems unobjectionable, at first. Let's go through, shall we?

[Image: Hyacinth fixing her husband's tie]

Opposite Day

First is that the CEO is "devastated." A feeling. And they are personally going to ensure it's fixed for every single user.

This focuses on the individual who is inconvenienced. Not the disaster. They take a moment out of their time to say they are so, so sorry a mistake was made. They have let you and everyone else down, and that shouldn't happen. That's their responsibility.

By this point, the original statement had already told everyone the relevant facts. Here the technical details are left to the imagination. The writer's self-assigned job is to wrap the message in a more palatable envelope.

Everyone will be working "all day, all night, all weekend," indeed "however long it takes," to avoid it happening again.

I imagine this is meant to be inspiring and reassuring. But if I was a CrowdStrike technician or engineer, I would find it demoralizing: the boss, who will actually be personally fixing diddly-squat, is saying that the long hours of others are a sacrifice they're willing to make.

Plus, CrowdStrike's customers are in the same boat: their technicians get volunteered too. They can't magically unbrick PCs from a distance, so "until it's fully fixed for every single user" would be a promise outsiders will have to keep. Lovely.

There's even a punch line: an invitation to go contact them, the quickest way linked directly. It thanks people for reaching out.

If everything is on fire, that includes the phone lines, the inboxes, and so on. The most stupid thing you could do in such a situation is to tell more people to contact you, right away. Don't encourage it! That's why the original statement refers to pre-existing lines of communication, internal representatives, and so on. The Support department would hate the CEO too.

[Image: Hyacinth and Richard peering over a fence]

Root Cause

If you're wondering about the pictures, it's Hyacinth Bucket, from 90s UK sitcom Keeping Up Appearances, who would always insist "it's pronounced Bouquet."

Hyacinth's ambitions always landed her out of her depth, surrounded by upper-class people she's trying to impress, in the midst of an embarrassing disaster. Her increasingly desperate attempts to save face, which invariably made things worse, are the main source of comedy.

Try reading that second statement in her voice.

I’m devastated to see the scale of today’s outage and will be personally working on it together with our team until it’s fully fixed for every single user.

But I wanted to take a moment to come here and tell you that I am sorry. People around the world rely on us, and incidents like this can’t happen. This came from an error that ultimately is my responsibility.

I can hear it perfectly, telegraphing Britishness to restore dignity for all. If she were in tech she would give that statement.

It's about reputation management first, projecting the image of competence and accountability. But she's giving the speech in front of a burning building, not realizing the entire exercise is futile. Worse, she thinks she's nailing it.

If CrowdStrike had sent this out, some would've applauded and called it an admirable example of wise and empathetic communication. Real leadership qualities.

But it's the exact opposite. It focuses on the wrong things, it alienates the staff, and it definitely amplifies the chaos. It's Monty Python-esque.

Apologizing is pointless here; the damage is already done. What matters is how severe it is and whether it could've been avoided. This requires a detailed root-cause analysis and remedy. Otherwise you only have their word. Why would that reassure you?

The original restated the company's mission: security and stability. Those are the stakes to regain a modicum of confidence.

You may think that I'm reading too much into this. But I know the exact vibe on an engineering floor when the shit hits the fan. I also know how executives and staff without that experience end up missing the point entirely. I once worked for a Hyacinth Bucket. It's not an anecdote, it's allegory.

They simply don't get the engineering mindset, and confuse authority with ownership. They step on everyone's toes without realizing, because they're constantly wearing clown shoes. Nobody tells them.

[Image: Hyacinth is not happy]

Softness as a Service

The change in style between #1 and #2 is really a microcosm of the conflict that has been broiling in tech for ~15 years now. I don't mean the politics, but the shifting of norms, of language and behavior.

It's framed as a matter of interpersonal style, which needs to be welcoming and inclusive. In practice this means they assert or demand that style #2 be the norm, even when #1 is advisable or required.

Factuality is seen as deficient, improper and primitive. It's a form of doublethink: everyone's preference is equally valid, except yours, specifically.

But the difference is not a preference. It's about what actually works and what doesn't. Style #1 is aimed at the people who have to fix it. Style #2 is aimed at the people who can't do anything until it's fixed. Who should they be reaching out to?

In #2, communication becomes an end in itself, not a means of conveying information. It's about being seen saying the words, not living them. Poking at the statement makes it fall apart.

When this becomes the norm in a technical field, it has deep consequences:

  • Critique must be gift-wrapped in flattery, and is not allowed to actually land.
  • Mistakes are not corrected, and sentiment takes precedence over effectiveness.
  • Leaders speak lofty words far from the trenches to save face.
  • The people they thank the loudest are the ones they pay the least.

Inevitably, quiet competence is replaced with gaudy chaos. Everyone says they're sorry and responsible, but nobody actually is. Nobody wants to resign either. Sound familiar?

[Image: Onslow]

Cope and Soothe

The elephant in the room is that #1 is very masculine, while #2 is more feminine. When you hear "women are more empathetic communicators", this is what it means. They tend to focus on the individual and their relation to them, not the team as a whole and its mission.

Complaints that tech is too "male dominated" and "notoriously hostile to women" are often just this. Tech was always full of types who won't preface their proposals and criticisms with fluff, and instead lean into autism. When you're used to being pandered to, neutrality feels like vulgarity.

The notable exceptions are rare and usually have an exasperating lead up. Tech is actually one of the most accepting and egalitarian fields around. The maintainers do a mostly thankless job.

"Oh so you're saying there's no misogyny in tech?" No I'm just saying misogyny doesn't mean "something 1 woman hates".

The tone is really a distraction. If someone drops an analysis, saying shit or get off the pot, even very kindly and patiently, some will still run away screaming. Like an octopus spraying ink, they'll deploy a nasty form of #2 as a distraction. That's the real issue.

Many techies, in their naiveté, believed the cultural reformers when they showed up to gentrify them. They obediently branded heretics like James Damore, and burned witches like Richard Stallman. Thanks to racism, words like 'master' and 'slave' are now off-limits as technical terms. Ironic, because millions of computers just crashed because they worked exactly like that.

[Image: Django commit replacing master/slave]
[Image: "Guys, I'm stuck in the we work lift."]

The cope is to pretend that nothing has truly changed yet, and more reform is needed. In fact, everything has already changed. Tech forums used to be crucibles for distilling insight, but now they are guarded jealously by people more likely to flag and ban than strongly disagree.

I once got flagged on HN because I pointed out Twitter's mass lay-offs were a response to overhiring, and that people were rooting for the site to fail after Musk bought it. It suggested what we all know now: that the company would not implode after trimming the dead weight, and that they'd never forgive him for it.

Diversity is now associated with incompetence, because incompetent people have spent over a decade reaching for it as an excuse. In their attempts to fight stereotypes, they ensured the stereotypes came true.

[Image: Hyacinth is not happy]

Bait and Snitch

The outcry tends to be: "We do all the same things you do, but still we get treated differently!" But they start from the conclusion and work their way backwards. This is what the rewritten statement does: it tries to fix the relationship before fixing the problem.

The average woman and man actually do things very differently in the first place. Individual men and women choose. And others respond accordingly. The people who build and maintain the world's infrastructure prefer the masculine style for a reason: it keeps civilization running, and helps restore it when it breaks. A disaster announcement does not need to be relatable, it needs to be effective.

Furthermore, if the job of shoveling shit falls on you, no amount of flattery or oversight will make that more pleasant. It really won't. Such commentary is purely for the benefit of the ones watching and trying to look busy. It makes it worse, stop pretending otherwise.

There's little loyalty in tech companies nowadays, and it's no surprise. Project and product managers act more like demanding clients to their own team than like leaders. "As a user, I want..." Yes, but what are you going to do about it? Do you even know where to start?

What's perceived as a lack of sensitivity is actually the presence of sensibility. It's what connects the words to the reality on the ground. It does not need to be improved or corrected, it just needs to be respected. And yes it's a matter of gender, because bashing men and masculine norms has become a jolly recreational sport in the overculture. Mature women know it.

It seems impossible to admit. The entire edifice of gender equality depends on there not being a single thing men are actually better at, even just on average. Where men and women's instincts differ, women must be right.

It's childish, and not harmless either. It dares you to call it out, so they can then play the wounded victim, and paint you as the unreasonable asshole who is mean. This is supposed to invalidate the argument.

* * *

This post is of course a giant cannon pointing in the opposite direction, sitting on top of a wall. Its message will likely fly over the reformers' heads.

If they read it at all, they'll selectively quote or paraphrase, call me a tech-bro, and spool off some sentences they overheard, like an LLM. It's why they adore AI, and want it to be exactly as sycophantic as them. They don't care that it makes stuff up wholesale, because it makes them look and feel competent. It will never tell them to just fuck off already.

Think less about what is said, more about what is being done. Otherwise the next CrowdStrike will probably be worse.

July 10, 2024

MySQL HeatWave 9.0 was released under the banner of artificial intelligence. It includes a VECTOR datatype and can easily process and analyze vast amounts of proprietary unstructured documents in object storage, using HeatWave GenAI and Lakehouse. Oracle Cloud Infrastructure also provides a wonderful GenAI Service, and in this post, we will see how to use […]