Planet Grep

Planet'ing Belgian FLOSS people

Planet Grep is maintained by Wouter Verhelst. All times are in UTC.

June 07, 2023

Last time we tried to connect to a MySQL DB instance in OCI with Cloud Shell, we needed to use the bastion service. See here.

Now we can also bypass the bastion host, as Cloud Shell offers the option to change its network.

As you know, in Oracle Cloud Infrastructure, a MySQL DB instance is not exposed in the public subnet and cannot get a public IP.

In the Private Subnet, we often have a security list allowing all internal IPs (from the public and private subnets of the VCN) to connect to the MySQL port(s).

If the security list is present, we can click on the Cloud Shell icon and once loaded, change the network:

We need to create a new Private Network Definition:

If you don't have the MySQL ports open in the Security List, you can instead use a previously created Network Security Group dedicated to accessing the MySQL ports, 3306 and 33060, for the Classic and X protocols.

When creating the Private Network Definition, we need to use the same VCN and the same Private Subnet used by the MySQL DB Instance.

Once created, we need to wait for the Cloud Shell to be connected:

Once connected, you can use MySQL Shell in the Cloud Shell console to connect to MySQL:

We use the Private IP address of the MySQL DB System… and we are connected!

Cloud Shell on Oracle Cloud Infrastructure (OCI) provides a simplified, user-friendly interface for connecting to MySQL HeatWave Database Service using MySQL Shell, making database management an effortless task. And all via the browser!

The privatization of our senses

I have already gone on ad nauseam about our necks permanently bent over a little plastic rectangle, about our attention sucked in and confined to a tiny screen that shows us only what two or three global monopolies are willing to transmit to us.

The idea, explored in Printeurs, that these monopolies plug directly into our brains to influence them still seems like science fiction.

Yet the capture of our senses has already begun.

Have you noticed how many people walk around with white earbuds in their ears and never take them out to hold a conversation, or even to appear on television? These people live in an "augmented reality" environment. They hear a mix of virtual sounds and real sounds. A mix that is controlled… by the monopolies selling those earbuds.

Wearing this kind of earbud literally amounts to selling your perception to advertising companies (yes, Apple is a company that lives off advertising, even if it is mostly advertising for itself). One day you will wake up with ads in your ear. Or you will fail to understand a speech because parts of it have been censored.

This is not some distant possibility; it is the stated goal of these technologies.

After hearing, it is sight's turn to come under attack, through augmented reality glasses.

The ads for Apple's new offering show smiling people wearing the glasses to take part in video conferences while seemingly enjoying life. Funny detail: nobody else in those staged conferences seems to be wearing such glasses.

Because it is not socially accepted yet. Don't worry, they are working on it. It took 20 years for wearing headphones in public to go from antisocial psychopath to trendy teenager. That is also why the Apple glasses are so expensive: they are becoming a status symbol, a luxury item. The first people you see wearing them in the street will be those who have money to spend and want everyone to know it. Which will inevitably lead to the popularization of cheaper models.

In Tantzor, published in 1991, Paul-Loup Sulitzer was already mocking this, telling the story of a Russian entrepreneur who sold cheap fake fluorescent-green headphones to people who could not afford a walkman. So they could be like everyone else, so they could look like they owned a walkman.

Wearing an audio and visual headset in the street will sooner or later become an acceptable norm. Which would not be a problem if the technology were not completely controlled by these morbid monopolies that want to turn humans into users, into passive customers.

They have largely succeeded in doing this with the Internet. They are now going after the big room with the blue ceiling, gradually privatizing our interactions with the real world: the transport of our bodies through individual cars, human interactions through proprietary messaging apps, the surveillance of our deeds, words and gestures right inside our homes, and now the direct control of our senses.

Technology may seem terrifying to some. But it is marvelous when you are an actor in it. It is not the cause.

At some point, we accepted that technology belonged to an ethereal elite and that we were merely its users. That tools could have an owner different from their user. The Luddites understood this in their flesh. Marx had an intuition of it. Nobody listened to them.

As long as we remain subject to the dictates of marketing, as long as we accept the social pressure that sometimes comes from our own loved ones, we are doomed to remain users of technology, to become users of our own bodies, of our own brains.

An engineer and writer, I explore the impact of technology on humans. Subscribe to my writings in French by email or RSS. For my writings in English, subscribe to the English-language newsletter or the full RSS feed. Your address is never shared, and it is deleted when you unsubscribe.

To support me, buy my books (from your local bookshop if possible)! I have just published a collection of short stories that should make you laugh and think.

June 05, 2023

I already wrote about how to deploy WordPress on OCI using MySQL HeatWave, the MySQL Database Service in Oracle Cloud Infrastructure:

This time we will see the easiest way to deploy WordPress on OCI using the Always Free tier. We will deploy WordPress and MySQL Community Server 8.0 on an Ampere compute instance.

To deploy with just one click, we will use an OCI Resource Manager Stack (Terraform modules).

Just click on the following button to deploy WordPress on OCI:

Deploy to Oracle Cloud

The sources are available on GitHub.

Let’s see what happens when we click on the “Deploy to Oracle Cloud” button:

June 04, 2023

The frog in the kettle that wanted nothing to change

We imagine, dream of or shudder at the idea of sudden change: the famous « grand soir » of revolution, natural or political catastrophes… And we forget that evolutions are gradual, insidious.

The hard, neo-Nazi far right has only rarely entered government openly in Europe since the Second World War. But most governments today present themselves as "centrist" coalitions. At the center of what? Of that hard far right and the liberal right. In short, of what would have been considered far right one or two decades ago.

Far-right media never became mainstream. But Twitter, one of the most influential media in the world, has become a pure far-right medium, propped up by everyone who feeds it. As for national media, they overwhelmingly belong to, and obey, billionaires who are rarely known for being progressive (you don't become a billionaire without being a complete psychopath).

Even the most committed environmentalists talk about the future, about the catastrophe that is coming "if we do nothing". But the catastrophe is not coming. We are right in the middle of it. Air pollution kills hundreds of thousands of people in Europe every year. Worldwide, if I am to believe figures quickly averaged from the web, air pollution is the equivalent of two COVID pandemics. Every year! Our children are asthmatic. They suffer. The oceans are full of waste. We are in the very heart of the catastrophe. And yet we keep waiting for it. That is also why nuclear power is so hotly debated: it promises us a catastrophe! Coal, meanwhile, plunges us right into one and nobody cares.

In the cult classic "The Space Merchants" ("Planète à gogos"), Pohl and Kornbluth tried to warn us about this unbridled pseudo-liberalism that mechanically leads to total control of society by a few monopolies. It is already the case on the Internet, where the younger generation knows only a single alternative to Meta (Facebook, Instagram, WhatsApp): TikTok. Billions of Internet users have no idea how any of it works; they blindly obey a handful of big corporations. Activists of every stripe know only one way left to organize: create a Facebook page or a WhatsApp group. The same goes for the few small independent shops trying to survive the Visa/Mastercard tax imposed on them, the war on cash waged by governments, the exorbitant rates imposed by monopolistic suppliers. They are losing their footing and see no other solution than to… create a Facebook page.

Facebook, whose algorithms are very similar to Twitter's; Facebook, which enabled Trump's rise to power and which is, let's not kid ourselves, far right and monopolistic. By its very essence.

None of these catastrophes are hypothetical. They are happening now, before our eyes. And they are linked. You cannot campaign for social justice on Twitter. You cannot be an environmentalist on Facebook. You cannot fight monopolies while smoking Philip Morris cigarettes. You do not treat a generalized cancer by going to see a stomach specialist and pretending the other organs are of no interest.

But nobody is perfect. We all have our contradictions. We all have our obligations. We have the right to be imperfect. We cannot be specialists in everything.

What matters to me is being aware of it. Not justifying our morbid behaviors to ourselves. Let us be responsible for our actions, let us be honest with ourselves. We have the right to give in (for me, it's chocolate!). But we do not have the right to pretend that giving in is "healthy". We have the right to have a WhatsApp account. We do not have the right to take for granted that everyone has one.

Every year, I tell my students who are about to graduate from the polytechnic school (and who therefore have excellent job prospects): "If you, who have nothing to worry about when it comes to finding a job, do not make strong moral choices, who will? Do not agree to go against your own morals!"

And then I dive back into the various historical revolutions. And I realize that change rarely comes from those who had a choice, from those who could afford it. Those people were, most often, corrupted by the system. Change comes from those who have no choice and take it anyway. From those who risk everything. And lose it.

I realize that I am myself sunk into a snug bourgeois comfort. That I selfishly protect my little family and my little comfort. That apart from theorizing and pontificating on my blog, which I enjoy and which flatters my ego, I do nothing. I don't even know what to do.

That's it, I have crossed the threshold. We are in the middle of a catastrophe, and it is in my interest that nothing changes.

An engineer and writer, I explore the impact of technology on humans. Subscribe to my writings in French by email or RSS. For my writings in English, subscribe to the English-language newsletter or the full RSS feed. Your address is never shared, and it is deleted when you unsubscribe.

To support me, buy my books (from your local bookshop if possible)! I have just published a collection of short stories that should make you laugh and think.

June 02, 2023

Thoughts on talent, genius, work, and a video game I recommend

In Épinal, I had the great fortune of talking with Denis Bajram, author of the cult comic Universal War 1. The conversation quickly turned to the notion of genius, a subject I had been pondering for a long time.

In my personal view, talent is ultimately just a head start, a starting point. Take two individuals with no experience whatsoever and ask them to sing, draw, run, juggle a ball, anything. Chances are one will be more gifted than the other. Bajram confided to me that he was the best at drawing in his high school. When you have talent, everything seems easy. Bernard Werber once said, "Writing is easy, anyone can do it", before Henri Lœvenbruck corrected him: "It's easy for you. For the tiny minority of geniuses. For everyone else, it's work, a lot of work." Didn't Hemingway say that "writing is sitting down at your typewriter and bleeding"?

However, talent is only the foundation; then comes work, training. The young prodigy, however talented, leaves their microcosm and is suddenly confronted with the best in their country or even, thanks to the Internet, on the planet. They realize they are not that talented after all. They have to work, to improve. Often, they give up.

The more you work, the more experience you gain and the better you understand what you are doing. You start to perceive the flaws in your own creations. You understand why certain works are far better than what you make. You even reach a point where you intellectually understand what it takes to achieve an extraordinary result. Without always being able to actually put it into practice.

Personally, I have worked enormously on story structure, on narration. The story of Universal War 1 is extraordinary, gripping and complex. I don't know if I will ever match that level. But I intellectually understand the process Bajram used to get there. I can see how he goes about it, how he uses his talent and his capacity for work. I could say the same of the person who is, in my eyes, the best comics writer of his generation: Alain Ayroles, author of the incredible "De Cape et de Crocs". While the series is one of the best there is, I believe I understand the creative processes at work. And even though I "understand" UW1 and De Cape et de Crocs, I am nonetheless left speechless with admiration and reread them regularly.

But sometimes a genius comes along. Unlike talent, genius is incomprehensible. Genius breaks out of every norm, every box. Even the best experts have to admit, "I don't know how he did it." In comics, Marc-Antoine Mathieu is one example. His series "Julius Corentin Acquefacques, prisonnier des rêves" is pure genius. No matter how many times I read and reread those books, I cannot see how anyone can produce something so completely out of the ordinary. I pay tribute to that series in my short story "Le Festival", hidden in my collection "Stagiaire au spatioport Omega 3000".

Faced with a genius, even the greatest talents doubt themselves. In Milos Forman's extraordinary film "Amadeus", the musician Salieri, though one of the best of his era, finds himself confronted with Mozart and adores, envies, admires and hates him all at once. It was in reference to this that Bajram told me about what he calls the "Salieri" syndrome, that confrontation with genius which makes even the most talented doubt.

This artist's doubt, this syndrome, is interesting because, on his blog, Bajram confides that he is disappointed by signing sessions where fans queue up without even talking to him. Fans who, in some cases, even go so far as to complain on Facebook.

Artists are emotional sponges, and for every negative comment on Facebook or Twitter, how many intimidated fans did not even dare speak to their idol? For that matter, if I took that step myself, it is because I had been mentally preparing for a week: "if you see Bajram and/or Mangin, go up to them and give them a book." Reading Bajram's post, I want to tell him: "It's not the signing sessions you should stop, it's Facebook!"

Regularly, artists, sometimes very famous ones, talk about putting their careers on hold because of the continuous harassment they endure online. But the problem is neither art nor fame; it is the platforms, which exploit the flaws of the human psyche and bring out the negative in us. Even on Mastodon, I experience it fairly regularly: a single negative comment can make me doubt myself, or even upset me for several hours (solution: go reread the positive reviews on Babelio or on blogs, it does you good; thank you to those who post them!).

More and more professionals are cutting themselves off from social media. That is the case, for example, of the cyclist Remco Evenepoel, whose staff isolates him completely from social media to make sure he stays focused and in top mental shape during races.

Gee's talent and Gee's game

Why am I telling you about talent, work and genius? Because it is precisely a reflection that has been maturing in me since I played Superflu Riteurnz, Gee's game.

I have been following Gee since he started posting on Framasoft. And one thing that has struck me from the start is that he does not have a great talent for drawing. Yep, I know, that's not nice. But as someone who doodles from time to time myself, I think I have at least as much talent as he does. Gee makes me laugh, he has a humor all his own, but he is not a great illustrator.

There is just one little subtlety: he works. He perseveres. He has built a universe with his rather simplistic drawing style. He has even self-published a Superflu comic book.

And, let's be honest, while the comic is likeable, even amusing, it is not transcendent.

Except that Gee did not stop there. He released the game. Which is the sequel to the comic, but you can play it without having read the comic.

And there, Gee's incredible work jumped out at me. The Superflu universe has been refined. It has been enriched by the author's programming talent. The game's backgrounds and the animations, like the wind in the trees or in the characters' hair, blew me away. I dove in with Miniploumette (11 years old) and Miniploum (6 years old). They loved it.

I am a huge fan of point-and-click games. The first video game I train my children on is Monkey Island, my all-time favorite. From time to time I try an old game again (I have actually been stuck for months in Sherlock Holmes: The Case of the Rose Tattoo; despite all the walkthroughs I have read online, nothing helps). Superflu Riteurnz is not merely a tribute, it is a genuine modern take on the genre. The gameplay is excellent. There is very little redundancy or padding.

The game also innovates with a very welcome mechanic: a hotline for getting hints. Instead of going off to hunt for walkthroughs on the web, the game hands them to you on a platter. Is it cheating? Spontaneously, my children refuse to use the hotline, except when things start to really annoy them. There is no score, nothing at stake, and yet it works. From investigations in the seedy bars of Fochougny to the dizzying heights of the water tower, by way of infernal chase scenes. On a tractor.

The only complaint? It is too short. After finishing it, you want an expansion, a new adventure.

My advice: if you can afford it, buy both the comic and the game. The two complement each other. If the comic doesn't interest you, no worries; I read it after the game, and the game works very well without it.

This game demonstrates that with an insane amount of work on the art (the game's backgrounds are truly superb), on the programming (and there, I know what I am talking about) and even on the music, Gee has produced a multifaceted work that is particularly interesting, playful, funny, entertaining and intergenerational. Political and critical, too. The finale removed my last hesitations. The verdict is clear: work pays off! (at least if you buy the game)

Perhaps, after all those Hollywood-scale blockbusters, the adventures of Superflu in Fochougny (whose mayor made me burst out laughing) are a welcome return to the comfort of the nearby, of the local. Perhaps, after all these years of following Gee's blog without being a fan of his drawings, the Superflu universe, whose concept I found only mildly amusing, has finally clicked for me, and no doubt for many others.

Go to Fochougny; the trip is well worth it!

And remember that from beginners to the greatest artists you admire, everyone doubts. That a little encouragement, a friendly message, a handshake, a handful of stars on your favorite review site are the fuel that will produce the next book, the next game, the next short film or the next piece of music that will accompany you through a little stretch of life. Or that will inspire you.

Happy discovering, happy creating!

An engineer and writer, I explore the impact of technology on humans. Subscribe to my writings in French by email or RSS. For my writings in English, subscribe to the English-language newsletter or the full RSS feed. Your address is never shared, and it is deleted when you unsubscribe.

To support me, buy my books (from your local bookshop if possible)! I have just published a collection of short stories that should make you laugh and think.

May 24, 2023

How the Live effect run-time is implemented

Cover Image - Live effect run-time inspector

In this post I describe how the Live run-time internals are implemented, which drive Use.GPU. Some pre-existing React and FP effect knowledge is useful.

I have written about Live before, but in general terms. You may therefore have the wrong impression of this endeavor.

When a junior engineer sees an application doing complex things, they're often intimidated by the prospect of working on it. They assume that complex functionality must be the result of complexity in code. The fancier the app, the less understandable it must be. This is what their experience has been so far: seniority correlates to more and hairier lines of code.

After 30 years of coding though, I know it's actually the inverse. You cannot get complex functionality working if you've wasted all your complexity points on the code itself. This is the main thing I want to show here, because this post mainly describes 1 data structure and a handful of methods.

Live has a real-time inspector, so a lot of this can be demonstrated live. Reading this on a phone is not recommended, the inspector is made for grown-ups.

Live run-time debug inspector

The story so far:

The main mechanism of Live is to allow a tree to expand recursively like in React, doing breadth-first expansion. This happens incrementally, and in a reactive, rewindable way. You use this to let interactive programs knit themselves together at run-time, based on the input data.

Like a simple CLI program with a main() function, the code runs top to bottom, and then stops. It produces a finite execution trace that you can inspect. To become interactive and to animate, the run-time will selectively rewind, and re-run only parts, in response to external events. It's a fusion of immediate and retained mode UI, offering the benefits of both and the downsides of neither, not limited to UI.

This relies heavily on FP principles such as pure functions, immutable data structures and so on. But the run-time itself is very mutable: the whole idea is to centralize all the difficult parts of tracking changes in one place, and then forget about them.

Live has no dependencies other than a JavaScript engine and these days consists of ~3600 lines.

If you're still not quite sure what the Live component tree actually is, it's 3 things at once:

  • a data dependency graph
  • an execution trace
  • a tree-shaped cache

The properties of the software emerge because these aspects are fully aligned inside a LiveComponent.

Functionally Mayonnaise

You can approach this from two sides, either from the UI side, or from the side of functional Effects.

Live Components

A LiveComponent (LC) is a React UI function component (FC) with 1 letter changed, at first:

const MyComponent: LC<MyProps> = (props: MyProps) => {
  const {wat} = props;

  // A memo hook
  // Takes dependencies as array
  const slow = useMemo(() => expensiveComputation(wat), [wat]);

  // Some local state
  const [state, setState] = useState(1);
  
  // JSX expressions with props and children
  // These are all names of LC functions to call + their props/arguments
  return (
    <OtherComponent>
      <Foo value={slow} />
      <Bar count={state} setCount={setState} />
    </OtherComponent>
  );
};

The data is immutable, and the rendering appears stateless: it returns a pure data structure for given input props and current state. The component uses hooks to access and manipulate its own state. The run-time will unwrap the outer layer of the <JSX> onion, mount and reconcile it, and then recurse.

let _ = await (
  <OtherComponent>
    <Foo foo={foo} />
    <Bar />
  </OtherComponent>
);
return null;

The code is actually misleading though. Both in Live and React, the return keyword here is technically wrong. Return implies passing a value back to a parent, but this is not happening at all. A parent component decided to render <MyComponent>, yes. But the function itself is being called by Live/React. It's yielding JSX to the Live/React run-time so it can make a call to OtherComponent(...). There is no actual return value.

Because a <Component> can't return a value to its parent, the received _ will always be null too. The data flow is one-way, from parent to child.

Effects

An Effect is basically just a Promise/Future as a pure value. To first approximation, it's a () => Promise: a promise that doesn't actually start unless you call it a second time. Just like a JSX tag is like a React/Live component waiting to be called. An Effect resolves asynchronously to a new Effect, just like <JSX> will render more <JSX>. Unlike a Promise, an Effect is re-usable: you can fire it as many times as you like. Just like you can keep rendering the same <JSX>.
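
As a rough sketch of that "() => Promise" approximation (the type and runner below are illustrative, not Live's actual API):

type Effect<T> = () => Promise<T | Effect<T>>;

// Hypothetical runner: keep firing until a plain value falls out.
const runEffect = async <T>(effect: Effect<T>): Promise<T> => {
  let result = await effect();
  while (typeof result === 'function') {
    // It resolved to another Effect: fire that one too.
    result = await (result as Effect<T>)();
  }
  return result as T;
};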

let value = yield (
  OtherEffect([
    Foo(foo),
    Bar(),
  ])
);
// ...
return value;

So React is like an incomplete functional Effect system. Just replace the word Component with Effect. OtherEffect is then some kind of decorator which describes a parallel dispatch to Effects Foo and Bar. A real Effect system will fork, but then join back, gathering the returned values, like a real return statement.

Unlike React components, Effects are ephemeral: no state is retained after they finish. The purity is actually what makes them appealing in production, to manage complex async flows. They're also not incremental/rewindable: they always run from start to finish.

          Pure    Returns    State    Incremental
React      ✓         ✗         ✓          ✓
Effects    ✓         ✓         ✗          ✗

You either take an effect system and make it incremental and stateful, or you take React and add the missing return data path.

I chose the latter option. First, because hooks are an excellent emulsifier. Second, because the big lesson from React is that plain, old, indexed arrays are kryptonite for incremental code. Unless you've deliberately learned how to avoid them, you won't get far, so it's better to start from that side.

This breakdown is divided into three main parts:

  • the rendering of 1 component
  • the environment around a component
  • the overall tree update loop

Components

The component model revolves around a few core concepts:

  • Fibers
  • Hooks and State
  • Mounts
  • Memoization
  • Inlining

Components form the "user land" of Live. You can do everything you need there without ever calling directly into the run-time's "kernel".

Live however does not shield its internals. This is fine, because I don't employ hundreds of junior engineers who would gleefully turn that privilege into a cluster bomb of spaghetti. The run-time is not extensible anyhow: what you see is what you get. The escape hatch is there to support testing and debugging.

Shielding this would be a game of hide-the-reference, creating a shadow-API for privileged friend packages, and so on. Ain't nobody got time for that.

React has an export called DONT_USE_THIS_OR_YOU_WILL_BE_FIRED, Live has THIS_WAY___IF_YOU_DARE and it's called useFiber.

Fibers

Borrowing React terminology, a mounted Component function is called a fiber, despite this being single threaded.

Each persists for the component lifetime. To start, you call render(<App />). This creates and renders the first fiber.

type LiveFiber = {
  // Fiber ID
  id: number,

  // Component function
  f: Function,

  // Arguments (props, etc.)
  args: any[],

  // ...
}

Fibers are numbered with increasing IDs. In JS this means you can create 2⁵³ fibers before it crashes, which ought to be enough for anybody.

It holds the component function f and its latest arguments args. Unlike React, Live functions aren't limited to only a single props argument.

Each fiber is rendered from a <JSX> tag, which is a plain data structure. The Live version is very simple.

type Key = number | string;
type JSX.Element = {
  // Same as fiber
  f: Function,
  args: any[],

  // Element key={...}
  key?: string | number,

  // Rendered by fiber ID
  by: number,
}

Another name for this type is a DeferredCall. This is much leaner than React's JSX type, although Live will gracefully accept either. In Live, JSX syntax is also optional, as you can write use(Component, …) instead of <Component … />.

Calls and fibers track the ID by of the fiber that rendered them. This is always an ancestor, but not necessarily the direct parent.

fiber.bound = () => {
  enterFiber(fiber);

  const {f, args} = fiber;
  const jsx = f(...args);

  exitFiber(fiber);

  return jsx;
};

The fiber holds a function bound. This binds f to the fiber itself, always using the current fiber.args as arguments. It wraps the call in an enter and exit function for state housekeeping.

This can then be called via renderFiber(fiber) to get jsx. This is only done during an ongoing render cycle.

Hooks and State

Each fiber holds a local state array and a temporary pointer:

{
  // ...

  state: any[],
  pointer: number,
}

Calling a hook like useState taps into this state without an explicit reference to it.

In Live, this is implemented as a global currentFiber variable, combined with a local fiber.pointer starting at 0. Both are initialized by enterFiber(fiber).

The state array holds flattened triplets, one per hook. They're arranged as [hookType, A, B]. Values A and B are hook-specific, but usually hold a value and a dependencies array. In the case of useState, it's just the [value, setValue] pair.

The fiber.pointer advances by 3 slots every time a hook is called. Tracking the hookType allows the run-time to warn you if you call hooks in a different order than before.

The basic React hooks don't need any more state than this and can be implemented in ~20 lines of code each. This is useMemo:

export const useMemo = <T>(
  callback: () => T,
  dependencies: any[] = NO_DEPS,
): T => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.MEMO);
  let {state} = fiber;

  let value = state![i];
  const deps = state![i + 1];

  if (!isSameDependencies(deps, dependencies)) {
    value = callback();

    state![i] = value;
    state![i + 1] = dependencies;
  }

  return value as unknown as T;
}

useFiber just returns currentFiber and doesn't count as a real hook (it has no state). It only ensures you cannot call a hook outside of a component render.

export const useNoHook = (hookType: Hook) => () => {
  const fiber = useFiber();

  const i = pushState(fiber, hookType);
  const {state} = fiber;

  state![i] = undefined;
  state![i + 1] = undefined;
};

No-hooks like useNoMemo are also implemented, which allow for conditional hooks: write a matching else branch for any if. To ensure consistent rendering, a useNoHook will dispose of any state the useHook had, rather than just being a no-op. The above is just the basic version for simple hooks without cleanup.
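
For example, a conditional hook with its matching no-hook in the else branch might look like this (a sketch; <Print> is just a stand-in child component, not part of Live):

const Label: LC<{value?: string}> = ({value}) => {
  let text: string;
  if (value != null) {
    // Hook branch
    const v = value;
    text = useMemo(() => v.toUpperCase(), [v]);
  } else {
    // Matching no-hook branch keeps the fiber's state layout consistent
    useNoMemo();
    text = '(empty)';
  }
  return <Print text={text} />;
};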

This also lets the run-time support early return cleanly in Components: when exitFiber(fiber) is called, all remaining unconsumed state is disposed of with the right no-hook.

If someone calls a setState, this is added to a dispatch queue, so changes can be batched together. If f calls setState during its own render, this is caught and resolved within the same render cycle, by calling f again. A setState which is a no-op is dropped (pointer equality).
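
A minimal sketch of useState in this model, consistent with the [hookType, A, B] layout above (not the actual Live source; Hook.STATE is assumed by analogy with Hook.MEMO, and the exact batching and no-op handling differ):

export const useState = <T>(initial: T): [T, (t: T) => void] => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.STATE);
  const {state, host} = fiber;

  if (state![i + 1] === undefined) {
    // First render: store the value and a stable setter.
    // The setter schedules a batched task via the host.
    const setValue = (value: T) => host.schedule(fiber, () => {
      if (state![i] === value) return false; // no-op setState is dropped
      state![i] = value;
      return true;
    });
    state![i] = initial;
    state![i + 1] = setValue;
  }

  return [state![i] as T, state![i + 1]];
};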

You can see however that Live hooks are not pure: when a useMemo is tripped, it will immediately overwrite the previous state during the render, not after. This means renders in Live are not stateless, only idempotent.

This is very deliberate. Live doesn't have a useEffect hook, it has a useResource hook that is like a useMemo with a useEffect-like disposal callback. While it seems to throw React's orchestration properties out the window, this is not actually so. What you get in return is an enormous increase in developer ergonomics, offering features React users are still dreaming of, running off 1 state array and 1 pointer.

Live is React with the training wheels off, not with holodeck security protocols disabled, but this takes a while to grok.

Reginald Barclay

Mounts

After rendering, the returned/yielded <JSX> value is reconciled with the previous rendered result. This is done by updateFiber(fiber, value).

New children are mounted, while old children are unmounted or have their args replaced. Only children with the same f as before can be updated in place.

{
  // ...
  
  // Static mount
  mount?: LiveFiber,

  // Dynamic mounts
  mounts?: Map<Key, LiveFiber>,
  lookup?: Map<Key, number>,
  order?: Key[],

  // Continuation
  next?: LiveFiber,

  // Fiber type
  type?: LiveComponent,

  // ...
}

Mounts are tracked inside the fiber, either as a single mount, or a map mounts, pointing to other fiber objects.

The key for mounts is either an array index 0..N or a user-defined key. Keys must be unique.

The order of the keys is kept in a list. A reverse lookup map is created if they're not anonymous indices.

The mount is only used when a component renders 1 other statically. This excludes arrays of length 1. If a component switches between mount and mounts, all existing mounts are discarded.

Continuations are implemented as a special next mount. This is mounted by one of the built-in fenced operators such as <Capture> or <Gather>.

In the code, mounting is done via:

  • mountFiberCall(fiber, call) (static)
  • reconcileFiberCalls(fiber, calls) (dynamic)
  • mountFiberContinuation(fiber, call) (next).

Each will call updateMount(fiber, mount, jsx, key?, hasKeys?).

If an existing mount (with the same key) is compatible it's updated, otherwise a replacement fiber is made with makeSubFiber(…). It doesn't update the parent fiber, rather it just returns the new state of the mount (LiveFiber | null), so it can work for all 3 mounting types. Once a fiber mount has been updated, it's queued to be rendered with flushMount.

If updateMount returns false, the update was a no-op because fiber arguments were identical (pointer equality). The update will be skipped and the mount not flushed. This follows the same implicit memoization rule that React has. It tends to trigger when a stateful component re-renders an old props.children.
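
A rough sketch of that decision logic (the real updateMount and makeSubFiber signatures are more involved; the call shapes here are assumptions):

const updateMount = (
  parent: LiveFiber,
  mount: LiveFiber | null,
  jsx: JSX.Element | null,
  key?: Key,
): LiveFiber | null | false => {
  if (!jsx) return null;                         // unmount
  if (mount && mount.f === jsx.f) {
    if (mount.args === jsx.args) return false;   // identical args: no-op, skip flush
    mount.args = jsx.args;                       // compatible: update in place
    return mount;
  }
  return makeSubFiber(parent, jsx, key);         // otherwise replace / mount fresh
};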

A subtle point here is that fibers have no links/pointers pointing back to their parent. This is part practical, part ideological. It's practical because it cuts down on the cyclic references that complicate garbage collection. It's ideological because it helps ensure one-way data flow.

There is also no global collection of fibers, except in the inspector. Like in an effect system, the job of determining what happens is entirely the result of an ongoing computation on JSX, i.e. something passed around like pure, immutable data. The tree determines its own shape as it's evaluated.

Queue and Path

Live needs to process fibers in tree order, i.e. as in a typical tree list view. To do so, fibers are compared as values with compareFibers(a, b). This is based on references that are assigned only at fiber creation.

Each fiber has a path from the root of the tree down to itself (at depth depth), containing the indices or keys along the way.

{
  // ...

  depth: number,
  path: Key[],
  keys: (
    number |
    Map<Key, number>
  )[],
}

A continuation next is ordered after the mount or mounts. This allows data fences to work naturally: the run-time only ensures all preceding fibers have been run first. For this, I insert an extra index into the path, 0 or 1, to distinguish the two sub-trees.

If many fibers have a static mount (i.e. always 1 child), this would create paths with lots of useless zeroes. To avoid this, a single mount has the same path as its parent, only its depth is increased. Paths can still be compared element-wise, with depth as the tie breaker. This easily reduces typical path length by 70%.
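
A simplified sketch of that comparison (ignoring the keyed lookups described next):

const compareFibers = (a: LiveFiber, b: LiveFiber): number => {
  const n = Math.min(a.path.length, b.path.length);
  for (let i = 0; i < n; ++i) {
    const ai = a.path[i], bi = b.path[i];
    if (ai !== bi) {
      // Compare indices numerically, keys lexically (sketch only)
      if (typeof ai === 'number' && typeof bi === 'number') return ai - bi;
      return String(ai) < String(bi) ? -1 : 1;
    }
  }
  return a.depth - b.depth;  // depth as the tie breaker
};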

This is enough for children without keys, which are spawned statically. Their order in the tree never changes after creation, they can only be updated in-place or unmounted.

But for children with a key, the expectation is that they persist even if their order changes. Their keys are just unsorted ids, and their order is stored in the fiber.order and fiber.lookup of the parent in question.

This is referenced in the fiber.keys array. It's a flattened list of pairs [i, fiber.lookup], meaning the key at index i in the path should be compared using fiber.lookup. To keep these key references intact, fiber.lookup is mutable and always modified in-place when reconciling.

Memoization

If a Component function is wrapped in memo(...), it won't be re-rendered if its individual props haven't changed (pointer equality). This goes deeper than the run-time's own oldArgs !== newArgs check.

For this, memoized fibers keep a version around. They also store a memo which holds the last rendered version, and a run count runs for debugging:

{
  // ...

  version?: number,
  memo?: number,

  runs?: number,
}

The version is used as one of the memo dependencies, along with the names and values of the props. Hence a memo(...) cache can be busted just by incrementing fiber.version, even if the props didn't change. Versions roll over at 32-bit.
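
In other words, something along these lines (an illustrative helper, not necessarily Live's exact code):

const bustFiberMemo = (fiber: LiveFiber) => {
  if (fiber.version != null) {
    fiber.version = (fiber.version + 1) >>> 0;  // roll over at 32-bit
  }
};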

To actually do the memoization, it would be nice if you could just wrap the whole component in useMemo. It doesn't work in the React model because you can't call other hooks inside hooks. So I've brought back the mythical useYolo... An earlier incarnation of this allowed fiber.state scopes to be nested, but lacked a good purpose. The new useYolo is instead a useMemo you can nest. It effectively hot swaps the entire fiber.state array with a new one kept in one of the slots:

Indiana Jones swapping the golden idol

This is then the first hook inside fiber.state. If the memo succeeds, the yolo'd state is preserved without treating it as an early return. Otherwise the component runs normally. Yolo'ing as the first hook has a dedicated fast path but is otherwise a perfectly normal hook.

The purpose of fiber.memo is so the run-time can tell whether it rendered the same thing as before, and stop. It can just compare the two versions, leaving the specifics of memoization entirely up to the fiber component itself. For example, to handle a custom arePropsEqual function in memo(…).

I always use version numbers as opposed to isDirty flags, because it leaves a paper trail. This provides the same ergonomics for mutable data as for immutable data: you can store a reference to a previous value, and do an O(1) equality check to know whether it changed since you last accessed it.

Whenever you have a handle which you can't look inside, such as a pointer to GPU memory, it's especially useful to keep a version number on it, which you bump every time you write to it. It makes debugging so much easier.
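
A tiny sketch of that pattern (names are mine):

type Versioned<T> = {current: T, version: number};

const writeTo = <T>(ref: Versioned<T>, write: (value: T) => void) => {
  write(ref.current);
  ref.version = (ref.version + 1) >>> 0;  // bump on every write
};

// A consumer stores `ref.version` once, and later `ref.version !== lastSeen`
// is an O(1) check for whether the contents changed.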

Inlining

Built-in operators are resolved with a hand-coded routine post-render, rather than being "normal" components. Their component functions are just empty and there is a big dispatch with if statements. Each is tagged with an isLiveBuiltin: true.

If a built-in operator is an only child, it's usually resolved inline. No new mount is created, it's immediately applied as part of updating the parent fiber. The glue in between tends to be "kernel land"-style code anyway, it doesn't need a whole new fiber, and it's not implemented in terms of hooks. The only fiber state it has is the type (i.e. function) of the last rendered JSX.

There are several cases where it cannot inline, such as rendering one built-in inside another built-in, or rendering a built-in as part of an array. So each built-in can always be mounted independently if needed.

From an architectural point of view, inlining is just incidental complexity, but this significantly reduces fiber overhead and keeps the user-facing component tree much tidier. It introduces a few minor problems around cleanup, but those are caught and handled.

Live also has a morph operator. This lets you replace a mount with another component, without discarding any matching children or their state. The mount's own state is still discarded, but its f, args, bound and type are modified in-place. A normal render follows, which will reconcile the children.

This is implemented in morphFiberCall. It only works for plain vanilla components, not other built-ins. The reason to re-use the fiber rather than transplant the children is so that references in children remain unchanged, without having to rekey them.

In Live, I never do a full recursive traversal of any sub-tree, unless that traversal is incremental and memoized. This is a core property of the system. Deep recursion should happen in user-land.

Environment

Fibers have access to a shared environment, provided by their parent. This is created in user-land through built-in ops and accessed via hooks.

  • Context and captures
  • Gathers and yeets
  • Fences and suspend
  • Quotes and reconcilers
  • Unquote + quote

Context and captures

Live extends the classic React context:

{
  // ...

  context: {
    values: Map<LiveContext | LiveCapture, Ref<any>>,
    roots: Map<LiveContext | LiveCapture, number | LiveFiber>,
  },
}

A LiveContext provides 1 value to N fibers. A LiveCapture collects N values into 1 fiber. Each is just an object created in user-land with makeContext / makeCapture, acting as a unique key. It can also hold a default value for a context.

The values map holds the current value of each context/capture. This is boxed inside a Ref as {current: value} so that nested sub-environments share values for inherited contexts.

The roots map points to the root fibers providing or capturing. This is used to allow useContext and useCapture to set up the right data dependency just-in-time. For a context, this points upstream in the tree, so to avoid a reverse reference, it's a number. For a capture, this points to a downstream continuation, i.e. the next of an ancestor, and can be a LiveFiber.

Normally children just share their parent's context. It's only when you <Provide> or <Capture> that Live builds a new, immutable copy of values and roots with a new context/capture added. Each context and capture persists for the lifetime of its sub-tree.
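
A simplified sketch of what useContext amounts to in this model (the real hook also registers the data dependency via roots and handles default values, both omitted here):

export const useContext = <T>(context: LiveContext): T => {
  const fiber = useFiber();

  // Look up the boxed value in the fiber's environment.
  const ref = fiber.context.values.get(context);
  return ref?.current as T;
};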

Captures build up a map incrementally inside the Ref while children are rendered, keyed by fiber. This is received in tree order after sorting:

<Capture
  context={...}
  children={...}
  then={(values: T[]) => {
    ...
  }}
/>

You can also just write capture(context, children, then), FYI.

This is an await or yield in disguise, where the then closure is spiritually part of the originating component function. Therefore it doesn't need to be memoized. The state of the next fiber is preserved even if you pass a new function instance every time.

Unlike React-style render props, then props can use hooks, because they run on an independent next fiber called Resume(…). This fiber will be re-run when values changes, and can do so without re-running Capture itself.

A then prop can render new elements, building a chain of next fibers. This acts like a rewindable generator, where each Resume provides a place where the code can be re-entered, without having to explicitly rewind any state. This requires the data passed into each closure to be immutable.

The logic for providing or capturing is in provideFiber(fiber, ...) and captureFiber(fiber, ...). Unlike other built-ins, these are always mounted separately and are called at the start of a new fiber, not at the end of the previous one. Their children are then immediately reconciled by inlineFiberCall(fiber, calls).

Gathers and yeets

Live offers a true return, in the form of yeet(value) (aka <Yeet>{value}</Yeet>). This passes a value back to a parent.

These values are gathered in an incremental map-reduce along the tree, to a root that mounted a gathering operation. It's similar to a Capture, except it visits every parent along the way. It's the complement to tree expansion during rendering.

This works for any mapper and reducer function via <MapReduce>. There is also an optimized code path for a simple array flatMap <Gather>, as well as struct-of-arrays flatMap <MultiGather>. It works just like a capture:

<Gather
  children={...}
  then={(
    value: T[]
  ) => {
    ...
  }}
/>
<MultiGather
  children={...}
  then={(
    value: Record<string, T[]>
  ) => {
    ...
  }}
/>

Each fiber in a reduction has a fiber.yeeted structure, created at mount time. Like a context, this relation never changes for the lifetime of the component.

It acts as a persistent cache for a yeeted value of type A and its map-reduction reduced of type B:

{
  yeeted: {
    // Same as fiber (for inspecting)
    id: number,

    // Reduction cache at this fiber
    value?: A,
    reduced?: B,

    // Parent yeet cache
    parent?: FiberYeet<A, B>,

    // Reduction root
    root: LiveFiber,

    // ...
  },
}

The last value yeeted by the fiber is kept so that all yeets are auto-memoized.

Each yeeted points to a parent. This is not the parent fiber but its fiber.yeeted. This is the parent reduction, which is downstream in terms of data dependency, not upstream. This forms a mirrored copy of the fiber tree and respects one-way data flow:

yeet reduce

Again the linked root fiber (sink) is not an ancestor, but the next of an ancestor, created to receive the final reduced value.

If the reduced value is undefined, this signifies an empty cache. When a value is yeeted, parent caches are busted recursively towards the root, until an undefined is encountered. If a fiber mounts or unmounts children, it busts its reduction as well.
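
A simplified sketch of that busting pass (the helper name and exact stopping rule are mine, not the actual source):

const bustFiberYeet = (fiber: LiveFiber) => {
  let yt: FiberYeet<any, any> | undefined = fiber.yeeted?.parent;
  while (yt && yt.reduced !== undefined) {
    yt.reduced = undefined;  // empty this cache
    yt = yt.parent;          // walk towards the gathering root
  }
};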

chain of fibers in the forwards direction turns down and back to yield values in the backwards direction

Fibers that yeet a value cannot also have children. This isn't a limitation because you can render a yeet beside other children, as just another mount, without changing the semantics. You can also render multiple yeets, but it's faster to just yeet a single list.

If you yeet undefined, this acts as a zero-cost signal: it does not affect the reduced values, but it will cause the reducing root fiber to be re-invoked. This is a tiny concession to imperative semantics, wildly useful.

This may seem very impure, but actually it's the opposite. With clean, functional data types, there is usually a "no-op" value that you could yeet: an empty array or dictionary, an empty function, and so on. You can always force-refresh a reduction without meaningfully changing the output, but it causes a lot of pointless cache invalidation in the process. Zero-cost signals are just an optimization.

When reducing a fiber that has a gathering next, it takes precedence over the fiber's own reduction: this is so that you can gather and reyeet in series, with the final reduction returned.

Fences and suspend

The specifics of a gathering operation are hidden behind a persistent emit and gather callback, derived from a classic map and reduce:

{
  yeeted: {
    // ...

    // Emit a value A yeeted from fiber
    emit: (fiber: LiveFiber, value: A) => void,

    // Gather a reduction B from fiber
    gather: (fiber: LiveFiber, self: boolean) => B,

    // Enclosing yeet scope
    scope?: FiberYeet<any, any>,
  },
}

Gathering is done by the root reduction fiber, so gather is not strictly needed here. It's only exposed so you can mount a <Fence> inside an existing reduction, without knowing its specifics. A fence will grab the intermediate reduction value at that point in the tree and pass it to user-land. It can then be reyeeted.

One such use is to mimic React Suspense using a special toxic SUSPEND symbol. It acts like a NaN, poisoning any reduction it's a part of. You can then fence off a sub-tree to contain the spill and substitute it with its previous value or a fallback.
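
For instance, an array gather that respects the poisoning rule could look like this (SUSPEND is declared here just for illustration; gatherOrSuspend is not a Live API):

declare const SUSPEND: unique symbol;

const gatherOrSuspend = (values: any[]): any[] | typeof SUSPEND =>
  values.some(v => v === SUSPEND) ? SUSPEND : values;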

In practice, gather will delegate to one of gatherFiberValues, multiGatherFiberValues or mapReduceFiberValues. Each will traverse the sub-tree, reuse any existing reduced values (stopping the recursion early), and fill in any undefineds via recursion. Their code is kinda gnarly, given that it's just map-reduce, but that's because they're hand-rolled to avoid useless allocations.

The self argument to gather is such an optimization, only true for the final user-visible reduction. This lets intermediate reductions be type unsafe, e.g. to avoid creating pointless 1 element arrays.

At a gathering root, the enclosing yeet scope is also kept. This is to cleanly unmount an inlined gather, by restoring the parent's yeeted.

Quotes and reconcilers

Live has a reconciler in reconcileFiberCalls, but it can also mount <Reconciler> as an effect via mountFiberReconciler.

This is best understood by pretending this is React DOM. When you render a React tree which mixes <Components> with <html>, React reconciles it, and extracts the HTML parts into a new tree:

<App>                    <div>
  <Layout>        =>       <div>
    <div>                    <span>
      <div>                    <img>
        <Header>
          <span>
            <Logo>
              <img>

Each HTML element is implicitly quoted inside React. They're only "activated" when they become real on the right. The ones on the left are only stand-ins.

That's also what a Live <Reconcile> does. It mounts a normal tree of children, but it simultaneously mounts an independent second tree, under its next mount.

If you render this:

<App>
  <Reconcile>
    <Layout>
      <Quote>
        <Div>
          <Div>
            <Unquote>
              <Header>
                <Quote>
                  <Span>
                    <Unquote>
                      <Logo>
                        <Quote>
                          <Img />
                   ...

You will get the quoted <Div>, <Div>, <Span> and <Img> mounted as a second, separate tree under the reconciler's next fiber, mirroring the React DOM example above.

It adds a quote environment to the fiber:

{
  // ...
  quote: {
    // Reconciler fiber
    root: number,

    // Fiber in current sub-tree
    from: number,

    // Fiber in other sub-tree
    to: LiveFiber,

    // Enclosing reconciler scope
    scope?: FiberQuote,
  }
}

When you render a <Quote>...</Quote>, whatever's inside ends up mounted on the to fiber.

Quoted fibers will have a similar fiber.unquote environment. If they render an <Unquote>...</Unquote>, the children are mounted back on the quoting fiber.

Each time, the quoting or unquoting fiber becomes the new to fiber on the other side.

The idea is that you can use this to embed one set of components inside another as a DSL, and have the run-time sort them out.

This all happens in mountFiberQuote(…) and mountFiberUnquote(…). It uses reconcileFiberCall(…) (singular). This is an incremental version of reconcileFiberCalls(…) (plural) which only does one mount/unmount at a time. The fiber id of the quote or unquote is used as the key of the quoted or unquoted fiber.

const Queue = ({children}) => (
  reconcile(
    quote(
      gather(
        unquote(children),
        (v: any[]) =>
          <Render values={v} />
      ))));

The quote and unquote environments are separate so that reconcilers can be nested: at any given place, you can unquote 'up' or quote 'down'. Because you can put multiple <Unquote>s inside one <Quote>, it can also fork. The internal non-JSX dialect is very Lisp-esque, you can rap together some pretty neat structures with this.

Because quotes are mounted and unmounted incrementally, there is a data fence Reconcile(…) after each (un)quote. This is where the final set is re-ordered if needed.

The data structure actually violates my own rule about no-reverse links. After you <Quote>, the fibers in the second tree have a link to the quoting fiber which spawned them. And the same applies in the other direction after you <Unquote>.

The excuse is ergonomics. I could break the dependency by creating a separate sub-fiber of <Quote> to serve as the unquoting point, and vice versa. But this would bloat both trees with extra fibers, just for purity's sake. It already has unavoidable extra data fences, so this matters.

At a reconciling root, the enclosing quote scope is added to fiber.quote, just like in yeeted, again for clean unmounting of inlined reconcilers.

Unquote-quote

There is an important caveat here. There are two ways you could implement this.

One way is that <Quote>...</Quote> is a Very Special built-in, which does something unusual: it would traverse the children tree it was given, and go look for <Unquote>...</Unquote>s inside. It would have to do so recursively, to partition the quoted and unquoted JSX. Then it would have to graft the quoted JSX to a previous quote, while grafting the unquoted parts to itself as mounts. This is the React DOM mechanism, obfuscated. This is also how quoting works in Lisp: it switches between evaluation mode and AST mode.

I have two objections. The first is that this goes against the whole idea of evaluating one component incrementally at a time. It wouldn't be working with one set of mounts on a local fiber: it would be building up args inside one big nested JSX expression. JSX is not supposed to be a mutable data structure, you're supposed to construct it immutably from the inside out, not from the outside in.

The second is that this would only work for 'balanced' <Quote>...<Unquote>... pairs appearing in the same JSX expression. If you render:

<Present>
  <Slide />
</Present>

...then you couldn't have <Present> render a <Quote> and <Slide> render an <Unquote> and have it work. It wouldn't be composable as two separate portals.

The only way for the quotes/unquotes to be revealed in such a scenario is to actually render the components. This means you have to actively run the second tree as it's being reconciled, same as the first. There is no separate update + commit like in React DOM.

This might seem pointless, because all this does is thread the data flow into a zigzag between the two trees, knitting the quote/unquote points together. The render order is the same as if <Quote> and <Unquote> weren't there. The path and depth of quoted fibers reveals this, which is needed to re-render them in the right order later.

The key difference is that for all other purposes, those fibers do live in that spot. Each tree has its own stack of nested contexts. Reductions operate on the two separate trees, producing two different, independent values. This is just "hygienic macros" in disguise, I think.

Use.GPU's presentation system uses a reconciler to wrap the layout system, adding slide transforms and a custom compositor. This is sandwiched in-between it and the normal renderer.

A plain declarative tree of markup can be expanded into:

<Device>
  <View>
    <Present>
      <Slide>
        <Object />
        <Object />
      </Slide>
      <Slide>
        <Object />
        <Object />
      </Slide>
    </Present>
  </View>
</Device>

I also use a reconciler to produce the WebGPU command queue. This is shared for an entire app and sits at the top. The second tree just contains quoted yeets. I use zero-cost signals here too, to let data sources signal that their contents have changed. There is a short-hand <Signal /> for <Quote><Yeet /></Quote>.

Note that you cannot connect the reduction of tree 1 to the root of tree 2: <Reconcile> does not have a then prop. It doesn't make sense because the next fiber gets its children from elsewhere, and it would create a rendering cycle if you tried anyway.

If you need to spawn a whole second tree based on a first, that's what a normal gather already does. You can use it to e.g. gather lambdas that return memoized JSX. This effectively acts as a two-phase commit.

The Use.GPU layout system does this repeatedly, with several trees + gathers in a row. It involves constraints both from the inside out and the outside in, so you need both tree directions. The output is UI shapes, which need to be batched together for efficiency and turned into a data-driven draw call.

The Run-Time

With all the pieces laid out, I can now connect it all together.

Before render(<App />) can render the first fiber, it initializes a very minimal run-time. So this section will be kinda dry.

This is accessed through fiber.host and exposes a handful of APIs:

  • a queue of pending state changes
  • a priority queue for traversal
  • a fiber-to-fiber dependency tracker
  • a resource disposal tracker
  • a stack slicer for reasons

State changes

When a setState is called, the state change is added to a simple queue as a lambda. This allows simultaneous state changes to be batched together. For this, the host exposes a schedule and a flush method.

{
  // ...

  host: {
    schedule: (fiber: LiveFiber, task?: () => boolean | void) => void,
    flush: () => void,

    // ... 
  }
}

This comes from makeActionScheduler(…). It wraps a native scheduling function (e.g. queueMicrotask) and an onFlush callback:

const makeActionScheduler = (
  schedule: (flush: ArrowFunction) => void,
  onFlush: (fibers: LiveFiber[]) => void,
) => {
  // ...
  return {schedule, flush};
}

The callback is set up by render(…). It will take the affected fibers and call renderFibers(…) (plural) on them.

The returned schedule(…) will trigger a flush, so flush() is only called directly for sync execution, to stay within the same render cycle.
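To make the batching concrete, here is a minimal sketch of what such a scheduler could look like. The shapes are hypothetical and reduced to the essentials; the real makeActionScheduler differs in detail.

type LiveFiber = object;                      // placeholder: the real fiber is much richer
type ArrowFunction = (...args: any[]) => any;

const makeActionScheduler = (
  schedule: (flush: ArrowFunction) => void,   // native scheduler, e.g. queueMicrotask
  onFlush: (fibers: LiveFiber[]) => void,     // set up by render(...): calls renderFibers(...)
) => {
  let pending = new Map<LiveFiber, (() => boolean | void)[]>();
  let queued = false;

  const flush = () => {
    queued = false;
    if (!pending.size) return;

    const batch = pending;
    pending = new Map();

    // Apply all batched state changes, then hand the affected fibers to the renderer.
    for (const tasks of batch.values()) for (const task of tasks) task();
    onFlush([...batch.keys()]);
  };

  const scheduleFiber = (fiber: LiveFiber, task?: () => boolean | void) => {
    let tasks = pending.get(fiber);
    if (!tasks) pending.set(fiber, tasks = []);
    if (task) tasks.push(task);

    // Only trigger one async flush per batch; flush() can still be called
    // directly for sync execution.
    if (!queued) {
      queued = true;
      schedule(flush);
    }
  };

  return {schedule: scheduleFiber, flush};
};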

Traversal

The host keeps a priority queue (makePriorityQueue) of pending fibers to render, in tree order:

{
  // ...

  host: {
    // ...

    visit: (fiber: LiveFiber) => void,
    unvisit: (fiber: LiveFiber) => void,
    pop: () => LiveFiber | null,
    peek: () => LiveFiber | null,
  }
}

renderFibers(…) first adds the fibers to the queue by calling host.visit(fiber).

A loop in renderFibers(…) will then call host.peek() and host.pop() until the queue is empty. It will call renderFiber(…) and updateFiber(…) on each, which will call host.unvisit(fiber) in the process. This may also cause other fibers to be added to the queue.

The priority queue is a singly linked list of fibers. It allows fast appends at the start or end. To speed up insertions in the middle, it remembers the last inserted fiber. This massively speeds up the very common case where multiple fibers are inserted into an existing queue in tree order. Otherwise it just does a linear scan.

It also has a set of all the fibers in the queue, so it can quickly do presence checks. This means visit and unvisit can safely be called blindly, which happens a lot.
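As a rough sketch, assuming an opaque fiber type and leaving the hint and fast-append optimizations as comments, such a queue could look like this:

type LiveFiber = object;                       // placeholder shape
type QueueNode = {fiber: LiveFiber; next: QueueNode | null};

const makePriorityQueue = (
  compareFibers: (a: LiveFiber, b: LiveFiber) => number,   // tree order
) => {
  let queue: QueueNode | null = null;          // head of the singly linked list
  const present = new Set<LiveFiber>();        // O(1) presence checks

  const visit = (fiber: LiveFiber) => {
    if (present.has(fiber)) return;            // safe to call blindly
    present.add(fiber);

    const node: QueueNode = {fiber, next: null};
    if (!queue || compareFibers(fiber, queue.fiber) < 0) {
      node.next = queue;
      queue = node;                            // fast prepend
      return;
    }
    // Linear scan for the insertion point; the real queue starts from the
    // last inserted fiber as a hint, and keeps a tail for fast appends.
    let q = queue;
    while (q.next && compareFibers(q.next.fiber, fiber) <= 0) q = q.next;
    node.next = q.next;
    q.next = node;
  };

  const unvisit = (fiber: LiveFiber) => {
    if (!present.delete(fiber)) return;        // safe to call blindly
    if (queue && queue.fiber === fiber) { queue = queue.next; return; }
    for (let q = queue; q && q.next; q = q.next) {
      if (q.next.fiber === fiber) { q.next = q.next.next; return; }
    }
  };

  const peek = (): LiveFiber | null => queue ? queue.fiber : null;
  const pop = (): LiveFiber | null => {
    const fiber = peek();
    if (fiber) unvisit(fiber);
    return fiber;
  };

  return {visit, unvisit, peek, pop};
};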

// Re-insert all fibers that descend from fiber
const reorder = (fiber: LiveFiber) => {
  const {path} = fiber;
  const list: LiveFiber[] = [];
  let q = queue;
  let qp = null;

  while (q) {
    // Skip queued fibers that still sort before the re-ordered fiber.
    if (compareFibers(fiber, q.fiber) >= 0) {
      hint = qp = q;
      q = q.next;
      continue;
    }
    // Queued descendant: pull it out so it can be re-inserted in its new position.
    if (isSubNode(fiber, q.fiber)) {
      list.push(q.fiber);
      if (qp) {
        qp.next = q.next;
        q = q.next;
      }
      else {
        pop();
        q = q.next;
      }
      continue;
    }
    // Past the subtree: the rest of the queue is unaffected.
    break;
  }

  if (list.length) {
    list.sort(compareFibers);
    list.forEach(insert);
  }
};

There is an edge case here though. If a fiber re-orders its keyed children, the compareFibers fiber order of those children changes. But, because of long-range dependencies, it's possible for those children to already be queued. This might mean a later cousin node could render before an earlier one, though never a child before a parent or ancestor.

In principle this is not an issue because the output—the reductions being gathered—will be re-reduced in new order at a fence. From a pure data-flow perspective, this is fine: it would even be inevitable in a multi-threaded version. In practice, it feels off if code runs out of order for no reason, especially in a dev environment.

So I added optional queue re-ordering, on by default. This can be done pretty easily because the affected fibers can be found by comparing paths, and still form a single group inside the otherwise ordered queue: scan until you find a fiber underneath the parent, then pop off fibers until you exit the subtree. Then just reinsert them.

This really reminds me of shader warp reordering in raytracing GPUs btw.

Dependencies

To support contexts and captures, the host has a long-range dependency tracker (makeDependencyTracker):

{
  host: {
    // ...

    depend: (fiber: LiveFiber, root: number) => boolean,
    undepend: (fiber: LiveFiber, root: number) => void,
    traceDown: (fiber: LiveFiber) => LiveFiber[],
    traceUp: (fiber: LiveFiber) => number[],
  }
};

It holds two maps internally, each mapping fibers to fibers, for precedents and descendants respectively. These are mapped as LiveFiber -> id and id -> LiveFiber, once again following the one-way rule. i.e. It gives you real fibers if you traceDown, but only fiber IDs if you traceUp. The latter is only used for highlighting in the inspector.
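For illustration, a minimal sketch of such a tracker, assuming only that a fiber carries a numeric id; the real makeDependencyTracker differs in detail:

type LiveFiber = {id: number};                // placeholder: the real fiber is much richer

const makeDependencyTracker = () => {
  const precedents = new Map<LiveFiber, Set<number>>();    // fiber -> ids of the roots it depends on
  const descendants = new Map<number, Set<LiveFiber>>();   // root id -> fibers depending on it

  const depend = (fiber: LiveFiber, root: number): boolean => {
    let up = precedents.get(fiber);
    if (!up) precedents.set(fiber, up = new Set());
    if (up.has(root)) return false;                        // already tracked
    up.add(root);

    let down = descendants.get(root);
    if (!down) descendants.set(root, down = new Set());
    down.add(fiber);
    return true;                                           // newly added
  };

  const undepend = (fiber: LiveFiber, root: number) => {
    precedents.get(fiber)?.delete(root);
    descendants.get(root)?.delete(fiber);
  };

  // One-way rule: going down yields real fibers...
  const traceDown = (fiber: LiveFiber): LiveFiber[] =>
    Array.from(descendants.get(fiber.id) ?? []);

  // ...going up only yields ids (used for highlighting in the inspector).
  const traceUp = (fiber: LiveFiber): number[] =>
    Array.from(precedents.get(fiber) ?? []);

  return {depend, undepend, traceDown, traceUp};
};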

The depend and undepend methods are called by useContext and useCapture to set up a dependency this way. When a fiber is rendered (and did not memoize), bustFiberDeps(…) is called. This will invoke traceDown(…) and call host.visit(…) on each dependent fiber. It will also call bustFiberMemo(…) to bump their fiber.version (if present).

Yeets could be tracked the same way, but this is unnecessary because yeeted already references the root statically. It's a different kind of cache being busted too (yeeted.reduced) and you need to bust all intermediate reductions along the way. So there is a dedicated visitYeetRoot(…) and bustFiberYeet(…) instead.

Yeet cache busting

Yeets are actually quite tricky to manage because there are two directions of traversal here. A yeet must bust all the caches towards the root. Once those caches are busted, another yeet shouldn't traverse them again until filled back in. It stops when it encounters undefined. Second, when the root gathers up the reduced values from the other end, it should be able to safely accept any defined yeeted.reduced as being correctly cached, and stop as well.

The invariant to be maintained is that a trail of yeeted.reduced === undefined should always lead all the way back to the root. New fibers have an undefined reduction, and old fibers may be unmounted, so these operations also bust caches. But if there is no change in yeets, you don't need to reduce again. So visitYeetRoot is not actually called until and unless a new yeet is rendered or an old yeet is removed.
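To make the invariant concrete, here is a minimal sketch of the upward cache-busting walk, using hypothetical shapes; the real yeeted context carries more than this:

// Hypothetical shapes: assume each fiber's yeet context links to its parent's
// context and caches its reduction in `reduced`.
type YeetState<T> = {reduced?: T; parent?: YeetState<T>};

const bustFiberYeet = <T>(yeeted?: YeetState<T>) => {
  // Clear cached reductions towards the root. An undefined reduction means
  // everything above was already busted, so we can stop early.
  while (yeeted && yeeted.reduced !== undefined) {
    yeeted.reduced = undefined;
    yeeted = yeeted.parent;
  }
};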

Managing the lifecycle of this is simple, because there is only one place that triggers a re-reduction to fill it back in: the yeet root. Which is behind a data fence. It will always be called after the last cache has been busted, but before any other code that might need it. It's impossible to squeeze anything in between.

It took a while to learn to lean into this style of thinking. Cache invalidation becomes a lot easier when you can partition your program into "before cache" and "after cache". Compared to the earliest versions of Live, the how and why of busting caches is now all very sensible. You use immutable data, or you pass a mutable ref and a signal. It always works.

Resources

The useResource hook lets a user register a disposal function for later. useContext and useCapture also need to dispose of their dependency when unmounted. For this, there is a disposal tracker (makeDisposalTracker) which effectively acts as an onFiberDispose event listener:

{
  host: {
    // ...

    // Add/remove listener
    track: (fiber: LiveFiber, task: Task) => void,
    untrack: (fiber: LiveFiber, task: Task) => void,

    // Trigger listener
    dispose: (fiber: LiveFiber) => void,
  }
}

Disposal tasks are triggered by host.dispose(fiber), which is called by disposeFiber(fiber). The latter will also set fiber.bound to undefined so the fiber can no longer be called.

A useResource may change during a fiber's lifetime. Rather than repeatedly untrack/track a new disposal function each time, I store a persistent resource tag in the hook state. This holds a reference to the latest disposal function. Old resources are explicitly disposed of before new ones are created, ensuring there is no overlap.
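A minimal sketch of such a disposal tracker, assuming an opaque fiber type; the real makeDisposalTracker differs in detail:

type LiveFiber = object;                 // placeholder shape
type Task = () => void;

const makeDisposalTracker = () => {
  const tasks = new Map<LiveFiber, Set<Task>>();

  // Register a disposal task for a fiber (add listener).
  const track = (fiber: LiveFiber, task: Task) => {
    let list = tasks.get(fiber);
    if (!list) tasks.set(fiber, list = new Set());
    list.add(task);
  };

  // Remove a previously registered task (remove listener).
  const untrack = (fiber: LiveFiber, task: Task) => {
    tasks.get(fiber)?.delete(task);
  };

  // Run and forget all disposal tasks for a fiber (trigger listener).
  const dispose = (fiber: LiveFiber) => {
    const list = tasks.get(fiber);
    if (!list) return;
    tasks.delete(fiber);
    for (const task of list) task();
  };

  return {track, untrack, dispose};
};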

Stack Slicing

A React-like is a recursive tree evaluator. A naive implementation would use function recursion directly, using the native CPU stack. This is what Live 0.0.1 did. But the run-time has overhead, with its own function calls sandwiched in between (e.g. updateFiber, reconcileFiberCalls, flushMount). This creates towering stacks. It also cannot be time-sliced, because all the state is on the stack.

In React this is instead implemented with a flat work queue, so it only calls into one component at a time. A profiler shows it repeatedly calling performUnitOfWork, beginWork, completeWork in a clean, shallow trace.

React stack

Live could do the same with its fiber priority queue. But the rendering order is always just tree order. It's only interrupted and truncated by memoization. So the vast majority of the time you are adding a fiber to the front of the queue only to immediately pop it off again.

The queue is a linked list so it creates allocation overhead. This massively complicates what should just be a native function call.

Live stack

Live says "¿Por qué no los dos?" and instead has a stack slicing mechanism (makeStackSlicer). It will use the stack, but stop recursion after N levels, where N is a global knob that currently sits at 20. The left-overs are enqueued.

This way, mainly fibers pinged by state changes and long-range dependencies end up in the queue. This includes fenced continuations, which must always be called indirectly. If a fiber is in the queue, but ends up being rendered in a parent's recursion, it's immediately removed.

{
  host: {
    // ...

    depth: (depth: number) => void,
    slice: (depth: number) => boolean,
  },
}

When renderFibers gets a fiber from the queue, it calls host.depth(fiber.depth) to calibrate the slicer. Every time a mount is flushed, it will then call host.slice(mount.depth) to check if it should be sliced off. If so, it calls host.visit(…) to add it to the queue, but otherwise it just calls renderFiber / updateFiber directly. The exception is when there is a data fence, in which case the queue is always used.

Here too there is a strict mode, on by default, which ensures that once the stack has been sliced, no further sync evaluation can take place higher up the stack.
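Stripped of strict mode and the data-fence exception, the core of such a slicer could be as small as this sketch (the default of 20 mirrors the knob mentioned above; the real makeStackSlicer does more):

const makeStackSlicer = (limit: number = 20) => {
  let base = 0;

  // Calibrate to the depth of the fiber just popped off the queue.
  const depth = (d: number) => { base = d; };

  // True if recursing this deep should be cut off and enqueued instead.
  const slice = (d: number): boolean => d - base >= limit;

  return {depth, slice};
};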

One-phase commit

Time to rewind.

A Live app consists of a tree of such fiber objects, all exactly the same shape, just with different state and environments inside. It's rendered in a purely one-way data flow, with only a minor asterisk on that statement.

The host is the only thing coordinating, because it's the only thing that closes the cycle when state changes. This triggers an ongoing traversal, during which it only tells fibers which dependencies to ping when they render. Everything else emerges from the composition of components.

Hopefully you can appreciate that Live is not actually Cowboy React, but something else and deliberate. It has its own invariants it's enforcing, and its own guarantees you can rely on. Like React, it has a strict and a non-strict mode that is meaningfully different, though the strictness is not about it nagging you, but about how anally the run-time will reproduce your exact declared intent.

It does not offer any way to roll back partial state changes once made, unlike React. This idempotency model of rendering is good when you need to accommodate mutable references in a reactive system. Immediate mode APIs tend to use these, and Live is designed to be plugged in to those.

The nice thing about Live is that it's often meaningful to suspend a partially rendered sub-tree without rewinding it back to the old state, because its state doesn't represent anything directly, like HTML does. It's merely reduced into a value, and you can simply re-use the old value until it has unsuspended. There is no need to hold on to all the old state of the components that produced it. If the value being gathered is made of lambdas, you have your two phases: the commit consists of calling them once you have a full set.

In Use.GPU, you work with memory on the GPU, which you allocate once and then reuse by reference. The entire idea is that the view can re-render without re-rendering all components that produced it, the same way that a browser can re-render a page by animating only the CSS transforms. So I have to be all-in on mutability there, because updated transforms have to travel through the layout system without re-triggering it.

I also use immediate mode for the CPU-side interactions, because I've found it makes UI controllers 2-3x less complicated. One interesting aspect here is that the difference between capturing and bubbling events, i.e. outside-in or inside-out, is just before-fence and after-fence.

Live is also not a React alternative: it plays very nicely with React. You can nest one inside the other and barely notice. The Live inspector is written in React, because I needed it to work even if Live was broken. It can memoize effectively in React because Live is memoized. Therefore everything it shows you is live, including any state you open up.

The inspector is functionality-first so I throw purity out the window and just focus on UX and performance. It installs a host.__ping callback so it can receive fiber pings from the run-time whenever they re-render. The run-time calls this via pingFiber in the right spots. Individual fibers can make themselves inspectable by adding undocumented/private props to fiber.__inspect. There are some helper hooks to make this prettier but that's all. You can make any component inspector-highlightable by having it re-render itself when highlighted.

* * *

Writing this post was a fun endeavour, prompting me to reconsider some assumptions from early on. I also fixed a few things that just sounded bad when said out loud. You know how it is.

I removed some lingering unnecessary reverse fiber references. I was aware they weren't load bearing, but that's still different from not having them at all. The only one I haven't mentioned is the capture keys, which are a fiber so that they can be directly compared. In theory it only needs the id, path, depth, keys, and I could package those up separately, but it would just create extra objects, so the jury allows it.

Live can model programs shaped like a one-way data flow, and generates one-way data itself. There are some interesting correspondences here.

  • Live keeps state entirely in fiber objects, while fibers run entirely on fiber.state. A fiber object is just a fixed dictionary of properties, always the same shape, just like fiber.state is for a component's lifetime.
  • Children arrays without keys must be fixed-length and fixed-order (a fragment), but may have nulls. This is very similar to how no-hooks will skip over a missing spot in the fiber.state array and zero out the hook, so as to preserve hook order.
  • Live hot-swaps a global currentFiber pointer to switch fibers, and useYolo hot-swaps a fiber's own local state to switch hook scopes.
  • Memoizing a component can be implemented as a nested useMemo. Bumping the fiber version is really a bespoke setState which is resolved during next render.

The lines between fiber, fiber.state and fiber.mounts are actually pretty damn blurry.

A lot of mechanisms appear twice, once in a non-incremental form and once in an incremental form. Iteration turns into mounting, sequences turn into fences, and objects get chopped up into fine bits of cached state, either counted or with keys. The difference between hooks and a gather of unkeyed components gets muddy. It's about eagerness and dependency.

If Live is react-react, then a self-hosted live-live is hiding in there somewhere. Create a root fiber, give it empty state, off you go. Inlining would be a lot harder though, and you wouldn't be able to hand-roll fast paths as easily, which is always the problem in FP. For a JS implementation it would be very dumb, especially when you know that the JS VM already manages object prototypes incrementally, mounting one prop at a time.

I do like the sound of an incremental Lisp where everything is made out of flat state lists instead of endless pointer chasing. If it had the same structure as Live, it might only have one genuine linked list driving it all: the priority queue, which holds elements pointing to elements. The rest would be elements pointing to linear arrays, a data structure that silicon caches love. A data-oriented Lisp maybe? You could even call it an incremental GPU. Worth pondering.

What Live could really use is a WASM pony with better stack access and threading. But in the meantime, it already works.

The source code for the embedded examples can be found on GitLab.

If your browser can do WebGPU (desktop only for now), you can load up any of the Use.GPU examples and inspect them.

On the difficulty of classifying literature (and the opportunity to meet at the Imaginales)

The serendipity of my book hoard led me to read, back to back, two books between which I couldn't help seeing a strong similarity: "L'apothicaire" by Henri Lœvenbruck and "Hoc Est Corpus" by Stéphane Paccaud.

While one recounts the adventures of the very modern Andreas Saint-Loup in the Paris of Philip the Fair, the other takes us to the Jerusalem of Baldwin the Leper. Both are extremely well-documented historical novels, realistic and immersive, yet laced with a subtle dose of the fantastic. A fantastic that is such only through its style, and which could very well turn out to be a mere figment of the imagination.

In both cases the writing is perfectly mastered, erudite while remaining fluid and pleasant. Lœvenbruck delights in sprinkling in old-fashioned turns of phrase and archaic vocabulary, tossing out anachronistic sentences and replies full of humour. Paccaud, for his part, switches narrators at a rapid pace, going so far as to give the floor to damp-laden walls or the desert wind.

In short, I loved both the style and the story, and I warmly recommend both reads, even though the ending disappointed me slightly each time, killing any ambiguity of realism and making the fantastic inescapably explicit. I would have preferred to keep the doubt until the very end.

Incidentally, Henri Lœvenbruck, Stéphane Paccaud and I will all be in Épinal this weekend for the Imaginales. Don't hesitate to come say hello and have a chat; that is the very point of this kind of event. (Follow us on Mastodon to find us more easily.)

On classifying literature

If we had to classify them, these two books should clearly sit side by side on a library shelf: historical novels with fantastic elements. Incidentally, Lœvenbruck hammered the point home to me: "A story is not fantastic. It contains elements of the fantastic!" (approximate quote).

But here's the thing. Henri Lœvenbruck is known as a crime writer, so you will find "L'Apothicaire" in the crime section of your bookshop. As for "Hoc Est Corpus", it was published in the Ludomire collection at PVH éditions, a collection (in which I am myself published) specialising in "genre literature", namely SFFF, for "Science-Fiction Fantasy Fantastique".

What does it matter, you ask? Who cares about classification?

It matters a great deal!

Because, as I learned the hard way, the general readership does not want to hear about science fiction or the fantastic. The mere sight of the word on a cover drives away a huge number of readers who nevertheless read it regularly in the form of crime novels. Most general bookshops discreetly hide a few old, dusty Asimovs under a shelf and want nothing to do with modern science fiction. A few shops try to be the exception, such as "La boîte à livre" in Tours, which has a magnificent section, or the tea room/bookshop "Nicole Maruani", near Place d'Italie in Paris, which surprised me by featuring my book on its SF shelf (and which makes excellent brownies; go there and tell them I sent you!).

But Ploum, if the words "science fiction" are looked down upon, why not simply put your novel in the crime category? After all, Printeurs is clearly a thriller.

Because the niche of science fiction readers is just as watertight. They go to places like "La Dimension Fantastique", near the Gare du Nord in Paris. A magical place! My eyes sparkled as I browsed the shelves and listened to the bookseller's erudition.

Is SF doomed to stay confined to its niche? At La Dimension Fantastique, the bookseller confided that he hoped the genre would gain the recognition it deserves, and that he had seen things move in recent years.

For Bookynette, the hyperactive president of April and director of the children's library "À livr'ouvert", the fashionable genre is "Young Adult". And it's true: as soon as the protagonist is a teenager, the fantastic suddenly becomes acceptable (Harry Potter) and pure dystopian science fiction becomes trendy (Hunger Games).

In short, classification matters. To the point of deciding which bookshop you will end up in. Being a science fiction geek, I feel like that is what I write. But I am presumptuous enough to think that some of my texts go beyond SF, that they could speak to a wider audience and give them keys to understanding a world not so far removed from the science fiction of a few decades ago. Especially the dystopian kind. Only worse.

Science fiction does not speak, and has never spoken, about the future. It is an essential literary genre for understanding the present. Perhaps it sometimes has to disguise itself to break certain preconceptions?

Shall we meet at the PVH stand at the Imaginales to talk about all this?

An engineer and writer, I explore the impact of technology on people. Subscribe to my French writings by email or RSS. For my English writings, subscribe to the English-language newsletter or the full RSS feed. Your address is never shared and is deleted when you unsubscribe.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

May 15, 2023

Home delivery: a false good idea

This morning I received an email telling me that a delivery would be made to my home between 12:51 and 13:51.

So here I am, facing a fait accompli. I could perfectly well miss this delivery, which is not urgent. But it is easier for me to adjust my schedule today than to deal with a hypothetical redelivery or a delivery to some random pick-up point. For the record, I live in an entirely pedestrian town, yet the town's only Mondial Relay pick-up point is in a petrol station sitting between the two lanes of a busy boulevard, with no way to reach it on foot except by cutting through bushes, walking 200 m along this road built for cars, and then crossing it.

Checking my watch, I arrange to get home at 12:45. A delivery van is parked there, engine running. I call out to the driver. He looks at his watch and tells me he cannot hand me the parcel before 12:51, that he has to wait. Engine running.

Like many human inventions, home delivery seemed like a good idea. Because we had not considered the impacts.

We thought we would be able to consume comfortably from our sofa. We forgot that we are often away from home, for work or for pleasure. We forgot the service local shopkeepers used to provide, replaced in every domain by big-box retail in ever more distant locations.

We forgot that sometimes we do not want to be disturbed. Like that famous work-from-home day during lockdown when I received four couriers from three different companies in a single afternoon, all to deliver one single Amazon order in which I had tried to group all my purchases.

Food, clothes, books, sports gear. What you find in those shops, most of the time reachable only by car, is the bare minimum, the average model, the standard brands. For everything else? Order online. What am I saying: order on Amazon!

Amazon which, by the way, forces its suppliers not to sell cheaper elsewhere, but takes such a margin that producers, in order to be on Amazon, are forced to raise their prices… everywhere! Amazon which does not hesitate to copy a product that sells well, and which also forces producers to pay to appear in the search results.

In the end, couriers paid a pittance are forced to make a staggering number of deliveries per day while keeping to schedules down to the minute, forced to wait or to speed up according to the algorithms. All this while we are forced to stay at home to wait for the delivery, to watch through the window for the courier who leaves a slip claiming we were absent while we were standing right behind the door.

Finally, we open the box containing goods we have never seen, never tried, that we might not have bought had we not been seduced by the subtly lit photo, but that we keep anyway given how hard it is to return or exchange them, when doing so is not simply prohibitively expensive.

Goods overpriced so that producers can live with the margins of Amazon and the delivery companies. Goods that can no longer be found in shops.

Home delivery seemed like a good idea. It benefits some. But those some are neither the couriers, nor the shop assistants, nor even the customers.

An engineer and writer, I explore the impact of technology on people. Subscribe to my French writings by email or RSS. For my English writings, subscribe to the English-language newsletter or the full RSS feed. Your address is never shared and is deleted when you unsubscribe.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

May 14, 2023

About #Eurovision: loved Croatia and Finland, and I'm happy my fellow Belgian #Gustaph did (very) well. But my favorite by far this year was Spain's #BlancaPaloma: Flamenco-ish, intense, great voice, beautiful act, an uncompromising electro track that completely avoids the standard four-to-the-floor beat that is way too present in Liverpool. So much to like & discover, I've got it on repeat!

Source

May 12, 2023

Social networks are mental illnesses

Last night, I was woken by thundering techno. Startled by this sudden source of noise, I looked out my window and saw a car parked across from my house.

At the wheel, a young woman was filming herself shaking her head as if she were having a wild time, waving her free arm in every direction. Exactly three minutes after the noise nuisance began, she stopped the video, cut the music and went back to scrolling on her phone in silence, her neck bent.

I could feel nothing but a surge of pity for this young woman, alone in the middle of the night, shut in her car, who felt the need to let others know she was having fun, even if it meant waking the whole neighbourhood. The short duration of the episode made me suspect a TikTok video.

Going back to bed, I thought of the young mother my wife and I had spotted the week before.

We were on a path overlooking, by a few metres, a small beach bristling with rocks. A parapet separated the path from the drop. Standing on this parapet was a little girl of three or four, holding her mother's hand. The mother let go of her daughter and took out her camera to take a photo, while telling her not to move.

My father's heart stopped. I hesitated to act, but I very quickly realised that the slightest sudden movement on my part could trigger a catastrophe. That I was emotionally incapable of trying to reason with a mother capable of putting her child's life in danger for an Instagram photo.

I walked on, squeezing my eyes shut.

I used to think social networks were addictions, threats to our attention. But that is not all. I now think they are serious mental illnesses. That their users (of whom I am one, with Mastodon) should be seen as sick from the moment they change their behaviour for the sole purpose of making a post.

The main victims are teenagers and young adults. And far from helping them, the school system pushes them further down, with more and more teachers and schools using "apps" to appear to follow modern pedagogy and forcing their pupils to have a phone (not to mention the "computer science" classes that teach… Word and PowerPoint!).

There is no doubt that, within a few years, smartphones will be seen for the brain the way cigarettes are for the lungs. But we are in that period when a handful of experts (of whom I am one) shout themselves hoarse in the face of an industrial lobby and a crowd that "follows the trend to look cool", that is afraid of "not being part of the computer revolution".

When I see the ravages of cigarettes, even today, I can only be terrified for my children and the generations that follow us. Because those who are not affected have to live with the others. They are the exceptions. They have to justify not taking out their smartphone, not being connected, not wanting to interrupt themselves for a photo.

Perhaps it is time to see posting on social networks for what it really is: a pathetic and miserable act, a hope of existing in an artificial universe. A cry for help from a sick person.

Let's not kid ourselves: I am just as guilty as anyone else. But I promise, I'm working on it…

An engineer and writer, I explore the impact of technology on people. Subscribe to my French writings by email or RSS. For my English writings, subscribe to the English-language newsletter or the full RSS feed. Your address is never shared and is deleted when you unsubscribe.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

May 09, 2023

(This is a working document, regularly updated. Created on 2022-12-29.) The hardware is nice: the device feels sturdy and premium, the screen is bright and the controls work as expected. Great work, Retroid! Compared to a single system device, software configuration is certainly more complicated in emulation-land. Although Retroid provides more than the basics, if you are into game emulation, you know your setup will need some tweaking to get the best out of the hardware.

May 05, 2023

Sometimes it's important to know the size of a transaction, especially when you plan to migrate to an HA solution where, by default, transactions have a limited size to guarantee an optimal behavior of the cluster.

Today we will look at the different ways to get an idea of the size of transactions.

First we need to split transactions into two types:

  • those generating data (writes, like insert, delete and update: DML)
  • those only reading data (select: DQL)

To implement High Availability, only the first category is important.

Size of DML

To know the size of a DML transaction, the only possibility we have is to parse the binary log (or query the binlog event).

We need to check the binlog event from the binlog file and then calculate its size. To illustrate this, let’s try to find the transaction identified by a specific GTID: 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914

SQL > \pager grep 'Gtid\|COMMIT' ;
Pager has been set to 'grep 'Gtid\|COMMIT' ;'.
SQL > show BINLOG EVENTS in 'binlog.000064' ;
| binlog.000064 |     213 | Gtid           |         1 |         298 | SET @@SESSION.GTID_NEXT= '17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914' |
| binlog.000064 | 53904723 | Xid            |         1 |    53904754 | COMMIT /* xid=75 */                                                     |
SQL > \pager
Pager has been disabled.
SQL > select format_bytes(53904754-213);
+----------------------------+
| format_bytes(53904754-213) |
+----------------------------+
| 51.41 MiB                  |
+----------------------------+
1 row in set (0.0005 sec)

We can see that this transaction generated 51MB of binlog events.

This method can be complicated, especially when you need to parse multiple binlog files to find the desired transaction.

Fortunately, Performance_Schema can again make our life easier. Indeed, we can query the table binary_log_transaction_compression_stats to get information about the size of a transaction, even if we don't use binary log compression:

select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
       format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
       TRANSACTION_COUNTER 
  from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size      | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 51.38 MiB | 51.38 MiB  |                   1 |
+-----------+------------+---------------------+

The TRANSACTION_COUNTER column is very important: if it is bigger than 1, the values are an average.

So if you really need to know the exact size of one transaction, you need first to truncate that table before running your DML.

Let’s have a look at this example:

SQL> select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
       format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
       TRANSACTION_COUNTER 
  from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size      | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 17.13 MiB | 17.13 MiB  |                   6 |
+-----------+------------+---------------------+
1 row in set (0.0004 sec)

SQL > truncate table performance_schema.binary_log_transaction_compression_stats;
Query OK, 0 rows affected (0.0018 sec)

SQL > update sbtest1 set k=k+4;
Query OK, 132188 rows affected (1.3213 sec)

Rows matched: 132188  Changed: 132188  Warnings: 0

SQL > select format_bytes(UNCOMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) size,
       format_bytes(COMPRESSED_BYTES_COUNTER/TRANSACTION_COUNTER) compressed,
       TRANSACTION_COUNTER 
  from performance_schema.binary_log_transaction_compression_stats;
+-----------+------------+---------------------+
| size      | compressed | TRANSACTION_COUNTER |
+-----------+------------+---------------------+
| 51.38 MiB | 51.38 MiB  |                   1 |
+-----------+------------+---------------------+
1 row in set (0.0017 sec)

We also have the possibility to use a MySQL Shell plugin that lists all transaction sizes from a binary log:

 JS > check.showTrxSizeSort()
Transactions in binary log binlog.000064 orderer by size (limit 10):
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541926
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541925
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541921
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541916
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541915
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541918
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541917
51 mb - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914
257 bytes - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541924
257 bytes - 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541923

But how can I know the GTID of my transaction?

MySQL can return the GTID to the client if the client supports that information returned by the server. MySQL Shell supports that feature!

To enable it, we use session_track_gtids:

SQL > set session_track_gtids='OWN_GTID';
Query OK, 0 rows affected (0.0001 sec)

SQL > update sbtest1 set k=k+1;
Query OK, 132183 rows affected (5.6854 sec)

Rows matched: 132183  Changed: 132183  Warnings: 0
GTIDs: 17f6a975-e2b4-11ec-b714-c8cb9e32df8e:7541914

As you can see, MySQL Shell returned the GTID of the transaction (update using auto_commit).

Size of DQL

But can we also know the size of a SELECT?

To determine the size of a SELECT, we can calculate the bytes sent by the server to the client like this:

SQL > select variable_value 
      from performance_schema.status_by_thread 
       join performance_schema.threads using(thread_id) 
      where processlist_id=CONNECTION_ID() 
        and variable_name='Bytes_sent' into @before;

SQL > select * from sbtest1;

SQL > select format_bytes(variable_value - @before) query_size 
        from performance_schema.status_by_thread 
        join performance_schema.threads using(thread_id) 
       where processlist_id=CONNECTION_ID() 
         and variable_name='Bytes_sent' ;
+------------+
| query_size |
+------------+
| 26.08 MiB  |
+------------+
1 row in set (0.0010 sec)

Summary

As you can see, MySQL Server provides a lot of information via Performance_Schema and the binary logs. By parsing that information, you can retrieve the size of DML transactions and DQL queries.

Enjoy MySQL!

May 04, 2023

Last year, I began using Textual to develop HumBLE Explorer, a cross-platform, command-line and human-friendly Bluetooth Low Energy scanner. Textual caught my attention because it promised to be a rapid application development framework for Python terminal user interface (TUI) applications. Thanks to Textual, I was able to create an application like this, including scroll bars, switches and tables:

/images/humble-explorer-light.png

One of Textual's key features is that it's inspired by web development practices. This allows for a clear separation of design and code using CSS files (in the Textual CSS dialect), making the framework both developer-friendly and highly customizable. It also has reactive attributes, as well as a growing library of widgets. Additionally, the framework provides useful tools for debugging and live editing of CSS files during development.

If you want to have an idea about Textual's capabilities, install it from PyPI and run the demo:

pip install textual
python -m textual

In addition, one of Textual's developers maintains a list of Textual-based applications.

As a relatively new project (started in 2021), Textual still experiences occasional breaking changes in new releases. However, the developers are easily accessible for support on their Discord server and provide regular blog updates. Moreover, Textual has excellent documentation. I plan to use Textual again for new terminal user interfaces in Python.

For some more background, read my article Textual: a framework for terminal user interfaces on LWN.

When you connect to a server (or cluster) through a Layer 7 TCP proxy, also referred to as an application-level proxy (the highest level of the OSI model), the application doesn't connect directly to the back-end server(s). The proxy usually understands the protocol used and can make decisions or even modify the request.

The problem when using such a proxy (like HA Proxy, ProxySQL or MySQL Router) is that the server doesn't really know where the client is connecting from. The server sees the IP address of the proxy/router as the source IP of the client.

HA Proxy initially designed the Proxy Protocol, a simple protocol that allows a TCP connection to transport proxy-related information between the client, the proxy server and the destination server. The main purpose of the Proxy Protocol is to preserve the client's original IP address (along with some other metadata). See HA Proxy's manual.

The back-end application must also be aware of this protocol and understand it to be able to benefit from it.

However, there are potential security issues associated with the Proxy Protocol (like spoofing, information leakage, Denial of Service using malformed or very large Proxy Protocol headers, …).

For these reasons (and, as you know, security is very important for us at Oracle), MySQL doesn't support the Proxy Protocol.

Does this mean that if you use MySQL Router you have no way of knowing the client’s IP address?

Of course not: if you use a secure connection (SSL client), MySQL Router adds attributes to the handshake, and these attributes are available on the server. See Connection Attributes.

With this simple query, it is now possible to list the connections and originating IP addresses of clients when they connect via MySQL Router:

select program_name, last_statement, user,attr_value 'client ip'
  from performance_schema.session_account_connect_attrs 
  join sys.processlist on conn_id=processlist_id 
 where attr_name='_client_ip' ;
+--------------+----------------------------------+-------------------------+-----------+
| program_name | last_statement                   | user                    | client ip |
+--------------+----------------------------------+-------------------------+-----------+
| mysqlsh      | NULL                             | root@mysql-router-shell | 127.0.0.1 |
| mysql        | select @@version_comment limit 1 | root@mysql-router-shell | 10.0.0.76 |
+--------------+----------------------------------+-------------------------+-----------+

MySQL Server and MySQL Router don't support the Proxy Protocol, but they implement connection attributes, which provide even more information.

As usual, enjoy MySQL!

May 01, 2023


Do you have a great idea to improve Drupal's software or community? The Pitch-burg innovation contest provides you with the perfect opportunity to pitch your idea for a shot at receiving funding.

In the past week, we've raised an impressive $70,000 USD for this contest thanks to the generous support of the Drupal Association and our sponsors, including 1xInternet, Digital Polygon, ImageX, Palantir, Skilld, Zoocha, and Acquia. We greatly appreciate their contributions, which will help fuel innovation for Drupal.

While this is a fantastic start, we're not done yet! We still hope to raise more funds, so please don't hesitate to reach out to the Drupal Association or to contact me if you're interested in contributing.

The judges will award funding to the most exciting and impactful proposals that could make a difference in the Drupal community. Whether you have a new feature to improve Drupal's reach with ambitious site builders, or an innovative design to improve Drupal's user interface, we want to hear from you!

We're hoping that this funding update will inspire you to participate. Remember, video submissions only need to be 2 minutes and 30 seconds long. The submission deadline is May 25th, which gives you almost 4 weeks to prepare. The winners will be announced at DrupalCon Pittsburgh, but you can enter the contest even if you're not going to DrupalCon Pittsburgh. Everyone is invited to pitch! For detailed guidelines on how to participate, please visit the original announcement.

We're excited to see your submissions!

April 26, 2023

The Drupal logo superimposed on a photo of a building with the word 'public' engraved in it.

This week, Drupal was approved as a Digital Public Good (DPG) by the Digital Public Goods Alliance.

The economic concept of a public good is decades old. Public goods are "non-excludable" and "non-rivalrous". This means that once the good is provided, it is impossible to exclude anyone from using it (non-excludable). It also means that the consumption of the good by one individual does not diminish its availability for others (non-rivalrous). A classic example of a public good is a city park: once the park is built, it is hard to exclude anyone from using it, and the use of the park by one person does not diminish its availability for others.

Similarly, the Digital Public Goods Alliance defines DPGs as a type of resource that is accessible to all. However, their definition also goes further by explicitly calling out the importance of privacy, responsible design, and alignment with the United Nations Sustainable Development Goals.

Due to their non-excludable and non-rivalrous nature, public goods can provide big and long-lasting benefits to society as a whole, often beyond their direct use. A public park benefits society both directly and indirectly. It directly improves the physical and mental health of people who use the park. However, it also indirectly increases property values in the surrounding area. Public goods are important to recognize and invest in, because they provide direct benefits for everyone and also have significant spill-over effects on the wider community.

As I wrote in "Balancing Makers and Takers to Scale and Sustain Open Source", I've long believed that Open Source software projects are public goods. Not only should Open Source projects be recognized and maintained as public goods, Open Source can also learn a lot from decades of public good management.

In the case of Drupal, we're helping to make the Open Web better, safer, and more inclusive for everyone. This benefits not only our users, but also has a far-reaching impact on society that will last for decades to come.

Drupal's designation as a DPG is not only great recognition for Drupal, but it will hopefully bring several benefits to Drupal as well.

First, it will increase Drupal's visibility and credibility, as the DPG initiative is backed by the United Nations and recognized globally. This recognition could help attract more funding and support for Drupal.

Second, being recognized as a DPG will help increase adoption. Public sector, educational, and other social-impact organizations may be more willing to use a product or service if it is recognized as a DPG, as it is demonstrated to be aligned with the United Nations 2030 Sustainable Global Development Goals.

You can read more about the Digital Public Good designation in the Drupal Association's announcement. I'm really excited about this, and I hope you are too!

A special thank you to Daniel Cothran (JSI) for starting our application process and to Tim Lehnen (Drupal Association) for driving it to completion. We also appreciate the help of Anoop John, Tim Doyle, Von Eaton, Kristin Romaine, Rachel Norfolk, and everyone else who contributed.

April 24, 2023

For Theengs Gateway we regularly got bug reports that were difficult to debug, as they depend on the operating system, Python version, application configuration, and Bluetooth adapter. After some back and forth we got an idea of the user's environment and started to identify the issue (or not). Then I discovered that Textual has a convenient solution for this: a diagnose command that prints information about the Textual library and its environment to help diagnose problems.

I borrowed this code from Textual and adapted it for Theengs Gateway's usage. So now we simply ask the user to run this command and include its output in the issue description on GitHub. Theengs Gateway's diagnose module looks like this:

theengs_gateway/diagnose.py (Source)

import asyncio
import json
import os
import platform
import re
import sys

from importlib_metadata import PackageNotFoundError, version

_conf_path = os.path.expanduser("~") + "/theengsgw.conf"
_ADDR_RE = re.compile(r"^(([0-9A-F]{2}:){3})([0-9A-F]{2}:){2}[0-9A-F]{2}$")


def _anonymize_strings(fields, config) -> None:
    for field in fields:
        if field in config:
            config[field] = "***"


def _anonymize_address(address) -> str:
    addr_parts = _ADDR_RE.match(address)
    if addr_parts:
        return f"{addr_parts.group(1)}XX:XX:XX"
    else:
        return "INVALID ADDRESS"


def _anonymize_addresses(field, config) -> None:
    try:
        config[field] = [
            _anonymize_address(address) for address in config[field]
        ]
    except KeyError:
        pass


# This function is taken from Textual
def _section(title, values) -> None:
    """Print a collection of named values within a titled section.
    Args:
        title: The title for the section.
        values: The values to print out.
    """
    max_name = max(map(len, values.keys()))
    max_value = max(map(len, [str(value) for value in values.values()]))
    print(f"## {title}")
    print()
    print(f"| {'Name':{max_name}} | {'Value':{max_value}} |")
    print(f"|-{'-' * max_name}-|-{'-'*max_value}-|")
    for name, value in values.items():
        print(f"| {name:{max_name}} | {str(value):{max_value}} |")
    print()


def _versions() -> None:
    """Print useful version numbers."""
    try:
        packages = {
            "Theengs Gateway": version("TheengsGateway"),
            "Theengs Decoder": version("TheengsDecoder"),
            "Bleak": version("bleak"),
            "Bluetooth Clocks": version("bluetooth-clocks"),
            "Bluetooth Numbers": version("bluetooth-numbers"),
            "Paho MQTT": version("paho-mqtt"),
        }
    except PackageNotFoundError as e:
        print(f"Package {e.name} not found. Please install it with:")
        print()
        print(f"    pip install {e.name}")
        print()

    if sys.version_info[:2] >= (3, 9):
        try:
            packages["Bluetooth Adapters"] = version("bluetooth-adapters")
        except PackageNotFoundError as e:
            print(f"Package {e.name} not found. Please install it with:")
            print()
            print(f"    pip install {e.name}")
            print()

    _section("Package Versions", packages)


def _python() -> None:
    """Print information about Python."""
    _section(
        "Python",
        {
            "Version": platform.python_version(),
            "Implementation": platform.python_implementation(),
            "Compiler": platform.python_compiler(),
            "Executable": sys.executable,
        },
    )


def _os() -> None:
    os_parameters = {
        "System": platform.system(),
        "Release": platform.release(),
        "Version": platform.version(),
        "Machine type": platform.machine(),
    }
    if platform.system() == "Linux" and sys.version_info[:2] >= (3, 10):
        os_parameters["Distribution"] = platform.freedesktop_os_release()[
            "PRETTY_NAME"
        ]

    _section("Operating System", os_parameters)


def _config() -> None:
    print("## Configuration")
    print()
    try:
        with open(_conf_path, encoding="utf-8") as config_file:
            config = json.load(config_file)
            _anonymize_strings(["user", "pass"], config)
            _anonymize_addresses("time_sync", config)
        print("```")
        print(json.dumps(config, sort_keys=True, indent=4))
        print("```")
        print()
    except FileNotFoundError:
        print(f"Configuration file not found: {_conf_path}")
        print()


async def _adapters() -> None:
    if sys.version_info[:2] >= (3, 9):
        from bluetooth_adapters import get_adapters

        print("## Bluetooth adapters")
        print()
        bluetooth_adapters = get_adapters()
        await bluetooth_adapters.refresh()
        print(f"Default adapter: {bluetooth_adapters.default_adapter}")
        print()

        for adapter, properties in sorted(bluetooth_adapters.adapters.items()):
            properties["address"] = _anonymize_address(properties["address"])
            print("#", end="")
            _section(adapter, properties)


async def diagnostics():
    print("# Theengs Gateway Diagnostics")
    print()
    _versions()
    _python()
    _os()
    _config()
    await _adapters()


if __name__ == "__main__":
    asyncio.run(diagnostics())

When you run this module, it prints a level one Markdown title (# Theengs Gateway Diagnostics) and then calls several functions. Each of these functions prints a level two Markdown title and some diagnostic information.

First, it displays the version numbers of the Python package for Theengs Gateway and some of its dependencies. This helps us immediately identify outdated versions, and we can suggest an update. Next, it shows information about the Python platform and the operating system. These functions are all borrowed from Textual's diagnose module, including the _section helper function to print a collection of named values within a titled section.

Since many Theengs Gateway issues depend on the exact configuration used, I also added a section that displays the contents of the configuration file (a JSON file). However, this configuration file contains some information that shouldn't be shared publicly, such as a username and password for an MQTT broker, or Bluetooth addresses. I could remove these fields in the code, but then we wouldn't know if the bug might be a result of a configuration file lacking one of these fields. So I created a simple function to anonymize specific fields:

def _anonymize_strings(fields, config) -> None:
    for field in fields:
        if field in config:
            config[field] = "***"

Then I can call this function on the configuration to anonymize the user and pass fields:

_anonymize_strings(["user", "pass"], config)

For Bluetooth addresses, I created a similar function. I want to keep the first three bytes of an address, which can point to the device manufacturer and be helpful for debugging purposes. Using a regular expression, I extract these bytes and add XX:XX:XX. This function looks like this:

_ADDR_RE = re.compile(r"^(([0-9A-F]{2}:){3})([0-9A-F]{2}:){2}[0-9A-F]{2}$")


def _anonymize_address(address) -> str:
    addr_parts = _ADDR_RE.match(address)
    if addr_parts:
        return f"{addr_parts.group(1)}XX:XX:XX"
    else:
        return "INVALID ADDRESS"

In the last part of the diagnostic information, where I display the information of the computer's Bluetooth adapters, I can call this function to anonymize the adapter's Bluetooth address:

properties["address"] = _anonymize_address(properties["address"])

Running the python -m TheengsGateway.diagnose command shows output like this:

# Theengs Gateway Diagnostics

## Package Versions

| Name               | Value  |
|--------------------|--------|
| Theengs Gateway    | 3.0    |
| Theengs Decoder    | 1.4.0  |
| Bleak              | 0.20.0 |
| Bluetooth Clocks   | 0.1.0  |
| Bluetooth Numbers  | 1.1.0  |
| Paho MQTT          | 1.6.1  |
| Bluetooth Adapters | 0.15.3 |

## Python

| Name           | Value           |
|----------------|-----------------|
| Version        | 3.10.6          |
| Implementation | CPython         |
| Compiler       | GCC 11.3.0      |
| Executable     | /usr/bin/python |

## Operating System

| Name         | Value                                               |
|--------------|-----------------------------------------------------|
| System       | Linux                                               |
| Release      | 6.2.0-10005-tuxedo                                  |
| Version      | #5 SMP PREEMPT_DYNAMIC Wed Mar 22 12:42:40 UTC 2023 |
| Machine type | x86_64                                              |
| Distribution | Ubuntu 22.04.1 LTS                                  |

## Configuration

```
{
    "adapter": "hci0",
    "ble_scan_time": 1000,
    "ble_time_between_scans": 5,
    "discovery": 1,
    "discovery_device_name": "TheengsGateway",
    "discovery_filter": [
        "IBEACON",
        "GAEN",
        "MS-CDP"
    ],
    "discovery_topic": "homeassistant/sensor",
    "hass_discovery": 1,
    "host": "rhasspy",
    "log_level": "DEBUG",
    "lwt_topic": "home/TheengsGateway/LWT",
    "pass": "***",
    "port": 1883,
    "presence": 0,
    "presence_topic": "home/TheengsGateway/presence",
    "publish_advdata": 1,
    "publish_all": 1,
    "publish_topic": "home/TheengsGateway/BTtoMQTT",
    "scanning_mode": "active",
    "subscribe_topic": "home/+/BTtoMQTT/undecoded",
    "time_format": 1,
    "time_sync": [
        "58:2D:34:XX:XX:XX",
        "E7:2E:00:XX:XX:XX",
        "BC:C7:DA:XX:XX:XX",
        "10:76:36:XX:XX:XX"
    ],
    "user": "***"
}
```

## Bluetooth adapters

Default adapter: hci0

### hci0

| Name         | Value               |
|--------------|---------------------|
| address      | 9C:FC:E8:XX:XX:XX   |
| sw_version   | tux                 |
| hw_version   | usb:v1D6Bp0246d0540 |
| passive_scan | True                |
| manufacturer | Intel Corporate     |
| product      | 0029                |
| vendor_id    | 8087                |
| product_id   | 0029                |

### hci1

| Name         | Value                   |
|--------------|-------------------------|
| address      | 00:01:95:XX:XX:XX       |
| sw_version   | tux #2                  |
| hw_version   | usb:v1D6Bp0246d0540     |
| passive_scan | True                    |
| manufacturer | Sena Technologies, Inc. |
| product      | 0001                    |
| vendor_id    | 0a12                    |
| product_id   | 0001                    |

In the repository's issue template for bug reports, we ask for the output of this command. The user simply has to copy the output, which is already formatted in Markdown syntax. This displays titles, subtitles, and even tables cleanly, providing us with the necessary information:

[Screenshot: theengs-gateway-diagnose]

April 21, 2023

The latest MySQL release has been published on April 18th, 2023 (my eldest daughter’s birthday). This new version of MySQL brings a new service that I’m excited to play with: Performance Schema Server Telemetry Traces Service. MySQL 8.0.33 contains bug fixes and contributions from our great MySQL community.

I would like to thank all contributors on behalf of the entire Oracle MySQL team !

MySQL 8.0.33 contains patches from Mikael Ronström, Evgeniy Patlan, Dmitry Lenev, HC Duan, Marcelo Altmann, Facebook, Nico Pay, Dan McCombs, Yewei Xu, Niklas Keller, Mayank Mohindra and Alex Xing.

Let’s have a look at all these contributions:

MySQL NDB Cluster

  • #103814 – ClusterJ partition key scratch buffer size too small – Mikael Ronström

MySQL Server – DML

  • #105092 – AUTO_INCREMENT can be set to less than MAX + 1 and not forced to MAX + 1 – Dmitry Lenev (Percona)

MySQL Server: Compiling

  • #110216 – failed to compile in static mode – Alex Xing

C API

  • #108364 – Fix sha256_password_auth_client_nonblocking – Facebook

Security: Privileges

  • #109576 – Fix scramble algorithm docs – Niklas Keller

Optimizer

  • #109979 (security) – Dmitry Lenev (Percona)

Replication

  • #107366 – Invalid JSON value, Error_code: 3903 – HC Duan (Tencent)
  • #109154 – Fix incorrect suggested commands to start replica in error logs – Dan McCombs
  • #109485 – Previous_gtids miss gtid when binlog_order_commits off – Yewei Xu (Tencent)

InnoDB

  • #107854 (security) – Marcelo Altmann (Percona)
  • #109873 – no need to maintain err in buf_page_init_for_read – Alex Xing

MySQL Shell

  • #108861 – Fix typo in dba.upgradeMetadata() error message – Nico Pay
  • #109909 – Add antlr4 runtime to INSTALL doc – Evgeniy Patlan

MySQL Operator for K8s

  • #109746 – Add option to disable lookups for mysql-operator – Mayank Mohindra

If you have patches and you also want to be part of the MySQL contributors, it’s easy: you can send pull requests via MySQL’s GitHub repositories or send your patches on Bugs MySQL (signing the Oracle Contributor Agreement is required).

Thanks again to all our contributors !

April 20, 2023

I think a few of today's problems could be solved as follows:

A tweet or an opinion that gets posted will only be published the day after tomorrow. All media adopt this.

If someone wants to post an opinion or a post that must be published immediately, it costs 500 euros or more (a fifth or more of a monthly salary).

If someone posts an opinion or a post that only needs to be published the day after tomorrow, they can edit or delete it free of charge during that time. The price is one euro or less (a thousandth or a hundredth of a monthly salary).

April 19, 2023


In just a few months, the Drupal community will be gathering for DrupalCon Pittsburgh. I've started planning my opening keynote, and decided to focus it on innovation. Specifically, I want to discuss how we might increase innovation.

To keep things fresh and engaging, I've come up with an exciting idea: during the keynote, I'll dedicate a portion of it to an innovation contest called Pitch-burgh, loosely inspired by Shark Tank.

For those unfamiliar with Shark Tank, it's an American TV show where entrepreneurs pitch their business ideas to a panel of potential investors, known as "sharks". The entrepreneurs give short 2-3 minute presentations in the hopes of securing funding for their idea.

To participate in Pitch-burgh, members of the Drupal community are invited to submit a short video pitching their innovative idea, along with a request for funding. Anyone can participate; attending DrupalCon Pittsburgh is not a requirement.

At DrupalCon, I'll share the best ideas in my keynote and help determine the crowd's favorite. We'll also encourage digital agencies and end-users to contribute funds towards the most promising ideas.

After DrupalCon Pittsburgh, all submissions will be made public, and anyone is invited to contribute funding to the projects that excite them.

Just like how sharks help on TV, the Drupal Association (including myself) will help guide projects that receive investment to help make sure projects succeed and funds are used wisely.

In essence, this is an experiment to connect innovative and impactful ideas with funders. I hope that individuals and organizations in the Drupal community will get involved and fund the ideas they are most excited about.

The deadline for video submissions is May 25th. A panel of judges (see below) will review all submissions between May 25th and May 30th, and pick the best pitches to showcase in my keynote. As mentioned, the top pitches will be revealed during my keynote at DrupalCon Pittsburgh. However, we'll make sure to publish all submissions afterwards.

For more information, check out the official Pitch-Burgh innovation contest page on Drupal.org.

Pitch guidelines

To make it easy for anyone in the Drupal community to participate, submissions for Pitch-burgh should be in the form of a video, with a maximum length of 2 minutes and 30 seconds.

For inspiration on creating a compelling 2.5 minute pitch, we highly recommend reviewing some of the winning pitches on Shark Tank, which can be found on YouTube.

A compelling pitch tends to include the following elements:

  • Introduction – Start by introducing yourself or your team, and briefly explain why you're likely to be successful.
  • Executive summary – Provide a clear and concise overview of your idea, explaining what it is and why it's valuable for the Drupal community. This should be done up front in the pitch.
  • Pain point – Explain the pain point or need that your idea is addressing in more detail, and help us understand why it's important to solve this issue.
  • Solution – Describe how your idea addresses the pain point or need, and explain the innovative or impactful aspects of your approach.
  • Funding and resources – Clearly state the amount of funding or resources you're seeking and how you plan to use the funding. Make sure to specify your currency.
  • Deliverables – Identify the key deliverables that you will provide, such as a module, documentation, or other assets.

In addition to these structural guidelines, we would like to provide some additional guidelines:

  • Submissions can come from individuals, teams, or organizations, including digital agencies.
  • Attending DrupalCon Pittsburgh is not mandatory for submitting a proposal. Anyone can participate.
  • Ideas can either be completely novel, or improvements to Drupal Core or existing contributed projects.
  • We encourage the use of wireframes, mockups, prototypes, or examples to help bring your idea to life.
  • While there is no set limit on the amount of funding you can request, we recommend that the ask does not exceed $20,000. Requests that are too high may be less likely to secure the necessary funding.

Remember, the goal of your Pitch-burgh submission is to convince the judges and potential funders in the Drupal community that your idea is innovative, impactful, and worthy of support. Be creative and convincing.

Evaluation criteria

We're looking for innovative and impactful ideas that have the potential to make a difference in the Drupal community. Judges will be asked to select the top pitches based on the following two criteria:

  1. Potential impact – Judges will assess the potential impact of the idea on Drupal, and how it can contribute to the growth and improvement of Drupal or the Drupal community.
  2. Likelihood of success – Judges will assess the potential for the idea and team to succeed by considering various factors, including their qualifications and experience, the feasibility of the proposed concept, the realism of the funding requirements, etc.

Judges

I currently have secured three judges for Pitch-burgh:

  • Tim Doyle, CEO of the Drupal Association
  • Baddy Breidert, Chair of the Drupal Association Board
  • Dries Buytaert, the Project Lead and founder of Drupal

We're still looking for more judges and want to ensure that we have a diverse group of judges who can offer a variety of perspectives and feedback to participants.

Conclusion

I'm thrilled to see what innovative ideas the Drupal community will come up with and look forward to featuring them in my keynote at DrupalCon Pittsburgh. Above all, I hope to see more people get paid to develop impactful features for Drupal.

April 18, 2023

A sketch of a bird taking flight from its nest.

In 2019, Acquia acquired Mautic, the company behind the Open Source marketing automation project by the same name.

It's a little known fact that before Acquia acquired Mautic, it was incubated in Acquia's office. I was also an angel investor in Mautic, and an advisor to DB Hurley and Matt Johnston, Mautic's founding team. It's safe to say that I have a personal connection to Mautic, its incubation, growth, and future.

After Acquia acquired Mautic, we appointed Ruth Cheesley as Mautic's Project Lead, streamlined the release model for Mautic, and established a clear governance model. Thanks to these changes and Ruth's leadership, Mautic has grown a lot: Mautic now has 10 times as many contributors compared to three years ago!

Supporting Mautic's growth: the decision to make Mautic independent

And yet, as 2022 drew to a close, I began to realize that in order for Mautic to reach the next level, it needed more resources to grow. I asked myself questions like: How could we grow Mautic contributions tenfold in the next 3 years?

As we completed our planning and resource allocation for 2023, I recognized that Acquia was unable to provide sufficient support to Mautic as it continued to grow. Therefore, I believed it would be beneficial to detach Mautic from Acquia's supervision.

A few months ago, I approached Ruth and the Drupal Association with the idea to establish the Drupal Association as an umbrella organization for various Open Source projects with a shared goal of creating an Open Source Digital Experience Platform (Open Source DXP).

Specifically, the idea was to make Mautic the second project to join alongside Drupal and for the Drupal Association to provide financial and logistical support to Mautic in the form of organizing events, managing infrastructure, and more.

While exploring the possibility of transforming the Drupal Association into an umbrella organization for multiple Open Source DXP projects, Mautic ultimately preferred to become a stand-alone project.

Some of Mautic's larger contributors believed that independence from Drupal and the Drupal Association was in Mautic's best long-term interest. Key stakeholders in the Mautic community also offered increased financial support for Mautic to become a standalone project. Their enthusiasm and their financial support made a stand-alone project a viable alternative.

Today's news is that Acquia is supporting that direction. Acquia has agreed to spin out Mautic into becoming a standalone, independent Open Source project.

A familiar scenario: drawing from my experience with Drupal

Although the decision to spin out Mautic may seem counterintuitive to those who are not well-versed in Open Source, it feels natural to me. I strongly believe that Open Source projects should be independent and community-driven.

When Drupal's expansion began to pick up momentum, I recognized the need to support its growth and helped create the Drupal Association, an independent non-profit organization dedicated to advancing Drupal. When establishing the Drupal Association, I went to great lengths to ensure that Drupal remained independent and that we maintained a level playing field for everyone in the community. This was very important to me, not only when establishing the Drupal Association but also later when launching Acquia.

Mautic is in a comparable position to when I co-founded the Drupal Association, with a similar degree of enthusiasm and dedication from its community. And just as I strived to create a level playing field within Drupal, spinning out Mautic from Acquia will help level the playing field for everyone in the Mautic community.

As I reflect on Drupal's journey, it's clear that both the Drupal Association and Acquia have played critical roles in Drupal's success, and continue to do so today. However, it's the Drupal project's independence and level playing field that has attracted tens of thousands of individual and organizational contributors. Drupal's level playing field led to a large community, and ultimately the Drupal community is the real reason behind Drupal's success.

For that reason, I am thrilled for Ruth and the Mautic community as they take the next step forward. I am confident that a decade from now, we will recognize and appreciate the value of having an independent Mautic, supported by a larger community, in which Acquia is a key contributor.

Next steps on Mautic becoming an independent Open Source project

As a next step, Acquia will transfer the Mautic trademarks, domain names, and complete governance to the Open Source Collective on behalf of the standalone Mautic project. Mautic will update its governance structure to shift away from Acquia having unique benefits, such as appointing the Project Lead and holding trademarks and domain names.

To facilitate this transition, Ruth Cheesley, the current Mautic Project Lead, will move from being an Acquia employee to becoming employed by the Mautic project directly, through the Open Source Collective's employment facility. Additionally, Acquia is making an $80,000 USD donation to help support this transition financially.

Acquia depends on Mautic's success and remains committed to Mautic

Acquia Campaign Studio, our marketing automation product built on the Mautic platform, is the biggest commercial Mautic solution available on the market and one of Acquia's fastest-growing products. It will remain an important part of our product portfolio.

As a company with deep roots in Open Source, we understand that our long-term success is directly tied to the success of the Mautic project. Even after spinning out Mautic, Acquia remains firmly committed to Mautic's success. We will continue to contribute to Mautic, and as Campaign Studio continues to grow, we plan to increase our contributions.

We also have some exciting product announcements planned for later this year or early next year, which will further enhance our commercial Mautic offering and drive additional growth. I'll discuss those in a later blog post, so stay tuned for more updates!

Conclusion

I'm proud of Acquia's many contributions to Mautic. I'm also proud that Acquia allows Mautic to operate independently. Every Open Source community has its own unique growth story and journey to sustainability.

The decision to spin out Mautic may not be entirely risk-free, but we strongly believe that the potential benefits outweigh the risks. As a company with significant investment in Mautic, we are committed to its success and will continue to support it.

Similar to how young birds mature and leave their nests, projects and communities can also evolve and outgrow their origins. As someone who has been involved with Mautic from the early days, and supported it throughout its journey, I'm excited to see where it goes next.

April 17, 2023

The neighbours probably know it already, but I was recently busy spray-painting my car matte black. That job involves clearcoat, and sanding and polishing it. It ended up matte black partly because getting that clearcoat really right with spray cans was just a bit too difficult for me. In other words, I would first need a proper paint spray gun for that.

With that newly acquired skill and the sanding materials at hand, I thought: hmm. I still have an old pair of glasses with scratches lying around…

So I tried to get the scratch(es) out with the sandpaper I had bought for the car. First dry P600 for the scratch itself, after which the glasses look completely unusable, of course. Then the P600-sanded area with wet P1000, and then that with wet P3000. Then the whole lens wet with a P5000 sheet from 3M. And finally with the polish you would normally use to polish your car.

And sure enough: the scratch is gone and the lens is clear enough that my eyes don't notice it. It isn't perfect, because there is some local distortion (as expected). But the old glasses are "usable". Although the question is whether that distortion isn't just as bad as the scratch was.

ps. Only try this with an old pair of glasses. The lens will distort a bit afterwards, and you have to go pretty far in sanding your lens. If you panic and stop before working your way up through ever finer sandpaper, your lens will be unusable afterwards, completely ruined. It first gets much worse (opaque and dull) before it becomes good again after polishing. Just like with your car.

ps. If it is a real scratch, the online tricks like toothpaste really won't work. The principle is: either you sand the scratch away (which will cause distortion) and then keep sanding with ever finer abrasives until the whole thing shines again, or you fill the scratch with something (which I think is what the toothpaste trick does).

April 16, 2023

Dear Sir,

When I was younger, I was a vegetarian. I was one because there was insufficient legislation in Flanders for general animal welfare.

As I grew up, I became milder in all sorts of my opinions. But it was only at thirty that I started eating meat again: mainly because legislation arrived that assured me that meat-processing companies in Belgium would have to comply with certain regulations.

I consider it completely unnecessary, in this day and age, to make animals suffer in order to produce our meat.

If you nevertheless consider that necessary in your company, then I hope that other companies and other business leaders, who can organize this differently, get your opportunities and your resources. That you are driven from the market. So that I, as a consumer, no longer have to worry about people like you.

In other words, I am saying that if you cannot produce meat without making animals suffer, you are not welcome in our free market. That market is still free, just as our market is free even though we no longer trade slaves. In other words, I am saying that even if you then cannot make a profit, or cannot keep your company viable, you are still not welcome. That you must therefore leave.

Because if you are driven out (oh, the irony of language), room appears for business leaders who can run a company like yours without animal suffering.

Indeed, I also want no meat to enter the European Union that was produced by torturing animals, so that there is a level playing field. I hope that you and your company will lobby for that level playing field. In other words, by investing in good infrastructure that makes animal suffering unnecessary, you can gain a competitive advantage.

I also hope that your customers keep you without a contract until you can prove that there is no animal suffering in your company.

That is, unfortunately, the only way you will get to work on the necessary changes.

That is the only way I will become a customer of your products again.

Kind regards,
Philip

ps. Veal Good is the slogan of VanLommel.

April 15, 2023

It’s already the third and last day… There is always a strange atmosphere after the gala dinner, and people always join late. It’s also challenging to be the first speakers! Ronan Mouchoux and François Moerman presented «From Words to Intelligence: Leveraging the Cyber Operation Constraint Principle, Natural Language Understanding, and Association Rules for Cyber Threat Analysis». This very long title explains their research. All attacks are performed by humans: they have tools, objectives, and targets, but they adapt over time. From a defender’s point of view, there can be ambiguity in terms; two incident handlers can look at the same pieces of evidence and map them to different MITRE ATT&CK techniques. The idea behind Ronan & François’s research was to parse a lot of documents, extract «words» and, with the correlation of other sources, propose an analysis of the threat actors with «association rules». Example:

[Photo of a slide]

The second talk was «Boss, our data is in Russia – a case-based study of employee criminal liability for cyberattacks» by Olivier Beaudet-Labrecque & Lucas Brunoni. This talk was legal rather than technical, but very entertaining and interesting. They are from the Haute Ecole ARC in Switzerland, which was targeted by a cyberattack. The initial infection vector was a student “mistake”: he found a crack for a well-known application and… executed it. Trickbot was in place! A password to the school VPN was stolen. Question: what was the responsibility of this student? He signed an IT charter and violated it. From a legal point of view, could the student be seen as a “co-perpetrator”? What about the “intent”? They explained different situations and behaviors. For example, the student was studying computer science, so he should have been more aware of the risks of downloading such programs. In the second phase, they explained the legal risks associated with paying ransoms. Really great stuff!

Then we followed Matthieu Faou with «Asylum Ambuscade: Crimeware or cyberespionage?». The talk started with a review of classic news articles about a ransomware attack. The same article mentioned that the ransomware was implemented to “earn money” but, a bit further down, “to support Russia & Putin”. Attacks performed by this group start with a macro document abusing the Follina vulnerability to finally drop the Sunseed malware. This malware has many modules to handle cookies, screenshots, VNC connections, … An interesting one is “deletecookies”, which removes cookies for specific websites. This is helpful to force the user to re-authenticate and collect/intercept credentials.

After a welcome coffee break, Erwan Chevalier & Guillaume Couchard presented “When a botnet cries: detecting botnets infection chains”. The first botnet reviewed was Qakbot (1M+ victims from 2022/02 to 2023/02), used by many groups and dropped by many malware families (Emotet, SmokeLoader, …). The second was IcedID (20K+ victims). There are multiple ways to deploy the payload:

[Photo of a slide]

Detection rules based on Sigma were demonstrated (e.g., a scheduled task with a GUID as task name, used for persistence).
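
As a rough illustration of that heuristic (in Python rather than as the actual Sigma rule shown at the talk), a task name that is nothing but a GUID can be flagged with a simple check; the function name and sample values are made up:

```
import re

# A bare GUID, optionally wrapped in braces, e.g. {A1B2C3D4-...}.
_GUID_RE = re.compile(
    r"^\{?[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-"
    r"[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}?$"
)

def is_suspicious_task_name(task_name: str) -> bool:
    """Flag scheduled-task names that are nothing but a GUID."""
    return bool(_GUID_RE.match(task_name))

print(is_suspicious_task_name("{D2E8F1A0-1234-4BCD-9E8F-0123456789AB}"))  # True
print(is_suspicious_task_name("Adobe Acrobat Update Task"))               # False
```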

The next talk was “Tracking residential proxies (for fun and profit)” by Michal Praszmo & Paweł Srokosz. This talk was flagged as TLP:Amber.

After the lunch break, the next talk was again TLP:Amber: “Bohemian IcedID” by Josh Hopkins & Thibaut Seret.

With the next talk, Alexandre Côté Cyr & Mathieu Lavoie spoke about “Life on a Crooked RedLine: Analyzing the Infamous InfoStealer’s Backend”. But again… TLP:Green!

The last one was “The Plague of Advanced Bad Bots: Deconstructing the Malicious Bot Problem” by Yohann Sillam. What happened to Vinted, the well-known sales platform? It was targeted by a credential-stuffing attack. Bots are not always bad: some are legitimate, and their goal is to automate actions on the Internet (e.g., crawlers). So, what are bad bots? Examples:

  • OpenBullet is a credential-stuffing bot. 
  • AYCD is an account creation bot. 
  • NSB is a scalping bot
  • OneClick bot

These bots are available via marketplaces (easy rentals, Bot Broker, …) and underground forums. Automation is performed via the WebDriver protocol or CDP (Chrome DevTools Protocol). Bots won’t work without… proxies! They propose anti-captcha techniques: human-based, AI-based, or hybrid. They released a tool called bot-monitor.

As usual, Eric closed the event with some remarks and numbers and, most importantly, disclosed the location of the next event: we will meet in Nice in 2024!

The post Botconf 2023 Wrap-Up Day #3 appeared first on /dev/random.

April 13, 2023

And we are still in Strasbourg! The second day started with «From GhostNet to PseudoManuscrypt» by Jorge Rodriguez & Souhail Hammou. PseudoManuscrypt is a recent RAT spotted by Kaspersky in July 2021. It is widely distributed by fake applications, websites, and malware loaders. It’s a fork of Gh0st RAT, which is still relevant today and became open source in 2008. Written in C++, Gh0st RAT allows taking full control of the infected host and persists as a DLL. It has multiple features available via «managers» (shell, screen, video, audio, keyboard, …). Of course, once open-sourced, multiple forks arose… They collected 22 forks and analyzed them, for example Gh0stTimes or GamblingPuppet. PseudoManuscrypt is very active, and the botnet is growing as we speak. It is deployed via fake software. The infection path, persistence, and configuration were described in detail. The config contains the protocol (TCP/UDP), the ports, the primary C2, the fallback DGA seed, and the TLD. Of course, they reversed the DGA. The communication protocol relies on the HP-Socket C++ framework, using the KCP protocol for UDP with ARQ error controls (30-40% faster than TCP). Some plugins were reviewed (keylogger, proxy, or the stealer).

The second slot was assigned to Daniel Lunghi with «Iron Tiger Enhances its TTPs and Targets Linux and MacOS Users». They have used multiple infection vectors in the past and started using supply-chain attacks like the MiMi Chat app (see last year’s talk). This app is restricted to some countries (based on the mobile number prefix) and developed by a company in the Philippines. The desktop app uses the Electron framework, which has been modified (electron-main.js) to download malicious code, packed using Dean Edwards’ packer. It also targets macOS (downloading rshell). How did they infect Seektop? Not known for sure today. They stole credentials from a developer and then accessed the development environment. Malware toolkits used:

  • HyperBro: custom backdoor with multiple features
  • SysUpdate: same, backdoor with many features
  • Rshell: backdoor tool for macOS and Linux

The next talk was «Ransom Cartel trying not to “REvil” its Identity» by Jéremie Destuynder and Alexandre Matousek. This talk was labeled TLP:Amber.

After the morning coffee break, we returned with «Yara Studies: A Deep Dive into Scanning Performance» by Dominika Regéciová. She already talked about YARA last year at Botconf. This year, she’s back with more performance-related material. Indeed, YARA rules are easy to write, but they can lead to slow scan performance. Dominika’s test environment was based on YARA version 4.2.3 and 22GB of data.

First optimization: conditions are evaluated AFTER the strings definition part, so all files will be scanned even if you add a condition like “filesize < 1KB”. Remove the strings section and define everything in the condition:

uint8(0) == 0x42 …

Second optimization: trying to find PowerShell obfuscated with carets (p^o^w^e^r^…)

$re = /p\^?o\^?w\^?o…

This will give a warning. Try this:

$re1 = /p\^?o\^?…
$re2 = /po\^?…

Instead of:

$re = { 44 (03 | 2E ) 33 }

use:

$re1 = { 44 03 33 }
$re2 = { 44 2E 33 }
$re1 or $re2

Sometimes less is more: instead of “/.*\.exe/”, just use “.exe”. Also, try to follow the recommendations provided in warning messages. Dominika gave a lot of ideas to optimize your rules. You won’t get a 50% speed-up but, on big datasets, it could make the difference!
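
If you want to measure the effect of such changes yourself, a small benchmark around the yara-python bindings is enough; this is only a sketch, and the two rules and the sample directory below are illustrative, not taken from Dominika's slides:

```
# Minimal benchmark sketch using the yara-python bindings (pip install yara-python).
# The rules and the sample directory are illustrative, not from the talk.
import pathlib
import time

import yara

STRING_BASED = r"""
rule mz_string {
    strings:
        $mz = { 4D 5A }
    condition:
        $mz at 0 and filesize < 1KB
}
"""

CONDITION_ONLY = r"""
rule mz_condition {
    condition:
        filesize < 1KB and uint16(0) == 0x5A4D
}
"""

def scan_all(rules, directory):
    """Scan every file in the directory and return the elapsed time in seconds."""
    start = time.perf_counter()
    for path in pathlib.Path(directory).rglob("*"):
        if path.is_file():
            rules.match(str(path))
    return time.perf_counter() - start

for name, source in [("strings", STRING_BASED), ("condition", CONDITION_ONLY)]:
    elapsed = scan_all(yara.compile(source=source), "/tmp/samples")
    print(f"{name}: {elapsed:.2f}s")
```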

Then, Daniel Plohmann presented «MCRIT: The MinHash-based Code Relationship & Investigation Toolkit». Daniel is a regular contributor to Botconf. He presented his tool, MCRIT. The motivation was to detect code similarities across malware families, and that’s the tool’s goal: analyze code sharing and third-party library usage in malware. Looking for code similarities is not a new topic; a lot of papers have already been released. MCRIT combines quasi-identical and fuzzy code representations, with block- and function-level similarities, and uses hashmaps and LSH (Locality-Sensitive Hashing). Daniel explained how the tool works and performed some demos. Everything can be installed via Docker containers.
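
The MinHash idea at the core of MCRIT is easy to demonstrate; the toy sketch below estimates the Jaccard similarity of two sets of made-up instruction n-grams and is in no way MCRIT's actual implementation:

```
# Toy MinHash sketch to illustrate the idea behind MCRIT-style similarity
# estimation; this is not MCRIT's actual implementation.
import hashlib

def _hash(token, seed):
    """Deterministic per-seed hash of a token."""
    digest = hashlib.sha256(f"{seed}:{token}".encode()).hexdigest()
    return int(digest, 16)

def minhash_signature(tokens, num_hashes=64):
    """One minimum per hash function; similar sets share many minima."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]

def estimated_similarity(a, b, num_hashes=64):
    """Fraction of matching signature positions approximates the Jaccard index."""
    sig_a = minhash_signature(a, num_hashes)
    sig_b = minhash_signature(b, num_hashes)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / num_hashes

# Hypothetical instruction n-grams extracted from two functions.
func_a = {"push ebp", "mov ebp esp", "xor eax eax", "call sub_401000", "ret"}
func_b = {"push ebp", "mov ebp esp", "xor eax eax", "call sub_402000", "ret"}
print(estimated_similarity(func_a, func_b))  # close to their real Jaccard index (~0.67)
```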

After lunch, we had «Operation drIBAN: insight from modern banking frauds behind Ramnit», presented by Federico Valentine & Alessandro Strino. Banking trojans are a hot topic these days; web injects are used but less covered. The key feature of drIBAN is ATS («Automatic Transfer System»). Some stats: 20K€ average amount, 1,400+ bank accounts, 1,500+ infected customers. They focused on corporate bank accounts. The infection chain is:

Malspam campaign (PEC) > Stage 1: sLoad > Stage 2: Ramnit > Stage 3: drIBAN web-inject > Money laundering

PEC means «Posta Elettronica Certificata» (certified emails used in Italy). sLoad is a PowerShell-based trojan downloader using BITS for C2 communications. Ramnit emerged in 2010 and evolved into a modern banking trojan. Web-inject development is a 24×7 job! Some TAs used bank accounts for debugging purposes. They explained how Ramnit exfiltrates PDF invoices and replaces them with fake ones containing new banking details. They also covered how the web-inject works. Great analysis of the malware!

Then, Nick Day, Sunny Lu and Vickie Su presented «Catching the Big Phish: Earth Preta Targets Government, Educational, and Research Institutes Around the World». This attack started with an email. They discovered the TONESHELL malware and, as usual, performed a complete review of it. Note that the infection technique was based on DLL side-loading. It was interesting to learn that MQTT is used as a C2 protocol (MQsTTang malware).

The next talk topic was «The Case For Real-Time Detection of Data Exchange Over the DNS Protocol» by Yarin Ozery. DNS exfiltration is not new, but it is still used because it is effective. It is also easy to spot:

  • Size of DNS requests/responses
  • Length of hostnames
  • Entropy
  • Traffic behavior

Yarin explained the technique implemented by Akamai to detect DNS tunneling. To be honest, this was interesting research but way too complex to implement for regular organizations. Just keep an eye on the DNS traffic (size of TXT/A records) and try to detect the top talkers on the network.
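
As a rough sketch (the thresholds and example names below are made up, not Akamai's approach), a simple length-and-entropy check on queried hostnames could look like this in Python:

```
# Rough DNS-tunneling heuristic based on hostname length and label entropy.
# The thresholds and example names are illustrative only.
import math
from collections import Counter

def shannon_entropy(s):
    """Bits of entropy per character of the string."""
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_tunneling(hostname, max_len=60, max_entropy=4.0):
    """Flag very long hostnames or very random-looking leftmost labels."""
    label = hostname.split(".")[0]
    return len(hostname) > max_len or shannon_entropy(label) > max_entropy

for name in ["www.example.com",
             "dGhpcyBpcyBleGZpbHRyYXRlZCBkYXRh0a1b2c3d4e5f6a7b8c9d0e1f.evil.example"]:
    print(name, looks_like_tunneling(name))
```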

After the second coffee break of the day, Suweera De Souza presented «Tracking Bumblebee’s Development». What’s Bumblebee? It was related to CVE-2021-40444 (mshtml) in September 2021. It is delivered as a DLL and communicates with the C2 server via 3-letter commands like «sij» (Shellcode InJection) or «dex» (Download and EXecute). She explained how hooks are implemented to defeat EDR operations. C2 communications use WebSockets, and messages are JSON and RC4-encrypted.
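
Re-implementing RC4 takes only a few lines, which is handy when replaying such C2 messages in a lab; the key and message below are dummy values, not real Bumblebee traffic:

```
# Minimal RC4 implementation, useful when replaying C2 messages in a lab.
# The key and message here are dummy values, not actual Bumblebee traffic.
import json

def rc4(key, data):
    # Key-scheduling algorithm (KSA).
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): XOR the keystream with the data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

key = b"dummy-key"
ciphertext = rc4(key, json.dumps({"cmd": "dex", "arg": "payload.dll"}).encode())
print(json.loads(rc4(key, ciphertext)))  # RC4 is symmetric, so this round-trips.
```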

The next talk was again from Max ‘Libra’ Kersten: «A student’s guide to free and open-source enterprise-level malware analysis tooling». The goals and expectations were:

  • It must run on a Pi3B
  • 60 days retention period
  • Run locally 24/7
  • Respect the TLP

The pipeline: 

  • Don’t focus on manual analysis yet
  • Rely on the community and pattern matching
  • Understanding data does not require expensive hardware
  • Scale by outsourcing

Max shared some resources that help to get malware samples: MalShare, Malware Bazaar, Malpedia, and Triage. He also mentioned a public platform to run YARA rules: YARAify. For manual analysis, there are a lot of free tools: Ghidra, Cutter/Rizin, IDA Free, dnSpyEx, JADX, … Run the tools headless and get notified via Slack/Discord/…

The day finished with a set of lightning talks. Some of them were really interesting, but I won’t cover them because some of them were TLP:Amber.

(Picture credits go to @EternalToDo)

The post Botconf 2023 Wrap-Up Day #2 appeared first on /dev/random.

April 12, 2023

It has been a while since I posted my last wrap-up. With the COVID break, many conferences were canceled or postponed. But Botconf, one of my favorites, has been in my (busy) planning for a long time. This edition takes place in Strasbourg. I arrived yesterday afternoon to attend a workshop about YARA rules. Today was the first conference day. After a quick introduction by Eric Freyssinet came some information about this edition: it’s already the 10th edition, with 400 attendees from multiple continents, and I have attended all of them! Patrick Penninckx from the Council of Europe (one of the sponsors), Head of the Information Society Department, passed along a quick message: today, everything is digitalized and we need to protect it. There are also more and more links with human rights and the rule of law. Let’s have a quick look at the different talks scheduled today.

The first slot was “Perfect Smoke and Mirrors of Enemy: Following Lazarus group by tracking DeathNote campaign” by Seongsu Park. Who’s Lazarus? They have been a well-known threat actor since 2014, looking for financial profit, espionage, and data theft. Seongsu reviewed the different techniques they used to compromise their victims and how these evolved over the years. While the first TTPs were pretty simple, the complexity increased over time. More focus was given to the DeathNote campaign.

The next talk was “RAT as a Ransomware – An Hybrid Approach” by Nirmal Singh, Avinash Kumar and Niraj Schivtarkar. After an introduction to RATs (“Remote Access Tools”) and some popular ones like Remcos, they explained how some RATs have the capability to deploy ransomware on their victims’ computers.

Then, Larry Cashdollar and Allen West presented “A Dissection of the KmsdBot“. This bot has been written in Go. A good point is that its C2 communications are in clear text, which makes it much more convenient to reverse engineer the protocol used between the bots and the C2 server. Larry explained how they reversed the botnet and how they wrote a fake bot able to interact with the C2 to learn even more details.

After a good lunch (the food is always a blast at Botconf!), Mr Paul Vixie himself came on stage as the keynote speaker and presented “Security Implications of QUIC”. QUIC is a protocol initially developed at Google. The goal was to solve a key problem with existing protocols like HTTPS: end-to-end encryption. QUIC relies on UDP, and data sent or received is not seen by the kernel. This means that most solutions like EDR will become “blind” and useless. There are no more connect() or accept() system calls. In contrast to TCP, which is implemented in the kernel, QUIC is implemented in user land. Paul reviewed many facts about this push for more privacy, which will make our lives as defenders more difficult. DoH (“DNS over HTTPS”) is another good example. Most of our classic security devices or controls will be impacted (firewalls, load-balancers). Reverse engineering of malware will also be impacted: where do you put breakpoints to learn how the malware talks to its C2 server if classic API calls are not used?

The next slot was assigned to Alec Guertin and Lukas Siewierski: “You OTA Know: Combating Malicious Android System Updaters”. OTA means “Over The Air” and refers to updates. Android devices have a feature to get updates before being sold to users or at other times. They demonstrated how this technique can be (ab)used by attackers to deploy malware on devices even before they are delivered to their owners.

After the afternoon coffee break, we had two less technical talks but interesting ones: «Digital Threats Against Civil Society in the Rest of the World» by Martijn Grooten and “India’s Answer to the Botnet and Malware Ecosystems” by Pratiksha Ashok. Both focussed on the protection of regular users on the Internet and how communication can be organized to share useful information. Here is the site developed by the Indian government: https://www.csk.gov.in/alerts.html.

I expected a lot from the next talk: “Syslogk Linux Kernel Rootkit – Executing Bots via Magic Packets” by David Alvarez Pérez. This is a crazy idea: just by sending “magic packets” to the compromised host, the attacker is able to take action. David reviewed the different components of the rootkit, how it works, which system calls are hooked by the malware, and its capabilities. Because NetFilter functions are hooked, magic packets will not be intercepted by the local firewall in place. Really cool! Note that the rootkit has multiple components: one running in kernel mode, and one running (not all the time) in userland. It can also emulate other layer-7 protocols (SMTP, HTTP, …) and act as a proxy to forward magic packets to another host.

The next talk covered “RTM Locker” and was presented by Max “Libra” Kersten, a regular speaker at Botconf. Max explained the ransomware’s features in detail and how it encrypts the victim’s files. He reversed the malware and showed all the features based on pieces of code. Awesome job!

Finally, the day ended with “The Fodchat Botnet We Watched” by Lingming Tu. Interesting research about this botnet, especially because it was performed by Netlab 360. The botnet looked like a regular one, but the fact that it was analyzed from a Chinese point of view was interesting, even though the speaker was difficult to understand.

The day ended with some pizza and local “flammekueche” (a specialty from the region of Alsace). See you tomorrow for day 2!

The post Botconf 2023 Wrap-Up Day #1 appeared first on /dev/random.

Paulus Schoutsen, founder of the open-source home-automation project Home Assistant, declared 2023 as "the year of voice" for the popular platform. The goal of the initiative is to enable users to control their homes through offline voice commands in their own language.

Voice control is a complex and computationally intensive task, which is usually delegated to the cloud. Companies like Google, Amazon and Apple make us believe that we need their cloud-based services to be able to use voice control. Of course, this comes with downsides: users don't have any control over what happens with their voice recordings, posing a significant privacy risk. But, fundamentally, the problem lies even deeper. It just makes no sense for users to have their voices make a long detour through the internet just to turn on a light in the same room.

In the past, projects like Snips and Mycroft attempted offline voice control but faced business challenges. Rhasspy, an independent open-source voice-assistant project that has been active for a few years now, was quite successful among the niche crowd of tinkerers and those who built their own voice assistants around the flexible services the project offered. [1] However, the core of Rhasspy was mainly developed by one person, and the project wasn't backed financially.

Last month, I wrote an article for LWN.net about these three projects: Hopes and promises for open-source voice assistants. I expressed the hope that Rhasspy would finally give us the ability to control our homes with a user-friendly voice assistant that is both privacy respecting and made from open-source software. Rhasspy's developer, Michael Hansen, has been hired by Nabu Casa, the company behind Home Assistant, and they're tightly integrating Rhasspy into their home-automation software.

In the meantime, OpenVoiceOS, a community that forked Mycroft, has published a FAQ about the future of Mycroft. I already alluded to Mycroft's revival in my article, but the plans were still vague at the time. By now, it looks like Mycroft has a real chance to live on in OpenVoiceOS.

Overall, these are exciting times for open-source voice control.

[1] The voice control chapter of my book Control Your Home with Raspberry Pi was based on Rhasspy.

April 07, 2023

Trolls & Légendes on Saturday, April 8, and other dates…

On Saturday, April 8 (tomorrow, that is), I will be in Mons at the Trolls & Légendes festival. If you are in the area, stop by the PVH/PVH Labs booth for a chat. I don't have the exact details yet; I will post the practical information live on Mastodon.

A recurring question I am regularly asked is how many followers I have. It often comes from journalists, and they are disappointed when I tell them I don't have the answer.

Ten years ago, I announced that I was removing every statistics tool from my blog, both for my readers' privacy and for my own mental health. I went on to encourage my readers to follow me via RSS, a technology into which I have no visibility. Sourcehut also offers the possibility of receiving my posts by email without me being able to see the list of subscribers or even their number (but the content is plain text without layout, unlike the classic newsletter, which remains the recommended option for most readers).

Apart from Mastodon, I therefore have no "counter". I know how addictive, completely misleading, and even harmful this kind of metric is. With my previous newsletter software, I could see the number of unsubscribes following each post, which had the effect of scolding me for having published.

To replace the counter, I have discovered a magical, magnificent metric: when I take part in a conference, a festival or a book signing, I now get the chance to meet readers. People who have been reading me, sometimes for more than a decade. People who can tell me about an old post I had forgotten, ask for news of my baker, or even recommend a comic book or a novel that I should really like (I love that). Faces that sometimes put light on names I have read on the web, on Gemini, or in my email inbox. Human beings, in short! (well, most of them)

These encounters are short, intense, and stay in my head. They make me happy; they nourish me. So, if you are somewhere I happen to be hanging around, don't hesitate to come and say hello; it makes my day. And, unlike a Henri Loevenbruck besieged by hordes of fans, with Ploum you won't have to stand in line.

So see you this Saturday, April 8, in Mons at Trolls & Légendes. On Tuesday, April 25, I will give a talk for the Electrokot in Louvain-la-Neuve (Montesquieu 1 auditorium, 8 pm). And I can already tell you that I will be in Épinal for the Imaginales festival around May 25 to 28.

We will surely find an opportunity to cross paths!

An engineer and writer, I explore the impact of technology on people. Subscribe to my writings in French by email or RSS. For my writings in English, subscribe to the English newsletter or to the full RSS feed. Your address is never shared and is deleted when you unsubscribe.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

April 04, 2023

Tux with pi's

I use the lightweight Kubernetes distribution K3s on a 3-node Raspberry Pi 4 cluster. I wrote a few blog posts on how the Raspberry Pis are installed.

I run K3s on virtual machines.

Why virtual machines?

Virtual machines make it easier to redeploy or to bring a system down and up if you want to test something.

Another reason is that I also run FreeBSD virtual machines on the Raspberry Pis.

I use Debian GNU/Linux as the Operating system with KVM/libvirt as the hypervisor.

I use Ansible to set up the cluster in an automated way. I finally got the time to clean up the code a bit and release it on GitHub: https://github.com/stafwag/ansible-k3s-on-vms

The code can also - and will by default - be used on x86 systems.

The playbook is a wrapper around roles that:

  • set up the virtual machines;
  • install and configure K3s on the virtual machines;
  • enable libvirt on the vm_kvm_host.

The sample inventory will install the virtual machines on localhost. It’s possible to install the virtual machines on multiple libvirt/KVM hypervisors.

This should enable you to set up the virtual machines with K3s in … 5 minutes (*)

(*) if everything goes well the first time :-)

Have fun

A hero image featuring the logos of two Drupal initiatives: Automatic Updates and Project Browser.

Drupal's modularity allows developers to combine and reuse modules, themes, and libraries to create custom solutions. This modularity is one of the key ingredients that makes Drupal a composable platform. The original motivation behind Drupal's modularity was to accelerate the pace of innovation and democratize the experience of site building.

This blog post has two main goals.

First, we'll explore how Drupal's composability is evolving to empower ambitious site builders with modern, no-code development practices. Through exciting initiatives like Automatic Updates and Project Browser, Drupal will simplify the task of installing, composing, and updating Drupal sites, all within the Drupal user interface.

Second, we'll provide a retrospective on the past 10+ years of decisions that have led to significant changes in how end-users install, extend, develop, and maintain Drupal sites. By delving into Drupal's innovation process through a timeline approach, we'll showcase key contributors, significant milestones, and pivotal shifts in thinking that have influenced Drupal's approach to composability.

Let's start!

2011

Drupal 7 was released, and introduced the "Update Manager". Derek Wright (3281d Consulting), Jacob Singh (Acquia), and Joshua Rogers (Acquia) had begun developing the Update Manager feature starting in 2009.

The Update Manager can be considered Drupal's first no-code update system. This feature introduced the ability for users to easily download and upload modules from Drupal.org to a Drupal site.

Under the hood, the Update Manager uses either the File Transfer Protocol (FTP) or Secure Shell Protocol (SSH). An end user can upload a module to their Drupal site through a form, and Drupal will FTP or SSH the module to the web server's file system.

Interestingly, fifteen years after its development started, Drupal 10 still uses the same basic Update Manager. However, this is about to change.

The Update Manager has several drawbacks: modules can conflict with each other, updates are applied directly to your live site, and if something goes wrong, there is no way to recover.

Two men on stage, standing behind a laptop on a pedestal, typing Git commands as they are displayed on a large screen behind them. Sam Boyer and me creating the Drupal 8 branch on stage at DrupalCon Chicago.

In March 2011, we started working on Drupal 8, and later that year, in August, we agreed to adopt components from the Symfony project. This decision was made to help reduce the amount of code we had to build and maintain ourselves.

2012

The Symfony project was using Composer. Composer is a PHP package management system similar to npm. With Composer, developers can define the dependencies required by their PHP application in a file called composer.json. Then, Composer will automatically download and install the required components and their dependencies.

At first, we added Symfony components directly to Drupal Core's Git repository. Core Committers would regularly run composer update and commit their updated code to Drupal Core's Git repository. This left the end user experience relatively unchanged.

Some people in the Drupal community had concerns with storing third-party dependencies in Drupal Core's Git repository. To address this, we moved the Symfony components out of Git, and required Drupal's end users to download and install third-party components themselves. To do so, end users need to run composer install on the command line.

This approach is still used today. Drupal Core Committers maintain composer.json and composer.lock in Git to specify the components that need to be installed, and end users run composer install to download and install the specified components on their system.

Looking back, it is easy to see how embracing both Symfony components and Composer was a defining moment for Drupal. It made Drupal more powerful, more flexible, and more modular. It also helped us focus. But as will become clear in the remainder of this blog post, it also changed how end users install and manage Drupal. While it brought benefits, there were also drawbacks: it increased the maintenance, integration, and testing workload for end users. For many, it made Drupal more complex and challenging to maintain.

2013

We decided that Drupal Core would adopt semantic versioning. This marked a massive shift in Drupal's innovation model, moving away from long and unpredictable release cycles that broke backward compatibility between major releases.

To understand why this decision was important for Automatic Updates and Project Browser – and Drupal's composability more broadly – it's worth discussing semantic versioning some more.

Semantic versioning is a widely-used versioning system for software that follows a standard format. The format is X.Y.Z, where X represents the major version, Y the minor version, and Z the patch version.

When a new version is released, semantic versioning requires that the version number is updated in a predictable way. Specifically, you increment Z when a release only introduces backward compatible bug fixes. If new features are added in a backward compatible manner, you increment Y. And you increment X when you introduce changes that break the existing APIs.

This versioning system makes it easy to know when an automatic update is safe. For example, if a Drupal site is running version 10.0.2 and a security update is released as version 10.0.3, it's safe to automatically update to version 10.0.3. But if a major release is made as version 11.0.0, the site owner will need to manually update, as it likely contains changes that aren't compatible with their current version. In other words, the introduction of semantic versioning laid the groundwork for safe, easy Drupal updates.
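
To make that decision concrete, here is a small sketch (not Drupal's actual update logic) of how a client could compare semantic version numbers to decide whether an update is safe to apply automatically:

```
# A toy "is this update safe to apply automatically?" check based on
# semantic versioning. This is a sketch, not Drupal's actual update logic:
# it only treats pure patch-level bumps as automatic, which is conservative.

def parse(version):
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def safe_to_auto_update(current, candidate):
    cur_major, cur_minor, cur_patch = parse(current)
    new_major, new_minor, new_patch = parse(candidate)
    return (new_major, new_minor) == (cur_major, cur_minor) and new_patch > cur_patch

print(safe_to_auto_update("10.0.2", "10.0.3"))  # True: patch-level security fix
print(safe_to_auto_update("10.0.2", "11.0.0"))  # False: major release, manual update
```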

2015

Drupal 8 was released. It came with big changes on all the fronts mentioned above: a shift towards object-oriented programming, support for Composer, the introduction of Symfony components, semantic versioning, and an unwavering commitment to simplifying upgrades for users.

Unfortunately, the reaction to Composer was mixed. Many Drupal contributors greatly appreciated the introduction of Composer, as it made it easier to share and utilize code with others. On the other hand, site owners often found it difficult to use Composer. Composer necessitates using the command line, something typically used by more advanced technical users. Moreover, unexpected failures during a Composer update can be complex to resolve for both developers and non-developers alike.

2016

The Drupal Association's engineering team, together with members of the community, launched the "Composer Façade". This meant that all Drupal.org hosted projects automatically became available as packages that could be installed by Composer.

There was some behind-the-scenes magic going on to help the Drupal community transition to Composer. For example, Drupal.org extensions were available to Composer even though they were not using semantic versioning.

Over the coming months and years, additional features would be added to Composer Façade, including solutions to help manage compatibility issues, sub-modules, and namespace collisions.

2017

Because Drupal has users with different levels of technical sophistication and different technical environments, we supported multiple distribution methods: zip files, tarballs, and Composer.

In the end, we were living in an increasingly Composer-centric world and updates via zip files or tarballs became less and less viable. So we agreed that we had to take a difficult path by fully embracing Composer. We began a long-running effort to make Composer easier for Drupal end users.

For example, the Drupal Association engineering team started building zip files and tarballs with Composer support: you could start with a zip/tar file, and then continue updating your site using Composer.

Separately, we also introduced new ways to install Drupal Core via Composer, such as using a new drupal/core-recommended project template. This template specifies the exact dependencies used to test a particular version of Drupal Core. Drupal Core is only released when all tests pass, so using drupal/core-recommended helps to prevent any problems caused by using different versions of the dependencies.

A timeline that shows the progression from manual updates to automatic updates, with Drush updates, Update Manager and Composer as key milestones. A slide from my keynote at DrupalCon Vienna 2017 where I introduced the Automatic Updates initiative. My speaker notes read: "Maybe Composer can be used under the hood to develop an automatic updates feature?".

Lastly, in my DrupalCon Vienna keynote, I declared the need for automatic updates, and made it a top priority for the Drupal community based on community surveys and interviews. This led to the formation of the Automatic Updates Initiative. The basic idea was to make updating Drupal sites easier by making Composer invisible to most users, thus empowering more people, regardless of their technical expertise.

2018

From 2017 into 2018, David Strauss (Pantheon) and Peter Wolanin (SciShield) took the lead on planning out the Automatic Updates Initiative, and presented possible architectural approaches at DrupalCon Nashville.

Their approaches drew inspiration from a multitude of Open Source projects like CoreOS, Fedora Atomic/Silverblue, and systemd. Some of the ideas outlined in their presentation have since been implemented. This is the beauty of Open Source; you can stand on the shoulders of other Open Source projects.

In 2018, the Drupal Security Team and Drupal Core release managers also extended the security coverage of Drupal minor releases from six to twelve months. This enabled site owners to update Drupal on their own schedule, but also introduced "security updates only" branches which will make automatic updates safer. This work was implemented with help from Ted Bowman, Emilie Nouveau, xjm, and Neil Drumm, with sponsorship from Acquia and the Drupal Association.

Later that year, at the Midwest Developer Summit organized by Michael Hess (University of Michigan), the new initiative team (composed of members of the Drupal Security Team, Drupal Association staff, and other interested contributors) defined a full initiative roadmap and began development. Key contributors were Angela Byron, David Strauss, Michael Hess, Mike Baynton, Neil Drumm, Peter Wolanin, Ryan Aslett, Tim Lehnen and xjm (sponsored by Acquia, Pantheon, the Drupal Association, SciShield, and the Universities of Michigan and Minnesota).

This work continued at Drupal Europe in Darmstadt, when the Automatic Updates Initiative team met with contributors from the Composer Initiative to compare needs and goals.

2019

In 2019, with sponsorship from the European Commission (EC), the Drupal Association contracted additional developers to build the first iteration of the Automatic Updates concept.

On the server-side, the funding from the EC resulted in all packages hosted on Drupal.org being signed with PHP Signify. PHP Signify is a PHP implementation of OpenBSD's Signify. PHP Signify assists in verifying the authenticity of Drupal modules, safeguarding against malicious forgeries. Additionally, Drupal extended OpenBSD's Signify to support chained signatures (CSIG) for better key rotation and maintenance.

On the client-side, the funding resulted in a contributed module for Drupal 7. Due to the European Commission's exclusive use of Drupal 7 at the time, a Drupal 8 module was out of scope. The Drupal 7 module updates tar-based installations of Drupal, as Composer wasn't introduced until Drupal 8.

In my DrupalCon Amsterdam keynote in late 2019, I provided an update on the Automatic Updates initiative with the assistance of Tim Lehnen from the Drupal Association:

2020

Up until 2020, contributed modules used version numbers like 8.x-2.1. This example meant the module was compatible with Drupal 8, and that it was major version 2 with patch level 1. In other words, we supported major and patch level releases, but no minor releases.

We finally updated Drupal.org to enable semantic versioning for contributed modules, which brought them up to date with best practices, including the ability to have minor releases, which was consistent with Drupal Core.

Composer Façade continued to support modules that had not adopted semantic versioning.

A slide from my keynote at DrupalCon Global 2020 where I gave an update on the Automatic Updates initiative. The slide shows the four major architectural building blocks of the initiative: integrity checks for Drupal Core, Composer 2, package signing, and a custom bootloader.

Meanwhile, work on Automatic Updates continued. Because we had already embraced Composer, it seemed obvious that we would use Composer under the hood to power Automatic Updates. However, there was one feature we identified as missing from the existing Composer/Packagist ecosystem: package signing.

Composer's main security measure is at the transport layer: communication between the client (Drupal) and the package repository (drupal.org, packagist.org, github.com) is protected by https (TLS). However, we didn't believe that to be sufficient for an automatic update system.

Early in 2020, at a CMS security conference sponsored by Google, David Strauss proposed that Drupal implement The Update Framework (TUF), which would resolve several architectural issues with PHP Signify and also provide a specification to mitigate numerous kinds of supply chain attacks that we had not considered previously.

To start off this project, developers from the Drupal community met with leaders of TYPO3 (Benni Mack, Oliver Hader) and Joomla! (David Jardin, Tobias Zuluaf) to ensure this implementation of TUF would be beneficial not only to Drupal, but to the broader PHP ecosystem, especially to other Composer-driven projects.

With guidance from Trishank Karthik Kuppusamy (Datadog) and Joshua Lock (Python TUF), Ted Bowman, Adam Globus-Hoenich, xjm, David Strauss, David Stoline, and others developed PHP-TUF, with sponsorship from Acquia, Pantheon, and DDEV. PHP-TUF handles the client-side part of TUF that will run as part of every Drupal site.

At this time, the Drupal Association also began working on the server-side of the TUF implementation so that Drupal.org would be able to sign packages.

In addition to securing the update process with TUF, we also needed to figure out how to apply updates to a live site with minimal interruption. David Strauss, Mike Baynton, and Lucas Hedding (sponsored by Pantheon, Tag1, and MTech, respectively) had previously prototyped a blue-green deployment approach similar to the one used by CoreOS.

We decided that the required changes to support this would be too disruptive to Drupal, so we pivoted to a new approach proposed by David Strauss: to perform updates in a temporary copy of the site's codebase and then copy the changes to the live codebase as the final step.

While not perfectly atomic in the way that a blue-green deployment would have been, the key advantage to this approach is that it didn't require any changes to Drupal Core's file structure, which meant that it could also be easily adopted by other PHP projects. Travis Carden (Acquia) began implementing this approach as the Composer Stager library.

2021

A design proposal for Automatic Updates. There are updates available for different modules. You can upgrade them immediately using the user interface, or you can let the scheduler run to do it for you.

The second iteration of the Automatic Updates module was released as a beta. Unlike the first iteration sponsored by the European Commission, this version worked for Composer-based projects by leveraging the newly created Composer Stager library.

I had also gone on a virtual listening tour around the same time, and when I asked people why they fell in love with Drupal, the most common response had to do with the empowerment they felt from Drupal's no-code/low-code approach.

With that in mind, I proposed the Project Browser Initiative. The idea was that anyone should be able to install modules, including their third party dependencies, all without having to resort to using Composer on the command line.

This dovetailed nicely with the Automatic Updates initiative. The combination of Automatic Updates and Project Browser would give Drupal the equivalent of an 'app store', making it easy for anyone to discover, install, and update a module and its components.

2022

A design proposal for the Project Browser. Users can filter modules by category, development status, security policy and more. Users can also page through results or sort the results by the number of active installs.

In 2022, we began work on making Automatic Updates' Composer functionality available for Project Browser, so that module installs and updates are handled in the same seamless, robust way. The new Package Manager (a sub-module of Automatic Updates) provides this functionality for both Automatic Updates and Project Browser, and will be the cornerstone of Drupal's install and update functionality.

Ben Mullins (Acquia), Narendra Singh Rathore (Acquia) and Fran Garcia-Linares (Drupal Association) from the Project Browser team collaborated with Ted Bowman (Acquia), Adam Globus-Hoenich (Acquia), Kunal Sachdev (Acquia), Omkar Podey (Acquia), and Yash Rode (Acquia) from the Automatic Updates team to enhance Package Manager's capabilities so that it caters to both use cases.

While work was ongoing on the client side of both Automatic Updates and the Project Browser, the Drupal Association remained focused on the server side. The Drupal Association put out an RFP to implement the TUF signing specification in a way that would integrate with Drupal.org's packaging pipeline. Together with Christopher Gervais and his team at Consensus Enterprises, they developed and released the Open Source Rugged server, a server-side TUF implementation that is the companion to the PHP-TUF client implementation.

Drupal Association team member Fran Garcia-Linares also started work on new Drupal.org endpoints that will feed the necessary data for the Project Browser. These endpoints were built on modern Drupal, with JSON:API, and will be deployed to production in the first half of 2023.

2023

That brings us to today. Project Browser and Automatic Updates are still two of the biggest initiatives for Drupal. Chris Wells (Redfin Solutions) and Leslie Glynn (Redfin Solutions) are leading the Project Browser initiative, and Ted Bowman (Acquia) and Tim Lehnen (Drupal Association) are leading the Automatic Updates initiative.

Both are built on top of Drupal's new Package Manager. Package Manager provides these initiatives with the ability to programmatically utilize Composer under the hood. Acquia and the Drupal Association are funding several people to work on these initiatives full-time, while other organizations like Redfin Solutions, Agileana, PreviousNext, Third & Grove, and more have provided extensive part-time contributions.

At the time of writing this, Narayan Newton (Tag1 Consulting), as part of the Drupal.org infrastructure team, is working on deploying the Rugged TUF server on the Drupal.org infrastructure. xjm (Agileana) and catch (Third & Grove), two of Drupal Core's release managers, are also collaborating on both the client and server sides of the initiative to help smooth the path to inclusion into Drupal Core.

We have built key parts of our solution in such a way that they can easily be adopted by any PHP project: from PHP-TUF to Rugged and Composer Stager. In the spirit of Open Source, our implementation was based on other Open Source projects, and now our work can be leveraged by others in turn. We encourage any PHP project that seeks to implement automated updates and a UI-based package manager to do so.

Automatic Updates is currently available as a contributed module, which facilitates updates for Drupal Core. The Automatic Updates Extensions module (a sub-module that ships with Automatic Updates) provides automatic updates for contributed modules and themes. The Project Browser is also currently available as a contributed module.

Our goal is to have both Automatic Updates and Project Browser included in Drupal Core, making them out-of-the-box features for all end users. I'm hopeful we can take the final steps to flush out the remaining bugs, finalize the Drupal.org services and APIs, and move these modules to Drupal Core in the second half of 2023.

Conclusion

Getting Automatic Updates and Project Browser into Drupal Core will be the result of 10+ years of hard work.

After all these years, we believe Drupal's Automatic Updates and Project Browser to be both the most user-friendly and most security-conscious tools of their kind among all PHP applications.

We were also able to overcome most of the drawbacks of the original Drupal 7 Update Manager: Composer helps us manage module conflicts, and updates are first applied to a staged copy of the site's codebase to ensure they do not cause any unintended side effects.

In the end, Drupal will offer an 'app-store'-like experience. Drupal contributors can register, promote, update, version, and certify modules through Drupal.org. And Drupal end users can securely install and update modules from within their Drupal site without having to use the command line.

I'm excited about achieving this milestone because it will make Drupal a lot easier to use. In fact, Drupal will be easier to install and update than Drupal 7 ever was. Think about that. Furthermore, Drupal will help showcase how one can democratize composability and advanced dependency management. I'm optimistic that in a few years, we'll realize that adopting Composer for dependency management was the correct decision, even if it was difficult initially.

Hundreds of people have been involved in climbing to reach this summit, and hundreds more outside of the Drupal project have influenced and guided our thinking. I'm grateful to everyone involved in helping to make Drupal more composable and easier to use for people worldwide. Thank you!

Special thanks to Alex Bronstein, Christopher Wells, David Strauss, Derek Wright, Gábor Hojtsy, Lee Rowlands, Nathaniel Catchpole, Neil Drumm, Ted Bowman, Tim Lehnen, Tim Plunkett, Peter Wolanin, Wim Leers, and xjm for their contributions to this blog post. Taking a stroll down memory lane with you was a blast!

With the help of the above reviewers, I made an effort to acknowledge and give credit to those who deserve it. However, there is always a possibility that we missed significant contributors. If you have any corrections or additions, feel free to email me at dries@buytaert.net, and I'll update the blog post accordingly.

Today, I would like to present this new book from Rick Silva: MySQL Crash Course – A Hands-on Introduction to Database Development, No Starch Press, 2023.

I participated in this project as a technical reviewer, and I really enjoyed reading the chapters Rick was writing as soon as they were ready… and thank you, Rick, for the kind words about me in the book 😉

About the book itself: if you are ready to dive into the world of database management but don't know where to start, it is the perfect guide for beginners eager to learn MySQL quickly and efficiently.

MySQL Crash Course is a concise and practical guide to learning how to use the most popular open source database.

The book is filled with examples, tips, expert advice and exercises.

Reading the book, you will learn the SQL basics and schema design, but also new features from MySQL 8.0 like common table expressions, window functions, and more.

The chapters are well structured, and I like their introductions, which are very clear, as well as the summary at the end of each one.

Of course, this book is not intended for operators who are looking for help in managing HA, backups, and performance optimization. It is meant for MySQL users (developers).

If you're looking to learn MySQL (and specifically 8.0) quickly and efficiently, Rick Silva's MySQL Crash Course is the perfect resource. With its practical approach, clear examples, and step-by-step guidance, this book will help you, as a developer, master MySQL!

In closing, I’ll use the quote from my colleague Scott:

“A fantastic resource for anyone who wants to learn about MySQL… and an excellent refresher for more seasoned developers.”

April 03, 2023

There’s a first for everything and so last week I did a presentation at a WordPress Meetup in Antwerp titled “Autoptimize: 5 secrets and an intermezzo” which at least I had fun with. You can find a PDF export of the presentation here. Questions go below, in the comments (or in the form on the contact page).

Source

Small gestures…

No Google service runs on my phone. Using the Adguard app, I discovered that my phone was making numerous requests to facebook.com. The culprit? The app of the IRM, the Royal Meteorological Institute, an institution I have a lot of respect for.

So I went to the IRM website and submitted a complaint, arguing that a public service should not and could not contribute to these practices, especially without informing its users. The reply arrived a few days later:

Our application did indeed contain a Facebook library that we were not using, but which had unfortunately remained enabled. We have submitted a request to the service in charge of our application to remove this library. Our colleagues thank you for noticing this mistake on our part and for letting us know.

There you go, it was that simple. Of course, in an ideal world, applications funded by public money would be open source, but it is nice to note that being spied on by Facebook is no longer considered normal and self-evident.

Well, obviously, we are not even talking about the rtbf, a public service whose website is a disgrace when you see how literally stuffed it is with software spies.

Another very simple little resolution I have made is to now put the videos of my talks on Peertube rather than sending you to Youtube.

My latest talk, « Pourquoi ? » ("Why?"), was viewed ten times more on Peertube than on the conference's official Youtube channel, simply because I put the Peertube link forward rather than the Youtube one. Proof, if any were needed, that it is not these platforms that bring us views, but quite the opposite. Making Peertube the default is a fairly simple step that can end up having a significant impact.

While exploring Peertube through Sepia Search, I also discovered that an internet archaeologist had uploaded one of my youthful works there.

Fortunately for music lovers, it is the only song that seems to have survived. As you have all been able to tell, I am a lyricist, not a musician or a singer… If I find other songs or short films, I will have to upload them all someday.

In short, freeing ourselves from monopolies and surveillance capitalism is a struggle comparable to ecology: we need to theorize and discuss the planetary stakes. But it is also possible to make small individual gestures that can inspire others and, in the long run, make a difference.

An engineer and writer, I explore the impact of technology on people. Subscribe to my writing in French by email or RSS. For my writing in English, subscribe to the English newsletter or to the full RSS feed. Your address is never shared and is deleted when you unsubscribe.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

March 30, 2023

In this new article about how to find the info when using MySQL Database Service on Oracle Cloud Infrastructure, we will learn about the query accelerator: HeatWave.

With HeatWave, you can boost the performance of your MySQL queries, providing your applications with faster, more reliable, and cost-effective access to data.

HeatWave is a high-performance in-memory query accelerator for MySQL Database Service on Oracle Cloud Infrastructure. It is designed to accelerate analytics workloads (OLAP) and increase the performance of your MySQL databases by orders of magnitude. This is achieved through the use of in-memory processing, advanced algorithms, and machine learning techniques to optimize query performance. If identified by the optimizer, OLTP requests can also be accelerated using HeatWave.

Today we will try to answer the following questions:

  1. Can I use HeatWave ?
  2. Is HeatWave enabled ?
  3. Is my data ready to benefit from HeatWave ?
  4. Is my query accelerated ?
  5. Why is my query not accelerated ?
  6. Could Machine Learning improve how my data is loaded into HeatWave ?

The above questions are what a MySQL DBA using MDS in OCI must answer regularly.

Can I use HeatWave ?

To be able to use HeatWave for your MySQL Database Service in OCI, the MySQL Shape must be compatible with HeatWave.

When you create a new DB System, you can choose between a Standalone, High Availability, or HeatWave system:

If you choose HeatWave, you will have the choice of all HeatWave compatible shapes available in your tenancy:

But even if you select a HeatWave compatible shape, this doesn’t mean you already have a HeatWave Cluster enabled.

For example, if we check the DB System we used for the previous articles, we can see that even if it’s a HeatWave compatible shape, the HeatWave cluster is not yet enabled:

So yes, HeatWave can be used on this system, but only once the HeatWave Cluster has been created.

Is HeatWave enabled ?

We saw that HeatWave is not enabled by default even if we use a HeatWave compatible shape.

We can click Edit next to HeatWave cluster: Disabled in the picture above, or select HeatWave in the menu on the left:

Before the creation of the HeatWave Cluster, we can also check with the SQL interface if the HeatWave service is ready:

show global status like 'rapid_service_status';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| rapid_service_status | OFFLINE |
+----------------------+---------+
1 row in set (0.0011 sec)

OFFLINE means it’s not ready. The Storage Engine’s name for HeatWave is RAPID.

Once the cluster is created we can see that HeatWave is enabled:

And in SQL:

select * from performance_schema.global_status 
         where variable_name in ('rapid_resize_status',
               'rapid_service_status','rapid_cluster_ready_number');
+----------------------------+--------------------+
| VARIABLE_NAME              | VARIABLE_VALUE     |
+----------------------------+--------------------+
| rapid_cluster_ready_number | 1                  |
| rapid_resize_status        | RESIZE_UNSUPPORTED |
| rapid_service_status       | ONLINE             |
+----------------------------+--------------------+
3 rows in set (0.0009 sec)

Is my data ready to benefit from HeatWave ?

To benefit from HeatWave, the data needs to be loaded into the HeatWave Cluster.

The best way to perform this operation is to use the Estimate node operation when enabling the HeatWave Cluster:

You select the database you want to load:

So if you want to load all the tables from that schema (airportdb), you can call the generated load procedure.
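For example, a minimal sketch using MySQL Autopilot's Auto Parallel Load procedure (sys.heatwave_load; the exact options shown here are only illustrative):

-- Dry run first: estimates memory usage and load time without loading anything
CALL sys.heatwave_load(JSON_ARRAY('airportdb'), JSON_OBJECT('mode', 'dryrun'));

-- Then perform the actual load of the whole schema
CALL sys.heatwave_load(JSON_ARRAY('airportdb'), NULL);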

If you are using MySQL Shell for Visual Studio Code, you can also easily load a schema to HeatWave:

You can also verify which tables are loaded into HeatWave using the following query:

select name, load_progress, load_status, query_count 
       from performance_schema.rpd_tables 
       join performance_schema.rpd_table_id using(id);
+-----------------------------+---------------+---------------------+-------------+
| name                        | load_progress | load_status         | query_count |
+-----------------------------+---------------+---------------------+-------------+
| airportdb.flight_log        |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.airport_geo       |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.flight            |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.passengerdetails  |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.passenger         |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.airplane          |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.weatherdata       |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.flightschedule    |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.booking           |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.employee          |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.airplane_type     |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.seat_sold         |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.airport           |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.airline           |           100 | AVAIL_RPDGSTABSTATE |           0 |
| airportdb.airport_reachable |           100 | AVAIL_RPDGSTABSTATE |           0 |
+-----------------------------+---------------+---------------------+-------------+
15 rows in set (0.0008 sec)

When a table is successfully loaded into HeatWave, its status is AVAIL_RPDGSTABSTATE.

The OCI MDS Web Console also has some metrics available for HeatWave. This is an example for the load:

Is my query accelerated ?

The query explain plan (QEP) provides adequate information to determine whether a query is being off-loaded to HeatWave.

The QEP is generated using the EXPLAIN keyword:
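For instance, with a hypothetical query against the airportdb sample schema (the query itself is only illustrative):

-- Aggregate bookings per airline; check the Extra field of the output
EXPLAIN SELECT airlinename, COUNT(*) AS bookings
          FROM airline
          JOIN flight USING (airline_id)
          JOIN booking USING (flight_id)
      GROUP BY airlinename;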

If we see Using secondary engine RAPID in the output, it means that the query is indeed accelerated by HeatWave.

There is also a status variable that is incremented when a query is accelerated using HeatWave:

show  status like 'rapid_query_offload_count';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| rapid_query_offload_count | 2     |
+---------------------------+-------+
1 row in set (0.0011 sec)

And we can also check again in the performance_schema tables we used earlier:

select name, load_progress, load_status, query_count
  from performance_schema.rpd_tables         
  join performance_schema.rpd_table_id using(id) where query_count > 0;
+--------------------+---------------+---------------------+-------------+
| name               | load_progress | load_status         | query_count |
+--------------------+---------------+---------------------+-------------+
| airportdb.flight   |           100 | AVAIL_RPDGSTABSTATE |           2 |
| airportdb.airplane |           100 | AVAIL_RPDGSTABSTATE |           2 |
| airportdb.booking  |           100 | AVAIL_RPDGSTABSTATE |           2 |
| airportdb.airline  |           100 | AVAIL_RPDGSTABSTATE |           2 |
+--------------------+---------------+---------------------+-------------+
4 rows in set (0.0008 sec)

There is also a Metric collecting the number of statements processed by the HeatWave cluster:

If you don't see Using secondary engine RAPID in the Query Execution Plan for a query, it means the query won't be off-loaded to HeatWave:

There is also a nice status variable that tracks the amount of data scanned by queries using HeatWave. The value is in megabytes:

show global status like 'hw_data%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| hw_data_scanned | 444   |
+-----------------+-------+
1 row in set (0.0009 sec)

Why is my query not accelerated ?

There can be several reasons why a query does not use HeatWave; see the limitations (please check them regularly, as each new release in OCI removes some limitations).

To find out why a query is not off-loaded to the HeatWave cluster, we use the optimizer_trace:
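A minimal sketch of that workflow, assuming the usual optimizer trace mechanics (the JSON paths Rapid_Offload_Fails and secondary_engine_not_used are the keys the HeatWave documentation uses to expose the off-load failure reason):

-- Enable the optimizer trace for this session
SET SESSION optimizer_trace = 'enabled=on';

-- Run (or EXPLAIN) the query you are investigating, then inspect the trace
SELECT QUERY,
       TRACE->'$**.Rapid_Offload_Fails',
       TRACE->'$**.secondary_engine_not_used'
  FROM information_schema.optimizer_trace;

-- Switch the trace off again when done
SET SESSION optimizer_trace = 'enabled=off';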

For this particular query, we can see that HeatWave is not used because the query cost is under the threshold where a query is off-loaded to HeatWave.

Could Machine Learning improve how my data is loaded into HeatWave ?

The short answer is Yes ! MySQL HeatWave offers Machine Learning Advisors that can provide recommendations based on the workload using machine learning models, data analysis and HeatWave query history.

So after using HeatWave for a while, or after modifying a lot of data (a new import, for example), it's recommended to use these ML Advisors, which will create an Autopilot report.

Auto Encoding

Auto Encoding recommends how string columns should be encoded in HeatWave to reduce the amount of memory required and improve performance:

CALL sys.heatwave_advisor(JSON_OBJECT("auto_enc",JSON_OBJECT("mode","recommend")));

The output is a report listing the suggestions:

+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.44                 |
|                               |
| Output Mode: normal           |
| Excluded Queries: 0           |
| Target Schemas: All           |
|                               |
+-------------------------------+
6 rows in set (0.0110 sec)

+---------------------------------------------------------+
| ANALYZING LOADED DATA                                   |
+---------------------------------------------------------+
| Total 15 tables loaded in HeatWave for 1 schemas        |
| Tables excluded by user: 0 (within target schemas)      |
|                                                         |
| SCHEMA                            TABLES        COLUMNS |
| NAME                              LOADED         LOADED |
| ------                            ------         ------ |
| `airportdb`                           15            107 |
|                                                         |
+---------------------------------------------------------+
8 rows in set (0.0110 sec)

+------------------------------------------------------------------------------------------------------+
| ENCODING SUGGESTIONS                                                                                 |
+------------------------------------------------------------------------------------------------------+
| Total Auto Encoding suggestions produced for 22 columns                                              |
| Queries executed: 9                                                                                  |
|   Total query execution time: 621.48 ms                                                              |
|   Most recent query executed on: Wednesday 22nd March 2023 19:48:25                                  |
|   Oldest query executed on: Wednesday 22nd March 2023 19:47:07                                       |
|                                                                                                      |
|                                                              CURRENT           SUGGESTED             |
| COLUMN                                                        COLUMN              COLUMN             |
| NAME                                                        ENCODING            ENCODING             |
| ------                                                      --------           ---------             |
| `airportdb`.`airline`.`airlinename`                           VARLEN          DICTIONARY             |
| `airportdb`.`airplane_type`.`description`                     VARLEN          DICTIONARY             |
| `airportdb`.`airplane_type`.`identifier`                      VARLEN          DICTIONARY             |
| `airportdb`.`airport`.`name`                                  VARLEN          DICTIONARY             |
| `airportdb`.`airport_geo`.`city`                              VARLEN          DICTIONARY             |
| `airportdb`.`airport_geo`.`country`                           VARLEN          DICTIONARY             |
| `airportdb`.`airport_geo`.`name`                              VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`city`                                 VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`country`                              VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`emailaddress`                         VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`lastname`                             VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`password`                             VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`street`                               VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`telephoneno`                          VARLEN          DICTIONARY             |
| `airportdb`.`employee`.`username`                             VARLEN          DICTIONARY             |
| `airportdb`.`passenger`.`lastname`                            VARLEN          DICTIONARY             |
| `airportdb`.`passenger`.`passportno`                          VARLEN          DICTIONARY             |
| `airportdb`.`passengerdetails`.`city`                         VARLEN          DICTIONARY             |
| `airportdb`.`passengerdetails`.`country`                      VARLEN          DICTIONARY             |
| `airportdb`.`passengerdetails`.`emailaddress`                 VARLEN          DICTIONARY             |
| `airportdb`.`passengerdetails`.`street`                       VARLEN          DICTIONARY             |
| `airportdb`.`passengerdetails`.`telephoneno`                  VARLEN          DICTIONARY             |
|                                                                                                      |
| Applying the suggested encodings might improve cluster memory usage. Performance gains not expected. |
|   Estimated HeatWave cluster memory savings:    0 bytes                                              |
|                                                                                                      |
+------------------------------------------------------------------------------------------------------+
36 rows in set (0.0110 sec)

+----------------------------------------------------------------------------------------------------------------+
| SCRIPT GENERATION                                                                                              |
+----------------------------------------------------------------------------------------------------------------+
| Script generated for applying suggestions for 7 loaded tables                                                  |
|                                                                                                                |
| Applying changes will take approximately 5.00 s                                                                |
|                                                                                                                |
| Retrieve script containing 57 generated DDL commands using the query below:                                    |
| Deprecation Notice: "heatwave_advisor_report" will be deprecated, please switch to "heatwave_autopilot_report" |
|   SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;      |
|                                                                                                                |
| Caution: Executing the generated script will alter the column comment and secondary engine flags in the schema |
|                                                                                                                |
+----------------------------------------------------------------------------------------------------------------+

You can generate a single string to cut & paste using:

SET SESSION group_concat_max_len = 1000000;
SELECT GROUP_CONCAT(log->>"$.sql" SEPARATOR ' ') 
       FROM sys.heatwave_autopilot_report 
       WHERE type = "sql" ORDER BY id;

After performing all the recommended DDLs, if we run the advisor again, we can see that there are no more encoding suggestions:

CALL sys.heatwave_advisor(JSON_OBJECT("auto_enc",JSON_OBJECT("mode","recommend")));
+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.44                 |
|                               |
| Output Mode: normal           |
| Excluded Queries: 0           |
| Target Schemas: All           |
|                               |
+-------------------------------+
6 rows in set (0.0087 sec)

+---------------------------------------------------------+
| ANALYZING LOADED DATA                                   |
+---------------------------------------------------------+
| Total 14 tables loaded in HeatWave for 1 schemas        |
| Tables excluded by user: 0 (within target schemas)      |
|                                                         |
| SCHEMA                            TABLES        COLUMNS |
| NAME                              LOADED         LOADED |
| ------                            ------         ------ |
| `airportdb`                           14             92 |
|                                                         |
+---------------------------------------------------------+
8 rows in set (0.0087 sec)

+------------------------------------------+
| ENCODING SUGGESTIONS                     |
+------------------------------------------+
| No encoding suggestions can be generated |
|   Current encodings found to be the best |
+------------------------------------------+
2 rows in set (0.0087 sec)

Query OK, 0 rows affected (0.0087 sec)

Auto Data Placement

This advisor generates recommendations about data placement keys that are used to partition table data among the different HeatWave nodes:

CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",JSON_ARRAY("airportdb")));

Of course, you need a MySQL HeatWave Cluster of at least 2 nodes to use this advisor.

Query Insights

This last advisor doesn't really provide recommendations; instead, it returns runtime data for successfully executed queries, runtime estimates for EXPLAIN queries, cancelled queries (ctrl+c), and queries that failed due to an out-of-memory error.

This is how to call the Query Insights Advisor:

CALL sys.heatwave_advisor(JSON_OBJECT("query_insights", TRUE));

This is an output example:

+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.44                 |
|                               |
| Output Mode: normal           |
| Excluded Queries: 0           |
| Target Schemas: All           |
|                               |
+-------------------------------+
6 rows in set (0.0086 sec)

+---------------------------------------------------------+
| ANALYZING LOADED DATA                                   |
+---------------------------------------------------------+
| Total 14 tables loaded in HeatWave for 1 schemas        |
| Tables excluded by user: 0 (within target schemas)      |
|                                                         |
| SCHEMA                            TABLES        COLUMNS |
| NAME                              LOADED         LOADED |
| ------                            ------         ------ |
| `airportdb`                           14             92 |
|                                                         |
+---------------------------------------------------------+
8 rows in set (0.0086 sec)

+--------------------------------------------------------------------------------------------------------------------+
| QUERY INSIGHTS                                                                                                     |
+--------------------------------------------------------------------------------------------------------------------+
| Queries executed on Heatwave: 9                                                                                    |
| Session IDs (as filter): None                                                                                      |
|                                                                                                                    |
| QUERY-ID  SESSION-ID  QUERY-STRING                                                     EXEC-RUNTIME (s)  COMMENT   |
| --------  ----------  ------------                                                     ----------------  -------   |
|        1         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.447                   |
|        2         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.024 (est.)  Explain.  |
|        3         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.024 (est.)  Explain.  |
|        4         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.024 (est.)  Explain.  |
|        5         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.175                   |
|        6         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.019 (est.)  Explain.  |
|        7         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.019 (est.)  Explain.  |
|        8         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.019 (est.)  Explain.  |
|        9         950  SELECT airlinename 'Airline Name',         SUM(sold_seat)/SU...      0.019 (est.)  Explain.  |
|                                                                                                                    |
| TOTAL ESTIMATED:   7   EXEC-RUNTIME:       0.150 sec                                                               |
| TOTAL EXECUTED:    2   EXEC-RUNTIME:       0.621 sec                                                               |
|                                                                                                                    |
|                                                                                                                    |
| Retrieve detailed query statistics using the query below:                                                          |
|     SELECT log FROM sys.heatwave_autopilot_report WHERE stage = "QUERY_INSIGHTS" AND type = "info";                |
|                                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------+
22 rows in set (0.0086 sec)

Conclusion

MySQL HeatWave is very powerful and now you know how to control it and verify that your queries benefit from it.

You have learned how to monitor the usage of HeatWave, and how to use the Machine Learning Autopilot Advisors to improve your HeatWave experience!

For even more information on how to monitor HeatWave, please check the manual.

Keynote Touraine Tech 2023: "Pourquoi ?" (Why?)

This talk was given on January 19, 2023 in Tours as part of Touraine Tech.
The text is my working base and does not include the many improvisations and digressions inherent to every One Ploum Show.

Hello. It is a pleasure to meet you in this polytechnic school of Tours, because I myself come from a polytechnic school where I teach and work. The term "Polytechnic" is magnificent: several technologies, several fields. At home, in Louvain, we have the mechanics department, the electricity department, chemistry, construction, a few others, and finally the computer science department.

When you have studied at a polytechnic school, you become an engineer. It took me years to articulate the difference between a scientist and an engineer. But deep down it is very simple: the scientist seeks to understand, to discover the laws of nature. The engineer seeks to work around the laws thus discovered. The scientist says "this sheet of paper falls!", the engineer folds it into a plane and answers "not always". The engineer therefore produces miracles: despite gravity, he makes planes weighing several hundred tonnes fly. He manages to build buildings, bridges that span chasms. To produce materials able to withstand re-entering the atmosphere at high speed. Or to invent a process so that beer goes psshhh when you pop open the can. I had a professor who made a fortune with such a process. Engineers (and not only those with the diploma, I also mean those who are engineers by experience) thus take immutable laws of nature such as gravity, material strength, vibration mechanics, electricity, and assemble them into planes, bridges, submarines, satellites or slices of ham that keep in the fridge. The engineer is therefore a rebel; he seeks progress, seeks to change the world.

At the opposite end, there is a category of people who take human inventions and try to turn them into natural laws, to convince themselves that they cannot be surpassed. That is called theology. It is exactly the opposite of the engineer: making people believe that writings produced by humans long dead can never be surpassed or improved.

In polytechnic faculties, you rarely find a theology department.

On the other hand, we now invariably have a computer science department. And what laws of nature are used there? Just one: making an electron move as fast as possible. We have become so good at it that it is no longer really a problem. One could argue that some algorithmic problems fall under the laws of nature, but few computer engineers confront them every day.

The reality is that computing is now reduced to taking the work of people we do not know and enshrining it as unavoidable law, then trying to build on top of it without ever, ever, trying to work around it or question it. Computing is no longer engineering; it has become theology. The work of the computer scientist is a kind of intellectual puzzle comparable to what rabbis do when they interpret the Torah. The computer scientist is no longer a progressive rebel, but a conservative in the service of the status quo.

If you work in IT, there is a strong chance your actual mission can be summed up as "display on a customer's screen the numbers and letters they wish to see there". Granted, there are sometimes images and sound. But whether on Youtube or Soundcloud, the primary interface for accessing a video, an image or a sound remains text. Imagine Spotify or Netflix without any text? Unusable. Without images? Perhaps a touch drearier, but that's all. Once the compression and transfer of sounds and images from one computer to another have been mastered, the only remaining work is therefore text. Besides, whether in a code editor, a word processor or an email client, it is plain that we spend most of our time hitting keys to write text. And that reading or thinking is rarely perceived as real work. In fact, if we stopped to think, we would probably be frightened. Surprised. We could not help but utter out loud that terrible sentence, the dread of every productivity maniac: "What the hell is this mess?" or, far worse, that simple word, reviled, banished from the vocabulary of the vast majority of startup-nation brains: "why?"

Indeed, why?

Typical answer: because that's how it is, because everyone does it that way, because we've always done it that way, because you're told to do it that way and you're not going to change the world.

Well yes, actually! We do change the world. We must change the world. We cannot help but change the world. So we might as well think about which direction we want it to evolve in.

Since the 80s, we have known how to exchange messages between computers with email, how to exchange files with FTP, how to chat and argue publicly on Usenet. The only thing still difficult was knowing where to find information. No matter: in 91, an Englishman and a Belgian working in Switzerland, in an office located on the French side of the border, invented… the web! Sounds like the start of a joke, doesn't it?

The purpose of the web was originally just to provide easy access to the documentation of the largest machine ever built by man: CERN's particle accelerator. With the web, you can click from page to page to discover content using hyperlinks. The web did not invent the notion of hyperlinks. In fact, the concept was on everyone's lips at the time; there was even a conference dedicated to the subject. Tim Berners-Lee actually presented the web there at the 92 edition. In a small room at the end of the hallway, to general indifference. Nobody found it exciting or interesting.

Once we had the web, it is fair to say we had solved most of the technical problems enabling the use of the Internet. We could now display any text on any computer.

The thing starts to catch on, and a very ambitious young American gets an idea. He works for a quasi-governmental American body and programs a web browser: Mosaic. He decides to quit his job to create a commercial web browser. To make the thing cool, he adds an image tag to the initial HTML.

The guy in question is called Marc Andreessen, and his browser Netscape. Tim Berners-Lee is not too keen on the image tag. He proposes alternatives. He fears that web pages will become big, flashy, unreadable things. In hindsight, you can't really say he was wrong. But Marc Andreessen doesn't care. He integrates his own image tag into Netscape and distributes Netscape for free. He becomes a millionaire and makes the cover of many magazines.

Wait a second… He becomes a millionaire by paying people to program something distributed for free? Quite a concept! Becoming a millionaire by spending money, not bad, right?

The secret is to spend other people's money. You take investors' money, use it to create something that earns nothing but is very cool (the technical term is "bullshit"), and wait for a big company to buy the whole thing because it is cool. Marc Andreessen literally invents the concept of the web startup that loses money and is worth billions. The concept remains very popular today. When you think about it, the entire web economy is a gigantic Ponzi scheme waiting for the next suckers… sorry, investors. Cryptocurrencies, by comparison, are small fry, amateur work.

But back to the point: we now know how to do everything on the Internet. You just need a minimum of training. But marketing will seize the story and overcomplicate it beyond measure. All while claiming to make it simpler. First there will be Java. Then Javascript which is, by its creator's own admission, a rushed thing thrown together on the corner of a table for a demo. The thing is so vile that few people understand it. So we add a layer on top called AJAX. And since Ajax is too complicated, we create frameworks on top of that. And since every framework is complicated, we make frameworks of frameworks. The philosophy is simple: every time some fool wants to display text on a customer's screen, they realize it is complicated. So they decide to write an abstraction that simplifies everything. And, of course, their abstraction quickly runs into the fact that reality is complex. Either they abandon their idea, or they complicate it up to the point where another fool finds it too complicated. And the cycle starts again.

By claiming to simplify, we only make things more complex. And there is a reason for that: complexity is a marketing argument. It gives an illusion of value, of mastering obscure knowledge accessible only to the initiated. It is the principle of occultism and mysticism, even astrology: claiming that everything is very complicated and that one must be initiated. It is a scam as old as the world.

The problem with complexity, besides its cost and the fact that it creates dependence on a supplier, a vendor lock-in, is that it forces a paradoxical simplism. Let me explain: the problem seems conceptually simple. Simplistic, even. And yet incredibly difficult to implement, requiring experts for the details. The reality is that everything is easy to implement once you know precisely what you want to do. Defining what you want is incredibly complex. It means asking "why?". Intuitively, we all dream of a single-storey house with two floors. Or that group of clients, five of them, who had worked for several weeks to provide me with very precise specs. A list of "requirements". Which were inconsistent with one another.

What do we really want? And above all, why do we want it?

Hiding choices under complexity makes it possible to deny their existence. To make people believe there is no choice. And to let others make the choices. Why did we get Java and Javascript? Because Netscape wanted to make Microsoft obsolete and become caliph in place of the caliph. Not to be useful to the user. Hiding the fundamental choices smothers the citizen under a feeling of inexorability. It turns them into a user, makes them lose their status as an actor in their own life.

What do we want to do? Display text on a screen. Why?

Every update, every new feature is merely the assertion of an arbitrary authority. You do not make a system easier by simplifying it. You make it easier by making it learnable. Who among you can drive a manual car? Yet it is extremely complex when you think about it. And extremely dangerous. You risk your life at the slightest slip. Yet you learned it in a few weeks, a few months. And you improve year after year.

Is computing complicated? No, it is elusive. It changes all the time. From the supposed security update that introduces a new bug to that famous new design with new icons. The one you are so proud of. For the user, it means having to relearn, to adapt for no reason. I use the Protonmail service for my mail and my calendar. The mail icon was an envelope with the top shaped like a padlock. The calendar was… a calendar page. On my black-and-white e-ink phone, that worked perfectly. There was only one colour anyway. Then came a complete redesign. For what reason? No idea. The mail is now a rectangle in a mauve gradient with a notch vaguely suggesting an envelope. The calendar is the same rectangle without the notch. On my e-ink screen, those icons are meaningless blobs.

Users quickly understood what geeks did not want to admit: your life is only one upgrade away from becoming miserable. So the most rational reflex is not to install updates. Seriously, do you know a single user who says "Great! A new design for this application I've been using for years!"?

How did the industry react? By asking itself why users don't install their updates? No, by forcing those updates. By making users' lives even more miserable through guilt-tripping. Through incessant notifications. By claiming it is for their security. You know what? Users are never in danger if their computer is only rarely connected. Most of the risks are tied to the complexity imposed on the user. If their browser merely displayed the text they want to see, they would risk nothing. They would not be forced to buy yet another device with a scandalous ecological footprint. Not to mention that the vast majority of threats, such as scams, cannot be solved by updates.

My e-reader works very well. It is never online. I load epubs onto it via USB. The other day, I enabled the wifi by mistake. It immediately announced an important update. Reading the detailed changelog, I discovered that this update added a new feature: suggested reads from the Vivlio store on the home page. So the update would have given me… advertising on my device. Advertising on the very screen I take to bed with me…

Every update makes the user's life even more miserable for the sole purpose of thrilling the marketing manager who drools over the number of "clicks" (more text displayed on a screen), or of delighting the rebranding lead who thinks it is just great to work with a team of designers on ecstasy.

Weary of being exploited, some users take refuge in conspiracy theories. Have you ever seen 4chan, the site where most of these theories are born? Pure HTML without frills. Others, like me, take refuge in obscure niches like Gemini. The industry then claims to turn to minimalism. Like Medium, for example? Have you ever looked at the source code of a Medium page? Do it and you will immediately delete your account if you have one. This is what I call the "Medium paradox": any minimalist project will either disappear or grow enough for an alternative overlay to appear, offering minimalist access… to the minimalist service (scribe.rip for Medium, Nitter for Twitter, Teddit for Reddit, etc.). Besides, do you know many people who browse the web without various adblockers? We are now used to a layer of complexity whose job is to work around the layers of complexity that we ourselves implemented.

The web industry is a gigantic Ponzi scheme trying to squeeze every last drop out of users who are controlled, humiliated and treated like idiots. But the web has become too important. It has become a societal pillar. Jumping ship is not an option. We are at a crucial moment in the history of humanity. And to save humanity, we must save the web. Get back to fundamentals. Display text on a citizen's screen.

Design systems that can be learned. And therefore do not change. Respect the human. And therefore give them the text they need without spying on them. Without bludgeoning them. Damn it, I just want to order a hamburger, not install your crappy app.

Last year, I asked myself that question for my own blog. It took me a long time to reach a simple conclusion. To answer the question "why?". And the answer was: to be read! I rewrote my entire blog as static pages that I generate with my own Python script. It is actually very simple once you know what you want. The home page of my blog, under Wordpress, weighed nearly 1 MB. It is now 5 kB. I removed all the images that do not help with reading. I think social networks are an obstacle to reading. They distract us, manipulate us. So I deleted all my accounts except Mastodon.

Est-ce que tenter d’augmenter le nombre de followers sur un réseau m’aide à être lu ? Non. Ce nombre n’aide rien. Il est de toute façon faux, fictif. Supprimés les concours de followers. En tout et pour tout, en plus du HTML, j’ai ajouté 40 lignes de CSS. Pas une de plus. Chacune n’a été ajoutée que si elle pouvait aider la lecture de mes écrits sans a priori esthétique.

On pourrait croire que ça fait un blog un peu rétro, genre brutaliste. Pourtant, dès les premiers jours, j’ai reçu plusieurs demandes pour mon « template ». Y’a 40 lignes de CSS dont la moitié servent juste au menu au-dessus de chaque page !

Je me suis aussi cassé la tête sur l’idée d’une pagination pour naviguer entre les articles, sur un moteur de recherche. Mais j’affiche désormais simplement la liste de tous mes billets sur une page. Aussi simple que cela. Ne me dites pas que ça ne « scale pas » : y’en a presque 900 ! Le moteur de recherche ? Un simple ctrl+f dans votre navigateur. Encore un truc apprenable qui est ignoré, car la complexité le rend inutilisable sur la plupart des sites « modernes ».

La conséquence la plus étonnante de tout cela, c’est le nombre de lecteurs qui me contactent à propos d’anciens billets. C’est simple, rapide et ça charge instantanément même sur les mauvaises connexions. Du coup les gens me lisent. C’est tellement inhabituel de ne pas devoir attendre, de ne pas devoir se casser la tête.

J’ai un très bon laptop et pourtant, sur le web, chaque page met quelques fractions de seconde à s’afficher. À chaque page, mes bloqueurs empêchent des centaines de requêtes, évident des mégaoctets entiers de téléchargement. Et les responsables de cet état de fait sont dans cette salle. Ils l’ont implémenté sans demander « pourquoi ? ».

Alors je vous le demande. Non plus comme un confrère ingénieur, mais comme un citoyen du web qui en a assez de devoir considérer son propre navigateur comme un territoire hostile. Apprenez à demander « pourquoi ? ». Puis à répondre « non ». Plutôt que de réfléchir sur le prochain framework JavaScript ou l’utilitaire de tracking de statistiques et le surdimensionnement du data center pour héberger un elasticsearch clustérisé à redondance asynchrone dans des containers virtualisés à travers un cloud propriétaire à charge répartie monitoré depuis une app custom nodejs qui achète automatiquement des certificats d’offset CO2 pour obtenir le label de datacenter durable, le tout à travers des transactions byzantines sur une blockchain permissioned qui trade de manière décentralisée sur le marché parallèle.

Bon, en fait, les blockchains permissioned, c’est une arnaque sémantique. Cela veut juste dire « base de données centralisée ». Les offsets carbone sont une vasque escroquerie. Ce sont les indulgences de notre siècle enrobées d’un capitalisme foncièrement malhonnête (si vous achetez des offsets carbone, vous pouvez arrêter, vous êtes en train d’enrichir des escrocs tout en encourageant un système qui a démontré faire pire que mieux). Et votre application distribuée va de toute façon se casser un jour la gueule le jour où une mise à jour sera faite dans un obscur repository github dont vous ignorez l’existence, entrainant une réaction en chaine démontrant que votre app sans single point of failure n’était pas sans single point of failure que ça finalement.

Je sais, le client est roi. Il faut payer les factures. À partir d’un certain montant, on obéit. Et à partir d’un autre, on prétend aimer ça : « Oh oui, c’est génial, nous rêvons de développer un showroom virtuel pour vos nouveaux SUVs. Un véritable challenge ! Un peu comme ce système de ciblage publicitaire pour adolescents que nous avons développé pour Philipp Morris, n’est-ce pas Brenda ? »

L’important n’est pas de devenir parfait ni puriste. Nous sommes tous pleins de contradictions. L’important est d’arrêter de se mentir, de justifier l’injustifiable. De savoir pourquoi on fait les choses. Mettre le nez de vos commanditaires dans leur propre caca en leur posant la question : « pourquoi ? ». Et, sur le web, de revenir à l’essentiel : afficher du texte.

Cette réflexion m’a amené à écrire avec… une machine mécanique. À publier en utilisant une technologie complètement libre, sans monopole, sans app store et avec une empreinte écologique non négligeable, mais bien moindre que l’informatique : le livre. Un livre qui sera toujours lisible, échangeable, copiable quand toutes les lignes de code que nous avons produit collectivement auront depuis longtemps été oubliées.

Écrire à la machine et lire des livres papier sont des actes rebelles. Mais j’aime trop l’informatique pour m’en passer. Je veux qu’elle redevienne rebelle. Qu’elle redemande « pourquoi ? ». Je vous demande de m’aider. Je vous confie cette mission : l’informatique doit cesser d’être une religion prônant l’obéissance, la soumission, l’humiliation, la consommation. Elle doit redevenir une science. Un art.

Une liberté…

Ingénieur et écrivain, j’explore l’impact des technologies sur l’humain. Abonnez-vous à mes écrits en français par mail ou par rss. Pour mes écrits en anglais, abonnez-vous à la newsletter anglophone ou au flux RSS complet. Votre adresse n’est jamais partagée et effacée au désabonnement.

Pour me soutenir, achetez mes livres (si possible chez votre libraire) ! Je viens justement de publier un recueil de nouvelles qui devrait vous faire rire et réfléchir.

March 29, 2023

As a MySQL DBA, you like to know who is connected to the system you manage. You also like to know who is trying to connect.

In this article, we will discover how to retrieve that information and how to control who is using the MySQL DB instance we launched in OCI.

Secure Connections

The first thing we can check is that all our clients encrypt their connection to the MySQL server.

We use Performance_Schema again to retrieve the relevant information:

select connection_type, substring_index(substring_index(name,"/",2),"/",-1) name,
       sbt.variable_value AS tls_version, t2.variable_value AS cipher,
       processlist_user AS user, processlist_host AS host
from performance_schema.status_by_thread AS sbt
join performance_schema.threads AS t 
  on t.thread_id = sbt.thread_id
join performance_schema.status_by_thread AS t2 
  on t2.thread_id = t.thread_id
where sbt.variable_name = 'Ssl_version' and t2.variable_name = 'Ssl_cipher' 
order by connection_type, tls_version;
+-----------------+-------------+-------------+-----------------------------+----------+------------+
| connection_type | name        | tls_version | cipher                      | user     | host       |
+-----------------+-------------+-------------+-----------------------------+----------+------------+
| SSL/TLS         | mysqlx      |             |                             | admin    | 10.0.0.184 |
| SSL/TLS         | thread_pool | TLSv1.2     | ECDHE-RSA-AES256-GCM-SHA384 | ociadmin | localhost  |
| SSL/TLS         | thread_pool | TLSv1.2     | ECDHE-RSA-AES256-GCM-SHA384 | ociadmin | localhost  |
| SSL/TLS         | thread_pool | TLSv1.2     | ECDHE-RSA-AES128-GCM-SHA256 | admin    | 10.0.0.159 |
| SSL/TLS         | thread_pool | TLSv1.2     | ECDHE-RSA-AES128-GCM-SHA256 | admin    | 10.0.1.237 |
| SSL/TLS         | thread_pool | TLSv1.2     | ECDHE-RSA-AES128-GCM-SHA256 | admin    | 10.0.1.237 |
| SSL/TLS         | thread_pool | TLSv1.2     | ECDHE-RSA-AES128-SHA256     | admin    | 10.0.0.159 |
| TCP/IP          | thread_pool |             |                             | fred     | 10.0.0.184 |
+-----------------+-------------+-------------+-----------------------------+----------+------------+
8 rows in set (0.0011 sec)

We can see that one connection is not using SSL/TLS (the last line). But we can also notice that the first connection does not expose the TLS version and cipher. This is because the X Protocol doesn't export this information as status variables.

We can also notice that all encrypted connections are using the same TLS version (1.2) but different ciphers.

In fact, MDS, for the moment, only allows TLSv1.2:

show global variables like 'tls_version';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| tls_version   | TLSv1.2 |
+---------------+---------+
1 row in set (0.0019 sec)

This variable cannot be changed by the user or via a DB instance configuration, as it can be when running MySQL on premises.

We also saw that there is one connection not using SSL (the one by the user fred).

If we want to force the user to use encrypted connections to our MySQL DB system, we modify the user like this:

alter user fred require ssl;
Query OK, 0 rows affected (0.0023 sec)

And on the MySQL DB instance, we can also verify this:

+-----------------+-------------+-------------+-----------------------------+------+------------+
| connection_type | name        | tls_version | cipher                      | user | host       |
+-----------------+-------------+-------------+-----------------------------+------+------------+
| SSL/TLS         | thread_pool | TLSv1.2     | ECDHE-RSA-AES128-GCM-SHA256 | fred | 10.0.0.184 |
+-----------------+-------------+-------------+-----------------------------+------+------------+

Failed Connections

As a DBA, you also need to verify who is trying to connect unsuccessfully to your database server. It could be a user without the right credentials, a client using an unsupported TLS version or cipher, or possibly a malicious person or program.

MySQL Database Service is provided with the connection control plugin enabled.

We can verify the failed attempts from the information_schema.connection_control_failed_login_attempts table:

select * from information_schema.connection_control_failed_login_attempts;
+---------------------+-----------------+
| USERHOST            | FAILED_ATTEMPTS |
+---------------------+-----------------+
| 'repl'@'10.0.0.159' |               3 |
| ''@'10.0.0.184'     |               1 |
| ''@'10.0.0.159'     |               2 |
| 'fred'@'%'          |               2 |
+---------------------+-----------------+
4 rows in set (0.0005 sec)

And in the error log we have more details about these failed connections:

select logged, data from performance_schema.error_log where error_code in  ('MY-010926', 'MY-010914') order by logged desc limit 10;
+----------------------------+---------------------------------------------------------------------------------------------------------------------------+
| logged                     | data                                                                                                                      |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------+
| 2023-03-21 10:37:33.384074 | Access denied for user 'fred'@'10.0.0.159' (using password: YES)                                                          |
| 2023-03-21 10:37:30.890519 | Access denied for user 'fred'@'10.0.0.159' (using password: YES)                                                          |
| 2023-03-21 09:11:33.918165 | Access denied for user 'fred'@'10.0.0.184' (using password: YES)                                                          |
| 2023-03-21 09:09:45.942667 | Access denied for user 'fred'@'10.0.0.184' (using password: YES)                                                          |
| 2023-03-21 09:09:38.643923 | Access denied for user 'fred'@'10.0.0.184' (using password: YES)                                                          |
| 2023-03-21 09:09:35.486164 | Access denied for user 'fred'@'10.0.0.184' (using password: YES)                                                          |
| 2023-03-21 09:06:27.652380 | Aborted connection 622 to db: 'unconnected' user: 'fred' host: '10.0.0.184' (Got an error reading communication packets). |
| 2023-03-21 09:06:27.652297 | Got an error reading communication packets                                                                                |
| 2023-03-21 09:04:05.260232 | Aborted connection 620 to db: 'unconnected' user: 'fred' host: '10.0.0.184' (Got an error reading communication packets). |
| 2023-03-21 09:04:05.260165 | Got an error reading communication packets                                                                                |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.0007 sec)

We can see only 2 failed attempts for the user fred in connection_control_failed_login_attempts, but in the error log we see many more. What does that mean?

In fact, before the last two attempts, the user was able to connect. That successful connection reset the failed-attempts counter, but the earlier entries are of course kept in the error log.

In MDS, the connection_control variables are pre-defined and cannot be changed. These are the values:

select * from performance_schema.global_variables 
         where variable_name like 'connection_control%';
+-------------------------------------------------+----------------+
| VARIABLE_NAME                                   | VARIABLE_VALUE |
+-------------------------------------------------+----------------+
| connection_control_failed_connections_threshold | 3              |
| connection_control_max_connection_delay         | 10000          |
| connection_control_min_connection_delay         | 1000           |
+-------------------------------------------------+----------------+

In Performance_Schema, there is a table that provides a useful summary: host_cache:

select ip, count_ssl_errors, count_handshake_errors, count_authentication_errors
       from host_cache;
+------------+------------------+------------------------+-----------------------------+
| ip         | count_ssl_errors | count_handshake_errors | count_authentication_errors |
+------------+------------------+------------------------+-----------------------------+
| 10.0.0.159 |                0 |                      2 |                           3 |
| 10.0.0.184 |                4 |                      4 |                           0 |
| 10.0.1.237 |                0 |                      0 |                           0 |
+------------+------------------+------------------------+-----------------------------+
3 rows in set (0.0006 sec)

Limits

In MySQL Database Service on OCI, you have multiple ways to limit the connections:

  1. using configuration settings (from OCI console)
  2. defining a limitation for a user

Limitations by Configuration

In MDS, you have the possibility to modify the following settings using a DB Instance Configuration:

Some of these variables are usually well known, like max_connections and connect_timeout, but others might look new to some users.
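
To see what some of these settings are currently set to on your DB system, you can simply query them; the list of variables below is just an illustrative selection:

show global variables
  where variable_name in ('max_connections', 'connect_timeout',
                          'max_connect_errors', 'wait_timeout');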

max_connect_errors

In MDS, the default for this variable is the maximum possible value. We don't want to block a host (which is usually on a secure network). But you have the possibility to modify this setting, as shown above.

On my system, I’ve changed the value of max_connect_errors to 3:

select @@max_connect_errors;
+----------------------+
| @@max_connect_errors |
+----------------------+
|                    3 |
+----------------------+
1 row in set (0.0006 sec)

Now if I try to access the MySQL TCP/IP port (3306) with telnet for example, after 3 attempts, the host will be blocked:

$ telnet 10.0.1.33 3306
Trying 10.0.1.33...
Connected to 10.0.1.33.
Escape character is '^]'.
Host '10.0.0.184' is blocked because of many connection errors; 
unblock with 'mysqladmin flush-hosts'Connection closed by foreign host.

Please note that invalid credentials are not taken into account for max_connect_errors.
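
You can see that distinction in host_cache: failed logins show up in count_authentication_errors, while sum_connect_errors (the counter checked against max_connect_errors) only grows on protocol-level errors. A quick check:

select ip, sum_connect_errors, count_authentication_errors
  from performance_schema.host_cache
 order by ip;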

Now, even if I try to connect using MySQL Shell, it will fail because the host is blocked, as we can see in host_cache:

select ip, sum_connect_errors, count_host_blocked_errors 
 from host_cache where SUM_CONNECT_ERRORS >= @@max_connect_errors;
+------------+--------------------+---------------------------+
| ip         | sum_connect_errors | count_host_blocked_errors |
+------------+--------------------+---------------------------+
| 10.0.0.184 |                  3 |                         4 |
+------------+--------------------+---------------------------+

Note that you need to use mysqladmin to flush the hosts, as doing it in SQL won't work:

flush hosts;
ERROR: 1227 (42000): Access denied; you need (at least one of) the RELOAD privilege(s) for this operation

But this will work (from a compute instance for example):

# mysqladmin flush-hosts -h 10.0.1.33 -u admin -p
Enter password:

Memory Tracking

With MySQL 8.0 and MySQL Database Service, it is possible to track and limit the memory consumption of the connections. The limitation doesn’t apply to users with CONNECTION_ADMIN privilege.

Once global_connection_memory_tracking is enabled, it’s possible to limit the memory consumption using global_connection_memory_limit and connection_memory_limit.
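
As a minimal sketch, assuming a stock MySQL 8.0.28 (or later) server where you are allowed to run SET GLOBAL; in MDS these values would normally be set through a DB System configuration instead, and the limits shown here are arbitrary examples:

-- track the memory used by user connections
set global global_connection_memory_tracking = ON;
-- cap the memory used by all user connections combined (example value: 15 GiB)
set global global_connection_memory_limit = 16106127360;
-- cap the memory a single user connection may use (example value: 512 MiB)
set global connection_memory_limit = 536870912;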

The current global memory consumption (including background threads and admin) can be returned by the following query:

SELECT format_bytes(variable_value) global_connection_memory 
       FROM performance_schema.global_status 
       WHERE variable_name='Global_connection_memory';
+--------------------------+
| global_connection_memory |
+--------------------------+
| 16.22 MiB                |
+--------------------------+

User Limitations

We also have the possibility to add some limitations for each user. These settings allow the DBA to prevent individual accounts from using too many resources.

The MySQL DBA can then limit:

  • the number of queries an account can issue per hour (max_queries_per_hour)
  • the number of updates an account can issue per hour (max_updates_per_hour)
  • the number of times an account can connect to the MySQL DB instance per hour (max_connections_per_hour)
  • the number of simultaneous connections to the DB instance by an account (max_user_connections)

Let's use the previous account fred again and limit the number of queries it can issue per hour:

alter user fred with max_queries_per_hour 5;

If the user fred tries to run more than 5 queries in one hour, they will get the following error:

ERROR: MySQL Error 1226 (42000): User 'fred' has exceeded the 'max_questions'
resource (current value: 5)

The current per-hour counters are not exposed. And in MDS, to reset any of these limits, you cannot use FLUSH PRIVILEGES or mysqladmin reload; you have to use FLUSH USER_RESOURCES.
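
For example, something like this resets the per-hour counters and then removes the limit we set earlier (setting it back to 0, the default, means no limit):

-- reset the per-hour counters for all accounts
flush user_resources;

-- remove the limit on the account used in the example above
alter user fred with max_queries_per_hour 0;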

Conclusion

In this article, we learned how to control the connections to a MySQL DB instance in OCI.

We also learned how to add some limitations, globally or per user, so that no single account can consume all the resources on the system.

March 28, 2023

For this third article of the series dedicated to how a DBA can find the information they need with MySQL Database Service in Oracle Cloud Infrastructure, we will see how to find the error log.

When using MySQL DBaaS, the DBA doesn't have direct access to the files on the filesystem. Fortunately, with MySQL 8.0, the error log is also available in Performance_Schema.

This is exactly where you will find the information that is also present in the error log file when using MDS in OCI:

select * from (select * from performance_schema.error_log order by logged desc limit 10) a order by logged\G
*************************** 1. row ***************************
    LOGGED: 2023-03-19 08:41:09.950266
 THREAD_ID: 0
      PRIO: System
ERROR_CODE: MY-011323
 SUBSYSTEM: Server
      DATA: X Plugin ready for connections. Bind-address: '10.0.1.33' port: 33060, socket: /var/run/mysqld/mysqlx.sock
*************************** 2. row ***************************
    LOGGED: 2023-03-19 08:41:09.950328
 THREAD_ID: 0
      PRIO: System
ERROR_CODE: MY-010931
 SUBSYSTEM: Server
      DATA: /usr/sbin/mysqld: ready for connections. Version: '8.0.32-u1-cloud'  socket: '/var/run/mysqld/mysql.sock'  port: 3306  MySQL Enterprise - Cloud.
*************************** 3. row ***************************
    LOGGED: 2023-03-19 08:41:09.950342
 THREAD_ID: 0
      PRIO: System
ERROR_CODE: MY-013292
 SUBSYSTEM: Server
      DATA: Admin interface ready for connections, address: '127.0.0.1'  port: 7306
*************************** 4. row ***************************
    LOGGED: 2023-03-19 08:51:09.000200
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-013694
 SUBSYSTEM: Health
      DATA: DISK: mount point='/db', available=84.9G, total=99.9G, used=15.1%, low limit=4.0G, critical=2.0G, warnings=23.2G/13.6G/8.8G
*************************** 5. row ***************************
    LOGGED: 2023-03-19 10:49:18.394291
 THREAD_ID: 0
      PRIO: Warning
ERROR_CODE: MY-010055
 SUBSYSTEM: Server
      DATA: IP address '10.0.0.159' could not be resolved: Name or service not known
*************************** 6. row ***************************
    LOGGED: 2023-03-19 10:49:18.452995
 THREAD_ID: 0
      PRIO: Warning
ERROR_CODE: MY-010968
 SUBSYSTEM: Server
      DATA: Can't set mandatory_role: There's no such authorization ID public@%.
*************************** 7. row ***************************
    LOGGED: 2023-03-19 10:52:13.818505
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-011287
 SUBSYSTEM: Server
      DATA: Plugin mysqlx reported: '2.1: Maximum number of authentication attempts reached, login failed.'
*************************** 8. row ***************************
    LOGGED: 2023-03-19 18:52:16.600274
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-013993
 SUBSYSTEM: Server
      DATA: Thread pool closed connection id 39 for `admin`@`%` after 28800.004878 seconds of inactivity. Attributes: priority:normal, type:normal, last active:2023-03-19T10:52:16.595189Z, expired:2023-03-19T18:52:16.595199Z (4868 microseconds ago)
*************************** 9. row ***************************
    LOGGED: 2023-03-19 18:52:16.600328
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-013730
 SUBSYSTEM: Server
      DATA: 'wait_timeout' period of 28800 seconds was exceeded for `admin`@`%`. The idle time since last command was too long.
*************************** 10. row ***************************
    LOGGED: 2023-03-20 13:47:28.843589
 THREAD_ID: 365
      PRIO: Warning
ERROR_CODE: MY-010055
 SUBSYSTEM: Server
      DATA: IP address '10.0.1.237' could not be resolved: Name or service not known
10 rows in set (0.0015 sec)

The example above lists the last 10 entries in error log.

It's possible to get some statistics on the error log entries much more easily than by parsing the file with sed and awk:

select subsystem, count(*) 
  from performance_schema.error_log 
  group by subsystem order by subsystem;
+-----------+----------+
| subsystem | count(*) |
+-----------+----------+
| Health    |      112 |
| InnoDB    |     1106 |
| RAPID     |       51 |
| Repl      |        4 |
| Server    |      483 |
+-----------+----------+
5 rows in set (0.0018 sec)

select prio, count(*) 
  from performance_schema.error_log 
  group by prio order by prio;
+---------+----------+
| prio    | count(*) |
+---------+----------+
| System  |      105 |
| Error   |        2 |
| Warning |       50 |
| Note    |     1599 |
+---------+----------+
4 rows in set (0.0014 sec)

The error log provides a lot of information about how healthy your system is: the Health Monitor, InnoDB, replication, authentication failures, etc.
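
To focus only on actual problems and skip the informational messages, a simple filter on the priority column is enough (a quick sketch):

select logged, prio, error_code, subsystem, data
  from performance_schema.error_log
 where prio in ('Error', 'Warning')
 order by logged desc limit 10;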

For example, we can see the disk usage (see the previous post) in the error_log table too:

select * from error_log where subsystem="Health" 
   and data like 'DISK:%' order by logged desc limit 4\G
*************************** 1. row ***************************
    LOGGED: 2023-03-19 08:51:09.000200
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-013694
 SUBSYSTEM: Health
      DATA: DISK: mount point='/db', available=84.9G, total=99.9G, used=15.1%, 
                  low limit=4.0G, critical=2.0G, warnings=23.2G/13.6G/8.8G
*************************** 2. row ***************************
    LOGGED: 2023-03-17 15:24:57.000133
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-013694
 SUBSYSTEM: Health
      DATA: DISK: mount point='/db', available=84.9G, total=99.9G, used=15.1%,
                  low limit=4.0G, critical=2.0G, warnings=23.2G/13.6G/8.8G
*************************** 3. row ***************************
    LOGGED: 2023-03-16 19:24:57.000122
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-013694
 SUBSYSTEM: Health
      DATA: DISK: mount point='/db', available=74.9G, total=99.9G, used=25.1%,
                  low limit=4.0G, critical=2.0G, warnings=23.2G/13.6G/8.8G
*************************** 4. row ***************************
    LOGGED: 2023-03-16 16:34:57.000175
 THREAD_ID: 0
      PRIO: Note
ERROR_CODE: MY-013694
 SUBSYSTEM: Health
      DATA: DISK: mount point='/db', available=46.7G, total=99.9G, used=53.2%,
                  low limit=4.0G, critical=2.0G, warnings=23.2G/13.6G/8.8G

log_error_verbosity is set to 3 in MySQL Database Service, meaning that errors, warnings and informational messages are all logged.

These are the configuration settings related to the error log in MDS:

select * from performance_schema.global_variables
         where variable_name like 'log_error%';
+----------------------------+-------------------------------------------------------+
| VARIABLE_NAME              | VARIABLE_VALUE                                        |
+----------------------------+-------------------------------------------------------+
| log_error                  | /db/log/error.log                                     |
| log_error_services         | log_filter_internal; log_sink_internal; log_sink_json |
| log_error_suppression_list | MY-012111                                             |
| log_error_verbosity        | 3                                                     |
+----------------------------+-------------------------------------------------------+

In MySQL Database Service, we can also see that the error MY-012111 is not logged:

show global variables like 'log_error_sup%';
+----------------------------+-----------+
| Variable_name              | Value     |
+----------------------------+-----------+
| log_error_suppression_list | MY-012111 |
+----------------------------+-----------+

This error is related to MySQL trying to access a missing tablespace:

$ perror MY-012111
MySQL error code MY-012111 (ER_IB_WARN_ACCESSING_NONEXISTINC_SPACE):
      Trying to access missing tablespace %lu

However, a user doesn't have the possibility to change any settings related to the error log, neither using SET GLOBAL nor by creating an MDS configuration in the OCI console.

Conclusion

In MDS you don't have access to the error log file, but its content is available in Performance_Schema and is easier to parse using SQL.

It's a really good source of information that I invite every user to check regularly.

March 27, 2023

Book signing at the Brussels Book Fair this Saturday, April 1st

This Saturday, April 1st, I will be signing my novel and my short-story collection at the Brussels Book Fair.

Well, put like that, it is not a very funny April Fools' joke, but the funnier part is that I will be at the Livre Suisse stand (stand 334). Yes, a Belgian pretending to be Swiss in order to sign books in Brussels: that is exactly the kind of mess typical of my country. Although I will probably be unmasked the moment I pull out my bar of "real" (Belgian!) chocolate.

There are jokes, as Coluche used to say, that are funnier when a Swiss person tells them…

In short, see you from 1:30 pm to 3 pm and from 5 pm to 6:30 pm at stand 334 (Livre Suisse) in the Gare Maritime. It is always a pleasure for me to meet readers, some of whom have been following me for years. It is going to be great!

An engineer and writer, I explore the impact of technology on humans. Subscribe to my French writings by email or RSS. For my English writings, subscribe to the English newsletter or to the full RSS feed. Your address is never shared and is deleted when you unsubscribe.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

March 25, 2023

New booster in Autoptimize Pro 1.3: instant.page, a 3rd party JS component that can significantly improve performance for visitors going from one page to another on your site by preloading a page based on visitor behavior. Do take into account that it could increase the number of page requests as the preloaded page might end up not being requested after all. More info on https://instant.page.

Source

March 24, 2023

The eighty thousand soldiers that Ukraine is currently committing to Bakhmut are a message to Russia ahead of China's attempt at peace negotiations.

Conversely, Avdiivka is Russia's move to make clear to Ukraine that Russia will take it should such negotiations fail.

In other words, the positions have been staked out. Russia will take Avdiivka while Ukraine tries to retake Bakhmut.

March 21, 2023

This article is the second of the new series dedicated to how a DBA can find the information they need with MySQL Database Service in Oracle Cloud Infrastructure.

The first article was dedicated to backups; this one is about disk space utilization.

This time we have two options to retrieve useful information related to disk space:

  1. Metrics
  2. Performance_Schema

Metrics

In the OCI Web Console, there is a dedicated metric for the disk usage:

As for backups, we can create alarms for this metric to be informed when we approach the limit of the DB system's capacity:

We will create two different alarms (see Scott's article about alerts): the first one is a warning when disk space usage reaches 50%, and the second one a critical alert when it reaches 80%:

And if the system reaches 50% of its disk capacity, we receive the email:

Performance_Schema

The MySQL DBA also has access to the disk space usage via the SQL interface, using Performance_Schema. In MySQL Database Service, Performance_Schema provides some extra tables that are part of the Health Monitor:

select * from health_block_device order by timestamp desc limit 10;
+--------+---------------------+--------------+-----------------+-------------+-------------+
| DEVICE | TIMESTAMP           | TOTAL_BYTES  | AVAILABLE_BYTES | USE_PERCENT | MOUNT_POINT |
+--------+---------------------+--------------+-----------------+-------------+-------------+
| xfs    | 2023-03-16 12:27:56 | 107317563392 |     89610473472 |       16.50 | /db         |
| xfs    | 2023-03-16 12:26:56 | 107317563392 |     89610489856 |       16.50 | /db         |
| xfs    | 2023-03-16 12:25:56 | 107317563392 |     89610485760 |       16.50 | /db         |
| xfs    | 2023-03-16 12:24:56 | 107317563392 |     89610485760 |       16.50 | /db         |
| xfs    | 2023-03-16 12:23:56 | 107317563392 |     89610485760 |       16.50 | /db         |
| xfs    | 2023-03-16 12:22:56 | 107317563392 |     89610489856 |       16.50 | /db         |
| xfs    | 2023-03-16 12:21:56 | 107317563392 |     89610489856 |       16.50 | /db         |
| xfs    | 2023-03-16 12:20:57 | 107317563392 |     89610485760 |       16.50 | /db         |
| xfs    | 2023-03-16 12:19:56 | 107317563392 |     89610485760 |       16.50 | /db         |
| xfs    | 2023-03-16 12:18:56 | 107317563392 |     89610485760 |       16.50 | /db         |
+--------+---------------------+--------------+-----------------+-------------+-------------+
10 rows in set (0.0028 sec)

If you take a look at the other disk-related tables from the Health Monitor, you will see that they are intended for the MDS operators.
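
To see which Health Monitor tables are exposed on your own DB system, you can simply list them (the exact set of tables may vary between MDS versions):

show tables in performance_schema like 'health%';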

Using information_schema you can also find the size of your dataset and the space used on disk:

SELECT format_bytes(sum(data_length)) DATA_SIZE,
       format_bytes(sum(index_length)) INDEX_SIZE,
       format_bytes(sum(data_length+index_length)) TOTAL_SIZE,  
       format_bytes(sum(data_free)) DATA_FREE,
       format_bytes(sum(FILE_SIZE)) FILE_SIZE,
       format_bytes((sum(FILE_SIZE)/10 - (sum(data_length)/10 + 
                     sum(index_length)/10))*10) WASTED_SIZE
FROM information_schema.TABLES as t
JOIN information_schema.INNODB_TABLESPACES as it    
  ON it.name = concat(table_schema,"/",table_name)
  ORDER BY (data_length + index_length);
+-----------+------------+------------+-----------+-----------+-------------+
| DATA_SIZE | INDEX_SIZE | TOTAL_SIZE | DATA_FREE | FILE_SIZE | WASTED_SIZE |
+-----------+------------+------------+-----------+-----------+-------------+
| 2.37 GiB  | 4.70 GiB   | 7.07 GiB   | 43.00 MiB | 7.75 GiB  | 694.17 MiB  |
+-----------+------------+------------+-----------+-----------+-------------+

But don’t forget that on disk you also have plenty of other files like redo logs, undo logs, binary logs, …
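
As a quick sketch of how to check some of those other files from SQL (assuming binary logging is enabled, and MySQL 8.0.30 or later for the redo log capacity variable):

-- binary logs and their sizes
show binary logs;

-- configured InnoDB redo log capacity
select format_bytes(@@innodb_redo_log_capacity) as redo_log_capacity;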

Extra

In the result of the previous SQL statement, we can see in the last column (WASTED_SIZE) that there is almost 700 MiB of wasted disk space. This column represents gaps in the tablespaces.

Let's find out which tables are responsible and how to recover that space:

SELECT NAME, TABLE_ROWS, format_bytes(data_length) DATA_SIZE,
       format_bytes(index_length) INDEX_SIZE,     
       format_bytes(data_length+index_length) TOTAL_SIZE,
       format_bytes(data_free) DATA_FREE,
       format_bytes(FILE_SIZE) FILE_SIZE,
       format_bytes((FILE_SIZE/10 - (data_length/10 + 
                     index_length/10))*10) WASTED_SIZE
FROM information_schema.TABLES as t  
JOIN information_schema.INNODB_TABLESPACES as it
  ON it.name = concat(table_schema,"/",table_name) 
  ORDER BY (data_length + index_length) desc LIMIT 10;
+-----------------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME                        | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-----------------------------+------------+------------+------------+------------+------------+------------+-------------+
| airportdb/booking           |   54082619 | 2.11 GiB   | 4.62 GiB   | 6.74 GiB   | 4.00 MiB   | 7.34 GiB   | 615.03 MiB  |
| airportdb/weatherdata       |    4617585 | 215.80 MiB |    0 bytes | 215.80 MiB | 7.00 MiB   | 228.00 MiB | 12.20 MiB   |
| airportdb/flight            |     461286 | 25.55 MiB  | 73.64 MiB  | 99.19 MiB  | 4.00 MiB   | 108.00 MiB | 8.81 MiB    |
| airportdb/seat_sold         |     462241 | 11.52 MiB  |    0 bytes | 11.52 MiB  | 4.00 MiB   | 21.00 MiB  | 9.48 MiB    |
| airportdb/passengerdetails  |      35097 | 4.52 MiB   |    0 bytes | 4.52 MiB   | 4.00 MiB   | 12.00 MiB  | 7.48 MiB    |
| airportdb/passenger         |      36191 | 2.52 MiB   | 1.52 MiB   | 4.03 MiB   | 4.00 MiB   | 12.00 MiB  | 7.97 MiB    |
| airportdb/airplane_type     |        302 | 1.52 MiB   |    0 bytes | 1.52 MiB   | 4.00 MiB   | 9.00 MiB   | 7.48 MiB    |
| airportdb/airport_geo       |       9561 | 1.52 MiB   |    0 bytes | 1.52 MiB   | 4.00 MiB   | 11.00 MiB  | 9.48 MiB    |
| airportdb/flightschedule    |       9633 | 528.00 KiB | 736.00 KiB | 1.23 MiB   | 4.00 MiB   | 9.00 MiB   | 7.77 MiB    |
| airportdb/airport           |       9698 | 448.00 KiB | 656.00 KiB | 1.08 MiB   | 4.00 MiB   | 9.00 MiB   | 7.92 MiB    |
+-----------------------------+------------+------------+------------+------------+------------+------------+-------------+

We can see that it is in the table airportdb.booking that we waste the most disk space. Optimizing that table (this is not an online operation!) will recover some of the wasted space:

optimize table airportdb.booking ;
+-------------------+----------+----------+-------------------------------------------------------------------+
| Table             | Op       | Msg_type | Msg_text                                                          |
+-------------------+----------+----------+-------------------------------------------------------------------+
| airportdb.booking | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| airportdb.booking | optimize | status   | OK                                                                |
+-------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (14 min 45.5530 sec)

set information_schema_stats_expiry=0;

SELECT NAME, TABLE_ROWS, format_bytes(data_length) DATA_SIZE,
       format_bytes(index_length) INDEX_SIZE,     
       format_bytes(data_length+index_length) TOTAL_SIZE,
       format_bytes(data_free) DATA_FREE,
       format_bytes(FILE_SIZE) FILE_SIZE,
       format_bytes((FILE_SIZE/10 - (data_length/10 + 
                     index_length/10))*10) WASTED_SIZE
FROM information_schema.TABLES as t  
JOIN information_schema.INNODB_TABLESPACES as it
  ON it.name = concat(table_schema,"/",table_name) 
  ORDER BY (data_length + index_length) desc LIMIT 10;
+-----------------------------+------------+------------+------------+------------+------------+------------+-------------+
| NAME                        | TABLE_ROWS | DATA_SIZE  | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | WASTED_SIZE |
+-----------------------------+------------+------------+------------+------------+------------+------------+-------------+
| airportdb/booking           |   54163810 | 2.59 GiB   | 2.72 GiB   | 5.31 GiB   | 4.00 MiB   | 5.37 GiB   | 63.06 MiB   |
| airportdb/weatherdata       |    4617585 | 215.80 MiB |    0 bytes | 215.80 MiB | 7.00 MiB   | 228.00 MiB | 12.20 MiB   |
| airportdb/flight            |     461286 | 25.55 MiB  | 73.64 MiB  | 99.19 MiB  | 4.00 MiB   | 108.00 MiB | 8.81 MiB    |
| airportdb/seat_sold         |     462241 | 11.52 MiB  |    0 bytes | 11.52 MiB  | 4.00 MiB   | 21.00 MiB  | 9.48 MiB    |
| airportdb/passengerdetails  |      35097 | 4.52 MiB   |    0 bytes | 4.52 MiB   | 4.00 MiB   | 12.00 MiB  | 7.48 MiB    |
| airportdb/passenger         |      36191 | 2.52 MiB   | 1.52 MiB   | 4.03 MiB   | 4.00 MiB   | 12.00 MiB  | 7.97 MiB    |
| airportdb/airplane_type     |        302 | 1.52 MiB   |    0 bytes | 1.52 MiB   | 4.00 MiB   | 9.00 MiB   | 7.48 MiB    |
| airportdb/airport_geo       |       9561 | 1.52 MiB   |    0 bytes | 1.52 MiB   | 4.00 MiB   | 11.00 MiB  | 9.48 MiB    |
| airportdb/flightschedule    |       9633 | 528.00 KiB | 736.00 KiB | 1.23 MiB   | 4.00 MiB   | 9.00 MiB   | 7.77 MiB    |
| airportdb/airport           |       9698 | 448.00 KiB | 656.00 KiB | 1.08 MiB   | 4.00 MiB   | 9.00 MiB   | 7.92 MiB    |
| airportdb/airplane          |       5583 | 224.00 KiB | 144.00 KiB | 368.00 KiB |    0 bytes | 448.00 KiB | 80.00 KiB   |
| airportdb/employee          |       1000 | 208.00 KiB | 48.00 KiB  | 256.00 KiB |    0 bytes | 336.00 KiB | 80.00 KiB   |
| airportdb/airline           |        113 | 16.00 KiB  | 32.00 KiB  | 48.00 KiB  |    0 bytes | 144.00 KiB | 96.00 KiB   |
| airportdb/flight_log        |          0 | 16.00 KiB  | 16.00 KiB  | 32.00 KiB  |    0 bytes | 128.00 KiB | 96.00 KiB   |
| sys/sys_config              |          6 | 16.00 KiB  |    0 bytes | 16.00 KiB  |    0 bytes | 112.00 KiB | 96.00 KiB   |
| airportdb/airport_reachable |          0 | 16.00 KiB  |    0 bytes | 16.00 KiB  |    0 bytes | 112.00 KiB | 96.00 KiB   |
+-----------------------------+------------+------------+------------+------------+------------+------------+-------------+

We can see that we saved several hundred MB.

Conclusion

Now you know how to find the information to monitor your disk space and be alerted directly via OCI’s alerting system or by using a third party tool.

By controlling the disk space usage, you know exactly when it’s time to expand the disk space of your DB system (or migrate to a bigger Shape).

March 20, 2023

One or two times a month I get the following question: Why don't you just use a Static Site Generator (SSG) for your blog?

Well, I'm not gonna lie, being the founder and project lead of Drupal definitely plays a role in why I use Drupal for my website. Me not using Drupal would be like Coca-Cola's CEO drinking Pepsi, a baker settling for supermarket bread, or a cabinet builder furnishing their home entirely with IKEA. People would be confused.

Of course, if I wanted to use a static site, I could. Drupal is frequently used as the content repository for Gatsby.js, Next.js, and many other frameworks.

The main reason I don't use an SSG is that I don't love their publishing workflow. It's slow. With Drupal, I can make edits, hit save, and immediately see the result. With a static site generator it becomes more complex. I have to commit Markdown to Git, rebuild my site, and push updates to a web server. I simply prefer the user-friendly authoring of Drupal.

A collage of screenshots of static site generators' websites, emphasizing their marketing claims: 'fast page loads', 'peak performance', 'unparalleled speed', 'full speed', and more.

Proponents of static sites will be quick to point out that static sites are "much faster". Personally, I find that misleading. My Drupal-powered site, https://dri.es/, is faster than most static sites, including the official websites of leading static site generators.

Technology    URL tested                 Page load time
Drupal        https://dri.es/            0.3 seconds
Gatsby.js     https://www.gatsbyjs.com/  2.8 seconds
Next.js       https://nextjs.org/        1.8 seconds
Jekyll        https://jekyllrb.com/      0.8 seconds
Eleventy      https://www.11ty.dev/      0.5 seconds
Docusaurus    https://docusaurus.io/     1.8 seconds
SvelteKit     https://kit.svelte.dev/    1.1 seconds

In practice, most sites serve their content from a cache. As a result, we're mainly measuring (1) the caching mechanism, (2) last mile network performance and (3) client-side rendering. Of these three, client-side rendering impacts performance the most.

My site is the fastest because its HTML/CSS/JavaScript is the simplest and fastest to render. I don't use external web fonts, track visitors, or use a lot of JavaScript. Drupal also optimizes performance with lazy loading of images, CSS/JavaScript aggregation, and more.

In other words, the performance of a website depends more on the HTML, CSS, JavaScript code and assets (images, video, fonts) than the underlying technology used.

The way an asset is cached can also affect its performance. Using a reverse proxy cache, such as Varnish, is faster than caching through the filesystem. And using a global CDN yields even faster results. A CMS that uses a CDN for caching can provide better performance than a SSG that only stores assets on a filesystem.

To be clear, I'm not against SSGs. I can understand the use cases for them, and there are plenty of situations where they are a great choice.

In general, I believe that any asset that can be a static asset, should be a static asset. But I also believe that any dynamically generated asset that is cached effectively has become a static asset. A page that is created dynamically by a CMS and is cached efficiently is a static asset. Both a CMS and a SSG can generate static assets.

In short, I simply prefer the authoring experience of a CMS, and I keep my site fast by keeping the generated HTML code lightweight and well-cached.

What really tips the scale for me is that I enjoy having a server-side requests handler. Now, I know that this might sound like the nerdiest closing statement ever, but trust me: server-side request handlers bring the fun. Over the years they have enabled me to do fun and interesting things on my websites. I'm not stopping the fun anytime soon!

March 18, 2023

(This is a working document. It will be updated when new changes are made. Created on 20230228.) This is just a small but expanding list of ROMs I enjoyed. YMMV: Super Mario Bros. (1985, NES) Super Mario Land (1989, GB) Tetris (1989, GB) Dr. Mario (1990, GB) The Legend of Zelda: A Link to the Past (1991, SNES) Super Metroid (1994, SNES) Metroid Fusion (2002, GBA) The Legend of Zelda: The Minish Cap (2004, GBA) Virtua Tennis World Tour (2005, PSP) Super Mario Bros.

The starting point is the following:

  • A spa tub that you want to keep at around 40 °C with an electric pump that heats the water. A spa tub is not very expensive.
  • Solar panels that produce (much) more than enough energy for your household. This is expensive, but you have it for other reasons anyway.
  • A home battery. This is expensive, but you have it for other reasons anyway.
  • A lot of insulation for your tub (fortunately not expensive)
  • Feeding energy back to the grid earns very little and you cannot run an old meter backwards (so you already have a digital meter)
    • So we might as well use the energy ourselves

First of all, insulate your spa tub as much as possible. Also pick a spa tub in dark colours, so that when the sun shines, as much heat as possible is absorbed.

The bottom must be insulated, for example by putting puzzle mats under your tub and possibly other insulation materials as well. The thin layer of insulation that comes with the cheap spa tubs is not enough.

There are, for instance, insulation mats that are used under parquet floors. You cannot insulate too much; more is always better. The mats will also give the tub a softer bottom. Without the mats you will lose about 10% to heating the ground.

You definitely also want an energy-saving cover for your spa tub. Without that cover you will lose about 30% to heating the air. If you put your tub indoors, you instantly have a serious electric heater for that room.

The initial fill of your tub is best done with warm tap water. Unless you heat that water electrically anyway, of course; then it hardly matters whether you let the tub's pump do it or not.

Otherwise, the total energy required is practically never achievable with the entire daily yield of your solar panels. Keep in mind that heating the water draws a constant 2 to 3 kW and that, once the tub is full, this raises the temperature by roughly one degree per hour.
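
As a rough sanity check of that one-degree-per-hour figure, assuming a typical inflatable spa of about 1,000 litres (the volume is my assumption):

$$E = m\,c\,\Delta T \approx 1000\ \mathrm{kg} \times 4.19\ \mathrm{kJ/(kg\,K)} \times 1\ \mathrm{K} \approx 4.2\ \mathrm{MJ} \approx 1.16\ \mathrm{kWh}$$
$$t = E / P \approx 1.16\ \mathrm{kWh} / 2.5\ \mathrm{kW} \approx 28\ \mathrm{minutes\ per\ degree}$$

In a loss-free world a 2.5 kW heater would therefore need roughly half an hour per degree; the observed hour per degree means a large share of the heat leaks away, which is exactly why the insulation matters.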

So about 8 hours of sun on your solar panels warms your tub by roughly 8 °C, maybe 10 °C. Perhaps a bit more when everything is very well insulated or when the tub stands indoors? In other words, you either need several days, or you will have to keep heating through the night and your home battery will not get charged. So you end up buying electricity from the grid. That is what we want to avoid.

The startup cost is therefore a full tub of warm water. That is not a small amount, so you want to avoid having to do it again. That is also why you must keep your filters properly clean (at least every three days). Use chlorine tablets and make sure the pH stays at 7.6. You don't want to sit in dirty water, do you?

The idea is to look at the tub as a battery. Physics tells us that the heated water cools down just as slowly as it warms up. Water holds on to heat very well. That is why we pay so much attention to insulating the tub. That is how it becomes a battery.

You probably want to get into your tub around 9 or 10 pm. By then it has to be 40 °C. It is a spa. It has to be properly hot.

You don't want to let the tub fall all the way back to ambient temperature (unless it is summer and 40 °C outside, but then you probably want colder water anyway). So at night you need your home battery. After your evening soak you keep the tub at about 35 °C. Thanks to the insulation, the tub will drop from about 40 °C to 35 °C by around 6 am. This of course also depends on the ambient temperature at night. Without insulation you reach that point around 2 or 3 am already, and your home battery will be completely drained.

Around 9 am you (sometimes) have sun again. So you can use your solar panels to win back those 5 °C. You also want to recharge part of your home battery, so that it can keep the spa tub at temperature during the following night and evening, when you want to use it.

Without a home battery, I don't think it is possible to keep a spa tub warm without buying electricity from the grid.

In other words: it is best to run your washing machine and dryer when it rains, the home battery was fully charged the day before, and you don't feel like using the tub in the rain anyway.

ps. White clouds mean a little energy yield (barely enough, even; here in March about 1.5 kW). Dark clouds mean nothing. Sunny obviously means a lot of yield (here in March sometimes 4 to 6 kW and more).

ps. Charging an electric car and keeping such a spa tub warm, both on solar panels? I think you can forget that. Unless you have a very large roof plus a football field full of panels and a home battery that costs more than an expensive luxury car.

March 17, 2023

Critical CSS (either through Autoptimize with your own Critical CSS account or through Autoptimize Pro which includes Critical CSS) requires WordPress’ scheduling system to function to be able to communicate with criticalcss.com on a regular basis. In some cases this does not work and you might see this notification in your WordPress dashboard; If this is the case, go through these steps to...

Source

On behalf of Acquia I’m currently working on Drupal’s next big leap: Automatic Updates & Project Browser — both are “strategic initiatives”.

In November, I started helping out the team led by Ted Bowman that’s been working on it non-stop for well over 1.5 years (!): see d.o/project/automatic_updates. It’s an enormous undertaking, with many entirely new challenges — as this post will show.

For a sense of scale: more people of Acquia’s “DAT” Drupal Acceleration Team have been working on this project than the entire original DAT/OCTO team back in 2012!

The foundation for both will be the (API-only, no UI!) package_manager module, which builds on top of the php-tuf/composer-stager library. We’re currently working hard to get that module committed to Drupal core before 10.1.0-alpha1.

Over the last few weeks, we managed to solve almost all of the remaining alpha blockers (which block the core issue that will add package_manager to Drupal core as an alpha-experimental module). One of those was a random test failure on DrupalCI, whose failure frequency was increasing over time!

A rare random failure may be acceptable, but at this point, ~90% of test runs were failing on one or more of the dozens of Kernel tests … but always a different combination. Repeated investigations over the course of a month had not led us to the root cause. But now that the failure rate had reached new heights, we had to solve this. It brought the team’s productivity to a halt — imagine what damage this would have done to Drupal core’s progress!

A combination of prior research combined with the fact that suddenly the failure rate had gone up meant that there really could only be one explanation: this had to be a bug/race condition in Composer itself, because we were now invoking many more composer commands during test execution.

Once we changed focus to Composer itself, the root cause became obvious: Composer tries to ensure the temporary directory is writable and avoids conflicts by using microtime(). That function, confusingly, can return the time at microsecond resolution, but defaults to mere milliseconds (see for yourself).

With sufficiently high concurrency (up to 32 concurrent invocations on DrupalCI!), two composer commands could be executed on the exact same millisecond:

// Check system temp folder for usability as it can cause weird runtime issues otherwise
Silencer::call(static function () use ($io): void {
    $tempfile = sys_get_temp_dir() . '/temp-' . md5(microtime());
    if (!(file_put_contents($tempfile, __FILE__) && (file_get_contents($tempfile) === __FILE__) && unlink($tempfile) && !file_exists($tempfile))) {
        $io->writeError(sprintf('PHP temp directory (%s) does not exist or is not writable to Composer. Set sys_temp_dir in your php.ini', sys_get_temp_dir()));
    }
});

(src/Composer/Console/Application.php in Composer 2.5.4)

We could switch to microtime(TRUE) for microseconds (reduce collision probability 1000-fold) or hrtime() (reduce collision probability by a factor of a million). But more effective would be to avoid collisions altogether. And that’s possible: composer always runs in its own process.

Simply changing sys_get_temp_dir() . '/temp-' . md5(microtime()); to sys_get_temp_dir() . '/temp-' . getmypid() . '-' . md5(microtime()); is sufficient to safeguard against collisions when using Composer in high concurrency contexts.

So that single line change is what I proposed in a Composer PR a few days ago. Earlier today it was merged into the 2.5 branch — meaning it should ship in the next version!

Eventually we’ll be able to remove our work-around. But for now, this was one of the most interesting challenges along the way :)

Update 2023-03-26

Shipped in Composer 2.5.5 on March 21, 2023!

March 16, 2023

Gartner 2023 Magic Quadrant for Digital Experience Platforms, with Acquia placed in the 'Leaders' quadrant.

For the fourth consecutive year, Acquia has been named a Leader in the Gartner Magic Quadrant for Digital Experience Platforms (DXP).

Market recognition from Gartner on our product vision is exciting, because it aligns with what customers and partners are looking for in an open, composable DXP.

Acquia's strengths lie in its ties to Drupal, its open architecture, and its ability to take advantage of APIs to integrate with third-party applications.

Last year, I covered in detail what it means to be a Composable DXP.

Mandatory disclaimer from Gartner

Gartner, Magic Quadrant for Digital Experience Platforms, Irina Guseva, John Field, Jim Murphy, Mike Lowndes, March 13, 2023.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Acquia.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Gartner is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.

March 14, 2023

Autoptimize Pro has “Boosters” to delay JavaScript (esp. interesting for external resources), CSS and HTML (which can be very impactful for long and/ or complex pages). Up until today’s release the delay was either until user interaction OR a set timeout of 5 seconds, but now you can choose the delay time yourself, setting it to e.g. 20s or 3s or -and that’s where things get a teeny bit shady- 0s...

Source

March 12, 2023

The solution for Silicon Valley Bank is for the FED to simply let that bank go bankrupt, but to take over everything of that bank one to one. By whatever means: a decree? Sure. A law? Even better. A political agreement? Also fine. The army taking the place over at gunpoint? If need be, yes.

Afterwards it sells those papers to whoever is interested, or not. Because the FED can also just keep everything as it is, without caring in the slightest about the market. That market too often thinks it really matters. It does not, not that much. Much less than it thinks.

What does have to happen is that Silicon Valley Bank goes bankrupt. That all its shareholders lose everything.

That resets things. That is good.

I don't think it matters much whether I say this or not.

But the strategic prize is Avdiivka, not Bakhmut.

Russia is not taking Bakhmut yet, for three Sun Tzu-style reasons:

  • It keeps Ukraine busy sending reserve troops and other resources that it therefore cannot deploy elsewhere
  • It keeps our Western media staring, like rabbits into headlights, at what matters least
  • It allows the Russian command to bleed Wagner dry. That is necessary because Prigozhin is busy with his reputation.

Sun Tzu explains this quite clearly: What the ancients called a clever fighter is one who not only wins, but excels in winning with ease. Hence his victories bring him neither reputation for wisdom nor credit for courage.

What does matter strategically is Avdiivka:

  • It lies right next to a major capital of the Donbas, Donetsk
  • It is further to the south, where all of Russia's interests and focus in this conflict clearly lie
  • It is the start of the northern flank for that south (such flanks do not have to be built only in the west and the east)

But we Westerners drain ourselves with our own nonsense propaganda and know-it-all attitude. Let's listen to some more of Ursula's rhetoric. She will surely have some military insights. Right?

We would do better to read and understand some Sun Tzu. Russia is looking more and more towards China, isn't it? I think their military command has also read the Chinese literature on war very well.

I think our Western military command has read too little of it. Or that it is mainly busy enriching itself by cashing in orders for the war industry. Which, by the way, is already needed now. Of course. Massively, even.

Cancelled

Right, so now I can get cancelled because I wrote something that questions our own strategy and does not try to tear down absolutely everything Russia does. Because that is mandatory these days. Introspection is completely out. We are holy and good, in everything we do. Even when those are total blunders. Because the enemy is evil. And so on. Because. Yeah, yeah.

That whole cancel culture around this is, by the way, also a gigantic strategic blunder of our own making.

More teeth for the EU

At this point I argue that the EU member states should militarize their mindset: that we no longer strive for European peace but for a position in which we are actively prepared to physically fight out a possible conflict, with the corresponding military spending and developments.

In Kosovo, for example, we must make clear to Serbia that we are prepared to intervene seriously, across the border into their country, and if need be to take Belgrade as well.

It is likely that Russia's strategy is to thin out the US's military capacity by pushing that conflict to a head. That is why we, as EU member states, must make clear to Serbia that we will be the ones to do it. Not the US. That is why we must station our EU soldiers there, so that Serbia knows loud and clear that it would be fighting the rest of the entire European Union, and that this would go as far as the capture of their capital.

Of course we must also, once again, convict in our courts all of their leaders who commit any war crime whatsoever. Without exception, but above all their prime minister and military commanders.

I would rather see it differently, but the conflict in Ukraine forces the EU member states to grow more teeth and to actually use them now.

March 06, 2023

playbook

The Ansible role stafwag.users is available at: https://github.com/stafwag/ansible-role-users

This release implements a shell parameter to define the shell for a user. See the GitHub issue for more details.
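
As a quick illustration, a minimal playbook using the new parameter might look like the sketch below. It follows the style of the examples further down; the host group, user name and shell path are placeholders, not part of the release:

- name: set the shell for a user
  hosts: testhosts
  become: true
  vars:
    users:
      - name: test0001
        shell: /bin/bash
        state: present
  roles:
    - stafwag.users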

ChangeLog

shell parameter

  • shell parameter added

Have fun!

Ansible Role: users

An Ansible role to manage users and user files (files in the home directory).

Requirements

None

Role Variables

The following variables are set by the role.

  • root_group: The operating system root group. root by default. wheel on BSD systems.
  • sudo_group: The operating system sudo group. wheel by default. sudo on Debian systems.
  • users: Array of users to manage
    • name: name of the user.
    • group: primary group. If state is set to present, the user's primary group will be created. If state is set to absent, the primary group will be removed.
    • uid: uid.
    • gid: gid.
    • groups: additional groups.
    • append: no (default) | yes. If yes, add the user to the groups specified in groups. If no, the user will only be added to the groups specified in groups, removing them from all other groups.
    • state: absent | present (default)
    • comment: user comment (GECOS)
    • home: Optionally set the user’s home directory.
    • password: Optionally set the user’s password to this crypted value.
    • password_lock: no | yes. Lock the password (Ansible 2.6+).
    • shell: Optionally set the user’s shell.
    • ssh_authorized_keys: Array of the user’s ssh authorized keys.
      • key: The ssh public key.
      • state: absent | present (default). Whether the given key (with the given key_options) should or should not be in the file.
      • exclusive: no (default) | yes. Whether to remove all other non-specified keys from the authorized_keys file.
      • key_options: A string of ssh key options to be prepended to the key in the authorized_keys file.
    • user_files: Array of the user files to manage.
      • path: path in the user home directory. The home directory will be detected with getent_passwd.
      • content: file content
      • state: absent | present (default)
      • backup: no (default) | yes. Create a backup file.
      • dir_create: false (default) | true. Create the directory.
      • dir_recurse: no (default) | yes. Create the directory recursively.
      • mode: Default: ‘0600’. The permissions of the resulting file.
      • dir_mode: Default: ‘0700’. The permissions of the resulting directory.
      • owner: Name of the owner that should own the file/directory, as would be fed to chown.
      • group: Name of the group that should own the file/directory, as would be fed to chown.
    • user_lineinfiles: Array of user lineinfile entries.
      • path: path in the user home directory. The home directory will be detected with getent_passwd.
      • regexp: The regular expression to look for in every line of the file.
      • line: The line to insert/replace into the file.
      • state: absent | present (default)
      • backup: no (default) | yes. Create a backup file.
      • mode: Default: 600. The permissions of the resulting file.
      • dir_mode: Default: 700. The permissions of the resulting directory.
      • owner: Name of the owner that should own the file/directory, as would be fed to chown.
      • group: Name of the group that should own the file/directory, as would be fed to chown.
      • create: Default: no. Create the file if it does not exist.

Dependencies

None

Example Playbooks

Create user with authorized key

- name: add user & ssh_authorized_key
  hosts: testhosts
  become: true
  vars:
    users:
      - name: test0001
        group: test0001
        password: ""
        state: "present"
        ssh_authorized_keys:
          - key: ""
            key_options: "no-agent-forwarding"
  roles:
    - stafwag.users

Add user to the sudo group

- name: add user to the sudo group
  hosts: testhosts
  become: true
  vars:
    users:
      - name: test0001
        groups: ""
        append: true
  roles:
    - stafwag.users

Create .ssh/config.d/intern_config and include it in .ssh/config

- name: setup tyr ssh_config
  become: true
  hosts: tyr
  vars:
    users:
      - name: staf
        user_files:
          - name: ssh config
            path: .ssh/config
            dir_create: true
            state: present
          - name: ssh config.d/intern_config
            path: .ssh/config.d/intern_config
            content: ""
            dir_create: true
        user_lineinfiles:
          - name: include intern_config
            path: .ssh/config
            state: present
            regexp: "^include config.d/intern_config"
            line: "include config.d/intern_config"
  roles:
    - stafwag.users

License

MIT/BSD

Author Information

Created by Staf Wagemakers, email: staf@wagemakers.be, website: http://www.wagemakers.be.

February 27, 2023

An artistic rendering of an endless number of servers stretching into the horizon. A Generative AI self-portrait by DALL·E, via Wikimedia Commons.

I recently bought a Peloton bike as a Christmas gift for my wife. The Peloton was for our house in Belgium. Because Peloton does not deliver to Belgium yet, I had to find a way to transport one from Germany to Belgium. It was a bit of a challenge as the bike is quite large, and I wasn't sure if it would fit in the back of our car.

I tried measuring the trunk of my car, along with another Peloton. I wasn't positive if it would fit in the car. I tried Googling the answer but search engines aren't great at answering these types of questions today. Being both uncertain of the answer and too busy (okay, let's be real – lazy) to figure it out myself, I decided to ship the bike with a courier. When in doubt, outsource the problem.

To my surprise, when Microsoft launched their Bing and ChatGPT integration not long after my bike-delivery conundrum, one of their demos showed how ChatGPT can answer the question whether a package fits in the back of a car. I'll be damned! I could have saved money on a courier after all.

After watching the event, I asked ChatGPT, and it turns out the Peloton would have fit. That is, assuming we can trust the correctness of ChatGPT's answer.

A screenshot of ChatGPT answering the question: "Does a Peloton bike fit in the back of a Volkswagen California T6.1?".

What is interesting about the Peloton example is that it combines data from multiple websites. Combining data from multiple sources is often more helpful than the traditional search method, where the user has to do the aggregating and combining of information on their own.

Examples like this affirm my belief that AI tools are one of the next big leaps in the internet's progress.

AI disintermediates traditional search engines

Since its commercial debut in the early 90s, the internet has repeatedly upset the established order by slowly, but certainly, eliminating middlemen. Book stores, photo shops, travel agents, stock brokers, bank tellers and music stores are just a few examples of the kinds of intermediaries who have already been disrupted by their online counterparts.

A search engine acts as a middleman between you and the information you're seeking. It, too, will be disintermediated, and AI seems to be the best way of disintermediating it.

Many people have talked about how AI could even destroy Google. Personally, I think that is overly dramatic. Google will have to change and transform itself, and it's been doing that for years now. In the end, I believe Google will be just fine. AI disintermediates traditional search engines, but search engines obviously won't go away.

The Big Reverse of the Web marches on

The automatic combining of data from multiple websites is consistent with what I've called the Big Reverse of the Web, a slow but steady evolution towards a push-based web; a web where information comes to us versus the current search-dominant web. As I wrote in 2015:

I believe that for the web to reach its full potential, it will go through a massive re-architecture and re-platforming in the next decade. The current web is "pull-based", meaning we visit websites. The future of the web is "push-based", meaning the web will be coming to us. In the next 10 years, we will witness a transformation from a pull-based web to a push-based web. When this "Big Reverse" is complete, the web will disappear into the background much like our electricity or water supply.

Facebook was an early example of what a push-based experience looks like. Facebook "pushes" a stream of aggregated information designed to tell you what is happening with your friends and family; you no longer have to "pull" them or ask them individually how they are doing.

A similar dynamic happens when AI search engines give us the answers to our questions rather than redirecting us to a variety of different websites. I no longer have to "pull" the answer from these websites; it is "pushed" to me instead. Trying to figure out if a package fits in the back of my car is the perfect example of this.

Unlocking the short term potential of Generative AI for CMS

While it might take a while for AI search to work out some early kinks, in the near term, Generative AI will lead to an increasing amount of content being produced. It's bad news for the web as a lot of that content will likely end up being spam. But it also is good news for CMSs, as there will be a lot more legitimate content to manage as well.

I was excited to see that Kevin Quillen from Velir created a number of Drupal integrations for ChatGPT. It allows us to experiment with how ChatGPT will influence CMSs like Drupal.

For example, the video below shows how the power of Generative AI can be used from within Drupal to help content creators generate fresh ideas and produce content that resonates with their audience.

Similarly, AI integrations can be used to translate content into different languages, suggest tags or taxonomy terms, help optimize content for search engines, summarize content, match your content's tone to an organizational standard, and much more.

The screenshot below shows how some of these use cases have been implemented in Drupal:

A screenshot of Drupal's editorial UI showing a few ChatGPT integrations in the sidebar: the ability to suggest similar titles, summarize content, and recommend taxonomy terms.

The Drupal modules behind the video and screenshot are Open Source: see the OpenAI project on Drupal.org. Anyone can experiment with these modules and use them as a foundation for their own exploration. Sidenote: another example of how Open Source innovation wins every single time.

If you look at the source code of these modules, you can see that it is relatively easy to add AI capabilities to Drupal. ChatGPT's APIs make the integration process straightforward. Extrapolating from Drupal, I believe it is very likely that in the next year, every CMS will offer AI capabilities for creating and managing content.
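
To give a sense of how small that integration surface is, here is a sketch of the kind of request body OpenAI's chat completions endpoint (https://api.openai.com/v1/chat/completions) accepts, sent as an HTTPS POST with an "Authorization: Bearer <API key>" header. The taxonomy-suggestion prompt is a made-up illustration, not code from the Drupal modules, which are written in PHP and may use a different endpoint or model:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You suggest taxonomy terms for CMS content."},
    {"role": "user", "content": "Suggest five taxonomy terms for the following article: <article body>"}
  ]
}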

In short, you can expect many text fields to become "AI-enhanced" in the next 18 months.

Boost your website's visibility by optimizing for AI crawlers

Another short-term change is that marketers will seek to better promote their content to AI bots, just like they currently do with search engines.

I don't believe AI optimization to be very different from Search Engine Optimization (SEO). Like search engines, AI bots will have to put a lot of emphasis on trust, authority, relevance, and the understandability of content. It will remain essential to have high-quality content.

Right now, in AI search engines, attribution is a problem. It's often impossible to know where content is sourced, and as a result, to trust AI bots. I hope that more AI bots will provide attribution in the future.

I also expect that more websites will explicitly license their content, and specify the ways that search engines, crawlers, and chatbots can use, remix, and adapt their content.

The HTML code for an image on my blog. Schema.org metadata is used to programmatically specify that my photo is licensed under Creative Commons BY-NC 4.0. This license encourages others to copy, remix, and redistribute my photos, as long as it is for noncommercial purposes and appropriate credit is given.

As can be seen from the screenshot above, I specify a license for all 10,000+ photos on my site. I make them available under Creative Commons. The license is specified in the HTML code, and can be programmatically extracted by a crawler. I do something very similar for my blog posts.
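
For readers who want to do something similar, here is a minimal sketch of that idea expressed as Schema.org JSON-LD, which can be embedded in a page inside a <script type="application/ld+json"> tag. The URLs are placeholders, and the actual markup on dri.es may use a different Schema.org syntax (such as microdata attributes in the HTML itself):

{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/photos/sunset.jpg",
  "license": "https://creativecommons.org/licenses/by-nc/4.0/"
}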

By licensing my content under Creative Commons, I'm giving tools like ChatGPT permission to use my content, as long as they follow the license conditions. I don't believe ChatGPT uses that information today, but they could, and probably should, in the future.

If a website has high-quality content, and AI tools give credit to their sources, this can result in organic traffic back to the website.

All things considered, my base case is that AI bots will become an increasingly important channel for digital experience delivery, and that websites will be the main source of input for chatbots. I suspect that websites will only need to make small, incremental changes to optimize their content for AI tools.

Predicting the longer term impact of AI tools on websites

Longer term, AI tools will likely bring significant changes to digital marketing and content management.

I predict that over time, AI bots will not only provide factual information, but also communicate with emotions and personality, providing more human-like interactions than websites.

Compared to traditional websites, AI bots will be better at marketing, sales and customer success.

Unlike humans, AI bots will possess perfect product knowledge, speak many languages, and – this is the kicker – have a keen ability to identify what emotional levers to pull. They will be able to appeal to customers' motivations, whether it's greed, pride, frustration, fear, altruism, or envy.

The downside is that AI bots will also become more "skilled" at spreading misinformation, or might be able to cause emotional distress in a way that traditional websites don't. There is undeniably a dark side to AI bots.

My more speculative and long-term case is that AI chatbots will become the most effective channel for lead generation and conversion, surpassing websites in importance when it comes to digital marketing.

Without proper regulations and policies, that evolution will be tumultuous at best, and dangerous at worst. As I've been shouting from the rooftops since 2015 now: "When algorithms rule our lives, who should rule them?". I continue to believe that algorithms with significant effects on society require regulation and policies, just like the Food and Drug Administration (FDA) in the U.S. or the European Medicines Agency (EMA) in Europe oversee the food and drug industry.

The impact of AI on website development

Of course, the advantages of Generative AI extend beyond content creation and content delivery. The advantages also include software development, such as writing code (46% of all new code on GitHub is generated by GitHub's Copilot), identifying security vulnerabilities (ChatGPT finds two times as many security vulnerabilities as a professional software security scanner), and more. The impact of AI on software development is a complex topic that warrants a separate blog post. In the meantime, here is a video demonstrating how to use ChatGPT to build a Drupal module.

The risks and challenges of Generative AI

Even though I'm optimistic about the potential of AI, I would be neglectful if I failed to discuss some of the potential challenges associated with it.

Although Generative AI is really good at some tasks, like writing a sincere letter to my wife asking her to bake my favorite cookies, it still has serious issues. Some of these issues include, but are not limited to:

  • Legal concerns – Copyrighted works have been involuntarily included in training datasets. As a result, many consider Generative AI a high-tech form of plagiarism. Microsoft, GitHub, and OpenAI are already facing a class action lawsuit for allegedly violating copyright law. The ownership and protection of content generated by AI is unclear, including whether AI tools can be considered "creators" of original content for copyright law purposes. Technologists, lawyers, and policymakers will need to work together to develop appropriate legal frameworks for the use of AI.
  • Misinformation concerns – AI systems often "hallucinate", or make up facts, which could exacerbate the web's misinformation problem. One of the most interesting analogies I've seen comes from The New Yorker, which describes ChatGPT as a blurry JPEG of all of the text on the web. Just as a JPEG file loses some of the quality and integrity of the original, ChatGPT summarizes and approximates text on the web.
  • Bias concerns – AI systems can have gender and racial biases. It is widely acknowledged that a significant proportion of the content available on the web is generated by white males residing in western countries. Consequently, ChatGPT's training data and outputs are prone to reflecting this demographic bias. Biases are troubling and can even be dangerous, especially considering the potential societal impact of these technologies.

The above issues related to legal authorship, misinformation, and bias have also given rise to a host of ethical concerns.

My personal strategy

Disruptive changes can be polarizing: they come with some real downsides, while bringing new opportunities.

I believe there is no stopping AI. In my opinion, it's better to embrace change and focus on moving forward productively, rather than resisting it. Iterative improvements to both these algorithms and to our legal frameworks will hopefully address concerns over time.

In the past, the internet was fraught with risk, and to a large extent, it still is. However, productivity and efficiency improvements almost always outweigh risk.

While some individuals and organizations advocate against the use of AI altogether, my personal strategy is to proceed with caution. My strategy is two-fold: (1) focus on experimenting with AI rather than day-to-day usage, and (2) highlight the challenges with AI so that people can make their own choices. The previous section of this blog post tried to do that.

I also expect that organizations will use their own data to train their custom AI bots. This would eliminate many concerns, and let organizations take advantage of AI for applications like marketing and customer success. Simon Willison shows that in a couple of hours of work, he was able to train his own model based on his website content. Time permitting, I'd like to experiment with that myself.

Conclusion

I'm intrigued, wary, and inspired by where AI will take the web in the days, months, and years to come.

In the near term, Generative AI will alter how we create content. I expect integrations into CMSs will be simple and numerous, and that websites will only have to make small changes to optimize their content for AI tools.

Longer term, AI will change the way in which we interact with the web and how the web interacts with us. AI tools will steadily alter the relative importance of websites, and potentially even surpass websites in importance when it comes to digital marketing.

Exciting times, but let's move forward with caution!

"Welcome to pre-9/11 New York City, when the world was unaware of the profound political and cultural shifts about to occur, and an entire generation was thirsty for more than the post–alternative pop rock plaguing MTV. In the cafés, clubs, and bars of the Lower East Side there convened a group of outsiders and misfits full of ambition and rock star dreams."

Music was the main reason I wanted to move to New York - I wanted to walk the same streets that the Yeah Yeah Yeahs, the National, Interpol, the Walkmen, the Antlers and Sonic Youth were walking. In my mind they'd meet up and have drinks with each other at the same bars, live close to each other, and I'd just run into them all the time myself. I'm not sure that romantic version of New York ever existed. Paul Banks used to live on a corner in between where I live and where my kids go to school now, but that was two decades ago (though for a while, we shared a hairdresser). On one of my first visits to New York before moving here, I had a great chat with Thurston Moore at a café right before taking the taxi back to the airport. And that's as close as I got to living my dream.

But now the documentary "Meet me in the Bathroom" (based on the book of the same name) shows that version of New York that only existed for a brief moment in time.

"Meet Me In The Bathroom — ??inspired by Lizzy Goodman’s book of the same name — chronicles the last great romantic age of rock ’n’ roll through the lens of era-defining bands."

Read the book, watch the documentary (available on Google Play among other platforms), or listen to the Spotify playlist Meet Me in the Bathroom: Every Song From The Book In Chronological Order. For bonus points, listen to Losing My Edge (every band mentioned in the LCD Soundsystem song in the order they're mentioned)

Taken from The Playlist - a curated perspective on the intersection of form and content (subscribe, discuss)


February 24, 2023

Someone uploaded an amateur recording of an entire (?) Jeff Buckley solo concert at the Sin-é from July 1993 (one year before Grace was released). A gem!

Source

February 20, 2023

"I have hands but I am doing all I can to have daily independence so I can’t be ‘all hands’ at work. I can share ideas, show up, and ask for help when I physically can’t use my hands. Belonging means folks with one hand, no hand and limited hands are valued in the workplace." - Dr. Akilah Cadet

If you've been wondering why over the past few months you're seeing a lot more "All Teams" meetings on your calendar, it's because language is ever evolving with the times, and people are becoming more aware of ableist language and replacing it.

Read more:

If your team still has "all hands" meetings, propose a more creative name, or default to "all teams" instead. Take some time to familiarize yourself with other ableist language and alternatives.

Taken from The Playlist - a curated perspective on the intersection of form and content (subscribe, discuss)


February 17, 2023

In Western Europe (and beyond), for centuries, at least from the Middle Ages until about 1940, there was one authority for people to believe: the Catholic Church.

When a peasant was in doubt, they would ask a priest and have a definitive answer. Doubt was gone. The end.

 

Somewhere in the beginning of the 20th century the power of the church started to diminish and many new 'authorities' like science or labour unions or money started taking over.

Today there is no single authority; there is no single institution anywhere that people believe. There is mainly distrust in anything that claims to be an authority. In other words, most questions remain unanswered. Today, the peasant has no priest to take away his doubt.


Enter AI. Millions of people are talking to ChatGPT and are using it to answer questions. And I wonder: What if people start believing the AI? What if this becomes the new authority?

If you think this is far-fetched, then you have not played enough with ChatGPT. Then you have not tweaked your questions. It knows a whole lot of things, it's a far better writer than me, it's a better programmer, it's a better problem solver and it can learn a hundred million times faster than me.

The motto in the next couple of years will be "When in doubt, ask the AI!".

(This post is written by me by the way, not by ChatGPT.)

February 15, 2023

An old film roll featuring a purple ostrich running in every frame; the purple ostrich is the mascot of Nostr.

I recently discovered Nostr, a decentralized social network that I find exciting and promising.

Technically, Nostr is a protocol, not a social network. However, developers can use the Nostr protocol to create a variety of applications, including social networks.

Nostr has been around for a few years, but in December 2022, Jack Dorsey, the co-founder and former CEO of Twitter, announced that he had donated 14 bitcoins, valued at approximately $250,000, to @fiatjaf, the anonymous founder of Nostr.

Nostr stands for Notes and Other Stuff Transmitted by Relays. At its core, it is a system to exchange signed messages. The basic architecture can be explained in three bullets:

  • Every Nostr user is identified by a public key.
  • Users send and retrieve messages to servers. These servers are called relays.
  • Messages are called events. Users sign events with a private key. Events can be social media posts, private messages, chess moves, etc. (see the sketch of an event just after this list).
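
To make the "signed events" idea concrete, here is a sketch of what a single event looks like per NIP-01 (kind 1 is a plain text note; the values are placeholders, not a real signed event):

{
  "id": "<sha256 hash of the serialized event, hex>",
  "pubkey": "<author public key, hex>",
  "created_at": 1676800000,
  "kind": 1,
  "tags": [],
  "content": "Hello Nostr!",
  "sig": "<Schnorr signature over the id, hex>"
}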

I reviewed the Nostr protocol and found it to be straightforward to understand. The basic Nostr protocol seems simple enough to implement in a day. This is a quality I appreciate in protocols. It is why I love RSS, for example.

While the core Nostr protocol is simple, it is very extensible. It is extended using NIPs, which stands for Nostr Implementation Possibilities. NIPs can add new fields and features to Nostr messages or events. For example, NIP-2 adds usernames and contact lists (followers), NIP-8 adds mentions, NIP-36 adds support for content warnings, etc.

Joining the Nostr social network

Despite Nostr being just a few years old, there are a number of clients. I decided on Damus, a Twitter-like Nostr client for iOS. (Nostr's Damus is a clever pun on Nostradamus, the French astrologer.)

You don't need to create a traditional account to sign up. You just use a public and private key. You can use these keys to use the platform anonymously. Unlike with proprietary social networks, you don't need an email address or phone number to register.

If you want, you can choose to verify your identity. Verifying your identity links your public key to a public profile. I verified my identity using NIP-05, though different options exist. The NIP-05 verification process involved creating a static file on my website, available at https://dri.es/.well-known/nostr.json. It verifies that I'm the owner of the name @Dries, the public key npub176xpl3dl0agjt7vjeccw6v5grlx8f9mhc75aazwvvqfjvq5al8uszj5asu and https://dri.es.
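
For reference, the file behind that URL is a small JSON document that maps names to hex-encoded public keys, roughly of the following shape. The hex key is left as a placeholder here rather than copied, and the exact name casing on dri.es may differ:

{
  "names": {
    "dries": "<hex-encoded public key corresponding to the npub above>"
  }
}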

Nostr versus ActivityPub

Recently, Elon Musk became the world's richest troll and many people have left Twitter for Mastodon. Mastodon is a decentralized social media platform built on the ActivityPub protocol. I wanted to compare ActivityPub with Nostr, as Nostr offers many of the same promises.

Before I do, I want to stress that I am not an expert in either ActivityPub or Nostr. I have read both specifications, but I have not implemented a client myself. However, I do have a basic understanding of the differences between the two.

I also want to emphasize that both Nostr and ActivityPub are commendable for their efforts in addressing the problems encountered by traditional centralized social media platforms. I'm grateful for both.

ActivityPub has been around for longer, and is more mature, but by comparison, there is a lot more to like about Nostr:

  • Nostr is more decentralized — Nostr uses a public key to identify users, while ActivityPub utilizes a more conventional user account system. ActivityPub user accounts are based on domain names, which can be controlled by third-party entities. Nostr's identification system is more decentralized, as it does not rely on domain names controlled by outside parties.
  • Nostr is easier to use — Decentralized networks are notoriously tough to use. To gain mass adoption, the user experience of decentralized social networks needs to match and exceed that of proprietary social networks. Both Nostr and Mastodon have user experience problems that stem from being decentralized applications. That said, I found Nostr easier to use, and I believe it is because the Nostr architecture is simpler.
    • Migrating to a different Mastodon server can be challenging, as your username is tied to the domain name of the current Mastodon server. However, this is not a problem in Nostr, as users are identified using a unique public key rather than a domain name.
    • Nostr doesn't currently offer the ability to edit or delete messages easily. While there is an API available to delete a message from a relay, it requires contacting each relay that holds a copy of your message to request its deletion, which can be challenging in practice.
  • Nostr makes it easier to select your preferred content policies — Each Mastodon server or Nostr relay can have its own content policy. For example, you could have a Nostr relay that only lets verified users publish, does not allow content that has anything to do with violence, and conforms to the local laws of Belgium. Being able to seamlessly switch servers or relays is very valuable because it allows users to choose a Mastodon server or Nostr relay that they align with. Unfortunately, migrating to a different Mastodon server, to opt into a different content policy, can be a challenging task.
  • Nostr is easier to develop for — The Nostr protocol is easier to implement than the ActivityPub protocol, and appears more extensible.
  • Nostr has Zaps, which is potentially game-changing — ActivityPub lacks an equivalent of Zaps, which could make it harder to address funding issues and combat spam. More on that in the next section.

Lastly, both protocols likely suffer from problems unique to decentralized architectures. For example, when you post a link to your site, most clients will try to render a preview card of that link. That preview card can contain an image, the title of the page, and a description. To create preview cards, the page is fetched and its HTML is parsed, looking for Open Graph tags. Because of the distributed nature of both Nostr and Mastodon this can cause a site to get hammered with requests.

Zaps

Social networks are overrun with spam and bots. Ads are everywhere. Platform owners profit from content creators, and content creators themselves don't make money. The world needs some breakthrough in this regard, and Nostr's Zap-support might offer solutions.

A Zap is essentially a micropayment made using Bitcoin's Lightning network. Although Nostr itself does not use blockchain technology, it enables each message or event to contain a "Zap request" or "Zap invoice" (receipt). In other words, Nostr has optional blockchain integration for micropayment support.

The implementation of this protocol extension can be found in NIP-57, which was finalized last week. As a brand new development, the potential of Zap-support has yet to be fully explored. But it is not hard to see how micropayments could be used to reward content creators, fund relay upkeep, or decrease spam on social media platforms. With micropayments supported at the protocol level, trying out and implementing new solutions has become simpler than ever before.

One potential solution is for receivers to require 200 satoshi (approximately $0.05) to receive a message from someone outside of their network. This would make spamming less economically attractive to marketers. Another option is for relays to charge their users a monthly fee, which could be used to maintain a block-list or content policy.

Personally, I am a big fan of rewarding content creators, financing contributions, and implementing anti-spam techniques. It aligns with my interest in public good governance and sustainability.

For the record, I have mixed feelings about blockchains. I've HODL'd Bitcoin since 2013 and Ethereum since 2017. On one hand, I appreciate the opportunities and innovation they offer, but on the other hand, I am deeply concerned about their energy consumption and contribution to climate change.

It's worth noting that the Lightning network is much more energy efficient than Bitcoin. Lightning operates on top of the Bitcoin network. The main Bitcoin blockchain, known as a layer 1 blockchain, is very energy inefficient and can only handle fewer than 10 transactions per second. In contrast, the Lightning Network, known as a layer 2 network, uses a lot less energy and has the potential to handle millions of transactions per second on top of the Bitcoin network.

So, yes, Zap support is an important development to pay attention to. Even though it's brand new, I believe that in five years, we'll look back and agree that Zap support was a game-changer.

Conclusions

"Notes and Other Stuff, Transmitted by Relays" seems like a promising idea, even at this early stage. It is definitely something to keep an eye on. While for me it was love at first sight, I'm not sure how it will evolve. I am interested in exploring it further, and if time permits, I plan to create some basic integration with my own Drupal site.

Also posted on IndieNews.

February 14, 2023

I was watching the video for You Can Call Me Al, the well-known Paul Simon song. In that video I recognized a performer who plays a kind of lackey for lead singer Chevy Chase.

That lackey reminded me of Jonathan Holslag. No, really. Seriously.

I just wanted to mention that. You can fill in for yourselves who Al (Chevy) is, and who Betty (Jonathan) is. But suppose they were the US and the EU?! Even if they weren't cast in that order!?

Nan na na nah. Nan na na nah.

I can call you Betty, Jonathan. Betty when you call me you can call me Al.

I consider myself a Paul Simon generalist, trying to see the big picture without losing sight of the details.

February 13, 2023

My friend and #WordPress performance guru Bariş is from Turkey and is asking for help from the WordPress ecosystem. Share as widely as possible!

Source