Planet Grep

Planet'ing Belgian FLOSS people

Planet Grep is maintained by Wouter Verhelst. All times are in UTC.

August 12, 2022

I run Debian on my laptop (obviously); but occasionally, for $DAYJOB, I have some work to do on Windows. In order to do so, I have had a Windows 10 VM in my libvirt configuration that I can use.

A while ago, Microsoft issued Windows 11. I recently found out that all the components for running Windows 11 inside a libvirt VM are available, and so I set out to upgrade my VM from Windows 10 to Windows 11. This wasn't as easy as I thought, so here's a bit of a writeup of all the things I ran against, and how I fixed them.

Windows 11 has a number of hardware requirements that aren't necessary for Windows 10. The three most important ones are:

  • Secure Boot is required (Windows 10 would still boot on a machine without Secure Boot, although buying hardware without at least support for that hasn't been possible for several years now)
  • A v2.0 TPM module (Windows 10 didn't need any TPM)
  • A modern enough processor.

So let's see about all three.

A modern enough processor

If your processor isn't modern enough to run Windows 11, then you can probably forget about it (unless you want to use qemu JIT compilation -- I dunno, probably not going to work, and also not worth it if it were). If it is, all you need is the "host-passthrough" setting in libvirt, which I've been using for a long time now. Since my laptop is less than two months old, that's not a problem for me.

A TPM 2.0 module

My Windows 10 VM did not have a TPM configured, because it wasn't needed. Luckily, a quick web search told me that enabling that is not hard. All you need to do is:

  • Install the swtpm and swtpm-tools packages
  • Add the TPM module, by adding the following XML snippet to your VM configuration:

    <devices>
      <tpm model='tpm-tis'>
        <backend type='emulator' version='2.0'/>
      </tpm>
    </devices>
    

    Alternatively, if you prefer the graphical interface, click on the "Add hardware" button in the VM properties, choose the TPM, set it to Emulated, model TIS, and set its version to 2.0.

You're done!

Well, with this part, anyway. Read on.

Secure boot

Here is where it gets interesting.

My Windows 10 VM was old enough that it was configured for the older i440fx chipset. This chipset is limited to PCI and IDE, unlike the more modern q35 chipset, which supports PCIe and SATA (but not IDE, nor SATA in IDE mode).

There is a UEFI/Secure Boot-capable BIOS for qemu, but it apparently requires the q35 chipset.

Fun fact (which I found out the hard way): Windows stores, somewhere, where its boot partition lives. If you change the hard drive controller from an IDE one to a SATA one, you will get a BSOD at startup. In order to fix that, you need a recovery drive. To create the virtual USB disk, go to the VM properties, click "Add hardware", choose "Storage", choose the USB bus, and then under "Advanced options", select the "Removable" option, so it shows up as a USB stick in the VM. Note: creating the recovery drive takes a while (about an hour on my system), and your virtual USB drive needs to be 16G or larger (I used the libvirt default of 20G).
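
If you prefer to do that part in XML as well, a disk stanza along these lines should give the same result (the image path and target name are assumptions; adjust them to your setup):

    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/recovery.qcow2"/>
      <target dev="sdb" bus="usb" removable="on"/>
    </disk>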

There is no way, using the buttons in the virt-manager GUI, to convert the machine from i440fx to q35. That doesn't mean it's not possible, though. I found that the easiest way is to use the direct XML editing capabilities in the virt-manager interface: if you edit the XML in an external editor, libvirt will only produce error messages when something doesn't look right and tell you to go and fix it yourself, whereas the virt-manager GUI will actually fix some things itself (and will produce helpful error messages if not).

What I did was:

  • Take backups of everything. No, really. If you fuck up, you'll have to start from scratch. I'm not responsible if you do.
  • Go to the Edit->Preferences option in the VM manager, then on the "General" tab, choose "Enable XML editing"
  • Open the Windows VM properties, and in the "Overview" section, go to the "XML" tab.
  • Change the value of the machine attribute of the domain.os.type element, so that it says pc-q35-7.0.
  • Search for the domain.devices.controller element that has pci in its type attribute and pci-root in its model one, and set the model attribute to pcie-root instead.
  • Find all domain.devices.disk.target elements, setting their dev=hdX to dev=sdX, and bus="ide" to bus="sata"
  • Find the USB controller (domain.devices.controller with type="usb") and set its model to qemu-xhci. You may also want to add ports="15" if you don't have that yet. (A consolidated sketch of the result follows this list.)
  • Perhaps also add a few PCIe root ports:

    <controller type="pci" index="1" model="pcie-root-port"/>
    <controller type="pci" index="2" model="pcie-root-port"/>
    <controller type="pci" index="3" model="pcie-root-port"/>
    

I figured out most of this by starting the process for creating a new VM, selecting the "Modify configuration before installation" option on the last page of the wizard that pops up, going to the "XML" tab of the "Overview" section in the window that then shows up, and comparing that against what my current VM had.

Also, it took me a while to get this right, so I might have forgotten something. If virt-manager gives you an error when you hit the Apply button, compare notes against the VM that you're in the process of creating, and copy/paste things from there to the old VM to make the errors go away. As long as you don't remove configuration that is critical for things to start, this shouldn't break things permanently (but hey, use your backups if you do break something -- you have backups, right?)

OK, cool, so now we have a Windows VM that is... unable to boot. Remember what I said about Windows storing where the controller is? Yeah, there you go. Boot from the virtual USB disk that you created above, and select the "Fix the boot" option in the menu. That will fix it.

Ha ha, only kidding. Of course it doesn't.

I honestly can't tell you everything that I fiddled with, but I think the bit that eventually fixed it was where I chose "safe mode", which caused the system to do a hiccup and a regular reboot, after which suddenly everything was working again. Meh.

Don't throw the virtual USB disk away yet, you'll still need it.

Anyway, once you have it booting again, you will now have a machine that theoretically supports Secure Boot, but you're still running off an MBR-partitioned disk. I found a procedure on how to convert things from MBR to GPT that was written almost 10 years ago, but surprisingly it still works, except for the bit where the procedure suggests you use diskmgmt.msc (for one thing, that tool was renamed; and for another, it can't touch the partition table of the system disk either).
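
For what it's worth, recent Windows 10 and 11 builds also ship a built-in mbr2gpt.exe utility that can do this conversion. I can't say whether that is what the old procedure boils down to, but the usual invocation from an elevated command prompt looks like this:

mbr2gpt /validate /allowFullOS
mbr2gpt /convert /allowFullOS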

The last step in that procedure says to "restart your computer!", which is fine, except that at this point you obviously need to switch over to the TianoCore firmware; otherwise you're trying to read a UEFI boot configuration on a system that only supports MBR booting, which obviously won't work. In order to do that, you need to add a loader element to the domain.os element of your libvirt configuration:

<loader readonly="yes" type="pflash">/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>

When you do this, you'll note that virt-manager automatically adds an nvram element. That's fine, let it.
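
The generated nvram element typically looks something like this (the exact paths depend on your distribution and on your VM's name, so treat them as placeholders):

<nvram template="/usr/share/OVMF/OVMF_VARS_4M.ms.fd">/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>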

I figured this out by looking at the documentation for enabling Secure Boot in a VM on the Debian wiki, and using the same trick as for how to switch chipsets that I explained above.

Okay, yay, so now secure boot is enabled, and we can install Windows 11! All good? Well, almost.

I found that once I enabled secure boot, my display reverted to a 1024x768 screen. This turned out to be because I was using older unsigned drivers, and since we're using Secure Boot, that's no longer allowed, which means Windows reverts to the default VGA driver, and that only supports the 1024x768 resolution. Yeah, I know. The solution is to download the virtio-win ISO from one of the links in the virtio-win github project, connect it to the VM, go to Device manager, select the display controller, click the "Update driver" button, tell the system that you have the driver on your computer, browse to the CD-ROM drive, tick the "include subdirectories" option, and then let Windows do its thing. While you're there, it might be good to do the same thing for any unrecognized devices in the device manager.

So, all I have to do next is to get used to the completely different user interface of Windows 11. Sigh.

Oh, and to rename the "w10" VM to "w11", or some such. Maybe.

August 10, 2022

Judging by the poster and the trailer, L'année du requin looks like the most traditional of summer comedies, some kind of cross between "Les gendarmes de Saint-Tropez go shark fishing" and "Les bronzés au camping 3".

Fortunately, reading the reviews had tipped me off. L'année du requin is not yet another corny Franco-French comedy in the sub-Splendid vein, to the great delight or great dismay of the commentators. The gags from the trailer are all spent in the film's first minutes. As expected, officer Maja, played by Marina Foïs, takes a bucket of water and a comic jab from her colleague Blaise, Jean-Pascal Zadi. The laughter is quickly smothered by the cutting reply of a Marina Foïs who owns the screen as a gendarme worn down by a rather dull career in a town whose main specialty is planting your backside in the sand and staring at the sea: "It's no fun getting a bucket of water thrown at you while on duty." Embarrassed smiles from her teammates and from the audience.

The tone is set. The comedy pretext was only a decoy. While the film is full of humorous gems, they stay discreet, never insistent (like the glimpse of Maja's wardrobe, seen for a second in the background). That is not the point.

The point? It isn't in the story either, which is simple, not to say simplistic: a shark haunts the coast of the seaside resort of La Pointe and, on the eve of her retirement, maritime gendarme Maja decides to make it her business.

No hilarious comedy? No story? Then what is the point?

Quite simply the incredible array of humans that the Boukherma brothers' camera goes looking for. Each character is finely chiselled, the camera lingering at length on physical flaws, wrinkles, on puffy, tired, aged faces that are nonetheless smiling and full of personality. Unlike the Dardenne brothers, the image does not aim at ultra-realistic social commentary. Rather, it honours, even heroises, these ordinary humans. As a counterpoint to these anti-superheroes, the film offers a young mayor, smooth and without character or the slightest capacity for decision (Loïc Richard). Parachuted in from Paris, he takes refuge, symbol of the ever-present class struggle, behind an anti-covid visor. Parisians who are both hated by the locals and yet necessary, because they keep the economy running.

** Commercial break **

A Bordeaux-based actor, Loïc Richard is known for his voice work. I had the opportunity to work with him when he recorded the audiobook version of my novel Printeurs, available on every audiobook platform. As perfectly as he plays the bland, smooth character in the film, he can take on dark and unsettling tones in his reading of Printeurs. I couldn't possibly pass up the chance to drop that anecdote 😉

=> https://voolume.fr/catalogue/sf-et-fantasy/printeurs/

** End of the break, please return to your seats **

In the first part of the film, Maja goes shark hunting and everything goes, to the viewer's great surprise, a bit too easily. In spite of herself, the gendarme becomes a social media heroine. But the faster the rise, the harder the fall. At the first incident, which is clearly not Maja's doing, she becomes the villain. Harassed, she ends up panicking in a short but powerful dream scene. The message is clear: the real shark is the human, fed by social networks and by the media, symbolised by an omnipresent reactionary radio station stoking hatred under a pseudo-humorous varnish. Beneath the appearance of a small seaside paradise, hatred and resentment run deep. Under the beach, the paving stones. Eden is bitter.

From the dream sequence onwards, the film gradually sheds any semblance of realism and the humour becomes rarer and rarer. The codes are inverted: where the humour was filmed realistically, the action and suspense scenes are delivered through the lens of an absurd comedy, reaching a paradoxical climax with Blaise's impromptu rodeo and the surreal awakening of a Maja who had nevertheless drowned a few minutes earlier. Everything suggests that Maja has carried on dreaming, that the fight against the shark continues in her unconscious.

Strange and unsettling, the film works thanks, among other things, to a very particular treatment of framing and colour. Every shot is the result of deliberate work that carries the message and the emotion. When she is at her computer, Maja is bathed in a cold light while her husband, in the background, embodies the warm gentleness of home. "You should quit Twitter," he tosses out mechanically while heading off into the outdoors as she stays shut in with her smartphone. During the confrontations between the spouses, the camera often moves off-centre, giving the exchanges perspective, distance, and yet intensity.

The title itself carries a very topical social critique: "Last year it was covid, this year the shark. What will it be next year?". The shark is the pure product of a climate change that brings disasters in the face of which politicians, ecologists and reactionaries alike are powerless, each ultimately seeking only to shirk responsibility. As the mayor puts it: "It will be the town hall's fault again!".

Without making a point of it, the film demonstrates the success and the necessity of decades of feminist struggle. The lead character is a woman who has devoted herself to her career with the support of a self-effacing and very kind husband (Kad Merad, incredibly human as a paunchy spouse). Her deputy Eugénie is a woman (Christine Gautier). Yet, once again, no emphasis is placed on the subject. The characters' sex hardly matters; relationships, at every level, are based purely on personality. No seduction, no love story other than a long-standing marriage between Maja and her husband, no sentimentality. All of it with deeply realistic human interactions (in situations that are obviously much less so).

L'année du requin is certainly not the film of the decade, probably owing to a somewhat simplistic script, but it nevertheless offers an original, fresh cinematic experience. The Boukherma brothers treat us to a new genre: the serious parody that doesn't take itself too seriously. Brimming with inventive touches (the radio, the particularly original voice-over), the film blends fun, nods to film buffs, social critique and an original setting, all served by actors whose talents are put to particularly good use.

What more could you ask for?

A moral? The film ends precisely on a gentle moral, not too trite and perfectly fitting: "There are two kinds of heroes. Those who want to save the world and those who want to save the ones they love."

While L'année du requin neither saves nor revolutionises the world, it will offer a few hours of pleasure to those looking for new flavours without overthinking it, and who enjoy that slightly grating cynicism that fits in no particular box. It has definitely made me want to discover Teddy, the first film by this young tandem of twin directors. And if, after the werewolf and the shark, they decide to take on science fiction, I volunteer to whip up a script for them.


This text is published under the CC-By BE licence.

August 07, 2022

Even though my total disconnection was a failure, even though I slipped back into untimely reconnections, the way I use my computer has nevertheless changed profoundly. By default, it is offline. I am aware of every connection. And I only check my email once, sometimes twice a day. That last change was made much easier by something I started nearly three years ago: deleting my online accounts.

Over the past three years, I have actively deleted more than 600 accounts on various platforms. Every time I receive an email from a platform on which I have an unused account, I go through the steps, sometimes long and tedious, to delete it. Over those three years, many platforms whose very existence I had forgotten resurfaced.

It has been a very long-haul effort, but it is starting to bear fruit and it is teaching me a great deal about that brand of canned processed meat turned into a common noun by English comedians dressed up as Vikings: spam.

The different kinds of spam

I have identified three kinds of spam: random-spam, expected-spam and white-collar-spam.

Random-spam is pure spam in the oldest tradition of the term. Emails sent to millions of people without any logic, to sell you viagra, to convince you to install spyware, to help a Nigerian prince recover millions, or to pay a ransom in bitcoin because you were supposedly filmed pleasuring yourself in front of a porn site. Once your address is public, there is nothing you can do against this kind of spam other than try to filter it. It is completely illegal. That is in fact its defining characteristic: it is not tied to any obvious legal entity. You cannot complain or unsubscribe. While random-spam was historically a real plague, I am surprised to find that on my most-spammed address, an address published everywhere for fifteen years and present in a slew of public databases, I receive on average one random-spam every two or three days (it is automatically flagged as such and lands in my spam folder; false negatives are very rare). Half of that spam is about cryptocurrencies. My conclusion is that on a relatively recent, not very public address, you will receive very little of it.

Expected-spam is exactly the opposite: spam sent by platforms or services you signed up for of your own free will. Notifications, satisfaction surveys, newsletters or other announcements of new features. Its particularity is that you can unsubscribe, even if that often only lasts a short while (as with Facebook or Linkedin, which keep inventing new newsletters or email categories to remind you of their existence). In the end, getting rid of this spam is very simple: permanently delete your account on the service. In theory. Because some keep sending you messages you can no longer unsubscribe from, since you no longer have an account. A threat of a GDPR complaint is usually enough to fix the "bug". It is therefore possible to reduce expected-spam to zero (unless it comes from your employer. Companies complain about employees' lack of productivity, yet pay people to bury them under completely useless internal newsletters; go figure).

Then comes the third category: white-collar-spam. White-collar-spam is spam that gives itself a false air of legality. These are companies that bought your data and contact you as if you had signed up with them. An unsubscribe link is usually available. But rather than simply unsubscribing, I contact each of these companies and ask where they got my data, threatening GDPR action. I have discovered that, in the French-speaking world, the vast majority of white-collar-spam comes from one or two suppliers. These suppliers are utterly unscrupulous about how they collect data. No surprise there: their job is literally to pester email users. Their customers are companies, non-governmental organisations and public services. They sort email addresses into categories and sell those databases for a limited period. That last point matters, because a year after dealing with one of these professional legal spammers and making it very clear that my data could no longer be sold, I received spam from one of their customers. It turned out that the customer, a French public cultural service, had reused a database bought two years earlier, which its contract forbade.

I coined the name "white-collar-spam" because this spam is hardly different from illegal random-spam, except that it is carried out by well-established companies that are very proud of their trade as spammers. Instead of fighting spam, we have turned it into an honourable and lucrative activity!

Beyond these few professional spam players, a large amount of white-collar-spam comes indirectly from Linkedin. Certain tools allow marketing professionals (the new name for spammers) to harvest the email addresses, even hidden ones, of their Linkedin contacts. If you have one of these very numerous spammers among your contacts on that network, you're done for. The simplest solution: delete your Linkedin account and leave the spammers among themselves (the primary function of that network). Simply deleting my Linkedin account halved, within a few weeks, the amount of spam I was receiving.

The majority of the spam I receive today is therefore this white-collar-spam, which is more or less legal and completely immoral.

An idea came to me for fighting it very simply: forbid the resale of identifying data without the consent of the person concerned. Simple as that: if a company wants to sell data, it must ask for permission at every transaction. This rule would also apply when one company is bought by another or when data is transferred from one legal entity to another. It seems obvious that you may want to share your data with one entity but not with another. The company you have an account with gets bought by some other outfit? You must give your consent, otherwise your data is erased after a delay of a few months. Simple to implement, simple to monitor, simple to legislate.

Which means that if we have spam, it is because we want it. Like cigarettes or industrial pollution, spam is one of those nuisances we complain about without really daring to fight, because we are convinced there is some valid reason for its existence, because we have grown used to it, and because some people have grown so rich from it that they can influence political and media power. Worse: we even slightly admire those who make a living this way, and stand ready to work for them if a juicy offer comes along.

The unexpected benefits of deleting your accounts

The most radical solution, and one that works wonderfully, remains deleting all your accounts. It is a long-haul process: I discovered more than 600 accounts as I dug through my password manager and through the accounts linked to my Google, Facebook and LinkedIn accounts. Every time I think I have covered everything, completely forgotten accounts reappear in my mailbox when they change their terms of service.

Deleting an account you no longer use is a painful process: managing to log back in, then finding the deletion procedure, which is often buried and artificially complex (though not always). But it is even harder when it is an account you use or think you might use again. Hardest of all is when there is a history, often garnished with a score: number of friends, points, karma, rewards, badges… After Facebook and Twitter, Reddit and Quora were probably the hardest accounts to delete. I realised I drew an absurd pride from my karma and my scores even though I was never a heavy user of those platforms.

A special mention, all the same, for those sites that claimed to have erased my data without actually doing so. In the case of a sushi restaurant chain, the administrator simply prepended "deleted_" to my email address. It was even worse with a large Belgian real-estate site. More than a year after the complete deletion of my data, the site suddenly started sending me, daily, the results of a search I had saved a decade earlier. With no way to turn it off, since my account was officially deleted. It took several weeks of email exchanges to solve the problem and obtain a semblance of an explanation: a very old backup had supposedly been used to restore some databases. I leave you to judge the credibility of such a reason.

From all these stories I have learned one general rule: the vast majority of services are in reality incapable of deleting your data, whether out of malice or incompetence. Any data entered on a site must be considered permanently compromised and potentially public. While I very often gave the benefit of the doubt, attributing errors or difficulties to incompetence, I was several times confronted with what could only be blatant, shameless lies. A large majority of the web services demanding your data are therefore either incompetent or deeply dishonest. Or both. The exception is the small artisanal services, usually developed by a tiny team. In every case of that kind, the deletion was done quickly, cleanly and sometimes with a kind personalised word. Proof that deletion is not a technically insurmountable act.

Unlike abstinence or blocking access to those sites, deleting the account had an absolutely incredible impact on me. Overnight, I stopped thinking about what was happening on those platforms. Overnight, I stopped thinking about what might be successful on those platforms. I stopped thinking for those platforms. I stopped bending to their rules, working unconsciously for them. I stopped wanting to check them. And when the urge comes to post or reply there, having to recreate an account for the occasion is enough to stop me in my tracks and remind me that I have better things to do. When a platform suddenly really is necessary, I recreate an account, if possible with a throwaway address, and delete it after use. Once the reflex is acquired, it is not all that much of a constraint.

Platforms and activism

Not having deleted my Mastodon account, out of simple ideological support for the project, I mechanically find myself exploring that platform. A platform that is itself completely skewed (if I had to take it as representative of France, Mélenchon would have become president with close to 95% of the votes, the rest being mostly abstentions).

In activism there are two schools. The first claims you have to go reach people where they are. Campaigning for free software on Facebook, for example. The second holds that you must first be faithful to your own values, your own convictions.

I am now convinced by the second approach. I think I defended the first one for years partly to justify my ego-driven quest on proprietary networks, to resolve my inner conflict. Because whatever the intention behind a message, its impact will always be controlled by the platform it is posted on. The mere fact of using a platform deforms us and conforms us to that platform.

I also think we should not go "reach people where they are". We should not shout to try to drown out the ambient noise. On the contrary, we should build spaces of calm, personal spaces, and trust humans to find them when they need them. The mere fact of having an account on a platform justifies, for all your contacts, staying on that platform. The first one to leave the platform excludes themselves from the group. The second forces the group to ask itself questions. The third means "the group" is simply no longer on that platform, that the platform has become useless as far as the group is concerned.

No speech convinces as much as leading by example. Doing rather than saying. Being rather than convincing. Living your own choices, your own personality, and respecting those who make different ones, accepting that this may push us apart.

Yes, by deleting my accounts I missed social opportunities. But either I never noticed, which spared my mental energy, or it made the people around me realise they could no longer fully rely on Facebook or Whatsapp. In every case, the cost/benefit ratio turned out to be disproportionately in favour of deletion.

With every account erased, I felt a weight lifted from my shoulders. I felt alive again. Sure, I lost a "potential audience", but I gained freedom, and pleasure in writing on my blog, on my gemlog or even on my typewriter, rather than endlessly reacting, replying, being in reaction (in the most Nietzschean sense of the term).

Even though I relapsed into intermittent connection, huge progress has been made: being online bores me more and more. The number of platforms with content left to read has shrunk to the point that I get through them very quickly. I have also understood that my addiction is not only about being connected; it is also technophile. I love being at my computer, in front of my screen. I look for excuses to keep my hands on the keyboard, to update a piece of software, configure my environment, improve my processes, discover, code. In short, to "tinker".

Discovering that component of my addiction convinced me to take my disconnection into a new phase. That of materiality.


This text is published under the CC-By BE licence.

August 02, 2022

As always, we as the EU, or as Europe, are at a disadvantage because we have no military answer whatsoever to what is going on geostrategically.

For Ukraine we can do little or nothing, because we have no answer to the question "where will the gas come from instead?"

That question is ridiculously simple, and yet not a single EU politician can answer it.

And yet an answer was within reach: they could have invested in alternative energy sources. But the ladies and gentlemen of EU politics didn't consider that necessary. Superfluous. And so on.

In other words, they are utterly incompetent. I am effectively writing them off completely. Because they should have been able to formulate an answer to that. The idiots sitting there now cannot. That is why they are losers, and why we ought to dismiss them from their posts. Unfortunately they are also populists, so their dismissal will take many years (see Brexit).

Towards Taiwan, the EU politicians are once again doing their ridiculous best to mean anything at all. But they don't mean anything. They do nothing that matters.

Because they have collectively decided not to have an EU army.

That is why they are unimportant. Insignificant. Incompetent. Unimportant.

August 01, 2022

I wanted to outline the development and deployment workflow I use on dri.es, my personal website.

My site uses Drupal (obviously) and runs on Acquia Cloud (of course), but a lot of this is best practice for any web application.

I manage my website's code and configuration in Git. Each time I commit a change to my Git repository, I go through the following steps:

  1. I create a staging environment to test my code before deploying it to production. It's a complete staging environment: not just PHP, MySQL and Nginx, but also Varnish, Memcache, etc.
  2. I check out my Git repository. My Git repository hosts my custom files only. It's a best practice not to commit Drupal core or third-party Drupal modules to your Git repository.
  3. I run PHP Code Sniffer to make sure my code conforms to my coding style rules. I specify my coding style rules in phpcs.xml and use phpcs to make sure my code adheres to them. If not, phpcbf tries to fix my code automatically. I like my code tidy.
  4. I run PHPStan, a static code analysis tool for PHP that scans my code base for bugs. It will find dead code, type casting problems, incorrect function arguments, missing type hints, unknown function calls, and much more. PHPStan is a fantastic tool.
  5. I run PHPUnit, a PHP testing framework, to make sure my unit tests pass.
  6. I run phpcs-security-audit, a static code analysis tool for PHP. It scans my PHP code for security vulnerabilities and security weaknesses.
  7. I run ESLint, a static code analysis tool for JavaScript. It scans my JavaScript code for security vulnerabilities and weaknesses.
  8. I run nodejs-scan to find insecure code patterns in my Node.js applications. I don't use Node.js at the moment though.
  9. I also run Semgrep, a static code analysis tool for a variety of programming languages.
  10. I run Rector to make sure I don't use deprecated Drupal code. When I do, Rector will try to programmatically update any deprecated code that it finds.
  11. As my Git repository only has custom files, I use Composer to download and install the latest version of Drupal and all third-party modules and components.
  12. I run drush pm:security. Drush is a Drupal-specific tool, and the pm:security command verifies that I have no insecure dependencies installed. (A rough command-line sketch of these checks follows below.)
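
Strung together, and with Composer pulled forward so that the vendor/bin tools exist, the checks above boil down to something like this. The exact paths, standards and flags are assumptions, not the actual Code Studio configuration:

# install Drupal core and third-party modules (step 11)
composer install

# coding style (step 3), static analysis (step 4), unit tests (step 5)
vendor/bin/phpcs --standard=phpcs.xml .
vendor/bin/phpstan analyse
vendor/bin/phpunit

# security-oriented scans (steps 6, 7 and 9);
# "Security" is the standard registered by phpcs-security-audit
vendor/bin/phpcs --standard=Security .
npx eslint .
semgrep --config auto .

# deprecated Drupal code (step 10) and insecure dependencies (step 12)
vendor/bin/rector process --dry-run
vendor/bin/drush pm:security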

This all might sound like a lot of work to set up, and it can be. For Acquia customers and partners, Acquia Code Studio automates all the steps above. Acquia Code Studio is a fully managed CI/CD solution based on GitLab, with specific steps optimized for Drupal. In 20+ years of working on Drupal, it's my best webops workflow yet. It couldn't be easier.

A screenshot of Acquia Code Studio showing the automated tests feature.

Acquia Code Studio also takes care of automating dependency updates. Code Studio regularly checks if Drupal or any of its dependencies have a new release available. If there is a new release, it will run all the steps above. When all of the above tools pass, Acquia Code Studio can deploy new code to production with one click of a button.

A screenshot of Acquia Code Studio showing the automated update feature.

I love it!

July 30, 2022

Many Bluetooth Low Energy (BLE) devices broadcast data using manufacturer-specific data or service data in their advertisements. The data format is either defined in a specification or has to be reverse-engineered.

If you want to decode the binary data format into usable chunks of data from various data types in your own Python program, I find the Construct library quite an accessible solution. And Bleak is my favorite BLE library in Python, so first install Bleak and Construct:

pip3 install bleak construct

As an example, let's see how you could decode iBeacon advertisements with Bleak and Construct in Python.

The iBeacon specification

The iBeacon specification, published by Apple, is officially called Proximity Beacon. The idea is to have Bluetooth beacons advertise their presence in order to calculate their approximate distance. You can find the iBeacon specification online.

The specification lists the format of the iBeacon advertising packet. This always consists of two advertising data structures: flags (of length 2) and manufacturer-specific data (of length 26). That’s why an iBeacon advertising packet is always 30 bytes long (1 + 2 + 1 + 26). Here's the structure of the complete packet:

[Image: structure of the complete iBeacon advertising packet (/images/proximity-beacon-advertising-packet.png)]

We're specifically interested in the second data structure with type 0xff, which signifies that it's manufacturer-specific data. The first two bytes of these manufacturer-specific data are always the company ID. To know which company ID is linked to which company, consult the list of all registered company identifiers. Normally the company ID is the ID of the manufacturer of the device. However, Apple allows other manufacturers to use its company ID for iBeacon devices if they agree to the license.

Note

The company ID is a field of two bytes that are sent as a little-endian value. If you look at an iBeacon packet capture in Wireshark, the bytes on the air are 4c 00. However, Apple's real company ID is 00 4c, or 76 in decimal.
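
You can check that little-endian interpretation for yourself in Python:

>>> int.from_bytes(b"\x4c\x00", byteorder="little")
76
>>> hex(76)
'0x4c'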

If you want to know more about the meaning of the iBeacon packet's fields, consult the document Getting Started with iBeacon published by Apple.

Decoding iBeacon advertisements

Now that you know the format, let's see how to scan for iBeacon advertisements and decode them:

ibeacon_scanner/ibeacon_scanner.py (Source)

"""Scan for iBeacons.

Copyright (c) 2022 Koen Vervloesem

SPDX-License-Identifier: MIT
"""
import asyncio
from uuid import UUID

from construct import Array, Byte, Const, Int8sl, Int16ub, Struct
from construct.core import ConstError

from bleak import BleakScanner
from bleak.backends.device import BLEDevice
from bleak.backends.scanner import AdvertisementData

ibeacon_format = Struct(
    "type_length" / Const(b"\x02\x15"),
    "uuid" / Array(16, Byte),
    "major" / Int16ub,
    "minor" / Int16ub,
    "power" / Int8sl,
)


def device_found(
    device: BLEDevice, advertisement_data: AdvertisementData
):
    """Decode iBeacon."""
    try:
        apple_data = advertisement_data.manufacturer_data[0x004C]
        ibeacon = ibeacon_format.parse(apple_data)
        uuid = UUID(bytes=bytes(ibeacon.uuid))
        print(f"UUID     : {uuid}")
        print(f"Major    : {ibeacon.major}")
        print(f"Minor    : {ibeacon.minor}")
        print(f"TX power : {ibeacon.power} dBm")
        print(f"RSSI     : {device.rssi} dBm")
        print(47 * "-")
    except KeyError:
        # Apple company ID (0x004c) not found
        pass
    except ConstError:
        # No iBeacon (type 0x02 and length 0x15)
        pass


async def main():
    """Scan for devices."""
    scanner = BleakScanner()
    scanner.register_detection_callback(device_found)

    while True:
        await scanner.start()
        await asyncio.sleep(1.0)
        await scanner.stop()


asyncio.run(main())

First it defines a Struct object from the Construct library, and calls it ibeacon_format. A Struct is a collection of ordered and usually named fields. 1 Each field in itself is an instance of a Construct class. This is how you define the data type of bytes in an iBeacon data structure. In this case the fields are:

  • Const(b"\x02\x15"): a constant value of two bytes, because these are always fixed for an iBeacon data structure.

  • Array(16, Byte): an array of 16 bytes that define the UUID.

  • Int16ub for both the major and minor numbers, which are both unsigned big-endian 16-bit integers.

  • Int8sl for the measured power, which is a signed 8-bit integer.

Now when the device_found function receives manufacturer-specific data from Apple, it can easily parse it. It just calls the parse function on the ibeacon_format object, with the bytes of the manufacturer-specific data as its argument. The result is an object of the class construct.lib.containers.Container, with the fields that are defined in the ibeacon_format struct. That's why you can just refer to the fields like ibeacon.major, ibeacon.minor and ibeacon.power.

However, ibeacon.uuid returns a construct.lib.containers.ListContainer object, which is printed as a list of separate numbers. To print it like a UUID, first convert it to bytes and then create a UUID object from these bytes.

Note

This Python program doesn't explicitly check for the company ID and the first two bytes in the manufacturer-specific data. The code just assumes it receives iBeacon data and catches exceptions if this assumption proves false: the KeyError exception happens if there's no 0x004c key in the manufacturer_data dictionary, and the ConstError exception happens if the first two bytes of the data don't equal the constant b"\x02\x15". This common Python coding style is called EAFP (easier to ask for forgiveness than permission), and in many cases it makes the code easier to follow. The other style, testing for all conditions beforehand, is called LBYL (look before you leap).

If you run this program, it will scan continuously for iBeacons and show their information:

$ python3 ibeacon_scanner.py
UUID     : fda50693-a4e2-4fb1-afcf-c6eb07647825
Major    : 1
Minor    : 2
TX power : -40 dBm
RSSI     : -80 dBm
-----------------------------------------------
UUID     : d1338ace-002d-44af-88d1-e57c12484966
Major    : 1
Minor    : 39904
TX power : -59 dBm
RSSI     : -98 dBm
-----------------------------------------------

This will keep scanning indefinitely. Just press Ctrl+c to stop the program.

1

A Struct object in Construct is comparable to a struct in the C programming language.

Long time no Radiohead here, so let’s fix that, shall we? Here’s Thom Yorke solo in Zermatt (Switzerland), playing songs from Radiohead, from his solo records and from his new band (The Smile). If anything, this is a testament to what a great songwriter the man is! Also remarkable: he seems much more at ease on stage now, maybe having accepted the spotlights which sometimes seemed too much for him to cope with.

Source

July 29, 2022

Setup of a HifiBerry AMP2... on a Raspberry Pi 2.

First attempt was with Volumio, as advised by a friend. Well, that works, but I personally find the interface a horror, and I seem to lose control of the Pi since Volumio is a full OS that seems to be accessible only through its web interface. No thanks.

Using Raspberry Pi OS:

- download Raspberry Pi OS lite (command line is fine)

- extract the image

- dd the image to the sd-card

dd if=/home/paul/2022-04-04-raspios-bullseye-armhf-lite.img of=/dev/sdb bs=1M

- mount it to enable ssh

touch /boot/ssh
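
(On the desktop machine this means mounting the SD card's boot partition first; /dev/sdb1 and /mnt below are assumptions that match the dd example above.)

mount /dev/sdb1 /mnt
touch /mnt/ssh
umount /mnt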

- I also had to set a password for the pi user, since 'raspberry' was not accepted?

- Boot the Pi (the HifiBerry is still attached)

- ssh into the Pi 

apt update
apt upgrade
apt install vim
vi /boot/config.txt
#dtparam=audio=on
#dtoverlay=vc4-kms-v3d

# added by paul 2022-07-29 for HifiBerry AMP2
dtoverlay=hifiberry-dacplus
force_eeprom_read=0


Comment out the first two lines, add the last two. Check here for other HifiBerries.

Now, before using mplayer or something, LOWER THE VOLUME! Use a really low value, and gradually go up while playing music since the default is extremely loud.

amixer -c 0 sset Digital "20%"

Thanks for listening :)

July 25, 2022

Start our own European army, in which each European country throws its own expertise into the pool.

Make agreements with Russia about Europe's energy supply.

Make a new security pact with Russia so that there will be as few conflicts in Europe as possible.

Project power from Europe, with the European army. We have to relearn what it means to practise geostrategy. We must dare to deploy that European army to achieve our strategic goals. We should not be shy about making clear to the world that we have such strategic goals.

End the conflict in Ukraine. Because it serves neither us (Europeans) nor the Russians. We are both harmed by this conflict. We both stand to gain from ending it.

Dare to talk about Europe and not only about the European Union.

July 23, 2022

Almost 2 decades ago, Planet Debian was created using the "planetplanet" RSS aggregator. A short while later, I created Planet Grep using the same software.

Over the years, the blog aggregator landscape has changed a bit. First of all, planetplanet was abandoned, forked into Planet Venus, and then abandoned again. Second, the world of blogging (aka the "blogosphere") has largely faded away, and the more modern world uses things like "Social Networks" instead, making blogs less relevant these days.

A blog aggregator community site is still useful, however, and so I've never taken Planet Grep down, even though the number of blogs carried on Planet Grep has been shrinking over the years. For almost 20 years now, I've just run Planet Grep on my personal server, upgrading its Debian release from whichever was the most recent stable release in 2005 all the way to buster, never encountering any problems.

That all changed when I did the upgrade to Debian bullseye, however. Planet Venus is a Python 2 application, which was never updated to Python 3. Since Debian bullseye drops support for much of Python 2, focusing only on Python 3 (in accordance with Python upstream's policy on the matter), I have had to run Planet Venus from inside a VM for a while now, which works as a short-term solution but not as a long-term one.

Although there are other implementations of blog aggregation software out there, I wanted to stick with something (mostly) similar. Additionally, I have been wanting to add functionality to it to also pull stuff from Social Networks, where possible (and legal, since some of these have... scary Terms Of Use documents).

So, as of today, Planet Grep is no longer powered by Planet Venus, but instead by PtLink. Rather than Python, it was written in Perl (a language with which I am more familiar), and there are plans for me to extend things in ways that have little to do with blog aggregation anymore...

There are a few other Planets out there that also use Planet Venus at this point -- Planet Debian and Planet FSFE are two that I'm currently already aware of, but I'm sure there might be more, too.

At this point, PtLink is not yet on feature parity with Planet Venus -- as shown by the fact that it can't yet build either Planet Debian or Planet FSFE successfully. But I'm not stopping my development here, and hopefully I'll have something that successfully builds both of those soon, too.

As a side note, PtLink is not intended to be bug compatible with Planet Venus. For one example, the configuration for Planet Grep contains an entry for Frederic Descamps, but somehow Planet Venus failed to fetch his feed. With the switch to PtLink, that seems fixed, and now some entries from Frederic seem to appear. I'm not going to be "fixing" that feature... but of course there might be other issues that will appear. If that's the case, let me know.

If you're reading this post through Planet Grep, consider this a public service announcement for the possibility (hopefully a remote one) of minor issues.

What does one do on a free Saturday afternoon? Upgrading Linux machines of course! I thought upgrading my Ubuntu 20.04 LTS laptop to Ubuntu 22.04 LTS would be a routine task I could keep running in the background, but... computer said no.

The first hurdle was starting the upgrade:

$ sudo do-release-upgrade
Checking for a new Ubuntu release
There is no development version of an LTS available.
To upgrade to the latest non-LTS development release
set Prompt=normal in /etc/update-manager/release-upgrades.

I was puzzled: although Ubuntu 22.04 was released three months ago, Ubuntu 20.04 doesn't detect it as a new release. It took some digging around to discover that upgrades from one LTS release to the next are apparently only offered after the first point release. So in this case, Ubuntu 20.04 will not detect a newer version until Ubuntu 22.04.1 is released, which is scheduled for August 4.

Luckily you're still able to upgrade; you just have to ask for the development version:

$ sudo do-release-upgrade -d

Ok, so after this first hurdle I thought I had seen the most exciting part of the upgrade, but I was wrong. I'm not sure what the problem was, but after all packages had been downloaded and the upgrade process was in the middle of applying the package upgrades, the screen turned grey and showed the message "Something has gone wrong. Please logout and try again." I had to restart Ubuntu, and it didn't even reach the login screen, just showing me a grey screen.

This all looked familiar. 1 I had encountered exactly the same problem two years ago while upgrading Ubuntu 19.10 to Ubuntu 20.04 LTS, so luckily I could take the blog article I wrote then as a guideline for the recovery process. However, the fix was slightly more complex this time. These are my notes on fixing it.

First reboot your computer and start Ubuntu in recovery mode:

  • Hold the Shift key while booting the PC.

  • In the GRUB boot menu that appears, choose the advanced options and then recovery mode.

  • In the recovery menu that appears, enable networking first and then choose the option to open a root shell.

Because the installation had been aborted, I tried fixing a broken install:

# apt --fix-broken install

While last time this fixed the issue, I now encountered an error about the Firefox package. Ubuntu decided to switch Firefox to a snap package, and apparently apt wasn't able to install the new deb package that installs the snap. As a temporary workaround, I removed Firefox and ran the apt command again:

# apt remove firefox
# apt --fix-broken install

This still resulted in the same error, so my next idea was to prevent the upgrade process from installing Firefox at all. I created the apt preference file /etc/apt/preferences.d/firefox-no-snap.pref with the following configuration:

Package: firefox*
Pin: release o=Ubuntu*
Pin-Priority: -1
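
With every firefox package from the Ubuntu archive pinned at priority -1, apt will refuse to consider it at all. You can double-check that the pin is being picked up with apt policy, which should list the archive's firefox versions with that negative priority:

# apt policy firefox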

Then I tried to fix the install again:

# apt --fix-broken install

This worked! No complaints about the Firefox package this time.

To be sure I didn't miss any package configuration, I configured all unpacked but not yet configured packages:

# dpkg --configure -a

This returned silently, so no issues there.

Then I continued the upgrade:

# apt upgrade

And this now went smoothly. So after the upgrade succeeded, I had Ubuntu 22.04 on my system:

# lsb_release -a
No LSB modules are available.
Distributor ID:   Ubuntu
Description:  Ubuntu 22.04 LTS
Release:  22.04
Codename: jammy

I rebooted:

# reboot

And then I could login again and was welcomed by the new jellyfish desktop background:

[Image: the new Ubuntu 22.04 jellyfish desktop background (/images/ubuntu-22.04-desktop.png)]

Then I reinstalled Firefox as a snap: 2

$ snap install firefox

And finally I had a working Ubuntu laptop again.

Note

If you don't remember exactly what you did in recovery mode to fix your upgrade and you have already rebooted, just run sudo su and then history. It will show you the commands you ran as root.

1

This article about fixing an upgrade to Ubuntu 20.04 is by far the most popular one on my blog, and I still occasionally get emails from people thanking me for saving them the trouble of finding out how to fix their broken upgrade.

2

At this moment I was just too lazy to find out how to install Firefox from the Mozilla team's PPA. I know that there are some issues with it, but I'll just have to see whether the snap works for me.

July 22, 2022

There are not enough movies that really floor me. Everything, Everywhere, All at Once did. Go see it!

Source

July 20, 2022

Acquia recently turned 15. Over the past 15 years, I've heard the name "Acquia" pronounced in all sorts of different ways.

It was Jay Batson, my co-founder, who came up with the name "Acquia". Please blame Jay for the difficult name. ;)

When it came time to pick a name for our company, Jay insisted that the name start with the letter A. I remember questioning the value of that. In a world where people search for things, I asked, who still looks things up in alphabetical order?

In the end, Jay was right. I have learned a great many things over the past 15 years, including how common alphabetical listings still are. You'd be amazed how often Acquia ranks number one in such listings. It gave us a small edge.

For more background on how the name Acquia came to be, and some other Acquia trivia, check out Jay's blog post "ah-kwe-eh".

I published the following diary on isc.sans.edu: “Malicious Python Script Behaving Like a Rubber Ducky“:

Last week, it was SANSFIRE in Washington where I presented a SANS@Night talk about malicious Python scripts in Windows environment. I’m still looking for more fresh meat and, yesterday, I found another interesting one.

Do you remember the Rubber Ducky? Pentesters like this kind of gadget. I still have one, as well as others with WiFi capabilities. The idea behind these USB keys is to deliver a payload by simulating a keyboard. When you connect them to a computer, they are detected as a HID (“Human Interface Device”). The payload will be “injected” as if the user pressed all the keys one by one… [Read more]

The post [SANS ISC] Malicious Python Script Behaving Like a Rubber Ducky appeared first on /dev/random.

Usually, I receive a lot of emails, and sometimes I read them on my phone and then… I forget about them… (shame on me).

On my Linux desktop, I used Get Things Gnome for a long time, but due to the declining appeal of the project and the end of the extension for Thunderbird, I found it less and less useful.

I was then looking for a solution to have my todo list accessible from everywhere, and one that I could manage myself rather than have hosted somewhere.

I found a very nice, fast and practical project that was easy to deploy and uses MySQL as its backend: myTinyTodo.

However, I was missing a way to easily create a new task from an email (especially from my phone).

This is why I decided to write a script that would do exactly what I was looking for, and to integrate it with myTinyTodo.

mail2todo was born!

This script reads the emails from a dedicated IMAP account and populates the MySQL database used by myTinyTodo.

I really like how tags were handled in Get Things Gnome, so I used the same behavior:

This script does exactly what I needed!

The requirements are simple:

The code can be improved, but currently it does what I need, and this is why I decided to share it. You can also see how easy it is to work with the mysqlx module, even when handling only SQL queries.
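
For illustration, here is a minimal sketch of the same idea, not the actual mail2todo script: it reads unseen messages from a dedicated IMAP mailbox and inserts each subject as a task through the mysqlx module. The host, credentials, and the myTinyTodo schema, table and column names are assumptions you will need to adapt to your own installation.

import email
import imaplib
from email.header import decode_header

import mysqlx  # MySQL Connector/Python X DevAPI

IMAP_HOST = "imap.example.com"   # assumption: dedicated mailbox for todos
IMAP_USER = "todo@example.com"   # assumption
IMAP_PASS = "secret"             # assumption

# assumption: myTinyTodo's MySQL schema reachable over the X Protocol port
session = mysqlx.get_session(host="localhost", port=33060,
                             user="mytinytodo", password="secret")

with imaplib.IMAP4_SSL(IMAP_HOST) as imap:
    imap.login(IMAP_USER, IMAP_PASS)
    imap.select("INBOX")
    _, data = imap.search(None, "UNSEEN")
    for num in data[0].split():
        _, msg_data = imap.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(msg_data[0][1])
        subject, encoding = decode_header(msg.get("Subject", "(no subject)"))[0]
        if isinstance(subject, bytes):
            subject = subject.decode(encoding or "utf-8")
        # insert the mail subject as a new task (table and column names are assumptions)
        session.sql(
            "INSERT INTO mytinytodo.mtt_todolist (uuid, list_id, title)"
            " VALUES (UUID(), 1, ?)"
        ).bind(subject).execute()

session.close()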

Enjoy MySQL, Python, the X Protocol and myTinyTodo!

July 19, 2022

Autoptimize 3.1 was just released with some new features and some fixes/improvements: new: HTML sub-option: “minify inline CSS/JS” (off by default). new: Misc option: permanently allow the “do not run compatibility logic” flag to be removed (which was set for users upgrading from AO 2.9.* to AO 3.0.* as the assumption was things were working anyway). security: improvements to the critical CSS...

Source

July 17, 2022

XPO Space

Zeker doen als je interesse hebt in ruimtevaart, er staan enkele boeiende objecten op ware grootte en dat geeft toch een andere indruk. De geschiedenis van de ruimtevaart wordt er goed in beeld gebracht, helaas gaat de rest van de XPO enkel over de Verenigde Staten, Rusland en een beetje ESA. Geen woord over de Indische, Japanse of Chinese ruimtevaart van de laatste tien-twintig jaren.

Er staat ook een foute schaal bij een Saturn V (1:144 ipv 1:72) en ze geven een nieuwe betekenis voor een 'dag' (op Aarde is een dag 24u, volgens XPO is een dag op de maan 14 Aardse dagen, maar dat moet 28 zijn gezien de nacht ook een integraal deel is van de dag).


Maagdenhuis Antwerpen

The Maagdenhuis is on the one hand a standard museum with paintings (they have Pieter Paul Rubens, Jacob Jordaens and Antoon van Dyck!) and old objects; on the other hand, this museum clearly shows how the spirit of the times can change.

The wooden Clara is also here (sadly, the restaurant with the same name no longer exists).


Mayer van den Bergh Antwerpen

Mayer van den Bergh is world-famous to anyone who has read 'De Dulle Griet' from Suske en Wiske. The painting of the same name by Pieter Breugel has been restored since I last saw it, and yes, it looks fantastic today.

Besides the paintings and the (sometimes very old) objects, I find the rooms themselves worth seeing here as well.


Rockoxhuis Antwerpen

In full the Snijders&Rockoxhuis, it is fascinating if you like looking at old paintings. You get an iPad here to read information about everything on display (standing or hanging), and I find that much better than an audio player because I would much rather read than listen (or watch).

The famous proverbs painting by Pieter Breugel hangs here. There is now a touchscreen next to this painting that gives away all the proverbs... but maybe it's more fun to discover a few yourself that are still in use today.


July 15, 2022

A while ago I bought an external webcam with better image quality than the one built into my laptop. However, when I wanted to use it on my Linux system, I faced an unexpected problem. Not all programs or web sites allowed me to choose which webcam to use. And even worse: the ones that didn't give me the choice automatically chose the first available webcam device, /dev/video0, which is of course the internal webcam.

Luckily there's a solution for everything in Linux. I just had to find a way to disable the internal webcam. My idea was that the external webcam would then become the first available webcam device and this would then be chosen automatically.

So I first looked at the product information of all connected USB devices:

$ for device in $(ls /sys/bus/usb/devices/*/product); do echo $device;cat $device;done
/sys/bus/usb/devices/1-1/product
HD Pro Webcam C920
/sys/bus/usb/devices/1-7/product
Chicony USB2.0 Camera
/sys/bus/usb/devices/usb1/product
xHCI Host Controller
/sys/bus/usb/devices/usb2/product
xHCI Host Controller

As you see, the first two devices are webcams. The HD Pro Webcam C920 is the external one, while the Chicony USB2.0 Camera is the internal one. I wanted to disable the latter. The file with the product information for this webcam is /sys/bus/usb/devices/1-7/product, and I needed the code 1-7 in its path. This means that the device is connected on USB bus 1 port 7.

With this information I could send a command to the USB driver to unbind the port:

$ echo '1-7' | sudo tee /sys/bus/usb/drivers/usb/unbind

After this, the internal webcam isn't found anymore by software or web sites. If I connect the external webcam after this command, it gets assigned /dev/video0 as the device file.

Re-enabling the internal webcam is easy too:

$ echo '1-7' | sudo tee /sys/bus/usb/drivers/usb/bind

This is the same command as the previous one, but with bind instead of unbind in the path.

To make this easier to remember, I created a small shell script, webcam.sh:

#!/bin/sh
# Enable or disable the internal webcam by binding or unbinding its USB port.

# Bus-port identifier of the internal webcam (USB bus 1, port 7).
device="1-7"
status=$1

# Map the requested status to the corresponding USB driver command.
case $status in
    enable) driver_command="bind";;
    disable) driver_command="unbind";;
    *) echo "Usage: $0 enable|disable" >&2; exit 1;;
esac

# Tell the USB driver to (un)bind the device.
echo $device | sudo tee /sys/bus/usb/drivers/usb/$driver_command

After making it executable with chmod +x webcam.sh, I could just run webcam.sh disable before connecting the external webcam every time I wanted to use it. And after disconnecting the external webcam, I could always re-enable the internal webcam with webcam.sh enable, but I never bothered with it.

I used the script for a while like this, until I realized I could even run this script automatically every time I connected or disconnected the external webcam, thanks to a udev rule.

So I added the following udev rule to /etc/udev/rules.d/99-disable-internal-webcam.rules:

SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="46d/8e5/c", RUN+="/home/koan/webcam.sh disable"
SUBSYSTEM=="usb", ACTION=="remove", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="46d/8e5/c", RUN+="/home/koan/webcam.sh enable"

I found the correct value for ENV{PRODUCT} in the output of udevadm monitor --kernel --property --subsystem-match=usb while connecting or disconnecting the external webcam.

So now I never have to bother with disabling, enabling or choosing a webcam device. If my external webcam isn't connected, all software chooses the internal webcam. As soon as I connect the external webcam, the software chooses this one. And as soon as I disconnect the external webcam, the software chooses the internal webcam again.

July 12, 2022

If you write C applications that need to connect to MySQL, you can use the MySQL C API aka libmysqlclient. The MySQL C API replaces the outdated MySQL-Connector-C.

If you want to use MySQL 8.0 as a Document Store with the X Protocol, then you need to use MySQL Connector/C++ 8.0.

Some have asked how to compile only the MySQL C API.

Compiling only libmysqlclient

As the FAQ stipulates, it’s not possible to build only the library. However, as mentioned in the documentation, it’s possible to reduce the amount of compiled products with some cmake options.

You still need to get the full source tree (from GitHub for example) and bypass the compilation of the server.

~ $ mkdir workspace
~ $ cd workspace
~workspace $ git clone https://github.com/mysql/mysql-server.git
~workspace $ mkdir mysql-server-bin-debug
~workspace $ cd mysql-server-bin-debug
~workspace/mysql-server-bin-debug $ cmake -DWITHOUT_SERVER=ON \
                                    -DDOWNLOAD_BOOST=1 \
                                    -DWITH_BOOST=../mysql-server/downloads/ \
                                    -DWITH_UNIT_TESTS=OFF ../mysql-server
~workspace/mysql-server-bin-debug $ make -j 8                        

On my laptop, this command took 2m24sec.

Now you have built the command-line client tools as well as libmysqlclient:

Conclusion

Following the process outlined in this post, it took me 2m24sec to build the command-line client tools as well as libmysqlclient. You are required to download the complete source tree and build some extra command-line tools, but the process is fast.

Update !

Joro added extra information to this bug, including how to bypass Boost. I recommend reading those comments, especially if you are building this on Windows.

July 11, 2022

Wow! But I can’t seem to find who’s playing that bass. Is it all keyboard, even in the beginning where you hear the snares of an upright?

Source

July 07, 2022

Last month I moved from Merelbeke to Ghent. I registered my new address on the government website, and last week I was invited to update my eID with my new address.

I made an appointment with one of the administrative centers of the city. The entire process took less than 5 minutes, and at the end I got a welcome gift: a box with a lot of information about the city services.

It’s been a while since I last did an unboxing video. The audio is in Dutch, maybe if I’m not too lazy (and only if people ask for it in the comments) I’ll provide subtitles.

Unboxing the Ghent box 🎁

July 06, 2022

Conferences are back! After Botconf in April, it’s Pass-The-Salt’s turn, organized this week in Lille, France. After the two-year break, the formula did not change: same location, free, presentations around security and free software! And, most importantly, the same atmosphere.

The first day started in the afternoon and talks were grouped by topic. The first topic was cryptography, a hard way to start the conference if, like me, you don’t like it. But the talks were interesting anyway! The first one was “Mattermost End-to-End Encryption plugin” by Adrien Guinet & Angèle Bossuat. Mattermost is a very popular chat service (like Slack); a free version is available and the community around it has created a lot of plugins. Encryption is in place between the client and the server, but there was a lack of E2EE or “End-2-End Encryption” (so that even administrators of the server can’t see the messages and files exchanged). The plugin was not easy to implement due to Mattermost’s limitations: notifications, attachments, modification of sent messages, etc. Adrien & Angèle explained how they implemented this, how they solved some challenges, and ended with a quick demo. The plugin is available here if you are interested.

Then, Ludovic Dubost came on stage to present “CryptPad: a zero knowledge collaboration platform”. This project is already six years old and, even though I had heard about it, I had never used it. CryptPad tries to solve the privacy issues with data (the cloud being, by definition, the prime example). Today, in most cases, it’s all about trust: the imbalance between the two parties is widely accepted, and we just sign contracts instead. CryptPad helps you to work on many documents and share them with your peers. Its key points are encrypted docs that can be edited in real time, E2EE, and key management with secure and private key sharing. Server owners have access to nothing (they can’t even recover the data). Some features:

  • Create application files (docs, presentations, pads, kanban, surveys, code, forms, …)
  • Crypt drive
  • Sharing functions
  • Work in teams

A good summary could be “Get people out of Google Docs”. Another good point: it requires few resources. More information on cryptpad.org.

Then, another tool was presented by Yves Rutschle: “Dataflow tabular charts — a presentation tool for security architects”. The tool is called dtc.pl, yes, written in Perl! Its purpose is to create SVG files that represent drawings of complex infrastructures or systems, with the file generated from descriptions in a text file. Example:

Human:
-> Human (6)
void / void / void / void / void / Human Content
Reader laptop:
...

The tool is available here.

The second round of sessions focused on operating system security. Mickaël Salaün presented “Sandboxing your application with Landlock, illustration with the p7zip case”. Landlock is not new but is not known by many people. It’s enabled by default on many systems like Ubuntu 22.04 LTS. It’s a software sandboxing system that allows applications to restrict their own access to resources. It must be implemented by developers and is based on only three syscalls! Mickaël explained the system (how to enable it, create rules and populate them) and then applied it to a popular utility, 7zip, to add sandboxing capabilities.

The last presentation of the day was “Building operating systems optimized for containers, from IoT to desktops and servers” by Timothée Ravier. The idea behind the talk was to explain how operating systems can be improved by using techniques like containers to reduce the attack surface. Indeed, many OSes are deployed today with plenty of tools that, because software has bugs, increase the attack surface for attackers. The idea is to reduce the amount of installed software to the bare minimum and deploy the rest with other techniques to make it easier to patch.

The second day started with a series of presentations around networks. Yves Rutschle came back for a second time with “sslh — an applicative-level protocol multiplexer“. Yves started this project years ago because he’s a big fan of console mail clients like Mutt and wanted to have access to an SSH server from anywhere. In 2000, when firewalls started to filter outgoing traffic, he developed the first version of sslh. The idea is to let sslh listen on a port (TCP/443) and, depending on the first bytes of the session, redirect the incoming connection to Apache, the SSH server, etc. With time, it expanded, was rewritten in C, turned into a daemon, etc. Today, the tool is integrated into many Linux distributions. It also supports more protocols like OpenVPN and XMPP. The tool is available on Yves’s GitHub repo but also as a package in your favorite Linux distribution.

The next slot was assigned to Eric Leblond who, as usual, came to speak about the Suricata ecosystem: “Write faster Suricata signatures easier with Suricata Language Server“. This time, nothing related to the tool itself: he focused on a recurring and annoying task, writing Suricata rules. The classic process looks like: write a rule, test it, fine-tune it, test it, … Complex signatures can be hard to write. Because he received the question “How do I write Suricata signatures?” multiple times, he developed a solution based on the Language Server Protocol: “SLS” or “Suricata Language Server”. From your preferred text editor (a lot of them are supported), you get help, auto-completion and verification directly while you write Suricata rules. Really interesting if you write a lot of them!

The next one was “Building on top of Scapy: what could possibly go wrong?” by Claire Vacherot, a pen tester active in the ICS field. She explained their need for specific tools. They created a framework (BOF – “Boiboite Opener Framework”) but faced a lot of problems and tried to find an alternative in Scapy. Scapy was not immediately effective, so they decided to keep both and use BOF as a frontend for Scapy. Best conclusion: don’t just use tools, learn their power and make the most of them!

After the welcome coffee break, we switched to other talks. The next one was “Use of Machine and Deep Learning on RF Signals” by Sébastien Dudek. RF signals are used everywhere and many risks are related to them (jamming, eavesdropping, replay, injection, …), but the very first challenge for researchers is to get a good idea of the signals themselves. What are we capturing? Sébastien briefly explained how to capture signals, from very expensive devices down to “gadgets” connected to free software. His research uses machine learning and deep learning to help identify the captured signals. As I have experience in neither domain (RF nor ML), it was a bit complex to follow, but the idea seems interesting. I just bookmarked this site, which helps to identify RF signal patterns: sigidiki.com.

The keynote was given by Ivan Kwiatkowski: “Ethics in cyberwar times”. It was a great talk and an eye-opener for many of us (well, I hope). The common thread was threat intelligence: how it is generated, by whom, and what companies do with it (and why they buy it!). For Ivan, “TI resellers are intelligence brokers”. Also, can there be neutral, apolitical intelligence? This is a huge debate! Everybody uses TI today: APT groups conduct attacks, TI companies write reports, and APT groups buy these reports to improve their attacks. What could go wrong?

After the lunch break, we started a series about reversing and binary files.

The first talk was “Abusing archive-based file formats” by Ange Albertini. Once again, Ange covered his techniques and tools to abuse file formats and create “malicious” documents. I recommend having a look at his GitHub repo.

Then, Damien Cauquil presented “Binbloom reloaded”. The first version of this tool was published in 2020 by Guillaume Heilles. Damien stumbled upon an unknown firmware… First reflex: load it into Ghidra, but it was not easy to guess the base address to start the analysis. He tried other tools like basefind.py without luck. He reviewed these tools and explained how they search for base addresses. Finally, he decided to dive into binbloom and improve it. The tool is available here.

Then another tool was presented by Jose E. Marchesi: “GNU poke, the extensible editor for structured binary data“. Honestly, I had never heard of this hex editor. There are plenty of them in the open source ecosystem, but Jose explained their weaknesses. Poke looks extremely powerful for extracting specific information, converting it and updating the file, but I was a bit lost in the details. The tool looks complex at first sight!

To wrap up the schedule, Romain Thomas presented something pretty cool: “The Poor Man’s Obfuscator“. The idea of his research was to transform an ELF file (Linux executable) or a Mach-O file (macOS executable) to make it obfuscated and hard to debug with classic debuggers/disassemblers. Of course, the modified binaries have to remain executable from an OS point of view. Several techniques were covered, like creating a lot of exports with random or confusing names. Another technique was to alter the sections of the file. Really interesting techniques that seem very powerful. With one sample, he was able to crash Ghidra!

The day finished with a series of rump sessions and the social event in the center of Lille. Day three started with talks related to blue teams. The first one was “Sudo logs for Blue Teamers” by Peter Czanik. Peter is an advocate for Sudo & Syslog-NG, two useful tools for your Linux distributions. Both are integrated smoothly because Sudo logs are parsed automatically by Syslog-NG. The IO Logs API is a nice feature that many people don’t know: you can interact with the user session. Example: use a few lines of Python code to terminate the session if some text is detected. Peter covered new features like the possibility to execute a chroot with Sudo. Recap of the new versions: more logging capabilities, and the ability to track and intercept sub-commands.

The next presentation focused on another tool: “DFIR-IRIS – collaborative incident response platform”, presented by Théo Letailleur and Paul Amicelli. As incident handlers, they came with problems to solve: track elements during investigations, share pieces of information between analysts, and handle repetitive tasks. Many solutions existed: TheHive, FIR, Catalyst, DRIFTrack, Aurora. They decided to start their own tool: DFIR-IRIS. It provides the following components:

  • python web application (API, modules)
  • automation (enrichment, reports)
  • collaboration (notes, share, tasks)
  • tracking (IOCs, assets, timeline, evidence)

I was surprised by the number of features and the power of the tool. I’m not using it at this time (I use TheHive), but it deserves to be tested! More information here.

The next speaker was Solal Jabob, another regular PTS speaker! This time, he presented “TAPIR: Trustable Artifact Parser for Incident Response“. Based on what he demonstrated, the amount of work is simply crazy. It’s a complete framework that you can use to collect artefacts and perform triage, export, … It is based on a library (TAP) used to parse data from files, disk images, …, and plugins are then used to extract metadata. The first tool presented was Bin2JSON, then TAP-Query; TAPIR is a client/server solution with a REST API and multi-user capabilities. It can be used from the command line or a web interface. A Python library is also available (TAPyR).

Just before the lunch break, I presented “Improve your Malware Recipes with Cyberchef“. I explained how powerful this tool can be for performing malware analysis.

After the break, the second half of the day was dedicated to pentesting/red teaming presentations. In quick succession, we had Antoine Cervoise, who presented “MobSF for penetration testers“. MobSF is a free open-source security scanner for mobile applications. He demonstrated how he finds interesting information to use in his pentest projects. Hugo Vincent presented “Finding Java deserialization gadgets with CodeQL”. Java deserialization remains a common issue in many applications in 2022; it’s still present in the OWASP Top-10. Hugo explained the principle behind this attack, then demonstrated, with the help of CodeQL (a static code analyzer), how he finds vulnerabilities. Pierre Milioni presented “Dissecting NTLM EPA & building a MitM proxy“. NTLM is an old protocol but is still used to authenticate users on web applications. Microsoft extended it with “EPA”, “Extended Protection for Authentication”. This makes some tools useless because they don’t support this protocol extension. Pierre developed a MitM proxy called Proxy-Ez that helps to use these tools. Finally, Mahé Tardy presented “kdigger: A Context Discovery Tool for Kubernetes Penetration Testing“. kdigger is a tool that helps to perform penetration tests in the context of a Kubernetes environment. The demo was pretty nice!

After two years “online”, it was nice to be back in Lille to meet people in real life. We were approximately 100 attendees, a great number that left time to exchange with many people! The talks have been recorded and are already online, and so are the slides.

The post Pass-The-Salt 2022 Wrap-Up appeared first on /dev/random.

July 02, 2022

A wise person recently told me:

"Release early, release often!"

So here goes... I know a teeny-weeny bit of Python and have recently learned to enjoy FreeCAD. This morning I discovered that FreeCAD can be scripted using Python, so here's my first attempt. Any hints are welcome.

The script creates a (very flat) cube, attaches four smaller flat cubes that serve as tenons, and then cuts out four small cubes that act as mortises. It ends by creating four simple copies of this piece, forming a four-piece puzzle.
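
Here is a minimal sketch of what such a FreeCAD Python script could look like; the dimensions and positions are my own guesses, not the original script:

import FreeCAD as App
import Part

doc = App.newDocument("Puzzle")

# A very flat base cube (assumed dimensions: 100 x 100 x 4 mm).
piece = Part.makeBox(100, 100, 4)

# Attach four smaller flat cubes that serve as tenons...
for x, y in [(40, -10), (40, 100), (-10, 40), (100, 40)]:
    piece = piece.fuse(Part.makeBox(20, 10, 4, App.Vector(x, y, 0)))

# ...and cut out four small cubes that act as mortises.
for x, y in [(40, 0), (40, 90), (0, 40), (90, 40)]:
    piece = piece.cut(Part.makeBox(20, 10, 4, App.Vector(x, y, 0)))

# Create four simple copies of the piece, forming a four-piece puzzle.
for i in range(4):
    copy = piece.copy()
    copy.translate(App.Vector(i * 120, 0, 0))
    Part.show(copy)

doc.recompute()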

The next step is to automate the inclusion of an SVG file on the surface of these pieces.

June 29, 2022

The dam has burst. Under the pressure of the furious floodwaters, I reconnected; I was inundated.

The initial cause was organizing several trips. Nowadays, organizing a trip means spending hours online doing research, finding opportunities, comparing offers and availability, then booking, waiting for emails, confirming reservations. At the final confirmation of a flight, for example, I had the unpleasant surprise of discovering that luggage was not allowed. But it was on the same flight the next day. I had to shift the whole schedule, go back to the accommodation bookings, and so on.

When you spend your day online, flitting between websites and answering an email from time to time, this kind of exercise fits naturally into the day. But when, like me, you time the hours you spend online, the energy devoted to organizing a trip is frightening. Besides the time spent exploring the possibilities, actively searching and filling in forms, there is also the waiting time for confirmations, and the dozens of emails to decipher, most of which are nothing but administrative quibbles or, already, advertisements you have to unsubscribe from.

All of this, of course, having to be synchronized with the other participants of said trips.

Between two account creations and two confirmation emails, waiting for the urgent reply of one of the participants, my brain doesn't have the capacity to concentrate. It waits. And while it's waiting, there happen to be dozens, hundreds, thousands of entertaining little slices of information. The results of a cycling race. The elections in France. Fascinating subjects. Even worrying, in the case of the latter. But a worrying subject is only all the more fascinating. I watch the rise of the far right with morbid interest, the way you watch a horror movie: powerless and unable to tear myself away from the screen.

In my case, traveling was the cause of my reconnection. But it could have been something else. Like the problems I had with my former bank, which now forces the use of an Android application that resells my private data, so that it can close its branches and lay off its staff.

What do banks and travel agencies have in common? The disappearance of customer service. The disappearance of an essential profession that consisted of listening to the customer and then trying to turn their wishes into administrative acts. From now on, the customer is alone in front of the administrative machine. They have to fill in the forms, know everything, understand everything by themselves. And berate themselves for the slightest mistake, because nobody will check on their behalf.

But while the service no longer exists, the fiction of the service still does. Marketing departments bombard you with emails, paper mail and unsolicited phone calls. To make you sign up for or buy yet another product that you will never be able to get rid of. The aggression is permanent. Political power is incapable of acting, for several reasons.

The first is that it does not want to act, politicians being the first to want to swamp people with their advertising. Public administrations, populated with private-sector specialists whose organizational merits have been praised, end up… doing advertising. It's absurd and inexorable. Why do the railways put so much effort into promoting, through laughable advertisements, complicated systems of incomprehensible subscriptions? Couldn't that budget be used to pay railway workers better?

The second point runs deeper. Public authorities pride themselves on wanting to distinguish between "good" marketing and dishonest scams. The problem is that the distinction is purely arbitrary. Both seek to exploit some weakness in order to extract money.

Why, for example, do you have to explicitly put a sticker on your mailbox to keep it from filling up with shrink-wrapped advertising? The opposite would be more logical: only allow advertising when it is explicitly requested.

Why is the GDPR so maligned when it tries to bring order to the way private data is used? Because it was made, deliberately, incredibly complex. It would be enough to put into law that any personal data can only be used with explicit consent valid for one year. And that this consent is not transferable. That would mean that any resale of data would force the buyer to ask the people concerned for their consent. And to renew that consent every year. Simple and effective.

Fundamentally, the role of public authority is to protect citizens, to enforce that ever-shifting boundary between the freedom of the individual and respect for others. But when public authority claims to become profitable and to act as an economic actor rather than a political one, its action becomes farcical.

Like when the state pulls out all the stops to prevent cigarette counterfeiting. Trying to argue that counterfeit cigarettes are… dangerous to your health. Forgetting that "legal" cigarettes are responsible for more deaths than COVID (a third of whom don't smoke), for serious destruction of the environment, and for emitting more than 1% of the CO2 produced annually.

Several times a week, my phone rings with some scheme or other to extort money from me. Yet I'm on a do-not-call list. With every unexpected call, I file a complaint on the government website and, where possible, with the calling company. This earned me an exchange with an investigator working at the biggest Belgian telephone operator. Thanks to him, I understood how the law makes it difficult to fight this type of scam, under the pretext of defending "legal" telemarketing.

It always comes back to the same problem: optimizing the economy means maximizing economic exchanges, whatever they may be. Maximizing marketing, however intrusive, however absurd, however harmful it may be. Exploiting human weaknesses to extract as much money as possible, to generate as much consumption as possible, and therefore as much pollution.

Pollution of the environment, pollution of the air, permanent mental pollution are only facets of one and the same cause: the political maximization of economic exchanges. Until it kills us.

We buy plastic bottles filled with morbid sugars to consume while waiting for the umpteenth message that will make our smartphone ring. A message that, most of the time, will push us to consume or will justify the money we receive each month so we can keep consuming. Without a message, we are reduced to compulsively refreshing the screen, hoping for a new piece of news, something juicy. Anything. The death of a TV host from our childhood, for instance, good for several hours of videos posted on YouTube.

The fact that I partly relapsed shows me just how much being connected is a drug. A carefully cultivated addiction, a permanent danger for addicts like me.

Every connection is a thrill. It's a well-deserved burst of pleasure, an intellectual rest. I can compulsively consume, click without thinking. The simple act of using the mouse, of having multiple tabs open on images or videos, slows the mind down while giving a false sense of control, of power.

The problem has, in fact, long affected the professional world. As Cal Newport recounts in his book "A World Without Email", most jobs now boil down to answering one's emails and phone calls, all while attending meetings. The exchange is permanent and has been made much worse by the arrival of workplace messaging tools like Slack.

The professional world no longer has the leisure to think. Decisions are made without perspective and accepted on the basis of a manager's mere charisma. This is no accident. Thinking is dangerous. Thinking calls things into question. Thinking makes you a pariah.

The elections in France made me crave politics and debate. So I read "Son Excellence Eugène Rougon" by Zola. On paper. I started thinking again. I found the motivation to take up the fight again. A fight against my addiction. A fight against the whole society around me. A fight against myself.

Receive the posts by email or by RSS. Max 2 posts a week, nothing else. Email address never shared and permanently deleted upon unsubscription. Latest book published: Printeurs, a cyberpunk thriller. To support the author, read, give and share books.

This text is published under the CC-By BE license.

When performing a physical backup on systems that are heavily used, it can happen that the backup speed cannot keep up with the redo log generation. This can happen when the backup storage is slower than the redo log storage media, and it can lead to inconsistencies in the generated backup.

MySQL Enterprise Backup (aka MEB) and probably Percona Xtrabackup, benefit from the possibility to sequentially write redo log records to an archive file in addition to the redo log files.

This feature was introduced in MySQL 8.0.17.

How to enable it?

To enable this feature, two settings are necessary:

  • set globally a directory where those archiving logs can be stored
  • start the archiving process in a session by calling a dedicated function

The global variable is innodb_redo_log_archive_dirs.

This variable must contain labelled directories where the archived redo logs can be stored. The format is a semicolon-separated string like this:

innodb_redo_log_archive_dirs='label1:/backups1;label2:/backups2'

The system user running mysqld must have access to those directories, which should not be accessible to all users.

The redo log archiving is started using the function innodb_redo_log_archive_start() and stopped using innodb_redo_log_archive_stop(). Only users with the INNODB_REDO_LOG_ARCHIVE privilege can call those functions.

It’s important to notice that the MySQL session that activates redo log archiving must remain open for the duration of the archiving. You must deactivate redo log archiving in the same session. If the session is terminated before the redo log archiving is explicitly deactivated, the server deactivates redo log archiving implicitly and removes the redo log archive file.

Let’s see how to enable it:

$ sudo mkdir -p /var/lib/mysql-redo-archive/backup1
$ sudo chown mysql. -R /var/lib/mysql-redo-archive
$ sudo chmod -R 700 /var/lib/mysql-redo-archive/

In fact, it’s ready to work but not yet enabled: only when a session, usually the one initiating the backup, invokes innodb_redo_log_archive_start() will it really be enabled:
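
As an illustration, here is a minimal sketch of that workflow in Python with mysql.connector (the credentials and the 'backup1' label/sub-directory are assumptions matching the setup above). Note that the session has to stay open for the whole duration of the backup:

import time
import mysql.connector

# Assumed credentials -- adapt them to your environment.
cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
cur = cnx.cursor()

# Point the server at the labelled archive directory created above.
cur.execute("SET GLOBAL innodb_redo_log_archive_dirs = 'backup1:/var/lib/mysql-redo-archive'")

# Start archiving into the 'backup1' sub-directory; this session must stay open.
cur.execute("SELECT innodb_redo_log_archive_start('backup1', 'backup1')")
cur.fetchall()

# ... run the physical backup here; the sleep() is just a placeholder ...
time.sleep(60)

# Stop archiving from the *same* session once the backup is finished.
cur.execute("SELECT innodb_redo_log_archive_stop()")
cur.fetchall()

cur.close()
cnx.close()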

Is it enabled?

How can we see that the redo log archiving is active?

We can check if MySQL is using a redo log archive file using the following query:

select * from performance_schema.file_instances
   where event_name like '%::redo_log_archive_file'\G

If there is an entry, this means that the redo log archive process is enabled or has been enabled and stopped successfully using the dedicated function:

So this alone is not enough to be sure that redo log archiving is active. But we can also check whether the thread is active using this query:

select thread_id, name, type from performance_schema.threads 
   where name like '%redo_log_archive%';

If a row is returned, it means that the redo log archiving is enabled and active:

Error Messages

Here are some common error messages related to Redo Log Archiving:

ERROR: 3850 (HY000): Redo log archiving failed: Session terminated with active redo log archiving - stopped redo log archiving and deleted the file. This error happens when you try to stop the redo log archiving from another session and the session that started it was terminated.

ERROR: 3851 (HY000): Redo log archiving has not been started by this session. This is when the session that started the process is still open and you try to stop the redo log archiving from another session.

ERROR: 3848 (HY000): Redo log archiving has been started on '/var/lib/mysql-redo-archive/backup2/archive.17f6a975-e2b4-11ec-b714-c8cb9e32df8e.000001.log' - Call innodb_redo_log_archive_stop() first: this happens when you try to start the archiving process and there is already one active.

ERROR: 3842 (HY000): Label 'backup2' not found in server variable 'innodb_redo_log_archive_dirs': this is when you try to start the redo log archiving and you are using a label which is not defined in innodb_redo_log_archive_dirs.

ERROR: 3846 (HY000): Redo log archive directory '/var/lib/mysql-redo-archive/backup2' is accessible to all OS users: this is when the directory is accessible to other users too. Only the user running mysqld should have access to it.

ERROR: 3844 (HY000): Redo log archive directory '/var/lib/mysql-redo-archive/backup3' does not exist or is not a directory: this is a very common error; it happens when the sub-directory does not exist in the directory defined by the corresponding label in innodb_redo_log_archive_dirs. In this example, backup3 was not created in /var/lib/mysql-redo-archive.

ERROR: 3847 (HY000): Cannot create redo log archive file '/var/lib/mysql-redo-archive/backup3/archive.17f6a975-e2b4-11ec-b714-c8cb9e32df8e.000001.log' (OS errno: 13 - Permission denied): this is simple to understand; the directory and sub-directory exist but don’t belong to the user running mysqld (usually mysql).

Callable Functions

There are several functions related to Redo Log Archiving; we already used two of them to start and stop the process. Here is the list as of MySQL 8.0.29:

The last two functions are used by MEB; they are not documented in the MySQL Server manual, and there is no reason to use them as a normal user.

innodb_redo_log_archive_flush is used to flush the redo log archive queue.

innodb_redo_log_sharp_checkpoint makes a checkpoint by calling log_make_latest_checkpoint(*log_sys).

Conclusion

Even if not popular yet, this feature is mandatory for heavy workloads when the backup storage doesn’t have the same capabilities as the production storage and cannot keep up with the speed of the writes.

When enabled by the DBA, MySQL Enterprise Backup will use it automatically. To know whether the redo log archiving process was started and is still active, the DBA can check the performance_schema.threads table.

June 22, 2022

I published the following diary on isc.sans.edu: “Malicious PowerShell Targeting Cryptocurrency Browser Extensions“:

While hunting, I found an interesting PowerShell script. After a quick check, my first conclusion was that it is again a simple info stealer. After reading the code more carefully, the conclusion was different: It targets crypto-currency browser apps or extensions. The script has a very low score on VT: 1/53… [Read more]

The post [SANS ISC] Malicious PowerShell Targeting Cryptocurrency Browser Extensions appeared first on /dev/random.

June 20, 2022

Full table scans can be problematic for performance, certainly if the scanned tables are large. The worst case is when full table scans are involved in joins, particularly when the scanned table is not the first one (this was dramatic before MySQL 8.0, when Block Nested Loop was used)!

A full table scan means that MySQL was not able to use an index (either there is no index, or no filter uses it).

Effects

When Full Table Scans happen (depending on the size, of course), a lot of data gets pulled into the Buffer Pool, and other important data from the working set may be pulled out. Most of the time that new data in the Buffer Pool might not even be required by the application: what a waste of resources!

You then understand that another side effect of Full Table Scans is an increase in I/O operations.

The most noticeable symptoms of Full Table Scans are:

  • increase of CPU usage
  • increase of disk I/O (depending on the size of the tables and the size of the Buffer Pool)
  • increase of accessed rows

Trending

What is the best way to see if we have an increase in Full Table Scans?

MySQL doesn’t provide a metric with the exact number of table scans, and additionally, if a full table scan is performed against a table with only 1 record, is it really problematic?

To determine if we have an increase in Full Table Scans, we will use the handler API metrics, and more precisely handler_read_rnd_next:

handler_read_rnd_next represents the number of requests to read the next row in the data file. In other words, it represents the number of non-indexed reads.
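
As an illustration of how to track this trend yourself, here is a tiny sketch (assuming mysql.connector and credentials of your own) that samples the counter once a minute so the delta can be logged or graphed:

import time
import mysql.connector

# Assumed credentials -- adapt them to your environment.
cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
cur = cnx.cursor()

def read_counter():
    # Handler_read_rnd_next is a cumulative counter since server start.
    cur.execute("SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next'")
    return int(cur.fetchone()[1])

previous = read_counter()
while True:
    time.sleep(60)
    current = read_counter()
    # The per-minute delta is what you want to plot over time.
    print(f"Handler_read_rnd_next: {current - previous} / min")
    previous = current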

Handler API

Each storage engine is a class with each instance of the class communicating with the MySQL server through a special handler interface.

The handler API is then the interface between MySQL and the storage engine. The MySQL server communicates with the storage engines through that API and it’s the storage engine’s responsibility to manage data storage and index management.

The Handler_% variables count handler operations, such as the number of times MySQL asks a storage engine to read the next row from an index.

These are exactly the values of those handler_% variables that are plotted in the graph above.

Let’s have a quick look at some interesting ones:

  • handler_read_first: The number of times the first entry in an index was read. If this value is high, it suggests that the server is doing a lot of full index scans.
  • handler_read_next: The number of requests to read the next row in key order. This value is incremented if you are querying an index column with a range constraint or if you are doing an index scan.
  • handler_read_rnd_next: The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have. Handler_read_rnd_next is incremented when handler::rnd_next() is called. This operation advances the cursor position to the next row.

For more information about the other handler status variables, check the manual.

I rewrote a query from High Performance MySQL, 3rd edition (O’Reilly) to be compatible with MySQL 8.0, to illustrate the instrumentation of the handler counters:

SELECT * FROM (
  SELECT STRAIGHT_JOIN
    LOWER(gs0.VARIABLE_NAME) AS variable_name,
    gs0.VARIABLE_VALUE AS value_0, gs1.VARIABLE_VALUE AS value_1,
    ROUND((gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE) / 60, 2) AS per_sec,
    (gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE)  AS per_min
  FROM 
  (
    SELECT VARIABLE_NAME, VARIABLE_VALUE
      FROM performance_schema.global_status
    UNION ALL
    SELECT '', SLEEP(60) FROM DUAL
  ) AS gs0
  JOIN performance_schema.global_status gs1 
  USING (VARIABLE_NAME)
  WHERE gs1.VARIABLE_VALUE <> gs0.VARIABLE_VALUE
) a 
WHERE variable_name LIKE 'handler%';

This is an example of this query’s output:

Please note that the SHOW VARIABLES command is a full table scan and the handler_read_rnd_next counter will be incremented when that command is executed.

JOINS

As written above, when Full Table Scans are involved in joins, things usually get even worse. That information can also be plotted:

MySQL provides status variables that allow you to track how joins are performed during a SELECT statement:
  • select_range: The number of joins that used ranges on the first table. This is normally not a critical issue even if the value is quite large.
  • select_scan: The number of joins that did a full scan of the first table.
  • select_full_range_join: The number of joins that used a range search on a reference table. In other words, this is the number of joins that used a value from table n to retrieve rows from a range of the reference index in table n + 1. Depending on the query, this can be more or less costly than Select_scan.
  • select_range_check: The number of joins without keys that check for key usage after each row. If this is not 0, you should carefully check the indexes of your tables as this query plan has very high overhead.
  • select_full_join: This is the counter you don’t want to see with a high value. It represents the number of joins that perform table scans because they do not use indexes, the number of cross joins, or joins without any criteria to match rows in the tables. When checking this value for a specific query, the number of rows examined is the product of the number of rows in each table. This should absolutely be avoided!

How to find those Queries?

Performance_Schema and Sys Schema have all the necessary resources to retrieve the queries performing Full Table Scans. Let’s have a look at the output of the following query:

SELECT format_pico_time(total_latency) total_time, 
       db, 
       exec_count, 
       no_index_used_count, 
       no_good_index_used_count, 
       no_index_used_pct, 
       rows_examined, 
       rows_sent_avg, 
       rows_examined_avg, 
       t1.first_seen, t1.last_seen, 
       query_sample_text 
FROM sys.x$statements_with_full_table_scans AS t1 
JOIN performance_schema.events_statements_summary_by_digest AS t2 
  ON t2.digest=t1.digest ORDER BY total_latency DESC\G

This is an example of one returned row:

              total_time: 2.05 s
                      db: sbtest
              exec_count: 3
     no_index_used_count: 3
no_good_index_used_count: 0
       no_index_used_pct: 100
           rows_examined: 859740
           rows_sent_avg: 1
       rows_examined_avg: 286580
              first_seen: 2022-06-16 12:21:14.450874
               last_seen: 2022-06-16 12:23:48.577439
       query_sample_text: select count(*) from sbtest2 
                          join sbtest1 using(k) where sbtest1.c 
                          like '%1414%' or sbtest2.c like '%1424%'

We can see that on average this query is scanning more than 280k rows each time it’s executed.

Let’s have a look at the Query Execution Plan for that specific query and confirm it does Full Table Scans:

And this is even more obvious when we use the Tree format:

To illustrate another bad behavior, I will remove the index on the k column of those two tables and use the same query.

We will also check the handler_% and Select_% status variables:

We can see that this is exactly the situation we should avoid, especially if you are not using MySQL 8.0: without Hash Joins this is even worse!

Conclusion

In general, Full Table Scans should be avoided but of course this also depends on the size of the tables involved, the performance of the storage, the storage engine used (Performance_Schema is not a problem) and how the buffer pool is used.

Query optimization is the solution: it may consist of adding indexes, rewriting the query, and so on; however, this is not always easy.

If you need to deal with such operations, I recommend reading Chapter 24 of MySQL 8 Query Performance Tuning (Jesper Wisborg Krog, Apress, 2020).

Enjoy MySQL!

June 16, 2022

I published the following diary on isc.sans.edu: “Houdini is Back Delivered Through a JavaScript Dropper“:

Houdini is a very old RAT that was discovered years ago. The earliest mention I found goes back to 2013! Houdini is a simple remote access tool written in Visual Basic Script. The script is not very interesting because it is non-obfuscated and has just been adapted to use a new C2 server… [Read more]

The post [SANS ISC] Houdini is Back Delivered Through a JavaScript Dropper appeared first on /dev/random.

June 14, 2022

If you want to receive Bluetooth Low Energy sensor measurements, there's a new project you can use: Theengs Gateway. It uses Theengs Decoder, an efficient, portable and lightweight C++ library for BLE payload decoding, and it publishes the decoded data as MQTT messages. It already supports 40 BLE devices, including RuuviTags, iBeacons, and various Xiaomi devices.

Recently Mihai Ambrosie created a Theengs Gateway add-on for Home Assistant, so you can install it easily. The installation process goes like this:

  • Click on Settings / Add-ons in Home Assistant and then Add-on Store at the bottom right. Click on the three dots at the top right and then Repositories.

  • Enter the url https://github.com/mihsu81/addon-theengsgw and click on Add. Click on Close after the repository has been added.

  • Click on TheengsGateway in the list of add-ons and then Install.

After the installation is complete, open the Configuration tab of the add-on and enter the host and port of your MQTT broker and optionally a username and password. 1 You can also change some parameters such as the base of the MQTT topics, the scan duration and the time between scans, and a filter for devices that you don't want to be discovered by Home Assistant because they're too numerous. 2

/images/theengs-gateway-addon-configuration.png

Click on Save to save the configuration and then click on Start in the Info tab to start the add-on. After this, all BLE devices that Theengs Gateway detects are automatically discovered by Home Assistant, and you can find them in Settings / Devices & Services. Look at the Devices and Entities tabs:

/images/theengs-gateway-addon-ruuvitag.png
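
If you are curious about the raw data behind those entities, you can subscribe to the MQTT topics the gateway publishes to. Here is a minimal sketch with the paho-mqtt library; the broker address, credentials and base topic are assumptions, so use the values from your own add-on configuration:

import json
import paho.mqtt.client as mqtt

BROKER = "homeassistant.local"   # assumption: your MQTT broker
TOPIC = "home/TheengsGateway/#"  # assumption: your configured base topic

def on_message(client, userdata, msg):
    # Each message is a JSON payload with the decoded sensor values.
    print(msg.topic, json.loads(msg.payload))

client = mqtt.Client()
client.username_pw_set("mqtt_user", "mqtt_password")  # assumption
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC)
client.loop_forever()
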
1. If you don't have an MQTT broker yet, install the Mosquitto add-on in Home Assistant.

2. By default, iBeacons, Google/Apple Exposure Notifications (GAEN), and Microsoft Advertising Beacons (advertised by Windows devices) are filtered.

June 13, 2022

Hello
Leica M10-R

I have over 10,000 photos on my website. All these photos are managed by a custom Drupal module. I wrote the first version of that module over 15 years ago, and continue to work on it from time to time. Like this weekend, when I added a new feature.

Digital photos have EXIF data embedded in them. EXIF data includes information such as camera model, lens, aperture, shutter speed, focal length, ISO, and much more.

My module now extracts the EXIF data from my photos and stores it in a database. Having all my EXIF metadata in a database allows me to analyze my photography history.
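
The module itself is PHP code inside Drupal, but just to illustrate the idea, extracting EXIF data in Python with the Pillow library looks roughly like this (an illustration only, not the actual module):

from PIL import ExifTags, Image

# Open a photo and read its embedded EXIF data (the path is a placeholder).
image = Image.open("photo.jpg")
exif = image.getexif()

# Camera model and date live in the main IFD; lens, aperture, shutter speed
# and ISO live in the Exif sub-IFD, so merge both before printing.
tags = dict(exif)
tags.update(exif.get_ifd(ExifTags.IFD.Exif))

for tag_id, value in tags.items():
    name = ExifTags.TAGS.get(tag_id, tag_id)
    print(f"{name}: {value}")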

For example, over the years, I've owned 11 different cameras and 10 different lenses:

SELECT COUNT(DISTINCT(camera)) AS count FROM images; 
+-------+
| count |
+-------+
|    11 |
+-------+

SELECT COUNT(DISTINCT(lens)) AS count FROM images;  
+-------+
| count |
+-------+
|    10 |
+-------+

Here is a SQL query that shows all cameras I have owned in the last 22 years, and the timeframe I used them for.

SELECT camera, MIN(DATE(date)) AS first, MAX(DATE(date)) AS last, TIMESTAMPDIFF(YEAR, MIN(date), MAX(date)) AS years FROM images GROUP BY camera ORDER BY first; 
+---------------------+------------+------------+-------+
| camera              | first      | last       | years |
+---------------------+------------+------------+-------+
| Sony Cybershot      | 2000-01-01 | 2003-08-01 |     3 |
| Nikon Coolpix 885   | 2001-11-13 | 2004-04-11 |     2 |
| Nikon D70           | 2004-04-03 | 2006-11-19 |     2 |
| Nikon D200          | 2006-12-31 | 2012-06-17 |     5 |
| Panasonic Lumix GF1 | 2011-10-11 | 2014-10-26 |     3 |
| Nikon D4            | 2012-07-01 | 2018-08-26 |     6 |
| Sony Alpha 7 II     | 2015-02-25 | 2019-01-09 |     3 |
| DJI Mavic Pro       | 2017-07-23 | 2019-01-18 |     1 |
| Nikon D850          | 2019-03-16 | 2021-04-24 |     2 |
| Nikon Z 7           | 2019-04-07 | 2021-08-31 |     2 |
| Leica M10-R         | 2021-11-18 | 2022-06-09 |     0 |
+---------------------+------------+------------+-------+

Finally, here is a chart that visualizes my camera history:

Chart that shows my cameras and when I used them. The timeframe I used each camera for. The white numbers on the blue bars represent the number of photos I published on my website.

A few takeaways:

  • I used my Nikon D4 for 6 years and my Nikon D200 for 5 years. On average, I use a camera for 3.3 years.
  • I should dust off my drone (DJI Mavic Pro) as I haven't used it since early 2019.
  • In 2019, I bought a Nikon D850 and a Nikon Z 7. I liked the Nikon Z 7 better, and didn't use my Nikon D850 much.
  • Since the end of 2021, I've been exclusively using my Leica.

I shared this with my family but they weren't impressed. Blank stares ensued, and the conversation took a quick turn. While I could go on and share more statistics, I'll take a hint from my family, and stop here.

June 12, 2022

This week my new book has been published, Develop your own Bluetooth Low Energy Applications for Raspberry Pi, ESP32 and nRF52 with Python, Arduino and Zephyr.

Bluetooth Low Energy (BLE) is one of the most accessible wireless communication standards. You don't need any expensive equipment to develop BLE devices such as wireless sensor boards, proximity beacons, or heart rate monitors. All you need is a computer or a Raspberry Pi, an ESP32 microcontroller board, or a development board with a Nordic Semiconductor nRF5 (or an equivalent BLE SoC from another manufacturer).

On the software side, BLE is similarly accessible. Many development platforms, most of them open source, offer an API (application programming interface) to assist you in developing your own BLE applications. This book shows you the ropes of Bluetooth Low Energy programming with Python and the Bleak library on a Raspberry Pi or PC, with C++ and NimBLE-Arduino on Espressif's ESP32 development boards, and with C on one of the development boards supported by the Zephyr real-time operating system, such as Nordic Semiconductor's nRF52 boards.

While Bluetooth Low Energy is a complex technology with a comprehensive specification, getting started with the basics is relatively easy. This book takes a practical approach to BLE programming to make the technology even more approachable. With a minimal amount of theory, you'll develop code right from the start. After you've completed this book, you'll know enough to create your own BLE applications.

What is Bluetooth Low Energy?

Bluetooth is a wireless communication standard in the 2.4 GHz Industrial, Scientific, and Medical (ISM) frequency band. These days, if you hear about Bluetooth support in a product, this almost always is Bluetooth Low Energy (BLE). It's a radical departure from the original Bluetooth standard, which is now called Classic Bluetooth.

Bluetooth Low Energy and Classic Bluetooth are actually different protocols. Classic Bluetooth is essentially a wireless version of the traditional serial connection. If you want to print a document, transfer a file or stream audio, you want this to happen as fast as possible. Therefore, the focus of development in Classic Bluetooth was on attaining faster and faster speeds with every new version.

However, Classic Bluetooth wasn't a good fit for devices with low power consumption, for instance those powered by batteries. That's why Nokia adapted the Bluetooth standard to enable it to work in low-power scenarios. In 2006, they released their resulting technology onto the market, dubbed Wibree.

The Bluetooth Special Interest Group (SIG), the organization that maintains the Bluetooth specifications, showed interest in this new development. After consulting with Nokia, they decided to adopt Wibree as part of Bluetooth 4.0, with a new name, Bluetooth Low Energy. Classic Bluetooth remained available for high-throughput applications.

Note

In practice, many chipsets support both Classic Bluetooth and Low Energy, especially in laptops and smartphones.

Layered architecture

The Bluetooth Core Specification is more than 3200 pages long. And this is only the core specification; there are many supplemental documents for BLE. However, BLE has a layered architecture. Many end-user applications only use the upper layers, so you don't need to know the details of the architecture's lower layers.

/images/ble-stack.png

The BLE architecture consists of three main blocks: controller, host, and application.

Controller

This has the lower-level layers: the Physical Layer (PHY), Link Layer (LL) and Direct Test Mode (DTM). These are the layers where the Bluetooth radio does its work. The controller communicates with the outside world using the antenna, in a frequency band around 2.4 GHz. It communicates with the host using a standardized interface between the two blocks: the Host Controller Interface (HCI). 1

Host

This is the block with which the end user or application developer comes in contact. The Logical Link Control and Adaptation Protocol (L2CAP) defines channels and signaling commands. On top of it, the Security Manager Protocol (SMP) handles secure connections (with authentication and encryption), and the Attribute Protocol (ATT) defines how to expose and access data as attributes. The Generic Attribute Profile (GATT) 2 builds on the Attribute Protocol to define how to discover services and their characteristics and how to read and write their values. The upper layer of the Host block is the Generic Access Profile (GAP), which defines how devices can discover other devices and connect, pair, and bond to them. The host communicates with the controller using its part of the host controller interface, and applications communicate with the host depending on the APIs exposed by the operating system.

Application

This layer builds on top of the Generic Attribute Profile to implement application-specific characteristics, services, and profiles. A characteristic defines a specific type of data, such as an Alert Level. A service defines a set of characteristics and their behaviors, such as the Link Loss Service. A profile is a specification that describes how two or more devices with one or more services communicate with each other. An example is the Proximity profile, which has two roles: Proximity Monitor and Proximity Reporter.

The three blocks don't have to run on the same processor. In fact, there are three common configurations --- one single-chip and two dual-chip:

Single-chip (SoC)

Controller, host and application code run on the same chip. The host and controller communicate through function calls and queues in the chip's RAM. Most simple devices such as BLE sensors use this configuration; it keeps the cost down. Some smartphones also use this configuration if they have a SoC with Bluetooth built in.

Dual-chip over HCI

A dual-chip solution with application and host on one chip, and the controller on another chip, communicates over HCI. Because HCI is a standardized interface, it lets you combine different platforms. For instance, on a Raspberry Pi, the Wi-Fi and BLE chip implements a BLE controller. If you connect a BLE dongle to an older Raspberry Pi, this dongle also implements a BLE controller. 3 BlueZ, the Raspberry Pi Linux kernel's Bluetooth stack, implements a BLE host. So BlueZ communicates with the BLE controller in the built-in BLE chip or the BLE dongle. In the former case, the HCI uses SDIO, and in the latter, UART over USB. 4 Many smartphones and tablets also use the dual-chip over HCI configuration, with a powerful processor running the host and a Bluetooth chip running the controller.

Dual-chip with connectivity device

Another dual-chip solution is one with the application running on one chip and the host and controller on another chip. The latter is then called the connectivity device because it adds BLE connectivity to the other device. This approach is useful if you have an existing hardware device that you want to extend with BLE connectivity. Because there's no standardized interface in this case, the communication between the application processor and the connectivity device needs to make use of a proprietary protocol implemented by the connectivity device.

A three-chip solution with controller, host, and application each running on its own chip is also possible. However, because of the associated cost, this is typically only done for development systems.

How to communicate with BLE devices?

Bluetooth Low Energy has two ways to communicate between devices: with and without a connection.

Without a connection

Without a connection means that the device just broadcasts information in an advertisement. Every BLE device in the neighborhood is able to receive this information.

/images/ble-broadcaster-observers.png

Some examples of BLE devices broadcasting data are:

Proximity beacons

These devices, often following Apple's iBeacon standard, broadcast their ID. Receivers calculate their approximate distance to the beacons based on the advertisement's Received Signal Strength Indicator (RSSI).
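As a rough illustration of that calculation, here is a minimal Python sketch using the common log-distance path loss model; the 1-meter reference power and the environment factor are assumed calibration values, not figures from this book:

def estimate_distance(rssi, tx_power_at_1m=-59.0, n=2.0):
    """Rough distance estimate (in meters) from an advertisement's RSSI.

    Log-distance path loss model: rssi = tx_power_at_1m - 10 * n * log10(d).
    tx_power_at_1m (the RSSI measured at 1 m) and n (the path loss exponent,
    about 2 in free space, higher indoors) are assumed calibration values.
    """
    return 10 ** ((tx_power_at_1m - rssi) / (10 * n))

print(estimate_distance(-75))  # roughly 6 m with these assumed calibration values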

Sensors

Many temperature and humidity sensors broadcast their sensor values. Most devices do this in an unencrypted fashion, but some of them encrypt the data to prevent it being read by every device in the neighborhood.

Mobile phones

After the COVID-19 pandemic started in 2020, Google and Apple collaborated on the Exposure Notifications standard for contact tracing. As part of this technology, Android phones and iPhones broadcast unique (but anonymous) numbers. Other phones can pick up these numbers and use them later to warn users that they have been in contact with someone who is known to have had COVID-19.

With a connection

The other way to communicate between BLE devices is with a connection. One device (the client) scans for BLE advertisements to find the device it wants to connect to. Then, optionally, it may do an active scan to ask the device (the server) which services are offered.

After the client connects to the server, the client can use the server's services. Each BLE service is a container of specific data from the server. You can read this data, or (with some services) write a value to the server.

/images/ble-peripheral-central.png

Some examples of BLE devices using a connection are:

Fitness trackers

Your smartphone can connect to a fitness tracker and read your heart rate, the tracker's battery level, and other measurements.

Sensors

Some environmental sensors let you read their sensor values over a BLE connection.

Proximity reporters

These devices sound an alert when their connection to another device is lost.

Advantages of BLE

Low power consumption

As its name implies, Bluetooth Low Energy is optimized for low-power applications. Its whole architecture is designed to reduce power consumption. For instance, setting up a connection, reading or writing data, and disconnecting happens in a couple of milliseconds. The radio is often the most energy-consuming part of a device. Therefore, the idea is to turn on the Bluetooth radio, create a connection, read or write data, disconnect, and turn off the radio again until the next time the device has to communicate.

This way, a well-designed BLE temperature sensor is able to work on a coin cell for ten years or more. You can use the same approach with other wireless technologies, such as Wi-Fi, but they require more power and more time to set up a connection.

Ubiquitous

BLE radio chips are ubiquitous. You can find them in smartphones, tablets, and laptops. This means that all those devices can talk to your BLE sensors or lightbulbs. Most manufacturers create mobile apps to control their BLE devices.

You can also find BLE radios in many single-board computers, such as the Raspberry Pi, and in popular microcontroller platforms such as the ESP32. 5 This makes it quite easy for you to create your own gateways for BLE devices. And, platforms such as the Nordic Semiconductor nRF5 series of microcontrollers with BLE radio even make it possible to create your own battery-powered BLE devices.

Low cost

There's no cost to access the official BLE specifications. Moreover, BLE chips are cheap, and the available development boards (based on an nRF5 or ESP32) and Raspberry Pis are quite affordable. This means you can just start with BLE programming at minimal cost.

Disadvantages of BLE

Short range

BLE has a short range (for most devices, less than 10 meters) compared to other wireless networks, such as Zigbee, Z-Wave, and Thread. It's not a coincidence that these competitors all have a mesh architecture, in which devices can forward their neighbors' messages in order to improve range. Low-power wide area networks (LPWANs), such as LoRaWAN, Sigfox, and NB-IoT, have even longer ranges.

In 2017, the Bluetooth SIG added Bluetooth Mesh, a mesh protocol. This builds upon BLE's physical and link layers with a whole new stack above them. However, Bluetooth Mesh isn't as well-established as the core BLE protocol, at least not for home use.

Limited speed

The BLE radio has a limited transmission speed. For Bluetooth 4.2 and earlier, this is 1 Mbps, while for Bluetooth 5 and later, this can be up to 2 Mbps. This makes BLE unsuitable for high-bandwidth applications.

You need a gateway

Wi-Fi devices have their own IP addresses, so you can communicate with them directly from other IP-based devices, and they're integrated in your LAN (local area network). Bluetooth doesn't have this: to integrate your BLE devices with other network devices, you need a gateway. This device has to translate Bluetooth packets to IP-based protocols such as MQTT (Message Queuing Telemetry Transport). That's why many BLE device manufacturers have smartphone apps that function as device gateways. 6
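To make the gateway idea concrete, here is a minimal Python sketch that forwards BLE advertisements to an MQTT broker. It assumes the Bleak library (introduced below), the paho-mqtt client, a broker running on localhost, and an arbitrary topic name; none of these specifics come from this book:

import asyncio
import json

import paho.mqtt.client as mqtt          # assumed dependency (paho-mqtt 1.x style API)
from bleak import BleakScanner

async def main():
    mqttc = mqtt.Client()
    mqttc.connect("localhost", 1883)      # hypothetical broker address
    mqttc.loop_start()

    def on_advertisement(device, advertisement_data):
        # Translate each BLE advertisement into a JSON message on an MQTT topic.
        payload = json.dumps({
            "address": device.address,
            "rssi": advertisement_data.rssi,        # available on recent Bleak versions
            "local_name": advertisement_data.local_name,
        })
        mqttc.publish("ble/advertisements/" + device.address, payload)

    scanner = BleakScanner(detection_callback=on_advertisement)
    await scanner.start()
    await asyncio.sleep(30)               # forward advertisements for 30 seconds
    await scanner.stop()
    mqttc.loop_stop()

asyncio.run(main())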

Platforms used in this book

This book focuses on Bluetooth Low Energy programming on three platforms:

BLE platforms used in this book

Programming language | Library        | Software platform     | Hardware platform
Python               | Bleak          | Windows, Linux, macOS | Raspberry Pi or PC
C++                  | NimBLE-Arduino | Arduino framework     | ESP32
C                    | /              | Zephyr                | nRF52

These choices were made in order to demonstrate a wide range of applications compatible with many software and hardware platforms.

Python/Bleak (Raspberry Pi, PC)

Python is an easy-to-use programming language that works on all major operating systems. There are a lot of Python Bluetooth Low Energy libraries, but many of them support only a single operating system. Bleak, which stands for Bluetooth Low Energy platform Agnostic Klient, is a welcome exception. It supports:

  • Windows 10, version 16299 (Fall Creators Update) or higher

  • Linux distributions with BlueZ 5.43 or higher (also on a Raspberry Pi)

  • OS X 10.11 (El Capitan) or macOS 10.12+

/images/rpi4.jpg

Bleak is a GATT client: it's able to connect to BLE devices that act as GATT servers. It supports reading, writing, and getting notifications from GATT servers, and it's also able to discover BLE devices and read advertising data broadcast by them.

Bleak doesn't implement a GATT server. In practice this isn't a big limitation. GATT servers are typically implemented on constrained devices, so, for this purpose, the ESP32 and nRF52 hardware platforms are a better match. 8
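To give an idea of what Bleak code looks like, here is a minimal sketch that scans for advertising devices and then reads a single characteristic from a chosen device. The address below is a placeholder and the UUID is the Bluetooth SIG's standard Battery Level characteristic; neither comes from this book:

import asyncio
from bleak import BleakScanner, BleakClient

ADDRESS = "AA:BB:CC:DD:EE:FF"  # placeholder: replace with your device's address
BATTERY_LEVEL_UUID = "00002a19-0000-1000-8000-00805f9b34fb"  # standard Battery Level characteristic

async def main():
    # Discover nearby advertising devices.
    for device in await BleakScanner.discover(timeout=5.0):
        print(device.address, device.name)

    # Connect as a GATT client and read one characteristic.
    async with BleakClient(ADDRESS) as client:
        value = await client.read_gatt_char(BATTERY_LEVEL_UUID)
        print("Battery level:", value[0], "%")

asyncio.run(main())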

C++/NimBLE-Arduino (ESP32)

If you're looking at microcontrollers, the Arduino framework has become quite popular, not only for the original Arduino boards, which didn't have BLE functionality, but also on ESP32 development boards, which do.

/images/esp32-pico-kit-v4.1.jpg

Programming for the Arduino framework is done in a variant of C++, but the framework and many Arduino libraries hide much of C++'s complexity. Even if you only know some C (which is much less complex than C++), you'll be able to use the Arduino framework.

One of the more popular BLE libraries for Arduino on the ESP32 is NimBLE-Arduino. It's a fork of NimBLE, which is part of the Apache Mynewt real-time operating system. With NimBLE-Arduino, you can easily create your own GATT server or client.

C/Zephyr (nRF52)

For even more constrained devices, typically battery-powered, you need a specialized real-time operating system (RTOS). This book uses the Zephyr Project on nRF52840-based devices from Nordic Semiconductor. Zephyr has a completely open-source Bluetooth Low Energy stack.

/images/nrf52840-dongle.png

Zephyr's BLE stack is highly configurable. You can build Zephyr firmware for three configuration types:

Combined build

Builds the BLE controller, BLE host, and your application for a one-chip configuration.

Host build

Builds the BLE host and your application, along with an HCI driver to let your device communicate with an external BLE controller on another chip. 9

Controller build

Builds the BLE controller with an HCI driver to let your device communicate with an external BLE host on another chip.

With some basic knowledge of C, you can create your own BLE devices with Zephyr, such as BLE beacons, sensor boards, and proximity reporters. Zephyr has extensive documentation of its Bluetooth API, as well as a lot of ready-to-use examples that you can build upon.

June 11, 2022

  • Receive an ODT file (OpenDocument Text Document).
  • Everyone: opens the file with either LibreOffice or even Microsoft Office nowadays, apparently.
  • Me: uses Pandoc and LaTeX to convert the file to PDF and read it in Evince because I don’t have LibreOffice installed and I’m too lazy to upload the document to Google Docs.

I needed to review an addendum to a rental contract. (I moved! I’ll write about that later.) The addendum was sent to me in ODT format. At the time, my desktop PC was still packed in a box. On my laptop (a 2011 MacBook Air with Ubuntu 20.04) I only have the most essential software installed, which for me doesn’t include an office suite. I could install LibreOffice, but why make it easy if I can also do it the hard way? 😀

I do have Evince installed, which is a lightweight PDF viewer. To convert ODT to PDF I’m using Pandoc, which is a Swiss army knife for converting document formats. For PDF it needs the help of LaTeX, a document preparation system for typesetting.

First I installed the required software:

$ sudo apt install pandoc texlive texlive-latex-extra
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libapache-pom-java libcommons-logging-java libcommons-parent-java libfontbox-java libpdfbox-java preview-latex-style texlive-base texlive-binaries
  texlive-fonts-recommended texlive-latex-base texlive-latex-recommended texlive-pictures texlive-plain-generic tipa
Suggested packages:
  libavalon-framework-java libcommons-logging-java-doc libexcalibur-logkit-java liblog4j1.2-java texlive-xetex texlive-luatex pandoc-citeproc
  context wkhtmltopdf librsvg2-bin groff ghc php python r-base-core libjs-mathjax node-katex perl-tk xzdec texlive-fonts-recommended-doc
  texlive-latex-base-doc python3-pygments icc-profiles libfile-which-perl libspreadsheet-parseexcel-perl texlive-latex-extra-doc
  texlive-latex-recommended-doc texlive-pstricks dot2tex prerex ruby-tcltk | libtcltk-ruby texlive-pictures-doc vprerex
The following NEW packages will be installed:
  libapache-pom-java libcommons-logging-java libcommons-parent-java libfontbox-java libpdfbox-java pandoc preview-latex-style texlive texlive-base
  texlive-binaries texlive-fonts-recommended texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-pictures texlive-plain-generic
  tipa
0 upgraded, 17 newly installed, 0 to remove and 1 not upgraded.
Need to get 116 MB of archives.
After this operation, 448 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Just to compare, installing LibreOffice Writer would actually use less disk space. Pandoc is a lot faster though.

$ sudo apt install libreoffice-writer
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libabw-0.1-1 libboost-date-time1.71.0 libboost-filesystem1.71.0 libboost-iostreams1.71.0 libboost-locale1.71.0 libclucene-contribs1v5
  libclucene-core1v5 libcmis-0.5-5v5 libe-book-0.1-1 libeot0 libepubgen-0.1-1 libetonyek-0.1-1 libexttextcat-2.0-0 libexttextcat-data libgpgmepp6
  libjuh-java libjurt-java liblangtag-common liblangtag1 libmhash2 libmwaw-0.3-3 libmythes-1.2-0 libneon27-gnutls libodfgen-0.1-1 liborcus-0.15-0
  libraptor2-0 librasqal3 librdf0 libreoffice-base-core libreoffice-common libreoffice-core libreoffice-math libreoffice-style-colibre
  libreoffice-style-tango librevenge-0.0-0 libridl-java libuno-cppu3 libuno-cppuhelpergcc3-3 libuno-purpenvhelpergcc3-3 libuno-sal3
  libuno-salhelpergcc3-3 libunoloader-java libwpd-0.10-10 libwpg-0.3-3 libwps-0.4-4 libxmlsec1 libxmlsec1-nss libyajl2 python3-uno uno-libs-private
  ure
Suggested packages:
  raptor2-utils rasqal-utils librdf-storage-postgresql librdf-storage-mysql librdf-storage-sqlite librdf-storage-virtuoso redland-utils
  libreoffice-base gstreamer1.0-plugins-bad tango-icon-theme fonts-crosextra-caladea fonts-crosextra-carlito libreoffice-java-common
The following NEW packages will be installed:
  libabw-0.1-1 libboost-date-time1.71.0 libboost-filesystem1.71.0 libboost-iostreams1.71.0 libboost-locale1.71.0 libclucene-contribs1v5
  libclucene-core1v5 libcmis-0.5-5v5 libe-book-0.1-1 libeot0 libepubgen-0.1-1 libetonyek-0.1-1 libexttextcat-2.0-0 libexttextcat-data libgpgmepp6
  libjuh-java libjurt-java liblangtag-common liblangtag1 libmhash2 libmwaw-0.3-3 libmythes-1.2-0 libneon27-gnutls libodfgen-0.1-1 liborcus-0.15-0
  libraptor2-0 librasqal3 librdf0 libreoffice-base-core libreoffice-common libreoffice-core libreoffice-math libreoffice-style-colibre
  libreoffice-style-tango libreoffice-writer librevenge-0.0-0 libridl-java libuno-cppu3 libuno-cppuhelpergcc3-3 libuno-purpenvhelpergcc3-3
  libuno-sal3 libuno-salhelpergcc3-3 libunoloader-java libwpd-0.10-10 libwpg-0.3-3 libwps-0.4-4 libxmlsec1 libxmlsec1-nss libyajl2 python3-uno
  uno-libs-private ure
0 upgraded, 52 newly installed, 0 to remove and 1 not upgraded.
Need to get 78,5 MB of archives.
After this operation, 283 MB of additional disk space will be used.
Do you want to continue? [Y/n] n
Abort.

Next, converting the file. It’s possible to tell Pandoc which file formats to use with the -f (from) and -t (to) switches, but it can usually guess correctly based on the file extensions.
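For example, spelling out the input format explicitly would look something like this (the output format is inferred from the .pdf extension of the output file, so -t is usually not needed here):

$ pandoc -f odt 2022-06-house-contract-adendum.odt -o 2022-06-house-contract-adendum.pdf

In my case I just let Pandoc guess: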

$ time pandoc 2022-06-house-contract-adendum.odt -o 2022-06-house-contract-adendum.pdf

real	0m0,519s
user	0m0,475s
sys	0m0,059s

It took only half a second to convert the file. Opening LibreOffice takes a bit more time on this old laptop.

You can see the PDF document properties with pdfinfo:

$ pdfinfo 2022-06-house-contract-adendum.pdf 
Title:          
Subject:        
Keywords:       
Author:         
Creator:        LaTeX with hyperref
Producer:       pdfTeX-1.40.20
CreationDate:   Sat Jun 11 23:32:30 2022 CEST
ModDate:        Sat Jun 11 23:32:30 2022 CEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          2
Encrypted:      no
Page size:      612 x 792 pts (letter)
Page rot:       0
File size:      64904 bytes
Optimized:      no
PDF version:    1.5

I don’t want it in letter format, I want A4:

$ time pandoc -V papersize:a4 -o 2022-06-house-contract-adendum.pdf 2022-06-house-contract-adendum.odt

real	0m0,520s
user	0m0,469s
sys	0m0,060s
$ pdfinfo 2022-06-house-contract-adendum.pdf 
Title:          
Subject:        
Keywords:       
Author:         
Creator:        LaTeX with hyperref
Producer:       pdfTeX-1.40.20
CreationDate:   Sat Jun 11 23:40:16 2022 CEST
ModDate:        Sat Jun 11 23:40:16 2022 CEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          2
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      64935 bytes
Optimized:      no
PDF version:    1.5

Then I could open the file with evince 2022-06-house-contract-adendum.pdf.

And yes, I know that addendum is with double d. 🙂

June 07, 2022

Lyte was just listed as a top video player on WP Glob...

Source

June 03, 2022

I published the following diary on isc.sans.edu: “Sandbox Evasion… With Just a Filename!“:

Today, many sandbox solutions are available and deployed by most organizations to detonate malicious files and analyze their behavior. The main problem with some sandboxes is the filename used to submit the sample. The file can be named like “sample.exe”, “suspicious.exe”, “<SHA256>.tmp” or “malware.tmp”… [Read more]

The post [SANS ISC] Sandbox Evasion… With Just a Filename! appeared first on /dev/random.

June 02, 2022

Today, the book I would like to recommend is Efficient MySQL Performance – Best Practices and Techniques, Daniel Nichter, O’Reilly, 2021.

I participated (just a bit) in the writing of this book as a technical reviewer, together with Vadim and Fipar. I really enjoyed that role of carefully reading the early drafts of the chapters Daniel was writing.

Although Daniel says the book is not for the experts, I think even experts will enjoy it because several key InnoDB concepts are also covered. You can see that I refer to the book often in my A graph a day, keeps the doctor away ! series on monitoring and trending.

If you’re looking for information on transaction isolation and undo logs, fuzzy checkpointing, etc… you’ll find valuable information in the book.

The book is also enhanced with detailed illustrations that help in understanding complicated concepts (InnoDB page flushing, page 216; Source to Replica, page 235; and MVCC and undo logs, page 277, are some examples).

Personally, I use this book as I used the 2nd and 3rd editions of High Performance MySQL.

From the beginning to the end of the book, Daniel focuses on the most important and measurable metric for all database consumers: query response time.

I had the chance to meet Daniel again at Percona Live, and he offered me a signed copy 😉

the wonderful signature 😀

I also had the privilege of having my review published in the back of the book:

If you are looking for a book to improve your knowledge of MySQL or if you are a software engineer who needs to deal with MySQL, this is a good choice.

I also recommend reading Daniel’s blog which is a complement to the book.

Have a good read and enjoy MySQL!

June 01, 2022

This fall, from October 16 to 20, the MySQL Summit will be held in Las Vegas.

This conference is totally dedicated to your favorite dolphin database and is part of Oracle CloudWorld.

MySQL Summit will bring together a large community of new and expert MySQL users.

Attendees will be able to meet the engineers, product managers and developers who make MySQL the number one open source database in the world!

MySQL Summit will be different from MySQL’s previous appearances at Oracle OpenWorld: for the summit, we, the MySQL Team and the MySQL Community of users, will have more sessions, dedicated tracks and more !

The call for papers is now open until June 24, 2022.

CfP is extended !

The Call for Papers ends on June 30th !

If you are interested in participating in this new all-MySQL event as a speaker, now is the perfect time to submit your proposal: https://bit.ly/mysqlsummit.

Oracle Single Sign On is required to submit your proposal.

I hope to see you there !

When watching Eurosong a couple of weeks ago -while having a tapas-like dinner for hours and having great fun- one song stood out for me but I did not really enjoy the arrangements. But then a couple of days ago I heard “Saudade, saudade” (by Maro) in a complete live version and I was almost crying. So Eurosong not only is good for an evening of kitsch, fun and food, now and again it also lets the...

Source

May 31, 2022

This is the second article of the series dedicated to MySQL trending.

As I wrote before, understanding your workload and seeing how it evolves over time can help you anticipate problems and work on solutions before a breakdown.

This article covers MySQL History List Length also known as HLL.

MySQL History List is related to InnoDB Undo Logs. InnoDB is a multi-version storage engine (MVCC). It keeps information about old versions of changed rows to support transactional features such as concurrency and rollback. This information is stored in undo tablespaces in a data structure called a rollback segment.

This means that you can start a transaction and continue to see a consistent snapshot even if the data is changed by other transactions. This behavior is related to the isolation level. By default in MySQL, the transaction isolation is REPEATABLE-READ:

SQL> show global variables like '%isola%';
+-----------------------+-----------------+
| Variable_name         | Value           |
+-----------------------+-----------------+
| transaction_isolation | REPEATABLE-READ |
+-----------------------+-----------------+

To provide such isolation, InnoDB needs to keep the old versions of modified rows for as long as an open transaction may still need them.

All those changes are kept in a linked list: each entry points to the previous version of the same row, which itself points to an even older version, and so on. This means that each time a row is updated within a new transaction, the old version is copied over to the respective rollback segment with a pointer to it.

Each row then has a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated.

Transaction 99 was started with START TRANSACTION; and has not yet been committed or rolled back.

In the illustration above, a second transaction (trx 100) inserts a record. By default (REPEATABLE-READ), the new row is not visible in trx 99 (the TRX_ID for that row is greater than 99).

Now when data is updated, the changes are also kept in the update undo log:

And this keeps increasing as long as the undo segments are not purged:

This is a high-level illustration of how it works in InnoDB.

The History List Length quantifies the amount of change kept around (the number of undo log records containing previous row versions).

If a record has accumulated a large number of versions, retrieving its value from within the oldest transactions might take longer.

In the MySQL Manual, we can read: Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could require the information in the update undo log to build an earlier version of a database row.

Reading those lines, you could conclude that a long transaction (even an inactive one) that has only accessed rows no other transaction uses won’t impact the history list… but that is not the case!

The metrics are available in the INFORMATION_SCHEMA.INNODB_METRICS table when enabled or in the output of SHOW ENGINE INNODB STATUS\G:

MySQL> select * from INFORMATION_SCHEMA.INNODB_METRICS 
       where name='trx_rseg_history_len'\G
*************************** 1. row ***************************
           NAME: trx_rseg_history_len
      SUBSYSTEM: transaction
          COUNT: 8319
      MAX_COUNT: 92153
      MIN_COUNT: 7
      AVG_COUNT: NULL
    COUNT_RESET: 8319
MAX_COUNT_RESET: 92153
MIN_COUNT_RESET: 7
AVG_COUNT_RESET: NULL
   TIME_ENABLED: 2022-05-25 10:23:17
  TIME_DISABLED: NULL
   TIME_ELAPSED: 135495
     TIME_RESET: NULL
         STATUS: enabled
           TYPE: value
        COMMENT: Length of the TRX_RSEG_HISTORY list
MySQL> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2022-05-27 00:01:46 139760858244672 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 43 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 4146 srv_active, 0 srv_shutdown, 76427 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 5954
OS WAIT ARRAY INFO: signal count 60629
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 903438
Purge done for trx's n:o < 883049 undo n:o < 0 state: running but idle
History list length 9746

Trending Graph

Let’s have a look at this graph:

We can see that the History List Length (trx_rseg_history_len) is increasing linearly… but the workload is not:

When HLL increases significantly over a period of time, it means that InnoDB is keeping a large number of old row versions instead of purging them, because one or more long-running transactions have not committed or were abandoned without being rolled back.

In MySQL, starting a transaction and then performing even a simple SELECT engages this whole MVCC mechanism.

Daniel Nichter in his book, Efficient MySQL Performance, explains that a normal value for innodb.trx_rseg_history_len is less than 1,000. If it goes over 100,000, this can become problematic and an alert should be sent.

I recommend reading the chapter MVCC and the Undo Logs, page 276 of Daniel’s book.
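As a quick manual check against those thresholds (a sketch; in practice you would let your monitoring system evaluate this), something like the following can be used:

MySQL> SELECT count AS hll,
              IF(count > 100000, 'ALERT',
                 IF(count > 1000, 'WARNING', 'OK')) AS status
         FROM information_schema.innodb_metrics
        WHERE name = 'trx_rseg_history_len';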

Size matters !

Something that is important to know, and that is not exposed in MySQL, is that HLL represents a number of changes, not the size of those changes. So even fewer than 1,000 entries can be problematic if they are full of huge BLOBs, for example.

Let’s have a look again at the History List Length for the next 10 minutes:

We can see that as soon as we stopped the transaction that was kept open (sleeping), everything gets resolved almost immediately !

The workload is sysbench OLTP insert (not using the employees database), and we created a long transaction using the employees database. This long transaction’s statements were:

MySQL> start transaction;
MySQL> select * from employees.titles limit 10;
+--------+-----------------+------------+------------+
| emp_no | title           | from_date  | to_date    |
+--------+-----------------+------------+------------+
|  10001 | Senior Engineer | 1986-06-26 | 9999-01-01 |
|  10002 | Staff           | 1996-08-03 | 9999-01-01 |
|  10003 | Senior Engineer | 1995-12-03 | 9999-01-01 |
|  10004 | Engineer        | 1986-12-01 | 1995-12-01 |
|  10004 | Senior Engineer | 1995-12-01 | 9999-01-01 |
|  10005 | Senior Staff    | 1996-09-12 | 9999-01-01 |
|  10005 | Staff           | 1989-09-12 | 1996-09-12 |
|  10006 | Senior Engineer | 1990-08-05 | 9999-01-01 |
|  10007 | Senior Staff    | 1996-02-11 | 9999-01-01 |
|  10007 | Staff           | 1989-02-10 | 1996-02-11 |
+--------+-----------------+------------+------------+
10 rows in set (0.0002 sec)
MySQL> -- We did nothing for 10 minutes
MySQL> rollback;

The graph below represents the same transaction, idle for 4 minutes, in the middle of a 10-minute sysbench OLTP Read/Write run:

What does a large HLL really mean ?

The reason why History List Length increases is that the InnoDB Purge activity is lagging !

The purge thread is responsible for emptying and truncating undo tablespaces (see the manual).

What could be responsible for such a lag in the purge process ?

  • the write activity is too high and the purge is unable to keep up
  • a long-running transaction is blocking the purge, which won’t progress until the transaction is finished

We will see later how we can deal with this, but first, let’s have a look at the performance.

Performance

Even if HLL doesn’t directly impact performance, it might become problematic when a lot of row versions need to be traversed.

Let’s see this behavior with the example above. If we perform the following SELECT when we start the long transaction that we will leave open (abandoned), pay attention to the size of HLL and the execution time:

MySQL> SELECT id, k, (
         SELECT count FROM information_schema.innodb_metrics 
          WHERE name='trx_rseg_history_len') HLL 
       FROM sbtest.sbtest1 WHERE c LIKE '36%' LIMIT 10;
+-----+-------+-----+
| id  | k     | HLL |
+-----+-------+-----+
|  10 | 34610 |  98 |
| 288 |   561 |  98 |
| 333 | 54800 |  98 |
| 357 | 96142 |  98 |
| 396 | 82983 |  98 |
| 496 | 65614 |  98 |
| 653 | 38837 |  98 |
| 684 | 61922 |  98 |
| 759 |  8758 |  98 |
| 869 | 50641 |  98 |
+-----+-------+-----+
10 rows in set (0.0006 sec) 

If we run the same query again later in the same transaction (we haven’t rolled it back or committed it), we notice something different:

MySQL> SELECT id, k, (
         SELECT count FROM information_schema.innodb_metrics 
          WHERE name='trx_rseg_history_len') HLL 
       FROM sbtest.sbtest1 WHERE c LIKE '36%' LIMIT 10;
+-----+-------+--------+
| id  | k     | HLL    |
+-----+-------+--------+
|  10 | 34610 | 391836 |
| 288 |   561 | 391836 |
| 333 | 54800 | 391836 |
| 357 | 96142 | 391836 |
| 396 | 82983 | 391836 |
| 496 | 65614 | 391836 |
| 653 | 38837 | 391836 |
| 684 | 61922 | 391836 |
| 759 |  8758 | 391836 |
| 869 | 50641 | 391836 |
+-----+-------+--------+
10 rows in set (1.9848 sec)

The query is now much slower when the History List Length is large.

As explained in this excellent post by Jeremy Cole, in write-heavy databases, a large History List Length may require reverting a large number of rows to very old versions. This will slow down the transaction itself, and in the worst case it may mean that very long-running queries in a write-heavy database can never actually complete; the longer they run, the more expensive their reads get.

A large HLL also means that the undo logs grow. With MySQL 8.0, you have more control over undo log tablespaces (see the manual), but you still need to monitor your disk space !

Solutions

If the HLL is growing, the first step is to identify which of the two reasons listed above the system is experiencing.

Purge is not able to follow heavy writes

If the purge threads are not able to keep up with the write workload, it is necessary to throttle the write activity.

In MySQL 8.0, a maximum purge lag can be configured for InnoDB: innodb_max_purge_lag.

When the purge lag exceeds the innodb_max_purge_lag threshold, a delay is imposed on INSERT, UPDATE and DELETE operations to allow time for purge operations to catch up.

In some exceptionally rare situations that delay can become way too high, which is why you also have the possibility to cap it using innodb_max_purge_lag_delay.
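For example (the values below are purely illustrative; tune them for your own workload):

MySQL> SET GLOBAL innodb_max_purge_lag = 1000000;       -- purge lag threshold, in undo records
MySQL> SET GLOBAL innodb_max_purge_lag_delay = 300000;  -- maximum imposed delay, in microseconds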

Another tunable setting related to InnoDB’s purge is innodb_purge_threads, the number of background threads dedicated to the purge operation.

There is no ideal number to recommend, as usual, it depends 😉

The manual explains this point very well:

If the innodb_max_purge_lag setting is exceeded, purge work is automatically redistributed among available purge threads. Too many active purge threads in this scenario can cause contention with user threads, so manage the innodb_purge_threads setting accordingly.

If DML action is concentrated on few tables, keep the innodb_purge_threads setting low so that the threads do not contend with each other for access to the busy tables. If DML operations are spread across many tables, consider a higher innodb_purge_threads setting. The maximum number of purge threads is 32.

The innodb_purge_threads setting is the maximum number of purge threads permitted. The purge system automatically adjusts the number of purge threads that are used.

Long Running Transactions

As pointed out earlier, a long-running transaction, even a sleeping or stalled one, will block the purge; regardless of the write workload, even if it’s very low, the HLL will continue to grow for the entire life of that transaction.

The only way to fix this is by stopping those long transactions (commit, rollback, kill).

To find such long running transactions, this Performance_Schema query can be used:

MySQL> SELECT ROUND(trx.timer_wait/1000000000000,3) AS trx_runtime_sec,
              format_pico_time(trx.timer_wait) AS trx_runtime,
              processlist_id, trx.thread_id AS thread_id,
              trx.event_id AS trx_event_id, trx.isolation_level,
              trx.autocommit, stm.current_schema AS db, 
              stm.sql_text AS query, 
              stm.rows_examined AS rows_examined, 
              stm.rows_affected AS rows_affected, 
              stm.rows_sent AS rows_sent, 
              IF(stm.end_event_id IS NULL, 'running', 'done') AS exec_state, 
              ROUND(stm.timer_wait/1000000000000,3) AS exec_time 
   FROM performance_schema.events_transactions_current trx 
   JOIN performance_schema.events_statements_current stm USING (thread_id)       
   JOIN threads USING (thread_id) 
  WHERE trx.state = 'ACTIVE' AND trx.timer_wait > 1000000000000 * 1\G
*************************** 1. row ***************************
trx_runtime_sec: 1040.443
    trx_runtime: 17.34 min
 processlist_id: 107
      thread_id: 147
   trx_event_id: 73
isolation_level: REPEATABLE READ
     autocommit: NO
             db: sbtest
          query: select * from employees.titles limit 10
  rows_examined: 10
  rows_affected: 0
      rows_sent: 10
     exec_state: done
      exec_time: 0.000
1 row in set (0.0004 sec) 

If the state and the query don’t change between multiple runs, the transaction can be considered stalled or abandoned. A DBA should take action and kill it.
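Using the processlist_id from the output above (107 in this example), that is as simple as:

MySQL> KILL 107;

Killing the connection rolls back its open transaction, which unblocks the purge.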

The isolation level also impacts this. I recommend using READ-COMMITTED instead of the default REPEATABLE-READ, as it helps reduce the HLL.

Indeed, with READ-COMMITTED, a new read view is spawned for each SQL statement and only kept active for its duration, as opposed to REPEATABLE-READ, in which the read view’s lifetime is tied to the whole transaction. This means that in REPEATABLE-READ, as shown earlier in the example, if you start a transaction, perform one SELECT, and go for a coffee, you are still blocking the undo log purge; with READ-COMMITTED, as soon as the query finishes, the undo log purge is no longer blocked.
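For example, to change it only for the session that runs such a long transaction (a sketch; adjust the scope to your needs):

MySQL> SET SESSION transaction_isolation = 'READ-COMMITTED';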

Is READ-COMMITTED always better ?

DimitriK points out that there are also some caveats with READ-COMMITTED, as he explains in this post. This is something you need to explore: maybe change the isolation level only for the sessions running those long transactions, and possibly use READ-UNCOMMITTED if you can afford dirty reads.

Reclaiming Undo Log’s disk space

With MySQL 8.0 we have two methods to truncate undo tablespaces to reclaim the diskspace, which can be used individually or in combination to manage undo tablespace size.

The first method is automated by enabling innodb_undo_log_truncate which is now enabled by default.

The second is manual: with an SQL statement, the DBA can mark an undo tablespace as inactive. All transactions using rollback segments in that specific tablespace are permitted to finish. Once they are completed, the purge system frees the rollback segments in the undo tablespace, the tablespace is truncated to its initial size, and its state changes from inactive to empty.

Two active undo tablespaces are always required, so before you set one to inactive you must have at least three active ones (including the one you are about to deactivate).

The manual SQL syntax is:

MySQL> ALTER UNDO TABLESPACE tablespace_name SET INACTIVE;

It’s possible to list the undo log tablespaces and their state by running the following query:

MySQL> SELECT NAME, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES 
       WHERE row_format='undo' ;
+-----------------+--------+
| NAME            | STATE  |
+-----------------+--------+
| innodb_undo_001 | active |
| innodb_undo_002 | active |
+-----------------+--------+

There are also some status variables related to the Undo Tablespaces:

MySQL> SELECT * FROM global_status WHERE variable_name 
       LIKE 'Innodb_undo_tablespaces%';
+----------------------------------+----------------+
| VARIABLE_NAME                    | VARIABLE_VALUE |
+----------------------------------+----------------+
| Innodb_undo_tablespaces_total    | 2              |
| Innodb_undo_tablespaces_implicit | 2              |
| Innodb_undo_tablespaces_explicit | 0              |
| Innodb_undo_tablespaces_active   | 2              |
+----------------------------------+----------------+

The above output is the default for MySQL 8.0. If we try to set innodb_undo_001 inactive now, this is the error we get:

ERROR 3655 (HY000): Cannot set innodb_undo_001 inactive since there 
would be less than 2 undo tablespaces left active

So we need to create another one first using the following syntax:

MySQL> CREATE UNDO TABLESPACE my_undo_003 ADD DATAFILE 'my_undo_003.ibu';
Query OK, 0 rows affected (0.47 sec)

On the filesystem, we can see the new added tablespace:

[root@imac ~]# ls /var/lib/mysql/*undo* -lh
-rw-r----- 1 mysql mysql 16M May 31 20:13 /var/lib/mysql/my_undo_003.ibu
-rw-r----- 1 mysql mysql 32M May 31 20:12 /var/lib/mysql/undo_001
-rw-r----- 1 mysql mysql 32M May 31 20:13 /var/lib/mysql/undo_002

Now we can set it to inactive:

mysql> ALTER UNDO TABLESPACE innodb_undo_001 SET INACTIVE;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT NAME, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES 
       WHERE row_format='undo' ;
+-----------------+--------+
| NAME            | STATE  |
+-----------------+--------+
| innodb_undo_001 | empty  |
| innodb_undo_002 | active |
| my_undo_003     | active |
+-----------------+--------+
3 rows in set (0.00 sec)

When empty, we can set it back to active, and if we want, we can also delete the extra one like this:

MySQL> ALTER UNDO TABLESPACE my_undo_003 SET INACTIVE;
Query OK, 0 rows affected (0.00 sec)

MySQL> DROP UNDO TABLESPACE my_undo_003;
Query OK, 0 rows affected (0.01 sec)

MySQL> SELECT NAME, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
       WHERE row_format='undo' ;
+-----------------+--------+
| NAME            | STATE  |
+-----------------+--------+
| innodb_undo_001 | active |
| innodb_undo_002 | active |
+-----------------+--------+
2 rows in set (0.00 sec)

Conclusion

You now understand why it’s important to monitor the InnoDB History List Length and, in case it increases too much, to identify whether the purge is simply unable to follow the write workload or whether some long transactions are completely blocking the InnoDB purge.


Thank you to Kuba and DimK for reviewing the article.

We spent a week on a power catamaran exploring the British Virgin Islands (BVI). The purpose of the trip was twofold: to experience the BVI, and to come home with a boating license.

The BVI are somewhat hard to get to. It's part of what makes them so stunningly unspoiled. Getting to the BVI required flying from Boston to San Juan, and then from San Juan to Tortola. The flight to Tortola involved a very small plane, which was an experience in itself.

An airline worker stuffing luggage in the wing of a small plane
The plane to Tortola was so small that the luggage had to go in the wings. The pilot also sized each passenger up and decided where we would sit to evenly distribute the weight.

On the first day we met our captain (and instructor) and set course to our first destination: Oil Nut Bay on the island of Virgin Gorda. It's where we got introduced to the BVI's signature drink: the "painkiller".

Four different rum cocktails lined up next to each other
Painkillers and rum punches, the most popular drinks in the BVI. Every bar and restaurant serves them.
Vanessa watching the sunset from an infinity pool while having a cocktail
Vanessa enjoying a cocktail from the pool at Nova on Oil Nut Bay. Our boat is in the background, tied to a mooring buoy.

We spent the next six days island hopping around the BVI. Each day, we'd arrive at a different near-deserted Caribbean island. We'd tie our boat to a mooring ball, and jump off the boat to swim or paddle board.

A woman standing on a paddle board, surrounded by boats on mooring balls
Taking a stand-up paddle board out in a bay.

After our swim, we'd take our dinghy to shore to explore the island of the day. On shore, there is little to nothing except maybe a few beach bars and restaurants scattered around. In the evening, we'd either cook dinner on the boat, or enjoy the rare restaurant or bar on the island.

Dinghies tied up to a dock
Taking our dinghy to a restaurant on shore.

Each island has perfect soft sand and pristine turquoise water. But each island also has something unique to offer; Cooper Island had a rum bar with a selection of 300 rums; Jost Van Dyke had an amazing beach bar (the famous Soggy Dollar who invented the Painkiller); Trellis Bay has a great sushi restaurant; and The Baths are a unique national park, and a must-see attraction.

A hand on top of a nautical chart and next to a course plotter
Navigation planning using a nautical chart, course plotter, and brass dividers.

Every day, we had classroom-style learning and practiced driving the boat. This included navigation planning, docking exercises, man-overboard maneuvers, anchoring, and more. At the end of the week, I took the theoretical and practical exams, and passed! Next time, I'll be able to charter my own boat.

May 27, 2022

This is the first article of a new blog series related to MySQL books.

Let’s start with MySQL Workshop, a practical guide to working with data and managing databases with MySQL written by Thomas Pettit and Scott Cosentino.

The editor sent me a copy and asked me what I thought about the book.

This is the review I sent:

This book is really appropriate for those who want to start with MySQL. It covers all aspects necessary for developers to start their journey with MySQL.

The book also covers topics that are not covered in other books such as integration with MS Access and VBA, which in my opinion makes this book unique!

I also really enjoyed the well explained chapter about how to use MySQL and Node.JS.

And finally, there is a chapter about the X Dev API that allows you to use MySQL without a single line of SQL, which makes developers happy… All this information in the same book!

So if you are looking for a book to start with MySQL with plenty of examples, this book is made for you !

Good reading and… enjoy MySQL !

May 25, 2022

We are getting many requests for migration from MariaDB to MySQL. Here is a quick guide and steps to follow:

On the MariaDB server:

  • Create a logical dump of MariaDB (using MySQL Shell)
  • Create a dedicated user for replication [optional]

On the MySQL Server:

  • Import/Load the logical dump into MySQL 8.0 (using MySQL Shell)
    • You can load the schema and tables definition without the data to modify the character set
    • Load the data
  • Setup replication between MariaDB and MySQL [optional]

MySQL Shell dump & load utility is the best method to perform logical dumps. Those dumps can be used for logical backups but also to migrate from versions where physical in-place upgrades are not possible.

This is the case with migration to the cloud for example.

In-place upgrades are also not possible when we want to migrate from an older version (or perform a downgrade) of MySQL. This is exactly what I’m covering in this article: migrating from MariaDB 10.6 to MySQL 8.0.

MySQL Shell Upgrade Checker

If we want to upgrade from MySQL 5.7 to 8.0, it is recommended to use MySQL Shell Upgrade Checker.

However, the MySQL Shell Upgrade Checker utility doesn’t support any version below 5.7, nor does it support MariaDB.

I would highlight that MySQL Shell Upgrade Checker:

  • is mandatory for In-place Upgrade
  • is much less necessary during a migration using a logical dump

The databases I used for this migration are the employees database and sbtest, created by sysbench.

During the full migration process, sysbench generates transactions on the system (connected to the MariaDB instance).

Logical Dump of MariaDB

Now we will dump the data (logical dump) using MySQL Shell Dump & Load. The dump is done in parallel and stored locally on disk:
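The dump command itself looks something like this (a sketch; the target directory is the one reused later in this article, and the threads value is only an example):

MariaDB JS> util.dumpInstance('/home/fred/dump/maria-10.6.7', {threads: 4})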

dumpInstance() sees MariaDB as MySQL 5.5 – pay attention that GTIDs are not compatible between both platforms

The dump is fast and we can already load it to MySQL 8.0.

We will also take the opportunity to migrate to the new character set: UTF8mb4.

Load the dump into MySQL 8.0

It’s recommended to load the users first (if we plan to keep them). Users are required if definers are specified in views. If we don’t want the definers specified, we need to take the logical dump using the strip_definers compatibility option, like this:

MariaDB JS> util.dumpInstance('/home/fred/dump/maria-10.6.7',
            {'compatibility': ['strip_definers']})

We first load all the schema and table definitions with the loadData option set to false. We also need to ignore the dump’s version:
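A sketch of that first load (loadData: false skips the rows, ignoreVersion accepts the dump taken from MariaDB):

MySQL JS> util.loadDump('/home/fred/dump/maria-10.6.7',
                        {loadData: false, ignoreVersion: true})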

Once all the schemas and tables are loaded, we modify the default character set like this:
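For example, something along these lines for each schema and table (the exact statements depend on your schemas; the employees database is used here only as an illustration):

MySQL> ALTER SCHEMA employees DEFAULT CHARACTER SET utf8mb4;
MySQL> ALTER TABLE employees.titles CONVERT TO CHARACTER SET utf8mb4;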

When all modifications are done, we can load the data in parallel:

more threads can be specified
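A sketch of that data load (loadDdl: false because the definitions are already in place; the threads value is only an example, as noted above):

MySQL JS> util.loadDump('/home/fred/dump/maria-10.6.7',
                        {loadDdl: false, ignoreVersion: true, threads: 8})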

Compatibility Issue

It’s also possible that MySQL Shell Dump doesn’t fully work with MariaDB, as it’s not officially supported. For example, depending on your data definitions, it might be necessary to disable dumping in chunks. This will reduce the speed of the dump & load operations.

Live Migration

Usually, when we want to perform a migration with minimal downtime, we must use replication. The application traffic doesn’t need to be stopped during the dump process, nor while restoring the data to the new server.

So when the new server is ready, it needs to catch up all the transactions that were made during that period.

Asynchronous replication is then used to catch up, and only once the old and the new server are in sync do we decide to move (switch) the traffic to the new server.

Does replication work between MariaDB and MySQL 8.0 ?

MySQL allows replication from old version to new version (not the reverse). So if your application is not using features that are specific to MariaDB, this should not be a problem at all.

I talked with a friend during Percona Live who explained to me that he used this technique to migrate all his databases (production and development) from MariaDB to MySQL 8.0. That represented terabytes of data !

Preparing for Replication

Disclaimer

Some offensive words may appear in some commands. In MySQL we pay a lot of attention to that and we are removing those offensive words everywhere, commands, comments, results, code…

To replicate from MariaDB to MySQL 8.0, we first need a dedicated user:

MariaDB> CREATE USER repl@'%' IDENTIFIED by 'repl_passwd';
MariaDB> GRANT REPLICATION SLAVE ON *.* TO repl@'%';

In the folder where we stored the dump, there is a file called @.json containing all the metadata of the dump.

We will use that file to get the binary log file’s name and position from which we need to start the replication process:

@.json
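The relevant part of that file looks something like this (an excerpt sketch; the field names may differ slightly between MySQL Shell versions, and the values are the ones used in the replication command below):

{
  "binlogFile": "mariab-bin.000004",
  "binlogPosition": 20972403
}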

Start Replica

On MySQL 8.0, if we want to use GTIDs (always recommended) we need to change the GTID mode to allow replication from the MariaDB server:

MySQL> SET PERSIST gtid_mode=on_permissive;

Then we can setup replication:

MySQL> CHANGE REPLICATION SOURCE TO source_host='localhost',
              source_port=10607, source_user='repl',
              source_password='repl_passwd', 
              source_log_file='mariab-bin.000004', source_log_pos=20972403;

And we can start the replica and wait for the lag to decrease:

MySQL> START REPLICA;
MySQL> SHOW REPLICA STATUS\G

Conclusion

We are now ready to move all our traffic (sysbench) to the new MySQL 8.0 server and we can retire the old MariaDB server.

We are constantly improving MySQL Shell Dump & Load utility. If you encounter an issue, please submit a bug or add a comment on this article.

Enjoy migrating to MySQL 8.0 !

FAQ

How can I migrate from older versions of MariaDB (10.0, 10.1, 10.2, 10.3, ..) to MySQL 8.0 ?

Using the exact same technique: create logical dumps and import them.

I already covered such topic in these previous articles: [1], [2].

Can we perform an in-place upgrade from MariaDB to MySQL 8.0 ?

No, MariaDB and MySQL 8.0 are too different now. Those products are using the same protocol but are different. Even InnoDB is now different between those 2 databases.

Is replication from MariaDB to MySQL 8.0 supported ?

If you set the GTID mode to permissive on the MySQL side, it should mostly work without any problem. Avoid using MariaDB-specific features, as they will break replication.

Can I load the users using MySQL Shell Dump & Load ?

No, as the syntax to create the users is not compatible anymore between both databases. However, you can use a MySQL Shell plugin to copy the users, see this article.

Example:

Loading MariaDB users with the MySQL Shell copy user plugin (example)

Please note that some MariaDB specific keywords like VIA are not supported.

What if I use specific features in MariaDB like System-Versioned Tables ?

Those tables will be excluded from the dump, as their table type is SYSTEM VERSIONED and MySQL Shell Dump & Load only handles BASE TABLE and VIEW. Don’t forget that if such features are activated after the dump & load, while replication is running, replication will break.

Extra

Just because I am curious, I patched MySQL Shell to check what would be the result of MySQL Shell Upgrade Checker on MariaDB 10.6.7.

Let’s have a look at the output generated with this patched version of MySQL Shell:

We can see that some of the errors are related to impossible checks because some performance_schema tables are missing.

The main errors are related to 2 tables in the mysql schema; nothing to worry about, as those tables are not specific to my application.

May 21, 2022

At work, as with many other companies, we're actively investing in new platforms, including container platforms and public cloud. We use Kubernetes based container platforms both on-premise and in the cloud, but are also very adamant that the container platforms should only be used for application workload that is correctly designed for cloud-native deployments: we do not want to see vendors packaging full operating systems in a container and then shouting they are now container-ready.

Sadly, we notice more and more vendors abusing containerization to wrap their products in and selling it as 'cloud-ready' or 'container-ready'. For many vendors, containers allow them to bundle everything as if it were an appliance, but without calling it an appliance - in our organization, we have specific requirements on appliances to make sure they aren't just pre-build systems that lack the integration, security, maintainability and supportability capabilities that we would expect from an appliance.

Even developers are occasionally tempted to enlarge container images with a whole slew of middleware and other services, making them more monolithic solutions than micro-services, just running inside a container because they can. I don't feel that this evolution is beneficial (or at least not yet), because the maintainability and supportability of these images can be very troublesome.

This evolution is similar to the initial infrastructure-as-a-service offerings, where the focus was on virtual machines: you get a platform on top of which your virtual machines run, but you remain responsible for the virtual machine and its content. But unlike virtual machines, where many organizations have standardized management and support services deployed for, containers are often shielded away or ignored. But the same requirements should be applied to containers just as to virtual machines.

Let me highlight a few of these, based on my Process view of infrastructure.

Cost and licensing

Be it on a virtual machine or in a container, the costs and licensing of the products involved must be accounted for. For virtual machines, this is often done through license management tooling that facilitates tracking of software deployments and consumption. These tools often use agents running on the virtual machines (and a few run at the hypervisor level so no in-VM agents are needed).

Most software products also use licensing metrics that are tailored to (virtual) hardware (like processors) or deployments (aka nodes, i.e. a per-operating-system count). Software vendors often have the right to audit software usage, to make sure companies do not abuse their terms and conditions.

Now let's tailor that to a container environment, where platforms like Kubernetes can dynamically scale up the number of deployments based on the needs. Unlike more static virtual machine-based deployments, we now have a more dynamic environment. How do you measure software usage here? Running software license agents inside containers isn't a good practice. Instead, we should do license scanning in the images up-front, and tag resources accordingly. But not much license management tooling is container-aware yet, let alone aligned with this different way of working.

But "our software license management tooling is not container-ready yet" is not an adequate answer to software license audits, nor will the people in the organization that are responsible for license management be happy with such situations.

Product lifecycle

Next to the licensing part, companies also want to track which software versions are being used: not just for vulnerability management purposes, but also to make sure the software remains supported and fit for purpose.

On virtual machines, regular software scanning and inventory setup can be done to report on the software usage. And while on container environments this can be easily done at the image level (which software and versions are available in which containers) this often provides a pre-deployment view, and doesn't tell us if a certain container is being used or not, nor if additional deployments have been triggered since the container is launched.

Again, deploying in-container scanning capabilities seems to be counterproductive here. Having an end-to-end solution that detects and registers software titles and products based on the container images, and then provides insights into runtime deployments (and history) seems to be a better match.

Authorization management (and access control)

When support teams need to gain access to the runtime environment (be it for incident handling, problem management, or other operational tasks) most companies will already have a somewhat finer-grained authorization system in place: you don't want to grant full system administrator rights if they aren't needed.

For containers, this is often not that easy to accomplish: the design of container platforms is tailored to situations where you don't want to standardize on in-container access: runtimes are ephemeral, and support is handled through logging and metrics, with adaptations to the container images and rolling out new versions. If containers are starting to get used for more classical workloads, authorization management will become a more active field to work out.

Consider a database management system within the container alongside the vendor software. Managing this database might become a nightmare, especially if it is only locally accessible (within the container or pod). And before you yell how horrible such a setup would be for a container platform... yes, but it is still a reality for some.

Auditing

Auditing is a core part of any security strategy, logging who did what, when, from where, on what, etc. For classical environments, audit logging, reporting and analysis are based upon static environment details: IP addresses, usernames, process names, etc.

In a container environment, especially when using container orchestration, these classical details are not as useful. Sure, they will point to the container platform, but IP addresses are often shared or dynamically assigned. Usernames are dynamically generated or are pooled resources. Process identifiers are not unique either.

Auditing for container platforms needs to consider the container-specific details, like namespaces. But that means that all the components involved in the auditing processes (including the analysis frameworks, AI models, etc.) need to be aware of these new information types.

In the case of monolithic container usage, this can become troublesome as the in-container logging often has no knowledge of the container-specific nature, which can cause problems when trying to correlate information.

Conclusion

I only touched upon a few processes here. Areas such as quality assurance, vulnerability management and data governance are challenges as well. None of these processes are impossible to support, but they require new approaches and supporting services, which makes the total cost of ownership of these environments higher than your business or management might expect.

The rise of monolithic container usage is something to consider carefully. In the company I work for, we are strongly against this evolution, as the enablers we would need to put in place are not there yet and would require significant investments. It is much more beneficial to stick to container platforms for the more cloud-native setups, and even there, dealing with ISV products can be more challenging than dealing only with internally developed ones.

Feedback? Comments? Don't hesitate to drop me an email, or join the discussion on Twitter.

May 20, 2022

I have a new laptop. The new one is a Dell Latitude 5521, whereas the old one was a Dell Latitude 5590.

As both the old and the new laptops are owned by the people who pay my paycheck, I'm supposed to copy all my data off the old laptop and then return it to the IT department.

A simple way of doing this (and what I'd usually use) is to just rsync the home directory (and other relevant locations) to the new machine. However, for various reasons I didn't want to do that this time around; for one, my home directory on the old laptop is a bit of a mess, and a new laptop is an ideal moment to clean that up. If I were to just rsync the old home directory over, then, well.

So instead, I'm creating a tar ball. The first attempt was quite slow:

tar cvpzf wouter@new-laptop:old-laptop.tar.gz /home /var /etc

The problem here is that the default compression algorithm, gzip, is quite slow, especially if you use the default non-parallel implementation.

So we tried something else:

tar cvpf wouter@new-laptop:old-laptop.tar.gz -Ipigz /home /var /etc

Better, but not quite great yet. The old laptop now has bursts of maxing out CPU, but it doesn't even come close to maxing out the gigabit network cable between the two.

Tar can also compress with the LZ4 algorithm. That algorithm doesn't compress very well, but it's the best choice if "speed" is the most important consideration. So I could do that:

tar cvpf wouter@new-laptop:old-laptop.tar.gz -Ilz4 /home /var /etc

The trouble with that, however, is that the tarball will then be quite big.

So why not use the CPU power of the new laptop?

tar cvpf - /home /var /etc | ssh new-laptop "pigz > old-laptop.tar.gz"

Yeah, that's much faster. Except, now the network speed becomes the limiting factor. We can do better.

tar cvpf - -Ilz4 /home /var /etc | ssh new-laptop "lz4 -d | pigz > old-laptop.tar.gz"

This uses about 70% of the link speed, just over one core on the old laptop, and 60% of CPU time on the new laptop.

After also adding a bit of --exclude="*cache*", to avoid files we don't care about, things go quite quickly now: somewhere between 200 and 250G (uncompressed) was transferred into a 74G file, in 20 minutes. My first attempt hadn't even done 10G after an hour!
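
For reference, a minimal sketch of roughly what the final pipeline would look like with the exclude folded in (same hostname and pattern as above, nothing else added):

    # lz4 on the sending side, decompress and recompress with pigz on the
    # receiving side, skipping anything matching *cache*
    tar cvpf - -Ilz4 --exclude="*cache*" /home /var /etc \
      | ssh new-laptop "lz4 -d | pigz > old-laptop.tar.gz"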

I published the following diary on isc.sans.edu: “A ‘Zip Bomb’ to Bypass Security Controls & Sandboxes“:

Yesterday, I analyzed a malicious archive for a customer. It was delivered to the mailbox of a user who, hopefully, was security-aware and reported it. The payload passed through the different security layers based on big players on the market!

The file is a zip archive (SHA256:97f205b8b000922006c32c9f805206c752b0a7d6280b6bcfe8b60d52f3a1bb5f) and has a score of 6/58 on VT. The archive contains an ISO file that, once mounted, discloses a classic PE file. But let’s have a look at the file… [Read more]

The post [SANS ISC] A ‘Zip Bomb’ to Bypass Security Controls & Sandboxes appeared first on /dev/random.

May 19, 2022

When I signed the contract for the publication of Printeurs, my publisher let me know that he expected a certain investment from his authors in promoting their books. I asked what he meant by that, and he mentioned attending book fairs, signing sessions, that kind of thing. Fairs which were among the first victims of COVID in 2020 and 2021.

In 2022, it is time to make up for lost time and honour my promise. I will therefore be at the Imaginales in Épinal from Friday afternoon, May 20, until Sunday afternoon, May 22, the most famous "boudin fair" of speculative fiction.

I have no idea what I am supposed to do there.

While I have very often heard about the Imaginales, I have no idea what this kind of event looks like, nor what people expect from an author there. It is a new experience for me, and I am quite curious about it.

If you are in the area, don't hesitate to come and say a quick hello and ask me questions about the book coming out this year. Look for a typewriter wedged between a pile of "Printeurs", my cyberpunk novel, and a pile of "Aristide, le lapin cosmonaute", my children's book of which only a handful of copies remain. The guy behind the machine going "clack clack clack ding" is me! Don't feel obliged to buy a book. What? My publisher (the guy behind me, whip in hand and with a € symbol where each pupil should be) tells me that yes, it is mandatory.

Practically speaking, I will be at the PVH édition stand, in a small white tent facing the fountain in the middle of the park, on Saturday between 11am and 7pm and, to be confirmed, on Sunday morning at the same spot, next to a video game demo and a 3D printer.

I admit I hesitated for a long time before posting this message, but I told myself that if some of you are in the area, it would be a shame to miss each other. It is always a pleasure for me to meet readers of my books or of my blog. Some of you have been following and supporting me for nearly 15 years, and there is something in meeting in the flesh, even very briefly, that dozens of emails can never provide.

For those who will not be in Épinal, this is, I hope, only a postponement (between us, I secretly hope to propose a talk at the Utopiales in Nantes, which would give me a good excuse to go there).

Receive new posts by email or by RSS. At most 2 posts per week, nothing else. Your email address is never shared and is permanently deleted when you unsubscribe. Latest book: Printeurs, a cyberpunk thriller. To support the author, read, give and share books.

This text is published under the CC-By BE license.

May 17, 2022

I published the following diary on isc.sans.edu: “Use Your Browser Internal Password Vault… or Not?“:

Passwords… a so hot topic! Recently big players (Microsoft, Apple & Google) announced that they would like to suppress (or, at least, reduce) the use of classic passwords. In the meantime, they remain the most common way to authenticate users against many online services. Modern Browsers offer lightweight password management tools (“vaults”) that help users to save their passwords in a central repository. So they don’t have to remember them, and they follow the golden rule that we, infosec people, are recommending for a long time: to not share passwords across services… [Read more]

The post [SANS ISC] Use Your Browser Internal Password Vault… or Not? appeared first on /dev/random.

May 15, 2022

The point of Finland and Sweden joining the current NATO should not be that we get more NATO, but that we obtain a larger EU army.

Only a good EU army will keep a neighbour such as Russia quiet and calm in the future. NATO will not do that. On the contrary, it will make Russia restless. Because that plays into NATO's cards: a large-scale conflict with Russia on EU territory is in NATO's greatest interest if it wants to keep existing as an organisation.

In other words, we Europeans must invest very heavily in our own defence. We should no longer leave this to the US. We must take it into our own hands, both financially and operationally.

We must be willing and ready to repel aggression from, for example, Russia. Even to punish it. That means that we too need nuclear weapons. In EU hands. Deployable. With a strategic EU command that can, and above all will, actually use them in certain scenarios.

But above all we must invest in diplomacy and realism, realpolitik, with Russia. Based on their concerns, and on ours.

Ukraine was a buffer state. That worked well. I would like it to remain and stay that. While it produces grain. While it grows. While it perhaps becomes part of the EU in a few decades.

But shouldn't we first, after some damn 20 years, finally make the Balkans a member of the EU?! Kosovars, for example, still cannot travel to the EU without a visa. Even though we have the largest NATO base in Western Europe (KFOR) there. We are now talking about Ukraine becoming an EU member tomorrow, while the entire Balkan region, after its wars of the nineties, is still not integrated in 2022.

Two books about The Witcher and one about our society.



 

I never played The Witcher game, but I did see the TV-series' first season in the months before Covid-19. The books are about the same characters, Geralt, Ciri, Yennefer and the others, but the story details differ from the TV-series. I saw the second season before I read any of the books.

Some dialogues are identical in both books and TV. And yes, the books also mix time periods! Anyway, so far these books are entertaining and I will definitely read more (I got eight, which is all of them I think).


While looking for wine in Antwerp I came across a 'Books & Wine' store. I did not find the wine I was looking for so I bought this book.

Bart Verhoeven is a millennial (in his thirties today) and gives his view on our society. At times he hits the nail on the head, and it was fun to read typical Gen X observations, but written by a young whippersnapper. The book could have been 250 pages longer, too.

Here and there I had my reservations; for instance, I could not identify with the 'typical person' at the start of the book. I leave my mobile phone behind when I go into town (it is calming to be offline in First life) and I almost never carry a smartphone.


May 14, 2022

An emotional book about shame, a historical book about Constantinople and an interesting take on prehistory (and society).

 

Three book covers
 

Brene Brown has a world famous Ted talk, though I must admit I didn't really understand it back in 2010. But when a beautiful South American lady gives you a book, then you read it. I read this in the summer of 2019 I think, before Covid-19.

This book about vulnerability, about shame, gave me a lot of insight into human behaviour, including my own. I did not realise how important shame is in life. Thank you, Brown, for writing this.

 

Lost to the West was upvoted on Hacker News as an interesting read. I was not disappointed. The book is about the East Roman Empire, on which we did not spend much time in school. We studied Egypt, Greece and Rome in detail, but only the western part of Rome until the split into the East and West Roman Empires. School only mentioned that the Eastern Empire lasted for 11 centuries, but that was it.

Some people may find this book, with its 1100-year history of emperors, boring (there is a lot of repetition), but the message is intriguing. Constantinople, now Istanbul and sometimes called Byzantium, really did shape Western Europe. This empire is at least as important to present-day European society as the Greeks and the (Western) Romans.


The Sapiens book has been recommended to me by several people. I like prehistory a lot, it's my favorite time period. The book is interesting, and really easy to read, but is it science? I don't know. It's a good book though!

 

May 12, 2022

In a previous post, I explained how you can collect and plot metrics using MySQL Shell.

This is a new series of articles in which I will explain how to read and understand some of the generated graphs.

Understanding your workload and seeing how it evolves over time helps you anticipate problems and work on solutions before things break down.

Let's start the series with a concept that is not always well understood, or at least not always given the attention it deserves: the MySQL Checkpoint Age.

example of checkpoint age graph

InnoDB Checkpointing

Before analyzing the graph, we need to understand what MySQL InnoDB checkpointing is.

Each change to a data page in the InnoDB Buffer Pool is also written to the Write-Ahead Log.

In the literature these are sometimes called transaction logs, but that term is not entirely accurate for InnoDB.

Those logs are only used when InnoDB needs to recover, after a crash, all the transactions that had been committed. This process guarantees durability, the D in ACID. The transaction logs are also called redo logs.

However, at some point InnoDB also writes the changed pages to disk in the tablespaces (data files). The process of writing the dirty pages (pages that have been modified) to the tablespaces is known as flushing.

The checkpoint represents the LSN of the latest changes that have been written to the data files.

InnoDB flushes those dirty pages from the buffer pool in small batches, which is why this is called fuzzy checkpointing. MySQL does not flush them all at once, to avoid a heavy burst of work that could disrupt its normal usage.

By default, the redo log is composed of 2 files, ib_logfile0 and ib_logfile1. Those files contain the changes made to InnoDB pages, but they are not InnoDB pages themselves.

The transaction log can be represented as a circular log, like this:

MySQL InnoDB Write Ahead Log

We know that when a transaction that modifies data is committed, the change is written to the Write-Ahead Log (flushed and synced to disk for durability). That write is done at the head of the redo log, which makes the head advance.

The flushing process that writes the dirty pages from the Buffer Pool to the tablespaces moves the tail forward, past the corresponding changes in the write-ahead log. That space can then be reused. The tail and the head can only move forward (clockwise on the illustration above).

The Checkpoint Age is the length in bytes between the tail and the head.

As this is a fixed-size circular log, the head could reach the tail if the write rate exceeds the flush rate... and that would be a horrible problem!

InnoDB will never let that happen! If that scenario could occur, all writes would be blocked.

Async and Sync Flush Points

To avoid that chaotic scenario, InnoDB takes action when certain thresholds are reached:

  • async flush point: writes are still allowed, but page flushing is increased to its maximum capacity. This leads to a drop in performance.
  • sync flush point: at this point all writes are stopped and InnoDB only performs page flushing, as fast as it can. Terrible performance.

The async flush point is reached at 75% of the total redo log size, the sync flush point at 87.5%.

MySQL InnoDB Write Ahead Log: Async & Sync flush points
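
If you want to check where a server stands relative to those thresholds without plotting anything, here is a minimal sketch using the counters exposed in information_schema.INNODB_METRICS (assuming MySQL 8.0; the log counters may need to be enabled first, as shown):

    # enable the InnoDB log counters (not all of them are enabled by default)
    mysql -e "SET GLOBAL innodb_monitor_enable = 'module_log';"

    # current checkpoint age versus the async and sync flush points, in bytes
    mysql -e "SELECT NAME, COUNT FROM information_schema.INNODB_METRICS
              WHERE NAME IN ('log_lsn_checkpoint_age',
                             'log_max_modified_age_async',
                             'log_max_modified_age_sync');"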

Conclusion

Now you know why it is important to monitor the Checkpoint Age if you want to avoid poor performance.

We can see in the graph above that after 5 minutes the checkpoint age got very close to the async flush point, which can be problematic. Two minutes later, the checkpoint age increased again.

If you see that the async point is (almost) reached, you need to identify (and we will see how in future articles in this series) whether you have reached the I/O capacity of the system, or whether you can increase the flushing rate.

Another option of course is to increase the size of redo logs (innodb_log_file_size).
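
As a quick sanity check before resizing, you can look at the total redo log capacity that is currently configured (a sketch for a pre-8.0.30 setup with the classic two-file layout described above):

    # total redo log capacity = file size x number of files
    mysql -e "SELECT @@innodb_log_file_size * @@innodb_log_files_in_group AS redo_bytes;"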

If you are interested in InnoDB flushing and checkpointing, I recommend reading Efficient MySQL Performance by Daniel Nichter, O'Reilly 2022 (from page 205).

Enjoy MySQL and keep plotting your MySQL performance metrics!

May 11, 2022

The very concept of software is not obvious. As Marion Créhange, the first person to earn a doctorate in computer science in France, used to recall, the way to influence the behaviour of the earliest computers was to change how the cables were plugged in. A program was literally a wiring plan, and writing one meant tearing your hands up on wires.

Little by little, the first computer scientists improved the technology. They created "programmable" computers that could be modified by feeding them programs in binary form, generally holes punched into cardboard cards inserted in a precise order. You obviously had to understand exactly how the processor worked in order to program the machine.

Processors were then improved to understand simple instructions, called "assembly". Programmers could think in this language, which gave instructions directly to the processor. However, they also used other formalisms, which they called "programming languages". Programs were written on paper in such a language and then, with the help of tables, translated into assembly.

Then came the idea of designing a program that would perform this translation. The first of these programs was called a "Formula Translator", abbreviated to "FORTRAN". Formula translators were soon renamed "compilers". With a compiler, it was no longer necessary to understand what the processor was doing. It became possible to write a program in a so-called "high-level" language and hand it to the machine.

It must be stressed, however, that at the time every computer came with its own software system, which was extremely complex and particular. Running a program meant learning the computer in question. Running the same program on several different computers was not even considered.

Given the high price of a computer, comparable to that of a luxurious villa, machines were rare and shared among dozens of users, most often within universities. One of the big computing problems of the era was making the computer pay for itself. It was possible to interact directly with a computer through a teletype: a big keyboard wired to the machine, with a strip of paper delivering the results. The problem with the teletype is that while the user reads the results and thinks, the computer sits there doing nothing. A bit like buying a €50,000 car and leaving it in the garage 95% of the time (which is, incidentally, exactly what we do).

For this reason, direct interaction was discouraged in favour of "batches" of punched cards. The programmer punched cards containing all the instructions he wanted to run (his "program") and handed them, in a box, to the computing centre, which fed them into the computer when it was idle. The printed result was sent back to the programmer, sometimes a day or two later.

Then came the idea of connecting several teletypes to one computer to give an appearance of interactivity, while the computer processed the requests sequentially. A project was set up to turn this dream into reality: Multics. Multics was a big-budget project bringing together a large company, General Electric, a university, MIT, and a research laboratory belonging to AT&T, Bell Labs.

Multics had the right programmers and the right ideas. But a collaboration on that scale required a lot of managers, politicians, administration and paperwork. In short, after five years the project was finally considered a failure. The managers and administrators moved on.

The first great collaboration

Without realising it right away, the defunct Multics left a few programmers at a loose end. Among them, Ken Thompson had just had a child. The anecdote matters, because his wife announced she was leaving for a month with the baby to introduce him to her family, who lived on the other side of the country. Thompson found himself alone for four weeks and decided to use the time to build the simplified version of Multics he had in mind. Mockingly, he called it Unics. Which would become Unix. Initially written in assembly, it led Thompson to invent the B language, which his colleague Ritchie used to invent the C language, in which Unix was rewritten in 1973.

Thompson and Ritchie were also the first prototypes of the computer geek: scruffy beards, unruly hair, oversized t-shirts (at a time when even rebellious youth would not imagine wearing anything but fitted leather jackets), programming sessions until dawn, arrivals at the lab at around one in the afternoon, and so on.

In his thinking, Thompson had only one word on his lips: "simplicity". He is, incidentally, the one who pushed Ritchie to call the "C" language just that. One letter: simple, efficient. This idea of simplicity underpins the entire architecture of Unix: rather than building one enormous, hyper-complex system that does everything, you build small programs and let them use one another. The idea was revolutionary: a program can call on another to obtain a piece of functionality. Then came the idea of the "pipe", which lets the user build his own chains of programs. It seems obvious today, but it was enormous progress at a time when operating system designers were trying to build one huge program capable of doing everything at once. This simplicity is still, to this day, what makes Unix-inspired systems powerful. It is also what enabled an unprecedented wave of collaboration: every individual could now create a small tool that would exponentially increase the capabilities of the overall system.
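
To make the idea concrete, here is the kind of composition the pipe makes possible; the command itself is just an illustrative modern example, not something from the original Unix team:

    # three small programs, each doing one job, combined into a new tool:
    # list the processes, keep the lines mentioning "emacs", count them
    ps aux | grep emacs | wc -l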

Ritchie and Thompson presented the result of their work at a conference in 1973. Around the same time appeared the idea not of connecting computers directly to one another, which already existed, but of letting them act as relays. Two computers could now talk to each other without a direct connection. This sketched out a network, called ARPANET, which would become the Internet, the interconnection of networks.

Some historians estimate that about half of the people connected to ARPANET at the time were in the room during the presentation of Unix.

The concept was an immediate success.

One important thing to know is that Bell Labs, where Unix was invented, was part of AT&T. And AT&T was, at that moment, the subject of an antitrust lawsuit and was not allowed to do anything other than sell telephone equipment. The American state took a very dim view of any private company becoming too powerful. As soon as that happened, as soon as a monopoly position took shape, the state was quick to intervene.

AT&T sold telephone lines. The purpose of Bell Labs was to develop technologies that required telephone lines, in order to encourage their use. But AT&T was not looking to commercialise those technologies directly. The antitrust case made that far too risky.

Not at all concerned with the commercial side, the Unix team started collaborating with universities and researchers of every kind. At the University of California, Berkeley, in particular, people started improving the system. Source code and tricks were exchanged. People logged into each other's computers to understand problems. In short, they collaborated without really thinking about the financial side.

In Australia, professor John Lions taught Unix to his students by giving them... the entire source code to read. Source code that he commented abundantly. His course notes quickly travelled around the world as an indispensable tool, with Ken Thompson's blessing.

Little by little, Unix grew in importance and, at AT&T, the lawyers started to suspect there was money to be made. The UNIX trademark was registered. No matter: the "UNIX USER GROUP" was renamed "Usenix". Ritchie and Thompson were asked to stop sharing their work. No matter: Thompson conveniently "forgot" bags full of magnetic tapes in a park where, as it happened, a friend from Berkeley was taking a walk.

That gives you an idea of the level of collaboration and of the rebellious spirit driving the whole thing. They may have had beards and baggy t-shirts, but they were no less intelligent and defiant for it. They were "hackers", the name they gave themselves and which is still in use today.

The first confiscation

In 1980, a political shift took place in the West with the election of Margaret Thatcher in the United Kingdom and Ronald Reagan in the United States. Their doctrine could be summed up as "put the interests of companies before those of individuals". Research, collaboration, sharing and empathy became obstacles on the road to profit. Several major measures would follow and have an enormous impact, both on the software industry and on our future.

The term "intellectual property", a concept coined in 1967 and until then more or less ignored, became an essential strategy for companies to make profit. Historically, the concept of property establishes the right to enjoy a good without having it taken away. But how can anyone "steal" intellectual property? It sounds absurd, yet it is very simple: you just have to convince the population that every concept, every work has an owner, and that merely enjoying or using it harms the economic interests of its author. Historically, patents gave inventors a time-limited monopoly on an invention in exchange for that invention becoming public once the patent expired. Even while fighting monopolies, the state knew how to grant a temporary one in exchange for a future benefit to society. The profit-above-all ideology allows that part of the equation to be discarded entirely. The term "intellectual property" extends temporary monopolies, both in duration and in rights. The owner of intellectual property no longer owes anything to a society whose only remaining function is to protect his well-deserved profits.

Thanks to nearly thirty years of intensive campaigning, the concept of intellectual property is now so deeply ingrained that schools no longer dare let their children sing without making sure they pay royalties to some collecting society which, of course, is only too happy to oblige. Intellectual property is, intellectually, a swindle. It also sounds the death knell of collaboration and of great academic progress. Even universities now seek to protect their intellectual property. By reflex, and in spite of their primary mission.

The concepts of collaboration and of the common good were sacrificed on the altar of anticommunism. Making profit at all costs became a patriotic duty in the fight against communism. It worked so well that, despite the total collapse of communism, the notion of the common good has been erased from the vocabulary. More than 30 years after the fall of the Berlin Wall, governments and public institutions such as universities still have to justify their choices and investments in terms of profitability and future returns.

This shift in mentality happened in parallel with the growing complexity of the very notion of software. Gradually, software went from being a series of instructions to being a genuine work. It could therefore, logically, be sold. Selling software relied on a legal innovation: the licence. A licence is a contract between the software supplier and the user. The supplier imposes its conditions, and you must accept them in order to use the software. Thanks to intellectual property, the software remains the property of the supplier. You no longer buy a good; you buy the right to use it within a strictly defined framework.

Note that this commercial innovation follows directly from the moral importance given to profit. If users can no longer buy, store, repair and reuse a good, they have to pay for each use. The user clearly loses compared to buying the software as a good he can dispose of as he pleases.

Today, every piece of software we use comes with a licence and usually depends on other software, each with its own licence. It has been shown that it is materially impossible to read all the contracts we accept every day. Yet the legal fiction holds that we accepted them knowingly. That we must comply with the conditions they describe. Every time we click "I accept", we are literally signing a blank cheque on our soul, hoping the Devil never comes to collect.

To perpetuate the initial Unix spirit, that defiant hacker spirit, the University of Berkeley created the BSD licence. That licence says, in essence, that you can do whatever you want with the software, including modifying it and reselling it, as long as you credit the authors.

John Lions' book, on the other hand, was banned, since it contained AT&T's intellectual property. It circulated under the counter from then on, photocopied by generations of students who were not in the least trying to hurt AT&T, but simply to better understand how a computer works, or could work.

The appearance of personal computers

Computers were getting cheaper and cheaper. And more and more varied. IBM was the biggest computer supplier; some of its machines were mastodons, others barely the size of a large suitcase (the "minicomputers").

IBM had the idea of building computers small enough to fit on a desk: the microcomputer. It would be cheap and not very powerful. Its name? The "personal computer", the PC.

Except that, like AT&T before it, IBM was the subject of an antitrust lawsuit. Diplomatically, entering a new market would have been dangerous. To avoid this, IBM first designed the PC with an open architecture, that is, with components anyone could buy or copy (unlike competitors such as Amiga, Commodore, Atari, etc.). The operating system, the software that makes the machine run, was considered one of those components. Problem: Unix systems were very expensive and designed around very complex features, such as supporting multiple users. They were therefore not suited to a "personal computer". IBM was indeed working on its own operating system (OS/2), but, in the meantime, why not outsource the task?

A rich and influential business lawyer, William Gates, who worked among others for IBM, heard about this and announced that his son had just started a computer company. The son in question, also named William Gates but nicknamed Bill, had never programmed an operating system, but he offered to build one for IBM on condition that he be paid for every personal computer sold. While he was at it, he added to the contract that IBM would be required to install his OS on every PC it sold. Bogged down in its antitrust case, and not seeing much of a future for a cheap product like the PC, IBM accepted these rather peculiar conditions.

Bill Gates had never programmed an OS, but he knew a young 22-year-old programmer, Tim Paterson, who had written what he himself called a "Quick and Dirty OS", QDOS. An amateurish, minimalist thing that did not even have the features Unix had had 15 years earlier. Never mind. Bill Gates offered $50,000 for QDOS, which he renamed PC-DOS and then MS-DOS.

Around the same time, an adviser to president Reagan, a certain Robert Bork, realised that if there was one thing standing in the way of maximising corporate profits, it was antitrust law. Poor AT&T, poor IBM. These American companies could have colonised the world and delighted their shareholders. They could have weakened the Soviet bloc economically. Instead, the Americans themselves had put a spoke in their wheels.

So he proposed, quite simply, allowing monopolies. Of course, it would be a little too obvious put that way, so he proposed prohibiting only those monopolies that could be shown to cause a significant price increase for the end consumer. This makes it possible to claim that the end consumer is being protected, while glossing over the fact that, first, it is impossible to prove that a price is too high and, second, monopolies inflict many other harms on society besides a mere price increase.

Robert Bork's impact was decisive. Thanks to him, Microsoft, Bill Gates' company, secured a monopoly on operating systems for personal computers, which, to IBM's great surprise, became such a huge success that they eclipsed the supercomputers. It has to be said that IBM, afraid of becoming a monopoly itself, had allowed other manufacturers to start building their own PCs: Dell, Compaq, and so on. And nothing prevented Microsoft from supplying DOS to those competitors while requiring them to ship their computers with DOS and nothing else. Within a few years, the term "PC" became synonymous with microcomputer and with... "Microsoft". Atari, Amstrad, Commodore and Amiga disappeared completely despite their undeniable technological lead.

It has to be said that this lousy DOS suited a lot of people: it let the real computer scientists stay above the rabble with their supercomputers, and the vendors of real computers keep charging outrageous prices. A PC running DOS was just a toy, not a professional machine.

The resistance gets organised

All these events sounded the death knell of the first golden age of computing. Until then computing had been a field of enthusiasts who collaborated; it became an extremely lucrative business that seeks to control its customers as tightly as possible in order to extract the maximum amount of money from them.

Richard Stallman, an artificial intelligence researcher at MIT, realised that, one by one, all his colleagues were being hired by private companies. Once hired, they could no longer collaborate with him. They took their code and their research with them. They kept using the code available under the BSD licence, but no longer shared anything. Worse: they could not even discuss the content of their work any more.

Richard, who lived for computing, was saddened by this, but the last straw came from a printer. Richard had modified the software running on the whole department's printer to avoid paper jams and to email the person who had started a print job once it had finished. The printer, however, was old and got replaced. Richard Stallman thought nothing of it and set out to implement the same features for the new printer. To his great surprise, he could not find the source code of its software. He asked the manufacturer and was told that the source code belonged to the manufacturer and that nobody was allowed to read or modify it.

Richard Stallman saw red. The university had paid for the printer. What right did the manufacturer have to prevent a customer from doing what he wanted with his purchase? What right did it have to prevent a "hacker" from understanding the code running on his own hardware? With great foresight, he understood that if this practice became widespread, it would quite simply be the end of the hacker movement.

Without further ado, RMS resigned from MIT, where he nevertheless kept an office, to launch the Free Software Foundation. His idea was simple: the industry had just confiscated from computer users the freedom to run the software of their choice. That freedom had to be won back.

He then formulated the four fundamental freedoms of free software:

  1. The right to use a program for any purpose
  2. The right to study a program in order to understand it
  3. The right to modify a program
  4. The right to share a program and/or its modifications

Today, software is considered "free" if it grants these four freedoms.

It seems obvious that software under the BSD licence is free. But Stallman noticed a problem: the software was indeed free, but the people modifying it inside private companies never shared their improvements.

He therefore devised the GPL licence. Like any licence, it is a contract; it stipulates that the user of a program has the right to obtain its sources, to study them, to modify them and to redistribute them. However, anyone who redistributes the software is required to keep it under the GPL.

Where proprietary licences restrict users' freedoms as much as possible, and BSD-style licences give users maximum freedom, the GPL restricts a single freedom: the freedom to restrict the freedom of others.

The GPL thus defines a form of common good that cannot be privatised.

Note that, like any licence, the GPL is a contract between the supplier and the user. Only the user can request the source code of a program under the GPL. GPL software is therefore not necessarily public. Likewise, nothing forbids charging money for GPL software. Richard Stallman himself made a living for a while by selling the free software he developed, the Emacs editor.

Stallman, RMS to his friends, did not stop there. Since UNIX operating systems are myriads of small programs cooperating with one another, he set out to rewrite them one by one to create a system entirely under the GPL: GNU. GNU stands for "GNU is Not Unix". It is recursive; it is geek humour.

RMS worked relentlessly and the GNU project got well under way. Only one component was missing, albeit an essential one: the kernel. The kernel is the central piece of code that sits between the machine and all the UNIX or GNU tools.

The second great collaboration

In parallel with Richard Stallman's work, an earthquake shook the Unix world: AT&T suddenly woke up and realised that it was the cradle of an operating system that had become very popular and, above all, very lucrative.

It sued the University of Berkeley for illegally distributing the UNIX source code under the name "Berkeley Software Distribution", BSD.

The distribution of BSD was interrupted for a while. The future of BSD looked uncertain. It would later turn out that the BSD code had indeed been written by the Berkeley researchers, who had gradually improved the original UNIX source code to the point of replacing it completely, in a modern version of the ship of Theseus. Only 6 files were identified by the judge as belonging to AT&T, 6 files that were therefore replaced by freshly rewritten versions.

But in 1991, that outcome was not yet known. The GNU system had no kernel and the BSD system was in turmoil. Meanwhile, a young Finnish student had just bought a new computer and dreamed of running UNIX on it. Since proprietary UNIX systems were far too expensive, he downloaded the GNU tools, which were free of charge, and, building on them, wrote a mini-kernel that he shared on Usenet.

The Internet existed, but the web did not yet. Exchanges happened on the Usenet network, the ancestor of web forums and the hangout of enthusiasts of every imaginable topic.

The kernel written by Linus Torvalds, our young Finn, quickly attracted the attention of Unix enthusiasts, who named it after its creator: Linux. Very quickly, they sent Linus Torvalds improvements and new features, which he integrated to produce new versions.

Influenced by the GNU tools, Linus Torvalds had placed his creation under the GPL. It therefore became possible to put together a Unix system entirely under the GPL: GNU/Linux. Well, no, it is not Unix. Because GNU is Not Unix. But it looks an awful lot like it...

Of course, installing the Linux kernel and all the GNU tools was not within everyone's reach. Projects emerged to make installation easier and to package all these heterogeneous components. These are the "Linux distributions". Some offer commercial versions, like Red Hat, while others are community-driven, like Debian.

Besides the Linux kernel, Linus Torvalds invented a new way of developing software: anyone can study, criticise and contribute by submitting modifications ("patches"). Linus Torvalds picks the modifications he finds useful and correct and rejects the others, with plenty of heated discussion. In short, a joyful bazaar. On Usenet and through mailing lists, hackers rediscovered the spirit of collaboration that had animated the early Unix labs. Years later, Linus Torvalds would even invent a piece of software to manage this bazaar: git. Git is today one of the most widespread tools among programmers. Git is under the GPL.

The Torvalds "non-method" is so effective, so fast, that it seems obvious that this is the best way to develop good software. Hundreds of people reading a piece of source code are more likely to find bugs, to use the software in unexpected ways, to bring in touches of creativity.

Eric Raymond theorised this in a talk that became a book: "The Cathedral and the Bazaar". Together with Bruce Perens, however, he felt that the term "Free Software" did not sound professional. In English, "free" means both free as in freedom and free of charge. "Freeware" refers to programs that are free of charge but proprietary, often of poor quality. Eric Raymond and Bruce Perens therefore coined the term "open source".

Technically, free software is open source and open source software is free. The two are synonyms. Philosophically, however, free software aims to liberate users, whereas open source aims to produce better software. The open source movement, for example, happily mixes free and proprietary software, whereas Richard Stallman maintains that every piece of proprietary software is a deprivation of freedom. Developing proprietary software is, for RMS, deeply immoral.

The two names nevertheless designate a single movement, a growing community fighting to regain the freedom to run its own software. The great enemy at the time was Microsoft, which was imposing the monopoly of its Windows system. The GPL was described as a "cancer", free software as a "communist" movement. Apple, a computer company that by then was only a shadow of its glorious past, dipped into free software to give itself a second wind. Building on the code of FreeBSD, which is not under the GPL, it developed a new proprietary operating system: OSX, later renamed MacOS. It is also from that base that it built an embedded system: iOS, initially intended for the iPod, the music player launched by the firm.

The second confiscation

To work and produce results, software needs data. Separating the software from the data it operates on makes it possible to reuse the same software with different data. And then to sell software into which users enter their own data to obtain results.

One of the arguments for the importance of free software is precisely what happens with that data. If you use proprietary software, you do not know what it does with your data, be it scientific data, personal data, work documents or letters. Proprietary software could even send your private data to its makers without your consent. At the time, the idea sounded like it came straight out of a paranoid mind.

Besides data, another of the many free software battles of the era was to make it possible to buy a computer without Microsoft Windows preinstalled, and without paying the Microsoft licence fee, which amounted to a tax. Microsoft's monopoly meant you were obliged to buy Windows with a computer even if you had no intention of using it. This allowed Bill Gates to become, within a few years, the richest man in the world.

To fight that monopoly, the makers of Linux distributions tried to make them as simple as possible to use and install. The Frenchman Gaël Duval was a pioneer in the field with the Mandrake distribution, later renamed Mandriva.

In 2004 the Ubuntu distribution appeared. Ubuntu was, at first, nothing more than a version of the Debian distribution simplified and optimised for desktop use. With, by default, a pretty wallpaper featuring naked people. Since Ubuntu was financed by a billionaire, Mark Shuttleworth, you could order CD-ROMs to install it free of charge. The distribution grew popular and began to crack, very slightly, the monopoly of Microsoft Windows.

The future seemed to belong to pocket-sized mini-computers. There too, free software seemed to be taking its place, notably thanks to Nokia, which was developing what looked like the first smartphones, running a modified version of Debian: Maemo.

But in 2007, Apple launched what was originally meant to be an iPod with a phone feature: the iPhone. It was closely followed by Google, which announced Android, an operating system originally designed for digital cameras. The smartphone invasion had begun. Several companies started collaborating with Nokia to produce smartphones running free software. But in 2010, a Microsoft executive, Stephen Elop, was parachuted into Nokia and appointed CEO. His first decision was to end all collaboration with the "free software" companies. Over three years, he took a series of decisions that sent Nokia's market value tumbling, until the Finnish company could be bought... by Microsoft.

Apple's iPhone was so locked down that, at first, it was not even possible to install an application on it. Steve Jobs was firmly opposed to the idea. Customers would get the phone as Jobs had conceived it and nothing else. However, Apple's engineers noticed that applications were being developed for Google's Android. Apple quickly backtracked. Within a few years, "apps" became trendy and "App Stores" appeared, picking up a concept pioneered by... GNU/Linux distributions! Every bank, every grocery store had to have its own app. On both systems, which are very different. You have to develop for Apple and for Google.

Microsoft then tried to establish itself as a third player before throwing in the towel: nobody wants to develop their app in three versions. In the trendy circles of Silicon Valley, many even settle for developing applications only for Apple. It is more expensive, therefore more elitist. Many hackers gave in to the lure of that elitism and sold themselves body and soul to the Apple universe. A universe that forbids its customers from installing applications other than through its App Store, and that takes a 30% commission on every transaction made there. Developing for Apple therefore amounts to handing 30% of your work to Steve Jobs' company. That is the price of belonging to the elite. The hackers who switched to Apple go as far as convincing themselves that since Apple is based on FreeBSD, it is a Unix, and therefore it is fine. Notwithstanding the fact that the very idea behind Unix is, above all, to be able to understand how the system works, and therefore to have access to the sources.

Not knowing which way to turn in the face of the smartphone invasion and the end of Nokia's support for Maemo, the open source and free software world was misled by the fact that Android is based on a Linux kernel. Android would be Linux, and therefore free. Great! Let's develop for Android! Google maintained this confusion by funding many free software projects. Many free software developers were hired by Google, which built itself an image as a champion of free software.

But there is a catch. While Android is based on the Linux kernel, which is under the GPL, it does not use the GNU tools. A large part of the Android system is indeed open source, but its components are carefully chosen to avoid any code under the GPL. And for good reason... Gradually, the open source share of Android has been shrunk to make way for proprietary Google components. If you install only the free components today, your phone will be essentially unusable and most applications will refuse to start. Fortunately, there are free projects that try to replace Google's proprietary components, but they are not within everyone's reach and they remain stopgaps.

While fighting to run free code on their computers, free software advocates, despite their head start in the field, had the world of telephony and mobility confiscated from them.

But there is worse: where free software advocates fought to control the software processing their data, the new players simply process their customers' data on their own computers. Users have lost control of both the code and the hardware.

This is what we call the "cloud": your data is no longer at your place but "somewhere", on the computers of large companies whose business model is to exploit that data as thoroughly as possible in order to show you as many ads as possible. A business model that works so well that these companies are, as we have seen, the biggest companies in the world.

État de la situation

Les progrès de l’informatique se sont construits en deux grandes vagues successives de collaboration. La première était informelle, académique et localisée dans les centres de recherches. La seconde, initiée par Richard Stallman et portée à son apogée par Linus Torvalds, était distribuée sur Internet. Dans les deux cas, les « hackers » évoluaient dans un univers à part, loin du grand public, du marketing et des considérations bassement matérielles.

À chaque fois, l’industrie et le business confisquèrent les libertés pour privatiser la technologie en maximisant le profit. Dans la première confiscation, les industriels s’arrogèrent le droit de contrôler les logiciels tournant sur les ordinateurs de leurs clients puis, la seconde, ils s’arrogèrent également le contrôle des données desdits clients.

Nous nous trouvons dans une situation paradoxale. Chaque humain est équipé d’un ou plusieurs ordinateurs, tous étant connectés en permanence à Internet. La puissance de calcul de l’humanité a atteint des proportions démesurées. Lorsque vous regardez l’heure sur votre montre Apple, il se fait plus de calculs en une seconde dans votre poignet que dans tous les ordinateurs du programme Appolo réunis. Chaque jour, votre téléphone transmet à votre insu presque autant de données qu’il n’y en avait dans tous les ordinateurs du monde lorsqu’Unix a été inventé.

Pourtant, cette puissance de calcul n’a jamais été aussi concentrée. Le pouvoir n’a jamais été partagé en aussi peu de personnes. Il n’a jamais été aussi difficile de comprendre comment fonctionne un ordinateur. Il n’a jamais été aussi difficile de protéger sa vie privée, de ne pas se laisser influencer dans nos choix de vie.

Les pouvoirs publics et les réseaux éducatifs se sont, le plus souvent, laissé prendre au mensonge qu’utiliser les nouvelles technologies était une bonne chose. Que les enfants étaient, de cette manière, éduqués à l’informatique.

Utiliser un smartphone ou une tablette éduque autant à l’informatique que le fait de prendre un taxi éduque à la mécanique et la thermodynamique. Une personne peut faire des milliers de kilomètres en taxi sans jamais avoir la moindre notion de ce qu’est un moteur. Voyager avec Ryanair ne fera jamais de vous un pilote ni un expert en aérodynamique.

Pour comprendre ce qu’est un moteur, il faut pouvoir l’ouvrir. L’observer. Le démonter. Il faut apprendre à conduire sur des engins simplifiés. Il faut en étudier les réactions. Il faut pouvoir discuter avec d’autres, comparer un moteur avec un autre. Pour découvrir l’aérodynamique, il faut avoir le droit de faire des avions de papier. Pas d’en observer sur une vidéo YouTube.

What seems obvious to us with mechanics has been completely glossed over with computing. Deliberately, the now-tolerated monopolies try to create a "magical", incomprehensible experience. Most developers are now trained inside "frameworks" so that they never have to understand how things work under the hood. If users are the taxi customers, then developers, in the vision of Google, Apple, Facebook and the like, should be Uber drivers who drive blindly, without asking questions, simply following the instructions.

Computing has become a human infrastructure far too important to be left in the hands of a few commercial monopolies. And the only way to resist them is to try to minimize their impact on our lives. By refusing, as much as possible, to use their solutions. By looking for alternatives. By contributing to creating them. By trying to understand what these "magical" solutions really do with our computers, our data and our minds.

Computing is no more complicated or esoteric than a diesel engine. It has been made complex on purpose, to increase its power of seduction. Right down to terms like "high-tech" or "new technologies", which deliberately reinforce the idea that most of us are too stupid to understand. An idea that flatters quite a few computer specialists and tinkerers, who feel superior because of it, even as they help the monopolies extend their power over themselves and their loved ones.

The road is long. Twenty years ago, the concept of open source seemed to me mostly philosophical. Today, I am surprised to find myself diving more and more often into the source code of the software I use daily, to understand it and to use it better. Unlike the documentation or answers on StackOverFlow, the source code is never wrong. It faithfully describes what my computer does, bugs included. With practice, I realize that it is often easier to read the source code of Unix tools than to try to interpret dozens of StackOverFlow discussions.

Free software and open source are the only solution I can see for computers to be tools in the service of humans. Twenty years ago, Richard Stallman's ideas seemed extreme to me. I have to admit he was right. Proprietary software has essentially been used to turn users into slaves of their computers. The computer is then no longer a tool, but a means of control.

The responsibility does not lie with the individual. After all, as I said, it is almost impossible to buy a phone without Google. The individual cannot fight this alone.

The responsibility lies with intellectuals and professionals, who must speak up rather than blindly accept institutional decisions. The responsibility lies with every educational institution, which should ask questions and teach paper-plane folding rather than force students to create accounts on monopolistic platforms in order to access the data displayed on magnificent interactive screens financed, most often, by Microsoft and friends. The responsibility lies with all activists, whether they are environmentalists, leftists, anti-capitalists, socialists, or simply advocates of local life. You cannot campaign for ecology and social justice while furthering the interests of the largest corporations in the world. You cannot campaign for the local while outsourcing your own voice to the other side of the world. The responsibility lies with all the politicians who have ceded control of countries, of entire continents, to a handful of companies, on the pretext of winning a few votes in the next election.

I do not yet know what form the third great collaboration of computing will take. I only know that it is urgent and necessary.

Receive posts by email or RSS. At most 2 posts per week, nothing else. Your email address is never shared and is permanently deleted when you unsubscribe. Latest book: Printeurs, a cyberpunk thriller. To support the author, read, give and share books.

This text is published under the CC-By BE license.

May 09, 2022

Two weeks later, I'm still feeling the energy from our first in-person DrupalCon in two years!

This blog post is Part 3 of my DrupalCon keynote recap. In case you missed it, you can read up on Part 1 and Part 2. Part 1 focused on Drupal 10 updates. Part 2 talked about our new vision statement.

In my keynote, I also mapped out a potential strategy for Drupal 11. In this blog post, I explain Drupal 11's strategy, and how it aligns with our updated vision statement.

Drupal 11 to focus on a Composable Core, helped by Project Browser, Starter Templates and Automated Updates

Drupal 11's strategy is focused on (1) empowering ambitious site builders and (2) accelerating innovation in our contributed modules repository.

To accomplish these two goals, Drupal will have to double down on "composability", which is reflected by the six proposed initiatives below. I'm code-naming the two-year strategy for Drupal 11 "Composable Core".

Six focus areas for Drupal 11

Project Browser

In Drupal 9, we have over 8,000 modules and themes. In those 8,000 projects are some amazing innovations. But right now, it's hard for site builders to find them.

Many first-time adopters don't even realize Drupal can be extended with all these contributed modules and themes. Some site builders even give up on Drupal because once Drupal is installed, they don't know what to do next.

So, we have an opportunity to make all these great innovations easier to find.​​ The Project Browser Initiative would recommend or highlight great Drupal modules to use. It would also enable a one-click install process.

Under the hood, modules are still installed with Composer, making them easy to maintain and update.

Check out this video to learn more about Project Browser:

Starter Templates

While Project Browser would help site builders discover and install individual modules, it is not unusual to use 30+ contributed modules to build a site. It can take a lot of work to find and configure these modules, and to make sure they all work well together.

This is where the new Starter Templates concept comes in. Starter Templates are about installing and configuring groups of modules. For example, Starter Templates for an event website or a product website would include all of the necessary modules and configuration required for that type of website.

This concept bears a lot of resemblance to Drupal's 15-year-old concept of Drupal distributions. The goal of Starter Templates, however, is to be a lot easier to build and maintain than distributions.

Drupal site starter templates compared to distributions; starter templates are easier to implement and maintain

We don't know yet how exactly we'd implement Starter Templates, but we have some initial ideas. Join the #distributions-starter-templates Slack channel if you're interested in contributing.

Automated Updates

For Drupal 11, we will continue our work on the Automated Updates Initiative, which is meant to make updates to Drupal much easier.

Check out the video below for an update on Drupal's Automated Updates initiative:

GitLab Initiative

Accelerating innovation by empowering contributors continues to be a key priority for Drupal. We will keep working on the GitLab Initiative to bring better collaboration and development tools to Drupal contributors.

Check out this video for the latest on the GitLab Initiative:

The Great Module Migration

This proposed initiative focuses on making Drupal Core smaller. How? By responsibly moving more modules from core to contrib. This means less code to maintain in core, so that Drupal can focus on innovation. Certain modules can also innovate faster as contributed modules.

To evaluate which core modules we might migrate, we developed a ranking system. The system evaluates each module's capabilities, adoption, strategic value, and maintenance effort.

Based on this early ranking system, we found that we could remove approximately 16 modules from core and keep approximately 64. Some of these modules could already be removed in Drupal 10. Drupal 11 could be 20% smaller compared to Drupal 9.

We believe it's safe to make core smaller, especially with a strong focus on Project Browser, Starter Templates and Automatic Updates.

Drupal 11 readiness

As a part of Drupal 11 readiness, we will continue to manage our third-party dependencies to keep Drupal secure.

Should we do more?

I'd love to see us do more in Drupal 11. I wish we could invest in embracing web components, building a decoupled administration backend, modernizing Views, and much more.

Investments like this often require small teams of dedicated developers, e.g. 3-5 developers for 6 to 12 months.

To get even more done for Drupal 11, it's important that organizations consider paying Drupal developers to work on larger, long-term contributions.

I covered this concept in past blog posts, such as the privilege of free time in Open Source and balancing Makers and Takers to sustain Open Source.

In the next year, the Drupal Association will start taking additional steps towards incentivizing larger contributions. Needless to say, I'm very excited about that. Stay tuned for more on that topic.

Let's get building

The early planning phases of a new release are an exciting time to discuss Drupal's potential for the future, and focus on ideas that will make the most impact for our users. I'm excited for us all to collaborate on Drupal 11 in the coming years.

May 06, 2022

Completely nonexistent barely sixty years ago, the computer industry has now become the largest in the world. The world is controlled by computing. Understanding computing has become one of the only ways to preserve our individuality and to fight against the interests of a minority.

Not convinced of the importance of computing?

In terms of market capitalization, the most important company in the world as I write these lines is Apple. The second is Microsoft. While an oil company sits in third place, Alphabet (formerly Google) comes fourth and Amazon fifth. In sixth place we find Tesla, which essentially produces computers on wheels, and in seventh place Meta (formerly Facebook). Facebook's position is particularly emblematic, because the company provides nothing but websites on which the brain time of its users is sold to advertising agencies. Exploiting that available brain time is also Alphabet's main source of revenue.

I do not think we emphasize enough what this stock-market ranking teaches us: today, the biggest players in the world economy have as their primary objective to sell humanity's free will and available brain time. The oil of the twenty-first century is not "big data" but control of the human mind. Alphabet and Facebook sell neither hardware nor software, but direct access to that human mind in its private life. Microsoft, for its part, tries to sell the human mind in a professional context. Although it also has advertising interests, a large part of its business is taking control of workers and then "subletting" that control to employers, who are only too happy to control their employees like never before.

Tesla wants to control human travel. That is in fact a constant with Elon Musk, with his Hyperloop projects, tunnel digging, and even space conquest. Amazon seeks to control all of your commercial interactions by controlling merchants, whether in physical shops or online (thanks to a hegemony over website hosting through Amazon S3).

Apple, for its part, seeks total control over every human who enters its orbit. Look around at the number of people sporting AirPods, those little white earphones. When you speak to one of these people, you are not speaking to the person. You are speaking to Apple, which then decides what it will transmit of your voice to its customer. The customer has therefore paid to lose control over what they hear. The customer has paid to let Apple choose what they will be able to do with their phone and their computer. Where the other companies each try to control free will in a particular context, Apple bets on total control of a part of humanity. It is very easy to equip yourself with Apple hardware. It is extremely difficult to get rid of it. If you are an Apple customer, try the thought experiment of imagining living without any Apple product from one day to the next, without any application exclusive to the Apple universe. In this list, Apple and Tesla are the only ones providing a tangible good. That tangible good is itself only a vehicle for their software. If it were not, Apple would be comparable to Dell or Samsung, and Tesla would never have overtaken General Motors while selling only a fraction of its vehicles.

We can therefore observe that a significant part of humanity is under the control of software belonging to a handful of American companies whose executives, incidentally, know each other intimately. Friendships, contacts with loved ones, shared calendars, the times of our appointments? Controlled by their software. The information you receive, private or public? Controlled by their software. Your location? The photos you are encouraged to take at every turn? The products you buy? Controlled by their software! The slightest payment you make? Apart from cash, everything is controlled by their software. It is even worse if you use your phone for contactless payment. Or PayPal, the platform created by… Elon Musk. Mastercard transaction data is resold wholesale to Google. Visa, for its part, sits precisely eighth in our ranking of the most important companies.

Travel? Either in vehicles controlled by the same software, or via transport requiring apps controlled by… the same software. Even public transport pushes you to install Apple or Google apps, reinforcing their control over every aspect of our lives. Thank goodness we still have the bicycle.

Your view of the world and most of your actions are now controlled by a few pieces of software. To the point where simply not having a Google or Apple smartphone is inconceivable, including to public authorities, your banker or, sometimes, your employer! It is, moreover, almost impossible to buy a smartphone without Google preinstalled. Try it: walk into a shop that sells smartphones and say that you do not want to pay for Apple or Google. Watch the salesperson's face… Shops like the Fnac proudly display kilometres of black rectangles that are absolutely indistinguishable from one another. Google, by the way, pays Apple more than a billion dollars a year to be the default search engine on iPhones. Even the handmade posters of local activists now offer nothing but QR codes, most often pointing to Facebook pages or to online petitions hosted by one of the aforementioned monopolies.

You may find this state of affairs comfortable, even convenient. To me, it is both sad and dangerous. It is this monopolistic numbness that paralyses us and prevents us from solving urgent problems such as global warming and the destruction of ecosystems. Quite simply because those who control our minds have no direct interest in solving the problem. They gain their power by making us consume, making us buy and pollute. Their business is advertising, in other words convincing us to consume more than we naturally would. Notice that when a crisis is an opportunity for them, solutions appear immediately. Against COVID, we accepted without protest being locked up for weeks (which allowed the generalization of videoconferencing and remote-work platforms provided, for the most part, by Google and Microsoft) and seeing our basic freedom of movement completely trampled by health passes, entrenching the ubiquity of… smartphones and myriad dedicated applications. We travelled less (to the great delight of Facebook, which served as the intermediary in our social relationships) and smartphones became almost mandatory in order to show a QR code at every turn.

We accepted, without the slightest attempt at rebellion, being locked up and being tracked by QR code. By way of comparison, remember that the mere mention of a rise in fuel prices triggered the yellow-vest movement. A movement that was built on Facebook and therefore increased the use of that platform.

To regain our free will, I see only one way: understanding how these platforms operate. Understanding what software is, how it appeared, and how software historically split into two categories: proprietary software, which tries to control its users, and free software, which tries to offer freedom to its users.

The power and might of software force us to understand it, to think it through. Otherwise, it will think in our place. That is, in fact, already happening. Eyes glued to our screens, ears plugged with earphones, we react instinctively to what is displayed without having the slightest idea of what has really happened.

Software is not "magical", it is not "sexy" or "hypercomplex". It is not "new technology". All these words are nothing but marketing, the semantic equivalent of the "washes whiter than white" of our detergents. They are just words inflicted on us by companies with outsized power whose core business is marketing, in other words lying.

In reality, software is a human, understandable technology. A series of arbitrary choices made for historical reasons, a series of advances as well as steps backwards. Software is nothing but tools handled by humans. Humans who sometimes try to make us forget their responsibility, to camouflage it behind marketing and shiny coloured icons.

Software has a history. I want to tell it to you…

(to be continued)

Receive posts by email or RSS. At most 2 posts per week, nothing else. Your email address is never shared and is permanently deleted when you unsubscribe. Latest book: Printeurs, a cyberpunk thriller. To support the author, read, give and share books.

This text is published under the CC-By BE license.

May 05, 2022

Autoptimize can help improve your site’s performance, but in some cases after installing and activating you might encounter issues with your site. Although frustrating and maybe even scary, these problems can generally be easily resolved by re-configuring Autoptimize and I’ll be happy to share how you can do that. First off: if the problem is limited to just one (or two) page(s), go to that page’s...

Source

May 04, 2022

With Drupal 10 around the corner, it's time to start laying out Drupal 11's development roadmap.

It's important we begin that work by reflecting on Drupal's purpose. Drupal's purpose has evolved over the years. In the past, the goal might have been to build the world's most powerful CMS. Today, I believe Drupal has become much bigger than a CMS alone.

Drupal enables everyone to participate in an Open Web. The web is one of the most important public resources. As a result, the Drupal community's shared purpose is to make that resource open, safe, and accessible to all. With 1 in 30 websites running on Drupal, we have a lot of influence on building the future of the web we want to see. In fact, we have an opportunity to help build a digital future that is better than the one we have today.

Drupal enables everyone to participate in an Open Web, a decentralized, public resource that is open, safe and accessible to all

To align with that purpose, and to drive the most impact, our vision also has to evolve. Five years ago, I declared that Drupal is for ambitious digital experiences. I'd argue that we have achieved that vision by investing in headless Drupal, Media, Layout Builder, and additional features that help enable the creation of ambitious digital experiences.

That is why I propose evolving our vision statement to "Drupal is for ambitious site builders".

Drupal is for ambitious site builders

Attracting more Drupal site builders will increase Drupal's potential user base, and in turn create a more open, accessible and inclusive web for all.

This shift also brings us back to our roots, which I've talked about in several of my previous DrupalCon keynotes.

What is an ambitious site builder?

An ambitious site builder sits in between the developer hand-coding everything using a framework, and the content author using a SaaS solution. There is a gap between developers and content authors that Drupal fills really well.

Drupal's unique strength is the Ambitious Site Builder

An ambitious site builder can get a lot of things done by installing and configuring modules, and using Drupal through the UI. But when needed, they can use custom code to make their site exactly how they want it to be. Ambitious site builders are the reason why Drupal became so successful in the first place.

I'm excited to see this vision come to life through the key initiatives for Drupal 11, which I'll talk about in my next blog post.

May 03, 2022

Last week, 1,300 Drupalists gathered in Portland, Oregon for DrupalCon North America. It was the first in-person DrupalCon in more than two years. I can't tell you how amazing it was to see everyone face-to-face.

In good tradition, I delivered my State of Drupal keynote. You can watch the video of my keynote or download my slides (262 MB).

I covered a lot of ground in this presentation, so I broke down my written summary into a three-part blog series. Part 1 below focuses on Drupal 10 updates. I'll be publishing Part 2 and Part 3 later this week, which will focus on Drupal's evolved vision and Drupal 11 proposed initiatives.

Drupal stands with Ukraine

I couldn't begin my presentation without acknowledging the loss of life and destruction in Ukraine. It's impacting many in the Drupal community, which is heartbreaking.

You may not be aware, but Ukraine is the sixth most active country in the world in terms of Drupal contributions. If you were to look at these contributions per capita, Ukraine's contributions are even more significant.

Both myself and the Drupal Association strongly condemn the Russian attacks on Ukraine. Many of us might want to know how to help. The Drupal Association has compiled a list of organizations that are accepting charitable donations.

Updates on Drupal 10

From there, I gave an update on Drupal 10. We had targeted a Drupal 10 release date of June 2022, but we made the decision to postpone until December 2022.

We had to move the date back because we have more work to do on the CKEditor 5 migration path. We're upgrading from CKEditor 4 to CKEditor 5. CKEditor 5 is a complete rewrite, with no upgrade path or backwards compatibility.

The Drupal community (and Acquia in particular) has spent thousands of hours working on an upgrade path for CKEditor to make the upgrade easy for all Drupal users. While that has gone well, we need some additional time to work through the remaining upgrade challenges. Fortunately, we are getting great support from CKSource, the company behind CKEditor.

Next, I walked through three important facts about Drupal 10.

  1. Symfony 6.2 — Drupal 10 will upgrade Symfony – a PHP framework that Drupal relies on heavily – from Symfony 4 to Symfony 6.2. At the time of the Drupal 10 release, Symfony 6.2 will be the latest and greatest release. For planning purposes, if you use Symfony components in your custom Drupal modules, you will have to upgrade those to Symfony 6.x.
  2. PHP 8.1 — We have changed the minimum PHP requirement from PHP 7.4 for Drupal 9 to PHP 8.1 for Drupal 10. This is in large part because Symfony 6.2 will require PHP 8.1. Drupal users will benefit from various improvements in the new version of PHP. It also means you might have to upgrade any custom code. Because Drupal 9.3 works with PHP 8.1, you could start that work now with Drupal 9.3. It's a good way to prepare for Drupal 10.
  3. Drupal 9 end-of-life — Drupal 9 end-of-life will happen in November 2023. Once Drupal 10 is released, you will have 11 months to upgrade your Drupal 9 sites to Drupal 10. The good news is, this should be the easiest upgrade in the history of Drupal. On Drupal 9's release date, 71% of deprecated API uses in contributed projects had automated conversions. Today, 93% of deprecated API uses for Drupal 10 across all contributed projects have automated conversions. And we're working on getting that even higher by the time that Drupal 10 is released.

With that, I provided some exciting updates on the five major Drupal 10 initiatives.

Olivero

Drupal's new frontend theme, named Olivero, is now stable. It's the most accessible theme we've ever shipped. During DrupalCon, Olivero also became the default theme for Drupal 10. Everyone who installs Drupal 10 will be greeted by a new frontend theme. That is significant because we used the current default theme, Bartik, for 11 years.

Claro

Drupal's new backend theme, called Claro, also became the new default administration theme at DrupalCon. Another major milestone and reason to celebrate!

Starterkit

Starterkit, a new way of creating themes in Drupal, is on track to be stable by Drupal 10's new release date. Releasing Starterkit means that we can move faster with theming improvements in Drupal Core. It also means that end users won't need to worry about whether upgrading Drupal breaks any of their sites' themes.

Instead of sub-theming a core base theme, Starterkit generates a starter theme for you from its latest defaults. This new theme will be more of a clone or fork, and will not have a runtime dependency on a core base theme.

CKEditor 5

We have made great progress on our content authoring experience. Check out this video for the latest update:

Automated updates

Automated updates, the Drupal community's number one feature request, is progressing well.

The plan is to have Automatic Updates in one of the first minor versions of Drupal 10, or even in 10.0 in December if the community can help us test and finalize it in time. Check out this video to learn more:

In Parts 2 and 3 of this blog series later this week, I'll focus on our strategy and proposed initiatives for Drupal 11.

I'd like to thank everyone who made our first in-person DrupalCon in two years a reality. It was amazing to see everyone's faces again and collaborate in person. Your contributions and hard work, as always, are inspiring to me!

I would also like to thank all the people that helped with my keynote. In no particular order, they are: Ash Sullivan, Alex Bronstein, Matthew Grasmick, Gábor Hojtsy, Jess (xjm), Ted Bowman, Baddý Sonja Breidert, Leslie Glynn, Tim Lehnen, Adam Bergstein, Adam Goodman, Théodore Biadala, and Alex Pott.

April 29, 2022

Here we go with day 3! In the morning, there are always fewer people due to the short night. The gala dinner is always a key activity during Botconf!

The last day started with “Jumping the air-gap: 15 years of nation-state efforts” presented by Alexis Dorais-Joncas and Facundo Munoz. Does “air-gap” mean a big castle in the middle of the Internet? That’s the image that usually pops up in our minds. They covered malware targeting “protected environments”. Such malware implements an offline, covert communication mechanism between the compromised systems and the attacker. Think about “Stuxnet”. They found 17 families of such malware and, as usual, attribution was not easy, sometimes controversial. There are two categories:

  • Connected frameworks (with a connected side and an air-gapped side). A classic approach is an initial compromise (spear phishing), after which a weaponised USB stick is used to reach the air-gapped side. Results are stored back on the USB drive, hoping that the user will connect the USB back to the connected side. Another technique is to write commands from the C2 on the USB drive.
  • Offline frameworks (no Internet connectivity). A user must “import” the malware.

Automated execution is the most effective technique to launch the malware, for example via LNK remote code execution. In the case of non-automated execution, the main techniques are:

  • Abuse of Windows auto run
  • Planting decoy files to lure victims
  • Rig existing files with malicious code

They reviewed some well-known malware samples and explained the techniques used to interact with the air-gapped systems. To exfiltrate data, some families use modified directory entries to hide content on the USB drive: for example, if you create an invalid directory name, Windows will ignore it and it will be hidden. Another technique is to hook some file-related APIs so that, when a specific file type is used (e.g. a Word document), data is appended to the documents. This is how the Ramsay malware works.
How to defend against these types of attacks? USB drives are the key element, so disable or prevent their usage as much as possible. When you can’t, disable automatic execution and sanitize USB drives before inserting them into the air-gapped system.

The next talk was presented by Eric Freyssinet: “A vision on Cyberthreats”. There were already talks in the past about the French law enforcement services. Eric presented a picture of what he is doing with his team and the challenges they are facing. Years ago, fighting cybercrime was a niche; today, it’s a key element. A keyword mentioned multiple times was “together”: “we have to share together, we have to talk and act together”. A cyberspace command has been created not only to fight cybercrime but also to be present on the Internet and offer services to citizens, such as a 24×7 chat. About the cyber threat landscape: in 2020, there were 100K cases (+21% over 2019), and Q1-Q2 2021 was up another 38% over 2020! The main challenges today are less data available in cleartext, more and more data to process, cloud environments, and more to come! Criminals also evolve very quickly: “Crime as a Service” and increasingly open cyber-criminal ecosystems.

After a welcome coffee break, we attended “Detecting and Disrupting Compromised Devices based on Their Communication Patterns to Legitimate Web Services”, presented by Hen Tzaban. This was a remote presentation because the speaker was not able to join us in Nantes. Hen is specialized in data analysis. Today, enterprise protection has switched from blacklists (too complex to manage and prone to issues) to behavioral monitoring. In the past, the approach was to look for IOCs and block bad IPs/domains in proxies and firewalls. Today, the focus has shifted to behavior, because criminals abuse legitimate services such as Twitter, Google, GitHub, Pastebin, etc. For example, the HAMMERTOSS malware uses Twitter. Akamai developed a framework based on PSD (Power Spectral Density) and a neural network model. The goal is to detect compromised hosts inside the enterprise; for DNS traffic, for example, beaconing, multi-stage channels, and DGAs are common techniques. Hen explained how they implemented this technique at Akamai.
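The talk did not come with code, but the core idea is easy to sketch. The snippet below is a minimal, hypothetical illustration (my own, not Akamai’s implementation; the function name, bin size and test data are invented): bin a host’s request timestamps toward a given service, compute the power spectral density of the counts, and compare the strongest spectral peak with the median power. Regular beaconing produces one dominant peak; human browsing does not.

# Hypothetical sketch, not Akamai's code: score how "periodic" a host's requests are.
import numpy as np
from scipy import signal

def beaconing_score(timestamps, bin_seconds=10):
    """Ratio between the strongest spectral peak and the median power."""
    t = np.sort(np.asarray(timestamps, dtype=float))
    bins = np.arange(t[0], t[-1] + bin_seconds, bin_seconds)
    counts, _ = np.histogram(t, bins=bins)
    freqs, psd = signal.welch(counts, fs=1.0 / bin_seconds, nperseg=min(256, len(counts)))
    psd = psd[1:]                       # drop the DC component
    return float(psd.max() / np.median(psd))

# A bot calling home every ~60 seconds scores far higher than random human activity.
bot = np.cumsum(np.random.normal(60, 2, 500))
human = np.cumsum(np.random.exponential(60, 500))
print(beaconing_score(bot), beaconing_score(human))

In a real deployment such a score would feed a model (the speakers mentioned a neural network) rather than a fixed threshold.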

The next one was “ProxyChaos: a year-in-review of Microsoft Exchange exploitation” by Mathieu Tartare. Unless you live on a small island, you should be aware of this massive issue. So big that the FBI itself decided to clean up infected systems. Quick overview: it’s a suite of vulnerabilities:

  • ProxyLogon
  • When chained, pre-auth RCE
  • On-premise Exchange 2013, 2016 & 2019 are affected (not the cloud version)

It started in January 2021, before being reported to Microsoft, which decided to release an out-of-band patch. In March 2021, mass exploitation started. How does it work? Attackers use CVE-2021-26855 and then install a webshell (ChinaChopper). The name of the webshell is controlled by the attacker, so attribution is easier. What happened in the wild? More than 10 APT groups used this vulnerability. Multiple actors could be on the same server at the same time. Pre-auth means a huge amount of scans. Hafnium was the first group to use the vulnerability. Mathieu reviewed multiple groups’ activities. He also reviewed “ProxyShell”. The exploitation is a bit different but very similar to ProxyLogon: vulnerabilities are exploited in a chain, a malicious PST file is used to drop the webshell, and more fun follows.

Then, Eric Leblond presented “Suricata and IOCs” (a preview of a workshop in 2023?). After a quick introduction to his company (Stamus Networks) and its work on Suricata, Eric covered some advanced features: IPrep, Lua scripting support and datasets. The concept is list matching on sticky buffer keywords. For example:

alert http … (http.user_agent; dataset:isnotset,official_agents;)

You can also use IOC matching with datasets. Here is another example:

alert dns … (dns.query; dataset:isset,ioc_d,type string,load ioc_domains.lst;)

Datasets can be modified on the packet path; for example, you can build a list of HTTP user agents seen on the network! You can also combine datasets, e.g. build a set of user agents but exclude the sites from the Alexa 1M list. To improve the integration of MISP and Suricata, a tool called IOCmite has been developed by Sebdraven. The concept: MISP > IOCmite > (socket) > Suricata.

After the lunch break, Souhail Hammou presented “Privateloader – The malware behind a havoc-wreaking Pay-Per-Install service”. PPI services monetise wide distribution of malware. There are public and private PPI services. Once the malware is installed, the loader sends info back to the C2 server for “accounting” purposes. The “customer” is then billed based on usage. PrivateLoader was detected by Intel471 in 2021. It is developed in C++, uses HTTP for C2 communications and is heavily maintained. The main distribution channel is malicious websites delivering cracked software. The loader has a core module which disables some security features and grabs its configuration. One piece of information retrieved is the geographical location; based on it, different payloads will be sent. It can also target specific hosts depending on a “profile”. Two types of payloads can be downloaded: PE files and browser extensions (for Chromium). Payloads are delivered through Discord URLs. The list of installed payloads is sent back to the C2 server. What about tracking PrivateLoader? They created fake but “passive” bots (so as not to ring a bell) and analyzed the malware dropped on victims.

The very last talk was “Qakbot malware family evolution” presented by Markel Picado Ortiz and Carlos Rubio Ricote. This was the second talk about Qbot. They spent a lot of time analyzing plenty of samples. Qbot has many versions! They showed the multiple changes found in the code. After reviewing the version changes, they gave some statistics about affiliate IDs, versions, and timestamps to visualize who used which version in time.

What about this 9th edition of Botconf? It was so great to meet people in real life! There were 300 attendees coming from almost all continents, a vibrant and diverse community. As usual, they already announced the 10th edition, which will be held in April 2023 in Strasbourg! Kudos to the crew for this awesome event!

The post Botconf Day 3 Wrap-Up appeared first on /dev/random.

April 28, 2022

The second day is already over. Here is my recap of the talks. The first one was “Identifying malware campaigns on a budget” by Max “Libra” Kersten and Rens Van Der Linden. The idea was to search for malicious activity without spending too much money. Read: “using as few resources as possible”. The solution proposed must be scalable, reusable, and repurposable. From a budget point of view, it was based on Raspberry Pi 3B (~65€) + the cost of electricity and free software. In their research, they used a dataset of 500K malicious emails (that can’t be disclosed but the same applies to any dataset). They normalized the data then applied different modules to them:

  • The first module was used to check the email subjects and, using the Levenshtein distance (the number of character changes), they found clusters of emails (see the sketch after this list).
  • The second module was used on lure images (the logos used in phishing campaigns to make the victim more confident).
  • The third one was focusing on URLs with passive reconnaissance techniques as well as active techniques.
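The speakers did not share their scripts, so the following is only a rough, hypothetical sketch of the first module’s idea (the threshold, the helper function and the sample data are mine): greedily group subjects whose Levenshtein distance to a cluster representative is small, using the python-Levenshtein package.

# Illustrative sketch only, not the speakers' implementation.
import Levenshtein  # pip install python-Levenshtein

def cluster_subjects(subjects, max_distance=3):
    clusters = []                                   # each cluster: list of similar subjects
    for subject in subjects:
        for cluster in clusters:
            # compare against the cluster's first subject (cheap representative)
            if Levenshtein.distance(subject, cluster[0]) <= max_distance:
                cluster.append(subject)
                break
        else:
            clusters.append([subject])
    return clusters

mails = ["Invoice #1234", "Invoice #1235", "Invoice #9921", "Your parcel is waiting"]
print(cluster_subjects(mails))

On 500K subjects a naive pairwise comparison would be too slow even for a Raspberry Pi budget, so some pre-bucketing (by length or prefix) would be needed in practice.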

I was a bit surprised to see the last slides… without really demonstrating the scripts they used (also, nothing was shared). Too bad, I was expecting something that could be reused (like a GitHub repo to clone). An interesting question from the audience: why not use Karton to automate these processes?

Max stayed on stage for a second (long) presentation: “See ya Sharp: A Loader’s Tale”. He explained his research about a specific loader: Cyax-Sharp. What is a loader? It is used to “load” a remote/local payload. It is usually encrypted, obfuscated, and has anti-debugging features. While classic malware samples get analyzed in depth, loaders attract less attention and we lack reports about them, even though they are an essential stage in the infection process. The malware ecosystem lacks a naming convention and this payload can be referenced as “ReZer0” or, later, “Cyax-Sharp”. This is confusing. He reviewed some capabilities of the loader, like disabling Windows Defender and anti-sandboxing techniques (searching for some DLLs), then looked at how the configuration is managed and what changed over time:

  • The configuration changed: new features added
  • Sleep is configurable and message box can be displayed

Max gave some statistics about his research. He collected 513 samples. Process hollowing is present in 72% of cases; the remainder is spread across MSBuild hollowing, vbc, regsvc or direct launch. Persistence is used in 54% of the samples, based on scheduled tasks. Regarding protections, 79% had both anti-VM and anti-analysis enabled. Payload families? 54% were AgentTesla, with the rest spread across other families.

After a short coffee break, we were back with “Into The Silent Night” presented by Yuta Sawabe and Ryuichi Tanabe. Even though COVID restrictions are lighter, there are still some travel issues between countries, and the speakers were not able to join us in Nantes; they recorded the presentation. Silent Night is the new name of ZLoader. The goal of this research was to track C2 servers. In this case, a DGA is used, which makes the analysts’ life more difficult. They built a tracking system to collect data, extract threat intel and analyze the results. Silent Night is a modular trojan from the Zeus family, delivered via exploit kits, fake software, phishing, etc. Its C2 communications can be identified via /cp.php, /gate.php or /logs.
The process they used:

  • collect samples (VT, any.run, Triage)
  • extract config via triage
  • calculate DGA domains
  • collect log files from C2 servers

Some results they shared:

  • 453 samples
  • 22 RC4 keys
  • Peak number of infections was in Sep 2021
  • Main country was US

When you know the DGA, you can compute the domains that will be registered in advance and trace future attack activity: 32 domains are generated per day (not all of them are used).
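To make that concrete, here is a purely made-up DGA (not ZLoader/Silent Night’s real algorithm, seed or key): once reverse engineering reveals the algorithm and its inputs, a defender can compute tomorrow’s domains today and register, sinkhole or block them before the attacker uses them.

# Purely illustrative DGA; the real Silent Night algorithm and seeds differ.
import hashlib
from datetime import date, timedelta

def dga_domains(seed, day, count=32):
    domains = []
    for i in range(count):
        digest = hashlib.md5(f"{seed}-{day.isoformat()}-{i}".encode()).hexdigest()
        domains.append(digest[:12] + ".com")
    return domains

tomorrow = date.today() + timedelta(days=1)
print(dga_domains("example-seed", tomorrow)[:5])   # 5 of the 32 domains for tomorrow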

We continued with “A fresh look into the underground card shop ecosystem” with Beatriz Pimenta Klein and Lidia Lopez Sanz. This was interesting because I’ve no experience with these platforms. Where are cards sold?

  • Automated vending cards (AVCS)
  • Marketplaces
  • Forums & chats

They focused on the first type. But how are the cards stolen? Via PoS malware that dumps card data from memory, although skimmers remain a classic attack. CVVs, also known as “cards”, are obtained via phishing pages, digital skimmers, leaks and info stealers. In the next part of the talk, they presented some card shops and compared them:

  • Brian’s Club
  • Legendary Rescator

By the way, the price of a card can go from $3 to $269 depending on the country, the bank, the expiration date, etc. A lot of card shops are inactive because they organized their own closure, they have been seized by law enforcement agencies, or… an exit scam (the most common one). It was interesting to learn that typosquatting is very common among card shops: the typosquatted shops share the same cards and… deliver malware samples!

The next talk was presented by Dominika Regéciová: “Yara: Down the Rabbit Hole Without Slowing Down“. This talk was a review of YARA, its performance, and recent changes. Dominika started with a quick review of YARA (who does not know this tool, developed at VirusTotal?). Writing YARA rules is pretty easy, but when you need to scan a huge set of data (in size or number of files), you’ll quickly face issues… like warnings from YARA or very long scan times. You’ll have to optimize the rules! The first optimization is to take into account how YARA selects “atoms” from strings. Regular expressions are a common issue with YARA and may drastically affect the speed. The next issue is the “too many matches” problem. Dominika explained how to write better rules and demonstrated the optimization with an example that executed in 3 seconds instead of 45 minutes (without optimization). If you’re interested in this topic, have a look here.
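As an illustration of the kind of rewrite this is about (my own toy example, not one from the talk, using the yara-python bindings), the two rules below match the same data, but the second gives YARA a long, rare fixed string to pre-filter on, while the first only anchors on a very common prefix that the regex engine then has to verify everywhere it appears:

# Toy example (not from the talk): atom-poor regex vs. an anchored fixed string.
import yara  # pip install yara-python

slow_rule = yara.compile(source=r"""
rule suspicious_url_slow {
    strings:
        $re = /http:\/\/.{1,32}\.xyz/   // only a very common prefix to anchor on
    condition:
        $re
}
""")

fast_rule = yara.compile(source=r"""
rule suspicious_url_fast {
    strings:
        $s = "http://malicious-cdn"     // long, rare fixed string = good atom
    condition:
        $s
}
""")

data = b"GET http://malicious-cdn.example.xyz/payload.bin"
print([m.rule for m in slow_rule.match(data=data)])
print([m.rule for m in fast_rule.match(data=data)])

On a single short buffer both are instant; the difference only shows up when scanning gigabytes of samples.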

Then, we had a talk provided by Alibaba Cloud: “Detecting emerging malware on cloud before VirusTotal can see it” by Anastasia Poliakova and Yuriy Yuzifovich. Why this research? Third-party validation has a cost (latency, money!). Sometimes tools like VT give ambiguous results: is a low VT score a sign of maliciousness? Also, many malware samples targeting Chinese customers are underreported to VT. To avoid these problems, Alibaba Cloud decided to build its own detection tool based on ssdeep… They explained in a lot of detail how they built a tool that is more efficient than… VT! (That’s what they said.) I had a discussion with other people about this and we all had the same feeling: are they preparing a competitor to VT?
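The speakers did not go into code, and the sketch below is only my illustration of the underlying ssdeep idea (using the ssdeep Python bindings), not Alibaba Cloud’s actual pipeline: fuzzy-hash incoming samples and compare them against hashes of known families, so that a slightly modified variant still scores high even before any third party has seen it.

# Minimal fuzzy-hash comparison with the ssdeep bindings (pip install ssdeep).
import ssdeep

sample_a = b"MZ" + b"\x90" * 4096 + b"malicious config v1"
sample_b = b"MZ" + b"\x90" * 4096 + b"malicious config v2"   # near-duplicate variant
sample_c = bytes(range(256)) * 64                            # unrelated content

h_a = ssdeep.hash(sample_a)
print(ssdeep.compare(h_a, ssdeep.hash(sample_b)))  # high score: likely the same family
print(ssdeep.compare(h_a, ssdeep.hash(sample_c)))  # low score: unrelated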

After the lunch break, we restarted with “Warning! Botnet is in your house…” by Vitaly Simonovich and Sarit Yerushalmi. I was curious about this title; it was not what I expected, but the content was good. The research goals were to understand how botnets operate, for which purposes, and how it all started.
The initial discovery was CVE-2017-9841 (target: PHPUnit RCE). It starts with a POST request, and the returned data is a PHP script with a system() call and a curl + payload execution. The payload is “traber.pl”, which installs a Python script in a crontab. The Python payload grabs a zip file, unzips it and executes the new Python payload “update.py”. This payload makes an HTTP request with a specific user-agent and, in return, gets some JSON data. They found multiple bundles of files that targeted multiple vulnerabilities (RCE, RFI, …).
But what was the botnet’s purpose? To understand this, they created a honeypot and simulated a fake compromised host. After one hour, the attacker contacted the host, added a second web shell, escalated his privileges, and got full control of the server. Third-party services were also used: GitHub & Pastebin to store more payloads. The bot was called “KashmirBlack”. Why? It was the name of a git repo.
Later, they found the use of a Dropbox account… and the associated token to access it.
Some numbers:

  • 285 bots
  • 480 attacks / day / bot
  • 140K attacks / day
  • 0.5% success
  • 1000 new bots / day

The next one was “How Formbook became XLoader and migrated to macOS” by Alexey Bukhteyev and Raman Ladutska. Here again, the speakers were not able to travel and the talk was presented via an interactive Zoom session. Overview of the malware: a banker & stealer, 100+ targeted apps, 6 years old, 3000+ campaigns, worldwide targets. Now called XLoader, it targets macOS & Windows. Formbook was first spotted in 01/2016; May 2018 saw the last post from the authors. In October 2020, XLoader was born: Formbook sales stopped and the business model changed. They explained carefully how obfuscation is used to hide functions and strings. More information is available here.

After the coffee break, Vasiliy Berdnikov and Aseel Kayal presented “SandyBlacktail: Following the footsteps of a commercial offensive malware in the Middle East”. This one was presented under TLP:Amber. A great review of the malware with plenty of details.

The next one was “Smoke and Fire – Smokeloader Historical Changes and Trends” presented by Marcos Alvarez. He started with a question to the audience: many people know SmokeLoader, but not many of us have put our hands in the code. He did… intensively! His talk covered his findings and all the changes across the years, going back to 2011(!).

Why is it a successful loader?

  • Business model (direct with the author)
  • Cost
  • Complexity

Its core features are: anti-analysis, persistence, network, crypto, payload injection and… extensions (plugins). From an operational aspect: initial recon + modules -> data harvesting + infostealer -> final payload (banker, ransomware, RAT).

The last talk was “PARETO: Streaming Mimicry” presented by Inna Vasilyeva. This was a sponsored talk but technical enough! PARETO is a botnet that infected 1M Android devices to spoof CTV (connected TV) apps. Inna explained that the malware mimics a streaming TV product to generate a higher price for ad impressions. This is achieved via the TopTopSDK. After the technical details, she explained that a takedown operation has been started: Google and Roku were contacted, as well as LE agencies.

This day ended with a round of funny/interesting lightning talks about the following topics:

  • ONYPHE (with a demo of a specific URL shown earlier today: /upl.php)
  • Advanced Persistent Speaker @ Botconf and DESKTOP-Group status
  • Crowdsourcing the demise of Twitter’s business model
    • JusticeRage/twitter-blocklist
  • WTF VT?!
  • Should we care about formula injection?
  • Binary Analysis course
    • maxkersten.nl/binary-analysis-course/
  • Hunting with Nuclei
  • Iranian & Russian eCrime: It’s complicated
  • Yet another dev fail?
  • Botnet Tracker
  • DDX – Detection, Detonation & Config eXtraction
  • Where’s my botconf recording?

Day 2 is over, the gala dinner is over, it’s now time to get some sleep hours for the last day!

The post Botconf Day 2 Wrap-Up appeared first on /dev/random.

April 27, 2022

Incredible! Here is my first wrap-up in two years! Now that COVID seems under control, it’s so good to be back at conferences and to meet a lot of good friends. Like most events, Botconf was canceled, postponed and uncertain until the COVID situation got better; finally, it happened live! For this edition, we are in Nantes, France. I arrived yesterday to attend a workshop performed by CERT.pl about MWDB. I learned some interesting concepts and will probably write one or two blog articles about this tool.

The first day started with some words by Eric Freyssinet, the boss of the conference. This edition was back to normal with around 300 people registered, coming from multiple countries. After some words about the sponsors (without them, no conference), he reminded everyone that Botconf respects the TLP and the privacy of attendees (regarding pictures, for example). With this edition, we are also back to the roots: the very first edition was already organized in Nantes in… 2013!

The first time slot was assigned to Berk Albayrak and Ege Balci, who presented “Behind the Scenes of QBot”. The talk was given under TLP:Amber, so I respect what Eric said and I won’t disclose any information they provided. They made an in-depth analysis of the QBot infrastructure. QBot (also known as Qakbot) is a well-known bot that has been active for a long time. They provided a lot of useful information about the botnet internals, how it works, and some statistics about victims.

The second talk was “RTM: sink-holing the botnet” presented by Rustam Mirkasymov and Semyon Rogachev. They started to work on this case in 2019 with the help of law enforcement agencies across multiple countries, which helped collect enough evidence to catch the attackers behind the botnet. It’s no secret that Russia remains the number one provider of malware; most banking groups in 2008-2017 spoke Russian. RTM is spread through spear-phishing emails and has multiple stages:

  • First stage: reconnaissance module (verify the presence of financial activity)
  • Second stage (RTM Core): download of the malware from a C2 server like Pony Stealer. It has many commands like reboot, shutdown, video-process, …
  • Third module (RTM Modules): many modules can be installed (like lateral movement, money stealing)

Of course, ransomware can also be deployed! The reconnaissance phase is based on the browser history (looking for the presence of interesting URLs). The most interesting part was the different ways used by the malware to connect to C2 servers: via livejournal.com blogs, via .bit domains or… based on Bitcoin transactions! The technique was explained: the IP address bytes are extracted from two transactions (2 x 2 bytes). Note that all C2s were proxy servers. Once they understood how the Bitcoin transactions were used to generate the C2 IP, they created specifically crafted transactions to “insert” a rogue IP address (one that they managed) as a fake C2 server. This server was used to collect information (playing man-in-the-middle) and redirect the traffic to the official C2 servers. A brilliant technique!
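The exact encoding used by RTM was not detailed here, so the snippet below is only a hypothetical illustration of the general scheme (2 bytes taken from each of two transactions, with a made-up function name): anyone who can read the blockchain, bot and researcher alike, can recompute the C2 address, which is exactly what made the crafted-transaction hijack possible.

# Hypothetical illustration, not RTM's exact algorithm: build an IPv4 address
# from the last two bytes of two Bitcoin transaction amounts (in satoshi).
def c2_ip_from_transactions(tx1_satoshi, tx2_satoshi):
    b1 = tx1_satoshi.to_bytes(8, "big")[-2:]   # 2 bytes from the first transaction
    b2 = tx2_satoshi.to_bytes(8, "big")[-2:]   # 2 bytes from the second transaction
    return ".".join(str(b) for b in b1 + b2)

print(c2_ip_from_transactions(0xC0A8, 0x0A05))   # -> "192.168.10.5"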

The next presentation was “Private Clubs For Hackers: How Private Forums Shape The Malware Market” by David Décary-Hétu and Luca Brunoni. David is a legal guy and explained that there is a huge gap between legal people and “techies”, but they have to work together to fight criminals. The idea of this talk was to reduce this gap. They followed the criminal behavior of bad guys across different forums. These forums are used as sharing platforms and sometimes also to perform transactions. Some of them are “public” while others are restricted (“private”) to keep out law enforcement agencies, researchers, etc. They try to create a safer and trusted environment… in theory! The initial question was: “Is there a myth of the elite or private forums?”. They compared two forums, one public with ~60K users and one private with ~185K users, and focused on malware trends. First finding: there is not a big difference in malware types (the goal remains the same: access the compromised computer and exfiltrate data). The type of access is a RAT or a bot but, on the private forum, more RATs were offered. About exfiltration, the public forum offered more “stealers”. They also looked at prices and the reputation of vendors. What are the key differences? Not many:

  • On the private forum, reputation and the creation of good profiles are key
  • On the private forum, members are asked not to attack Russia
  • Both provide a high sense of impunity.

Interesting talk, not technical but it provided a good overview of the “underground” market!

After lunch, Leon Böck presented “Insights and Experiences from Monitoring Multiple P2P Botnets“. Starting from the parable of the blind men and the elephant, Leon talked about the similarities with botnets: what are the targeted devices? The behavior? The affected regions? First, effort is required to reverse engineer the malware and understand the P2P protocol used to communicate with the C2 infrastructure. Leon explained the approach used in this research, based on crawlers (getting data from an external point of view) and sensors (which are part of the P2P network). The goal was to create a “BMS” or Botnet Monitoring System. After a detailed review of the infrastructure in place, Leon completed the talk with some statistics he gathered, like the presence of a diurnal pattern in many botnets.

The next talk was tagged as TLP:Green: “TA410: APT10’s distant cousin” presented by Alexandre Côté Cyr and Matthieu Faou. They mentioned that a blog post will be released by Eset soon…

Then, “Operation GamblingPuppet: Analysis of a multivector and multiplatform campaign targeting online gambling customers” was presented by Jaromir Horejsi and Daniel Lunghi. This research started with a sample of Xnote connected to an Operation DRBControl domain name. The infection vector was a fake chat application called “Mimi” (“secret” in Chinese). It targeted Windows and macOS computers. Built on ElectronJS, the malicious payload was delivered via extra code added at the end of the file electron-main.js. Another technique was a persistent XSS combined with a fake Flash installer. Yes, even though Flash has been dead for a while, this technique still works. A third infection vector discovered was a fake .dmg to install BitGet (an application from a Singapore cryptocurrency company). Then, some payloads were reviewed (for example, oRAT on macOS or AsyncRAT on Windows). Finally, the infrastructure was reviewed: 50 C2 servers and 12 different RAT families(!). Based on “rude” strings and comments, it was clear that the actors were Chinese speakers.

After the afternoon coffee break, the next presentation was “Fingerprinting Bot Shops: Venues, Stealers, Sellers” by Bryan Oliver and Austin Turecek. The idea behind this research was to understand how bot shops and their ecosystems are used today. Some facts:

  • Infostealers are rented, and the exfiltrated data ends up in bot shops
  • Bots are used to obtain initial access and/or are monetised
  • The “email:pass” combo is no longer enough for initial access brokers! Bot shops, with infostealers and their data (e.g. cookies, MFA details, …), are the way to go
  • They covered 38 account shops, 5 bot shops over the last 3y
  • Main components are bot shops, malware developers, cloud providers
  • Some well-known bot shops: Genesis, Russian Market, Amigos, 2Easy
  • Some cloud providers: Ross Cloud, Cash Cloud, Dark Logs Cloud, …Cloud Altery

What about monetization?

  • Logs may be sold by category (and resold!)
  • Logs can be used by multiple actors
  • Persistent infections open up new opportunities with status updates
  • Cashing out? Cryptocurrency of course!
  • Slack, Citrix, SSO, Okta access are sold and searchable (used for lateral movement)
  • Some bots on Russian Market are identical across shops: rent it on one and it disappears from the second

Techniques are also constantly updated: Browsers, proxies, new emulations, new data points.

The next one was “How to Eavesdrop on Winnti in a Live Environment Using Virtual Machine Introspection (VMI)” presented by Philipp Barthel and Sebastian Eydam. The goal was to explain a typical usage of a tool called Tycho to perform malware analysis from the hypervisor’s point of view. The selected malware was Winnti, a classic APT RAT that was used to attack many DAX corporations. First, they explained what “VMI”, or Virtual Machine Introspection, is. It’s a kind of IDS used to collect data and detect suspicious behavior. In the IDS world, we have HIDS (host) and NIDS (network); VMI is the best of both. It connects to the hypervisor (so the malware does not detect the presence of an agent in the sandbox). Tycho can be used to interact with the OS and collect data. For example:

calc = tycho.open_process("calc.exe")           # attach to the calc.exe process in the guest
calc.pause()                                    # freeze the process
calc.read_linear(0, 1024)                       # read 1024 bytes starting at linear address 0
add_syscall_whitelist(syscalls.NtCreateFile)    # whitelist the NtCreateFile syscall for interception

It seems to be a very powerful tool but… for me, the presentation was too focused on a commercial tool. Indeed, Tycho is not free, but it deserves to be tested. Based on Tycho, they reviewed how Winnti encrypts data, how it communicates with its C2 server, etc.

The last presentation was “Evolution of the Sysrv mining botnet” with György Luptak and Dorka Palotay. The idea was to explain how to reverse Golang binaries using Ghidra. Why? More and more malware samples are written in Go (IoT, Linux) and Sysrv is a good example. Some of the issues with Go, from a reverse engineering point of view:

  • Huge binaries (statically linked)
  • Unusual strings handling
  • No symbol names (stripped binaries)

Compared to other tools, Ghidra does not have all these capabilities by default. After a quick presentation of Sysrv (found in 2020, wormable, cryptominer), they explained how to use Ghidra to analyze Go code. Go programs are organized in packages; a package is a collection of source files in the same directory that are compiled together. Tools already exist to deobfuscate Go programs, like obfuscate. But what about Ghidra? When you load Sysrv, Ghidra will find 3800+ functions! Where to start? Which one(s) is/are interesting? In the next part of the presentation, Dorka explained in detail how Go handles strings: in Go, they don’t end with a NULL character. After some examples, some Ghidra scripts were introduced to automatically extract strings (and function names) from Go programs. It was, for me, the best talk of this first day! All scripts are available here.

Ready for day 2!

The post Botconf Day 1 Wrap-Up appeared first on /dev/random.