Planet Grep

Planet'ing Belgian FLOSS people

Planet Grep is maintained by Wouter Verhelst. All times are in UTC.

February 26, 2024

A graph showing the state of the Digital Experience Platforms in 2024. Vendors are plotted on a grid based on their ability to execute and completeness of vision. Acquia is placed in the 'Leaders' quadrant, indicating strong performance in both vision and execution.

For the fifth year in a row, Acquia has been named a Leader in the Gartner Magic Quadrant for Digital Experience Platforms (DXP).

Acquia received this recognition from Gartner based on both the completeness of product vision and ability to execute.

Central to our vision and execution is a deep commitment to openness. Leveraging Drupal, Mautic and open APIs, we've built the most open DXP, empowering customers and partners to tailor our platform to their needs.

Our emphasis on openness extends to ensuring our solutions are accessible and inclusive, making them available to everyone. We also prioritize building trust through data security and compliance, integral to our philosophy of openness.

We're proud to be included in this report and thank our customers and partners for their support and collaboration.

Mandatory disclaimer from Gartner

Gartner, Magic Quadrant for Digital Experience Platforms, Irina Guseva, Jim Murphy, Mike Lowndes, John Field - February 21, 2024.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Acquia.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Gartner is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.

February 25, 2024

Through Euroclear, we are holding some 260 billion euros in frozen Russian assets.

If we were to use those, it would mean that for the next hundred years and more our country will internationally be seen as a thief.

At the same time, we spend only about one percent of our GDP on defence. That is not good. But well. Perhaps, if the US so badly wants us to use those Russian assets for its political amusements, it could count these 'favours' as Belgian defence spending?

Moreover, there ought to be a number of guarantees against the reprisals that Russia will direct at Belgium when Euroclear reinvests the assets (which, to Russia, means: steals them).

Where are those guarantees? What are they? How much money are we talking about? Is it extremely much? Because it has to be. Is it disgustingly, extremely much? Because it has to be.

Those guarantees, do they cover more than a hundred years? Because they have to.

It is all very easy to want this and that. But what is offered in return?

Martti Malmi, an early contributor to the Bitcoin project, recently shared a fascinating piece of internet history: an archive of private emails between himself and Satoshi Nakamoto, Bitcoin's mysterious founder.

The identity of Satoshi Nakamoto remains one of the biggest mysteries in the technology world. Despite extensive investigations, speculative reports, and numerous claims over the years, the true identity of Bitcoin's creator(s) is still unknown.

Martti Malmi released these private conversations in reaction to a court case focused on the true identity of Satoshi Nakamoto and the legal entitlements to the Bitcoin brand and technology.

The emails provide some interesting details into Bitcoin's early days, and might also provide some new clues about Satoshi's identity.

Satoshi and Martti worked together on a variety of different things, including the relaunch of the Bitcoin website. Their goal was to broaden public understanding and awareness of Bitcoin.

And to my surprise, the emails reveal they chose Drupal as their preferred CMS! (Thanks to Jeremy Andrews for making me aware.)

Email from Satoshi Nakamoto to Martti Malmi, dated December 2009, about Bitcoin's new Drupal site going online and being an improvement over the old bitcoin.org page.

The emails detail Satoshi's hands-on involvement, from installing Drupal themes, to configuring Drupal's .htaccess file, to exploring Drupal's multilingual capabilities.

Email from Satoshi Nakamoto to Martti Malmi, dated November 2009, discussing Drupal themes installation and comparing Drupal and Joomla!.

At some point in the conversation, Satoshi expressed reservations about Drupal's forum module.

Email from Satoshi Nakamoto to Martti Malmi, dated November 2009, about preferring Drupal as a CMS over a Wiki, and expressing that Drupal's forum capabilities are not ideal but better than Wikis.

For what it is worth, this proves that I'm not Satoshi Nakamoto. Had I been, I'd have picked Drupal right away, and I would never have questioned Drupal's forum module.

Jokes aside, as Drupal's Founder and Project Lead, learning about Satoshi's use of Drupal is a nice addition to Drupal's rich history. Almost every day, I'm inspired by the unexpected impact Drupal has.

I started to migrate all the services that I use on my internal network to my Raspberry Pi 4 cluster. I migrated my FreeBSD jails to BastilleBSD on a virtual machine running on a Raspberry Pi. See my blog post on how to migrate from ezjail to BastilleBSD: https://stafwag.github.io/blog/blog/2023/09/10/migrate-from-ezjail-to-bastille-part1-introduction-to-bastillebsd/

tianocore

Running FreeBSD as a virtual machine with UEFI on ARM64 has come to the point that it just works. With older FreeBSD versions I had to use QEMU with u-boot to get FreeBSD up and running on the Raspberry Pi as a virtual machine: https://stafwag.github.io/blog/blog/2021/03/14/howto_run_freebsd_as_vm_on_pi/.

But with the latest versions of FreeBSD (I'm not sure when it started to work, but it works on FreeBSD 14), you can run FreeBSD as a virtual machine on ARM64 with UEFI, just as you can on x86 on GNU/Linux with KVM.

UEFI on KVM is in general provided by the open-source tianocore project.

I didn’t find much information on how to run OpenBSD with UEFI on x86 or ARM64.

OpenBSD 7.4

So I decided to write a blog post about it, in the hope that this information might be useful to somebody else. First I tried to download the OpenBSD 7.4 ISO image and boot it as a virtual machine on KVM (x86). But the ISO image failed to boot on a virtual machine with UEFI enabled. It looks like the ISO image only supports a legacy BIOS.

ARM64 doesn't support a "legacy BIOS". The ARM64 download page for OpenBSD 7.4 doesn't even have an ISO image, but there is an install-<version>.img image available. So I tried to boot this image on one of my Raspberry Pi systems, and it worked. I had more trouble getting NetBSD working as a virtual machine on the Raspberry Pi, but that might be a topic for another blog post :-)

You’ll find my journey with my installation instructions below.

Download

Download the installation image

Download the latest OpenBSD installation ARM64 image from: https://www.openbsd.org/faq/faq4.html#Download

The complete list of the mirrors is available at https://www.openbsd.org/ftp.html

Download the image.

[staf@staf-pi002 openbsd]$ wget https://cdn.openbsd.org/pub/OpenBSD/7.4/arm64/install74.img
--2024-02-13 19:04:52--  https://cdn.openbsd.org/pub/OpenBSD/7.4/arm64/install74.img
Connecting to xxx.xxx.xxx.xxx:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 528482304 (504M) [application/octet-stream]
Saving to: 'install74.img'

install74.img       100%[===================>] 504.00M  3.70MB/s    in 79s     

2024-02-13 19:06:12 (6.34 MB/s) - 'install74.img' saved [528482304/528482304]

[staf@staf-pi002 openbsd]$ 

Download the checksum and the signed checksum.


[staf@staf-pi002 openbsd]$ wget https://cdn.openbsd.org/pub/OpenBSD/7.4/arm64/SHA256
--2024-02-13 19:07:00--  https://cdn.openbsd.org/pub/OpenBSD/7.4/arm64/SHA256
Connecting to xxx.xxx.xxx.xxx:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1392 (1.4K) [text/plain]
Saving to: 'SHA256'

SHA256                  100%[=============================>]   1.36K  --.-KB/s    in 0s      

2024-02-13 19:07:01 (8.09 MB/s) - 'SHA256' saved [1392/1392]

[staf@staf-pi002 openbsd]$ 
[staf@staf-pi002 openbsd]$ wget https://cdn.openbsd.org/pub/OpenBSD/7.4/arm64/SHA256.sig
--2024-02-13 19:08:01--  https://cdn.openbsd.org/pub/OpenBSD/7.4/arm64/SHA256.sig
Connecting to xxx.xxx.xxx.xxx:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1544 (1.5K) [text/plain]
Saving to: 'SHA256.sig'

SHA256.sig              100%[=============================>]   1.51K  --.-KB/s    in 0s      

2024-02-13 19:08:02 (3.91 MB/s) - 'SHA256.sig' saved [1544/1544]

[staf@staf-pi002 openbsd]$ 

Verify

OpenBSD uses signify to validate the cryptographic signatures. signify is also available for GNU/Linux (at least on Debian GNU/Linux and Arch Linux).

More details on how to verify the signature with signify are available at: https://www.openbsd.org/74.html

This blog post was also useful: https://www.msiism.org/blog/2019/10/20/authentic_pufferfish_for_penguins.html

Install OpenBSD signify

Download the signify public key from: https://www.openbsd.org/74.html

[staf@staf-pi002 openbsd]$ wget https://ftp.openbsd.org/pub/OpenBSD/7.4/openbsd-74-base.pub
--2024-02-13 19:14:25--  https://ftp.openbsd.org/pub/OpenBSD/7.4/openbsd-74-base.pub
Connecting to xxx.xxx.xxx.xxx:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 99 [text/plain]
Saving to: 'openbsd-74-base.pub'

openbsd-74-base.pub     100%[=============================>]      99   397 B/s    in 0.2s    

2024-02-13 19:14:26 (397 B/s) - 'openbsd-74-base.pub' saved [99/99]

[staf@staf-pi002 openbsd]$

I run Debian GNU/Linux on my Raspberry Pis; let's see which signify packages are available.

[staf@staf-pi002 openbsd]$ sudo apt search signify
sudo: unable to resolve host staf-pi002: Name or service not known
[sudo] password for staf: 
Sorting... Done
Full Text Search... Done
chkrootkit/stable 0.57-2+b1 arm64
  rootkit detector

elpa-diminish/stable 0.45-4 all
  hiding or abbreviation of the mode line displays of minor-modes

fcitx-sayura/stable 0.1.2-2 arm64
  Fcitx wrapper for Sayura IM engine

fcitx5-sayura/stable 5.0.8-1 arm64
  Fcitx5 wrapper for Sayura IM engine

signify/stable 1.14-7 all
  Automatic, semi-random ".signature" rotator/generator

signify-openbsd/stable 31-3 arm64
  Lightweight cryptographic signing and verifying tool

signify-openbsd-keys/stable 2022.2 all
  Public keys for use with signify-openbsd

[staf@staf-pi002 openbsd]$

There are two OpenBSD signify packages available on Debian 12 (bookworm):

  • signify-openbsd: the OpenBSD signify tool.
  • signify-openbsd-keys: this package contains the OpenBSD release public keys, installed in /usr/share/signify-openbsd-keys/. Unfortunately, the OpenBSD 7.4 release key isn't (yet) included in Debian 12 (bookworm).

[staf@staf-pi002 openbsd]$ sudo apt install signify-openbsd signify-openbsd-keys
sudo: unable to resolve host staf-pi002: Name or service not known
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  signify-openbsd signify-openbsd-keys
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 70.4 kB of archives.
After this operation, 307 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bookworm/main arm64 signify-openbsd arm64 31-3 [62.3 kB]
Get:2 http://deb.debian.org/debian bookworm/main arm64 signify-openbsd-keys all 2022.2 [8020 B]
Fetched 70.4 kB in 0s (404 kB/s)          
Selecting previously unselected package signify-openbsd.
(Reading database ... 94575 files and directories currently installed.)
Preparing to unpack .../signify-openbsd_31-3_arm64.deb ...
Unpacking signify-openbsd (31-3) ...
Selecting previously unselected package signify-openbsd-keys.
Preparing to unpack .../signify-openbsd-keys_2022.2_all.deb ...
Unpacking signify-openbsd-keys (2022.2) ...
Setting up signify-openbsd-keys (2022.2) ...
Setting up signify-openbsd (31-3) ...
[staf@staf-pi002 openbsd]$ 

Verify the checksum

Verify the checksum.

[staf@staf-pi002 openbsd]$ sha256sum install74.img 
09e4d0fe6d3f49f2c4c99b6493142bb808253fa8a8615ae1ca8e5f0759cfebd8  install74.img
[staf@staf-pi002 openbsd]$ 
[staf@staf-pi002 openbsd]$ grep 09e4d0fe6d3f49f2c4c99b6493142bb808253fa8a8615ae1ca8e5f0759cfebd8 SHA256
SHA256 (install74.img) = 09e4d0fe6d3f49f2c4c99b6493142bb808253fa8a8615ae1ca8e5f0759cfebd8
[staf@staf-pi002 openbsd]$ 

Verify with signify

Execute the signify command to verify the checksum. See the OpenBSD signify manpage for more information.

You’ll find a brief list of the arguments that are used to verify the authenticity of the image.

  • -C: Will verify the Checksum.
  • -p <path>: The path to the Public key.
  • -x <path>: The path to the signature file.

Verify the image with signify.

[staf@staf-pi002 openbsd]$ signify-openbsd -C -p openbsd-74-base.pub -x SHA256.sig install74.img
Signature Verified
install74.img: OK
[staf@staf-pi002 openbsd]$

Secure boot

The Debian UEFI firmware package for libvirt, ovmf, is based on https://github.com/tianocore/tianocore.github.io/wiki/OVMF.

Debian Bookworm comes with the following UEFI firmware images:

  • /usr/share/AAVMF/AAVMF_CODE.ms.fd: with secure boot enabled.
  • /usr/share/AAVMF/AAVMF_CODE.fd: without secure boot.

The full description is available at /usr/share/doc/ovmf/README.Debian on a Debian system when the ovmf package is installed.

To install OpenBSD we need to disable secure boot.

Test boot

I first started a test boot.

Log on to the Raspberry Pi.

[staf@vicky ~]$ ssh -X -CCC staf-pi002 
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Linux staf-pi002 6.1.0-17-arm64 #1 SMP Debian 6.1.69-1 (2023-12-30) aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Feb 14 06:08:45 2024 from xxx.xxx.xxx.xxx
[staf@staf-pi002 ~]$ 


virt-manager

Start virt-manager and click on the [ Create a new VM ] icon.

new vm

This will bring up the new VM window. Select ( ) Import existing disk image; you can review the architecture options by expanding \/ Architecture options. The defaults are fine. Click on [ Forward ].

import vm

This will open the “import vm” window. Click on [ Browse ] to select the OpenBSD installation image or just copy/paste the path.

At the bottom of the screen, you'll see Choose the operating system you are installing. Start typing openbsd and select [ X ] Include end-of-life operating systems. Debian 12 (bookworm) doesn't include support for OpenBSD 7.4 (yet), so we need to set it to "OpenBSD 7.0". Click on [ Forward ].

select custom

In the next window, keep the default Memory and CPU settings, as we're just verifying that we can boot from the installation image.

Debian uses "secure boot" by default; we need to disable it. Select [ X ] Customize configuration before install, as this allows us to set the UEFI firmware image.

begin install

Set the Firmware to: /usr/share/AAVMF/AAVMF_CODE.fd to disable secure boot and click on [ Begin Installation ].

begin install

Let’s check if OpenBSD can boot. Great, it works!




Installation with virt-install

I prefer to use the command line to install as this allows me to make the installation reproducible and automated.

Create a ZFS dataset

I use ZFS on my Raspberry Pis; this makes it easier to create snapshots and so on when you're testing software.

root@staf-pi002:/var/lib/libvirt/images# zfs create staf-pi002_pool/root/var/lib/libvirt/images/openbsd-gitlabrunner001
root@staf-pi002:/var/lib/libvirt/images# 
root@staf-pi002:/var/lib/libvirt/images/openbsd-gitlabrunner001# pwd
/var/lib/libvirt/images/openbsd-gitlabrunner001
root@staf-pi002:/var/lib/libvirt/images/openbsd-gitlabrunner001# 

Get the correct os-variant

To get the operating system settings you can execute the command virt-install --osinfo list

root@staf-pi002:/var/lib/libvirt/images/openbsd-gitlabrunner001# virt-install --osinfo list | grep -i openbsd7
openbsd7.0
root@staf-pi002:/var/lib/libvirt/images/openbsd-gitlabrunner001# 

We’ll use openbsd7.0 as the operating system variant.

Create QEMU image

Create a destination disk image.

root@staf-pi002:/var/lib/libvirt/images/openbsd-gitlabrunner001# qemu-img create -f qcow2 openbsd-gitlabrunner001.qcow2 50G
Formatting 'openbsd-gitlabrunner001.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=53687091200 lazy_refcounts=off refcount_bits=16
root@staf-pi002:/var/lib/libvirt/images/openbsd-gitlabrunner001# 

Run virt-install

Run virt-install to import the virtual machine.

#!/bin/bash

virt-install --name openbsd-gitlabrunner001  \
 --noacpi \
 --boot loader=/usr/share/AAVMF/AAVMF_CODE.fd \
 --os-variant openbsd7.0 \
 --ram 2048 \
 --import \
 --disk /home/staf/Downloads/isos/openbsd/install74.img  \
 --disk /var/lib/libvirt/images/openbsd-gitlabrunner001/openbsd-gitlabrunner001.qcow2

If everything goes well, the virtual machine boots.

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)
disks: sd0*
>> OpenBSD/arm64 BOOTAA64 1.18
boot> 
cannot open sd0a:/etc/random.seed: No such file or directory
booting sd0a:/bsd: 2861736+1091248+12711584+634544 [233295+91+666048+260913]=0x13d5cf8
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2023 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.4 (RAMDISK) #2131: Sun Oct  8 13:35:40 MDT 2023
    deraadt@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/RAMDISK
real mem  = 2138013696 (2038MB)
avail mem = 2034593792 (1940MB)
random: good seed from bootblocks
mainbus0 at root: linux,dummy-virt
psci0 at mainbus0: PSCI 1.1, SMCCC 1.1
efi0 at mainbus0: UEFI 2.7
efi0: EDK II rev 0x10000
smbios0 at efi0: SMBIOS 3.0.0
smbios0:
sd1 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, >
sd1: 51200MB, 512 bytes/sector, 104857600 sectors
virtio35: msix per-VQ
ppb5 at pci0 dev 1 function 5 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci6 at ppb5 bus 6
ppb6 at pci0 dev 1 function 6 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci7 at ppb6 bus 7
ppb7 at pci0 dev 1 function 7 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci8 at ppb7 bus 8
ppb8 at pci0 dev 2 function 0 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci9 at ppb8 bus 9
ppb9 at pci0 dev 2 function 1 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci10 at ppb9 bus 10
ppb10 at pci0 dev 2 function 2 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci11 at ppb10 bus 11
ppb11 at pci0 dev 2 function 3 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci12 at ppb11 bus 12
ppb12 at pci0 dev 2 function 4 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci13 at ppb12 bus 13
ppb13 at pci0 dev 2 function 5 vendor "Red Hat", unknown product 0x000c rev 0x00: irq
pci14 at ppb13 bus 14
pluart0 at mainbus0: rev 1, 16 byte fifo
pluart0: console
"pmu" at mainbus0 not configured
agtimer0 at mainbus0: 54000 kHz
"apb-pclk" at mainbus0 not configured
softraid0 at root
scsibus2 at softraid0: 256 targets
root on rd0a swap on rd0b dump on rd0b
WARNING: CHECK AND RESET THE DATE!
erase ^?, werase ^W, kill ^U, intr ^C, status ^T

Welcome to the OpenBSD/arm64 7.4 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell? 

Continue with the OpenBSD installation as usual. Make sure that you select the second disk during the installation process.

To fully automate the installation, we need a system that executes the post-configuration at the first boot. On GNU/Linux this is normally done by cloud-init; there are solutions to get cloud-init working on the BSDs, but I haven't looked into this (yet).

Have fun!

Links

February 24, 2024

If you use MySQL Shell for Visual Studio Code, using a bastion host is the easiest method to connect to a MySQL HeatWave DB Instance on OCI.

If you already have a connection set up using a bastion host, you may experience the same problem as me: MySQL Shell complains that an invalid fingerprint was detected:

This error has nothing to do with the fingerprint of your OCI user key. The problem is related to the host key of your bastion host, as you can see in the output window:

This happens, for example, if you have changed your bastion host.

To resolve the problem, remove the current SSH host key for the bastion host stored in your known_hosts:

$ ssh-keygen -R "host.bastion.us-ashburn-1.oci.oraclecloud.com"

Use the name of your own bastion host, of course.

Once done, the problem is fixed and you can connect back to your MySQL HeatWave DB Instance using MySQL Shell for Visual Studio Code.

February 23, 2024

Sorry, but shamefully driving a business into the ground is unlawful. Period. That may, no, must be met with an appropriate punishment.

Kurt Meers – opinion piece

I also wonder why journalists who invite Vansteenbrugge do not challenge his completely wrong definitions:

Slander is maliciously accusing someone of something in a way that exposes that person to public contempt, without one succeeding in providing the legal proof of it, even though the law allows that proof to be provided. This can happen, for example, when you are publicly accused of paedophilia while there is absolutely no evidence for it.

Defamation is maliciously accusing someone of something so that that person's honour is damaged or he or she is exposed to public contempt, without it being legally possible or permissible to provide the proof. For example, someone is accused of theft, but the legal proof can no longer be provided because the case is time-barred.

Advo-Recht

In other words, slander and defamation have absolutely nothing to do with what Vansteenbrugge claims about them: something about your opinion not being allowed to endanger democracy, which has nothing to do with it, and which, by the way, is allowed as an expression of opinion. Sharia4Belgium, for example, was allowed to express its opinion. They were not punished for that. They were punished for recruiting terrorists.

I have rarely seen such a bad lawyer. He is a populist who performs bad theatre and advises the most foolish strategy for his client (as a result of which the client now faces a prison sentence instead of community service). But well. That was clearly part of the theatre, so as to be able to play on public opinion with: what a heavy sentence! His own choice. His own fault.

Let us compare the definition of slander with what happened here: pillorying the owners of a restaurant who have absolutely nothing to do with the case. That is literally and exactly the same as exposing someone to public contempt without succeeding in providing the legal proof.

That does not fall under the heading of freedom of expression. It falls under the heading of crime.

Damages are also being compared with punishment, which is completely wrong. Damages are not a punishment. They are compensation for the harm you have done to others. That is not a punishment. Those damages can, by the way, still increase (so exceed the 20,000 euros), since an expert has been appointed who will now assess the actual damage.

Frankly, I think that restaurant in Antwerp has suffered far more than 20,000 euros in damage. All of that must be compensated in full, however high that amount turns out to be.

Comparing those damages with the damages that had to be paid to the parents of a student who died during a student initiation: a) those cases have absolutely nothing to do with each other, and b) it is ghoulishly exploiting a death. There is little I find more despicable. Except politically exploiting a death. That is even worse, and moreover a travesty of the separation of powers.

What you could argue for in the Federal Parliament is to increase the damages for involuntary manslaughter. But a properly conducted democratic debate should first take place about that. Which is not the populist nonsense we have come to expect from Sammy Mahdi.

The law presumes the proof for this to be provided: in recent years, as a politician, he has not made a single attempt to start that democratic debate.

So he is a populist ghoul.

Love is like the bird of Twitter
You're crazy about it, only for 48 hours

Stromae – Carmen

Q.E.D.

PS: Be sure to also read the open letter from Sanda Dia's mother. She too is completely fed up with her son's death being exploited.

February 22, 2024

For close to a decade, I dedicated myself to building out Grafana Labs and subsequently took some personal time to recharge. I came out of these experiences as a different person, and am ready to share a few things I learned.

First step is bringing my abandoned blog back to life:

  • Rewrote the about/bio and projects pages
  • Switched from lighttpd to Caddy (automagic https! wow!)
  • Enabled Google Cloud CDN
  • Switched from Google Analytics to the open source PostHog
  • Tried CloudFlare Turnstile for comment spam prevention, but it’s quite complicated and has un-googleable error codes, so I switched to something simpler
  • Upgraded hugo from 0.15 to 0.122 with very few breaking changes (!) and found a nice new minimalist theme
  • I took more care than probably necessary to avoid accidental republishing of articles in RSS feeds, since that’s such an easy mistake to make with RSS GUIDs.

I don’t know who still reads this blog, but if you do, stay tuned for more!

February 20, 2024

At the end of an episode of Terzake, I noticed that our dearly beloved ADIV is looking for people.

I may hope they read planet.grep, because yes, part of the kind of people the ADIV needs reads our blogs. Fortunately so.

Now, back in 2014 I already described what Miguel De Bruycker needed at the time for his Centre for Cybersecurity.

Mr Michel Van Strythem: you need exactly the same thing Miguel needed back then. Right? And you will find those people in exactly the same way Miguel found them: protect us against eavesdropping, and only eavesdrop yourself within a legal framework.

I wrote in 2014:

As long as the government follows its own moral compass, these people are willing to put their skills at the government's service.

But the law stands above the military. It must be followed, also by the intelligence services. It is our only guarantee of a free society: I do not want to work on, or to have helped build, a world in which citizens lose freedoms such as privacy through technology.

In between, my piece contained a list of a few small assignments that are a little bit difficult.

Such as hiding a Linux kernel module, which is what a standard Linux virus would do. Exfiltrating data over a network device without it being too noticeable; that is something the ADIV will want to do. Receiving commands; that is basically what Back Orifice already did.

I also wrote that they must know how a buffer overflow bug works, because this is a common programming mistake that leads to security problems. In other words, they should at least know how they could create such a security problem themselves, if only to be able to prepare their attack techniques.

I also expected them to be able to set up a standard little socket server. Of course. That is the bare minimum.

The average programmer probably cannot do all of that. But then again, we do not need the average programmer.

If the ADIV abides by the law and always works within the context of the law, then the ADIV will find the best people of our country. Just as the fire brigade manages to mobilise the best people in the country, the ADIV will be able to mobilise the best programmers.

But that is because it sticks to its own moral compass.

If it does not, then it will not.

Kind regards,

Philip. Programmer.

February 15, 2024

Recently during the Swedish MySQL User Group (SMUG), I presented a session dedicated to MySQL InnoDB Primary Keys.

I forgot to mention a detail that many people are not aware of, but that Jeremy Cole has pointed out.

The Primary Key is always included in secondary indexes as the right-most column(s)

When we define a secondary index, the secondary index includes the Primary Key as the right-most column(s) of the index. It's silently added, meaning that it's not visible, but it's used to point back to the record in the clustered index.

This is an example with a table having a Primary Key composed of multiple columns:

CREATE TABLE `t1` (
`a` int NOT NULL,
`b` int NOT NULL, 
`c` int NOT NULL,
`d` int NOT NULL,
`e` int NOT NULL,
`f` varchar(10) DEFAULT 'aaa',
`inserted` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(`a`,`b`,`c`) ) ENGINE=InnoDB; 

And this is the table’s content (only 2 records):

SELECT * FROM t1;
+---+---+---+----+----+-----+---------------------+
| a | b | c | d  | e  | f   | inserted            |
+---+---+---+----+----+-----+---------------------+
| 1 | 2 | 3 |  4 |  5 | abc | 2024-02-11 17:37:16 |
| 7 | 8 | 9 | 10 | 11 | def | 2024-02-11 17:37:26 |
+---+---+---+----+----+-----+---------------------+

Now let’s create a secondary key for the column f:

ALTER TABLE t1 ADD INDEX f_idx(f);

This key will then include the Primary Key as the right-most column(s) on the secondary index:

The orange-filled entries are the hidden ones.

Let’s verify this on the InnoDB page for that index:

And indeed, we can see that the Primary Key columns (in red) are included in each entry of the secondary index (in purple).

But not always!

When we have the Primary Key, or part of it, included in a secondary index, only the missing columns of the Primary Key will be added as right-most, hidden entries to the secondary index.

Let’s create a secondary index where the column b will be missing:

ALTER TABLE t1 ADD INDEX sec_idx (`d`,`c`,`e`,`a`);

The column b will indeed be added as the right-most hidden column of the index. Let's verify this:

We can see above that, indeed, the value of column b is added. The same goes for the second record:

If we check in the InnoDB source code, this is also commented:

But what will happen if we just use a prefix of a Primary Key column in the secondary index?

Let’s try:

CREATE TABLE `t1` (
  `a` varchar(10) NOT NULL DEFAULT 'aaaaaaaaaa',
  `b` varchar(10) NOT NULL DEFAULT 'bbbbbbbbbb',
  `c` int NOT NULL DEFAULT '1',
  `f` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`a`,`b`,`c`),
  KEY `sec_idx` (`c`,`f`,`a`(2))
) ENGINE=InnoDB 

SELECT * FROM t1;
+------------+------------+---+-----+
| a          | b          | c | f   |
+------------+------------+---+-----+
| aaaaaaaaaa | bbbbbbbbbb | 1 | abc |
| ccccccccc  | dddddddddd | 2 | def |
+------------+------------+---+-----+

We can see that only 2 characters of the column a are used in the secondary index.

If we check the InnoDB page, we can see that, in fact, the full column is also added as the right-most hidden part of the secondary index:

So InnoDB needs to have the full Primary Key, visible or hidden, in the secondary index.

This is something that is not widely known, but let's hope this blog post clarifies it.

Enjoy MySQL & InnoDB… and of course, don't forget to use Primary Keys!

February 14, 2024

So I like and use Spectra (aka “ultimate addons for gutenberg”) on some of my WordPress sites. If you use Spectra as well, you might want to know that Spectra 2.12 has a significant performance regression, sucking in almost all of . Here’s hoping this was not intentional and that it’ll get fixed soon. More info in https://wordpress.org/support/topic/2-12-significant-performance-regression/

Source

Our MySQL Belgian Days were a complete success. With 120 participants, the room was packed throughout the two days, from the first to the last session. A record for this event!

The audience was made up of customers (some from very far away), users, contributors, community members and, of course, old friends.

The atmosphere was relaxed and everyone seemed happy to be there and to be talking about MySQL.

This is one of the rare opportunities to meet the MySQL developers who also operate our MySQL service on OCI and, for the first time, to meet the developers who code the unique features relating to HeatWave. Their presentations were technical, explaining the problems they encountered and which algorithms they used to solve them; this was very interesting. The feedback from the audience was very positive.

The first day was dedicated to the MySQL Community. Several speakers presented their solutions, ranging from caching and monitoring to distributed systems.

Some other speakers explained their architecture and how they are using hybrid systems, on premise and cloud.

The second day was dedicated to the MySQL Team. The engineers presented content related to their current projects. Geir Hoydalsvik, the Senior Director of Software Development, presented the new MySQL release model, emphasizing the imminent arrival of the very first Long-Term Support (LTS) version. The need for this session was underlined by the apparent lack of clarity of some participants on the subject.

It was also very clear that many people are using MySQL Shell and that they love it! Many users are in fact already deploying their architectures using the Admin API, which is great news!

The MySQL Shell Dump & Load utilities are also very popular tools, and everyone praised their effectiveness.

The developers present in the room were also interested in Oystein's presentation on using GraalVM in MySQL to write Stored Procedures in JavaScript.

Before the end of this two-day conference, we had the MySQL Rockstar ceremony. All the winners were unanimous choices, and our first winner of the MySQL Rockstar Legend award received the loudest applause: well done, Datacharmer!

The conference concluded with the traditional MySQL Community Dinner, expertly organised by Matthias. Kudos!

We now have one year to prepare for our next MySQL Belgian Days, when we will celebrate MySQL’s 30th anniversary!

Here are the slides of the sessions:

Day 1 – MySQL Ecosystem Day

Day 2 – Oracle MySQL Day

MySQL Rockstar 2023 Awards

February 06, 2024

FOSDEM is over, but our video infrastructure is still working hard! This year, there were 855 events in rooms where the FOSDEM video team had equipment to stream, record, and publish everything, for a total of 365 scheduled hours of content. Postprocessing that much video is a massive task that takes quite a bit of effort; for this reason, since the 2017 edition, we have crowdsourced most of the manual work that needs to happen for this postprocessing to the devroom organizers and the speakers, as explained in this talk at FOSDEM 2019. As of this writing, out of those…

February 05, 2024

A male business person working on a laptop in a futuristic, cryptopunk setting.

For nearly three decades, third-party cookies have been a major privacy concern on the web. They allow organizations, such as advertisers, to track users' browsing activities across multiple websites, often without explicit consent.

Unlike third-party cookies, first-party cookies are restricted to the website you are on. They are often used to improve the user experience, such as keeping you logged in, remembering what is in your shopping cart, and more.

Of course, first-party cookies can also be used to track your activity, like with Google Analytics, but they can't be used to follow you beyond that site. While both types of cookies can be used to track users, third-party cookies are much more invasive and problematic.

In 2018, I made the decision to remove all tracking tools, including Google Analytics, from my personal site. My website aspires to the privacy of printed works. The anonymity of a book holds a unique charm, and I find joy in not knowing who visits my website.

That said, I have no issue with the use of first-party cookies, provided it's consensual and used to improve the user experience. I understand their importance for many organizations, especially in marketing.

Fortunately, the era of third-party cookies is coming to a close. Browsers like Safari and Firefox have already taken steps to limit third-party tracking. They still allow certain third-party cookies to ensure websites work properly. Two examples include:

  1. E-commerce sites often rely on third-party cookies for integrating payment systems. Blocking these cookies could disrupt the payment process.
  2. Complex digital platforms, like healthcare and government websites, sometimes use cross-site authentication to link departmental websites.  Blocking these could prevent access to important services.

While Safari and Firefox have been limiting third-party cookies for some time, Google Chrome is lagging behind.  Google only began phasing out third-party cookies in early 2024 (a few weeks ago), starting with just 1% of its users. Their plan is to expand this to all users by the end of 2024.

Google's strategy can be viewed in two ways:

  1. Negatively, it could look like Google is delaying the phase-out because it profits from the advertising revenue these cookies generate.
  2. Positively, Google is cautious in removing third-party cookies to avoid disrupting websites that rely on them for non-advertising purposes.

As a regular Chrome user, I didn't want to wait until the end of 2024. If you feel the same, you can block third-party cookies in Chrome now. Just head to Settings, select Privacy and Security and activate Block third-party cookies.  Just beware, as some sites might stop working.

If you manage or develop websites, check whether they rely on third-party cookies. Use my HTTP Header Analyzer to look for SameSite=None attributes indicating third-party cookies, or test by disabling them in Chrome.
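
If you want a quick programmatic check as well, here is a rough Node.js/TypeScript sketch (my own illustration, not the HTTP Header Analyzer) that fetches a page and reports which Set-Cookie headers carry the SameSite=None attribute that cross-site cookies require:

import * as https from 'node:https';

// Fetch a URL and report which cookies could be used in a third-party context.
const checkCookies = (url: string) => {
  https.get(url, (res) => {
    const cookies = res.headers['set-cookie'] ?? [];
    for (const cookie of cookies) {
      const crossSite = /samesite=none/i.test(cookie);
      console.log(`${crossSite ? 'third-party capable' : 'first-party only'}: ${cookie.split(';')[0]}`);
    }
    res.resume(); // discard the response body
  });
};

checkCookies('https://example.com/');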

February 04, 2024

Why your app turned into spaghetti

Cover Image

"I do not like your software sir,
your architecture's poor.

Your users can't do anything,
unless you code some more.

This isn't how it used to be,
we had this figured out.

But then you had to mess it up
by moving into clouds."

There's a certain kind of programmer. Let's call him Stanley.

Stanley has been around for a while, and has his fair share of war stories. The common thread is that poorly conceived and specced solutions lead to disaster, or at least, ongoing misery. As a result, he has adopted a firm belief: it should be impossible for his program to reach an invalid state.

Stanley loves strong and static typing. He's a big fan of pattern matching, and enums, and discriminated unions, which allow correctness to be verified at compile time. He also has strong opinions on errors, which must be caught, logged and prevented. He uses only ACID-compliant databases, wants foreign keys and triggers to be enforced, and wraps everything in atomic transactions.

He hates any source of uncertainty or ambiguity, like untyped JSON or plain-text markup. His APIs will accept data only in normalized and validated form. When you use a Stanley lib, and it doesn't work, the answer will probably be: "you're using it wrong."

Stanley is most likely a back-end or systems-level developer. Because nirvana in front-end development is reached when you understand that this view of software is not just wrong, but fundamentally incompatible with the real world.

I will prove it.

Alice in wonderland

State Your Intent

Take a text editor. What happens if you press the up and down arrows?

The keyboard cursor (aka caret) moves up and down. Duh. Except it also moves left and right.

The editor state at the start has the caret on line 1 column 6. Pressing down will move it to line 2 column 6. But line 2 is too short, so the caret is forcibly moved left to column 1. Then, pressing down again will move it back to column 6.

It should be obvious that any editor that didn't remember which column you were actually on would be a nightmare to use. You know it in your bones. Yet this only works because the editor allows the caret to be placed on a position that "does not exist." What is the caret state in the middle? It is both column 1 and column 6.

Intent - State - View

To accommodate this, you need more than just a View that is a pure function of a State, as is now commonly taught. Rather, you need an Intent, which is the source of truth that you mutate... and which is then parsed and validated into a State. Only then can it be used by the View to render the caret in the right place.

To edit the intent, aka what a classic Controller does, is a bit tricky. When you press left/right, it should determine the new Intent.column based on the validated State.column +/- 1. But when you press up/down, it should keep the Intent.column you had before and instead change only Intent.line. New intent is a mixed function of both previous intent and previous state.

The general pattern is that you reuse Intent if it doesn't change, but that new computed Intent should be derived from State. Note that you should still enforce normal validation of Intent.column when editing too: you don't allow a user to go past the end of a line. Any new intent should be as valid as possible, but old intent should be preserved as is, even if non-sensical or inapplicable.
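
To make this concrete, here is a minimal TypeScript sketch of the caret example. It is my own illustration under assumed names (Caret, validate, moveVertical, moveHorizontal), not code from any particular editor:

type Caret = { line: number; column: number };

// Intent is what the user asked for; it may point past the end of a line.
// State is the validated caret that the View actually renders.
const validate = (intent: Caret, lines: string[]): Caret => {
  const line = Math.min(Math.max(intent.line, 0), lines.length - 1);
  const column = Math.min(Math.max(intent.column, 0), lines[line].length);
  return { line, column };
};

// Up/down keeps the remembered Intent.column and only changes the line.
const moveVertical = (intent: Caret, delta: number): Caret =>
  ({ ...intent, line: intent.line + delta });

// Left/right derives the new column from the validated State, +/- 1.
const moveHorizontal = (intent: Caret, state: Caret, delta: number): Caret =>
  ({ line: state.line, column: state.column + delta });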

Validate vs Mutate

Functionally, for most of the code, it really does look and feel as if the state is just State, which is valid. It's just that when you make 1 state change, the app may decide to jump into a different State than one would think. When this happens, it means some old intent first became invalid, but then became valid again due to a subsequent intent/state change.

This is how applications actually work IRL. FYI.

Dining Etiquette

Knives and Forks

I chose a text editor as an example because Stanley can't dismiss this as just frivolous UI polish for limp wristed homosexuals. It's essential that editors work like this.

The pattern is far more common than most devs realize:

  • A tree view remembers the expand/collapse state for rows that are hidden.
  • Inspector tabs remember the tab you were on, even if currently disabled or inapplicable.
  • Toggling a widget between type A/B/C should remember all the A, B and C options, even if mutually exclusive.

All of these involve storing and preserving something unknown, invalid or unused, and bringing it back into play later.

More so, if software matches your expected intent, it's a complete non-event. What looks like a "surprise hidden state transition" to a programmer is actually the exact opposite. It would be an unpleasant surprise if that extra state transition didn't occur. It would only annoy users: they already told the software what they wanted, but it keeps forgetting.

The ur-example is how nested popup menus should work: good implementations track the motion of the cursor so you can move diagonally from parent to child, without falsely losing focus:

This is an instance of the goalkeeper's curse: people rarely compliment or notice the goalkeeper if they do their job, only if they mess up. Successful applications of this principle are doomed to remain unnoticed and unstudied.

Validation is not something you do once, discarding the messy input and only preserving the normalized output. It's something you do continuously and non-destructively, preserving the mess as much as possible. It's UI etiquette: the unspoken rules that everyone expects but which are mostly undocumented folklore.

This poses a problem for most SaaS in the wild, both architectural and existential. Most APIs will only accept mutations that are valid. The goal is for the database to be a sequence of fully valid states:

Mutate

The smallest possible operation in the system is a fully consistent transaction. This flattens any prior intent.

In practice, a lot of software deviates from this ad hoc. For example, spreadsheets let you create cyclic references, which is by definition invalid. The reason it must let you do this is that fixing one side of a cyclic reference also fixes the other side. A user wants and needs to do these operations in any order. So you must allow a state transition through an invalid state:

Google sheets circular reference
Mutate through invalid state

This requires an effective Intent/State split, whether formal or informal.

Edsger Dijkstra

Because cyclic references can go several levels deep, identifying one cyclic reference may require you to spider out the entire dependency graph. This is functionally equivalent to identifying all cyclic references—dixit Dijkstra. Plus, you need to produce sensible, specific error messages. Many "clever" algorithmic tricks fail this test.
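
As an illustration only (a hypothetical cell model, not taken from any spreadsheet), a whole-model validate pass over the dependency graph might look like the sketch below: it flags every cell that is part of, or depends on, a cycle, so the UI can show specific errors instead of rejecting the edit.

type CellId = string;
type SheetIntent = Map<CellId, CellId[]>; // cell -> cells it references

const validateSheet = (intent: SheetIntent): Set<CellId> => {
  const cyclic = new Set<CellId>();
  const visiting = new Set<CellId>();
  const done = new Set<CellId>();

  const visit = (cell: CellId): boolean => {
    if (visiting.has(cell)) return true;            // back-edge: a cycle was found
    if (done.has(cell)) return cyclic.has(cell);    // already analyzed
    visiting.add(cell);
    let inCycle = false;
    for (const dep of intent.get(cell) ?? []) {
      if (visit(dep)) inCycle = true;
    }
    visiting.delete(cell);
    done.add(cell);
    if (inCycle) cyclic.add(cell);
    return inCycle;
  };

  for (const cell of intent.keys()) visit(cell);
  return cyclic;
};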

Now imagine a spreadsheet API that doesn't allow for any cyclic references ever. This still requires you to validate the entire resulting model, just to determine if 1 change is allowed. It still requires a general validate(Intent). In short, it means your POST and PUT request handlers need to potentially call all your business logic.

That seems overkill, so the usual solution is bespoke validators for every single op. If the business logic changes, there is a risk your API will now accept invalid intent. And the app was not built for that.

If you flip it around and assume intent will go out-of-bounds as a normal matter, then you never have this risk. You can write the validation in one place, and you reuse it for every change as a normal matter of data flow.

Note that this is not cowboy coding. Records and state should not get irreversibly corrupted, because you only ever use valid inputs in computations. If the system is multiplayer, distributed changes should still be well-ordered and/or convergent. But the data structures you're encoding should be, essentially, entirely liberal to your user's needs.

Git diff

Consider git. Here, a "unit of intent" is just a diff applied to a known revision ID. When something's wrong with a merge, it doesn't crash, or panic, or refuse to work. It just enters a conflict state. This state is computed by merging two incompatible intents.

It's a dirty state that can't be turned into a clean commit without human intervention. This means git must continue to work, because you need to use git to clean it up. So git is fully aware when a conflict is being resolved.

As a general rule, the cases where you actually need to forbid a mutation which satisfies all the type and access constraints are small. A good example is trying to move a folder inside itself: the file system has to remain a sensibly connected tree. Enforcing the uniqueness of names is similar, but also comes with a caution: falsehoods programmers believe about names. Adding (Copy) to a duplicate name is usually better than refusing to accept it, and most names in real life aren't unique at all. Having user-facing names actually requires creating tools and affordances for search, renaming references, resolving duplicates, and so on.

Even among front-end developers, few people actually grok this mental model of a user. It's why most React(-like) apps in the wild are spaghetti, and why most blog posts about React gripes continue to miss the bigger picture. Doing React (and UI) well requires you to unlearn old habits and actually design your types and data flow so it uses potentially invalid input as its single source of truth. That way, a one-way data flow can enforce the necessary constraints on the fly.

The way Stanley likes to encode and mutate his data is how programmers think about their own program: it should be bug-free and not crash. The mistake is to think that this should also apply to any sort of creative process that program is meant to enable. It would be like making an IDE that only allows you to save a file if the code compiles and passes all the tests.

surprised, mind blown, cursing, thinking, light bulb

Trigger vs Memo

Coding around intent is a very hard thing to teach, because it can seem overwhelming. But what's overwhelming is not doing this. It leads to codebases where every new feature makes ongoing development harder, because no part of the codebase is ever finished. You will sprinkle copies of your business logic all over the place, in the form of request validation, optimistic local updaters, and guess-based cache invalidation.

If this is your baseline experience, your estimate of what is needed to pull this off will simply be wrong.

In the traditional MVC model, intent is only handled at the individual input widget or form level. e.g. While typing a number, the intermediate representation is a string. This may be empty, incomplete or not a number, but you temporarily allow that.
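
For instance (names are mine, purely illustrative), a number field's Intent is the raw string and its State is the parsed value:

// Intent: the raw string being typed, which may be "", "-", "3." or other partial input.
type NumberFieldIntent = { raw: string };

// State: the parsed number, or null while the intermediate input isn't a valid number yet.
const parseNumberField = (intent: NumberFieldIntent): number | null => {
  const n = Number(intent.raw);
  return intent.raw.trim() !== '' && Number.isFinite(n) ? n : null;
};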

I've never seen people formally separate Intent from State in an entire front-end. Often their state is just an adhoc mix of both, where validation constraints are relaxed in the places where it was most needed. They might just duplicate certain fields to keep a validated and unvalidated variant side by side.

There is one common exception. In a React-like, when you do a useMemo with a derived computation of some state, this is actually a perfect fit. The eponymous useState actually describes Intent, not State, because the derived state is ephemeral. This is why so many devs get lost here.

const state = useMemo(
  () => validate(intent),
  [intent]
);

Their usual instinct is that every action that has knock-on effects should be immediately and fully realized, as part of one transaction. Only, they discover some of those knock-on effects need to be re-evaluated if certain circumstances change. Often to do so, they need to undo and remember what it was before. This is then triggered anew via a bespoke effect, which requires a custom trigger and mutation. If they'd instead deferred the computation, it could have auto-updated itself, and they would've still had the original data to work with.

e.g. In a WYSIWYG scenario, you often want to preview an operation as part of mouse hovering or dragging. It should look like the final result. You don't need to implement custom previewing and rewinding code for this. You just need the ability to layer on some additional ephemeral intent on top of the intent that is currently committed. Rewinding just means resetting that extra intent back to empty.

You can make this easy to use by treating previews as a special kind of transaction: now you can make preview states with the same code you use to apply the final change. You can also auto-tag the created objects as being preview-only, which is very useful. That is: you can auto-translate editing intent into preview intent, by messing with the contents of a transaction. Sounds bad, is actually good.
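
A rough sketch of that idea, with hypothetical names and a shallow merge standing in for a real transaction:

type Intent = Record<string, unknown>;
type Update = Partial<Intent>;

// Committed intent lives in the document; preview intent is an ephemeral extra layer.
const resolveState = (
  committed: Intent,
  preview: Update | null,
  validate: (intent: Intent) => Intent,
): Intent =>
  validate(preview ? { ...committed, ...preview } : committed);

// Rewinding the preview just means dropping the ephemeral layer again:
// resolveState(committed, null, validate) is the committed state.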

The same applies to any other temporary state, for example, highlighting of elements. Instead of manually changing colors, and creating/deleting labels to pull this off, derive the resolved style just-in-time. This is vastly simpler than doing it all on 1 classic retained model. There, you run the risk of highlights incorrectly becoming sticky, or shapes reverting to the wrong style when un-highlighted. You can architect it so this is simply impossible.

The trigger vs memo problem also happens on the back-end, when you have derived collections. Each object of type A must have an associated type B, created on-demand for each A. What happens if you delete an A? Do you delete the B? Do you turn the B into a tombstone? What if the relationship is 1-to-N, do you need to garbage collect?

If you create invisible objects behind the scenes as a user edits, and you never tell them, expect to see a giant mess as a result. It's crazy how often I've heard engineers suggest a user should only be allowed to create something, but then never delete it, as a "solution" to this problem. Everyday undo/redo precludes it. Don't be ridiculous.

The problem is having an additional layer of bookkeeping you didn't need. The source of truth was collection A, but you created a permanent derived collection B. If you instead make B ephemeral, derived via a stateless computation, then the problem goes away. You can still associate data with B records, but you don't treat B as the authoritative source for itself. This is basically what a WeakMap is.
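
For example, a minimal sketch (illustrative names, nothing more) of deriving B records on demand, keyed weakly by their A record:

type A = { id: string };
type B = { label: string };

// Derived B records are cached per A record; when an A is garbage collected,
// its B disappears with it, so there is nothing to clean up by hand.
const derivedB = new WeakMap<A, B>();

const getB = (a: A, makeB: (a: A) => B): B => {
  let b = derivedB.get(a);
  if (b === undefined) {
    b = makeB(a);
    derivedB.set(a, b);
  }
  return b;
};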

Event Sourcing

In database land this can be realized with a materialized view, which can be incremental and subscribed to. Taken to its extreme, this turns into event-based sourcing, which might seem like a panacea for this mindset. But in most cases, the latter is still a system by and for Stanley. The event-based nature of those systems exists to support housekeeping tasks like migration, backup and recovery. Users are not supposed to be aware that this is happening. They do not have any view into the event log, and cannot branch and merge it. The exceptions are extremely rare.

It's not a system for working with user intent, only for flattening it, because it's append-only. It has a lot of the necessary basic building blocks, but substitutes programmer intent for user intent.

What's most nefarious is that the resulting tech stacks are often quite big and intricate, involving job queues, multi-layered caches, distribution networks, and more. It's a bunch of stuff that Stanley can take joy and pride in, far away from users, with "hard" engineering challenges. Unlike all this *ugh* JavaScript, which is always broken and unreliable and uninteresting.

Except it's only needed because Stanley only solved half the problem, badly.

Patch or Bust

When factored in from the start, it's actually quite practical to split Intent from State, and it has lots of benefits. Especially if State is just a more constrained version of the same data structures as Intent. This doesn't need to be fully global either, but it needs to encompass a meaningful document or workspace to be useful.

It does create an additional problem: you now have two kinds of data in circulation. If reading or writing requires you to be aware of both Intent and State, you've made your code more complicated and harder to reason about.

Validate vs Mutate

More so, making a new Intent requires a copy of the old Intent, which you mutate or clone. But you want to avoid passing Intent around in general, because it's fishy data. It may have the right types, but the constraints and referential integrity aren't guaranteed. It's a magnet for the kind of bugs a type-checker won't catch.

I've published my common solution before: turn changes into first-class values, and make a generic update of type Update<T> be the basic unit of change. As a first approximation, consider a shallow merge {...value, ...update}. This allows you to make an updateIntent(update) function where update only specifies the fields that are changing.

In other words, Update<Intent> looks just like Update<State> and can be derived 100% from State, without Intent. Only one place needs to have access to the old Intent, all other code can just call that. You can make an app intent-aware without complicating all the code.
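
A sketch of that shape, using only the shallow-merge first approximation described above (the names are mine):

type Update<T> = Partial<T>;

// Only this store ever touches the old Intent; everything else reads State and emits updates.
const makeIntentStore = <T extends object>(initial: T, validate: (intent: T) => T) => {
  let intent = initial;
  let state = validate(intent);

  const updateIntent = (update: Update<T>) => {
    intent = { ...intent, ...update }; // first approximation: shallow merge
    state = validate(intent);
  };

  return { updateIntent, getState: () => state };
};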

Validate vs Mutate 2

If your state is cleaved along orthogonal lines, then this is all you need. i.e. If column and line are two separate fields, then you can selectively change only one of them. If they are stored as an XY tuple or vector, now you need to be able to describe a change that only affects either the X or Y component.

const value = {
  hello: 'text',
  foo: { bar: 2, baz: 4 },
};

const update = {
  hello: 'world',
  foo: { baz: 50 },
};

expect(
  patch(value, update)
).toEqual({
  hello: 'world',
  foo: { bar: 2, baz: 50 },
});

So in practice I have a function patch(value, update) which implements a comprehensive superset of a deep recursive merge, with full immutability. It doesn't try to do anything fancy with arrays or strings, they're just treated as atomic values. But it allows for precise overriding of merging behavior at every level, as well as custom lambda-based updates. You can patch tuples by index, but this is risky for dynamic lists. So instead you can express e.g. "append item to list" without the entire list, as a lambda.
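
As a minimal sketch of such a merge (not the real patch, which also supports per-level merge overrides and lambda-based updates), a plain recursive version could look like this:

type Plain = Record<string, unknown>;

const isPlainObject = (v: unknown): v is Plain =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Deep, immutable merge: plain objects are merged key by key,
// while arrays, strings and other values are replaced atomically.
const patch = <T>(value: T, update: unknown): T => {
  if (isPlainObject(value) && isPlainObject(update)) {
    const out: Plain = { ...value };
    for (const key of Object.keys(update)) {
      out[key] = patch(out[key], update[key]);
    }
    return out as T;
  }
  return update as T;
};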

I've been using patch for years now, and the uses are myriad. To overlay a set of overrides onto a base template, patch(base, overrides) is all you need. It's the most effective way I know to erase a metric ton of {...splats} and ?? defaultValues and != null from entire swathes of code. This is a real problem.

You could also view this as a "poor man's OT", with the main distinction being that a patch update only describes the new state, not the old state. Such updates are not reversible on their own. But they are far simpler to make and apply.

It can still power a global undo/redo system, in combination with its complement diff(A, B): you can reverse an update by diffing in the opposite direction. This is an operation which is formalized and streamlined into revise(…), so that it retains the exact shape of the original update, and doesn't require B at all. The structure of the update is sufficient information: it too encodes some intent behind the change.
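
A matching diff can be sketched under the same assumptions (plain objects, atomic arrays and strings; deletions are ignored here). Diffing in one direction gives you the redo, diffing in the other gives you the undo:

// Same isPlainObject helper as in the patch sketch above.
const isPlainObject = (v: any) =>
  v !== null && typeof v === 'object' && !Array.isArray(v);

// Returns the minimal update that turns `a` into `b`, or undefined if equal.
function diff(a: any, b: any): any {
  if (a === b) return undefined;
  if (isPlainObject(a) && isPlainObject(b)) {
    const out: any = {};
    let changed = false;
    for (const key of Object.keys({ ...a, ...b })) {
      const d = diff(a[key], b[key]);
      if (d !== undefined) {
        out[key] = d;
        changed = true;
      }
    }
    return changed ? out : undefined;
  }
  return b;
}

// undo = diff(after, before); redo = diff(before, after)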

With patch you also have a natural way to work with changes and conflicts as values. The earlier WYSIWYG scenario is just patch(committed, ephemeral) with bells on.

The net result is that mutating my intent or state is as easy as doing a {...value, ...update} splat, but I'm not falsely incentivized to flatten my data structures.

Instead it frees you up to think about what the most practical schema actually is from the data's point of view. This is driven by how the user wishes to edit it, because that's what you will connect it to. It makes you think about what a user's workspace actually is, and lets you align boundaries in UX and process with boundaries in data structure.

Array vs Linked List

Remember: most classic "data structures" are not about the structure of data at all. They serve as acceleration tools to speed up specific operations you need on that data. Having the reads and writes drive the data design was always part of the job. What's weird is that people don't apply that idea end-to-end, from database to UI and back.

SQL tables are shaped the way they are because it enables complex filters and joins. However, I find this pretty counterproductive: it produces derived query results that are difficult to keep up to date on a client. They also don't look like any of the data structures I actually want to use in my code.

A Bike Shed of Schemas

This points to a very under-appreciated problem: it is completely pointless to argue about schemas and data types without citing specific domain logic and code that will be used to produce, edit and consume it. Because that code determines which structures you are incentivized to use, and which structures will require bespoke extra work.

From afar, column and line are just XY coordinates. Just use a 2-vector. But once you factor in the domain logic and etiquette, you realize that the horizontal and vertical directions have vastly different rules applied to them, and splitting might be better. Which one do you pick?

This applies to all data. Whether you should put items in a List<T> or a Map<K, V> largely depends on whether the consuming code will loop over it, or need random access. If an API only provides one, consumers will just build the missing Map or List as a first step. This is O(n log n) either way, because of sorting.

The method you use to read or write your data shouldn't limit your use of everyday structures. Not unless you have a very good reason. But this is exactly what happens.

A lot of bad choices in data design come down to picking the "wrong" data type simply because the most appropriate one is inconvenient in some cases. This then leads to Conway's law, where one team picks the types that are most convenient only for them. The other teams are stuck with it, and end up writing bidirectional conversion code around their part, which will never be removed. The software will now always have this shape, reflecting which concerns were considered essential. What color are your types?

const state = {
  order: [4, 11, 9, 5, 15, 43],
  values: {
    4: {...},
    5: {...},
    9: {...},
    11: {...},
    15: {...},
    43: {...},
  },
};

For List vs Map, you can have this particular cake and eat it too. Just provide a List<Id> for the order and a Map<Id, T> for the values. If you structure a list or tree this way, then you can do both iteration and ID-based traversal in the most natural and efficient way. Don't underestimate how convenient this can be.

This also has the benefit that "re-ordering items" and "editing items" are fully orthogonal operations. It decomposes the problem of "patching a list of objects" into "patching a list of IDs" and "patching N separate objects". It makes code for manipulating lists and trees universal. It lets you decide on a case by case basis whether you need to garbage collect the map, or whether preserving unused records is actually desirable.
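
Using the patch sketch from earlier and the state structure just shown (the edited field, title, is purely illustrative), the two operations become two small, independent updates:

// Move ID 9 to the front: only the list of IDs changes.
const reordered = patch(state, { order: [9, 4, 11, 5, 15, 43] });

// Edit record 9: only that one entry in the map changes.
const edited = patch(state, { values: { 9: { title: 'New title' } } });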

Limiting it to ordinary JSON or JS types, rather than going full-blown OT or CRDT, is a useful baseline. With sensible schema design, at ordinary editing rates, CRDTs are overkill compared to the ability to just replay edits, or notify conflicts. This only requires version numbers and retries.

Users need those things anyway: just because a CRDT converges when two people edit, doesn't mean the result is what either person wants. The only case where OTs/CRDTs are absolutely necessary is rich-text editing, and you need bespoke UI solutions for that anyway. For simple text fields, last-write-wins is perfectly fine, and also far superior to what 99% of RESTy APIs do.
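
A minimal sketch of that version-and-retry loop; the trySave callback stands in for whatever endpoint actually persists the update and is purely illustrative:

type Versioned<T> = { version: number; data: T };

type SaveResult<T> =
  | { ok: true; version: number }
  | { ok: false; current: Versioned<T> };

// Retry an update a few times, rebasing on the latest version on conflict.
async function saveWithRetry<T>(
  trySave: (expectedVersion: number, update: Partial<T>) => Promise<SaveResult<T>>,
  doc: Versioned<T>,
  update: Partial<T>,
  retries = 3,
): Promise<number> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const result = await trySave(doc.version, update);
    if (result.ok) return result.version;
    // Conflict: pick up the latest state and try again, or surface
    // the conflict to the user instead of silently overwriting.
    doc = result.current;
  }
  throw new Error('Too many conflicting edits; ask the user to resolve.');
}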

CRDT

A CRDT is just a mechanism that translates partially ordered intents into a single state. Like, it's cool that you can make CRDT counters and CRDT lists and whatnot... but each CRDT implements only one particular resolution strategy. If it doesn't produce the desired result, you've created invalid intent no user expected. With last-write-wins, you at least have something 1 user did intend. Whether this is actually destructive or corrective is mostly a matter of schema design and minimal surface area, not math.

The main thing that OTs and CRDTs do well is resolve edits on ordered sequences, like strings. If two users are typing text in the same doc, edits higher-up will shift edits down below, which means the indices change when rebased. But if you are editing structured data, you can avoid referring to indices entirely, and just use IDs instead. This sidesteps the issue, like splitting order from values.

For the order, there is a simple solution: a map with a fractional index, effectively a dead-simple list CRDT. It just comes with some overhead.
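
A sketch of that idea: every item carries a fractional sort key, inserting between two neighbors just picks a key in between, and the visible order is a sort over those keys (real implementations use string keys and add jitter to avoid running out of precision):

type Ordered<T> = Map<string, { index: number; value: T }>;

// Pick an index between two neighbors; undefined means "at that edge".
function between(before?: number, after?: number): number {
  if (before === undefined) return after === undefined ? 0 : after - 1;
  if (after === undefined) return before + 1;
  return (before + after) / 2;
}

// Rendering the list is just a sort; moving an item only touches its own key.
function toList<T>(items: Ordered<T>): T[] {
  return Array.from(items.values())
    .sort((a, b) => a.index - b.index)
    .map((entry) => entry.value);
}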

Google docs comment

Using a CRDT for string editing might not even be enough. Consider Google Docs-style comments anchored to that text: their indices also need to shift on every edit. Now you need a bespoke domain-aware CRDT. Or you work around it by injecting magic markers into the text. Either way, it seems non-trivial to decouple a CRDT from the specific target domain of the data inside. The constraints get mixed in.

If you ask me, this is why the field of real-time web apps is still in somewhat of a rut. It's mainly viewed as a high-end technical problem: how do we synchronize data structures over a P2P network without any data conflicts? What they should be asking is: what is the minimal amount of structure we need to reliably synchronize, so that users can have a shared workspace where intent is preserved, and conflicts are clearly signposted? And how should we design our schemas, so that our code can manipulate the data in a straightforward and reliable way? Fixing non-trivial user conflicts is simply not your job.

Most SaaS out there doesn't need any of this technical complexity. Consider that a good multiplayer app requires user presence and broadcast anyway. The simplest solution is just a persistent process on a single server coordinating this, one per live workspace. It's what most MMOs do. In fast-paced video games, this even involves lag compensation. Reliable ordering is not the big problem.

The situations where this doesn't scale, or where you absolutely must be P2P, are a minority. If you run into them, you must be doing very well. The solution that I've sketched out here is explicitly designed so it can comfortably be done by small teams, or even just 1 person.

Private CAD app

The (private) CAD app I showed glimpses of above is entirely built this way. It's patch all the way down and it's had undo/redo from day 1. It also has a developer mode where you can just edit the user-space part of the data model, and save/load it.

When the in-house designers come to me with new UX requests, they often ask: "Is it possible to do ____?" The answer is never a laborious sigh from a front-end dev with too much on their plate. It's "sure, and we can do more."

If you're not actively aware that the design of schemas and code is tightly coupled, your codebase will explode, and the bulk of it will be glue. Much of it just serves to translate generalized intent into concrete state or commands. Arguments about schemas are usually just hidden debates about whose job it is to translate, split or join something. This isn't just an irrelevant matter of "wire formats", because changing the structure and format of data also changes how you address specific parts of it.

In an interactive UI, you also need a reverse path, to apply edits. What I hope you are starting to realize is that this is really just the forward path in reverse, on so many levels. The result of a basic query is just the ordered IDs of the records that it matched. A join returns a tuple of record IDs per row. If you pre-assemble the associated record data for me, you actually make my job as a front-end dev harder, because there are multiple forward paths for the exact same data, in subtly different forms. What I want is to query and mutate the same damn store you do, and be told when what changes. It's table-stakes now.

With well-architected data, this can be wired up mostly automatically, without any scaffolding. The implementations you encounter in the wild just obfuscate this, because they don't distinguish between the data store and the model it holds. The fact that the data store should not be corruptible, and should enforce permissions and quotas, is incorrectly extended to the entire model stored inside. But that model doesn't belong to Stanley, it belongs to the user. This is why desktop applications didn't have a "Data Export". It was just called Load and Save, and what you saved was the intent, in a file.

Windows 95 save dialog

Having a universal query or update mechanism doesn't absolve you from thinking about this either, which is why I think the patch approach is so rare: it looks like cowboy coding if you don't have the right boundaries in place. Patch is mainly for user-space mutations, not kernel-space, a concept that applies to more than just OS kernels. User-space must be very forgiving.

If you avoid it, you end up with something like GraphQL, a good example of solving only half the problem badly. Its getter assembles data for consumption by laboriously repeating it in dozens of partial variations. And it turns the setter part into an unsavory mix of lasagna and spaghetti. No wonder, it was designed for a platform that owns and hoards all your data.

* * *

Viewed narrowly, Intent is just a useful concept to rethink how you enforce validation and constraints in a front-end app. Viewed broadly, it completely changes how you build back-ends and data flows to support that. It will also teach you how adding new aspects to your software can reduce complexity, not increase it, if done right.

A good metric is to judge implementation choices by how many other places of the code need to care about them. If a proposed change requires adjustments literally everywhere else, it's probably a bad idea, unless the net effect is to remove code rather than add.

Live Canvas

I believe reconcilers like React or tree-sitter are a major guide stone here. What they do is apply structure-preserving transforms on data structures, and incrementally. They actually do the annoying part for you. I based Use.GPU on the same principles, and use it to drive CPU canvases too. The tree-based structure reflects that one function's state just might be the next function's intent, all the way down. This is a compelling argument that the data and the code should have roughly the same shape.

Back-end vs Front-end split

You will also conclude there is nothing more nefarious than a hard split between back-end and front-end. You know, coded by different people, where each side is only half-aware of the other's needs, but one sits squarely in front of the other. Well-intentioned guesses about what the other end needs will often be wrong. You will end up with data types and query models that cannot answer questions concisely and efficiently, and which must be babysat to not go stale.

In the last 20 years, little has changed here in the wild. On the back-end, it still looks mostly the same. Even when modern storage solutions are deployed, people end up putting SQL- and ORM-like layers on top, because that's what's familiar. The split between back-end and database has the exact same malaise.

None of this work actually helps make the app more reliable, it's the opposite: every new feature makes on-going development harder. Many "solutions" in this space are not solutions, they are copes. Maybe we're overdue for a NoSQL-revival, this time with a focus on practical schema design and mutation? SQL was designed to model administrative business processes, not live interaction. I happen to believe a front-end should sit next to the back-end, not in front of it, with only a thin proxy as a broker.

What I can tell you for sure is: it's so much better when intent is a first-class concept. You don't need nor want to treat user data as something to pussy-foot around, or handle like it's radioactive. You can manipulate and transport it without a care. You can build rich, comfy functionality on top. Once implemented, you may find yourself not touching your network code for a very long time. It's the opposite of overwhelming, it's lovely. You can focus on building the tools your users need.

This can pave the way for more advanced concepts like OT and CRDT, but will show you that neither of them is a substitute for getting your application fundamentals right.

In doing so, you reach a synthesis of Dijkstra and anti-Dijkstra: your program should be provably correct in its data flow, which means it can safely break in completely arbitrary ways.

Because the I in UI meant "intent" all along.


February 03, 2024

A while ago I wrote an article comparing Waterloo with Bakhmut. That was in August 2023. We are in February 2024, and I predict that soon I will have to say the same about Avdiivka.

The city is about to be encircled in the coming month and is in fact already more or less technically encircled (all of its supply roads are within firing range of the Russians). I think that Avdiivka has already surpassed the death toll of Bakhmut many times over.

The death tolls are now also becoming comparable to those of the WWII Operation Bagration, which saw about 400,000 German and 180,000 Red Army deaths.

The Soviet operation was named after the Georgian prince Pyotr Bagration (1765–1812), a general of the Imperial Russian Army during the Napoleonic Wars.

Wikipedia

February 01, 2024

HTTP headers play a crucial part in making your website fast and secure. For that reason, I often inspect HTTP headers to troubleshoot caching problems or review security settings.

The complexity of the HTTP standard and the challenge to remember all the best practices led me to develop an HTTP Header Analyzer four years ago.

It is pretty simple: enter a URL, and the tool will analyze the headers sent by your webserver, CMS or web application. It then explains these headers, offers a score, and suggests possible improvements.

For a demonstration, click https://dri.es/headers?url=https://www.reddit.com. As the URL suggests, it will analyze the HTTP headers of Reddit.com.

I began this as a weekend project in the early days of COVID, seeing it as just another addition to my toolkit. At the time, I simply added it to my projects page but never announced or mentioned it on my blog.

So why write about it now? Because I happened to check my log files and, lo and behold, the little scanner that could clocked in more than 5 million scans, averaging over 3,500 scans a day.

So four years and five million scans later, I'm finally announcing it to the world!

If you haven't tried my HTTP header analyzer, check it out. It's free, easy to use, requires no sign-up, and is built to help improve your website's performance and security.

The crawler works with all websites, but naturally, I added some special checks for Drupal sites.

I don't have any major plans for the crawler. At some point, I'd like to move it to its own domain, as it feels a bit out of place as part of my personal website. But for now, that isn't a priority.

For the time being, I'm open to any feedback or suggestions and will commit to making any necessary corrections or improvements.

It's rewarding to know that this tool has made thousands of websites faster and safer! It's also a great reminder to share your work, even in the simplest way – you never know the impact it could have.

January 27, 2024

With FOSDEM just a few days away, it is time for us to enlist your help. Every year, an enthusiastic band of volunteers make FOSDEM happen and make it a fun and safe place for all our attendees. We could not do this without you. This year we again need as many hands as possible, especially for heralding during the conference, during the buildup (starting Friday at noon) and teardown (Sunday evening). No need to worry about missing lunch at the weekend, food will be provided. Would you like to be part of the team that makes FOSDEM tick?

January 25, 2024

I recently added a new page to my website, which displays the top 10 topics of each year. This page serves a dual purpose: it visualizes the evolution of my interests, and provides visitors with an overview of the core content of my site.

As I stared at this new page, it became very obvious that my blog has maintained a consistent focus throughout the years. The top categories have not changed meaningfully in 18 years. The recurring themes include Drupal, Acquia, Open Source, and photography. The latter, a long-term interest, has experienced ebbs and flows during this time.

Just as having the same cereal each day can become boring – a fact my wife kindly reminds me of – covering the same topics for 18 years can lead to a certain "predictability". This realization sparked a desire to spice things up!

A cabinet displaying jars, each with a different interest inside: travel, photography, electronics, tennis, food, investing, coffee, and more. A cabinet with all my interests.

My blog history wouldn't suggest it, but I consider myself a person with many interests. Sometimes I have too many interests. As a result, I'm often drawn into various side projects – another fact my wife kindly reminds me of.

My point is that there are certainly different aspects of my life and interests that I could write more about. With that in mind, a goal for 2024 is to diversify my blog's content. I set a goal to introduce at least two new topics into my top 10 list for 2024, alongside Drupal and Acquia.

For example, I've been passionate about investing for 15 years. A few of my investments have generated a 5,000% return. Yet, I've been reluctant to write about investing for 15 years. The reasons for this hesitation are as diverse as a well-balanced investment portfolio: the challenge of finding enough time, moments of impostor syndrome, fear of confusing my long-term readers, and the sensitivity around discussing financial successes and failures.

Some might say that finance-related blog posts don't exactly bring the heat when it comes to spicing up a blog. They are probably right. But then I reminded myself that this blog is, first and foremost, for me. I write to learn and engage with readers. Consider this a heads-up: brace yourselves for some new flavors on my blog, from finance to who knows what else.

What I love most about blogging is how it stimulates my thoughts and sparks engaging discussions, often offline or via email. Introducing two new topics should most certainly lead to more of that. I look forward to such conversations.

January 21, 2024

As of Autoptimize Pro 2.3 there is an option to delay all JavaScript. Delaying can have a bigger positive performance impact than asyncing or deferring, because AOPro will only load the delayed JS after a delay you define or (better even) only at the moment user interaction (such as swiping/scrolling or mouse movement) is noticed. Obviously when delaying all JS you might want to be able to…

Source

January 18, 2024

2024 will be the first year there will be a FOSDEM junior track, organizing workshops for children aged 7-17. Developers from the FOSS Educational Programming Languages devroom will be organizing workshops for children from 7 to 17 with the help of CoderDojo. At FOSDEM, volunteers from CoderDojo International, CoderDojo Netherlands, and CoderDojo Belgium will be helping during the workshops. There will be workshops from MicroBlocks, Snap!, Zim, MIT App Inventor, Raspberry Pi, Hedy, and CoderDojo. For these workshops, you will need to get a free ticket. Visit https://fosdem.org/2024/schedule/track/junior and click on the workshop of your choice. On the workshop page, you…

January 16, 2024

Maintaining a production dataset at a manageable size can present a considerable challenge during the administration of a MySQL InnoDB Cluster.

Old Days

Back in the day, when we only had one main copy of our data (the source) and one read copy (the replica) that we used to look at current and old data from our main system, we used a special trick to remove data without affecting the replica. The trick was to turn off writes to the binary log for our removal commands on the main system. External tools like pt-archiver were also able to use that trick. To bypass writing into the binary log, we used the command: SET SQL_LOG_BIN=0.

This meant that on the main production server (the replication source), we were purging data without writing the delete operations into the binary logs.
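
The purge session looked something like this (a reconstruction; the table name and date are illustrative):

SET SQL_LOG_BIN=0;
DELETE FROM t1 WHERE date(timestamp) < "2023-02-01";
SET SQL_LOG_BIN=1;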

Current Days

These days, nobody runs a single MySQL node in production. High Availability is highly recommended (required!).

So people are using MySQL InnoDB Cluster as their main production solution. In such an environment, keeping the dataset at a reasonable size is also recommended (CLONE, backups, …).

However, the old solution for archiving data no longer works: if we skip writing events to the binary logs, those events won't be replicated to the other members of the cluster, because Group Replication also relies on those binary logs.

This means we will end up with an inconsistent cluster where all the nodes have a different dataset!

GTIDs can assist in the process; however, it is often challenging to recall the particular UUID component of the GTID that is utilized for archiving purposes. Moreover, it gets complicated to figure out where these transactions come from.

MySQL 8.3, which launched today, introduces even more efficient ways to streamline our purging and archiving workflows.

GTID TAGS

With MySQL 8.3, the GTID format has evolved and now includes the possibility of adding an identifiable tag to it:

source_id:tag:transaction_id

Tags are of course not mandatory and GTIDs will work as before too.

So on the replica we want to use as the archiving server, we will change the GTID set to include a large range of transactions coming from the cluster, but with a tag we define. In this example, I will use "archiving".

On a member of the MySQL InnoDB Cluster, the following query was executed to retrieve the UUID segment of the GTID used by the Group:

InnoDB Cluster (any node) > show global variables like '%group_name';
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| group_replication_group_name | 1e4516ec-b47c-11ee-bb6e-c8cb9e32df8e |
+------------------------------+--------------------------------------+

Now, on the replica used as the archiving server, I run the following statement:

Replica > set global gtid_purged=
             "+1e4516ec-b47c-11ee-bb6e-c8cb9e32df8e:archiving:1-9999";

On the Primary instance, we can now start the archiving process like this:

InnoDB Cluster Primary > set gtid_next="automatic:archiving";
InnoDB Cluster Primary > delete from t1 where date(timestamp)<"2023-02-01";
InnoDB Cluster Primary > set gtid_next="automatic"

It’s of course better to use a dedicated session to archive the data.

This is the illustration of the process:

It's imperative to use a large range and to closely monitor its usage on the cluster. In this example we can only have 9,999 purging transactions; upon reaching this threshold, the next archiving operations will be applied on the replica too.

Of course it was already possible to perform such an operation with standard GTIDs, but it was more complicated, or you needed to use a specific UUID. With the new GTID tags, we can directly and easily identify the purging transactions.

Integration with external tools is also easier. For example, here is how to use GTIDs and tags with pt-archiver:

./pt-archiver \
--source h=production_cluster,P=3306,u=user,p=xxxx,D=mydata,t=t1 \
--purge --where 'date(timestamp)<"2023-04-01"' \
--set-vars 'GTID_NEXT="automatic:archiving"'

Conclusion

MySQL 8.3 marks a notable improvement for database administrators who manage High Availability environments using MySQL InnoDB Cluster. With the introduction of GTID tags, the archiving and purging process becomes substantially more manageable and less prone to human error. These tags allow for clear identification of transaction sources, thus avoiding the pitfalls of inconsistent datasets across cluster nodes.

Beyond simplifying the identification process, GTID tags facilitate the integration with external tools.

Enjoy archiving and purging your data !

January 12, 2024

The MySQL Belgian Days are sold out; our event, taking place on the 1st and 2nd of February 2024 in Brussels, seems to have already attracted a lot of interest.

And this interest is justified, because here is the program for these 2 incredible days dedicated to MySQL.

Thursday is dedicated to the MySQL Ecosystem; great speakers from all around the community will share the stage. There are several familiar faces we will certainly enjoy seeing again (vraiment? oui oui!)*.

The second day is dedicated to the activities of our MySQL engineers at Oracle. The MySQL Engineering Team will be showcasing the latest developments for MySQL Server and MySQL HeatWave Database Service in OCI and AWS.

During the intermissions, do not hesitate to engage in conversation with the speakers and take the opportunity to directly engage with the MySQL Engineers.

On Friday, we will conclude the day by extending a warm welcome to the new MySQL Rockstars. Be sure not to miss the award ceremony.

And finally, on Friday night, the Belgian Party Squad is organizing the traditional MySQL & Friends Community Dinner, don’t forget to register, tickets are going fast !

See you soon in Belgium !!

(*) private joke

January 08, 2024

At the beginning of every year, I publish a retrospective that looks back at the previous year at Acquia. I also discuss the changing dynamics in our industry, focusing on Content Management Systems (CMS) and Digital Experience Platforms (DXP).

If you'd like, you can read all of my retrospectives for the past 15 years: 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009.

Resilience and growth amid market turbulence

At the beginning of 2023, interest rates were 4.5%. Technology companies, investors, and PE firms were optimistic, anticipating modest growth. However, as inflation persisted and central banks raised rates more than expected, this optimism dwindled.

The first quarter also saw a regional bank crisis, notably the fall of Silicon Valley Bank, which many tech firms, including Acquia, relied on. Following these events, the market's pace slowed, and the early optimism shifted to a more cautious outlook.

Despite these challenges, Acquia thrived. We marked 16 years of revenue increase, achieved record renewal rates, and continued our five-year trend of rising profitability. 2023 was another standout year for Acquia.

One of our main objectives for 2023 was to expand our platform through M&A. However, tighter credit lending, valuation discrepancies, and economic uncertainty complicated these efforts. By the end of 2023, with the public markets rebounding, the M&A landscape showed slight improvement.

In November, we announced Acquia's plan to acquire Monsido, a platform for improving website accessibility, content quality, SEO, privacy, and performance. The acquisition closed last Friday. I'm excited about expanding the value we offer to our customers and look forward to welcoming Monsido's employees to Acquia.

Working towards a safer, responsible and inclusive digital future

Image with panels showcasing various digital trends: one emphasizes inclusivity with a woman in a wheelchair at a workstation, and another shows a man interacting with an AI robot, illustrating human-AI collaboration.

Looking ahead to 2024, I anticipate these to be the dominant trends in the CMS and DXP markets:

  • Converging technology ecosystems: MACH and Jamstack are evolving beyond their original approaches. As a result, we'll see their capabilities converge with one another, and with those of traditional CMSes. I wrote extensively about this in Jamstack and MACH's journey towards traditional CMS concepts.
  • Navigating the cookie-less future: Marketers will need to navigate a cookie-less future. This means organizations will depend more and more on data they collect from their own digital channels (websites, newsletters, video platforms, etc).
  • Digital decentralization: The deterioration of commercial social media platforms has been a positive development in my opinion. I anticipate users will continue to reduce their time on these commercial platforms. The steady shift towards open, decentralized alternatives like Mastodon, Nostr, and personal websites is a welcome trend.
  • Growth in digital accessibility: The importance of accessibility is growing and will become even more important in 2024 as organizations prepare for enforcement of the European Accessibility Act in 2025. This trend isn't just about responding to legislation; it's about making sure digital experiences are inclusive to everyone, including individuals with disabilities.
  • AI's impact on digital marketing and websites: As people start getting information directly from Artificial Intelligence (AI) tools, organic website traffic will decline. Just like with the cookie-less future, organizations will need to focus more on growing their own digital channels with exclusive and personalized content.
  • AI's impact on website building: We'll witness AI's evolution from assisting in content production to facilitating the building of applications. Instead of laboriously piecing together landing pages or campaigns with a hundred clicks, users will simply be able to guide the process with AI prompts. AI will evolve to become the new user interface for complex tasks.
  • Cybersecurity prioritization: As digital landscapes expand, so do vulnerabilities. People and organizations will become more protective of their personal and privacy data, and will demand greater control over the sharing and storage of their information. This means a growing focus on regulation, more strict compliance rules, automatic software updates, AI-driven monitoring and threat detection, passwordless authentication, and more.
  • Central content and data stores: Organizations are gravitating more and more towards all-in-one platforms that consolidate data and content. This centralization enables businesses to better understand and anticipate customer needs, and deliver better, personalized customer experiences.

While some of these trends suggest a decline in the importance of traditional websites, other trends point towards a positive future for websites. On one side, the rise of AI in information gathering will decrease the need for traditional websites. On the other side, the decline of commercial social media and the shift to a cookie-less future suggest that websites will continue to be important, perhaps even more so.

What I like most about many of these trends is that they are shaping a more intuitive, inclusive, and secure digital future. Their impact on end-users will be profound, making every interaction more personal, accessible, and secure.

However, I suspect the ways in which we do digital marketing will need to change quite a bit. Marketing teams will need to evolve how they generate leads. They'll have to use privacy-friendly methods to develop strong customer relationships and offer more value than what AI tools provide.

This means getting closer to customers with content that is personal and relevant. The use of intent data, first-party data and predictive marketing for determining the "next best actions" will continue to grow in importance.

It also means that more content may transition into secure areas such as newsletters, members-only websites, or websites that tailor content dynamically for each user, where it can't be mined by AI tools.

All this bodes well for CMSes, Customer Data Platforms (CDPs), personalization software, Account-Based Marketing (ABM), etc. By utilizing these platforms, marketing teams can better engage with individuals and offer more meaningful experiences. Acquia is well-positioned based on these trends.

Reaffirming our DXP strategy, with a focus on openness

On a personal front, my title expanded from CTO to CTO & Chief Strategy Officer. Since Acquia's inception, I've always played a key role in shaping both our technology and business strategies. This title change reflects my ongoing responsibilities.

Until 2018, Acquia mainly focused on CMS. In 2018, we made a strategic shift from being a leader in CMS to becoming a leader in DXP. We have greatly expanded our product portfolio since then. Today, Acquia's DXP includes CMS, Digital Asset Management (DAM), Customer Data Platform (CDP), Marketing Automation, Digital Experience Optimization, and more. We've been recognized as leaders in DXP by analyst firms including Gartner, GigaOm, Aragon Research and Omdia.

As we entered 2023, we felt we had successfully executed our big 2018 strategy shift. With my updated title, I spearheaded an effort to revise our corporate strategy to figure out what is next. The results were that we reaffirmed our commitment to our core DXP market with the goal of creating the best "Open DXP" in the market.

We see "Open" as a key differentiator. As part of our updated strategy, we explicitly defined what "Open" means to us. While this topic deserves a blog post on its own, I will touch on it here.

Being "Open" means we actively promote integrations with third-party vendors. When you purchase an Acquia product, you're not just buying a tool; you're also buying into a technology ecosystem.

However, our definition of "Open" extends far beyond mere integrations. It's also about creating an inclusive environment where everyone is empowered to participate and contribute to meaningful digital experiences in a safe and secure manner. Our updated strategy, while still focused on the DXP ecosystem, champions empowerment, inclusivity, accessibility, and safety.

A slide titled "The Open DXP" detailing its four core principles: integrations that support customization without vendor lock-in, empowerment for user-driven development, accessibility for all backgrounds and abilities, and a focus on safety with security and compliance. A slide from our strategy presentation, summarizing our definition of an Open DXP. The definition is the cornerstone of Acquia's "Why"-statement.

People who have followed me for a while know that I've long advocated for an Open Web, promoting inclusivity, accessibility, and safety. It's inspiring to see Acquia fully embrace these principles, a move I hope will inspire not just me, but our employees, customers, and partners too. It's not just a strategy; it's a reflection of our core values.

It probably doesn't come as a surprise that our updated strategy aligns with the trends I outlined above, many of which also point towards a safer, more responsible, and inclusive digital future. Our enthusiasm for the Monsido acquisition is also driven by these core principles.

Needless to say, our strategy update is about much more than a commitment to openness. Our commitment to openness drives a lot of our strategic decisions. Here are a few key examples to illustrate our direction.

  • Expanding into the mid-market: Acquia has primarily catered to the enterprise and upper mid-market sectors. We're driven by the belief that an open platform, dedicated to inclusivity, accessibility, and safety, enhances the web for everyone. Our commitment to contributing to a better web is motivating us to broaden our reach, making expanding into the mid-market a logical strategic move.
  • Expanding partnerships, empowering co-creation: Our partnership program is well-established with Drupal, and we're actively expanding partnerships for Digital Asset Management (DAM), CDP, and marketing automation. We aim to go beyond a standard partner program by engaging more deeply in co-creation with our partners, similar to what we do in the Open Source community. The goal is to foster an open ecosystem where everyone can contribute to developing customer solutions, embodying our commitment to empowerment and collaboration. We've already launched a marketplace in 2023, Acquia Exchange, featuring more than 100 co-created solutions, with the goal of expanding to 500 by the end of 2024.
  • Be an AI-fueled organization: In 2023, we launched numerous AI features and we anticipate introducing even more in 2024. Acquia already adheres to responsible AI principles. This aligns with our definition of "Open", emphasizing accountability and safety for the AI systems we develop. We want to continue to be a leader in this space.

Stronger together

We've always been very focused on our greatest asset: our people. This year, we welcomed exceptional talent across the organization, including two key additions to our Executive Leadership Team (ELT): Tarang Patel, leading Corporate Development, and Jennifer Griffin Smith, our Chief Market Officer. Their expertise has already made a significant impact.

Eight logos in a row with different '2023 Best Places to Work' awards. Eight awards for "Best Places to Work 2023" in various categories such as IT, mid-size workplaces, female-friendly environments, wellbeing, and appeal for millennials, across India, the UK, and the USA.

In 2023, we dedicated ourselves to redefining and enhancing the Acquia employee experience, committing daily to its principles through all our programs. This focus, along with our strong emphasis on diversity, equity, and inclusion (DEI), has cultivated a culture of exceptional productivity and collaboration. As a result, we've seen not just record-high employee retention rates but also remarkable employee satisfaction and engagement. Our efforts have earned us various prestigious "Best Place to Work" awards.

Customer-centric excellence, growth, and renewals

Our commitment to delivering great customer experiences is evident in the awards and recognition we received, many of which are influenced by customer feedback. These accolades include recognition on platforms like TrustRadius and G2, as well as the prestigious 2023 CODiE Award.

A list of 32 awards across various products for 2023, including 'Leader' and 'High Performer' from G2, 'Best Of,' 'Best Feature Set,' 'Best Value for Price,' and 'Best Relationship' from TrustRadius, and the '2023 SIIA CODiE Winner' recognition, all highlighting top customer ratings. Acquia received 32 awards for leadership in various categories across its products.

As mentioned earlier, we delivered consistently excellent, and historically high, renewal rates throughout 2023. It means our customers are voting with their feet (and wallets) to stay with Acquia.

Furthermore, we achieved remarkable growth within our customer base with record rates of expansion growth. Not only did customers choose to stay with Acquia, they chose to buy more from Acquia as well.

To top it all off, we experienced a substantial increase in the number of customers who were willing to serve as references for Acquia, endorsing our products and services to prospects.

Many of the notable customer stories for 2023 came from some of the world's most recognizable organizations, including:

  • Nestlé: With thousands of brand sites hosted on disparate technologies, Nestlé brand managers had difficulty maintaining, updating, and securing brand assets globally. Not only was it a challenge to standardize and govern the multiple brands, it was costly to maintain resources for each technology and inefficient with work being duplicated across sites. Today, Nestlé uses Drupal, Acquia Cloud Platform and Acquia Site Factory to face these challenges. Nearly all (90%) of Nestlé sites are built using Drupal. Across the brand's entire portfolio of sites, approximately 60% are built on a shared codebase – made possible by Acquia and Acquia Site Factory.
  • Novartis: The Novartis product sites in the U.S. were fragmented across multiple platforms, with different approaches and capabilities and varying levels of technical debt. This led to uncertainty in the level of effort and time to market for new properties. Today, the Novartis platform built with Acquia and EPAM has become a model within the larger Novartis organization for how a design system can seamlessly integrate with Drupal to build a decoupled front end. The new platform allows Novartis to create new or move existing websites in a standardized design framework, leading to more efficient development cycles and more functionality delivered in each sprint.
  • US Drug Enforcement Administration: The U.S. DEA wanted to create a campaign site to increase public awareness regarding the increasing danger of fake prescription pills laced with fentanyl. Developed with Tactis and Acquia, the campaign website One Pill Can Kill highlights the lethal nature of fentanyl. The campaign compares real and fake pills through videos featuring parents and teens who share their experiences with fentanyl. It also provides resources and support for teens, parents, and teachers and discusses the use of Naloxone in reversing the effects of drug overdose.
  • Cox Automotive: Cox Automotive uses first-party data through Acquia Campaign Studio for better targeted marketing. With their Automotive Marketing Platform (AMP) powered by Acquia, they access real-time data and insights, delivering personalized messages at the right time. The results? Dealers using AMP see consumers nine times more likely to purchase within 45 days and a 14-fold increase in sales gross profit ROI.

I'm proud of outcomes like these: they show how valuable our DXP is to our customers.

Product innovation

In 2023, we remained focused on solving problems for our current and future customers. We use both quantitative and qualitative data to assess areas of opportunities and run hypothesis-driven experiments with design prototypes, hackathons, and proofs-of-concept. This approach has led to hundreds of improvements across our products, both by our development teams and through partnerships. Below are some key innovations that have transformed the way our customers operate:

  • We released many AI features in 2023, including AI assistants and automations for Acquia DAM, Acquia CDP, Acquia Campaign Studio, and Drupal. This includes: AI assist during asset creation in Drupal and Campaign Studio, AI-generated descriptions for assets and products in DAM, auto-tagging in DAM with computer vision, next best action/channel predictions in CDP, ML-powered customer segmentation in CDP, and much more.
  • Our Drupal Acceleration Team (DAT) worked with the Drupal community on major upgrade of the Drupal field UI, which makes it significantly faster and more user-friendly to perform content modeling. We also open sourced Acquia Migrate Accelerate as part of the run-up to the Drupal 7 community end-of-life in January 2025. Finally, DAT contributed to a number of major ongoing initiatives including Project Browser, Automatic Updates, Page Building, Recipes, and more that will be seen in later versions of Drupal.
  • We launched a new trial experience for Acquia Cloud Platform, our Drupal platform. Organizations can now explore Acquia's hosting and developer tools to see how their Drupal applications perform on our platform.
  • Our Kubernetes-native Drupal hosting platform backed by AWS, Acquia Cloud Next, continued to roll out to more customers. Over two-thirds of our customers are now enjoying Acquia Cloud Next, which provides them with the highest levels of performance, self-healing, and dynamic scaling. We've seen a 50% decrease in critical support tickets since transitioning customers to Acquia Cloud Next, all while maintaining an impressive uptime record of 99.99% or higher.
  • Our open source marketing automation tool, Acquia Campaign Studio, is now running on Acquia Cloud Next as its core processing platform. This consolidation benefits everyone: it streamlines and accelerates innovation for us while enabling our customers to deliver targeted and personalized messages at a massive scale.
  • We decided to make Mautic a completely independent Open Source project, letting it grow and change freely. We've remained the top contributor ever since.
  • Marketers can now easily shape the Acquia CDP data model using low-code tools, custom attributes and custom calculations features. This empowers all Acquia CDP users, regardless of technical skill, to explore new use cases.
  • Acquia CDP's updated architecture enables nearly limitless elasticity, which allows the platform to scale automatically based on demand. We put this to the test during Black Friday, when our CDP efficiently handled billions of events. Our new architecture has led to faster, more consistent processing times, with speeds improving by over 70%.
  • With Snowflake as Acquia's data backbone, Acquia customers can now collaborate on their data within their organization and across business units. Customers can securely share and access governed data while preserving privacy, offering them advanced data strategies and solutions.
  • Our DAM innovation featured 47 updates and 13 new integrations. These updates included improved Product Information Management (PIM) functionality, increased accessibility, and a revamped search experience. Leveraging AI, we automated the generation of alt-text and product descriptions, which streamlines content management. Additionally, we established three partnerships to enhance content creation, selection, and distribution in DAM: Moovly for AI-driven video creation and translation, Vizit for optimizing content based on audience insights, and Syndic8 for distributing visual and product content across online commerce platforms.
  • With the acquisition of Monsido and new partnerships with VWO (tools for optimizing website engagement and conversions) and Conductor (SEO platform), Acquia DXP now offers an unparalleled suite of tools for experience optimization. Acquia already provided the best tools to build, manage and operate websites. With these additions, Acquia DXP also offers the best solution for experience optimization.
  • Acquia also launched Acquia TV, a one-stop destination for all things digital experience. It features video podcasts, event highlights, product breakdowns, and other content from a diverse collection of industry voices. This is a great example of how we use our own technology to connect more powerfully with our audiences. It's something our customers strive to do every day.

Conclusion

In spite of the economic uncertainties of 2023, Acquia had a remarkable year. We achieved our objectives, overcame challenges, and delivered outstanding results. I'm grateful to be in the position that we are in.

Our achievements in 2023 underscore the importance of putting our customers first and nurturing exceptional teams. Alongside effective management and financial responsibility, these elements fuel ongoing growth, irrespective of economic conditions.

Of course, none of our results would be possible without the support of our customers, our partners, and the Drupal and Mautic communities. Last but not least, I'm grateful for the dedication and hard work of all Acquians who made 2023 another exceptional year.

January 03, 2024

A skilled craftsperson works on the intricate mechanism of a droplet-shaped vintage watch.

Since 1999, I've been consistently working on this website, making it one of my longest-standing projects. Even after all these years, the satisfaction of working on my website remains strong. Remarkable, indeed.

During rare moments of calm — be it a slow holiday afternoon, a long flight home, or the early morning stillness — I'm often drawn to tinkering with my website.

When working on my website, I often make small tweaks and improvements. Much like a watchmaker meticulously fine-tuning the gears of an antique clock, I pay close attention to details.

This holiday, I improved the lazy loading of images in my blog posts, leading to a perfect Lighthouse score. A perfect score isn't necessary, but it shows the effort and care I put into my website.

A screenshot of dri.es' Lighthouse scores showing 100% scores in performance, accessibility, best practices, and SEO. Screenshot of Lighthouse scores via https://pagespeed.web.dev/.

I also validated my RSS feeds, uncovering a few opportunities for improvement. Like a good Belgian school boy, I promptly implemented these improvements, added new PHPUnit tests and integrated these into my CI/CD pipeline. Some might consider this overkill for a personal site, but for me, it's about mastering the craft, adhering to high standards, and building something that is durable.

Last year, I added 135 new photos to my website, a way for me to document my adventures and family moments. As the year drew to a close, I made sure all new photos have descriptive alt-texts, ensuring they're accessible to all. Writing alt-texts can be tedious, yet it's these small but important details that give me satisfaction.

Just like the watchmaker working on an antique watch, it's not just about keeping time better; it's about cherishing the process and craft. There is something uniquely calming about slowly iterating on the details of a website. I call it The Watchmaker's Approach to Web Development, where the process holds as much value as the result.

I'm thankful for my website as it provides me a space where I can create, share, and unwind. Why share all this? Perhaps to encourage more people to dive into the world of website creation and maintenance.

January 02, 2024

December 28, 2023

Drupal's database abstraction layer is great for at least two reasons. First, it ensures compatibility with various database systems like MySQL, MariaDB, PostgreSQL and SQLite. Second, it improves security by preventing SQL injection attacks.

My main concern with using any database abstraction layer is the seemingly reduced control over the final SQL query. I like to see and understand the final query.

To inspect and debug database queries, I like to use MariaDB's General Query Log. When enabled, this feature captures every query directly in a text file.

I like this approach for a few reasons:

  • It directly captures the query in MariaDB instead of Drupal. This provides the most accurate view of the final SQL queries.
  • It enables real-time monitoring of all queries in a separate terminal window, removing the need to insert debug statements into code.
  • It records both successful and unsuccessful queries, making it a handy tool for debugging purposes.

If you're interested in trying it out, here is how I set up and use MariaDB's General Query Log.

First, log in as the MySQL super user. Since I'm using DDEV for my development, the command to do that looks like this:

$ ddev mysql -u root -proot

After logging in as the MySQL super user, you need to set a few global variables:

SET global general_log = 1;

SET global log_output = 'file';

SET global general_log_file = '/home/dries/queries.txt';

This will start logging all queries to a file named 'queries.txt' in my home directory.

DDEV employs containers, isolated environments designed for operating specific software such as databases. To view the queries, you must log into the database container.

$ ddev ssh -s db

Now, you can easily monitor the queries in real-time from a separate terminal window:

$ tail -f /home/dries/queries.txt
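
When you're done debugging, you can turn the general query log off again from the same MySQL session:

SET global general_log = 0;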

This is a simple yet effective way to monitor all database queries that Drupal generates.

December 27, 2023

I've now been writing software for high-end five-axis CNC machines at Heidenhain for about seven years.

We are working on the TNC 7 software, the successor to the TNC 640 software. Together with a team of developers based in Traunreut, I mainly develop the NC editor and everything that goes with it (all the integrations with the surrounding components of everything else involved – and that is a lot).

Looking at the TNC 640, this is software that will last forty years.

Looking at our own Belgian aerospace industry, but also, for example, the arms industry – really just about anything – this is software that is used pretty much everywhere. For pretty much everything. The jobs page of FN Herstal, for example, has for a few years shown people actually operating a CNC machine with a Heidenhain TNC 640. So soon our new TNC 7 will be used there! (to make, who knows, the machine guns for our soldiers, for example).

Anyway. Enough bragging about that!

So lately I've become interested in the world of metalworking. When I was a young guy of about 15 or 16, my grandfather was a lathe metalworker (at AGFA-Gevaert). That man was always proud to explain it all to me, constantly.

At school, during my years of electromechanics, I also got to (or had to) do a few tests on a simple manual lathe.

That was actually quite interesting. But computers were also very interesting back then!

Today it all comes together for me.

But I have to say. No, really. The real 'hackers' (modifying a system so that it does more than what it was designed for) are found far more in the machining world than in our software world. Although we do have quite a few walking around too.

You (or we) other software developers often have no idea how enormously vast that other world is. In wood as well as in metal. It's overflowing with hackers over there.

My point, especially to the youngsters: spread your interests wide. Go very far. It's all so incredibly interesting.

And above all: it's in the combination of software and something else. That can be security. But also machining. Or medical (imaging and such)? A combination of everything. I've talked to people who make tools for CNC machines that produce things for surgeons.

Just banging out Python code is not the essence of our passion. You write that Python code so that it works with or for something else. So you also need to know, or want to know, that something else.

An African-American scientist analyzes converting technologies in a futuristic, cryptopunk setting.

In recent years, new architectures like MACH and Jamstack have emerged in the world of Content Management Systems (CMS) and Digital Experience Platforms (DXP).

In some way, they have challenged traditional models. As a result, people sometimes ask for my take on MACH and Jamstack. I've mostly refrained from sharing my perspective to avoid controversy.

However, recognizing the value of diverse viewpoints, I've decided to share some of my thoughts. I hope it contributes positively to the ongoing evolution of these technologies.

The Jamstack is embracing its inner CMS

Jamstack, born in 2015, began as an approach for building static sites. The term Jamstack stands for JavaScript, APIs and Markup. Jamstack is an architectural approach that decouples the web experience layer from content and business logic. The web experience layer is often pre-rendered as static pages (markup), served via a Content Delivery Network (CDN).

By 2023, the Jamstack community underwent a considerable transformation. It evolved significantly from its roots, increasingly integrating traditional CMS features, such as "content previews". This shift is driven by the need to handle more complex websites and to make the Jamstack more user-friendly for marketers, moving beyond its developer-centric origins.

Netlify, a leader in the Jamstack community, has acknowledged this shift. As part of that, they are publicly moving away from the term "Jamstack" towards being a "composable web platform".

Many traditional CMSes, including Drupal, have long embraced the concept of "composability". For example, Drupal has been recognized as a composable CMS for two decades and offers advanced composable capabilities.

This expansion reflects a natural progression in technology, where software tends to grow more complex and multifaceted over time, often referred to as the law of increasing complexity. What starts simple becomes more complex over time.

I believe the Jamstack will continue adding traditional CMS features such as menu management, translation workflows, and marketing technology integrations until they more closely resemble traditional CMSes.

Wake-up call: CMSes are hybrid now

The Jamstack is starting to look more like traditional CMSes, not only in features, but also in architecture. Traditional CMSes can work in two main ways: "coupled" or "integrated", where they separate the presentation layer from the business logic using an approach called Model-View-Controller (MVC), or "decoupled", where the front-end and back-end are split using web service APIs, like the Jamstack does.

Modern CMSes integrate well with various JavaScript frameworks, have GraphQL backends, can render static pages, and much more.

This combination of both methods is known as "hybrid". Most traditional CMSes have adopted this hybrid approach. For example, Drupal has championed a headless approach for over a decade, predating both the terms "Jamstack" and "Headless".
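To make the headless half of that hybrid picture concrete, here is a minimal sketch (not from any project mentioned above; the site URL is a placeholder) that consumes content from a Drupal site through the core JSON:API module, the way a Jamstack-style front end would:

# Minimal sketch: read Drupal content headlessly through the core JSON:API module.
# The base URL is a placeholder; any Drupal site with JSON:API enabled exposes
# its entities under /jsonapi.
import requests

BASE_URL = "https://example.com"  # placeholder Drupal site

response = requests.get(
    f"{BASE_URL}/jsonapi/node/article",
    headers={"Accept": "application/vnd.api+json"},
    timeout=10,
)
response.raise_for_status()

for article in response.json()["data"]:
    print(article["attributes"]["title"])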

Asserting that traditional CMSes are "monolithic" and "outdated" is a narrow-minded view held by people who have overlooked their evolution. In reality, today's choice is between a purely Headless or a Hybrid CMS.

Redefining "Jamstack" beyond overstated claims

As the Jamstack becomes more versatile, the original "Jamstack" name and definition feel restrictive. The essence of Jamstack, once centered on creating faster, secure websites with a straightforward deployment process, is changing.

It's time to move beyond some of the original value proposition, not just because of its evolution, but also because many of the Jamstack's advantages have been overstated for years.

In short, Jamstack, initially known for its static site generation and simplicity, is growing into something more dynamic and complex. This is a positive evolution driven by the market's need. This evolution narrows the gap between Jamstack and traditional CMSes, though they still differ somewhat – Jamstack offers a pure headless approach, while traditional CMSes offer both headless and integrated options.

Last but not least, this shift is making Jamstack more similar to MACH, leading to my opinion on the latter.

The key difference between MACH and traditional CMSes

Many readers, especially people in the Drupal community, might be unaware of MACH. MACH is an acronym for Microservices, API-first, Cloud-native SaaS, Headless. It specifies an architectural approach where each business function is a distinct cloud service, often created and maintained by separate vendors and integrated by the end user.

Imagine creating an online store with MACH certified solutions: you'd use different cloud services for each aspect – one for handling payments, another for your product catalog, and so on. A key architectural concept of MACH is that each of these services can be developed, deployed, and managed independently.

At first glance, that doesn't look too different from Drupal or WordPress. Platforms like WordPress and Drupal are already integration-rich, with Drupal surpassing 50,000 different modules in 2023. In fact, some of these modules integrate Drupal with MACH services. For example, when building an online store in Drupal or WordPress, you'd install modules that connect with a payment service, SaaS-based product catalog, and so on.

At a glance, this modular approach seems similar to MACH. Yet, there is a distinct contrast: Drupal and WordPress extend their capabilities by adding modules to a "core platform", while MACH is a collection of independent services, operating without relying on an underlying core platform.

MACH could benefit from "monolithic" features

Many in the MACH community applaud the lack of a core platform, often criticizing it as "monolithic". Calling a CMS like Drupal "monolithic" is, in my opinion, a misleading and simplistic marketing strategy. From a technical perspective, Drupal's core platform is exceptionally modular, allowing all components to be swapped.

Rather than (mis)labeling traditional CMSes as "monolithic", a more accurate way to explain the difference between MACH and traditional CMSes is to focus on the presence or absence of a "core platform".

More importantly, what is often overlooked is the vital role a core platform plays in maintaining a consistent user experience, centralized account management, handling integration compatibility and upgrades, streamlining translation and content workflows across integrations, and more. The core platform essentially offers "shared services" aimed to help improve the end user and developer experience. Without a core platform, there can be increased development and maintenance costs, a fragmented user experience, and significant challenges in composability.

This aligns with a phenomenon that I call "MACH fatigue". Increasingly, organizations come to terms with the expenses of developing and maintaining a MACH-based system. Moreover, end-users often face a fragmented user experience. Instead of a seamlessly integrated interface, they often have to navigate multiple disconnected systems to complete their tasks.

To me, it seems that an ideal solution might be described as "loosely-coupled architectures with a highly integrated user experience". Such architectures allow the flexibility to mix and match the best systems (like your preferred CMS, CRM, and commerce platform). Meanwhile, a highly integrated user experience ensures that these combinations are seamless, not just for the end users but also for content creators and experience builders.

At present, traditional platforms like Drupal are closer to this ideal compared to MACH solutions. I anticipate that in the coming years, the MACH community will focus on improving the cost-effectiveness of MACH platforms. This will involve creating tools for managing dependencies and ensuring version compatibility, similar to the practices embraced by Drupal with Composer.

Efforts are already underway to streamline the MACH user experience with "Digital Experience Composition (DXC) tools". DXC acts as a layer over a MACH architecture, offering a user interface that breaks down various MACH elements into modular "Lego blocks". This allows both developers and marketers to assemble these blocks into a digital experience. Users familiar with traditional CMSes might find this a familiar concept, as many CMS platforms include DXC elements as integral features within their core platform.

Traditional CMSes like Drupal can also take cues from Jamstack and MACH, particularly in their approaches to web services. For instance, while Drupal effectively separates business logic from the presentation layer using the MVC pattern, it primarily relies on internal APIs. A shift towards more public web service APIs could enhance Drupal's flexibility and innovation potential.

Conclusions

In recent years, we've witnessed a variety of technical approaches in the CMS/DXP landscape, with MACH, Jamstack, decoupled, and headless architectures each carving out their paths.

Initially, these paths appeared to diverge. However, we're now seeing a trend of convergence, where different systems are learning from each other and integrating their unique strengths.

The Jamstack is evolving beyond its original focus on static site generation into a more dynamic and complex approach, narrowing the gap with traditional CMSes.

Similarly, to bridge the divide with traditional CMSes, MACH may need to broaden its scope to encompass shared services commonly found in the core platform of traditional CMSes. That would help with developer cost, composability and user-friendliness.

In the end, the success of any platform is judged by how effectively it delivers a good user experience and cost efficiency, regardless of its architecture. The focus needs to move away from architectural considerations to how these technologies can create more intuitive, powerful platforms for end users.

December 23, 2023

In the West we lack analysts who can shift their opinion.

A big problem is that anyone who says anything is immediately pinned down to it.

Such a person can never again move in another direction.

We have now arrived in a military conflict with Russia. Our analysts must once again be able to move with the wind, because it blows all the time.

New insights are actually more important than all our old nonsense.

We should not just cancel people. We must once again allow analysts to have made mistakes, to make mistakes, and therefore to want to adjust.

I also ask unambiguously that Germany stands up and brings its army fully to where it belongs: the greatest power of Europe. The World Wars are over, and what happened has happened. We need Germany to protect us. We must contribute to that.

Zeit bleibt nicht stehen.

December 22, 2023

The present time is, I think, best described by Rammstein's Zeit.

The war between Russia and Ukraine is, for the West, effectively completely lost. Russia will almost certainly take a large part of Ukraine.

Masses of men in Ukraine are dying right now. Roughly 20,000 people per month. Today that was about 650.

This is in line with what DPA, Weeb Union, History Legends and Military Summary are saying.

I concluded a while ago that Western propaganda (yes, sorry guys, it really is propaganda by now) is lying through its teeth, and that the sources above give a much better and clearer picture of reality.

That reality is bitterly hard: Russia is winning this. Not by a little, but seriously and truly.

An extreme number of people from Ukraine are dying. Every day.

There are also no gained territories. You can barely find them on a map if they exist at all, and that is if they have not already been retaken by Russia.

Zeit, bitte bleib stehen, bleib stehen. Zeit, das soll immer so weitergehen.

Rammstein, song Zeit

The reality is that from now on Russia will start taking territory. So time will not stay put, let alone stand still.

That is harsh, and these are harsh words. These are harsh times.

The West will lose this.

Whatever you may think of that.

I could have put MySQL in parentheses in the title, as this article is more about how to use OpenTofu to deploy on OCI.

I will explain how to install OpenTofu and how to use it to deploy on OCI. I will also mention the changes required to be able to use my previous Terraform deployment files.

As an example, let’s use the modules to deploy WordPress with MySQL HeatWave Database Service: oci-wordpress-mds.

Installing OpenTofu

If, like me, you are using an RPM-based Linux distro, you can find the necessary information to create the yum repository on OpenTofu's website:

$ sudo su -
# cat >/etc/yum.repos.d/opentofu.repo <<EOF
[opentofu]
name=opentofu
baseurl=https://packages.opentofu.org/opentofu/tofu/rpm_any/rpm_any/\$basearch
repo_gpgcheck=0
gpgcheck=1
enabled=1
gpgkey=https://get.opentofu.org/opentofu.gpg
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300

[opentofu-source]
name=opentofu-source
baseurl=https://packages.opentofu.org/opentofu/tofu/rpm_any/rpm_any/SRPMS
repo_gpgcheck=0
gpgcheck=1
enabled=1
gpgkey=https://get.opentofu.org/opentofu.gpg
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300
EOF

When the repo is created, you just need to use yum (or dnf) like this:

$ sudo dnf install -y tofu

You can verify that OpenTofu is installed correctly by running the following command that returns the installed version:

$ tofu version
OpenTofu v1.6.0-rc1
on linux_amd64

Terraform code

To test, we download the code from GitHub (v1.9.2):

$ git clone https://github.com/lefred/oci-wordpress-mds.git
cd oci-wordpress-mds

We first need to copy the file terraform.tfvars.template to terraform.tfvars and edit its content with our OCI information (tenancy, ocids, keys, …).

When ready, we can start with the initialization of the environment:

$ tofu init

Initializing the backend...
Initializing modules...
- mds-instance in modules/mds-instance
- wordpress in modules/wordpress

Initializing provider plugins...
- Finding latest version of hashicorp/template...
- Finding latest version of hashicorp/oci...
- Finding latest version of hashicorp/tls...
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed, key ID 0C0AF313E5FD9F80)
- Installing hashicorp/tls v4.0.5...
- Installed hashicorp/tls v4.0.5 (signed, key ID 0C0AF313E5FD9F80)

Providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://opentofu.org/docs/cli/plugins/signing/
╷
│ Error: Failed to query available provider packages
│ 
│ Could not retrieve the list of available versions for provider hashicorp/oci: provider registry
│ registry.opentofu.org does not have a provider named registry.opentofu.org/hashicorp/oci
│ 
│ All modules should specify their required_providers so that external consumers will get the correct
│ providers when using a module. To see which modules are currently depending on hashicorp/oci, run the
│ following command:
│     tofu providers
│ 
│ If you believe this provider is missing from the registry, please submit a issue on the OpenTofu
│ Registry https://github.com/opentofu/registry/issues/
╵

With Terraform, the same code will work but will return the following warning:

│ Warning: Additional provider information from registry
│ 
│ The remote registry returned warnings for registry.terraform.io/hashicorp/oci:
│ - For users on Terraform 0.13 or greater, this provider has moved to oracle/oci. Please update your
│ source in required_providers.
╵

Provider

Step 1 is to fix the provider and use Oracle's OCI provider. We edit provider.tf and add the following lines:

terraform {
  required_providers {
    oci = {
      source  = "oracle/oci"
    }
  }
}

We can run the init command again:

$ tofu init

Initializing the backend...
Initializing modules...

Initializing provider plugins...
- Finding latest version of hashicorp/template...
- Finding latest version of hashicorp/oci...
- Finding latest version of oracle/oci...
- Finding latest version of hashicorp/tls...
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed, key ID 0C0AF313E5FD9F80)
- Installing oracle/oci v5.23.0...
- Installed oracle/oci v5.23.0 (signed, key ID 1533A49284137CEB)
- Installing hashicorp/tls v4.0.5...
- Installed hashicorp/tls v4.0.5 (signed, key ID 0C0AF313E5FD9F80)

Providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://opentofu.org/docs/cli/plugins/signing/
╷
│ Error: Failed to query available provider packages
│ 
│ Could not retrieve the list of available versions for provider hashicorp/oci: provider registry
│ registry.opentofu.org does not have a provider named registry.opentofu.org/hashicorp/oci
│ 
│ Did you intend to use oracle/oci? If so, you must specify that source address in each module which
│ requires that provider. To see which modules are currently depending on hashicorp/oci, run the
│ following command:
│     tofu providers
│ 
│ If you believe this provider is missing from the registry, please submit a issue on the OpenTofu
│ Registry https://github.com/opentofu/registry/issues/
╵

We can see that the oracle/oci v5.23.0 provider plugin was installed, but it still fails. Let’s run the recommended tofu providers command:

$ tofu providers

Providers required by configuration:
.
├── provider[registry.opentofu.org/hashicorp/tls]
├── provider[registry.opentofu.org/hashicorp/template]
├── provider[registry.opentofu.org/oracle/oci]
├── module.wordpress
│   ├── provider[registry.opentofu.org/hashicorp/oci]
│   └── provider[registry.opentofu.org/hashicorp/template]
└── module.mds-instance
    └── provider[registry.opentofu.org/hashicorp/oci]

We can observe that the oracle/oci provider plugin is indeed used for the root (.), but we can also see that for the two modules (wordpress and mds-instance), hashicorp/oci is still used (and not found).

Let’s add a provider.tf file containing the following lines in both modules:

terraform {
  required_providers {
    oci = {
      source  = "oracle/oci"
    }
  }
}

If we check the providers again, we can see that they now all use the correct one:

$ tofu providers

Providers required by configuration:
.
├── provider[registry.opentofu.org/oracle/oci]
├── provider[registry.opentofu.org/hashicorp/tls]
├── provider[registry.opentofu.org/hashicorp/template]
├── module.mds-instance
│   └── provider[registry.opentofu.org/oracle/oci]
└── module.wordpress
    ├── provider[registry.opentofu.org/oracle/oci]
    └── provider[registry.opentofu.org/hashicorp/template]

We can run init again:

$ tofu init

Initializing the backend...
Initializing modules...

Initializing provider plugins...
- Finding latest version of oracle/oci...
- Finding latest version of hashicorp/tls...
- Finding latest version of hashicorp/template...
- Installing oracle/oci v5.23.0...
- Installed oracle/oci v5.23.0 (signed, key ID 1533A49284137CEB)
- Installing hashicorp/tls v4.0.5...
- Installed hashicorp/tls v4.0.5 (signed, key ID 0C0AF313E5FD9F80)
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed, key ID 0C0AF313E5FD9F80)

Providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://opentofu.org/docs/cli/plugins/signing/

OpenTofu has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that OpenTofu can guarantee to make the same selections by default when
you run "tofu init" in the future.

OpenTofu has been successfully initialized!

You may now begin working with OpenTofu. Try running "tofu plan" to see
any changes that are required for your infrastructure. All OpenTofu commands
should now work.

If you ever set or change modules or backend configuration for OpenTofu,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Perfect ! Note that with Terraform the warnings will also be gone.

OpenTofu Plan & Apply

We can now run the plan and, if there is no error, easily apply:

$ tofu plan
[...]
Plan: 14 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + mds_instance_ip       = (known after apply)
  + ssh_private_key       = "/home/fred/.ssh/id_rsa_oci"
  + wordpress_db_password = "MyWPpassw0rd!"
  + wordpress_db_user     = "wp"
  + wordpress_public_ip   = (known after apply)
  + wordpress_schema      = "wordpress"

$ tofu apply

And it’s deployed !

Conclusion

OpenTofu works as expected, but it requires the code and modules to follow the latest Terraform specifications.

Of course, in the future, the compatibility might change, but at present, deploying on OCI using OpenTofu instead of Terraform works perfectly.

Enjoy deploying in OCI !

December 19, 2023

We recently saw that we can benefit from using JSON documents with a MySQL HeatWave cluster in OCI (the HeatWave accelerator for MySQL Database Service).

However, sometimes the data can't be loaded into the HeatWave Cluster, or a query cannot use it.

And it is not always easy to understand why. Let's get familiar with how to find that information.

Data not loaded in HW Cluster

Let’s see an example with this table (collection):

SQL > show create table listingsAndReviews\G
*************************** 1. row ***************************
       Table: listingsAndReviews
Create Table: CREATE TABLE `listingsAndReviews` (
  `doc` json DEFAULT NULL,
  `_id` char(28) GENERATED ALWAYS AS
 (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
  `_json_schema` json GENERATED ALWAYS AS 
  (_utf8mb4'{"type":"object"}') VIRTUAL,
  PRIMARY KEY (`_id`),
  CONSTRAINT `$val_strict_B70EB65BDDBAB20B0EE5BB7532C9060C422A06F8` 
  CHECK (json_schema_valid(`_json_schema`,`doc`)) /*!80016 NOT ENFORCED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
 SECONDARY_ENGINE=rapid

Now, when we try to load the data into the HeatWave Cluster, we get the following output:

SQL > alter table listingsAndReviews secondary_load;
Query OK, 0 rows affected, 3 warnings (19.1727 sec)
Warning (code 3877): Guided Load executed 
'ALTER TABLE `docstore`.`listingsAndReviews` MODIFY `_json_schema` json 
GENERATED ALWAYS AS (_utf8mb4'{"type":"object"}') VIRTUAL 
COMMENT '' NOT SECONDARY'.
Warning (code 3877): Guided Load executed 'ALTER TABLE
`docstore`.`listingsAndReviews` MODIFY `doc` json 
DEFAULT NULL COMMENT '' NOT SECONDARY'.
Warning (code 3877): Guided Load requested 11 InnoDB parallel read 
threads to load docstore.listingsAndReviews (session variable value: 32).

We can verify which columns from that table are loaded in the HeatWave cluster:

SQL> select name, column_name, data_placement_type, nrows 
   from rpd_column_id ci 
   join rpd_tables t on t.id=ci.table_id 
     join rpd_table_id ri on ri.id=t.id 
     where table_name='listingsAndReviews';
+-----------------------------+-------------+---------------------+-------+
| name                        | column_name | data_placement_type | nrows |
+-----------------------------+-------------+---------------------+-------+
| docstore.listingsAndReviews | _id         | PrimaryKey          |  5555 |
+-----------------------------+-------------+---------------------+-------+
1 row in set (0.0010 sec)

We can see that the column doc, the JSON one, is not loaded. But why ?

If we use the HeatWave Cluster node estimation tool in the OCI Console, we also get a warning notifying us that only one column will be loaded:

But once again, we don't know the reason, and as we are using MySQL 8.2.0-u1, JSON is supported, so the column doc should also be loaded into the secondary engine.

When we try with the auto pilot load (call sys.heatwave_load(JSON_ARRAY('docstore'), NULL)) we don't get much more detail.

But in the error_log (available in performance_schema) we have the info we are looking for:

{"warn": "Column size is more than 65532 bytes", "table_name": "listingsAndReviews", "column_name": "doc", "schema_name": "docstore"}

We can see why the JSON column was not loaded: it exceeds the maximum allowed column size of 65532 bytes.

The information is also available in sys.heatwave_autopilot_report after using the auto pilot load procedure (call sys.heatwave_load()).

Query not off-loaded to secondary engine

Now let's see whether a query is off-loaded to the HeatWave cluster (secondary engine). First, we check how many queries have been offloaded so far:

SQL >  show status like 'rapid%query%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| rapid_query_offload_count | 59    |
+---------------------------+-------+

And then we run a query:

SQL > with cte1 as (
       select doc->>"$.name" as name,  doc->>"$.cuisine" as cuisine, 
       (
        select avg(score) 
         from json_table (
            doc, "$.grades[*]" columns (score int path "$.score")) as r
         ) as  avg_score 
        from restaurants) select *, rank() 
        over (  
          partition by cuisine order by avg_score desc) as `rank`  
     from cte1 order by `rank`, avg_score desc limit 5;
+-----------------------+--------------------------------+-----------+------+
| name                  | cuisine                        | avg_score | rank |
+-----------------------+--------------------------------+-----------+------+
| Juice It Health Bar   | Juice, Smoothies, Fruit Salads |   75.0000 |    1 |
| Golden Dragon Cuisine | Chinese                        |   73.0000 |    1 |
| Palombo Pastry Shop   | Bakery                         |   69.0000 |    1 |
| Go Go Curry           | Japanese                       |   65.0000 |    1 |
| Koyla                 | Middle Eastern                 |   61.0000 |    1 |
+-----------------------+--------------------------------+-----------+------+
5 rows in set (31.4319 sec)

31.4 seconds: this doesn't seem to be using the HeatWave cluster.

Let's check the number of offloaded queries again:

SQL > show status like 'rapid%query%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| rapid_query_offload_count | 59    |
+---------------------------+-------+

Same amount….

We can verify the Query Execution Plan using EXPLAIN:

EXPLAIN: -> Limit: 5 row(s)
    -> Sort row IDs: rank, cte1.avg_score DESC
        -> Table scan on <temporary>  (cost=2.5..2.5 rows=0)
            -> Temporary table  (cost=0..0 rows=0)
                -> Window aggregate: rank() OVER (PARTITION BY cte1.cuisine ORDER BY cte1.avg_score desc ) 
                    -> Sort row IDs: cte1.cuisine, cte1.avg_score DESC  (cost=5.08e+6..5.08e+6 rows=2.1e+6)
                        -> Table scan on cte1  (cost=438291..464507 rows=2.1e+6)
                            -> Materialize CTE cte1  (cost=438291..438291 rows=2.1e+6)
                                -> Table scan on restaurants  (cost=228577 rows=2.1e+6)
                                -> Select #3 (subquery in projection; dependent)
                                    -> Aggregate: avg(r.score)  (cost=0 rows=1)
                                        -> Materialize table function

There is indeed no sign of the secondary engine being used.

Usually, when using HeatWave cluster, we can enable the optimizer trace:

SQL > set  optimizer_trace="enabled=on";
SQL > explain format=tree with cte1 as .... <THE QUERY> ... desc limit 5\G
[the output of the QEP]

And then we verify the reason in information_schema.optimizer_trace:

SQL > select query, trace->'$**.Rapid_Offload_Fails' 'Offload Failed',
             trace->'$**.secondary_engine_not_used' 'HeatWave Not Used'
      from information_schema.optimizer_trace\G
*************************** 1. row ***************************
            query: explain format=tree with cte1 as (select doc->>"$.name" as name,
  doc->>"$.cuisine" as cuisine, (select avg(score) from json_table (doc,
 "$.grades[*]" columns (score int path "$.score")) as r) as  avg_score from
 restaurants) select *, rank() over (  partition by cuisine order by avg_score
 desc) as `rank`  from cte1 order by `rank`, avg_score desc limit 5
   Offload Failed: NULL
HeatWave Not Used: NULL

Most of the time, when a query is not offloaded to the secondary engine, we have the reason… but not this time, the value is NULL !

This could be related to the JSON datatype, for which support was added only very recently.

Let’s have a look in the error log then:

SQL> select * from (select * from performance_schema.error_log 
     where subsystem='RAPID' order by 1 desc limit 3) a order by 1\G
*************************** 1. row ***************************
    LOGGED: 2023-12-15 17:41:58.876588
 THREAD_ID: 3535
      PRIO: System
ERROR_CODE: MY-011071
 SUBSYSTEM: RAPID
      DATA: Attempting offload of query for HeatWave. qid=142, my_qid=5253596,
 DB = "docstore". Query = "explain format=tree with cte1 as 
(select doc->>"$.name" as name,  doc->>"$.cuisine" as cuisine,
(select avg(score) from json_table (doc, "$.grades[*]" columns 
(score int path "$.score")) as r) as  avg_score from restaurants) 
select *, rank() over (  partition by cuisine order by avg_score desc)
 as `rank`  from cte1 order by `rank`, avg_score desc limit 5"
*************************** 2. row ***************************
    LOGGED: 2023-12-15 17:41:58.876733
 THREAD_ID: 3535
      PRIO: System
ERROR_CODE: MY-011071
 SUBSYSTEM: RAPID
      DATA: Starting costing for HeatWave query qid=142
*************************** 3. row ***************************
    LOGGED: 2023-12-15 17:41:58.876763
 THREAD_ID: 3535
      PRIO: System
ERROR_CODE: MY-011071
 SUBSYSTEM: RAPID
      DATA: Heatwave chosen plan quality accepted: qid=142 : total SG pairs: 0, Plan Quality metric: 1.0 , Plan Quality threshold threshold: 90.0

We can see that MySQL attempted to offload the query…

So, how can we know the reason why the secondary engine is not used ?

There is an Optimizer Hint that will help us find the reason: /*+ set_var(use_secondary_engine=forced) */.

When used, this optimizer hint will try to offload the query to HeatWave cluster but won’t run it on InnoDB when it fails to run on the secondary engine. Let’s try:

SQL > with cte1 as (
       select /*+ set_var(use_secondary_engine=forced) */
       doc->>"$.name" as name,  doc->>"$.cuisine" as cuisine, 
       (
        select avg(score) 
         from json_table (
            doc, "$.grades[*]" columns (score int path "$.score")) as r
         ) as  avg_score 
        from restaurants) select *, rank() 
        over (  
          partition by cuisine order by avg_score desc) as `rank`  
     from cte1 order by `rank`, avg_score desc limit 5;
ERROR: 3889: Secondary engine operation failed. Reason: "SUBQUERY not yet
 supported", "Table ``.`/db/tmp/#sql36f2_dcf_135` is not loaded in HeatWave".

Here we have the reason why the query is not running in our HeatWave Cluster.

In fact, this unsupported subquery is related to the JSON_TABLE() function, which is not supported in the HeatWave cluster.

Conclusion

When encountering issues while using JSON documents with the MySQL HeatWave Database Service in OCI and the HeatWave cluster enabled, it is crucial to know how to obtain the information needed for a resolution. We have seen where to find the resources that can guide users in troubleshooting and fixing unexpected behavior.

Enjoy fast JSON in MySQL HeatWave on OCI !

December 17, 2023

Like many games of that time, Snow Bros. is a simple game that can be mastered with practice (or rather, coins). It's fun all the way, especially when playing with two players (in co-op). The gameplay is somewhat reminiscent of Bubble Bobble, but instead of bubbles you shoot snow to bury enemies in snowballs. Push the snowball to get rid of all the enemies in its path! The arcade game was ported to a lot of game consoles and gaming computers.

December 16, 2023

Since the release of MySQL 8.0, the MySQL X Dev API has provided users with the convenient ability to utilize MySQL without the need to write a single line of SQL!

The MySQL X Dev API brings support for CRUD operations on JSON documents stored in MySQL. MySQL Document Store is ACID compliant and is compatible with everything MySQL offers, like replication, InnoDB Cluster, …

The MySQL X Dev API is available using the MySQL X Protocol, listening by default on port 33060.

If you are interested in learning more about MySQL Document Store, please refer to these presentations [1], [2], [3].

OCI MySQL HeatWave Database Service

The MySQL DBaaS on OCI is the only one providing access to the X Protocol. This means you can create applications that don't use any SQL to add, delete, modify and retrieve JSON documents stored in the DB System.

Let’s add some documents to our DB System. We will use the restaurants collection used in various MongoDB examples.

We first connect to our DB System (MySQL HeatWave instance in OCI) using MySQL Shell and the X Protocol (default). You can use a VPN, a Compute Instance, a bastion host…

Next, we create a new schema and we import all the JSON documents using the importJson() utility:

JS > session.createSchema('docstore')
<Schema:docstore>
JS > \u docstore
Default schema `docstore` accessible through db.
JS > util.importJson('restaurants.json', {convertBsonOid: true})
Importing from file "restaurants.json" to collection 
`docstore`.`restaurants` in MySQL Server at 10.0.1.249:33060

.. 25359.. 25359
Processed 15.60 MB in 25359 documents in 0.9856 sec (25.36K documents/s)
Total successfully imported documents 25359 (25.36K documents/s)

Done ! It’s fast and easy.

CRUD

It’s time to test the import and read the data we have imported.

Let’s start by just displaying the first document and verify it’s indeed a valid JSON document:

JS > db.restaurants.find().limit(1)
{
    "_id": "000065796b9c0000000000000001",
    "name": "Howard-Reyes",
    "grades": [
        {
            "date": {
                "$date": "2023-03-12T00:00:00.000+0000"
            },
            "grade": "B",
            "score": 5
        }
    ],
    "address": {
        "coord": [
            8.7301765,
            52.705775
        ],
        "street": "Ward Branch",
        "zipcode": "22737",
        "building": "74523"
    },
    "borough": "West Brandimouth",
    "cuisine": "Hawaiian"
}
1 document in set (0.0016 sec)

Let’s display 10 restaurants and their type of cuisine that are in Brooklyn:

JS > db.restaurants.find("borough = 'Brooklyn'").fields("name", "cuisine").limit(10)
{
    "name": "Wendy'S",
    "cuisine": "Hamburgers"
}
{
    "name": "Riviera Caterer",
    "cuisine": "American"
}
{
    "name": "Wilken'S Fine Food",
    "cuisine": "Delicatessen"
}
{
    "name": "Regina Caterers",
    "cuisine": "American"
}
{
    "name": "Taste The Tropics Ice Cream",
    "cuisine": "Ice Cream, Gelato, Yogurt, Ices"
}
{
    "name": "C & C Catering Service",
    "cuisine": "American"
}
{
    "name": "May May Kitchen",
    "cuisine": "Chinese"
}
{
    "name": "Seuda Foods",
    "cuisine": "Jewish/Kosher"
}
{
    "name": "Carvel Ice Cream",
    "cuisine": "Ice Cream, Gelato, Yogurt, Ices"
}
{
    "name": "Nordic Delicacies",
    "cuisine": "Delicatessen"
}
10 documents in set (0.3392 sec)
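The same kind of query can be issued from application code with the X Dev API. Here is a minimal sketch using the mysqlx module from mysql-connector-python (connection parameters are placeholders), mirroring the MySQL Shell example above:

# Minimal sketch: read documents through the X Dev API (mysqlx), without any SQL.
# Connection parameters are placeholders; reuse the ones of your DB System.
import mysqlx

session = mysqlx.get_session(
    {"host": "db instance IP", "port": 33060, "user": "login", "password": "password"}
)
schema = session.get_schema("docstore")
restaurants = schema.get_collection("restaurants")

result = (
    restaurants.find("borough = :borough")
    .bind("borough", "Brooklyn")
    .fields("name", "cuisine")
    .limit(10)
    .execute()
)

for doc in result.fetch_all():
    print(doc["name"], "-", doc["cuisine"])

session.close()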

Adding Documents

We can also generate JSON documents without a single line of SQL using the X Dev API.

In this example, we use Python (mysql-connector-python3-8.2.0-1.el8.x86_64) and we add a document using the add() method of a collection (near the end of the script below):

#!/usr/bin/python3.8
import mysqlx
import sys
from random import choice, randrange
from faker import Faker


def connect():
    session = mysqlx.get_session(
        {
            "host": "db instance IP",
            "port": 33060,
            "user": "login",
            "password": "password",
            "ssl-mode": "REQUIRED",
        }
    )
    return session


def gen_cuisine():
    cuisine_list = [
        "Belgian",
        "Italian"
    ]
    return choice(cuisine_list)


session = connect()
db = session.get_schema("docstore")
col = db.get_collection("restaurants")
fake = Faker()

print("Generating new documents.", end="", flush=True)
total = 1000
if len(sys.argv) > 1:
    if sys.argv[1]:
        total = int(sys.argv[1])

for _ in range(total):
    doc = {}
    doc["name"] = fake.company()
    address = {}
    address["street"] = fake.street_name()
    address["building"] = fake.building_number()
    address["zipcode"] = fake.postcode()
    doc["borough"] = fake.city()
    doc["cuisine"] = gen_cuisine()
    coord = []
    coord.append(float(fake.latitude()))
    coord.append(float(fake.longitude()))
    address["coord"] = coord
    doc["address"] = address
    grades = []
    for _ in range(randrange(5)):
        grade = {}
        grade_date = {}
        date = fake.date_time_this_decade()
        grade_date["$date"] = date.strftime("%Y-%m-%dT00:00:00.000+0000")
        grade_note = choice(["A", "B", "C"])
        grade_score = randrange(20)
        grade["date"] = grade_date
        grade["grade"] = grade_note
        grade["score"] = grade_score
        grades.append(grade)
    doc["grades"] = grades
    col.add(doc).execute()
    if total > 100000 and not _ % 1000:
        print(".", end="", flush=True)
    else:
        print(".", end="", flush=True)

print("\nDone ! {} documents generated.".format(total))

You can find on GitHub all the required files: https://github.com/lefred/restaurants-mysql-ds

In case you need large JSON documents, you can increase the size of mysqlx_max_allowed_packet in your DB System's configuration:

This will allow you to store JSON documents of up to 1GB !

HeatWave Accelerator

Now that we have all our documents stored in a MySQL HeatWave instance on OCI, could we also benefit from a HeatWave Cluster to boost our queries ?

Since the release of 8.2.0-u1 in OCI, MySQL HeatWave also supports JSON datatype in the HeatWave Cluster. See Accelerating JSON Query Processing using MySQL HeatWave.

When using MySQL Document Store, to be able to load a collection into the HeatWave Cluster, we need to modify how the Primary Key is stored: by default a VARBINARY is used, which is not supported by the HeatWave secondary engine.

If you run the HeatWave cluster nodes estimation you will get the following message:

Error comment: UNABLE TO LOAD TABLE WHEN PRIMARY KEY COLUMN(S) CANNOT BE LOADED
Columns to be loaded:1 (100% VARLEN-encoded columns)
Estimated number of rows:24082

This is the statement to perform the Primary Key change:

SQL> alter table restaurants modify _id char(28) GENERATED ALWAYS AS
     (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL;

As soon as we have enabled the HeatWave Cluster, we can load our collection into it:

But before loading it, let's try a SQL query (yes, you can run SQL queries on the JSON documents inside a collection) so we have a baseline to compare against:

SQL > select doc->>"$.borough" borough, count(*) tot,
             max(length(doc->>"$.name")) longest_name, 
             min(length(doc->>"$.name")) shortest_name,
             round(avg(length(doc->>"$.name")),2) avg_name_length 
      from restaurants 
      where doc->>"$.cuisine" collate utf8mb4_0900_ai_ci like 'belgian' 
      group by borough order by tot desc limit 10;
+---------------+-----+--------------+---------------+-----------------+
| borough       | tot | longest_name | shortest_name | avg_name_length |
+---------------+-----+--------------+---------------+-----------------+
| Lake Michael  |  17 |           28 |             9 |           18.59 |
| Port Michael  |  11 |           29 |             8 |           19.91 |
| East Michael  |  11 |           25 |             8 |           13.64 |
| South Michael |  11 |           27 |            11 |           19.64 |
| South David   |  11 |           30 |             8 |           13.82 |
| Port Matthew  |  11 |           27 |            10 |           18.45 |
| Michaelmouth  |  10 |           25 |             9 |           15.50 |
| Port Jennifer |  10 |           29 |             7 |           18.50 |
| West Michael  |  10 |           26 |            10 |           21.10 |
| Lake James    |  10 |           27 |             9 |           15.70 |
+---------------+-----+--------------+---------------+-----------------+
10 rows in set (0.5191 sec)

This took a bit more than half a second (0.5191 sec) to parse 962,520 JSON documents.

Now we load our data into our HeatWave Cluster:

SQL > call sys.heatwave_load(JSON_ARRAY('docstore'), NULL);
+------------------------------------------+
| INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
+------------------------------------------+
| Version: 2.41                            |
|                                          |
| Load Mode: normal                        |
| Load Policy: disable_unsupported_columns |
| Output Mode: normal                      |
|                                          |
+------------------------------------------+
6 rows in set (19.47 sec)
...
+------------------------------------------+
| LOADING TABLE                            |
+------------------------------------------+
| TABLE (2 of 2): `docstore`.`restaurants` |
| Commands executed successfully: 2 of 2   |
| Warnings encountered: 0                  |
| Table loaded successfully!               |
|   Total columns loaded: 2                |
|   Table loaded using 32 thread(s)        |
|   Elapsed time: 19.47 s                  |
|                                          |
+------------------------------------------+
8 rows in set (19.47 sec)
+----------------------------------------------------------------+
| LOAD SUMMARY                                                   |
+----------------------------------------------------------------+
|                                                                |
| SCHEMA           TABLES       TABLES      COLUMNS         LOAD |
| NAME             LOADED       FAILED       LOADED     DURATION |
| ------           ------       ------      -------     -------- |
| `docstore`            1            0            2      19.47 s |
|                                                                |
+----------------------------------------------------------------+
6 rows in set (19.47 sec)

Let’s now run the query again:

SQL > select doc->>"$.borough" borough, count(*) tot,
             max(length(doc->>"$.name")) longest_name, 
             min(length(doc->>"$.name")) shortest_name,
             round(avg(length(doc->>"$.name")),2) avg_name_length 
      from restaurants 
      where doc->>"$.cuisine" collate utf8mb4_0900_ai_ci like 'belgian' 
      group by borough order by tot desc limit 10;
+---------------+-----+--------------+---------------+-----------------+
| borough       | tot | longest_name | shortest_name | avg_name_length |
+---------------+-----+--------------+---------------+-----------------+
| Lake Michael  |  17 |           28 |             9 |           18.59 |
| South David   |  11 |           30 |             8 |           13.82 |
| Port Michael  |  11 |           29 |             8 |           19.91 |
| South Michael |  11 |           27 |            11 |           19.64 |
| Port Matthew  |  11 |           27 |            10 |           18.45 |
| East Michael  |  11 |           25 |             8 |           13.64 |
| Port Jennifer |  10 |           29 |             7 |           18.50 |
| West Michael  |  10 |           26 |            10 |           21.10 |
| Lake James    |  10 |           27 |             9 |           15.70 |
| Michaelmouth  |  10 |           25 |             9 |           15.50 |
+---------------+-----+--------------+---------------+-----------------+
10 rows in set (0.1017 sec)

It's faster ! But it was already very fast. I even had to reduce the value of secondary_engine_cost_threshold because the query cost was just a little below the threshold and MySQL HeatWave was not offloading the query to the accelerator.

But with even more data you could really benefit from the acceleration.

For example, with 2,162,520 documents we go from 1.1713 seconds to 0.1922 seconds when using the HeatWave Cluster.

Limitations

All is well in the best of worlds! Almost! When it comes to using MySQL Document Store with MySQL HeatWave Database Service in OCI, there are no limitations. However, if your intention is to enable the HeatWave Cluster (Accelerator), certain restrictions should be taken into consideration.

The JSON document cannot be larger than 65532 bytes. As the default character set is utf8mb4 (up to 4 bytes per character), this corresponds to roughly 16K characters per JSON document.

Currently, the JSON_TABLE() function is not supported.

Conclusion

MySQL Document Store is highly valued by developers. It is available in MySQL Community Edition and, in the cloud, only in MySQL HeatWave when opting for a hosted and managed MySQL service.

With the latest update, it's also possible to benefit from the HeatWave Cluster Accelerator when using MySQL Document Store on OCI, which makes it a very good choice for those having to deal with large amounts of JSON documents, as in IoT systems for example.

Enjoy using MySQL without a single line of SQL… even in the cloud !

Every time Jonathan Holslag gets to invoice the VRT again because they trot him out on De Afspraak or Terzake, I wonder: with what actual expertise is this man speaking?

Almost every statement he makes is ideologically pro-NATO, pro-US, anti-China and (of course) anti-Russia. The absolute Atlanticist.

I can completely follow it when we condemn Frank Creyelman for aiding and abetting Chinese intelligence services.

But where it suddenly comes from that we must also link that fact to what Yves Leterme does escapes me completely.

If we must accuse Yves of anything, it is that he did not respect the separation of powers by influencing the Fortis judge. That, accordingly, had appropriate consequences.

But espionage for China?

I actually find it unworthy of the title of professor at the VUB to utter such false accusations, just to be able to invoice the VRT for some money.

Dear Mr. Holslag: do you actually have anything to say that is based on facts? Do you have evidence?

I expect that from someone who is allowed to call himself a professor. And I also expect a resignation when someone with such a title does not live up to that expectation.

In other words, dear professor Holslag: prove that Leterme is a Chinese spy, or resign. False accusations are, by the way, punishable under the heading of slander and defamation.

Do we really want professors who are guilty of criminal offences?

December 14, 2023

Time can be cruel for games. Some games don't age well. The advance in computational and graphical power is painfully clear. But time can also be a filter to separate great games from mediocre ones. If a game survives for decades, there is a big chance it's a great game. Games are released to run on very specific hardware and operating systems. Older games often do not run out of the box on modern computers.

December 13, 2023

Some time ago, we saw how we could deploy WordPress on OCI using MySQL HeatWave Database Service with Read Replicas. We had to modify WordPress to use a specific plugin that configures the Read/Write splitting at the application level (WordPress): LudicrousDB.

Today, we will not modify WordPress to split the Read and Write operations, but we will use MySQL Router 8.2.0 (see [1], [2], [3]).

Architecture

The architecture is simple, we will use one application server (Oracle Linux 8 on an Ampere Compute Instance) with Apache, PHP 8.2 and MySQL Router. The latest WordPress will be installed on it:

You can see that we will use different versions of MySQL in MySQL HeatWave Database Service: we can use the latest 8.0 as the source and an Innovation Release as a Read Replica.

WordPress

We deploy WordPress (6.4.2) on an Always Free Ampere Compute instance using 4 cores and 24GB of RAM:

As a reminder, this is how we install Apache, PHP 8.2 and WordPress on the compute instance:

ssh -i <path to the ssh key> opc@<public IP address>
sudo su -
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2022
yum -y install https://rpms.remirepo.net/enterprise/remi-release-8.rpm
rpm -ivh https://dev.mysql.com/get/mysql80-community-release-el8-9.noarch.rpm

yum install -y mysql-shell --enablerepo=mysql-tools-innovation-community
yum install -y mysql-router --enablerepo=mysql-tools-innovation-community

dnf -y module enable php:remi-8.2
dnf -y install httpd php php-cli php-mysqlnd php-zip php-gd \
 php-mcrypt php-mbstring php-xml php-json

cd /var/www/
wget https://wordpress.org/latest.tar.gz
tar zxvf latest.tar.gz
rm -rf html/ latest.tar.gz
mv wordpress html
chown apache. -R html

systemctl start httpd
systemctl enable httpd

firewall-cmd --zone=public --permanent --add-port=80/tcp
firewall-cmd --zone=public --permanent --add-port=443/tcp
firewall-cmd --reload

chcon --type httpd_sys_rw_content_t /var/www/html
chcon --type httpd_sys_rw_content_t /var/www/html/*
setsebool -P httpd_can_network_connect_db 1
setsebool -P httpd_can_network_connect=1

MySQL HeatWave Database Service

Now we need to deploy a MySQL HeatWave instance in OCI. To use Read Replicas, the current requirement is to use at least 4 OCPUs:

The IP of our new instance is available in the DB system’s connections tab:

Read Replicas

We can create one or more Read Replicas, and we can specify the required version; there is no need to use the same version as the Primary DB System (source):

We need to keep the following information (IP of the Primary DB System and the Load Balancer one) for later:

WordPress Database and User

It’s time to create the WordPress database and its dedicated user in MySQL. We first connect to our new DB System using MySQL Shell from the Compute Instance (the WordPress instance):

mysqlsh --sql admin@10.0.1.191

And we can create the database and the dedicated user:

create database wordpress;
create user wp identified by 'W0rdPress4#';
grant all privileges on wordpress.* to wp;

MySQL Router

Transparent Read/Write splitting in MySQL Router is compatible with all integrated solutions for replication, high availability (HA), and disaster recovery (DR). These solutions are MySQL InnoDB ReplicaSet, MySQL InnoDB Cluster, and MySQL InnoDB ClusterSet.

To enable transparent Read/Write splitting, it is necessary to manually create certain tables and populate them with the IP address of our MySQL HeatWave Instance and the IP address of the Read Replica Load Balancer.

Download the following file, unzip it and load it in the DB System (via the compute instance):

[root@wordpress opc]# wget https://lefred.be/wp-content/uploads/2023/12/router_metadata.zip

[root@wordpress opc]# unzip router_metadata.zip 
Archive:  router_metadata.zip
  inflating: router_metadata.sql
     
[root@wordpress opc]# mysqlsh --sql admin@10.0.1.191 < router_metadata.sql 

Records: 3  Duplicates: 0  Warnings: 0
Records: 3  Duplicates: 0  Warnings: 0
Records: 2  Duplicates: 0  Warnings: 0

And we change the instances data to match what we have in OCI:

mysqlsh --sql admin@10.0.1.191

use mysql_innodb_cluster_metadata

update instances set address="10.0.1.191:3306", 
       mysql_server_uuid=@@server_uuid, instance_name=@@hostname
  where instance_id=1;
update instances set addresses = '{"mysqlX": "10.0.1.191:33060", 
       "mysqlClassic": "10.0.1.191:3306"}' 
  where instance_id=1;

update instances set address="10.0.1.16:3306", 
       mysql_server_uuid="86b0b07b-98d2-11ee-9c5a-020017241124",
       instance_name="read_replica_load_balancer" 
  where instance_id=2;
update instances set addresses = '{"mysqlX": "10.0.1.16:33060", 
       "mysqlClassic": "10.0.1.16:3306"}' 
  where instance_id=2;

I used the uuid of one of the replicas for the load balancer entry.

Bootstrap

We can now bootstrap MySQL Router on the WordPress application server (the compute instance):

mysqlrouter --bootstrap admin@10.0.1.191 \
--conf-set-option=routing:bootstrap_rw_split.bind_port=3306 \
--user=mysqlrouter

You need to bootstrap using the IP of the Primary DB instance.

We can verify in the newly generated configuration (/etc/mysqlrouter/mysqlrouter.conf) that the default R/W splitting port 6450 has been replaced by 3306:

We can start MySQL Router:

systemctl start mysqlrouter.service

Finishing WordPress Installation

Everything is now prepared to complete the WordPress installation process.

We enter the public IP of the compute instance in a browser and we follow the WordPress installation wizard:

Testing

To test, we will add a PHP code snippet to a new WordPress post that displays some information related to the database it's connected to:

And now we load the article several times:

We can observe that the PHP snippet is executed sometimes on one Read Replica and sometimes on the other.

We can see that we can create posts, meaning writes are indeed redirected as expected to the Primary DB System.

We can also test this by putting the query inside a transaction, which has the same effect as a write:

Perfect, it's evident that we are reaching the MySQL HeatWave instance that is the source of the Read Replicas !
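For reference, the same behavior can be reproduced outside WordPress. The sketch below (Python with mysql-connector-python; the router host is assumed to be the local WordPress server, and the wp credentials are the ones created earlier) connects through MySQL Router on port 3306 and shows which backend answers a plain read versus a read inside a transaction:

# Minimal sketch: observe MySQL Router transparent R/W splitting.
# The router host is an assumption (local connection on the WordPress server);
# the credentials are the ones created for WordPress earlier.
import mysql.connector

cnx = mysql.connector.connect(
    host="127.0.0.1", port=3306, user="wp", password="W0rdPress4#", database="wordpress"
)
cnx.autocommit = True
cur = cnx.cursor()

# A plain autocommit SELECT should be routed to one of the Read Replicas.
cur.execute("SELECT @@hostname, @@server_uuid")
print("read outside a transaction:", cur.fetchone())

# The same SELECT inside a transaction should be routed to the Primary DB System.
cnx.start_transaction()
cur.execute("SELECT @@hostname, @@server_uuid")
print("read inside a transaction :", cur.fetchone())
cnx.commit()

cur.close()
cnx.close()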

Conclusion

We have observed that the MySQL HeatWave Database Service Read Replicas capability can be advantageous for WordPress without necessitating any modifications or configuration changes within WordPress itself. With the assistance of MySQL Router 8.2.0, which enables Transparent Read/Write Splitting, we can automatically direct read and write queries to the appropriate instance. This eliminates the need for plugins or altering the WordPress configuration on disk.

Enjoy deploying WordPress in OCI, MySQL HeatWave, Read Replicas and MySQL Router !

December 11, 2023

Firewatch is a single person adventure that I enjoyed from the beginning to (almost the) end. Certainly recommended. My only point of criticism is the rather abrupt ending. The game is short (around 5 hours), but the story is by no means rushed. Your life, as the protagonist, is in shambles. What better way to get your thoughts straight than wandering all day in nature while spotting wildfires? As a firewatcher in a very remote area, your only lifeline to the outside world is a warm and friendly voice through the walkie-talkie.

December 10, 2023

A Short Hike is a small indie game. And by small, well, I mean small: you'll finish the story in around an hour and a half. You can spend a few additional hours to 100% the game, but there isn't much incentive to do so once you finish it. That said, the game is a great example of a cozy game. The artwork is beautiful, the mechanics are good and the quests are easy.

December 09, 2023

Is a game released in 2017 still retrogaming? It is if you played it on a Wii-U, Nintendo's abandoned last-generation console. Yes, it was also released on the Nintendo Switch, as exactly the same game with a very small bump in resolution (from 720p to 900p). Originally it was planned as a Wii-U exclusive, but it took Nintendo a few extra years to get it right. And they did.

December 08, 2023

The ability to store data on Object Storage and retrieve it dynamically when necessary is a notable advantage of Lakehouse when managing MySQL historical data we would like to archive.

Let’s illustrate this with the following table:

CREATE TABLE `temperature_history` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `time_stamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `device_id` varchar(30) DEFAULT NULL,
  `value` decimal(5,2) NOT NULL DEFAULT '0.00',
  `day_date` date GENERATED ALWAYS AS (cast(`time_stamp` as date)) STORED NOT NULL,
  PRIMARY KEY (`id`,`day_date`),
  KEY `device_id_idx` (`device_id`)
) ENGINE=InnoDB AUTO_INCREMENT=129428417 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci SECONDARY_ENGINE=RAPID
/*!50500 PARTITION BY RANGE  COLUMNS(day_date)
(PARTITION p0_before2023_11 VALUES LESS THAN ('2023-11-01') ENGINE = InnoDB,
 PARTITION p2023_12 VALUES LESS THAN ('2023-12-01') ENGINE = InnoDB,
 PARTITION p2024_01 VALUES LESS THAN ('2024-01-01') ENGINE = InnoDB,
 PARTITION p2024_02 VALUES LESS THAN ('2024-02-01') ENGINE = InnoDB,
 PARTITION p2024_03 VALUES LESS THAN ('2024-03-01') ENGINE = InnoDB,
 PARTITION p2024_04 VALUES LESS THAN ('2024-04-01') ENGINE = InnoDB,
 PARTITION p9_future VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */

You can notice that the table is also loaded in the MySQL HeatWave Cluster (see this previous post).

This table is full of records that were generated by IoT devices:

select count(*) from temperature_history;
+-----------+
| count(*)  |
+-----------+
| 129428416 |
+-----------+
1 row in set (0.0401 sec)

Mind the response time 😉

You can also notice that we have partitions. The plan to save disk space and archive the data on cheap storage is the following:

  1. once a partition is not required anymore, we dump its content to an Object Storage bucket.
  2. we drop the partition
  3. if it's the first time, we create a second archive table from the HeatWave load report
  4. if needed we load/unload the data on demand
  5. we can create a new future partition (optional)

Dumping a Partition to Object Storage

Now the partition with data before December can be archived. Let’s see how much data this represents:

select count(*) from temperature_history partition(p2023_12);
+----------+
| count(*) |
+----------+
|  1894194 |
+----------+
1 row in set (0.0373 sec) 

Object Storage Bucket & PAR

Now we need to create a bucket where we will archive the data for the temperature_history table:

We will use a Pre-Authenticated Request (PAR) to write and read data in Object Storage:

It's important to allow reads and writes and the listing of the objects:
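If you prefer the command line over the console, the bucket-level PAR can also be created with the OCI CLI. A hedged sketch follows: the bucket name is the one used in this post, the expiry date is a placeholder, and the exact flag names should be checked against your CLI version.

# Hypothetical CLI equivalent of the console steps; verify the flags
# with `oci os preauth-request create --help`
oci os preauth-request create \
    --bucket-name temperature_history_archive \
    --name temperature_history_archive_par \
    --access-type AnyObjectReadWrite \
    --bucket-listing-action ListObjects \
    --time-expires "2024-12-31T00:00:00+00:00"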

And this is the PAR’s url we will use:

Data Transfer

We use MySQL Shell dumpTables() utility to copy the data from the partition to Object Storage using the PAR url:

util.dumpTables("piday", ["temperature_history"],
     "https://<namespace>.objectstorage.<region>.oci.customer-oci.com/p/<random>/n/<namespace>/b/temperature_history_archive/o/",
     {"dialect": "csv", "compression": "none", "partitions":
                 {"piday.temperature_history": ["p2023_12"]}
     })

It's very important to specify that the files must not be compressed, as they are by default.

From OCI Console, we can see all the generated files and we need to keep only the csv ones:

If you have a very large table with a lot of files (chunks), deleting all the .idx files by hand is a long process; in that case you can use a tool like Fileon – S3 Browser:
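Alternatively (not in the original post, just a hedged suggestion), the OCI CLI can bulk-delete the unwanted files; double-check the flags with `oci os object bulk-delete --help` and keep --dry-run until the selection looks right:

# Preview which .idx files would be removed (drop --dry-run to actually delete)
oci os object bulk-delete \
    --bucket-name temperature_history_archive \
    --include "*.idx" \
    --dry-run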

Partition Management

Now that the data is stored in Object Storage, we can delete the partition:

alter table temperature_history drop partition p2023_12;

As we are working on the partitions, we can already add an extra one (optional step 5 above) using the following syntax:

alter table temperature_history reorganize partition p9_future 
      into (
            partition p2024_05 values less than ('2024-05-01'),
            partition p9_future values less than (maxvalue)
      );

Archive Table Creation

The first time, we need to create the archive table into which we will load the Object Storage data used by Lakehouse.

Lakehouse

We need to have a HeatWave Cluster with Lakehouse enabled:

We need to prepare the system with the table and partition we want to load. For this operation, we need to set 3 variables:

  • db_list: list of the databases we will load
  • dl_tables: list of the tables we will load and the information related to the format and the Object Storage location
  • options: preparation of the arguments for the heatwave_load procedure. We also parse and include the dl_tables variable

db_list

We start by defining the db_list. In our case it's easy as we only use one database: piday

SET @db_list = '["piday"]';

dl_tables

We need to provide information related to the table we want to create and specify where and how the table is stored:

SET @dl_tables='[{"db_name": "piday","tables": 
    [
     {"table_name": "temperature_history_archive",
      "dialect": 
        {"format": "csv", 
          "field_delimiter": "\,", 
          "record_delimiter": "\\n"
        },
      "file": [{"par": "https://...<the_par_url>..."}]
     }
    ]
}]';

options

We can now generate the options variable that we will use as argument for our procedure:

SET @options = JSON_OBJECT('mode', 'dryrun', 'policy',
 'disable_unsupported_columns', 
 'external_tables', CAST(@dl_tables AS JSON));

Auto Parallel Load

Lakehouse has the capability to create the table for us and load the data into it. But as we want to explicitly use some specific column names instead of using generic ones, we will use the report to create the table and load the data manually in two different steps. This is why we specified dryrun as mode in the @options definition:

call sys.heatwave_load(@db_list, @options);

We can now retrieve the table’s creation statement and manually modify the name of the columns while creating the table:

 SELECT log->>"$.sql" AS "Load Script"
 FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id\G

Let's replace all the col_X columns with the field names we want:

CREATE TABLE `piday`.`temperature_history_archive`( 
   `id` int unsigned NOT NULL, `time_stamp` timestamp(0) NOT NULL,
   `device_id` varchar(28) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',
   `value` decimal(4,2) NOT NULL
) ENGINE=lakehouse SECONDARY_ENGINE=RAPID 
  ENGINE_ATTRIBUTE='{"file": [{"par": "https://...<PAR>..."}], 
"dialect": {"format": "csv", "field_delimiter": ",", "record_delimiter": "\\n"}}';

Once created, we can load the data to the secondary engine:

ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ 
   `piday`.`temperature_history_archive` SECONDARY_LOAD;

We can verify that the data was loaded correctly:

select count(*) from temperature_history_archive;
+----------+
| count(*) |
+----------+
|  1894194 |
+----------+
1 row in set (0.0299 sec)

And later?

Now let’s move forward in time and let’s assume we can archive the data present in the partition p2024_01:

select count(*) from temperature_history partition(p2024_01);
+----------+
| count(*) |
+----------+
| 50034435 |
+----------+
1 row in set (0.1562 sec)

We need to dump the data into our Object Storage bucket; unfortunately we have to use a different folder, as the dump needs an empty destination. We will use a temporary folder in our bucket:

util.dumpTables("piday",["temperature_history"],"https://<PAR URL>/tmp/",
 {"dialect": "csv","compression":"none",
  "partitions": {"piday.temperature_history": ["p2024_01"]}})
Acquiring global read lock
Global read lock acquired
Initializing - done 
1 tables and 0 views will be dumped.
Gathering information - done 
All transactions have been started
Locking instance for backup
Global read lock has been released
Writing global DDL files
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done       
Writing DDL - done       
Writing table metadata - done       
Starting data dump
40% (50.03M rows / ~124.50M rows), 1.63M rows/s, 104.01 MB/s                 
Dump duration: 00:00:24s
Total duration: 00:00:24s
Schemas dumped: 1
Tables dumped: 1
Data size: 3.44 GB
Rows written: 50034435
Bytes written: 3.44 GB
Average throughput: 141.20 MB/s

This produces a lot of files:

As we only need the csv ones, I will use a fuse module to mount the Object Storage Bucket on my system and use the usual commands to move and delete files (see this post on how to set up s3fs-fuse).

[fred@dell ~] $ mkdir mnt
[fred@dell ~] $ s3fs temperature_history_archive ~/mnt/ -o endpoint=us-ashburn-1 \
-o passwd_file=~/.passwd-ocifs \
-o url=https://xxx.compat.objectstorage.us-ashburn-1.oraclecloud.com/ \
-onomultipart -o use_path_request_style
[fred@dell ~] $ ls mnt
piday@temperature_history@p2023_12@0.csv   tmp
piday@temperature_history@p2023_12@@1.csv
[fred@dell ~/mnt] $ mv tmp/*.csv .
[fred@dell ~/mnt] $ rm -rf tmp

We can now unload and load the data back in Lakehouse:

ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ 
   `piday`.`temperature_history_archive` SECONDARY_UNLOAD;

ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ 
   `piday`.`temperature_history_archive` SECONDARY_LOAD;

select count(*) from temperature_history_archive;
+----------+
| count(*) |
+----------+
| 51928629 |
+----------+
1 row in set (0.0244 sec)

We can safely remove the partition from the production table:

alter table temperature_history drop partition p2024_01;

If we don’t need the archive data, we can simply unload it again (and load it back later):

ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ 
   `piday`.`temperature_history_archive` SECONDARY_UNLOAD;

select count(*) from temperature_history_archive;
ERROR: 3877: Secondary engine operation failed. 
Reason: "Table `piday`.`temperature_history_archive` is not loaded in HeatWave"

Conclusion

In this article, we explored how HeatWave Lakehouse can be used to archive MySQL data to Object Storage and reload it when needed. It is worth noting that the entire archived dataset of 51 million records was loaded from Object Storage in an impressive 26.58 seconds on my MySQL HeatWave OCI instance.

This can help save disk space on your MySQL HeatWave instance and increase performance by cleaning up large tables.

Being able to skip the creation of the json and idx files, and the possibility to dump data to a non-empty destination, would be two very nice features for the MySQL Shell dump utility.

Enjoy archiving your data in MySQL, HeatWave and Lakehouse!

December 07, 2023

I finally set up some Git pre-commit hooks on my local Drupal development environment.

Pre-commit hooks in Git can streamline your workflow by automating checks and tasks for every commit. The official documentation can be found at https://git-scm.com/docs/githooks.

If you, like me, often forget to run your code linter and static code analyzer, pre-commit hooks offer a great solution. They automate these tasks with every commit.

To set up a pre-commit hook, edit the file .git/hooks/pre-commit in your Drupal directory. Git specifically looks for a file named pre-commit in the .git/hooks directory so the name has to be exact.

Here is my current script:

#!/bin/sh

# Execute PHP Code Beautifier and Fixer
./vendor/bin/phpcbf
if [ $? -ne 0 ]; then
 echo "Failed command: ./vendor/bin/phpcbf"
 exit 1
fi

# Execute PHPStan for static code analysis
./vendor/bin/phpstan analyse --memory-limit=256M
if [ $? -ne 0 ]; then
 echo "Failed command: ./vendor/bin/phpstan analyse --memory-limit=256M"
 exit 1
fi

The script runs phpcbf (PHP Code Beautifier and Fixer) for code formatting and phpstan (a static analysis tool for PHP) for identifying bugs. Should these tools find any issues, the commit is blocked.
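One detail worth adding: Git only runs the hook if the file is executable, otherwise it is silently ignored. And when you really need to commit without running the checks, Git lets you skip hooks explicitly:

# Make the hook executable so Git will run it
chmod +x .git/hooks/pre-commit

# Bypass the hook for a single commit when really needed
git commit --no-verify -m "WIP: skip pre-commit checks"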

Installing phpcbf and phpstan

Drupal does not ship with phpcbf or phpstan out-of-the-box, so these tools need to be installed. They can be installed individually using Composer:

$ composer require --dev squizlabs/php_codesniffer phpstan/phpstan

However, my recommended way to install these tools for Drupal is by installing the drupal/core-dev package. This package bundles a number of Drupal development tools and best practices, including phpcbf and phpstan.

$ composer require drupal/core-dev --dev --update-with-all-dependencies

December 02, 2023

Misfortune Ramirez Hernandez, a cute 8-year-old girl, loves to tell the world she is "a little Miss". Little Misfortune is more than a wordplay on the story of the child. It's a lovely, funny and sad little adventure game, all at the same time. It's about a child's sense of wonder and about the loss of innocence. Although short (I finished it in around 3 hours), it's rewarding. No need to make it longer just for the sake of it.
A fun, kind of realistic tennis game. Best for playing in Quick Tournament mode (the history mode is silly). Not easy, but certainly worth it. Handhelds like the rp3+ can upscale it to about 3x the original resolution. My old 2016-Celeron NUC with Batocera can upscale it to 1080p without any issues. Have a look at “How to play retrogames?” if you don’t know how to play retrogames.
This is a beautiful game with interesting puzzles and… a story. Another perfect example of SNES-level graphics on a small handheld. The game takes a little investment in the beginning because it starts slowly, certainly if you're not familiar with the action-adventure genre. The quests will become clearer and the speed will pick up. You will be stuck more than once (actually, lots of times). You'll want to finish it in one go, but this open world game will keep you occupied for way, way longer than you expected.
This game is probably my favorite in the Metroid Series. It’s a perfect example of SNES-level graphics on a small handheld. This is the most story-driven installment of all Metroid 2D games. Be ready to be lost for hours and experience a huge adrenaline boost once you finally figure out what you need to do or when you finish that impossible boss. Oh, and in this Metroid there are situations where there is no way you can win, and you have to run.
The 2D Metroid series offers a perfect mix of platformer action, adventure and intelligent puzzles. While preceded by Metroid (1986) on the NES and Metroid II: Return of Samus (1991) on the Game Boy, this is the game that gave the series its distinct look and identity. While Metroidvania is a thing and has inspired many great games for decades, it is Metroid, not Castlevania, that immerses the player in an eerie, dreadful, immense and most of all lonely universe.
This game returns to the top-down perspective of the first Zelda while making the game graphically stunning (the NES graphics age less well compared to the SNES/GBA sequels). Even today, more than 30 years later, it's pretty much the pattern to follow for pixel-art games. This game is long and hard, but at the same time extremely rewarding. It's a must-play classic. Have a look at "How to play retrogames?" if you don't know how to play retrogames.
Dr. Mario is often bundled with its more famous predecessor Tetris. The puzzle game is certainly a worthy successor. While there are versions of this game on more recent consoles, the straightforwardness of the Game Boy is something the more recent ports lack. The game is fast, the music fantastic. Simple and a lot of addictive fun. Have a look at “How to play retrogames?” if you don’t know how to play retrogames.
Tetris was bundled with the European and North-American release of the new Game Boy handheld. It is still, to this day, in the top 10 of best-selling Nintendo games of all time. There are many older versions of Tetris, including the original release on the Soviet Electronika 60 and later versions on Western and Eastern arcade machines around the world (most of the time, arcade releases were far superior to home console ports).
This game is the first Nintendo game for a handheld console. No pressure. It’s probably the most quirky Mario game, but in a good way. Mostly line art and 2 Gradius-like flying levels. It’s a short game and rather difficult at the end, but a lot of fun. A worthy adaptation of the Super Mario Bros. formula for the weaker handheld. Have a look at “How to play retrogames?” if you don’t know how to play retrogames.
This is the game that started the almost 40-year-old franchise that expanded into animated TV series and movies. It's fun, it's challenging and the soundtrack is recognizable everywhere. The original version is still the best version. Later versions on other consoles often crop the image in order to adapt it to different screen geometries, e.g. the NES Classics release on the Game Boy Advance. Still the king! Have a look at "How to play retrogames?" if you don't know how to play retrogames.

November 30, 2023

What a disaster this is, the DIY framework laptop.

The box...

...contains more boxes.


The parts are few...


...so the laptop is assembled in less than a minute?!

Unpacking the thing took longer than assembling it!

Reading this webpage took longer than assembling it!

Installing the DDR5 memory took 1 second: click and done.
Installing the nvme took 3 seconds: unscrew one screw, click and screw.
Installing the bezel was far more challenging, took almost 30 seconds! (*)
Attaching the keyboard: about three seconds?
Installing the four modules: 1 second per module (or less?)

I'm not that good at irony, but I was hoping for at least 20 minutes of fun assembling this thing... and only got one minute. :-/

(*) This brought back fond memories of assembling tower PCs in the nineties; ISA/PCI slots click, CD-ROM ATAPI click, hard disk and IDE jumpers easy. But closing the tower case... well, that was the hard part!

Installing an OS

The guide says to be patient when booting (powering on) the laptop for the first time; it may take several minutes.

I timed it, it took 57 seconds to get to this screen:

And now there is this; it has been a long time since I saw a default XFCE, Firefox and terminal.


Let the fun begin!

November 26, 2023

In my previous blog post, we installed GitLab-CE and did some post configuration. In this blog post, we’ll continue to create user accounts and set up SSH to the git repository.

In the next blog posts we will add code to GitLab and set up GitLab runners on different operating systems.

Users

Update root password

An initial root user with a password was created during the installation. The initial root password is saved - or was saved, as GitLab deletes the initial password after 24 hours - to /etc/gitlab/initial_root_password.

To update the root password, log in to GitLab with your web browser, click on your "avatar" icon and click on [ Edit profile ].

Select [ Password ] to update the root password.

You need to re-login after the password is updated.

Creating users

In this section we'll set up an additional administrator user; I usually try to have a backup admin account. It isn't recommended to use an admin account directly, so we'll also set up a regular user that we can use for our daily development tasks.

The full documentation about GitLab users is available at: https://docs.gitlab.com/ee/user/profile/account/create_accounts.html#create-users-in-admin-area

Admin user

In the admin area select users on the left-hand side.

In the New user window, fill in the Account Name / Username and Email Address. Please note that you can't use admin as this is a reserved user name in GitLab. At the Access level select Administrator.

If everything goes well, you or the new user will receive an email to reset their password.

Regular user

Repeat the same steps to create a normal user and keep the Access level to “Regular”.

GitLab SSH access

To access your git repositories using ssh you’ll need to upload your ssh public key. I use hardware tokens ( smartcard-hsm ) to protect my private keys.

You can find more information on how to use SmartCard-HSM with ssh in a previous blog post: https://stafwag.github.io/blog/blog/2015/12/05/protecting-your-ssh-keys-with-smartcard-hsm/

You can find information on how to generate a ssh key pair for GitLab at: https://docs.gitlab.com/ee/user/ssh.html

If you don’t mind having your keypair on your filesystem you can use the steps below ( but I advise you to look at the better options ).

Generate a ssh key pair

Execute ssh-keygen to generate a ssh key-pair.

I recommend using a passphrase so the private key is encrypted; this prevents an attacker from copying it as plain text from the filesystem. To avoid having to type in the passphrase during development, you can use a ssh-agent so you don't need to enter your password each time you push a commit to your git repository (see below).

The unencrypted key is still stored in memory if you use an ssh-agent unless you store your private key on a hardware token (HSM).

Please note that we use a non-default ssh key location in the command below (~/.ssh/testgitlab.key); we will update the ssh client config later to use this private key.

[staf@vicky ~]$ ssh-keygen -f ~/.ssh/testgitlab.key -t ed25519 -C "test gitlab key"
Generating public/private ed25519 key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/staf/.ssh/testgitlab.key
Your public key has been saved in /home/staf/.ssh/testgitlab.key.pub
The key fingerprint is:
SHA256:m4TpKmOjvwkoY2H3arG3tTLTOgYeA03BJoACD86Gkr0 test gitlab key
The key's randomart image is:
+--[ED25519 256]--+
|B ...            |
|** +             |
|=+B              |
|o. o   o         |
| oE.  o S        |
|o o=.. . o       |
|=.. *.o.o        |
|oo==.O...        |
|.+=*+oB.         |
+----[SHA256]-----+
[staf@vicky ~]$ 

Upload the public key to GitLab

Edit your profile.

Go to SSH Keys and select [ Add new key ].

[staf@vicky ~]$ cat ~/.ssh/testgitlab.key.pub 
ssh-ed25519 <snip> test gitlab key
[staf@vicky ~]$ 

Add the public key to GitLab. If you don’t want to set an Expiration date, clear the field.

And click on [ Add key ]

When you upload your public key to the GitLab GUI, it will update ~/.ssh/authorized_keys for the git user. The git user was created during the installation.

Let’s quickly review it.

Log on to your GitLab server.

[staf@vicky ~]$ ssh staf@gitlab.stafnet.local 
Linux gitlab 6.1.0-13-arm64 #1 SMP Debian 6.1.55-1 (2023-09-29) aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Nov 25 08:02:53 2023 from 192.168.1.10
staf@gitlab:~$ 

Get details about the git user.

staf@gitlab:~$ cat /etc/passwd | grep -i git
gitlab-www:x:999:996::/var/opt/gitlab/nginx:/bin/false
git:x:996:995::/var/opt/gitlab:/bin/sh
gitlab-redis:x:995:994::/var/opt/gitlab/redis:/bin/false
gitlab-psql:x:994:993::/var/opt/gitlab/postgresql:/bin/sh
gitlab-prometheus:x:993:992::/var/opt/gitlab/prometheus:/bin/sh
staf@gitlab:~$ 

Display the ~/.ssh/authorized_keys contents.

staf@gitlab:~$ sudo cat /var/opt/gitlab/.ssh/authorized_keys
[sudo] password for staf: 
command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-1",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty <snip>
command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-2",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty <snip>
command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-3",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty <snip>
staf@gitlab:~$ 

Test connection

Check the connection with the command below. Please note that we use the git user. GitLab will use the matching public key to determine (map to) the real GitLab user.

[staf@vicky ~]$ ssh -T git@gitlab.stafnet.local -i ~/.ssh/testgitlab.key 
Enter passphrase for key '/home/staf/.ssh/testgitlab.key': 
Welcome to GitLab, @stafwag!
[staf@vicky ~]$ 

The -T option disables pseudo-terminal allocation (pty). Without it you’ll get a warning.

Reconfigure your SSH client

We’ll update the ssh config to use our private key.

Open your ssh config with your favourite editor.

[staf@vicky ~]$ vi ~/.ssh/config

Add a section for our GitLab host and set the IdentityFile set to our private key.

Host gitlab.stafnet.local
  PreferredAuthentications publickey
  IdentityFile ~/.ssh/testgitlab.key

Test the SSH access.

[staf@vicky ~]$ ssh -T git@gitlab.stafnet.local
Welcome to GitLab, @stafwag!
[staf@vicky ~]$ 

Use a ssh-agent

We’ll add our private key to the ssh-agent to avoid the need to type in our password each time we perform a git action.

Most modern Unix desktop systems ( GNOME, KDE, Xfce ) will set up a ssh-agent automatically.

Some GNU/Linux distributions will even add the key automatically when you use it. My FreeBSD desktop (Xfce) only starts the ssh-agent.

When your Unix desktop doesn’t set up a ssh-agent you’ll need to start it.

When a ssh-agent is configured the SSH_AGENT_PID and the SSH_AUTH_SOCK environment variables are set.

Check if there is already a ssh-agent configured.

staf@fedora39:~$ set | grep SSH
SSH_CLIENT='192.168.122.1 49618 22'
SSH_CONNECTION='192.168.122.1 49618 192.168.122.96 22'
SSH_TTY=/dev/pts/2
staf@fedora39:~$ 

The easiest way to start a ssh-agent and set up the environment variables is to execute the command eval $(ssh-agent).

staf@fedora39:~$ eval $(ssh-agent)
Agent pid 3542
staf@fedora39:~$ 

This starts the ssh-agent; eval executes the ssh-agent output and sets the required environment variables.

staf@fedora39:~$ set | grep SSH
SSH_AGENT_PID=3542
SSH_AUTH_SOCK=/tmp/ssh-XXXXXXcQhGER/agent.3541
SSH_CLIENT='192.168.122.1 60044 22'
SSH_CONNECTION='192.168.122.1 60044 192.168.122.96 22'
SSH_TTY=/dev/pts/2
staf@fedora39:~$ 

Add the private key to the ssh-agent.

staf@fedora39:~$ ssh-add /home/staf/.ssh/testgitlab.key
Enter passphrase for /home/staf/.ssh/testgitlab.key: 
Identity added: /home/staf/.ssh/testgitlab.key (test gitlab key)
staf@fedora39:~$ 

Test the connection.

staf@fedora39:~$ ssh -T git@gitlab.stafnet.local
Welcome to GitLab, @stafwag!
staf@fedora39:~$ 

With everything set up, we're ready to start using GitLab. In a further blog post we will start with adding code to GitLab, add runners and create artifacts.

Have fun!

November 22, 2023

Today my rooster August(us) was not standing well on his legs. Because it's getting winter, and more importantly because my wife is pregnant, I decided that this time we wouldn't try to save the animal.

Last time one of our chickens (Livia, our white chicken) fell ill we took it inside (somewhere in the basement where the temperature is constant) to save it from the other chickens (who were also picking on her – because that’s what healthy chickens do to a sick one) and give it the yolk of the other chickens’ eggs for extra strength.

It was a mess all the time. Outside, the chickens take care of themselves, but when you keep one inside the house you kinda have to do all that for the chicken. Chickens shit a lot. The outcome isn't very hygienic. I was not planning to go through this ordeal with our rooster August.

Anyway. I had already bought a restraining cone for chicken slaughter a few months ago. A sharp knife I had in the kitchen. August was very relaxed and not very defensive because, well, he was feeling sick and all that. It was not hard to put him head first in the cone. I also petted him and calmed him down before all this.

That’s all very bad news. But the good news is that this new year’s eve we will be doing Coq au vin!

Afterwards, in warm water, we plucked the feathers, then opened up the body and took the intestines out (carefully not cutting open the gallbladder and the bladder, to make sure we didn't spoil any meat).

Meanwhile Livia, Julia and the newest of the flock, Gwendolyn (that name because she's an Aarschot breed of chicken – I always use political figures for my chickens' names), are still alive and happily laying eggs for us.

It's more or less decided that Julia must go too, because our new year's eve family gathering will count five attendees and we need enough chicken meat. Plus, the nice people of Hagelandsneerhof organize the Neerhofdieren show every year around the end of January at O-Green in Rillaar. I plan to start with a new flock of chickens and buy an entire new group there.

Poor chickens.

They had a good life. I don’t know yet what I’ll do with Gwendolyn and Livia. Especially Livia is very special: she’s always the first to arrive when we bring food. Fiercely fighting for it with Gwendolyn.

Poor August. He was so strong and protective of his chickens. He cut me several times during his attacks (before I learned how to tame him).

But I couldn't let him suffer in agony from his sickness.

November 19, 2023

We now invite proposals for presentations. FOSDEM offers open source and free software developers a place to meet, share ideas and collaborate. Renowned for being highly developer-oriented, the event brings together some 8000+ geeks from all over the world. The twenty-fourth edition will take place on Saturday 3rd and Sunday 4th February 2024 at the usual location, ULB Campus Solbosch in Brussels. Developer Rooms For more details about the Developer Rooms, please refer to the Calls for Papers at https://fosdem.org/2024/news/2023-11-08-devrooms-announced. Main Tracks Main track presentations cover topics of interest to a significant part of our audience that do not…
With great pleasure we can announce the following projects will have a stand at FOSDEM 2024 (3 & 4th of February). This is the list of stands (in alphabetic order): AlmaLinux Ansible Automotive Grade Linux Badge.Team CentOS, RDO and OKD CERN Open Source Checkmk CiviCRM Code your Future Codeberg coreboot / flashrom / EDKII / OpenBMC Debian CalyxOS + Droidian Linux Fedora Project FOSSASIA Free Software Foundation Europe FreeCAD GNOME GNU HEALTH PROJECT GNU Radio with IQEngine Google Summer of Code 20th year stand Grafana Homebrew Infostand on amateur radio ISRG - Let's Encrypt, Prossimo & Divi Up Jenkins…

November 15, 2023

CI/CD Platform Overview

When you want or need to use CI/CD, there are a lot of CI/CD platforms to choose from. As with most "tools", the tool itself is less important. What you implement (which flow, best practices, security benchmarks, etc.) and how you implement it is what matters.

One of the most commonly used options is Jenkins.

I used and still use Jenkins; a couple of years back I created a Jenkins build workstation to build and test software in my homelab.

Jenkins started as Hudson at Sun Microsystems (RIP). Hudson is one of the many open-source projects that were started at Sun and killed by Oracle. Jenkins continued as the open-source fork of Hudson.

Jenkins has evolved. If you need to do more complex things, you'll probably end up creating a lot of Groovy scripts; nothing wrong with Groovy. But as with a lot of discussions about programming, the ecosystem (who is using it, which libraries are available, etc.) is important.

Groovy isn't that commonly used or known in the system administration ecosystem, so this is probably something you need to learn if you're coming from the system administration world (as I do, so I learnt the basics of Groovy this way).

The other option is to implement CI/CD using the commonly used source hosting platforms: GitHub and GitLab.

Both have offerings that you can use on-premise.

There are other CI/CD systems available. Other examples are Tekton - used by Red Hat on OpenShift - and Drone, which have gained popularity. Just to name a few.

I started to use GitLab in my lab to build, test and deploy software. Why GitLab? With GitLab you have the option to use the same pipelines on your on-premise installation as on the GitLab source hosting platform.

GitLab comes in two versions: GitLab CE (Community Edition) and GitLab EE (Enterprise Edition).

You can find more information about the difference between the two versions at: https://about.gitlab.com/handbook/marketing/brand-and-product-marketing/product-and-solution-marketing/tiers/.

In general, I don't like this approach; having a paid non-open-source version doesn't help end-users (developers) and different companies work together to improve the software. Companies that use the software will pay for support and consultancy services in the long run.

But it is what it is. If you want to use GitLab CI/CD in your homelab environment, GitLab CE is the way to go.

In this blog series, we will go over the installation of GitLab CE and its basic features. As always, my blog posts are the installation instructions/notes that I took during my setup, in the hope that they are useful to somebody else.

Installation

All actions are executed on Debian GNU/Linux 12 (bookworm) on x86 and ARM64, as I migrated all my always-on systems to a cluster of Raspberry Pi 4s to save power.

I mainly follow the official installation instructions at https://about.gitlab.com/install/#debian and the blog post at: https://www.linuxtechi.com/how-to-install-gitlab-on-ubuntu/.

Install requirements

Update

As always it’s a good idea to update your software (regularly).

Update the package database.

staf@tstgitlab:~$ sudo apt update -y
[sudo] password for staf: 
Get:1 http://security.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Get:2 http://deb.debian.org/debian bookworm InRelease [151 kB]                                         
Get:3 http://deb.debian.org/debian bookworm-updates InRelease [52.1 kB]                                               
Get:4 http://deb.debian.org/debian bookworm-backports InRelease [56.5 kB]
Get:5 http://security.debian.org/debian-security bookworm-security/main Sources [57.5 kB]
Get:7 http://security.debian.org/debian-security bookworm-security/main amd64 Packages [95.7 kB]
Get:8 http://security.debian.org/debian-security bookworm-security/main Translation-en [54.4 kB]
Get:9 http://deb.debian.org/debian bookworm-backports/main Sources.diff/Index [63.3 kB]
Get:10 http://deb.debian.org/debian bookworm-backports/main amd64 Packages.diff/Index [63.3 kB]
Hit:6 https://packages.gitlab.com/gitlab/gitlab-ce/debian bookworm InRelease
Get:11 http://deb.debian.org/debian bookworm-backports/main Sources T-2023-11-03-1405.27-F-2023-11-03-1405.27.pdiff [836 B]
Get:11 http://deb.debian.org/debian bookworm-backports/main Sources T-2023-11-03-1405.27-F-2023-11-03-1405.27.pdiff [836 B]
Get:12 http://deb.debian.org/debian bookworm-backports/main amd64 Packages T-2023-11-03-2011.07-F-2023-11-03-2011.07.pdiff [1,216 B]
Get:12 http://deb.debian.org/debian bookworm-backports/main amd64 Packages T-2023-11-03-2011.07-F-2023-11-03-2011.07.pdiff [1,216 B]
Fetched 644 kB in 1s (793 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
staf@tstgitlab:~$

Upgrade the packages.

staf@tstgitlab:~$ sudo apt upgrade -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
staf@tstgitlab:~$ 

Install the required packages

Install the required dependencies.

staf@tstgitlab:~$ sudo apt-get install -y curl openssh-server ca-certificates perl
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
curl is already the newest version (7.88.1-10+deb12u4).
curl set to manually installed.
openssh-server is already the newest version (1:9.2p1-2+deb12u1).
ca-certificates is already the newest version (20230311).
ca-certificates set to manually installed.
perl is already the newest version (5.36.0-7).
perl set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
staf@tstgitlab:~$ 

If you want your system to send out emails for GitLab, you’ll need to set up a local mail server like Postfix or you can reconfigure GitLab to use another email server. See below for references on how to configure this on GitLab.

Setup the GitLab CE Repositories

GitLab provides a script to set up the GitLab repositories for both GitLab CE and GitLab EE. We'll use GitLab CE. As always, it's a good idea not to blindly run a script that you pulled from the internet; we'll download the script, review it and execute it.

Create a directory.

staf@tstgitlab:~$ mkdir gitlab
staf@tstgitlab:~$ 

Download the repository setup script.

staf@tstgitlab:~/gitlab$ wget https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh
--2023-11-03 08:11:58--  https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh
Resolving packages.gitlab.com (packages.gitlab.com)... 172.64.148.245, 104.18.39.11, 2606:4700:4400::6812:270b, ...
Connecting to packages.gitlab.com (packages.gitlab.com)|172.64.148.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6865 (6.7K) [text/install_instructions]
Saving to: ‘script.deb.sh’

script.deb.sh       100%[===================>]   6.70K  --.-KB/s    in 0s      

2023-11-03 08:11:58 (35.9 MB/s) - ‘script.deb.sh’ saved [6865/6865]

staf@tstgitlab:~/gitlab$ 

Review the script.

staf@tstgitlab:~/gitlab$ vi script.deb.sh

Execute it.

staf@tstgitlab:~/gitlab$ sudo bash script.deb.sh
Detected operating system as debian/12.
Checking for curl...
Detected curl...
Checking for gpg...
Detected gpg...
Running apt-get update... done.
Installing debian-archive-keyring which is needed for installing 
apt-transport-https on many Debian systems.
Installing apt-transport-https... done.
Installing /etc/apt/sources.list.d/gitlab_gitlab-ce.list...done.
Importing packagecloud gpg key... done.
Running apt-get update... done.

The repository is setup! You can now install packages.
staf@tstgitlab:~/gitlab$ 

Installation

The installation is straightforward: you need to specify the EXTERNAL_URL environment variable and install the gitlab-ce package.

staf@tstgitlab:~/gitlab$ sudo EXTERNAL_URL="http://tstgitlab" apt install gitlab-ce
<snip>
Notes:
Default admin account has been configured with following details:
Username: root
Password: You didn't opt-in to print initial root password to STDOUT.
Password stored to /etc/gitlab/initial_root_password. This file will be cleaned up in first reconfigure run after 24 hours.

NOTE: Because these credentials might be present in your log files in plain text, it is highly recommended to reset the password following https://docs.gitlab.com/ee/security/reset_use
r_password.html#reset-your-root-password.

gitlab Reconfigured!

       *.                  *.
      ***                 ***
     *****               *****
    .******             *******
    ********            ********
   ,,,,,,,,,***********,,,,,,,,,
  ,,,,,,,,,,,*********,,,,,,,,,,,
  .,,,,,,,,,,,*******,,,,,,,,,,,,
      ,,,,,,,,,*****,,,,,,,,,.
         ,,,,,,,****,,,,,,
            .,,,***,,,,
                ,*,.
  


     _______ __  __          __
    / ____(_) /_/ /   ____ _/ /_
   / / __/ / __/ /   / __ `/ __ \
  / /_/ / / /_/ /___/ /_/ / /_/ /
  \____/_/\__/_____/\__,_/_.___/
  

Thank you for installing GitLab!
GitLab should be available at http://tstgitlab

For a comprehensive list of configuration options please see the Omnibus GitLab readme
https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md

Help us improve the installation experience, let us know how we did with a 1 minute survey:
https://gitlab.fra1.qualtrics.com/jfe/form/SV_6kVqZANThUQ1bZb?installation=omnibus&release=16-5

staf@tstgitlab:~/gitlab$ 

Open your browser at the GitLab URL; if everything goes well you'll see the GitLab login screen.

You can log in as the root user with the initial password that was created during the installation. This password is stored in the /etc/gitlab/initial_root_password file.

Please note that the password file will be deleted by GitLab automatically, so copy it and keep it in a safe place.

Display the password.

staf@tstgitlab:~$ ls -l /etc/gitlab/initial_root_password 
-rw------- 1 root root 749 Nov  3 08:18 /etc/gitlab/initial_root_password
staf@tstgitlab:~$ 
staf@tstgitlab:~$ sudo cat /etc/gitlab/initial_root_password 
[sudo] password for staf: 
# WARNING: This value is valid only in the following conditions
#          1. If provided manually (either via `GITLAB_ROOT_PASSWORD` environment variable or via `gitlab_rails['initial_root_password']` setting in `gitlab.rb`, it was provided before database was seeded for the first time (usually, the first reconfigure run).
#          2. Password hasn't been changed manually, either via UI or via command line.
#
#          If the password shown here doesn't work, you must reset the admin password following https://docs.gitlab.com/ee/security/reset_user_password.html#reset-your-root-password.

Password: <snip>

# NOTE: This file will be automatically deleted in the first reconfigure run after 24 hours.
staf@tstgitlab:~$ 

And login:

Post configuration

HTTPS

GitLab uses NGINX under the hood, so enabling https on our installation means reconfiguring NGINX. GitLab also comes with Let's Encrypt support; see https://docs.gitlab.com/omnibus/settings/ssl/ for more information. When you specify https:// as part of the EXTERNAL_URL environment variable during the installation, the GitLab installer will set up Let's Encrypt automatically. But I didn't try this; in this blog post we'll set up a self-signed certificate.
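For completeness, a minimal sketch of what the Let's Encrypt route would look like in /etc/gitlab/gitlab.rb (untested here, as noted above; the hostname and contact address are placeholders, and the instance must be reachable from the internet for the certificate request to succeed):

## Let's Encrypt sketch (placeholders, not this post's setup)
external_url 'https://gitlab.example.com'
letsencrypt['enable'] = true
letsencrypt['contact_emails'] = ['admin@example.com']

Followed by a gitlab-ctl reconfigure, this should request and renew the certificate automatically.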

Security (rant)

In a corporate environment, you might want to use an internal certificate authority. But if you want/need to set up an internal CA authority it needs to be secured.

You need to protect your private keys and limit access to the authority. If you don't have the resources or the time available to set up a CA in a decent way, it's better not to use an internal CA, as it becomes an attack vector: if an attacker gains access to the CA - or copies the private key - they can generate certificates for every host/domain.

Bottom line: if you don't have the time to set up a CA in a secure way, it's better to use Let's Encrypt or another third-party authority, or even just a self-signed certificate.

I use an HSM (SmartCard-HSM) and my CA is offline; see https://stafwag.github.io/blog/blog/2020/04/29/setup-an-ca-with-smartcard/

Create a self-signed certificate

I mainly followed the official documentation at GitLab: https://docs.gitlab.com/omnibus/settings/ssl/index.html#configure-https-manually

Set the external_url & disable let’s encrypt

Open your favourite editor.

root@gitlab:~# nvi /etc/gitlab/gitlab.rb 
root@gitlab:~# 

Update the external_url to use https://.

## GitLab URL
##! URL on which GitLab will be reachable.
##! For more details on configuring external_url see:
##! https://docs.gitlab.com/omnibus/settings/configuration.html#configuring-the-external-url-for-gitlab
##!
##! Note: During installation/upgrades, the value of the environment variable
##! EXTERNAL_URL will be used to populate/replace this value.
##! On AWS EC2 instances, we also attempt to fetch the public hostname/IP
##! address from AWS. For more details, see:
##! https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
external_url 'https://gitlab.stafnet.local'

Set letsencrypt['enable'] = false to disable Let's Encrypt support.

################################################################################
# Let's Encrypt integration
################################################################################
letsencrypt['enable'] = false

Create a self-signed certificate

What is a self-signed certificate?

A certificate is a way to distribute a public key. A certificate is normally signed by a CA's private key. You trust the CA's public key, which is distributed in the CA certificate.

A "self-signed certificate" is just a certificate that is signed with the certificate's own private key instead of the private key of a CA.

Create the SSL directory

Create the /etc/gitlab/ssl directory.

root@gitlab:/etc/gitlab# ls
config_backup  gitlab.rb  gitlab-secrets.json  trusted-certs
root@gitlab:/etc/gitlab# mkdir ssl
root@gitlab:/etc/gitlab# 

And set the permissions.

root@gitlab:/etc/gitlab# chmod 755 ssl
root@gitlab:/etc/gitlab# 

Private key

We'll create a private key for our self-signed certificate. Please note that we don't encrypt the private key, as we would otherwise need to type in the private key password each time GitLab starts.

Create a private key.

root@gitlab:/etc/gitlab/ssl# openssl genrsa -out gitlab.stafnet.local.key 4096
root@gitlab:/etc/gitlab/ssl# 

CSR

Create a certificate request.

root@gitlab:/etc/gitlab/ssl# openssl req -key gitlab.stafnet.local.key -new -out gitlab.stafnet.local.csr
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:BE
State or Province Name (full name) [Some-State]:Antwerp
Locality Name (eg, city) []:Antwerp
Organization Name (eg, company) [Internet Widgits Pty Ltd]:stafnet
Organizational Unit Name (eg, section) []:         
Common Name (e.g. server FQDN or YOUR name) []:gitlab.stafnet.local
Email Address []:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
root@gitlab:/etc/gitlab/ssl# 

Sign

Sign your certificate with your private key.

root@gitlab:/etc/gitlab/ssl# openssl x509 -signkey gitlab.stafnet.local.key -in gitlab.stafnet.local.csr -req -days 365 -out gitlab.stafnet.local.crt
Certificate request self-signature ok
subject=C = BE, ST = Antwerp, L = Antwerp, O = stafnet, CN = gitlab.stafnet.local
root@gitlab:/etc/gitlab/ssl# 

Verify permissions

Make sure that your private key is not world-readable.

root@gitlab:/etc/gitlab/ssl# ls -l
total 12
-rw-r--r-- 1 root root 1895 Nov 12 10:22 gitlab.stafnet.local.crt
-rw-r--r-- 1 root root 1691 Nov 12 10:20 gitlab.stafnet.local.csr
-rw------- 1 root root 3272 Nov  8 20:25 gitlab.stafnet.local.key
root@gitlab:/etc/gitlab/ssl# 

Reconfigure GitLab

Open gitlab.rb in your favourite editor.

root@gitlab:/etc/gitlab/ssl# vi /etc/gitlab/gitlab.rb
root@gitlab:/etc/gitlab/ssl# 

Update the external_url setting.

##! EXTERNAL_URL will be used to populate/replace this value.
##! On AWS EC2 instances, we also attempt to fetch the public hostname/IP
##! address from AWS. For more details, see:
##! https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
external_url 'https://gitlab.stafnet.local'

Add the following settings to the bottom of the file.

nginx['redirect_http_to_https'] = true
registry_nginx['redirect_http_to_https'] = true
mattermost_nginx['redirect_http_to_https'] = true
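
Because the certificate and key above live in /etc/gitlab/ssl and are named after the hostname in external_url, GitLab's NGINX should pick them up automatically. If you use different file names or locations, the omnibus SSL documentation describes pointing NGINX at them explicitly, roughly like this (paths shown for this post's files):

nginx['ssl_certificate'] = "/etc/gitlab/ssl/gitlab.stafnet.local.crt"
nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/gitlab.stafnet.local.key"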

Run gitlab-ctl reconfigure to reconfigure gitlab.

root@gitlab:/etc/gitlab/ssl# gitlab-ctl reconfigure
[2023-11-12T11:05:43+01:00] INFO: Started Cinc Zero at chefzero://localhost:1 with repository at /opt/gitlab/embedded (One version per cookbook)
Cinc Client, version 17.10.0
Patents: https://www.chef.io/patents
Infra Phase starting
[2023-11-12T11:05:43+01:00] INFO: *** Cinc Client 17.10.0 ***
[2023-11-12T11:05:43+01:00] INFO: Platform: aarch64-linux
[2023-11-12T11:05:43+01:00] INFO: Cinc-client pid: 52849
[2023-11-12T11:05:46+01:00] INFO: Setting the run_list to ["recipe[gitlab]"] from CLI options
[2023-11-12T11:05:46+01:00] INFO: Run List is [recipe[gitlab]]
[2023-11-12T11:05:46+01:00] INFO: Run List expands to [gitlab]
<snip>
  * template[/var/opt/gitlab/postgres-exporter/queries.yaml] action create (up to date)
  * consul_service[postgres-exporter] action delete
    * file[/var/opt/gitlab/consul/config.d/postgres-exporter-service.json] action delete (up to date)
     (up to date)
Recipe: gitlab::database_reindexing_disable
  * crond_job[database-reindexing] action delete
    * file[/var/opt/gitlab/crond/database-reindexing] action delete (up to date)
     (up to date)
[2023-11-12T11:06:20+01:00] INFO: Cinc Client Run complete in 34.625114871 seconds

Running handlers:
[2023-11-12T11:06:20+01:00] INFO: Running report handlers
Running handlers complete
[2023-11-12T11:06:20+01:00] INFO: Report handlers complete
Infra Phase complete, 0/805 resources updated in 37 seconds
gitlab Reconfigured!

If everything goes well, you're able to log in to your GitLab instance over https://.

Email

By default, GitLab will use localhost as the email server. You either need to configure the local email server to be able to send out emails, or you can reconfigure GitLab to use an external SMTP server.

You can find more information on how to use an external email server in GitLab at: https://docs.gitlab.com/omnibus/settings/smtp.
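
As a hedged example of the external SMTP route, the relevant /etc/gitlab/gitlab.rb settings look roughly like this (server, port and credentials are placeholders; check the linked documentation for your provider's exact values), followed by a gitlab-ctl reconfigure:

# external SMTP sketch (placeholder values)
gitlab_rails['smtp_enable'] = true
gitlab_rails['smtp_address'] = "smtp.example.com"
gitlab_rails['smtp_port'] = 587
gitlab_rails['smtp_user_name'] = "gitlab@example.com"
gitlab_rails['smtp_password'] = "secret"
gitlab_rails['smtp_authentication'] = "login"
gitlab_rails['smtp_enable_starttls_auto'] = true
gitlab_rails['gitlab_email_from'] = "gitlab@example.com"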

Backup

The GitLab backup/restore procedure is explained at: https://docs.gitlab.com/ee/administration/backup_restore/

To execute a backup run the gitlab-backup create command.

The backups are stored in the /var/opt/gitlab/backups directory.

root@gitlab:/etc/gitlab/ssl# ls -l /var/opt/gitlab/backups
total 3684
-rw------- 1 git git  491520 Oct  7 11:53 1696672275_2023_10_07_16.4.0_gitlab_backup.tar
-rw------- 1 git git  512000 Nov  3 11:36 1699007796_2023_11_03_16.4.1_gitlab_backup.tar
-rw------- 1 git git 1382400 Nov 12 11:49 1699786111_2023_11_12_16.5.1_gitlab_backup.tar
-rw------- 1 git git 1382400 Nov 12 11:53 1699786384_2023_11_12_16.5.1_gitlab_backup.tar
root@gitlab:/etc/gitlab/ssl# 
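
To run this on a schedule, a cron entry along these lines is commonly used (the path matches the Omnibus package; the time is an arbitrary choice, and CRON=1 suppresses progress output):

# root crontab entry: nightly GitLab backup at 02:00
0 2 * * * /opt/gitlab/bin/gitlab-backup create CRON=1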

Have fun!

November 14, 2023

I released AOPro 2.2 today and the main new feature is the ability to delay the loading/rendering of iFrames. Despite the fact that iFrames can be lazy-loaded (using a browser-native attribute), in reality this does not always work in WordPress, hence the new "delay iFrames" option on the Pro Boosters tab. Apart from delaying iFrames, AOPro 2.2 also has…

Source