Planet Grep

Planet'ing Belgian FLOSS people

Planet Grep is maintained by Wouter Verhelst. All times are in UTC.

December 02, 2023

Misfortune Ramirez Hernandez, a cute 8-year-old girl, loves to tell the world she is “a little Miss”. The title Little Misfortune is more than wordplay on the child’s story. It’s a lovely, funny and sad little adventure game, all at the same time. It’s about a child’s sense of wonder and about the loss of innocence. Although short (I finished it in around 3 hours), it’s rewarding. No need to make it longer just for the sake of it.
A fun, kind of realistic tennis game. Best for playing in Quick Tournament mode (the history mode is silly). Not easy, but certainly worth it. Handhelds like the rp3+ can upscale it to about 3x the original resolution. My old 2016-Celeron NUC with Batocera can upscale it to 1080p without any issues.
This is a beautiful game with interesting puzzles and… a story. Another perfect example of SNES-level graphics on a small handheld. The game takes a little investment in the beginning because it starts slowly, certainly if you’re not familiar with the action-adventure genre. The quests will become clearer and the speed will pick up. You will be stuck more than once (actually, lots of times). You’ll want to finish it in one go, but this open-world game will keep you occupied for way, way longer than you expected.
This game is probably my favorite in the Metroid Series. It’s a perfect example of SNES-level graphics on a small handheld. This is the most story-driven installment of all Metroid 2D games. Be ready to be lost for hours and experience a huge adrenaline boost once you finally figure out what you need to do or when you finish that impossible boss. Oh, and in this Metroid there are situations where there is no way you can win, and you have to run.
The 2D Metroid series offers a perfect mix of platformer action, adventure and intelligent puzzles. While preceded by Metroid (1986) on the NES and Metroid II: Return of Samus (1991) on the Game Boy, this is the game that gave the series its distinct look and identity. And while Metroidvania is a thing and has inspired many great games for decades to come, it is Metroid, not Castlevania, that immerses the player in an eerie, dreadful, immense and, most of all, lonely universe.
This game returns to the top-down perspective of the first Zelda while making the game graphically stunning (the NES graphics age less well compared to the SNES/GBA sequels). Even today, more than 30 years later, it’s pretty much the pattern to follow for pixel-art games. This game is long and hard, but at the same time extremely rewarding. It’s a must-play classic.
Dr. Mario is often bundled with its more famous predecessor Tetris. The puzzle game is certainly a worthy successor. While there are versions of this game on more recent consoles, the straightforwardness of the Game Boy is something the more recent ports lack. The game is fast, the music fantastic. Simple and a lot of addictive fun.
Tetris was bundled with the European and North American release of the new Game Boy handheld. To this day it remains in the top 10 best-selling Nintendo games of all time. There are many older versions of Tetris, including the original release on the Soviet Electronika 60 and later versions on Western and Eastern arcade machines around the world (most of the time, arcade releases were far superior to home console ports).
This game is the first Nintendo game for a handheld console. No pressure. It’s probably the most quirky Mario game, but in a good way. Mostly line art and 2 Gradius-like flying levels. It’s a short game and rather difficult at the end, but a lot of fun. A worthy adaptation of the Super Mario Bros. formula for the weaker handheld.
This is the game that started the almost 40-year franchise that expanded into animated TV series and movies. It’s fun, it’s challenging and the soundtrack is recognizable everywhere. The original version is still the best version. Later versions on other consoles often crop the image in order to adapt it to different screen geometries, e.g. the NES Classics release on the Game Boy Advance. Still the king!

I won’t cover the reasons why you should migrate from MariaDB to MySQL; I already covered the topic in previous blog posts and webinars:

In this post, I will cover a new easier way to migrate using the latest MySQL Shell 8.2.1 and the copyInstance() utility.

Previously, if you wanted to migrate to MySQL HeatWave (the name of the MySQL Database Service in OCI, DBaaS) you needed to use intermediate storage (local disk or Object Storage). With the new copyInstance(), this is not required anymore.

Once you have created your MySQL HeatWave instance, you can run MySQL Shell on a compute instance or any local machine that has access to both your local MariaDB instance and the MySQL HeatWave one, and copy all the data directly from one instance to the other.

Please don’t forget that there are also preliminary checks to perform, like verifying that you are not using special storage engines and more (see Migrating from MariaDB to MySQL).

Let’s illustrate the procedure.

MySQL Shell 8.2.1

First we need to install MySQL Shell 8.2.1. Pay attention that you need the latest version (at least 8.2.1), which is compatible with MariaDB:

$ sudo yum install \
 https://dev.mysql.com/get/Downloads/MySQL-Shell/mysql-shell-8.2.1-1.el8.x86_64.rpm

Then you connect to your MariaDB instance:
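
Assuming the MariaDB instance listens on the default port of a host called mariadb1 (host name and user are purely illustrative), the connection is a simple MySQL Shell invocation:

$ mysqlsh admin@mariadb1:3306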

copyInstance()

Once connected, you can start the verification before the copy using dryRun mode (in Python or JavaScript mode):

MySQL JS> \js
MySQL JS> util.copyInstance('mysql://admin@10.0.1.105', 
    {"compatibility": ["force_innodb", "skip_invalid_accounts",
                       "strip_definers", "strip_restricted_grants",
                       "strip_tablespaces", "ignore_wildcard_grants",
                       "strip_invalid_grants","create_invisible_pks"], 
      users: "false", threads: 4, ignoreVersion: "true", dryRun:"true"})

Mind the required options:

  • compatibility: applies some specific compatibility requirements for MySQL HeatWave
    • force_innodb
    • skip_invalid_accounts
    • strip_definers
    • strip_restricted_grants
    • strip_tablespaces
    • ignore_wildcard_grants (not really required as we will skip the users)
    • strip_invalid_grants (not really required as we will skip the users)
    • create_invisible_pks (required if we plan to enable HA in the future)
  • users: MariaDB users are not supported; we need to handle them separately
  • threads: the number of parallel threads used for the copy
  • ignoreVersion: required because the source is MariaDB and not a supported MySQL version
  • dryRun: we just perform the checks; we don’t actually copy the data

If the command succeeds, we can start the copy using the same command but setting dryRun to false:

… [output truncated] …

And voilà !

No intermediate storage was required and the copy went smoothly.

Users

MariaDB manages users differently than MySQL 8. If you need to also transfer the users, you can use the user MySQL Shell plugin (https://github.com/lefred/mysqlshell-plugins):

Conclusion

With the latest MySQL Shell copyInstance() utility, migrating from MariaDB to MySQL HeatWave has become exceptionally straightforward. In fact, this tool is not limited to just that; it can also be utilized for seamless migration to MySQL on-premise.

Whether you’re working with a MariaDB or MySQL instance, the copyInstance() utility empowers you to effortlessly duplicate entire instances, schemas, and tables to MySQL 8.

This notable addition simplifies the migration process even further, eliminating the need for any intermediate storage.

You can download the full guide here.

Happy migrations !

November 30, 2023

What a disaster this is, the DIY Framework laptop.

The box...

...contains more boxes.


The parts are few...


...so the laptop is assembled in less than a minute?!

Unpacking the thing took longer than assembling it!

Reading this webpage took longer than assembling it!

Installing the DDR5 memory took 1 second: click and done.
Installing the NVMe drive took 3 seconds: unscrew one screw, click and screw.
Installing the bezel was far more challenging, took almost 30 seconds! (*)
Attaching the keyboard: about three seconds?
Installing the four modules: 1 second per module (or less?)

I'm not that good at irony, but I was hoping for at least 20 minutes of fun assembling this thing... and only got one minute. :-/

(*) This brought back fond memories of assembling tower PCs in the nineties; ISA/PCI slots click, CD-ROM ATAPI click, hard disk and IDE jumpers easy. But closing the tower case... well, that was the hard part!

Installing an OS

The guide says to be patient when booting (powering on) the laptop for the first time; it may take several minutes.

I timed it: it took 57 seconds to get to this screen:

And now there is this; it has been a long time since I saw a default XFCE, Firefox and terminal.


Let the fun begin!

November 28, 2023

Last week, from November 21 to 24, the DOAG Conference took place in Nuremberg, Germany.

The MySQL Team was present at the Oracle booth. I attended the conference Thursday and Friday.

There were 20 sessions tagged “MySQL” and one full-day workshop dedicated to students. I delivered the first part of the workshop, dedicated to MySQL for Developers, DBAs and Ops. My colleague Carsten, who attended the full week, delivered the second part of the workshop, dedicated to the MySQL offer in OCI: MySQL HeatWave and Lakehouse.

The first day was a special “theme” day about Open Source Databases and other topics. Colin Charles talked about MySQL Security, and Raphael Salguero presented the differences between Oracle Database, PostgreSQL and MySQL.

There were also presentations about MySQL Clustering and HA from Matthias Jung and future directions of MySQL by Mughees Minhas.

According to Carsten, the MySQL Innovation Release (8.2) and the plans for MySQL LTS sparked a lot of discussions, not just during the presentations but also during the breaks.

The focus of the second DOAG day was only partially on OpenSource, but the questions and discussions continued into the morning at the Oracle booth.

The tutorial attracted many participants who were attentive and receptive throughout the day. They were able to discover the specifics of MySQL such as MySQL Document Store, and the different architectures that are very easy to deploy using MySQL Shell: MySQL InnoDB Cluster, ClusterSet, and Read Replicas.

They were also able to deploy a MySQL HeatWave instance in OCI and experience first-hand the power of HeatWave Cluster (Accelerator) to dramatically improve the slowest queries.

I also attended Carsten’s presentation on the Future of MySQL & MySQL HeatWave, where he highlighted the future additions to MySQL HeatWave and Lakehouse, but also the support of new MySQL stored procedures written in JavaScript with GraalVM and future development around AI.

As usual, the discussions around our booth and in the hallways were very interesting, and I talked to some clients who are in the process of migrating to MySQL HeatWave on OCI and are eager to use our query acceleration solution for their problematic OLAP workload due to the amount of data in their current MySQL environment.

This type of discussion is very interesting, and the feedback is very constructive. It also emerges that many users really appreciate MySQL Shell !

During the event I also had the chance to finally meet in real life Matthias Jung, Oracle ACE for MySQL.

This was a great event to meet the Oracle & MySQL Community, customers and also share our message with students and users not always aware of all the potential of MySQL.

Many thanks to all our MySQL customers who came, and see you next year for another great DOAG.

November 26, 2023

In my previous blog post, we installed GitLab CE and did some post-configuration. In this blog post, we’ll continue by creating user accounts and setting up SSH access to the git repository.

In the next blog posts, we will add code to GitLab and set up GitLab runners on different operating systems.

Users

Update root password

An initial root user with a password was created during the installation. The initial root password is saved - or was saved, as GitLab deletes the initial password after 24 hours - to /etc/gitlab/initial_root_password.

To update the root password, log in to GitLab with your web browser, click on your “avatar” icon and click on [ Edit profile ].

Select [ Password ] to update the root password.

You need to re-login after the password is updated.

Creating users

In this section we’ll set up an additional administrator user; I usually try to have a backup admin account. It isn’t recommended to use an admin account directly, so we’ll also set up a regular user that we can use for our daily development tasks.

The full documentation about GitLab users is available at: https://docs.gitlab.com/ee/user/profile/account/create_accounts.html#create-users-in-admin-area

Admin user

In the admin area select users on the left-hand side.

In the New user window, fill in the Account Name / Username and Email Address. Please note that you can’t use admin as this is a reserved user name in GitLab. At the Access level select Administrator.

If everything goes well, the new user will receive an email to reset his or her password.

Regular user

Repeat the same steps to create a normal user and keep the Access level to “Regular”.

GitLab SSH access

To access your git repositories using ssh you’ll need to upload your ssh public key. I use hardware tokens ( smartcard-hsm ) to protect my private keys.

You can find more information on how to use SmartCard-HSM with ssh in a previous blog post: https://stafwag.github.io/blog/blog/2015/12/05/protecting-your-ssh-keys-with-smartcard-hsm/

You can find information on how to generate an ssh key pair for GitLab at: https://docs.gitlab.com/ee/user/ssh.html

If you don’t mind having your keypair on your filesystem you can use the steps below ( but I advise you to look at the better options ).

Generate a ssh key pair

Execute ssh-keygen to generate a ssh key-pair.

I recommend using a passphrase so the private key is encrypted; this prevents an attacker from copying it as plain text from the filesystem. To avoid having to type in the passphrase during development, you can use a ssh-agent so you don’t need to type in your password each time you push a commit to your git repository (see below).

The unencrypted key is still stored in memory if you use an ssh-agent unless you store your private key on a hardware token (HSM).

Please note that we use a non-default ssh key location in the command below (~/.ssh/testgitlab.key); we’ll update your ssh client config to use this private key later.

[staf@vicky ~]$ ssh-keygen -f ~/.ssh/testgitlab.key -t ed25519 -C "test gitlab key"
Generating public/private ed25519 key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/staf/.ssh/testgitlab.key
Your public key has been saved in /home/staf/.ssh/testgitlab.key.pub
The key fingerprint is:
SHA256:m4TpKmOjvwkoY2H3arG3tTLTOgYeA03BJoACD86Gkr0 test gitlab key
The key's randomart image is:
+--[ED25519 256]--+
|B ...            |
|** +             |
|=+B              |
|o. o   o         |
| oE.  o S        |
|o o=.. . o       |
|=.. *.o.o        |
|oo==.O...        |
|.+=*+oB.         |
+----[SHA256]-----+
[staf@vicky ~]$ 

Upload the public key to GitLab

Edit your profile.

Go to SSH Keys and select [ Add new key ].

[staf@vicky ~]$ cat ~/.ssh/testgitlab.key.pub 
ssh-ed25519 <snip> test gitlab key
[staf@vicky ~]$ 

Add the public key to GitLab. If you don’t want to set an Expiration date, clear the field.

And click on [ Add key ]

When you upload your public key to the GitLab GUI, it will update ~/.ssh/authorized_keys for the git user. The git user was created during the installation.

Let’s quickly review it.

Log on to your GitLab server.

[staf@vicky ~]$ ssh staf@gitlab.stafnet.local 
Linux gitlab 6.1.0-13-arm64 #1 SMP Debian 6.1.55-1 (2023-09-29) aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Nov 25 08:02:53 2023 from 192.168.1.10
staf@gitlab:~$ 

Get details about the git user.

staf@gitlab:~$ cat /etc/passwd | grep -i git
gitlab-www:x:999:996::/var/opt/gitlab/nginx:/bin/false
git:x:996:995::/var/opt/gitlab:/bin/sh
gitlab-redis:x:995:994::/var/opt/gitlab/redis:/bin/false
gitlab-psql:x:994:993::/var/opt/gitlab/postgresql:/bin/sh
gitlab-prometheus:x:993:992::/var/opt/gitlab/prometheus:/bin/sh
staf@gitlab:~$ 

Display the ~/.ssh/authorized_keys contents.

staf@gitlab:~$ sudo cat /var/opt/gitlab/.ssh/authorized_keys
[sudo] password for staf: 
command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-1",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty <snip>
command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-2",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty <snip>
command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-3",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty <snip>
staf@gitlab:~$ 

Test connection

Check the connection with the command below. Please note that we use the git user. GitLab will use the matching public key to determine (map to) the real GitLab user.

[staf@vicky ~]$ ssh -T git@gitlab.stafnet.local -i ~/.ssh/testgitlab.key 
Enter passphrase for key '/home/staf/.ssh/testgitlab.key': 
Welcome to GitLab, @stafwag!
[staf@vicky ~]$ 

The -T option disables pseudo-terminal allocation (pty). Without it you’ll get a warning.

Reconfigure your SSH client

We’ll update the ssh config to use our private key.

Open your ssh config with your favourite editor.

[staf@vicky ~]$ vi ~/.ssh/config

Add a section for our GitLab host and set the IdentityFile to our private key.

Host gitlab.stafnet.local
  PreferredAuthentications publickey
  IdentityFile ~/.ssh/testgitlab.key

Test the SSH access.

[staf@vicky ~]$ ssh -T git@gitlab.stafnet.local
Welcome to GitLab, @stafwag!
[staf@vicky ~]$ 

Use a ssh-agent

We’ll add our private key to the ssh-agent to avoid the need to type in our password each time we perform a git action.

Most modern Unix desktop systems ( GNOME, KDE, Xfce ) will set up a ssh-agent automatically.

Some GNU/Linux distributions will even add the key automatically when you use it. My FreeBSD desktop (Xfce) only starts the ssh-agent.

When your Unix desktop doesn’t set up a ssh-agent you’ll need to start it.

When a ssh-agent is configured the SSH_AGENT_PID and the SSH_AUTH_SOCK environment variables are set.

Check if there is already a ssh-agent configured.

staf@fedora39:~$ set | grep SSH
SSH_CLIENT='192.168.122.1 49618 22'
SSH_CONNECTION='192.168.122.1 49618 192.168.122.96 22'
SSH_TTY=/dev/pts/2
staf@fedora39:~$ 

The easiest way to start a ssh-agent and set up the environment variables is to execute the command eval $(ssh-agent).

staf@fedora39:~$ eval $(ssh-agent)
Agent pid 3542
staf@fedora39:~$ 

This starts the ssh-agent; eval will execute the ssh-agent output and set the required environment variables.

staf@fedora39:~$ set | grep SSH
SSH_AGENT_PID=3542
SSH_AUTH_SOCK=/tmp/ssh-XXXXXXcQhGER/agent.3541
SSH_CLIENT='192.168.122.1 60044 22'
SSH_CONNECTION='192.168.122.1 60044 192.168.122.96 22'
SSH_TTY=/dev/pts/2
staf@fedora39:~$ 

Add the private key to the ssh-agent.

staf@fedora39:~$ ssh-add /home/staf/.ssh/testgitlab.key
Enter passphrase for /home/staf/.ssh/testgitlab.key: 
Identity added: /home/staf/.ssh/testgitlab.key (test gitlab key)
staf@fedora39:~$ 

Test the connection.

staf@fedora39:~$ ssh -T git@gitlab.stafnet.local
Welcome to GitLab, @stafwag!
staf@fedora39:~$ 

With everything set up, we’re ready to start using GitLab. In a further blog post we will start with adding code to GitLab, adding runners and creating artifacts.

Have fun!

Links

November 25, 2023

Announcing Offpunk 2.0

I’m happy to announce the release, last week, of Offpunk 2.0.

Offpunk is an offline-first command-line browser/RSS reader. You control it by typing commands, and it maintains a cache of all the networked resources to allow you to access them offline indefinitely.

If a non-cached resource is tentatively accessed, the URL is marked to be fetched later. Periodically running "offpunk --sync" will fetch those resources and add them to your "tour" to remind you that you wanted to access them.
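
For example (the schedule below is purely illustrative), a crontab entry can take care of that synchronisation:

# fetch pending resources every four hours (illustrative schedule)
0 */4 * * * offpunk --sync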

Screenshot

Mandatory screenshot showing Offpunk browsing Offpunk’s website. There’s a screenshot of Offpunk in the screenshot.

Switching the license to AGPLv3

Offpunk originally started as a branch, then a friendly fork, of AV-98. It was called AV-98-offline and, as such, shared the same BSD license.

During multiple discussions, Solderpunk and I came to the conclusion that AV-98-offline was becoming too different from the initial goal of AV-98. It was thus renamed Offpunk. At the same time, I grew increasingly convinced that we needed more copyleft software and that the AGPL license was better suited to protect the commons.

As a symbolic move, I’ve thus decided to switch Offpunk’s license from BSD to AGPLv3 but needed an opportunity to do so. The 2.0 release is such an opportunity.

Multiple independent tools

Like AV-98, Offpunk was one single big Python file. I liked the simplicity of it. But it really became a mess, and I wanted to offer Offpunk’s features as separate command-line tools. With Offpunk 2.0, you will thus have three new command-line tools:

- netcache : when given a URL, will download and cache this URL or only access the cache if the "--offline" option is provided.
- ansicat : will render an HTML, an RSS, a Gemtext or even a picture in your terminal, with various options.
- opnk : universal opener. Will try to render any file or any URL in your terminal. If it fails, it will fall back to xdg-open.

Those three commands should come with a man page and a "--help" but they are still quite new. To my own surprise, I found myself using "opnk" all the time. I don’t think anymore about how to handle a file, I simply give it to opnk.
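
As a quick sketch of how they can be used from any shell (the URL is just a placeholder):

$ netcache https://example.org/article.html            # download the page and add it to the cache
$ netcache --offline https://example.org/article.html  # later: read it from the cache only
$ opnk https://example.org/article.html                # render it in the terminal, falling back to xdg-open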

Packaging those tools was a lot harder than expected and I want to thank all the contributors to this work, including Austreelis, David Zaslavsky and Jean Abou Samra.

Themes

The goal of Offpunk, through Ansicat, is to render web, RSS, gemini and gopher pages as coloured ANSI text in your terminal. Until now, those colours were hardcoded. With 2.0, they can be customised. See "help theme".

Screenshot of Offpunk customised with the worst possible colours I could find.

In Offpunk, customisation can be made permanent by adding all the commands you want to run at startup to your .config/offpunk/offpunkrc file. Mine contains one single line: "offline", ensuring I use Offpunk only in offline mode.

Getting started

Using Offpunk daily as your main browsing/RSS driver takes some learning. You need to get used to the Offpunk philosophy: adding elements to the tour instead of clicking them, creating lists to read later, doing a daily synchronisation. It is not trivial.

The "help" command will probably be your best ally. The community also provides support on a dedicated user mailing list.

If Offpunk becomes useful to you, the community is open. Contributions, documentation, blog posts about how you use Offpunk, help for new users and packaging are warmly welcome. Sometimes, simple feedback is all it takes to make a developer happy. So don’t hesitate to contribute on one of our lists.

I’ve also started an experimental Matrix room at #offpunk:matrix.org. I believe that mailing lists are better suited for discussions, but I’m giving this the benefit of the doubt and am willing to explore whether or not direct real-time discussion could help new users.

As a writer and an engineer, I like to explore how technology impacts society. You can subscribe by email or by RSS. I value privacy and never share your address.

If you read French, you can support me by buying/sharing/reading my books and subscribing to my newsletter in French or RSS. I also develop Free Software.

November 22, 2023

Today my rooster August(us) was not standing well on his legs. Because it’s getting winter and, more importantly, because my woman is pregnant, I decided that this time we wouldn’t try to save the animal.

Last time one of our chickens (Livia, our white chicken) fell ill, we took her inside (somewhere in the basement where the temperature is constant) to save her from the other chickens (who were also picking on her – because that’s what healthy chickens do to a sick one) and gave her the yolk of the other chickens’ eggs for extra strength.

It was a mess all the time. Outside, the chickens take care of themselves. But keeping one inside the house, you kinda have to do all that for the chicken. Chickens shit a lot. The outcome isn’t very hygienic. I was not planning to go through this ordeal with our rooster August.

Anyway. I had already bought a restraining cone for chicken slaughter a few months ago. A sharp knife I had in the kitchen. August was very relaxed and not very defensive because, well, he was feeling sick and all that. It was not hard to put him head first in the cone. I also petted him and calmed him before all this.

That’s all very bad news. But the good news is that this new year’s eve we will be doing Coq au vin!

Afterwards, in warm water, we plucked the feathers, then opened up the body and took the intestines out (carefully not cutting open the gallbladder and bladder to make sure we didn’t spoil any meat).

Meanwhile Livia, Julia and the newest of the flock, Gwendolyn (that name because she’s an Aarschot breed of chicken – I always use political figures for chickens’ names), are still alive and happily laying eggs for us.

It’s more or less decided that Julia must go too, because our new year’s eve family gathering will count five attendees and we need enough chicken meat. Plus, the nice people of Hagelandsneerhof organize the Neerhofdieren show every year around the end of January at O-Green in Rillaar. I plan to start with a new flock of chickens and buy an entire new group there.

Poor chickens.

They had a good life. I don’t know yet what I’ll do with Gwendolyn and Livia. Especially Livia is very special: she’s always the first to arrive when we bring food. Fiercely fighting for it with Gwendolyn.

Poor August. He was so strong and defensive for his chickens. He cut me several times during one of his attacks (before I learned how to tame him).

But I couldn’t let him be in agony of sickness.

November 19, 2023

We now invite proposals for presentations. FOSDEM offers open source and free software developers a place to meet, share ideas and collaborate. Renowned for being highly developer-oriented, the event brings together some 8000+ geeks from all over the world. The twenty-fourth edition will take place on Saturday 3rd and Sunday 4th February 2024 at the usual location, ULB Campus Solbosch in Brussels. Developer Rooms For more details about the Developer Rooms, please refer to the Calls for Papers at https://fosdem.org/2024/news/2023-11-08-devrooms-announced. Main Tracks Main track presentations cover topics of interest to a significant part of our audience that do not…
With great pleasure we can announce the following projects will have a stand at FOSDEM 2024 (3 & 4th of February). This is the list of stands (in alphabetic order): AlmaLinux Ansible Automotive Grade Linux Badge.Team CentOS, RDO and OKD CERN Open Source Checkmk CiviCRM Code your Future Codeberg coreboot / flashrom / EDKII / OpenBMC Debian CalyxOS + Droidian Linux Fedora Project FOSSASIA Free Software Foundation Europe FreeCAD GNOME GNU HEALTH PROJECT GNU Radio with IQEngine Google Summer of Code 20th year stand Grafana Homebrew Infostand on amateur radio ISRG - Let's Encrypt, Prossimo & Divi Up Jenkins…

November 15, 2023

CI/CD Platform Overview

When you want or need to use CI/CD, you have a lot of CI/CD platforms to choose from. As with most “tools”, the tool itself is less important. What (which flow, best practices, security benchmarks, etc.) and how you implement it is what matters.

One of the most commonly used options is Jenkins.

I used and still use Jenkins, and created a Jenkins build workstation to build and test software in my homelab a couple of years back.

jenkins

Jenkins started as Hudson at Sun Microsystems (RIP). Hudson is one of the many open-source projects that were started at Sun and killed by Oracle. Jenkins continued as the open-source fork of Hudson.

Jenkins has evolved. If you need to do more complex things you’ll probably end up creating a lot of Groovy scripts; nothing wrong with Groovy. But as with a lot of discussions about programming, the ecosystem (who is using it, which libraries are available, etc.) is important.

Groovy isn’t that commonly used or known in the system administration ecosystem, so this is probably something you need to learn if you’re coming from the system administration world (as I am, so I learnt the basics of Groovy this way).

The other option is to implement CI/CD using the commonly used source hosting platforms; GitHub and GitLab.

Both have offerings that you can use on-premise;

There are other CI/CD systems available. Other examples are Tekton - used by Red Hat on OpenShift - and Drone, which have gained popularity. Just to name a few.

I started to use GitLab in my lab to build/test/deploy software. Why GitLab? With GitLab you have the option to use the same pipelines on your on-premise installation as on the GitLab source hosting platform.

gitlab

GitLab comes in two versions:

You can find more information about the difference between the two versions at: https://about.gitlab.com/handbook/marketing/brand-and-product-marketing/product-and-solution-marketing/tiers/.

In general, I don’t like this approach; having a paid non-open-source version doesn’t help end-users (developers) and different companies work together to improve the software. Companies that use the software will pay for support and consultancy services in the long run.

But it is what it is. If you want to use GitLab CI/CD on GitLab in your homelab environment GitLab-CE is the option to go.

In this blog series, we will go over the installation of GitLab CE and its basic features. As always, my blog posts are my installation instructions/notes that I took during my setup, in the hope that they are useful to somebody else.

Installation

All actions are executed on Debian GNU/Linux 12 (bookworm) on x86 and ARM64, as I migrated all my constantly running systems to a cluster of Raspberry Pi 4s to save power.

I mainly follow the official installation instructions at https://about.gitlab.com/install/#debian and the blog post at: https://www.linuxtechi.com/how-to-install-gitlab-on-ubuntu/.

Install requirements

Update

As always it’s a good idea to update your software (regularly).

Update the package database.

staf@tstgitlab:~$ sudo apt update -y
[sudo] password for staf: 
Get:1 http://security.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Get:2 http://deb.debian.org/debian bookworm InRelease [151 kB]                                         
Get:3 http://deb.debian.org/debian bookworm-updates InRelease [52.1 kB]                                               
Get:4 http://deb.debian.org/debian bookworm-backports InRelease [56.5 kB]
Get:5 http://security.debian.org/debian-security bookworm-security/main Sources [57.5 kB]
Get:7 http://security.debian.org/debian-security bookworm-security/main amd64 Packages [95.7 kB]
Get:8 http://security.debian.org/debian-security bookworm-security/main Translation-en [54.4 kB]
Get:9 http://deb.debian.org/debian bookworm-backports/main Sources.diff/Index [63.3 kB]
Get:10 http://deb.debian.org/debian bookworm-backports/main amd64 Packages.diff/Index [63.3 kB]
Hit:6 https://packages.gitlab.com/gitlab/gitlab-ce/debian bookworm InRelease
Get:11 http://deb.debian.org/debian bookworm-backports/main Sources T-2023-11-03-1405.27-F-2023-11-03-1405.27.pdiff [836 B]
Get:11 http://deb.debian.org/debian bookworm-backports/main Sources T-2023-11-03-1405.27-F-2023-11-03-1405.27.pdiff [836 B]
Get:12 http://deb.debian.org/debian bookworm-backports/main amd64 Packages T-2023-11-03-2011.07-F-2023-11-03-2011.07.pdiff [1,216 B]
Get:12 http://deb.debian.org/debian bookworm-backports/main amd64 Packages T-2023-11-03-2011.07-F-2023-11-03-2011.07.pdiff [1,216 B]
Fetched 644 kB in 1s (793 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
staf@tstgitlab:~$

Upgrade the packages.

staf@tstgitlab:~$ sudo apt upgrade -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
staf@tstgitlab:~$ 

Install the required packages

Install the required dependencies.

staf@tstgitlab:~$ sudo apt-get install -y curl openssh-server ca-certificates perl
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
curl is already the newest version (7.88.1-10+deb12u4).
curl set to manually installed.
openssh-server is already the newest version (1:9.2p1-2+deb12u1).
ca-certificates is already the newest version (20230311).
ca-certificates set to manually installed.
perl is already the newest version (5.36.0-7).
perl set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
staf@tstgitlab:~$ 

If you want your system to send out emails for GitLab, you’ll need to set up a local mail server like Postfix, or you can reconfigure GitLab to use another email server. See below for references on how to configure this in GitLab.

Setup the GitLab CE Repositories

GitLab provides a script to set up the GitLab repositories for both GitLab CE and GitLab EE. We’ll use GitLab CE. As always, it’s a good idea not to blindly run a script that you pulled from the internet; we’ll download the script, review it and execute it.

Create a directory.

staf@tstgitlab:~$ mkdir gitlab
staf@tstgitlab:~$ 

Download the repository setup script.

staf@tstgitlab:~/gitlab$ wget https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh
--2023-11-03 08:11:58--  https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh
Resolving packages.gitlab.com (packages.gitlab.com)... 172.64.148.245, 104.18.39.11, 2606:4700:4400::6812:270b, ...
Connecting to packages.gitlab.com (packages.gitlab.com)|172.64.148.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6865 (6.7K) [text/install_instructions]
Saving to: ‘script.deb.sh’

script.deb.sh       100%[===================>]   6.70K  --.-KB/s    in 0s      

2023-11-03 08:11:58 (35.9 MB/s) - ‘script.deb.sh’ saved [6865/6865]

staf@tstgitlab:~/gitlab$ 

Review the script.

staf@tstgitlab:~/gitlab$ vi script.deb.sh

Execute it.

staf@tstgitlab:~/gitlab$ sudo bash script.deb.sh
Detected operating system as debian/12.
Checking for curl...
Detected curl...
Checking for gpg...
Detected gpg...
Running apt-get update... done.
Installing debian-archive-keyring which is needed for installing 
apt-transport-https on many Debian systems.
Installing apt-transport-https... done.
Installing /etc/apt/sources.list.d/gitlab_gitlab-ce.list...done.
Importing packagecloud gpg key... done.
Running apt-get update... done.

The repository is setup! You can now install packages.
staf@tstgitlab:~/gitlab$ 

Installation

The installation is straightforward: you need to specify the EXTERNAL_URL environment variable and install the gitlab-ce package.

staf@tstgitlab:~/gitlab$ sudo EXTERNAL_URL="http://tstgitlab" apt install gitlab-ce
<snip>
Notes:
Default admin account has been configured with following details:
Username: root
Password: You didn't opt-in to print initial root password to STDOUT.
Password stored to /etc/gitlab/initial_root_password. This file will be cleaned up in first reconfigure run after 24 hours.

NOTE: Because these credentials might be present in your log files in plain text, it is highly recommended to reset the password following https://docs.gitlab.com/ee/security/reset_use
r_password.html#reset-your-root-password.

gitlab Reconfigured!

       *.                  *.
      ***                 ***
     *****               *****
    .******             *******
    ********            ********
   ,,,,,,,,,***********,,,,,,,,,
  ,,,,,,,,,,,*********,,,,,,,,,,,
  .,,,,,,,,,,,*******,,,,,,,,,,,,
      ,,,,,,,,,*****,,,,,,,,,.
         ,,,,,,,****,,,,,,
            .,,,***,,,,
                ,*,.
  


     _______ __  __          __
    / ____(_) /_/ /   ____ _/ /_
   / / __/ / __/ /   / __ `/ __ \
  / /_/ / / /_/ /___/ /_/ / /_/ /
  \____/_/\__/_____/\__,_/_.___/
  

Thank you for installing GitLab!
GitLab should be available at http://tstgitlab

For a comprehensive list of configuration options please see the Omnibus GitLab readme
https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md

Help us improve the installation experience, let us know how we did with a 1 minute survey:
https://gitlab.fra1.qualtrics.com/jfe/form/SV_6kVqZANThUQ1bZb?installation=omnibus&release=16-5

staf@tstgitlab:~/gitlab$ 

Open your browser to the GitLab URL; if everything goes well you’ll see the GitLab login screen.

You can log in as the root user with the initial password that was created during the installation. This password is stored in the /etc/gitlab/initial_root_password file.

Please note that the password file will be deleted by GitLab automatically, so copy it and keep it in a safe place.

Display the password.

staf@tstgitlab:~$ ls -l /etc/gitlab/initial_root_password 
-rw------- 1 root root 749 Nov  3 08:18 /etc/gitlab/initial_root_password
staf@tstgitlab:~$ 
staf@tstgitlab:~$ sudo cat /etc/gitlab/initial_root_password 
[sudo] password for staf: 
# WARNING: This value is valid only in the following conditions
#          1. If provided manually (either via `GITLAB_ROOT_PASSWORD` environment variable or via `gitlab_rails['initial_root_password']` setting in `gitlab.rb`, it was provided before database was seeded for the first time (usually, the first reconfigure run).
#          2. Password hasn't been changed manually, either via UI or via command line.
#
#          If the password shown here doesn't work, you must reset the admin password following https://docs.gitlab.com/ee/security/reset_user_password.html#reset-your-root-password.

Password: <snip>

# NOTE: This file will be automatically deleted in the first reconfigure run after 24 hours.
staf@tstgitlab:~$ 

And login:

Post configuration

HTTPS

GitLab uses NGINX under the hood; enabling https on our installation means reconfiguring NGINX. GitLab also comes with Let’s Encrypt support; see https://docs.gitlab.com/omnibus/settings/ssl/ for more information. When you specify https:// as part of the EXTERNAL_URL environment variable during the installation, the GitLab installer will set up Let’s Encrypt automatically. But I didn’t try this; in this blog post we’ll set up a self-signed certificate.

Security (rant)

In a corporate environment, you might want to use an internal certificate authority. But if you want/need to set up an internal CA authority it needs to be secured.

You need to protect your private keys and limit access to the authority. If you don’t have the resources or the time available to set up a CA authority in a decent way, it’s better not to use an internal CA authority, as this is an attack vector: if an attacker has gained access to the CA authority - or has copied the private key - he/she can generate certificates for every host/domain.

Bottom line: if you don’t have the time to set up a CA authority in a secure way, it’s better to use Let’s Encrypt or another third-party authority. Or even just use a self-signed certificate.

I use an HSM - SmartCard-HSM - and my CA authority is offline; see https://stafwag.github.io/blog/blog/2020/04/29/setup-an-ca-with-smartcard/

Create a self-signed certificate

I mainly followed the official documentation at GitLab: https://docs.gitlab.com/omnibus/settings/ssl/index.html#configure-https-manually

Set the external_url & disable let’s encrypt

Open your favourite editor.

root@gitlab:~# nvi /etc/gitlab/gitlab.rb 
root@gitlab:~# 

Update the external_url to use https://.

## GitLab URL
##! URL on which GitLab will be reachable.
##! For more details on configuring external_url see:
##! https://docs.gitlab.com/omnibus/settings/configuration.html#configuring-the-external-url-for-gitlab
##!
##! Note: During installation/upgrades, the value of the environment variable
##! EXTERNAL_URL will be used to populate/replace this value.
##! On AWS EC2 instances, we also attempt to fetch the public hostname/IP
##! address from AWS. For more details, see:
##! https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
external_url 'https://gitlab.stafnet.local'

Set letsencrypt['enable'] = false to disable Let’s Encrypt support.

################################################################################
# Let's Encrypt integration
################################################################################
letsencrypt['enable'] = false

Create a self-signed certificate

What is a self-signed certificate?

A certificate is a way to distribute a public key. A certificate is normally signed by a CA authority’s private key. You trust the CA authority’s public key, distributed in the CA certificate.

A “self-signed certificate” is just a certificate that is signed with the certificate’s own private key instead of the private key of the CA authority.

Create the SSL directory

Create the /etc/gitlab/ssl directory.

root@gitlab:/etc/gitlab# ls
config_backup  gitlab.rb  gitlab-secrets.json  trusted-certs
root@gitlab:/etc/gitlab# mkdir ssl
root@gitlab:/etc/gitlab# 

And set the permissions.

root@gitlab:/etc/gitlab# chmod 755 ssl
root@gitlab:/etc/gitlab# 

Private key

We’ll create a private key for our self-signed certificate. Please note that we don’t encrypt the private key, as we would otherwise need to type in the private key password each time we start GitLab.

Create a private key.

root@gitlab:/etc/gitlab/ssl# openssl genrsa -out gitlab.stafnet.local.key 4096
root@gitlab:/etc/gitlab/ssl# 

CSR

Create a certificate request.

root@gitlab:/etc/gitlab/ssl# openssl req -key gitlab.stafnet.local.key -new -out gitlab.stafnet.local.csr
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:BE
State or Province Name (full name) [Some-State]:Antwerp
Locality Name (eg, city) []:Antwerp
Organization Name (eg, company) [Internet Widgits Pty Ltd]:stafnet
Organizational Unit Name (eg, section) []:         
Common Name (e.g. server FQDN or YOUR name) []:gitlab.stafnet.local
Email Address []:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
root@gitlab:/etc/gitlab/ssl# 

Sign

Sign your certificate with your private key.

root@gitlab:/etc/gitlab/ssl# openssl x509 -signkey gitlab.stafnet.local.key -in gitlab.stafnet.local.csr -req -days 365 -out gitlab.stafnet.local.crt
Certificate request self-signature ok
subject=C = BE, ST = Antwerp, L = Antwerp, O = stafnet, CN = gitlab.stafnet.local
root@gitlab:/etc/gitlab/ssl# 

Verify permissions

Make sure that your private key is not world-readable.

root@gitlab:/etc/gitlab/ssl# ls -l
total 12
-rw-r--r-- 1 root root 1895 Nov 12 10:22 gitlab.stafnet.local.crt
-rw-r--r-- 1 root root 1691 Nov 12 10:20 gitlab.stafnet.local.csr
-rw------- 1 root root 3272 Nov  8 20:25 gitlab.stafnet.local.key
root@gitlab:/etc/gitlab/ssl# 

Reconfigure GitLab

Open gitlab.rb in your favourite editor.

root@gitlab:/etc/gitlab/ssl# vi /etc/gitlab/gitlab.rb
root@gitlab:/etc/gitlab/ssl# 

Update the external_url setting.

##! EXTERNAL_URL will be used to populate/replace this value.
##! On AWS EC2 instances, we also attempt to fetch the public hostname/IP
##! address from AWS. For more details, see:
##! https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
external_url 'https://gitlab.stafnet.local'

Add the following settings to the bottom of the file.

nginx['redirect_http_to_https'] = true
registry_nginx['redirect_http_to_https'] = true
mattermost_nginx['redirect_http_to_https'] = true

Run gitlab-ctl reconfigure to reconfigure gitlab.

root@gitlab:/etc/gitlab/ssl# gitlab-ctl reconfigure
[2023-11-12T11:05:43+01:00] INFO: Started Cinc Zero at chefzero://localhost:1 with repository at /opt/gitlab/embedded (One version per cookbook)
Cinc Client, version 17.10.0
Patents: https://www.chef.io/patents
Infra Phase starting
[2023-11-12T11:05:43+01:00] INFO: *** Cinc Client 17.10.0 ***
[2023-11-12T11:05:43+01:00] INFO: Platform: aarch64-linux
[2023-11-12T11:05:43+01:00] INFO: Cinc-client pid: 52849
[2023-11-12T11:05:46+01:00] INFO: Setting the run_list to ["recipe[gitlab]"] from CLI options
[2023-11-12T11:05:46+01:00] INFO: Run List is [recipe[gitlab]]
[2023-11-12T11:05:46+01:00] INFO: Run List expands to [gitlab]
<snip>
  * template[/var/opt/gitlab/postgres-exporter/queries.yaml] action create (up to date)
  * consul_service[postgres-exporter] action delete
    * file[/var/opt/gitlab/consul/config.d/postgres-exporter-service.json] action delete (up to date)
     (up to date)
Recipe: gitlab::database_reindexing_disable
  * crond_job[database-reindexing] action delete
    * file[/var/opt/gitlab/crond/database-reindexing] action delete (up to date)
     (up to date)
[2023-11-12T11:06:20+01:00] INFO: Cinc Client Run complete in 34.625114871 seconds

Running handlers:
[2023-11-12T11:06:20+01:00] INFO: Running report handlers
Running handlers complete
[2023-11-12T11:06:20+01:00] INFO: Report handlers complete
Infra Phase complete, 0/805 resources updated in 37 seconds
gitlab Reconfigured!

If everything goes well, you’re able to log in to your GitLab instance over https://.

Email

By default, GitLab will use localhost as the email server. You either need to reconfigure the local email server to be able to send out emails, or you can reconfigure GitLab to use an external SMTP server.

You can find more information on how to use an external email server in GitLab at: https://docs.gitlab.com/omnibus/settings/smtp.
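
As an illustration only (the server name, port and credentials below are placeholders and depend on your SMTP provider), the external SMTP settings go into /etc/gitlab/gitlab.rb, followed by a gitlab-ctl reconfigure:

gitlab_rails['smtp_enable'] = true
gitlab_rails['smtp_address'] = "smtp.example.com"      # placeholder SMTP server
gitlab_rails['smtp_port'] = 587
gitlab_rails['smtp_user_name'] = "gitlab@example.com"  # placeholder account
gitlab_rails['smtp_password'] = "secret"
gitlab_rails['smtp_authentication'] = "login"
gitlab_rails['smtp_enable_starttls_auto'] = true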

Backup

The GitLab backup/restore procedure is explained at: https://docs.gitlab.com/ee/administration/backup_restore/

To execute a backup run the gitlab-backup create command.
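
For example, as root:

root@gitlab:~# gitlab-backup create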

The backups are stored in the /var/opt/gitlab/backups directory.

root@gitlab:/etc/gitlab/ssl# ls -l /var/opt/gitlab/backups
total 3684
-rw------- 1 git git  491520 Oct  7 11:53 1696672275_2023_10_07_16.4.0_gitlab_backup.tar
-rw------- 1 git git  512000 Nov  3 11:36 1699007796_2023_11_03_16.4.1_gitlab_backup.tar
-rw------- 1 git git 1382400 Nov 12 11:49 1699786111_2023_11_12_16.5.1_gitlab_backup.tar
-rw------- 1 git git 1382400 Nov 12 11:53 1699786384_2023_11_12_16.5.1_gitlab_backup.tar
root@gitlab:/etc/gitlab/ssl# 

Have fun!

Links

We’ve seen in previous articles what Transparent Read/Write Splitting is with MySQL 8.2 and how to use it with MySQL Connector/Python.

I love this new feature, but I was wondering if it was really worth it.

Will the application benefit from offloading reads to another node, and won’t using the MySQL Router and parsing the request slow down connections?

These are the kinds of questions I’d like to cover and answer in this article.

The environment

To perform the test, I use the following environment:

  • Linux Kernel 5.15.0 – aarch64
  • MySQL Community Server 8.2.0
  • MySQL Router 8.2.0
  • sysbench 1.1.0 using LuaJIT 2.1.0-beta3
  • VM.Standard.A1.Flex – Neoverse-N1 (50 BogoMIPS) 4 cores
  • 24GB of RAM

MySQL InnoDB Cluster running on 3 machines and one machine for MySQL Router and Sysbench.

Sysbench is prepared with 8 tables of 100000 records each.

MySQL Connections are using SSL.
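
For reference, the tables were created with a sysbench prepare run along these lines (the host is a placeholder; prepare has to run against the primary or the read/write port):

$ sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql \
   --mysql-user=sysbench --mysql-password=XxxxXX --mysql-ssl=REQUIRED \
   --mysql-host=<primary> --mysql-port=3306 --tables=8 --table-size=100000 prepare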

The MySQL InnoDB Cluster

This is the overview of the cluster in MySQL Shell:

JS > cluster.describe()
{
    "clusterName": "myCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "topology": [
            {
                "address": "mysql1:3306", 
                "label": "mysql1:3306", 
                "role": "HA"
            }, 
            {
                "address": "mysql2:3306", 
                "label": "mysql2:3306", 
                "role": "HA"
            }, 
            {
                "address": "mysql3:3306", 
                "label": "mysql3:3306", 
                "role": "HA"
            }
        ], 
        "topologyMode": "Single-Primary"
    }
}

JS > cluster.status()
{
    "clusterName": "myCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "mysql1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "mysql1:3306": {
                "address": "mysql1:3306", 
                "memberRole": "PRIMARY", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.2.0"
            }, 
            "mysql2:3306": {
                "address": "mysql2:3306", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.2.0"
            }, 
            "mysql3:3306": {
                "address": "mysql3:3306", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.2.0"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "mysql1:3306"
}

And the MySQL Router that has been bootstrapped to the cluster, where we can see all the ports:

JS > cluster.listRouters()
{
    "clusterName": "myCluster", 
    "routers": {
        "router.sub09280951550.mysqlgermany.oraclevcn.com::system": {
            "hostname": "router.subXXXXXXX.mysqlgermany.oraclevcn.com", 
            "lastCheckIn": "2023-11-15 09:27:18", 
            "roPort": "6447", 
            "roXPort": "6449", 
            "rwPort": "6446", 
            "rwSplitPort": "6450", 
            "rwXPort": "6448", 
            "version": "8.2.0"
        }
    }
}

OLTP Read/Write

We use the oltp_read_write.lua script for sysbench, pointed directly at the primary node (on port 3306); then we run it again using MySQL Router’s dedicated read/write port (6446), and finally the read/write splitting port (6450).

8 threads are used and 3 runs are made each time.

This is the command used; the host and the port change of course:

$ sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql \
   --mysql-user=sysbench --mysql-password=XxxxXX --mysql-ssl=REQUIRED \
   --mysql-host=<...> --mysql-port=<...> --tables=8 --table-size=100000 \
   --threads=8 run

8 threads isn’t a lot, and there isn’t much contention in this workload, but we can see that the difference is minimal. Of course, analyzing each transaction/query and sending them to different hosts has a cost.

OLTP Read Only

This time, we run sysbench again, but we use the oltp_read_only.lua script. Again 8 threads and 3 runs:

We see a bigger difference here, as there are no writes and a single machine can easily handle the full load. The cost of parsing the request and sending it to several servers is therefore higher.

Of course this is only because the server can handle the load without any problem.

If we have a read-only workload, then I’d suggest using MySQL Router’s Read-Only port: 6447.
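
Pointing sysbench at the read-only port only requires changing the connection parameters, for example (host placeholder as above):

$ sysbench /usr/share/sysbench/oltp_read_only.lua --db-driver=mysql \
   --mysql-user=sysbench --mysql-password=XxxxXX --mysql-ssl=REQUIRED \
   --mysql-host=<router> --mysql-port=6447 --tables=8 --table-size=100000 \
   --threads=8 run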

OLTP Write Only

We also tested the write-only workload using the oltp_write_only.lua script. Again 8 threads and 3 runs:

We can see that again the difference is minimal but it exists of course.

More Complex Workload

Using a more complex and less optimized workload, such as oltp_update_non_index.lua, we can see a significant difference…

Conclusion

MySQL 8.2’s Transparent Read/Write splitting is a very interesting feature that we’ve all been waiting for, and comes at a very small price in terms of performance – even beneficial for less linear workloads.

You really should test it out and send us your comments.

Enjoy MySQL, MySQL Router and all integrated solutions (InnoDB ReplicaSet, InnoDB Cluster, InnoDB ClusterSet, InnoDB Read Replicas).

November 14, 2023

After 12 consecutive years of MySQL presence at FOSDEM, the MySQL Devroom has been rejected for the 2024 edition.

The MySQL & Friends Devroom has always been very popular, almost full for each session, and was maybe the room that received the most submissions.

This is also the reason why since 2017, MySQL was proposing a fringe event known as preFOSDEM Day(s).

The MySQL Community was very sad about FOSDEM’s decision and this is why we are back with an event 100% dedicated to MySQL on February 1st and 2nd: the MySQL Belgian Days 2024 !

We’re currently working on the program. But you can already expect to hear about the latest enhancements and features in MySQL 8.0 and Innovation Releases, customer testimonials, how the MySQL HeatWave service recently expanded its offering, and sessions from our Great Community.

Although the agenda is not yet known, you can already register for this event and, as usual, only register if you really intend to attend, as places are limited!

Register here !

I released AOPro 2.2 today and the main new feature is the ability to delay the loading/rendering of iFrames. Despite the fact iFrames can be lazy-loaded (using the browser-native attribute), in reality this does not always work in WordPress, hence the new “delay iFrames” option on the Pro Boosters tab. Apart from delaying iFrames, AOPro 2.2 also has...

Source

November 10, 2023

With my YouTube addiction, I learned from channels where they rebuild cars how to renovate your car’s leather seats.

From there I started wondering: can’t I do this with shoes too?

I’m a big fan of Mephisto shoes. Each time I wore them down, I bought a new pair and I kept the old ones for getting the eggs from my chickens and other dirty gardening things.

Any other kind of shoes I owned really didn’t last long but no matter what I did, the Mephisto shoes simply wouldn’t break.

Then, using what I had learned, I used a heat gun to get most of the wrinkles out of the leather. Stuff a cloth tightly inside the shoe so that the front keeps its shape. Don't use your hands for that: you'll get burned.

Then, with a leather repair kit, I repainted them.

Then I applied beeswax on the leather. You let the beeswax melt into the pores of the leather by heating it a little bit with the heat gun.

My old Mephisto pairs look almost as good as my new pairs.

Incredible. I have too many pairs of shoes now.

I don’t have pictures from before. But you can trust me that these shoes had basically no more color at all.

You can see that while I was getting the wrinkles out of the leather, I burned the laces with my heat gun accidentally. Guess I’ll need to buy new laces now. The shoe had a lot of wrinkles where it typically bends.

The leather repair kit came with different colors, so I used four different colors on the shoe (as I remembered the original shoe came with different colors there too).

Applying the leather repair paint is about the same as applying shoe polish; it just stays on permanently. Afterwards, beeswax. And you're done.

ps. I wouldn't try the heat gun trick on shoes unless you're sure every part of them is real leather. Look at my laces: it melts stuff. Heat tends to do that. Ehm, it's probably a good idea to remove the laces first.

ps. I recommend trying it out on an old pair first.

The gift of time

Maintaining a free software project means spending years of your life solving a problem that, without the software, would have taken only a few hours or even days.

Which is, joke aside, an incredible contribution to the common good.

The time saved is multiplied by the number of users and quickly compounds. Users save time without having to trade their own time in exchange.

Free software offers free time, free life extension, to many humans living now and maybe in the future.

Instead of contributing to the economy, free software developers contribute to humanity. To global progress.

Free software is about making our short lifetimes a common good instead of an economic product.

As a writer and an engineer, I like to explore how technology impacts society. You can subscribe by email or by RSS. I value privacy and never share your address.

If you read French, you can support me by buying/sharing/reading my books and subscribing to my newsletter in French or RSS. I also develop Free Software.

November 09, 2023

As you know, one of the most eagerly awaited features was released with MySQL 8.2: transparent read/write splitting.

In this post, we’ll look at how to use it with MySQL-Connector/Python.

Architecture

To play with our Python program, we will use an InnoDB Cluster.

This is an overview of the cluster in MySQL Shell:

JS > cluster.status()
{
    "clusterName": "fred", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "127.0.0.1:3310", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "127.0.0.1:3310": {
                "address": "127.0.0.1:3310", 
                "memberRole": "PRIMARY", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.2.0"
            }, 
            "127.0.0.1:3320": {
                "address": "127.0.0.1:3320", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.2.0"
            }, 
            "127.0.0.1:3330": {
                "address": "127.0.0.1:3330", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.2.0"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "127.0.0.1:3310"
}

JS > cluster.listRouters()
{
    "clusterName": "fred", 
    "routers": {
        "dynabook::system": {
            "hostname": "dynabook", 
            "lastCheckIn": "2023-11-09 17:57:59", 
            "roPort": "6447", 
            "roXPort": "6449", 
            "rwPort": "6446", 
            "rwSplitPort": "6450", 
            "rwXPort": "6448", 
            "version": "8.2.0"
        }
    }
}

MySQL Connector/Python

The Python program uses MySQL-Connector/Python 8.2.0.

This is the initial code:

import mysql.connector

cnx = mysql.connector.connect(user='python',
password='Passw0rd!Python',
                              host='127.0.0.1',
                              port='6450')

cursor = cnx.cursor()

query = ("""select member_role, @@port port 
            from performance_schema.replication_group_members 
            where member_id=@@server_uuid""")
cursor.execute(query)

for (role, port) in cursor:
    print("{} - {}".format(role, port))

cursor.close()
cnx.close()

We can already test it:

$ python test_router.py 
PRIMARY - 3310

Good, we can connect to the cluster using the read/write splitting port (6450) and execute the query… oh ?! But why are we reaching the Primary instance ?

Shouldn’t we access a Read/Only instance (one of the Secondaries) ?

autocommit

Connector/Python disables autocommit by default (see MySQLConnection.autocommit Property). And the Read/Write Splitting functionality must have autocommit enabled to work properly.

Add the following code above line 8:

cnx.autocommit = True
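
Alternatively, autocommit can be enabled directly when the connection is created (a sketch using the same connection parameters as above):

cnx = mysql.connector.connect(user='python',
                              password='Passw0rd!Python',
                              host='127.0.0.1',
                              port='6450',
                              autocommit=True)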

Then we can run the program again:

$ python test_router.py 
SECONDARY - 3320
$ python test_router.py 
SECONDARY - 3330

Great, it works as expected !

query attributes

Now let’s see how to force execution of the query on the Primary node.

The MySQL Router offers the possibility of using a query attribute to force the Read/Write Split decision: router.access_mode.

Add the following line just before executing the query (cursor.execute(query)):

 cursor.add_attribute("router.access_mode", "read_write")

Let’s execute it one more time:

$ python test_router.py 
PRIMARY - 3310

The accepted values for the router.access_mode are:

  • auto
  • read_only
  • read_write
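
Continuing the same script, you can switch between these values on a single connection; a minimal sketch (cursor.clear_attributes() is the Connector/Python call that removes previously added query attributes):

# force the next query to the primary
cursor.add_attribute("router.access_mode", "read_write")
cursor.execute(query)
for (role, port) in cursor:
    print("{} - {}".format(role, port))

# remove the attribute so the router decides again ("auto")
cursor.clear_attributes()
cursor.execute(query)
for (role, port) in cursor:
    print("{} - {}".format(role, port))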

Test with DML

Let's try something different: we're going to insert rows into a table.

We’ll use the following table:

CREATE TABLE `t1` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `port` int DEFAULT NULL,
  `role` varchar(15) DEFAULT NULL,
  `timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB 

We’ll use the following python script:

import mysql.connector

cnx = mysql.connector.connect(user='python', 
                              password='Passw0rd!Python',
                              host='127.0.0.1',
                              port='6450',
                              database='test')
cnx.autocommit = True
cursor = cnx.cursor()

for i in range(3):
    query = ("""insert into t1 values(0, @@port, (
          select member_role
            from performance_schema.replication_group_members 
            where member_id=@@server_uuid), now())""")
    cursor.execute(query)

cursor.close()
cnx.close()

for i in range(3):
    cnx = mysql.connector.connect(user='python', 
                              password='Passw0rd!Python',
                              host='127.0.0.1',
                              port='6450',
                              database='test')
    cnx.autocommit = True
    cursor = cnx.cursor()
    query = ("""select *, @@port port_read from t1""") 
    cursor.execute(query)
    for (id, port, role, timestamp, port_read) in cursor:
             print("{} : {}, {}, {} : read from {}".format(id, 
                                             port, 
                                             role, 
                                             timestamp, 
                                             port_read))

    cursor.close()
    cnx.close()

Let’s execute it :

$ python test_router2.py 
1 : 3310, PRIMARY, 2023-11-09 17:44:00 : read from 3330
2 : 3310, PRIMARY, 2023-11-09 17:44:00 : read from 3330
3 : 3310, PRIMARY, 2023-11-09 17:44:00 : read from 3330
1 : 3310, PRIMARY, 2023-11-09 18:44:00 : read from 3320
2 : 3310, PRIMARY, 2023-11-09 18:44:00 : read from 3320
3 : 3310, PRIMARY, 2023-11-09 18:44:00 : read from 3320
1 : 3310, PRIMARY, 2023-11-09 17:44:00 : read from 3330
2 : 3310, PRIMARY, 2023-11-09 17:44:00 : read from 3330
3 : 3310, PRIMARY, 2023-11-09 17:44:00 : read from 3330

We can see that there were no errors and that we wrote to the Primary node and read from all Secondaries.

Be careful: if you set the query attribute router.access_mode to read_only just before writing (line 16), you'll get an error, as writes are not allowed on a secondary:

_mysql_connector.MySQLInterfaceError: The MySQL server is running with the --super-read-only option so it cannot execute this statement
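
If you want to handle that case in code, the write can be wrapped in a standard error handler; a minimal sketch building on the script above (depending on the connector flavour, the exact exception class may differ slightly):

try:
    cursor.add_attribute("router.access_mode", "read_only")
    cursor.execute(query)  # this INSERT is now routed to a read-only secondary
except mysql.connector.Error as err:
    # e.g. "The MySQL server is running with the --super-read-only option ..."
    print("Write rejected: {}".format(err))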

Transactions

Now we’re going to play with transactions. We create a new script that will perform several transactions:

  1. a read operation in autocommit
  2. a read operation in a transaction (by default, this is a read/write transaction)
  3. a read operation in a read only transaction
  4. a transaction with several inserts and rollback

This is the source of the program:

import mysql.connector

cnx = mysql.connector.connect(user='python', 
                              password='Passw0rd!Python',
                              host='127.0.0.1',
                              port='6450',
                              database='test')
cnx.autocommit = True
cursor = cnx.cursor()
query = ("""select member_role, @@port port 
            from performance_schema.replication_group_members 
            where member_id=@@server_uuid""")
cursor.execute(query)

for (role, port) in cursor:
    print("{} - {}".format(role, port))


cnx.start_transaction()
query = ("""select member_role, @@port port 
            from performance_schema.replication_group_members 
            where member_id=@@server_uuid""")
cursor.execute(query)

for (role, port) in cursor:
    print("{} - {}".format(role, port))

cnx.commit()

cnx.start_transaction(readonly=True)
query = ("""select member_role, @@port port 
            from performance_schema.replication_group_members 
            where member_id=@@server_uuid""")
cursor.execute(query)

for (role, port) in cursor:
    print("{} - {}".format(role, port))
    

cnx.commit()

cnx.start_transaction()

for i in range(3):
    query = ("""insert into t1 values(0, @@port, (
          select member_role
            from performance_schema.replication_group_members
            where member_id=@@server_uuid), now())""")
    cursor.execute(query)

cnx.rollback()
cursor.close()

cnx.close()

Let’s execute the script:

$ python test_router3.py 
SECONDARY - 3320
PRIMARY - 3310
SECONDARY - 3320

We can see that the first operation (1) reached a secondary instance, while the second operation (2), which was a transaction, reached the primary node.

The read-only transaction (3) reached a secondary node.

We didn’t get any errors for the multiple writes that were part of the transaction we rolled back.

Conclusion

We’ve seen how easy it is to use MySQL Connector/Python with MySQL 8.2 Read/Write Splitting for an InnoDB Cluster.

Enjoy using MySQL Read / Write Splitting with MySQL Connector Python !

November 07, 2023

We are pleased to announce the developer rooms that will be organised at FOSDEM 2024. Developer rooms are assigned to self-organising groups to work together on open source projects, to discuss topics relevant to a broader subset of the community, etc. The individual developer room organisers will issue their calls for participation in the next few days. The list below will be updated accordingly. Topic Call for Participation AI and Machine Learning CfP APIs & friends CfP Collaboration and Content Management CfP Community CfP Confidential Computing CfP Containers CfP Debuggers and analysis tools CfP Declarative and Minimalistic Computing CfP

November 05, 2023

During the last Ubuntu Summit in Riga, I got an interesting question from Alex Lutay, engineering manager at Canonical, regarding the different ways to perform some Admin API commands in MySQL Shell.

He wanted to know why, in the manual and blogs, we always use separate commands instead of combining them, as he would expect this to have an impact on the connections established with the server.

Let’s illustrate this by looking at the different ways of obtaining the state of a cluster. Here are the four different methods:

Method 1

This is the method most frequently used in the documentation:

$ mysqlsh admin@10.0.0.1
JS> cluster=dba.getCluster()
JS> cluster.status() 

Method 2

This is Alex's preferred method, as he expects it to involve fewer network round trips:

$ mysqlsh admin@10.0.0.1
JS> dba.getCluster().status()

Method 3

In this method, we force the creation of the cluster object when the connection is made:

$ mysqlsh admin@10.0.0.1 --cluster
JS> cluster.status()

Method 4

The last method consists of using the MySQL Shell line arguments to call the status command directly without entering the MySQL Shell interactive mode:

$ mysqlsh admin@10.0.0.1 -- cluster status

In fact, this shouldn’t matter because, in the background, the MySQL Shell Admin API makes exactly the same calls to the server.

I have captured the network traffic for all four methods and the size of the captured data is almost the same with some slightly noticeable differences :

$ ls -lh cap*tcp
-rw-r--r-- 1 fred fred 22K Nov  5 22:29 cap1.tcp
-rw-r--r-- 1 fred fred 21K Nov  5 22:29 cap2.tcp
-rw-r--r-- 1 fred fred 22K Nov  5 22:29 cap3.tcp
-rw-r--r-- 1 fred fred 20K Nov  5 22:30 cap4.tcp

This difference is mainly due to the quantity of packets captured for each test:

$ for i in $(ls cap*.tcp)
> do
> echo $i $(cat $i | wc -l)
> done
cap1.tcp 139
cap2.tcp 133
cap3.tcp 140
cap4.tcp 127

Why, then, do we see some difference ?

This is because MySQL Shell in interactive mode runs some statements to set session variables and Shell variables (and there could be even more if you use a special prompt).

Here are some examples of statements captured when connecting to the server with MySQL Shell:

select @@sql_mode;
SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
SELECT @@SESSION.session_track_system_variables
SET SESSION session_track_system_variables = 'time_zone,autocommit,
   character_set_client, character_set_results,
   character_set_connection, sql_mode';
select concat(@@version, ' ', @@version_comment)
select schema()

Conclusion

The difference is minimal, but if you really need to optimise your calls, avoiding the MySQL Shell’s interactive mode is the best option.

At least you’re now familiar with the four options for achieving the same result.

Enjoy MySQL Shell and the Admin API !

November 03, 2023

Why are we so addicted to Google Maps and Waze?

If there is one piece of proprietary software that is hard to let go of, it's Google Maps. Or Waze, which also belongs to Google. Why is it so complicated to produce free navigation software? Having spent a few years in that industry, let me explain the different components of a navigation application.

The basic building blocks of navigation software are position, data, map matching, routing, search and real-time data. For each component, I offer an explanation and an analysis of the free-software alternatives.

Position

The first component is a positioning system that provides a geographic coordinate with, sometimes, a degree of accuracy. Quite simply: a longitude and a latitude.

There are several ways to estimate a position. The best known is GPS, which picks up radio waves emitted by the satellites of the same name. Contrary to a persistent myth, your phone transmits nothing when it uses GPS; it merely listens to the GPS signals, just as an FM radio listens to waves that are already there. Your phone doesn't have the power to transmit all the way to a satellite anyway. GPS satellites are, at their closest, 20,000 km away from you. Do you really think your phone can send a signal over 20,000 km?

To oversimplify, the principle of a GPS satellite is to continuously broadcast a signal containing the time on board. Your phone, on receiving that signal, compares this time with its own internal clock. The offset between the two makes it possible to measure the distance between the phone and the satellite, since the wave travels at the speed of light (a radio wave is just light in a spectrum invisible to the human eye). By repeating this operation for three satellites whose positions are known, the phone can, by triangulation, determine its exact position.

Interesting fact: this calculation is only possible if the positions of the GPS satellites are known. Since these positions change and are hard to predict over the long term, they are broadcast by the satellites themselves, in addition to the time. These are known as the "ephemerides". However, waiting for the complete ephemerides can take several minutes, as the GPS signal can only carry very little data.

That's why a GPS device that has been switched off for a long time takes a while before displaying its position. A GPS switched off for only a few hours can reuse the previous ephemerides. And for your smartphone it's even easier: it uses its 4G or Wifi connection to download the ephemerides from the Internet and give you an almost instantaneous position (a "fix").

The GPS system belongs to the US military. The Russian competitor is called GLONASS, and the European civilian version is Galileo. Most recent devices support all three networks, but that's not universal.

Even without satellites, your smartphone can position you fairly easily using nearby wifi access points and Bluetooth devices. How? It's very simple: Google and Apple devices constantly send to their respective owners (the two aforementioned companies) your GPS position along with the list of wifi networks, Bluetooth and NFC devices in the vicinity. The mere fact of carrying such a device turns us into spies in the service of these companies. In fact, many devices constantly spy on our position in order to resell that data.

If you turn off the GPS on a Google Android device, it will simply send a request to Google along the lines of: "Hey, I don't know my position, but I can see the wifi networks grandmaman64 and superpotes89 as well as a Bluetooth-capable Samsung TV, any idea where I am?". Answer: "Well, as it happens, three users walked past those wifi networks and that TV yesterday, and they were on Machinchose street. So you're probably on Machinchose street." Apple does exactly the same.

Whatever solution is used, GPS or otherwise, a smartphone's position is provided by the operating system and therefore poses no problem for the application developer. It's completely transparent, although getting a position will sometimes take slightly longer without the proprietary Google or Apple services described above.

Data (map data)

Having a position is not enough; you also need to know what it corresponds to. That's the role of map data, often simply called "data" in the industry.

Obtaining map data is an unimaginable amount of work which, historically, involved driving cars over every road in a country, cross-referencing the data with official cartography and then merging it with satellite data. In the 2000s, two suppliers shared a duopoly (Navteq, acquired by Nokia in 2007, and TeleAtlas, acquired by Tomtom in 2008). Google Maps, like every GPS device of the time, often used data from these suppliers. In some regions, the Navteq logo was even visible on Google Maps. But rather than paying a fortune to these companies, Google decided to build its own database, sending its own cars onto the roads (and taking the opportunity to launch Google Street View).

The really big problem with data is that it changes all the time. Trails and paths shift. Roads are opened. Others are closed. Construction happens, whole neighbourhoods appear while a street becomes one-way. Cycling through the countryside taught me that every day can be completely different. Routes suddenly become impassable because of brambles, heavy rain or fallen trees. Others appear as if by magic. Things change a little more slowly for motorists, but try crossing Europe with a ten-year-old map and you'll feel the pain.

Alongside these commercial suppliers appeared the OpenStreetMap project, which nobody in the industry wanted to take seriously. I was laughed at several times when I suggested that this was the future. Just as Universalis didn't take Wikipédia seriously.

My region on OpenStreetMap

We all know the result: OpenStreetMap is today the best map database for most everyday use cases. So much so that giants like Waze don't hesitate to copy it illegally. Sebsauvage reports the case of an OSM contributor who deliberately invented a park out of thin air. That park ended up on Waze…

But applications using OpenStreetMap face a big challenge: either ask the user to download the maps in advance and update them regularly, or download them on the fly, which makes usage impractical (how do you compute a route or find an address in an area whose map you don't have?). The OpenStreetMap project is funded mostly by donations and cannot offer a server infrastructure that answers every request instantly, something Google can comfortably afford.

Map matching

Once you have the map and the position, you just display the position on the map, right? Well, it's not that simple. First of all, because the planet is far from being a flat surface: you have to account for the curvature of the earth and the relief. But above all, GPS and map data can both be off by several metres.

Map matching consists of trying to reconcile the two pieces of information: if a GPS is moving at 120 km/h along a line parallel to and a few metres away from the motorway, it's probably on the motorway! The position therefore has to be corrected according to the data.

In cities, tall buildings can sometimes reflect the GPS signal and thus lengthen its travel time. The phone then believes it is farther from the satellite than it really is. In that kind of situation, map matching will put you on a parallel street. It may have happened to you already, and it's quite disconcerting.

Another application of map matching is trying to predict your future position, for example in a tunnel. Because of how it works, the GPS position introduces a latency of a few seconds. On a long straight road, that's not a big deal. But when it comes to knowing which exit of a roundabout to take, every second counts.

The software can then try to predict, based on your speed, your actual position. Sometimes it goes wrong. Like when it tells you that you've already passed the junction you need to take when you haven't. Or when it tells you to turn in thirty metres when you've already gone past it.

Search

We have the position on the map, which is usually our starting point. One important thing is missing: the destination. And to find the destination, the user has to enter it.

Geographic searches are very complicated, because the way we write addresses doesn't make much sense: we give the street name before the city before the country! In cars, the solution was to force users to enter their addresses backwards: country, city, street, number. It's more logical, but we're so used to the opposite that it feels counter-intuitive.

Searching a database is a very complex problem. With OpenStreetMap applications, the database is on your phone, and your search is computed by its tiny processor.

Here, Google has an immeasurable competitive advantage. It's not your phone doing the search, but Google's gigantic servers. Type "rue Machinchose" and the query is immediately sent to Google (which takes note of it in passing, so it can use that information to target you better with ads). Google's computers are so fast that they can even try to be clever: there are 12 streets called Machinchose in the whole country, but there's one called MachinChause, spelled differently, within a 10 km radius, so let's suggest that one. Especially since, look, we have on record that this user has been to that street 7 times over the last three years, even without using GPS.

It has to be said that free applications that perform the search on your phone cannot compete in terms of speed and ease of use. To use them, you have to adapt, accept redoing the search with different spellings, and wait for the results.

Routing

We have the start and the destination. Now we need to compute the route, an operation called "routing". For routing, each road segment is assigned various values: length, estimated travel time, but also potentially cost (toll roads), scenic value (if you want to offer a more pleasant ride), the type of surface, and so on.

The routing algorithm then strings together all the road segments between the start and the destination, tracing hundreds or thousands of possible routes and computing, for each route, its total value by adding up the values of every segment.

It then selects the route with the best total value. If you want the fastest route, that's the shortest total estimated time. If you want the shortest route, that's the shortest distance, and so on.

In my day, the algorithm used was most often a "bidirectional weighted A-star". This means exploring from both the start and the destination at the same time, until the paths meet, and abandoning paths that are already disqualified anyway because a shorter one exists (yes, you can go from Brussels to Paris via Amsterdam, but it's not the most efficient way).

Once again, the problem is particularly complex, and your phone will take a huge amount of time to compute the route, whereas Google's servers will do it for you. Google Maps does no computation on your phone: the application simply asks Google's servers to do it in your place. Those servers centralize the thousands of routes requested by users and sometimes reuse them without recomputing everything. When you're a monopoly, no saving is too small.
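
To make the idea of summing segment values concrete, here is a deliberately tiny sketch (illustrative only: made-up segments and weights, and plain Dijkstra rather than the bidirectional weighted A-star used in real products):

import heapq

# each segment: (neighbour, cost) where the cost could be time, distance, toll...
graph = {
    "Brussels":   [("Mechelen", 25), ("Namur", 60)],
    "Mechelen":   [("Antwerp", 25)],
    "Namur":      [("Luxembourg", 90)],
    "Antwerp":    [("Amsterdam", 160)],
    "Luxembourg": [],
    "Amsterdam":  [],
}

def cheapest_route(start, goal):
    # classic Dijkstra: always expand the cheapest partial route first
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, segment_cost in graph[node]:
            heapq.heappush(queue, (cost + segment_cost, neighbour, path + [neighbour]))
    return None

print(cheapest_route("Brussels", "Amsterdam"))
# (210, ['Brussels', 'Mechelen', 'Antwerp', 'Amsterdam'])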

Real-time data

But if you want the fastest route by car, one thing is obvious: you have to avoid traffic jams. And traffic jam data is very hard to obtain in real time.

Unless you are a monopoly that allows itself to spy on a vast majority of the population in real time. Then all you have to do, for each road segment, is take the average speed of the phones currently on that segment.

The artist Simon Weckert illustrated this principle by walking around with 99 smartphones connected to Google Maps in a cart. The result? A deserted street became a traffic jam on Google Maps.

Simon Weckert creating a traffic jam on Google Maps by pulling 99 smartphones in a small cart

Here, it has to be admitted that it is difficult, if not impossible, to provide this data without massively spying on the whole population. At this game, free alternatives will therefore never be able to match a surveillance monopoly like Google's.

But it's not all bleak because, contrary to what you might think, traffic information doesn't let us get there faster. It gives an illusion of optimality that worsens traffic on alternative routes, and in the end the time lost remains the same. The only advantage is that the travel time estimate is greatly improved.

This follows from what I call the traffic jam paradox. It's a fact well known to scientists and deliberately ignored by politicians that car traffic is counter-intuitive. The wider the road and the more vehicles it can carry, the bigger the traffic jams and the more chaotic the traffic. If your politician proposes adding a lane to the ring road to ease congestion, change politicians!

The explanation of this phenomenon is that when there is a traffic jam on the ring road, it's not the ring road itself that is clogged. It's that more cars are entering the ring road than are leaving it. And the exits are, and always will be, limited by the size of the streets in the cities.

In short, a traffic jam is caused by the bottleneck, the narrowest parts, which are most often the streets and alleys of the various final destinations. Widening the ring road amounts to widening the wide end of a funnel hoping it will empty faster. And in fact it makes things even worse, because it increases the funnel's total volume, so it holds more water and will therefore take even longer to empty.

99 smartphones in a rolling cart: that's all we are to Google

The traffic feeds and alternative routes suggested by Google Maps do nothing more than add a virtual traffic lane (in the form of an alternative route) and therefore widen the top of the funnel. Traffic information remains useful in the particular cases where your destination is completely different from the rest of the traffic, or when congestion appears suddenly, as with an accident: in that case, you might have the rare but enviable benefit of taking the escape route just before it gets congested.

Most of the time, traffic information is globally counter-productive for the simple reason that everyone uses it. It would be perfect if you were the only person to benefit from it. But since everyone uses it, you are also forced to use it. Everyone loses.

Its main impact is above all psychological: by playing with the alternative routes, you can convince yourself that you have no choice but to be patient. Whereas without them, you'd be convinced there must be another solution.

The software

So, can we do without Google Maps? As we've seen, it's not easy. The Google Maps/Waze service is built on permanent and instantaneous spying on billions of users, offering unbeatable accuracy and speed. When you think about it, the cost of this comfort is particularly high. And yet, Google Maps is not a panacea.

I personally have a soft spot for Organic Maps, which I find far better than Google Maps for everything except road traffic: walking, cycling and even driving routes away from the main roads are much more interesting. True, it requires downloading the maps; a minor inconvenience, in my view, since it allows use even without a connection. Search, on the other hand, is often frustrating and slow.

But the best thing may be to explore the free alternatives to Google Maps in this excellent article by Louis Derrac.

And then, why not fight against the privatization of the common good that cartography is by learning to contribute to OpenStreetMap?

As an engineer and writer, I explore the impact of technology on humans, both in writing and in my talks.

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the complete RSS feed.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

October 31, 2023

We believe that the former dates of the stands CfP would create logistical problems. Therefore we are going to switch the dates as follows: the submission deadline moves to 10th November instead of 20th November, and the publication date to 20th November instead of 4th December. We apologise for the confusion! If you arrive late for the submission, drop us a mail at stands@fosdem.org

The latest release of MySQL (October 25th, 2023) is the second Innovation Release of the most popular Open Source database.

This new evolution release brings us ever closer to the very first MySQL LTS Release!

This new Innovation Release already contains contributions from our great Community. MySQL 8.2.0 contains patches from Meta, Nikolai Kostrigin, Meng-Hsiu Chiang (Amazon), Richard Dang, Shaohua Wang, Hao Lu, Wen He (Tencent), Yin Peng (Tencent) and Daniël van Eeden.

Let’s have a look at all these contributions:

Server Compiling

  • #111549 – Made additional improvements to WITH_ZLIB functionality – Nikolai Kostrigin
  • #111467 – Building with WITH_ZLIB=”system” would break the MySQL build as it failed to find ZLIB – Meng-Hsiu Chiang (Amazon)

Client

  • For mysqldump: added an --ignore-views option to skip table views in the generated dump file – Meta
  • For mysqldump: added --init-command and --init-command-add options to allow executing SQL statements after connecting or reconnecting to the MySQL server (a combined usage sketch follows below) – Meta
  • For mysql: added an --init-command-add option that adds an additional SQL statement to execute after connecting or reconnecting to the MySQL server. It’s similar to the --init-command option – Meta
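
As a quick illustration of how these new mysqldump options can be combined (a hedged sketch; the database name and the statement passed to --init-command-add are purely illustrative):

$ mysqldump --ignore-views \
    --init-command-add="SET SESSION max_execution_time=0" \
    mydb > mydb_no_views.sql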

InnoDB

  • If a MySQL table in a system schema had an INSTANT ADD column created before 8.0.29 (such columns are no longer allowed as of that version), DML on that table after upgrading to a version greater than 8.0.29 would result in the server unexpectedly closing – Richard Dang
  • Fixed processing of single character tokens by a FTS parser plugin – Shaohua Wang

Optimizer

  • #109595 – records_in_range performed an excessive number of disk reads for insert operations – Meta
  • #111564 – EXPLAIN FORMAT=TREE lost the subquery in a hash join – Wen He (Tencent)
  • A previous fix in MySQL 8.0.30 for a stored program that was not executed correctly following the first invocation did not cover the case where it contained a SET statement – Hao Lu

Clone

  • #110569 – The Clone plugin sometimes failed with errno 22: Clone_Snapshot::extend_and_flush_files() always created files of type OS_CLONE_DATA_FILE, while redo log files handled via flush_redo() must be of type OS_CLONE_LOG_FILE, which could lead to errors in os_file_set_size() – Yin Peng (Tencent)

Documentation

  • #111200 – Contribution: Replace MySQL Internals Manual links – Daniël van Eeden

We can see that after the first innovation release, the number of contributions processed by the development team rises again.

If you have patches and you also want to be part of the MySQL Contributors, it’s easy, you can send Pull Requests from MySQL’s GitHub repositories or send your patches on Bugs MySQL (signing the Oracle Contributor Agreement is required).

Thanks again to all our contributors !

Ever since the advent of HTTP/2, people have asked whether they still need JS/CSS files to be combined, and indeed aggregation is off by default in Autoptimize. But based on this interesting article by Harry Roberts, “bundling is here to stay for a while”. So when in doubt: test performance with and without aggregating CSS/JS, ideally on both mobile and desktop bandwidth profiles, and learn which works best...

Source

October 30, 2023

The memory factory

An excerpt from my journal, 21 October 2023.

Photo taken on the banks of the Meuse, early August 2023, of a poster for the « Festival du folklore » in Namur and Jambes. On the poster, people, all dressed in different traditional costumes, gather to take a selfie.

Photos used to be a way of keeping a record of an event. That has now become their primary purpose. Rather than remembering what we actually experienced, we manufacture false memories, false memorabilia, to deceive our future selves.

We suffer through a whole day of queuing in a packed Disneyland so that, in five or ten years, we can pretend our smiles were sincere, that our fun was real.

Besides, this will be confirmed to us by everyone who received our photos in the hours, sometimes the seconds, after they were taken. Our followers are the false witnesses we buy, whom we bribe, in order to invent memories for ourselves.

As an engineer and writer, I explore the impact of technology on humans, both in writing and in my talks.

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the complete RSS feed.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

October 29, 2023

Why I deleted my Twitter account (and why you can probably do the same without hesitation)

I am completely addicted to social networks. I am completely obsessed with my image on them. For years, as soon as a new platform appeared, I would create an "@ploum" account on it in order to "keep control" of my pseudonym. I kept count of my followers on each one. I introduced myself as "@ploum" on the first slide of my talks.

A year ago already, Elon Musk took the reins of Twitter, renamed it "X-formerly-Twitter" and turned it, according to the accounts I've heard, into a nauseating soup. I say "according to the accounts" because by then it had already been a year since I had deleted my account.

If I was able to delete my account, even before Elon Musk arrived, chances are you can delete yours too. And maybe not only on Twitter.

I really mean deleting it, not "no longer using it" or "putting it to sleep". I went through that too, and it's not the same thing at all. It's like people whose TV sits enthroned in the middle of the living room but who say they don't watch it. Or rarely. Well… not too often. Well, just when they're bored. Or when there's something interesting on… And also just to have a presence.

By deleting my account, I removed one user from the platform and lowered its value. I removed any possibility of contacting me on these platforms, a possibility which meant that, even when I no longer used a service, I would log in once a month to answer the messages that inevitably arrived there, because if an account exists, there will always be someone to use it.

By deleting my account, I became unreachable on that platform. Which makes the platform a tiny bit less attractive to the people around me and to those who follow me. Which means the platform won't be able to show my name in the contact list when someone who has my phone number signs up for the first time. I also removed one follower from all those creators I love but who, like me, are a bit too addicted to likes.

In short, by deleting my Twitter account, I made the world a tiny bit better.

Yes, but what if someone wants to contact you via that platform?

If someone wants to contact me, deleting my account is the best thing I can do. Because nobody will try to contact me on a platform where I am not. Nobody will assume I received the message.

As I explained, advertising-driven social networks do not connect us; they sell us the illusion of being connected. While sometimes doing exactly the opposite.

In the case of a particular group using a platform, it's often hard to be the first to leave. I often felt like I was excluding myself from non-technical groups (the various sports I practise, in my case). I told several people that I wasn't getting the information. I reminded them that I wasn't on the platform being used, be it Facebook, Twitter or Whatsapp. I asked some of them to forward me the messages.

It was difficult until the moment a second person turned out not to be on the platform either, whether they had left it or had never been on it. From that moment on, the members of the group realise that the platform is no longer representative of the group. And interest in the platform declines, disappearing entirely with the third person who isn't on it either.

Being the first is hard and not always possible within a group. But if you can't be the first, support every other attempt and be the second.

Yes, but someone could impersonate you.

On Twitter, I had a verified account (and had had one for several years, at a time when that was still rare and something to show off), an account created in 2007 with almost 7000 followers. I was attached to it. I was proud of it, even though having a follower count with only 4 digits is a bit embarrassing among the new generation of influencers.

Before deleting my account, I announced it. For a week or so, I replied to every incoming message saying that the account was going to be deleted. I also announced it on my website and on Mastodon.

It's important to remember that when a Twitter account is deleted, the handle is blocked for a year. For a year, nobody can use it.

A year later, someone could indeed use your handle. That is what happened with @ploum, a year to the day after the account was deleted. The new @ploum account has nothing to do with me and cannot in any way be mistaken for me.

Yes, my modest notoriety has already brought me attacks and even harassment. Yes, I have already seen fake ploums impersonating me, which is in fact what led to my verification by Twitter at the time. Still, the probability that the handle would be reused by someone who knows me and is motivated to harm me was very low. Because, honestly, nobody cares about my Twitter account. Especially when you have to wait a year after it disappears.

But let's assume it happened. A Twitter account would have appeared, taking over my handle and a credible avatar, before starting to say atrocious things while pretending to be me.

So what?

That kind of account has always been possible by playing with subtle spelling variations. One could imagine @pl0um, @ploom, @p1oum, … My account has been gone for a year, it's no longer referenced on my website or in any bio, it has 0 followers. How credible is a fake account?

Not deleting your Twitter account for fear of identity theft means granting Twitter an enormous power, a state-like power: that of assigning people's identities. Do you recognise Elon Musk as the guarantor of your identity? If not, deleting your account is urgent. And if so, remember that Musk can take your handle whenever he pleases. He has already done it.

This kind of argument, which I hear very often, also often makes me smile because, in all honesty, who is important enough for someone to want to impersonate them on Twitter? And which problems of this very hypothetical situation could not be solved by a simple "This Twitter account is pretending to be me, but it isn't me" on your other platforms and on your website? Frankly, at the rate things are going, do you really think there will be anything credible left on Twitter in a year? If your digital identity matters, invest in a domain name before anything else!

The inventor, author and technologist Jaron Lanier, for example, has never had an account on any social network. He has even written a very short book to convince you to delete your accounts. Yet there are several accounts in his name, some even labelled "official". He simply states on his website that these accounts are not his. Full stop, problem solved.

OK, you did it, but I would lose my community and my audience

As Cory Doctorow recounts, your Twitter audience has already disappeared. It's just a number. The NPR media outlet deleted its Twitter account and its visits dropped by less than 1%. Cory Doctorow has 10 times more followers on Twitter than on Mastodon. But when it comes to shares and replies, the ratio is reversed. Mastodon is clearly much more active.

The same experiment has just been run involuntarily by the Signal application. Signal's official Twitter account, 600k followers, reacted to the announcement of a security vulnerability.

That message made the front page of the popular Hacker News website and was therefore seen many times, including by people who don't follow the Signal account on Twitter.

Five hours later, while the Twitter message was already creating a buzz, Signal reposted the content on its Mastodon account, which has "only" 40k followers (15 times fewer).

Yet, as I write these lines, the number of shares is remarkably similar (641 versus 615). The number of replies is also very similar (30 versus 23). And if you remove the "lol"s, the memes and other replies of fewer than five words, you can even conclude that the famous "engagement" on Twitter is roughly zero. (UPDATE: a week later, the number of shares has grown to 1100 on Mastodon versus 900 on Twitter)

The writer Henri Lœvenbruck also completely deleted his Twitter account and his Facebook page in 2022. Yet he is well known and lives off his notoriety. His novel « Les disparus de Blackmore », published a few months after that deletion, sold better than the previous one. Nobody will ever know whether he could have sold even more by being on Facebook or Twitter. But it proves that this presence is absolutely not indispensable.

I'll say it again and again: the follower count is fake. It is a piece of information designed to deceive you.

Yes, you are allowed to delete your accounts

The feeling of having been taken in by creating accounts on Twitter, Facebook, Medium and the like is strong. But my only mistake was to believe the promises of that industry. I'm not the one who was wrong; it's the platforms that lied to us. Some people already predicted it at the time and called me naive. I didn't listen to them, and I apologise to them. I sometimes argued that "you have to go where the people are", becoming myself an ally of these platforms. I encouraged you, who have been reading me for years, to join me there, contributing to their grip. I apologise deeply to you.

Failing to spot a lie is a mistake. In my defence, it's a mistake that can happen to anyone.

But today, the lie is glaring. It is undeniable. Would I recommend that my friends sign up on these platforms? Would I be okay with my children signing up? If the answer to either of these questions is no, keeping an account on these platforms is no longer excusable.

We are the essential component of centralized platforms. If we don't like what they are or what they are becoming, if their values contradict ours, our duty is to leave them, to drain them, not to fight to improve them.

Not reacting and continuing to go along with it when the lie is blatant is not a mistake; it borders on complicity. This is even more true for organisations and activists who claim to defend values opposed to those of the platform. You cannot fight capitalist consumerism on Facebook or the far right on Twitter. Claiming to do so is nothing but intellectual hypocrisy.

And I was the first to be guilty of it.

Today, I am trying to repair my past mistakes by asking you, my friends reading this, to delete your accounts on these advertising-driven social networks. I can reassure you: no, you will hardly miss anything important. Yes, it will be hard at first, but it will get better and better. And maybe you will gain much more from it than you imagine.

Yes, my friends, you have the right, you have the duty to delete your accounts!

PS: I dedicate this post to Henri Lœvenbruck, quoted earlier in this article. It has been exactly one year to the day since you arrived on Mastodon. I am happy for all the experiences we've shared this year and for those to come. Happy Mastodon anniversary, my friend!

As an engineer and writer, I explore the impact of technology on humans, both in writing and in my talks.

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the complete RSS feed.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

October 26, 2023

Right, I've gotten worked up again. This time my mayor, Gwendolyn Rutten, is my victim:

Dear Mrs Rutten,

As someone from Rillaar, we have spoken before at the town hall. That was, I believe, about the shelter that Scherpenheuvel organised for Ukrainian refugees very early on.

Another year has gone by and, as a self-employed person (and a liberal), I wonder what on earth your party is doing. The party appears to be occupied mostly with itself and therefore not with us. In fact, it has been that way for years.

We self-employed people understand perfectly well that a party sometimes needs a moment of reflection. But this has gone too far. Way too far.

OpenVLD is now occupied with itself in an almost communist fashion. It even seems to think that we self-employed Flemings should simply accept this: that a party for the self-employed can spend years on end concerning itself exclusively with itself instead of with the people it hopes will cast their vote for it.

But actually that's not how it is. Actually it's becoming a rather pathetic spectacle.

Surely there is plenty of work on the table for a liberal party? Maybe I'm too stupid for this or something? But wasn't this in De Tijd today? "Bankruptcies break all records in Flanders". Ah no, sorry. That was yesterday. So maybe that's why OpenVLD is better off busying itself with itself today? That must surely be more important! Maybe I also shouldn't try to be so special and should read X or Twitter instead of De Tijd?

Wouldn't it be better for you to leave OpenVLD behind and simply go independent, or to found a new party?

Because this really cannot go on any longer.

What are we self-employed liberals supposed to vote for? Surely not the pathetic OpenVLD? Surely we're not going to vote for a bunch of quarrellers?

I also expect much more from you as mayor than the pile of squabbling about your party that we are witnessing live in the Flemish media. I expect, for example, that as a liberal mayor you make sure that our self-employed people in Aarschot, such as our hospitality businesses, can get through this period of crisis. I remember very well seeing the owner of De Kiezel in tears during COVID19 because she couldn't keep her employees on. Things seem to be going better again now. Will you ensure their continuity in our municipality?

That is, after all, why I voted for you. So that when I come back from my client, I can (sometimes with my client) go and have a good meal at their place.

Kind regards,

Philip Van Hoof

Managing director, Codeminded BV.

October 25, 2023

We've all been waiting for it! It's now available: Read/Write Splitting in MySQL !!

At scale, we distribute reads across replica(s), but this has to be managed somehow in the application: writes are pointed to one place and reads to another.

With MySQL 8.2, MySQL Router is now able to identify reads and writes and route them accordingly: writes go to the primary instance of an InnoDB Cluster or to an asynchronous replication source, while reads go to secondary instances or replicas.

To illustrate this, I’ve deployed the easiest architecture: MySQL InnoDB ReplicaSet

MySQL InnoDB ReplicaSet

This is just a replication source instance and one (or more) asynchronous replica:

This is the status of the ReplicaSet object in MySQL Shell:

Bootstrap MySQL Router 8.2

Let’s configure (bootstrap) MySQL Router:

mysqlrouter --bootstrap root@127.0.0.1:3306 --user mysqlrouter

We can also see the router in the MySQL Shell ReplicaSet object:

Connecting to MySQL using the Read/Write Port (6450):

We can see that by default we reach the replica (secondary) if we do a read, but if we start a transaction, we reach the replication source (primary) without changing the port and using the same connection.
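
You can reproduce this from any client connected to port 6450; a sketch of the kind of statements involved (any query that displays @@port will do):

-- autocommit read: routed to a replica
SELECT @@port;

-- explicit transaction: routed to the replication source, on the same connection
START TRANSACTION;
SELECT @@port;
COMMIT;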

We can also see the difference when using a read-only transaction:

And we can see in the MySQL Router’s configuration file the generated settings for the R/W splitting:

[routing:bootstrap_rw_split]
bind_address=0.0.0.0
bind_port=6450
destinations=metadata-cache://myreplica/?role=PRIMARY_AND_SECONDARY
routing_strategy=round-robin
protocol=classic
connection_sharing=1
client_ssl_mode=PREFERRED
server_ssl_mode=PREFERRED
access_mode=auto

You can also define, per session, which type of instance you want to reach, using the command ROUTER SET access_mode=<mode>:
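
For example (a sketch; the accepted values are the same auto, read_only and read_write modes used by the Router):

ROUTER SET access_mode='read_write';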

Conclusion

In conclusion, MySQL Router 8.2 supports Read-Write splitting. This is a valuable feature for optimizing database performance and scalability without having to make any changes in the application.

This configuration enables you to direct all read traffic to read-only instances, and all write traffic to read-write instances.

This feature not only enhances the overall user experience but also simplifies database management and deployment.

Read-write instances are primaries or sources. Read-only instances are replicas (InnoDB Cluster secondaries, ReplicaSet secondaries or secondary instances in a Replica Cluster).

Enjoy MySQL, and there is no longer any excuse not to spread the workload across your replicas !

October 20, 2023

Since MySQL Shell 8.1, it’s even easier to create a logical backup of your MySQL instance and store it directly in Object Storage, an internet-scale, high-performance storage platform in Oracle Cloud Infrastructure (OCI).

MySQL Shell now offers the option of dumping to Object Storage Bucket using PAR (Pre-Authenticated Request).

Bucket Creation

The first step is to create an Object Storage Bucket in the OCI Console:

Let’s call it lefred-mysql-backups:

When created, we can click on the three-dots and create a new PAR:

It’s important to allow reads and writes. We also need to enable the listing of the objects:

Please note that the PAR URL is only displayed once. The console currently shows two URLs; we're going to use the new one, so we select it and copy it:

MySQL Shell

Now we will use the created PAR to store our logical dump directly in OCI Object Storage from MySQL Shell.

We can use MySQL Shell (the command line utility) or MySQL Shell for Visual Studio Code. But for this to work as expected, don't forget to use MySQL Shell 8.1 or more recent.

When connected to our MySQL Instance, we use the dumpInstance() utility following the syntax below:

JS> util.dumpInstance(PAR_URL, options)

Example on Visual Studio Code:

You can get the list of options from the manual.

I recommend adding a suffix to the PAR URL. For example, I use something like /host/date/:

JS> util.dumpInstance("https://xxxxx.objectstorage.xxxx/dell/2023-10-20/",
       {compatibility: ["strip_invalid_grants"]})

Object Storage

The backup is listed in the Object Storage Bucket:

The file @.json contains all the important information related to the dump.
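
For the record, the same PAR (since it allows reads and listing) can later be used to restore the dump with the load utility. Something along these lines, assuming I remember the option names correctly (check the manual for the full list):

JS> util.loadDump("https://xxxxx.objectstorage.xxxx/dell/2023-10-20/",
       {progressFile: "/tmp/load-progress.json"})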

Conclusion

The new MySQL Shell makes it even easier to store logical data in the cloud.

It’s a great option for storing and backing up your data offsite and keeping it safe.

Of course, this dump can also be used to migrate effortlessly to MySQL HeatWave.

Back up your data and enjoy MySQL and MySQL Shell.

October 19, 2023

[Edit: Sorry for the “bullet-point” style, it was a lot of details to compile in this blog post]

We were back at the Alvisse Parc Hotel after a break of four years! In 2022, only a light CTI summit was organized (see my wrap-up), but this year, hack.lu was back with a new format: Two days dedicated to CTI and two other days dedicated to normal talks around security. The proposed format for the talks was a 30-minute slot per speaker. This means more presentations but also a strong flow of information to collect. Here is a quick wrap-up of the four days.

The very first speaker was Ange Albertini with “SBud: Infovis in infosec”. As infosec professionals, we have to prepare slides and reports, and often we have to display highly technical information. It’s essential for readers to understand it. Ange presented his tool called « SBud », which helps to display technical information (like hex dumps) within a nice layout with arrows, colors, etc. The tool is available online, but you can just clone it to run your local instance.

Then, Emmanuel Seemann presented “Detecting VPNs/proxies by analyzing their attack patterns over time”. Crowdstrike is known to provide a block list of malicious IP addresses (approximately 60K entries). They detected that more and more IP addresses are only used for the « anonymization » of attackers (via Tor exit nodes or proxies). Emmanuel explained how they improved the detection of such usage using a machine-learning model.

Philippe Ombredanne asked: “SBoMs: are they a threat or a menace?”. SBoM means “Software Bill of Materials”. All software relies on pieces of existing code (why reinvent the wheel?), and it’s the same for FOSS. Philippe explained how SBOMs can be used for good but also for bad purposes.

After the lunch break, we had a first batch of interesting lightning talks. Personally, I liked the presentation of MMDB-Server. It’s an open-source, fast API server to look up IP addresses for their geographic details, AS name, etc. Another one was about PyOTI, a Python framework to query multiple APIs for information about IOCs. Think of it as a Python version of Cortex.

David Rufenacht gave a presentation about “CTI is dead, long live CTI!”. The idea of the presentation was to extend the classic CTI operations (reading reports, digesting IOCs & TTPs) by collecting more data present in the organization and analyzing it.

JJ Josing continued with “FOSStering an ISAC: Enabling a Community with Open-Source Tools” and explained how FOSS can be beneficial to build powerful CTI platforms. He explained how they use MISP as a central sharing point for Retail & Hospitality ISAC members.

Quentin Jérôme presented his new project, Kunai. The idea is to write a tool like Sysmon for Linux to increase the visibility of operations performed by processes. Sysmon for Linux has existed for a while but seems to be a dead project on GitHub. Quentin explained why he doesn’t like Sysmon (I agree with him on some points, like the fact that Linux & Windows are two different OSes and shouldn’t simply share the same event ID mapping). He explained how the project started and what he had to learn to develop the kernel and user-land code. The tool is still under development but looks promising.

Then, Ondra Rojcik presented “Why does the CTI industry struggle with communicating uncertainties?”. WEP does not only mean “Wireless Encryption Protocol”: in the context of CTI, it means “Words of Estimative Probability”. These are linked to confidence levels and help to convey uncertainties. Indeed, it’s sometimes difficult to understand what’s behind “could”, “would” or “can”, or to differentiate terms such as “likely” and “highly likely”.

Victor Barrault presented “Ensuring IoC quality at CERT-FR”. The idea of the talk was to explain how to handle the daily flood of IOCs that we receive and have to digest. How to be sure to spot the most interesting ones, reduce the noise, false positives, etc. They use a set of tools to perform this task. The tools are based on an internal library, itself based on pymisp. It provides a set of functions, superseding pymisp’s ones, to create, update, and delete attributes and tags in MISP.

Cocomelonc presented « Malware AV Evasion – Cryptography & Malware » to wrap up the first day. Crypto is used everywhere in the malware landscape today, and for multiple reasons: function call obfuscation, Windows API function call hashing, string obfuscation and encryption, payload encryption, syscalls, … Several crypto algorithms were reviewed, with practical examples of how the VT score of a sample can be reduced thanks to them.

The second day dedicated to CTI started with a presentation of Cratos – “Use your bloody indicators” by Dennis Rant. The tool has been designed to solve problems that some of you might have in complex infrastructures with multiple MISP instances (for example). Cratos acts as a proxy between MISP instances and 3rd-party tools. It adds a security control layer so people don’t need to access the full instance. It uses the FastAPI framework to build a REST API and Memcached in the backend to speed up operations.

The next presentation focused on IPFS: “IPFS Unveiled – Exploring Data Collection, Analysis and security”. IPFS means “Inter-Planetary File System” and is a peer-to-peer network storing files across multiple locations. It is used for good purposes but, today, many attackers use it to store phishing components or payloads (malware). The speakers explained in detail how IPFS works and how they started to monitor it to get access to exciting content.

After the coffee break, we had another run of talks. We had a review of the Lazarus group: “A Tale of Lazarus and His Family” by JeongGak Lyu. Andras Iklody presented Cerebrate, a new OSS community management and orchestration tool. The next one was “Unraveling cyber battle between UKR-RUS” presented by Ondrej Nekovar. Finally, before the lunch break, Melanie Niethammer presented “How to operationalize CTI – A real-world example”.

The lunch break was followed by a new session of lightning talks. Paweł Pawliński and Alexandre Dulaunoy presented “JTAN – Data Sharing Network”. JTAN means “Joint Threat Analysis Network” and its goal is to help share information between organizations in the EU. MISP is, of course, the core component, but other tools are also used, like AIL, MWDB, Graphoscope or Taranis NG.

Then, Arwa Alomari presented “Turbocharging IOCs Validation – Become a more efficient CTI Analyst”. The goal is to reduce the volume of IOCs and process them faster. The model presented was “Low Regret Scoring”.

The next presentation was “Modern IOC matching with Suricata” by Eric Leblond and Peter Manev. Suricata is a great tool and much more than an IDS/IPS. Eric & Peter explained how Suricata can be linked to a MISP instance and fed with IOCs to increase its detection capabilities.

The three next talks are not covered because two of them were flagged as TLP:Amber and the third one (about Pyrrha) was presented at Pass-The-Salt in July (see my wrap-up).

The next slot was assigned to Crowdstrike. They explained how they maintain their IP address block-list. From a CTI point of view, it must be reliable (no false positives), updated regularly, comprehensive and automated. Their current list has 60K IP addresses, with a 7-day expiration and about 4% daily renewal.

The day ended with two presentations about interesting tools: “MISP42” by Rémi Seguy, a Splunk application that interconnects with MISP instances to get IOCs directly in Splunk, and finally “Yeti”, presented by Thomas Chopitea.

On Wednesday, the first day of the main conference started with “How Digital Technologies are Redefining Warfare and Why It Matters” by Mauro Vignati. The “cyber” component was added to the regular warfare components a while ago but, while it used to remain tied to military activities, today we can see a slight shift towards civilians: they have all the tools and devices to attack military targets. Conversely, military operations against infrastructure can have a significant impact on civilians.

The next talk was presented by Paul Rascagnères: “Ongoing EvilEye Campaigns Targeting CCP Adversaries”. The threat actor is “EvilBamboo” or “EvilEye”. It started in 2019 with a Google blog post about an iOS 0-day. A timeline was presented with many discoveries and articles about this actor, including the bad trilogy: BadBazaar, BadSignal, BadSolar. Their TTPs were reviewed. BadSignal has the capability to silently interact with the Signal app. Whoscall is a legit app that was backdoored by EvilBamboo. Some of their techniques:
  • Take a famous app, add a backdoor and rename it “MyApp+”
  • Fake websites with malicious JS payloads (victim profiling)
  • App links posted in forums
  • iOS implant?
  • Flygram is a fake Telegram app
  • Apple removed the TibetOne app a few days after it was published. The existence of the app was also revealed by the API help (same calls with « Ios » in the name)

The next presentation was “Defeating VPN Always-On” by Maxime Clementz. This was presented at DEFCON, but the slides have been updated after a discussion with Palo Alto. The idea of always-on VPN: no access to the LAN, only a connection to the corporate environment via a tunnel. When the tunnel is down, restricted network access is applied, and the user can’t disable the feature.
Two key concepts:
Two key concepts:

  • Trusted network detection (TND)
  • Captive portal detection (CPD)

It is interesting to understand how captive portals are detected by different solutions…
Ex: Google performs an HTTP GET to clients3.google.com/generate_204 and expects a “204 No Content” response.

“The Renaissance of Cyber-Physical Offensive Capabilities” by Daniel Kapellmann Zafra covered the evolution of threats related to OT: 2010: Stuxnet, 2014: BlackEnergy, 2023: CosmicEnergy. CosmicEnergy is malware developed to disrupt electric power; it interacts with IEC-104 devices and is likely a red-teaming tool.

“Introduction to cyberwarfare: theory and practice” by Lukasz Olejnik.

After lunch, we had another round of interesting lightning talks. I’d like to mention the integration of Velociraptor and Tenzir to speed up investigations. TIDeMEC (« Threat-Informed Detection Modeling and Engineering as Code ») is a project from the European Commission to automate the deployment of detection rules.

The first talk of the afternoon was “Embedded Threats: A Deep Dive into the eSIM World” by Markus Vervier. The idea was: how to use an eSIM as a C2 channel? How do eSIMs work? The main idea of eSIMs is to switch operators without changing the physical SIM card. The security concerns are similar to those of the old system (privacy, cloning, spoofing, …). The attacks explained here were deployed on an eSIM with a desktop computer application. Nice demo of a C2 communication via SMS… slow, but it worked!

The next talk was “Building an Evil Phone Charging Station” by Stef van Dop & Tomas Philippart. A few notes:

  • USB(-C) is not only useful for charging a phone: think USB Ethernet, HID devices, …
  • Lightning is a proprietary connector by Apple
  • Basic PoC: Victim’s phone -> Charger slot -> Power/HDMI -> HDMI capture -> Mirror
  • Extracting sensitive info: PIN codes / passwords
  • Extracting passwords automatically? When typed, the last character is briefly displayed
  • Awareness: don’t plug your phone into an unknown port

Ange Albertini came back with “Do’s and don’ts in file formats”. Ange covered the MP3 file format. The original format was developed in 1994 and contained just data, no metadata (author, song name, …)

Christophe Brocas presented “ACME: benefits of deploying an Internet Security protocol inside your corporate network”. The ACME protocol is pretty well known these days, thanks to Let’s Encrypt. But how do you generate certificates for internal hosts? You can always use some kind of reverse proxy and request a Let’s Encrypt certificate, but your (internal) host names will then be published in the Certificate Transparency logs! Many people use ACME but don’t know its internals (I’m raising my hand here, I admit). Today, HTTPS is mandatory in many organizations, even for internal apps, and the default process is a pain: CSR -> Approval -> Generation -> Download cert -> Install.

After a coffee break, “Your unknown Twins: Identity in the era of Deepfakes, AI and mass Biometrics exposure” was presented by Vladimir Kropotov. The ransomware double-extortion business model:

  • Exposes a lot of data as a side effect (PII, …)
  • Boosts the underground scans for this info

What do we leak in the wild? Sound recordings (voice), photos & videos, 3D models, … Even fingerprints can be detected on Instagram pictures!

The last talk of the day was “PHP filter chains: How to use it” by Rémi Matasse. It had already been presented at Pass-The-Salt.

The last day started with Patrice Auffret, who presented “Internet exposure of satellite modems, and their vulnerabilities”. It started with the Russian intrusion into the ViaSat network: in February 2022, thousands of modems were knocked offline. The attack life-cycle: reconnaissance of IP addresses, attacking the found IPs, and exploiting the VPN vulnerability. The satellite ecosystem (and its market) is growing because satellites are pretty small these days (Starlink is a perfect example). Each found device had a state: Secure, Vulnerable or “no security at all”. Patrice made a live demo of looking for compromised or sensitive devices.

Then FrédériqueD presented “Almost 2 years after log4j… if your PSIRT has survived, are the lessons learned or not learned on security incident & vulnerability management?”. Do you remember the famous Log4j vulnerability? Frédérique did a wrap-up of the vulnerability, then explained how to detect it based on scoring and a methodology (which must be identified upfront). To handle such incidents in the future, you must be prepared and, in the case of a popular software component, an SBOM could be the solution.

The next talk was “Avoiding the basilisk’s fangs: State-of-the-art in AI LLM detection” by Jacobs Torrey. As more and more people (ab)use LLM tools, it also becomes critical to be able to detect data produced by such tools. LLMs are language models that don’t understand the text; they just process data, and that’s why we can try to detect their usage.
Some tools:

  • OpenAI has an LLM detector
  • GLTR
  • GPTZero, CrossPlag

Jacobs presented ZipPy, a compression-based estimator tool able to detect LLM usage in text files.

Dimitrios Valsamaras presented “Permissionless Universal Overlays”. He started with basic concepts of the Android user interface. Applications have « Surfaces » -> SurfaceFlinger. The idea is that windows are displayed depending on their type (ex: TYPE_SYSTEM_ALERT). There are other flags like « FLAG_NOT_TOUCHABLE ». You can also have floating windows. As you can expect, these « features » are used by malware. Any app can fake the look of another app and claim to be the original app.

Stefan Hager presented “Raiders of the Lost Arts”. In the early days, there was trust, crime was not really an issue, and Internet speeds as well as availability were low. Protocols were unencrypted and unauthenticated (DNS, SNMP, NTP, TFTP, Syslog, …), which opened the door to UDP amplification attacks. No real flow of details… lost?

After the lunch break, we had a last round of lightning talks:

  • Fraudulent smart contracts
  • Suricata language server
  • Wintermute (an LLM based pen-tester buddy)
  • SLP (Service Location Provider) protocol amplification (DDoS)
  • DER editing with asn1template
  • Supply chain issues

Then, Xeno Kovah presented “Open Wounds: The last 5 years have left Bluetooth to bleed”. Question: what BT chips do we have in our devices? Are they vulnerable? BT is everywhere. He reviewed tools and techniques used by many researchers to assess BT implementations (see darkmentors.com/bt.html).

“The rise of malicious MSIX file” was presented by Shogo Hayashi & Rintaro Kokie. MSIX files are the successor of MSI files (Windows package files). They are supported since Windows 10 and allow people to install apps. Instead of being based on OLE format, the new files are, like Microsoft Office documents, ZIP archives with XML files. They explained the internal structure of these files and how it can be abused by attackers. In the next part of the presentation, they demonstrated a real case of malware deployed via a MSIX file.

“Reviving our oldest Tool – Using Bayesian inference to detect cyber attacks” by Emanuel Seemann.

“Using Apple Sysdiagnose for mobile forensics and integrity checks” was presented by David Durvaux & Aaron Kaplan. The idea is to not (jail)break the device to access interesting data. Sysdiagnose is a tool provided by Apple for support reasons. The information collected by the tool is enormous, with plenty of file types, etc. It’s a mess to analyze, so they developed a framework to parse the data. The main issue is that Apple manages the tool and can change the format or add/remove data without notice.

After the coffee break, Jacq presented “A deep dive into Maritime Cybersecurity”.
The maritime sector is pretty complex, with many components. This sector is part of NISv2 (« critical infrastructure »), but ships are not in scope. They are also an essential element of the supply chain: these days, 80% of worldwide goods are carried by sea. Like every other domain, the maritime sector is facing digitalization. Jacq explained the issues and challenges related to this specific topic. It was a great talk. Note the ADMIRAL dataset, which reports all incidents.

“Operation Duck Hunt – A peek behind the curtain of DuckTail” by Pol Thill. This was a review of a malware campaign based on .Net. The malware uses Telegram for C2 communications. A funny live demo with Telegram was performed… Pol explained how the malware works but also who was behind it. The author made some OPSEC mistakes and it was easy to reveal his operations. More and more malware samples are using Telegram as a C2. If you can, block access to api.telegram.com from your network.

That’s all for this edition! A lot of speakers, a lot of slides, a lot of information to compile. Hopefully, all talks (if accepted by the speaker) have been recorded and are already available on Cooper’s Youtube page. Great job!

The post Hack.lu 2023 Wrap-Up appeared first on /dev/random.

October 17, 2023

The New Transhumanism

Lungs filled by the electronic cigarette,
Ears plugged by earphones,
Eyes transfixed by the screen,
Fingers clutching the smartphone,
Which we hold alternately in front of the mouth or the ear,
In its absurd horizontality.

We dreamed of a transhumanism to extend our capacities,
To heighten our senses,
To multiply our perception of and our impact on reality.

Instead, we have built a technology of anaesthesia.
We block, we plug, we try to forget.
We switch off our senses so as not to feel ourselves dying.

And when we briefly find ourselves disconnected,
Our senses suddenly awakened to the awareness of the pain of existing,
Anxious, we look for a connection, a substitute, an object to buy,
A cancer to consume in a can, a sugary bar or inhaled ashes.

Extending, improving reality: that was a dream.
But dreams are no longer meant to come true,
They are merely the inspiration for products to consume.
I would have gladly saved the world, but I'm going to miss.
The latest episode of the new TV series.

After all, doesn't this little screen give me access to the whole world?
To the entirety of human knowledge?
I, whose voice could carry to the other side of the planet,
I, who could effortlessly create something to…

Oh, look, an update to install!

An engineer and writer, I explore the impact of technology on human beings, both in writing and in my talks.

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

October 16, 2023

A while ago, I saw Stefano's portable monitor, and thought it was very useful. Personally, I rent a desk at an office space where I have a 27" Dell monitor; but I do sometimes use my laptop away from that desk, and then I do sometimes miss the external monitor.

So a few weeks before DebConf, I bought me one myself. The one I got is about a mid-range model; there are models that are less than half the price of the one that I bought, and there are models that are more than double its price, too. ASUS has a very wide range of these monitors; the cheapest model that I could find locally is a 720p monitor that only does USB-C and requires power from the connected device, which presumably, if I were to connect it to my laptop with no power connected, would halve its battery life. More expensive models have features such as wifi connectivity and miracast support, builtin batteries, more connection options, and touchscreen fanciness.

While I think some of these features are not worth the money, I do think that a builtin battery has its uses, and that I would want a decent resolution, so I got a FullHD model with builtin battery.

20231016_215332

The device comes with a number of useful accessories: a USB-C to USB-C cable for the USB-C connectivity as well as to charge the battery; an HDMI-to-microHDMI cable for HDMI connectivity; a magnetic sleeve that doubles as a back stand; a beefy USB-A charger and USB-A-to-USB-C convertor (yes, I know); and a... pen.

No, really, a pen. You can write with it. Yes, on paper. No, not a stylus. It's really a pen.

Sigh, OK. This one:

20231016_222024

OK, believe me now?

Good.

Don't worry, I was as confused about this as you just were when I first found that pen. Why would anyone do that, I thought. So I read the manual. Not something I usually do with new hardware, but here you go.

It turns out that the pen doubles as a kickstand. If you look closely at the picture of the laptop and the monitor above, you may see a little hole at the bottom right of the monitor, just to the right of the power button/LED. The pen fits right there.

Now I don't know what the exact thought process was here, but I imagine it went something like this:

  • ASUS wants to make money through selling monitors, but they don't want to spend too much money making them.
  • A kickstand is expensive.
  • So they choose not to make one, and add a little hole instead where you can put any little stick and make that function as a kickstand.
  • They explain in the manual that you can use a pen with the hole as a kickstand. Problem solved, and money saved.
  • Some paper pusher up the chain decides that if you mention a pen in the manual, you can't not ship a pen

    • Or perhaps some lawyer tells them that this is illegal to do in some jurisdictions
    • Or perhaps some large customer with a lot of clout is very annoying
  • So in a meeting, it is decided that the monitor will have a pen going along with it

  • So someone in ASUS then goes through the trouble of either designing and manufacturing a whole set of pens that use the same color scheme as the monitor itself, or just sourcing them from somewhere; and those pens are then branded (cheaply) and shipped with the monitors.

It's an interesting concept, especially given the fact that the magnetic sleeve works very well as a stand. But hey.

Anyway, the monitor is very nice; the battery lives longer than the battery of my laptop usually does, so that's good, and it allows me to have a dual-monitor setup when I'm on the road.

And when I'm at the office? Well, now I have a triple-monitor setup. That works well, too.

October 12, 2023

A sensible man I looked up to when I was young confirms what I have thought for a while now: Woke is a dangerous cult. And although I appreciate Jordan Peterson's explanations in many respects, for instance that I will later be able to use his speeches to explain to my own little one that he has to clean up his room, I also agree with Dawkins that we shouldn't trade woke for the church just to get rid of it. Maybe we should instead rein in our own propaganda machine a bit and make it clear to the Western media that they shouldn't always let themselves get so extremely carried away by every ridiculous hype.

The Lost Territories

Men had put nature in prison, destroying it, pushing it back to plant those vast expanses of tarmacked fallow land where sheet metal, noise, foul air and accidents grow.

The trees tried in vain to survive, their grey chlorophyll searching for the few scraps of light that had made it through the smog.

The woody spirits struggled to understand this delirious humanity: « But why do humans build parking cages? »

This text is an instinctive and spontaneous response to the photo « Les territoires perdus » by Bruno Leyval, the photo that illustrates this article, reproduced here with his blessing.

An engineer and writer, I explore the impact of technology on human beings, both in writing and in my talks.

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

October 11, 2023

Let’s see how to deploy WordPress and MySQL on a Kubernetes Cluster. The Kubernetes cluster we are using is OKE (Oracle Kubernetes Engine) in OCI (Oracle Cloud Infrastructure):

OKE Cluster

We start by creating a Kubernetes Cluster on OCI using the Console:

We select the Quick create mode:

We need to name our cluster and make some choices:

When created, we can find it in the OKE Clusters list:

And we can see the pool of worker nodes and the workers:

kubectl

I like to use kubectl directly on my laptop to manage my K8s Cluster.

On my Linux Desktop, I need to install the kubernetes-client package (rpm).

Then on the K8s Cluster details, you can click on Access Cluster to get all the commands to use:

We need to copy them into our terminal, and then I also like to enable bash completion for kubectl in my environment:

$ source <(kubectl completion bash)
$ echo "source <(kubectl completion bash)" >> $HOME/.bashrc

And now, we can easily test it:

$ kubectl get nodes
NAME          STATUS   ROLES   AGE     VERSION
10.0.10.155   Ready    node    21s     v1.27.2
10.0.10.193   Ready    node    21s     v1.27.2
10.0.10.216   Ready    node    21s     v1.27.2

MySQL Operator Deployment

To deploy MySQL, we use the mysql-operator for Kubernetes that manages the deployment of an InnoDB Cluster (including MySQL Router).

This is an overview of the architecture:

We start by installing the operator using manifest files:

$ kubectl apply -f https://raw.githubusercontent.com/mysql/mysql-operator/trunk/deploy/deploy-crds.yaml
customresourcedefinition.apiextensions.k8s.io/innodbclusters.mysql.oracle.com created
customresourcedefinition.apiextensions.k8s.io/mysqlbackups.mysql.oracle.com created
customresourcedefinition.apiextensions.k8s.io/clusterkopfpeerings.zalando.org created
customresourcedefinition.apiextensions.k8s.io/kopfpeerings.zalando.org created
$ kubectl apply -f https://raw.githubusercontent.com/mysql/mysql-operator/trunk/deploy/deploy-operator.yaml
clusterrole.rbac.authorization.k8s.io/mysql-operator created
clusterrole.rbac.authorization.k8s.io/mysql-sidecar created
clusterrolebinding.rbac.authorization.k8s.io/mysql-operator-rolebinding created
clusterkopfpeering.zalando.org/mysql-operator created
namespace/mysql-operator created
serviceaccount/mysql-operator-sa created
deployment.apps/mysql-operator created

We can verify that the mysql-operator for Kubernetes has been successfully deployed and that it’s ready:

$ kubectl get deployment mysql-operator --namespace mysql-operator
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
mysql-operator   1/1     1            1           35s

MySQL InnoDB Cluster

For our architecture, we will create two namespaces:

  • web-database: for everything related to MySQL
  • web-frontend: for the webservers

This is how we create them:

$ kubectl create ns web-database
$ kubectl create ns web-frontend

We need to create the password for the root MySQL user. We use a K8s Secret resource to store the credentials:

$ kubectl -n web-database create secret generic mypwds \
        --from-literal=rootUser=root \
        --from-literal=rootHost=% \
        --from-literal=rootPassword="Passw0rd!"

We can verify that the credentials were created correctly:

$ kubectl -n web-database get secret mypwds
NAME     TYPE     DATA   AGE
mypwds   Opaque   3      112s

Just for fun, you can try to decode the password (or in case you forgot it):

$ kubectl -n web-database get secret mypwds -o yaml | \
> grep rootPassword | cut -d: -f 2 | xargs | base64 -d
Passw0rd!

To deploy our first MySQL InnoDB Cluster, we need to create a YAML file (mycluster.yaml):

apiVersion: mysql.oracle.com/v2
kind: InnoDBCluster
metadata:
  name: mycluster
spec:
  secretName: mypwds
  tlsUseSelfSigned: true
  instances: 3
  router:
    instances: 1

And we deploy it in the web-database namespace:

$ kubectl -n web-database apply -f mycluster.yaml    
innodbcluster.mysql.oracle.com/mycluster created

We can verify the status of the pods:

$ kubectl -n web-database get pods
NAME          READY   STATUS            RESTARTS   AGE
mycluster-0   2/2     Running           0          80s
mycluster-1   0/2     PodInitializing   0          80s
mycluster-2   0/2     Init:2/3          0          80s

After a while, we can check several resources that have been deployed by the operator:
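
For example, the InnoDBCluster resource itself, the services exposing the router ports, and the persistent volume claims can be listed like this (a quick sketch; the resource names derive from the cluster name mycluster):

$ kubectl -n web-database get innodbcluster
$ kubectl -n web-database get svc
$ kubectl -n web-database get pvc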

We will deploy a new pod to connect to our MySQL instances using MySQL Shell. We connect through the router (port 6446):

$ kubectl run --rm -it myshell \
--image=container-registry.oracle.com/mysql/community-operator -n web-database -- \
 mysqlsh root@mycluster.web-database.svc.cluster.local:6446
If you don't see a command prompt, try pressing enter.
********

And in MySQL Shell, we create the wordpress database and a dedicated user:

SQL> create database wordpress;

SQL> create user wordpress identified by 'W0rdPress';

SQL> grant all privileges on wordpress.* to wordpress;

WordPress

It’s time to deploy WordPress. Once again, we create a new YAML file (wordpress.yaml):

apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wordpress
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: frontend
    spec:
      containers:
      - image: wordpress:latest
        name: wordpress
        env:
        - name: WORDPRESS_DB_NAME
          value: wordpress
        - name: WORDPRESS_DB_HOST
          value: mycluster.web-database.svc.cluster.local:6446
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        - name: WORDPRESS_DB_USER
          value: wordpress
        - name: WORDPRESS_CONFIG_EXTRA # enable SSL connection for MySQL
          value: |
            define('MYSQL_CLIENT_FLAGS', MYSQLI_CLIENT_SSL);
        ports:
        - containerPort: 80
          name: wordpress
        volumeMounts:
        - name: wordpress-persistent-storage
          mountPath: /var/www/html
      volumes:
      - name: wordpress-persistent-storage
        persistentVolumeClaim:
          claimName: wp-pv-claim 

It’s very important to force the usage of client SSL for mysqli. If not, WordPress won’t work with a MySQL user using the recommended authentication plugin, caching_sha2_password.
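
One more note: the manifest reads the database password from a Secret named mysql-pass in the web-frontend namespace. Creating it isn't shown above; presumably it holds the password of the wordpress user created earlier, something like:

$ kubectl -n web-frontend create secret generic mysql-pass \
        --from-literal=password='W0rdPress'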

The deployment is easy:

$ kubectl -n web-frontend apply -f wordpress.yaml

And we can even scale the WordPress web servers very easily:

$ kubectl scale deployment wordpress --replicas=3 -n web-frontend
deployment.apps/wordpress scaled

And everything is deployed and ready to use:

If we use the external public IP, we join the last step of the WordPress installation:
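
That public IP is the one assigned to the LoadBalancer Service defined in the manifest; something like this will show it (under EXTERNAL-IP):

$ kubectl -n web-frontend get svc wordpress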

I’ve installed a PHP code snippet to see which of the 3 web servers I’m reaching when visiting the website:

We can see that we are load balancing the requests across the 3 web servers.

This is what we deployed:

Conclusion

In this post, we have walked through the steps to successfully deploy WordPress and MySQL on Oracle Kubernetes Engine (OKE) using the official mysql-operator for Kubernetes. This process simplifies the complex task of managing databases, automating many of the complicated steps involved in setting up High Availability.

Happy MySQL deployments in OKE using MySQL Operator for Kubernetes!

October 09, 2023

What Ecology Can Learn from Free Software

An excerpt from my journal of 8 October 2023.

Ecology has a lot to learn from the failure of the free software movement. Rightly perceived as a moral struggle against the privatisation and commodification of the commons, it morphed into open source: a very similar movement, but one that puts the technical aspect forward so as to no longer question the mercantile aspect and the capitalist philosophy.

The result is beyond dispute: open source has won! It is everywhere. It makes up the bulk of the software you use every day. Free software's greatest historical adversary, Microsoft, has become the biggest contributor to open source, and even owns the biggest and most unavoidable open-source development platform: Github.

And yet users have never had so little freedom (which is why I speak of failure). We are spied on, we have to pay monthly subscriptions for everything, we are bombarded with myriads of advertisements. We have no control over our data, or even over the computers we buy. Where free software opposed the privatisation of the commons, open source contributes to that land grab.

Open source's Pyrrhic victory has led to a desertion of the fight to preserve our fundamental freedoms. The disappearance of these freedoms was at first perceived only as the delusion of a few paranoid geeks. It is now an established fact, completely trivialised and normalised in the daily life of the vast majority of humans. The simple right to exist without being spied on, without being invaded by advertising monopolies and without being forced to spend money on yet another update has essentially disappeared. Logging in to the official online platforms of many institutions, including state ones, now most often requires a Google, Apple or Microsoft account. The largest French-speaking university in Belgium, where I am employed, forces every student and every member of staff to use a Microsoft account and to store all their data and all their communications there.

The parallel with ecology is troubling at a time when the political doxa consists of reconciling ecology and consumerism. Market ecology is promoted as a solution in exactly the same way that open source was seen as a way for free software to win.

There is no need to be a prophet to predict that the result will be identical, because it already is: a worsened situation, but one perceived as acceptable, because the struggle now belongs to the past. The remaining activists form a decrepit rearguard.

The carbon offset market, which produces more pollution than if it did not exist while allowing the biggest polluters to buy themselves a clear conscience, is only the first of many examples. The absurd hypocrisy of companies claiming to be « ecological » or « green » is another. In truth, there is no compromise to be made with the consumerist economy, because it is the root of the evil that is eating away at us.

Many environmental activists now gather on advertising platforms such as Facebook or Google, which seek to privatise information and discussion spaces while pushing us towards consumption. This is just one of many illustrations of our inability to imagine the logical consequences of our actions from the moment our salary and our daily comfort depend on our not imagining them.

My experience in academia shows that the organisations that are supposed to serve as our intellectual elite are just as corrupted, and just as devoid of the imagination that is nevertheless at the heart of their mission.

An engineer and writer, I explore the impact of technology on human beings, both in writing and in my talks.

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

October 01, 2023

Today I had a conversation with a 23-year-old.

It both opened my eyes and closed them. This lady claimed to have endometriosis.

Her whole body suffered from it. And the world had better start concerning itself with it, too. So she thought.

It struck me how important the commercial side was to her. Apparently there are specialists in the U.S. and the U.K. willing to remove her uterus for a lot of money.

And because all that money is over there, therefore, it must be, that, everything, they say, is the truth.

Thus this 23-year-old.

She did not want children under any circumstances. She knew that as a 23-year-old. The symptoms of her endometriosis were so severe that, at 23, she wanted to have her uterus removed by a number of surgeons in the U.S. and the U.K. who were willing, for a lot of money, to do so. Something the Belgian specialists in Leuven were not willing to do.

Apparently it no longer matters what our Belgian specialists are convinced of. Extreme capitalism took over long ago: women who want their uterus removed find a willing ear in the Anglo-Saxon world.

The free market has decided: removing uteruses is a transaction like any other.

It took me a lot of effort to bring this young lady back to reality: that her body is a gift, and that her ability to give life is the most important thing there is.

Frequency-domain blue noise generator

Cover Image - Live effect run-time inspector

In computer graphics, stochastic methods are so hot right now. All rendering turns into calculus, except you solve the integrals by numerically sampling them.

As I showed with Teardown, this is all based on random noise, hidden with a ton of spatial and temporal smoothing. For this, you need a good source of high quality noise. There have been a few interesting developments in this area, such as Alan Wolfe et al.'s Spatio-Temporal Blue Noise.

This post is about how I designed noise in frequency space. I will cover:

  • What is blue noise?
  • Designing indigo noise
  • How swap works in the frequency domain
  • Heuristics and analysis to speed up search
  • Implementing it in WebGPU

Along the way I will also show you some "street" DSP math. This illustrates how getting comfy in this requires you to develop deep intuition about complex numbers. But complex doesn't mean complicated. It can all be done on a paper napkin.

The WebGPU interface I built

What I'm going to make is this:

If properly displayed, this image should look eerily even. But if your browser is rescaling it incorrectly, it may not be exactly right.

Colorless Blue Ideas

I will start by just recapping the essentials. If you're familiar, skip to the next section.

Ordinary random generators produce uniform white noise: every value is equally likely, and the average frequency spectrum is flat.

Time domain

Frequency domain

To a person, this doesn't actually seem fully 'random', because it has clusters and voids. Similarly, a uniformly random list of coin flips will still have long runs of heads or tails in it occasionally.

What a person would consider evenly random is usually blue noise: it prefers to alternate between heads and tails, and avoids long runs entirely. It is 'more random than random', biased towards the upper frequencies, i.e. the blue part of the spectrum.

Time domain

Frequency domain

Blue noise is great for e.g. dithering, because when viewed from afar, or blurred, it tends to disappear. With white noise, clumps remain after blurring:

Blurred white noise

Blurred blue noise

Blueness is a delicate property. If you have e.g. 3D blue noise in a volume XYZ, then a single 2D XY slice is not blue at all:

XYZ spectrum

XY slice

XY spectrum

The samples are only evenly distributed in 3D, i.e. when you consider each slice in front and behind it too.

Blue noise being delicate means that nobody really knows of a way to generate it statelessly, i.e. as a pure function f(x,y,z). Algorithms to generate it must factor in the whole, as noise is only blue if every single sample is evenly spaced. You can make blue noise images that tile, and sample those, but the resulting repetition may be noticeable.

Because blue noise is constructed, you can make special variants.

  • Uniform Blue Noise has a uniform distribution of values, with each value equally likely. An 8-bit 256x256 UBN image will have each unique byte appear exactly 256 times.

  • Projective Blue Noise can be projected down, so that a 3D volume XYZ flattened into either XY, YZ or ZX is still blue in 2D, and same for X, Y and Z in 1D.

  • Spatio-Temporal Blue Noise (STBN) is 3D blue noise created specifically for use in real-time rendering:

    • Every 2D slice XY is 2D blue noise
    • Every Z row is 1D blue noise

This means XZ or YZ slices of STBN are not blue. Instead, it's designed so that when you average out all the XY slices over Z, the result is uniform gray, again without clusters or voids. This requires the noise in all the slices to perfectly complement each other, a bit like overlapping slices of translucent swiss cheese.

This is the sort of noise I want to generate.

Indigo STBN 64x64x16

XYZ spectrum

Sleep Furiously

A blur filter's spectrum is the opposite of blue noise: it's concentrated in the lowest frequencies, with a bump in the middle.

If you blur the noise, you multiply the two spectra. Very little is left: only the ring-shaped overlap, creating a band-pass area.

This is why blue noise looks good when smoothed, and is used in rendering, with both spatial (2D) and temporal smoothing (1D) applied.

Blur filters can be designed. If a blur filter is perfectly low-pass, i.e. ~zero amplitude for all frequencies > $ f_{\rm{lowpass}} $ , then nothing is left of the upper frequencies past a point.

If the noise is shaped to minimize any overlap, then the result is actually noise free. The dark part of the noise spectrum should be large and pitch black. The spectrum shouldn't just be blue, it should be indigo.

When people say you can't design noise in frequency space, what they mean is that you can't merely apply an inverse FFT to a given target spectrum. The resulting noise is gaussian, not uniform. The missing ingredient is the phase: all the frequencies need to be precisely aligned to have the right value distribution.

This is why you need a specialized algorithm.

The STBN paper describes two: void-and-cluster, and swap. Both of these are driven by an energy function. It works in the spatial/time domain, based on the distances between pairs of samples. It uses a "fall-off parameter" sigma to control the effective radius of each sample, with a gaussian kernel.

$$ E(M) = \sum E(p,q) = \sum \exp \left( - \frac{||\mathbf{p} - \mathbf{q}||^2}{\sigma^2_i}-\frac{||\mathbf{V_p} - \mathbf{V_q}||^{d/2}}{\sigma^2_s} \right) $$

STBN (Wolfe et al.)

The swap algorithm is trivially simple. It starts from white noise and shapes it:

  1. Start with e.g. 256x256 pixels initialized with the bytes 0-255 repeated 256 times in order
  2. Permute all the pixels into ~white noise using a random order
  3. Now iterate: randomly try swapping two pixels, check if the result is "more blue"

This is guaranteed to preserve the uniform input distribution perfectly.
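
As a rough sketch, the outer loop looks something like this (in JavaScript; score() here is a stand-in for whichever loss you use, spatial or spectral, not the paper's exact energy function):

function swapSearch(noise, score, iterations) {
  // `noise` is a flat array of values, `score` returns a loss (lower = bluer).
  let best = score(noise);
  for (let i = 0; i < iterations; i++) {
    // Pick two distinct random positions
    const a = Math.floor(Math.random() * noise.length);
    const b = Math.floor(Math.random() * noise.length);
    if (a === b) continue;

    // Try the swap
    [noise[a], noise[b]] = [noise[b], noise[a]];
    const candidate = score(noise);

    if (candidate < best) {
      best = candidate;                               // keep it: the noise got "more blue"
    } else {
      [noise[a], noise[b]] = [noise[b], noise[a]];    // undo it
    }
  }
  return noise;
}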

The resulting noise patterns are blue, but they still have some noise in all the lower frequencies. The only blur filter that could get rid of it all, is one that blurs away all the signal too. My 'simple' fix is just to score swaps in the frequency domain instead.

If this seems too good to be true, you should know that a permutation search space is catastrophically huge. If any pixel can be swapped with any other pixel, the number of possible swaps at any given step is O(N²). In a 256x256 image, it's ~2 billion.

The goal is to find a sequence of thousands, millions of random swaps, to turn the white noise into blue noise. This is basically stochastic bootstrapping. It's the bulk of good old fashioned AI, using simple heuristics, queues and other tools to dig around large search spaces. If there are local minima in the way, you usually need more noise and simulated annealing to tunnel over those. Usually.

This set up is somewhat simplified by the fact that swaps are symmetric (i.e. (A,B) = (B,A)), but also that applying swaps S1 and S2 is the same as applying swaps S2 and S1 as long as they don't overlap.

Good Energy

Let's take it one hurdle at a time.

It's not obvious that you can change a signal's spectrum just by re-ordering its values over space/time, but this is easy to illustrate.

Random signal

Take any finite 1D signal, and order its values from lowest to highest. You will get some kind of ramp, approximating a sawtooth wave. This concentrates most of the energy in the first non-DC frequency:

Random signal - re-ordered

Now split the odd values from the even values, and concatenate them. You will now have two ramps, with twice the frequency:

Random signal - re-ordered and split into odd/even

You can repeat this to double the frequency all the way up to Nyquist. So you have a lot of design freedom to transfer energy from one frequency to another.

Random signal - re-ordered and split into odd/even x2

In fact the Fourier transform has the property that energy in the time and frequency domain is conserved:

$$ \int_{-\infty}^\infty |f(x)|^2 \, dx = \int_{-\infty}^\infty |\widehat{f}(\xi)|^2 \, d\xi $$

This means the sum of $ |\mathrm{spectrum}_k|^2 $ remains constant over pixel swaps. We then design a target curve, e.g. a high-pass cosine:

$$ \mathrm{target}_k = \frac{1 - \cos \frac{k \pi}{n} }{2} $$

This can be fit and compared to the current noise spectrum to get the error to minimize.

However, I don't measure the error in energy $ |\mathrm{spectrum}_k|^2 $ but in amplitude $ |\mathrm{spectrum}_k| $. I normalize the spectrum and the target into distributions, and take the L2 norm of the difference, i.e. a sqrt of the sum of squared errors:

$$ \mathrm{error}_k = \frac{\mathrm{target}_k}{||\mathbf{target}||} - \frac{|\mathrm{spectrum}_k|}{||\mathbf{spectrum}||} $$ $$ \mathrm{loss}^2 = ||\mathbf{error}||^2 $$

This keeps the math simple, but also helps target the noise in the ~zero part of the spectrum. Otherwise, deviations near zero would count for less than deviations around one.
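
Put together, the scoring looks roughly like this (a 1D JavaScript sketch; computing the magnitudes |spectrum_k| with an FFT is assumed to happen elsewhere):

function loss(spectrumMag) {
  // spectrumMag[k] = |spectrum_k|, the magnitude of the noise's FFT
  const n = spectrumMag.length;

  // High-pass cosine target: target_k = (1 - cos(kπ/n)) / 2
  const target = new Float64Array(n);
  for (let k = 0; k < n; k++) target[k] = (1 - Math.cos((k * Math.PI) / n)) / 2;

  // Normalize both into unit-length vectors (L2 norm)
  const norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  const nt = norm(target);
  const ns = norm(spectrumMag);

  // L2 norm of the difference between the normalized distributions
  let sum = 0;
  for (let k = 0; k < n; k++) {
    const e = target[k] / nt - spectrumMag[k] / ns;
    sum += e * e;
  }
  return Math.sqrt(sum);
}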

Go Banana

So I tried it.

With a lot of patience, you can make 2D blue noise images up to 256x256 on a single thread. A naive random search with an FFT for every iteration is not fast, but computers are.

Making a 64x64x16 with this is possible, but it's certainly like watching paint dry. It's the same number of pixels as 256x256, but with an extra dimension worth of FFTs that need to be churned.

Still, it works and you can also make 3D STBN with the spatial and temporal curves controlled independently:

Converged spectra

I built command-line scripts for this, with a bunch of quality of life things. If you're going to sit around waiting for numbers to go by, you have a lot of time for this...

  • Save and load byte/float-exact state to a .png, save parameters to .json
  • Save a bunch of debug viz as extra .pngs with every snapshot
  • Auto-save state periodically during runs
  • Measure and show rate of convergence every N seconds, with smoothing
  • Validate the histogram before saving to detect bugs and avoid corrupting expensive runs

I could fire up a couple of workers to start churning, while continuing to develop the code liberally with new variations. I could also stop and restart workers with new heuristics, continuing where it left off.

Protip: you can write C in JavaScript

Drunk with power, I tried various sizes and curves, which created... okay noise. Each has the exact same uniform distribution so it's difficult to judge other than comparing to other output, or earlier versions of itself.

To address this, I visualized the blurred result, using a [1 4 6 4 1] kernel as my baseline. After adjusting levels, structure was visible:

Semi-converged

Blurred

The resulting spectra show what's actually going on:

Semi-converged

Blurred

The main component is the expected ring of bandpass noise, the 2D equivalent of ringing. But in between there is also a ton of redder noise, in the lower frequencies, which all remains after a blur. This noise is as strong as the ring.

So while it's easy to make a blue-ish noise pattern that looks okay at first glance, there is a vast gap between having a noise floor and not having one. So I kept iterating:

It takes a very long time, but if you wait, all those little specks will slowly twinkle out, until quantization starts to get in the way, with a loss of about 1/255 per pixel (0.0039).

Semi converged

Fully converged

The effect on the blurred output is remarkable. All the large scale structure disappears, as you'd expect from spectra, leaving only the bandpass ringing. That goes away with a strong enough blur, or a large enough dark zone.

The visual difference between the two is slight, but nevertheless, the difference is significant and pervasive when amplified:

Semi converged

Fully converged

Difference

Final spectrum

I tried a few indigo noise curves, with different % of the curve zero. The resulting noise is all extremely equal, even after a blur and amplify. The only visible noise left is bandpass, and the noise floor is so low it may as well not be there.

As you make the black exclusion zone bigger, the noise gets concentrated in the edges and corners. It becomes a bit more linear and squarish, a contender for violet noise. This is basically a smooth evolution towards a pure pixel checkerboard in the limit. Using more than 50% zero seems inadvisable for this reason:

Time domain

Frequency domain

At this point the idea was validated, but it was dog slow. Can it be done faster?

Spatially Sparse

An FFT scales like O(N log N). When you are dealing with images and volumes, that N is actually an N² or N³ in practice.

The early phase of the search is the easiest to speed up, because you can find a good swap for any pixel with barely any tries. There is no point in being clever. Each sub-region is very non-uniform, and its spectrum nearly white. Placing pixels roughly by the right location is plenty good enough.

You might try splitting a large volume into separate blocks, and optimize each block separately. That wouldn't work, because all the boundaries remain fully white. Overlapping doesn't fix this, because they will actively create new seams. I tried it.

What does work is a windowed scoring strategy. It avoids a full FFT for the entire volume, and only scores each NxN or NxNxN region around each swapped point, with N-sized FFTs in each dimension. This is enormously faster and can rapidly dissolve larger volumes of white noise into approximate blue even with e.g. N = 8 or N = 16. Eventually it stops improving and you need to bump the range or switch to a global optimizer.

Here's the progression from white noise, to when sparse 16x16 gives up, followed by some additional 64x64:

Time domain

Frequency domain

Time domain

Frequency domain

Time domain

Frequency domain

A naive solution does not work well however. This is because the spectrum of a subregion does not match the spectrum of the whole.

The Fourier transform assumes each signal is periodic. If you take a random subregion and forcibly repeat it, its new spectrum will have aliasing artifacts. This would cause you to consistently misjudge swaps.

To fix this, you need to window the signal in the space/time-domain. This forces it to start and end at 0, and eliminates the effect of non-matching boundaries on the scoring. I used a smoothStep window because it's cheap, and haven't needed to try anything else:

16x16 windowed data

$$ w(t) = 1 - (3|t|^2 - 2|t|^3), \quad t \in [-1, 1] $$
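
Sampled over an N-sized tile, that window is tiny to implement (a 1D JavaScript sketch; for 2D or 3D regions it's just the separable product w[x]·w[y]·w[z]):

function smoothStepWindow(n) {
  // w(t) = 1 - (3|t|² - 2|t|³) with t mapped from -1 at one edge to +1 at the other
  const w = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const t = Math.abs((2 * i) / (n - 1) - 1);   // |t| in [0, 1], 0 at the center
    w[i] = 1 - (3 * t * t - 2 * t * t * t);      // 1 in the middle, 0 at the edges
  }
  return w;
}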

This still alters the spectrum, but in a predictable way. A time-domain window is a convolution in the frequency domain. You don't actually have a choice here: not using a window is mathematically equivalent to using a very bad window. It's effectively a box filter covering the cut-out area inside the larger volume, which causes spectral ringing.

The effect of the chosen window on the target spectrum can be modeled via convolution of their spectral magnitudes:

$$ \mathbf{target}' = |\mathbf{target}| \circledast |\mathcal{F}(\mathbf{window})| $$

This can be done via the time domain as:

$$ \mathbf{target}' = \mathcal{F}(\mathcal{F}^{-1}(|\mathbf{target}|) \cdot \mathcal{F}^{-1}(|\mathcal{F}(\mathbf{window})|)) $$

Note that the forward/inverse Fourier pairs are not redundant, as there is an absolute value operator in between. This discards the phase component of the window, which is irrelevant.

Curiously, while it is important to window the noise data, it isn't very important to window the target. The effect of the spectral convolution is small, amounting to a small blur, and the extra error is random and absorbed by the smooth scoring function.

The resulting local loss tracks the global loss function pretty closely. It massively speeds up the search in larger volumes, because the large FFT is the bottleneck. But it stops working well before anything resembling convergence in the frequency-domain. It does not make true blue noise, only a lookalike.

The overall problem is still that we can't tell good swaps from bad swaps without trying them and verifying.

Sleight of Frequency

So, let's characterize the effect of a pixel swap.

Given a signal [A B C D E F G H], let's swap C and F.

Swapping the two values is the same as adding F - C = Δ to C, and subtracting that same delta from F. That is, you add the vector:

V = [0 0 Δ 0 0 -Δ 0 0]

This remains true if you apply a Fourier transform and do it in the frequency domain.
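
A quick numerical check of that claim, with arbitrary values:

import numpy as np

x = np.array([1., 5., 2., 7., 3., 8., 4., 6.])     # [A B C D E F G H]
swapped = x.copy()
swapped[2], swapped[5] = swapped[5], swapped[2]    # swap C and F

delta = x[5] - x[2]                                # Δ = F - C
V = np.zeros(8); V[2], V[5] = delta, -delta        # [0 0 Δ 0 0 -Δ 0 0]

assert np.allclose(swapped, x + V)
assert np.allclose(np.fft.fft(swapped), np.fft.fft(x) + np.fft.fft(V))   # the FFT is linear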

To best understand this, you need to develop some intuition around FFTs of Dirac deltas.

Consider the short filter kernel [1 4 6 4 1]. It's a little known fact, but you can actually sight-read its frequency spectrum directly off the coefficients, because the filter is symmetrical. I will teach you.

The extremes are easy:

  • The DC amplitude is the sum 1 + 4 + 6 + 4 + 1 = 16
  • The Nyquist amplitude is the modulated sum 1 - 4 + 6 - 4 + 1 = 0

So we already know it's an 'ideal' lowpass filter, which reduces the Nyquist signal +1, -1, +1, -1, ... to exactly zero. It also has 16x DC gain.

Now all the other frequencies.

First, remember the Fourier transform works in symmetric ways. Every statement "____ in the time domain = ____ in the frequency domain" is still true if you swap the words time and frequency. This has led to the grotesquely named sub-field of cepstral processing where you have quefrencies and vawes, and it kinda feels like you're having a stroke. The cepstral convolution filter from earlier is called a lifter.

Usually cepstral processing is applied to the real magnitude of the spectrum, i.e. $ |\mathrm{spectrum}| $, instead of its true complex value. This is a coward move.

So, decompose the kernel into symmetric pairs:

[· · 6 · ·]
[· 4 · 4 ·]
[1 · · · 1]

Every row but the first is a pair of real Dirac deltas in the time domain. Such a row is normally what you get when you Fourier transform a cosine, i.e.:

$$ \cos \omega = \frac{\mathrm{e}^{i\omega} + \mathrm{e}^{-i\omega}}{2} $$

A cosine in time is a pair of Dirac deltas in the frequency domain. The phase of a (real) cosine is zero, so both its deltas are real.

Now flip it around. The Fourier transform of a pair [x 0 0 ... 0 0 x] is a real cosine in frequency space. Must be true. Each new pair adds a new higher cosine on top of the existing spectrum. For the central [... 0 0 x 0 0 ...] we add a DC term. It's just a Fourier transform in the other direction:

|FFT([1 4 6 4 1])| =

  [· · 6 · ·] => 6 
  [· 4 · 4 ·] => 8 cos(ɷ)
  [1 · · · 1] => 2 cos(2ɷ)
  
 = |6 + 8 cos(ɷ) + 2 cos(2ɷ)|

Normally you have to use the z-transform to analyze a digital filter. But the above is a shortcut. FFTs and inverse FFTs do have opposite phase, but that doesn't matter here because cos(ɷ) = cos(-ɷ).

This works for the symmetric-even case too: you offset the frequencies by half a band, ɷ/2, and there is no DC term in the middle:

|FFT([1 3 3 1])| =

  [· 3 3 ·] => 6 cos(ɷ/2)
  [1 · · 1] => 2 cos(3ɷ/2)

 = |6 cos(ɷ/2) + 2 cos(3ɷ/2)|

So, symmetric filters have spectra that are made up of regular cosines. Now you know.

For the purpose of this trick, we centered the filter around $ t = 0 $. FFTs are typically aligned to array index 0. The difference between the two is however just phase, so it can be disregarded.
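
You can check the sight-reading numerically; only magnitudes are compared, because aligning the kernel to index 0 instead of t = 0 just shifts the phase:

import numpy as np

N = 256
h = np.zeros(N); h[:5] = [1, 4, 6, 4, 1]           # the kernel, aligned to index 0
w = 2 * np.pi * np.arange(N) / N

# Magnitude read straight off the symmetric pairs: |6 + 8 cos(ω) + 2 cos(2ω)|
assert np.allclose(np.abs(np.fft.fft(h)), np.abs(6 + 8 * np.cos(w) + 2 * np.cos(2 * w)))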

What about the delta vector [0 0 Δ 0 0 -Δ 0 0]? It's not symmetric, so we have to decompose it:

V1 = [· · · · · Δ · ·]
V2 = [· · Δ · · · · ·]

V = V2 - V1

Each is now an unpaired Dirac delta. Each vector's Fourier transform is a complex wave $ Δ \cdot \mathrm{e}^{-i \omega k} $ in the frequency domain (the k'th quefrency). It lacks the usual complementary oppositely twisting wave $ Δ \cdot \mathrm{e}^{i \omega k} $, so it's not real-valued. It has constant magnitude Δ and varying phase:

|FFT(V1)| = [Δ Δ Δ Δ Δ Δ Δ Δ], with the phase winding at a rate set by V1's position
|FFT(V2)| = [Δ Δ Δ Δ Δ Δ Δ Δ], with the phase winding at a rate set by V2's position

These are vawes.

The effect of a swap is still just to add FFT(V), aka FFT(V2) - FFT(V1) to the (complex) spectrum. The effect is to transfer energy between all the bands simultaneously. Hence, FFT(V1) and FFT(V2) function as a source and destination mask for the transfer.

However, 'mask' is the wrong word, because the magnitude of $ \mathrm{e}^{i \omega k} $ is always 1. It doesn't have varying amplitude, only varying phase. -FFT(V1) and FFT(V2) define the complex direction in which to add/subtract energy.

When added together their phases interfere constructively or destructively, resulting in an amplitude that varies between 0 and 2Δ: an actual mask. The resulting phase will be halfway between the two, as it's the sum of two equal-length complex numbers.

|FFT(V)| = [0 … 2Δ …], varying between 0 and 2Δ depending on how the two phases interfere (exactly 0 at DC)
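
A quick numpy check of both properties:

import numpy as np

N, delta = 8, 1.0
V1 = np.zeros(N); V1[5] = delta                    # [· · · · · Δ · ·]
V2 = np.zeros(N); V2[2] = delta                    # [· · Δ · · · · ·]

F1, F2 = np.fft.fft(V1), np.fft.fft(V2)
assert np.allclose(np.abs(F1), delta)              # a lone Dirac delta: constant magnitude
assert np.allclose(np.abs(F2), delta)              # only the phase winds differently

mask = F2 - F1                                     # interference mask of the swap V = V2 - V1
print(np.round(np.abs(mask), 3))                   # 0 at DC, up to 2Δ where the phases oppose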

For any given pixel A and its delta FFT(V1), it can pair up with other pixels B to form N-1 different interference masks FFT(V2) - FFT(V1). There are N(N-1)/2 unique interference masks, if you account for (A,B) (B,A) symmetry.

Worth pointing out, the FFT of the first index:

FFT([Δ 0 0 0 0 0 0 0]) = [Δ Δ Δ Δ Δ Δ Δ Δ]

This is the DC quefrency, and the Fourier symmetry continues to work. Moving values in time causes the vawe's quefrency to change in the frequency domain. This is the upside-down version of how moving energy to another frequency band causes the wave's frequency to change in the time domain.

What's the Gradient, Kenneth?

Using vectors as masks... shifting energy in directions... this means gradient descent, no?

Well.

It's indeed possible to calculate the derivative of your loss function as a function of input pixel brightness, with the usual bag of automatic differentiation/backprop tricks. You can also do it numerically.

But, this doesn't help you directly because the only way you can act on that per-pixel gradient is by swapping a pair of pixels. You need to find two quefrencies FFT(V1) and FFT(V2) which interfere in exactly the right way to decrease the loss function across all bad frequencies simultaneously, while leaving the good ones untouched. Even if the gradient were to help you pick a good starting pixel, that still leaves the problem of finding a good partner.

There are still O(N²) possible pairs to choose from, and the entire spectrum changes a little bit on every swap. Which means new FFTs to analyze it.

Random greedy search is actually tricky to beat in practice. Whatever extra work you spend on getting better samples translates into fewer samples tried per second. E.g. taking a best-of-3 approach is worse than just trying 3 swaps in a row. Swaps are almost always orthogonal.

But random() still samples unevenly because it's white noise. If only we had.... oh wait. Indeed if you already have blue noise of the right size, you can use that to mildly speed up the search for more. Use it as a random permutation to drive sampling, with some inevitable whitening over time to keep it fresh. You can't however use the noise you're generating to accelerate its own search, because the two are highly correlated.

What's really going on is all a consequence of the loss function.

Loss amplitude

Given any particular frequency band, the loss function is only affected when its magnitude changes. Its phase can change arbitrarily, rolling around without friction. The complex gradient must point in the radial direction. In the tangential direction, the partial derivative is zero.

The value of a given interference mask FFT(V2) - FFT(V1) for a given frequency is also complex-valued. It can be projected onto the current phase, and split into its radial and tangential components with a dot product.
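
As a sketch, that split can be computed against each band's unit phasor (the names here are mine):

import numpy as np

def radial_tangential(spectrum, mask):
    # Split each band of the interference mask into the component that changes the
    # magnitude (radial) and the component that only turns the phase (tangential).
    phase = spectrum / np.maximum(np.abs(spectrum), 1e-12)   # unit phasor per band
    radial = np.real(mask * np.conj(phase))                  # ≈ first-order change in |spectrum_k|
    tangential = np.imag(mask * np.conj(phase))              # pure rotation, no loss change
    return radial, tangential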

Loss amplitude vector basis

The interference mask has a dual action. As we saw, its magnitude varies between 0 and 2Δ, as a function of the two indices k1 and k2. This creates a window that is independent of the specific state or data. It defines a smooth 'hash' from the interference of two quefrency bands.

But its phase adds an additional selection effect: whether the interference in the mask is aligned with the current band's phase: this determines the split between radial and tangential. This defines a smooth phase 'hash' on top. It cycles at the average of the two quefrencies, i.e. a different, third one.

Loss amplitude vector basis

Energy is only added/subtracted if both hashes are non-zero. If the phase hash is zero, the frequency band only turns. This does not affect loss, but changes how each mask will affect it in the future. This then determines how it is coupled to other bands when you perform a particular swap.

Note that this is only true differentially: for a finite swap, the curvature of the complex domain comes into play.

The loss function is actually a hyper-cylindrical skate bowl you can ride around in. It's just that the movement of all the bands is tied together.

Frequency bands with significant error may 'random walk' freely clockwise or counterclockwise when subjected to swaps. A band can therefore drift until its phase comes into alignment with enough similar bands, at which point a swap makes them all descend along the local gradient, enough to counter any negative effects elsewhere.

In the time domain, each frequency band is a wave that oscillates between -1...1: it 'owns' some of the value of each pixel, but there are places where its weight is ~zero (the knots).

So when a band shifts phase, it changes how much of the energy of each pixel it 'owns'. This allows each band to 'scan' different parts of the noise in the time domain. In order to fix a particular peak or dent in the frequency spectrum, the search must rotate that band's phase so it strongly owns any defect in the noise, and then perform a swap to fix that defect.

Thus, my mental model of this is not actually disconnected pixel swapping.

It's more like one of those Myst puzzles where flipping a switch flips some of the neighbors too. You press one pair of buttons at a time. It's a giant haunted dimmer switch.

We're dealing with complex amplitudes, not real values, so the light also has a color. Mechanically it's like a slot machine, with dials that can rotate to display different sides. The cherries and bells are the color: they determine how the light gets brighter or darker. If a dial is set just right, you can use it as a /dev/null to 'dump' changes.

That's what theory predicts, but does it work? Well, here is a (blurred noise) spectrum being late-optimized. The search is trying to eliminate the left-over lower frequency noise in the middle:

Semi converged

Here's the phase difference from the late stages of search, each a good swap. Left to right shows 4 different value scales:

[Phase difference images at amplification x2, x16, x256, x4096]

At first it looks like just a few phases are changing, but amplification reveals it's the opposite. There are several plateaus. Strongest are the bands being actively modified. Then there's the circular error area around it, where other bands are still swirling into phase. Then there's a sharp drop-off to a much weaker noise floor, present everywhere. These are the bands that are already converged.

Compare to a random bad swap:

[Phase difference images for a bad swap, at amplification x2, x16, x256, x4096]

Now there is strong noise all over the center, and the loss immediately gets worse, as a bunch of amplitudes start shifting in the wrong direction randomly.

So it's true. Applying the swap algorithm with a spectral target naturally cycles through focusing on different parts of the target spectrum as it makes progress. This information is positionally encoded in the phases of the bands and can be 'queried' by attempting a swap.

This means the constraint of a fixed target spectrum is actually a constantly moving target in the complex domain.

Frequency bands that reach the target are locked in. Neither their magnitude nor phase changes in aggregate. The random walks of such bands must have no DC component... they must be complex-valued blue noise with a tiny amplitude.

Knowing this doesn't help directly, but it does explain why the search is so hard. Because the interference masks function like hashes, there is no simple pattern to how positions map to errors in the spectrum. And once you get close to the target, finding new good swaps is equivalent to digging out information encoded deep in the phase domain, with O(N²) interference masks to choose from.

Gradient Sampling

As I was trying to optimize for evenness after blur, it occurred to me to simply try selecting bright or dark spots in the blurred after-image.

This is the situation where frequency bands are in coupled alignment: the error in the spectrum has a relatively concentrated footprint in the time domain. But, this heuristic merely picks out good swaps that are already 'lined up' so to speak. It only works as a low-hanging fruit sampler, with rapidly diminishing returns.

Next I used the gradient in the frequency domain.

The gradient points towards increasing loss, which is the sum of squared distance $ (…)^2 $. So the slope is $ 2(…) $, proportional to distance to the goal:

$$ |\mathrm{gradient}_k| = 2 \cdot \left( \frac{|\mathrm{spectrum}_k|}{||\mathbf{spectrum}||} - \frac{\mathrm{target}_k}{||\mathbf{target}||} \right) $$

It's radial, so its phase matches the spectrum itself:

$$ \mathrm{gradient}_k = |\mathrm{gradient}_k| \cdot \left(1 ∠ \arg(\mathrm{spectrum}_k) \right) $$

Eagle-eyed readers may notice the sqrt part of the L2 norm is missing here. It's only there for normalization, and in fact, you generally want a gradient that decreases the closer you get to the target. It acts as a natural stabilizer, forming a convex optimization problem.

You can transport this gradient backwards by applying an inverse FFT. Usually derivatives and FFTs don't commute, but that's only when you are deriving in the same dimension as the FFT. The partial derivative here is neither over time nor frequency, but by signal value.

The resulting time-domain gradient tells you how fast the (squared) loss would change if a given pixel changed. The sign tells you whether it needs to become lighter or darker. In theory, a pixel with a large gradient can enable larger score improvements per step.
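
Here's a minimal numpy sketch of that per-pixel gradient; the normalization follows the formulas above as I read them, and only the sign and relative size of the result matter:

import numpy as np

def pixel_gradient(noise, target_mag):
    # Radial gradient of the squared spectral loss, transported back via an inverse FFT.
    S = np.fft.fft2(noise)
    mag = np.abs(S)
    radial = 2 * (mag / np.linalg.norm(mag) - target_mag / np.linalg.norm(target_mag))
    grad_freq = radial * S / np.maximum(mag, 1e-12)    # same phase as the spectrum itself
    # Real for a real input signal; the sign encodes whether a pixel should get brighter or darker.
    return np.real(np.fft.ifft2(grad_freq))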

It says little about what's a suitable pixel to pair with though. You can infer that a pixel needs to be paired with one that is brighter or darker, but not how much. The gradient only applies differentially. It involves two pixels, so it will cause interference between the two deltas, and also with the signal's own phase.

The time-domain gradient changes only slightly after each swap (mainly at the two swapped pixels), so it only needs an extra IFFT every N swap attempts, and can be reused in between.

I tried this in two ways. One was to bias random sampling towards points with the largest gradients. This barely did anything, when applied to one or both pixels.

Then I tried going down the list in order, and this worked better. I tried a bunch of heuristics here, like adding a retry until paired, and a 'dud' tracker to reject known unpairable pixels. It did lead to some minor gains in successful sample selection. But beating random was still not a sure bet in all cases, because it comes at the cost of ordering and tracking all pixels to sample them.

All in all, it was quite mystifying.

Pair Analysis

Hence I analyzed all possible swaps (A,B) inside one 64x64 image at different stages of convergence, for 1024 pixels A (25% of total).

The result was quite illuminating. There are 2 indicators of a pixel's suitability for swapping:

  • % of all possible swaps (A,_) that are good
  • score improvement of best possible swap (A,B)

They are highly correlated, and you can take the geometric average to get a single quality score to order by:

Pixel A quality

The curve shows that the best possible candidates are rare, with a sharp drop-off at the start. Here the average candidate is ~1/3rd as good as the best, though every pixel is pairable. This represents the typical situation when you have unconverged blue-ish noise.

Order all pixels by their (signed) gradient, and plot the quality:

Pixel A quality by gradient

The distribution seems biased towards the ends. A larger absolute gradient at A can indeed lead to both better scores and higher % of good swaps.

Notice that it's also noisier at the ends, where it dips below the middle. If you order pixels by their quality, and then plot the absolute gradient, you see:

Pixel A gradient by quality

Selecting for large gradient at A will select both the best and the worst possible pixels A. This implies that there are pixels in the noise that are very significant, but are nevertheless currently 'unfixable'. This corresponds to the 'focus' described earlier.

By drawing from the 'top', I was mining the imbalance between the good/left and bad/right distribution. Selecting for a vanishing gradient would instead select the average-to-bad pixels A.

I investigated one instance of each: very good, average or very bad pixel A. I tried every possible swap (A, B) and plotted the curve again. Here the quality is just the actual score improvement:

Pixel B quality for good pixel A
Pixel B quality for average pixel A
Pixel B quality for bad pixel A

The three scenarios have similar curves, with the bulk of swaps being negative. Only a tiny bit of the curve is sticking out positive, even in the best case. The potential benefit of a good swap is dwarfed by the potential harm of bad swaps. The main difference is just how many positive swaps there are, if any.

So let's focus on the positive case, where you can see best.

You can order by score, and plot the gradient of all the pixels B, to see another correlation.

Pixel B gradient by quality for good pixel A

It looks kinda promising. Here the sign matters, with left and right being different. If the gradient of pixel A is the opposite sign, then this graph is mirrored.

But if you order by (signed) gradient and plot the score, you see the real problem, caused by the noise:

Pixel B quality by gradient for good pixel A

The good samples are mixed freely among the bad ones, with only a very weak trend downward. This explains why sampling improvements based purely on gradient for pixel B are impossible.

You can see what's going on if you plot Δv, the difference in value between A and B:

Pixel B value by quality for good pixel A

For a given pixel A, all the good swaps have a similar value for B, which is not unexpected. Its mean is the ideal value for A, but there is a lot of variance. In this case pixel A is nearly white, so it is brighter than almost every other pixel B.

If you now plot Δv * -gradient, you see a clue on the left:

Pixel B value by quality for good pixel A

Almost all of the successful swaps have a small but positive value.

This represents what we already knew: the gradient's sign tells you if a pixel should be brighter or darker. If Δv has the opposite sign, the chances of a successful swap are slim.

Ideally both pixels 'face' the right way, so the swap is beneficial on both ends. But only the combined effect on the loss matters: i.e. Δv * Δgradient < 0.

It's only true differentially so it can misfire. But compared to blind sampling of pairs, it's easily 5-10x better and faster, racing towards the tougher parts of the search.

What's more... while this test is just binary, I found that any effort spent on trying to further prioritize swaps by the magnitude of the gradient is entirely wasted. Maximizing Δv * Δgradient by repeated sampling is counterproductive, because it selects more bad candidates on the right. Minimizing Δv * Δgradient creates more successful swaps on the left, but lowers the average improvement per step so the convergence is net slower. Anything more sophisticated incurs too much computation to be worth it.
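
A sketch of that binary filter, assuming `noise` and a matching per-pixel `grad` array (e.g. from the hypothetical pixel_gradient above):

import numpy as np

def propose_swap(noise, grad, rng):
    # Draw random pairs until Δv * Δgradient < 0, i.e. both pixels move the way the
    # gradient wants. A pure accept/reject test: no further ranking by magnitude.
    h, w = noise.shape
    while True:
        ay, ax = rng.integers(0, h), rng.integers(0, w)
        by, bx = rng.integers(0, h), rng.integers(0, w)
        dv = noise[by, bx] - noise[ay, ax]
        dg = grad[by, bx] - grad[ay, ax]
        if dv * dg < 0:
            return (ay, ax), (by, bx)

# usage: a, b = propose_swap(noise, grad, np.random.default_rng())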

It does have a limit. This is what it looks like when an image is practically fully converged:

Pixel B value by quality in late convergence

Eventually you reach the point where there are only a handful of swaps with any real benefit, while the rest is just shaving off a few bits of loss at a time. It devolves back to pure random selection, only skipping the coin flip for the gradient. It is likely that more targeted heuristics can still work here.

The gradient also works in the early stages. As it barely changes over successive swaps, this leads to a different kind of sparse mode. Instead of scoring only a subset of pixels, simply score multiple swaps as a group over time, without re-scoring intermediate states. This lowers the success rate roughly by a power (e.g. 0.8 -> 0.64), but cuts the number of FFTs by a constant factor (e.g. 1/2). Early on this trade-off can be worth it.

Even faster: don't score steps at all. In the very early stage, you can easily get up to 80-90% successful swaps just by filtering on values and gradients. If you just swap a bunch in a row, there is a very good chance you will still end up better than before.

It works better than sparse scoring: using the gradient of your true objective approximately works better than using an approximate objective exactly.

The latter will miss the true goal by design, while the former continually re-aims itself to the destination despite inaccuracy.
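
As a sketch, the batched early-stage mode then looks like this, reusing the hypothetical helpers from above: one gradient, a run of filtered swaps, no per-swap scoring:

import numpy as np

def batch_swaps(noise, target_mag, count=32, rng=None):
    # Early on, apply a whole batch of filtered swaps and re-score afterwards (or not at all).
    if rng is None:
        rng = np.random.default_rng()
    grad = pixel_gradient(noise, target_mag)       # reused for the entire batch
    for _ in range(count):
        a, b = propose_swap(noise, grad, rng)
        noise[a], noise[b] = noise[b], noise[a]    # no per-swap FFT, no reverts
    return noise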

Obviously you can mix and match techniques, and gradient + sparse is actually a winning combo. I've only scratched the surface here.

Warp Speed

Time to address the elephant in the room. If the main bottleneck is an FFT, wouldn't this work better on a GPU?

The answer to that is an unsurprising yes, at least for large sizes where the overhead of async dispatch is negligible. However, it would have been endlessly more cumbersome to discover all of the above based on a GPU implementation, where I can't just log intermediate values to a console.

After checking everything, I pulled out my bag of tricks and ported it to Use.GPU. As a result, the algorithm runs entirely on the GPU, and provides live visualization of the entire process. It requires a WebGPU-enabled browser, which in practice means Chrome on Windows or Mac, or a dev build elsewhere.

I haven't particularly optimized this—the FFT is vanilla textbook—but it works. It provides an easy ~8x speed up on an M1 Mac on beefier target sizes. With a desktop GPU, 128x128x32 and larger become very feasible.

It lacks a few goodies from the scripts, and only does gradient + optional sparse. You can however freely exchange PNGs between the CPU and GPU version via drag and drop, as long as the settings match.

Live Effect Run-time - Layout

Layout components

Live Effect Run-time - Compute Loop

Compute components

Worth pointing out: this visualization is built using Use.GPU's HTML-like layout system. I can put div-like blocks inside a flex box wrapper, and put text beside it... while at the same time using raw WGSL shaders as the contents of those divs. These visualization shaders sample and colorize the algorithm state on the fly, with no CPU-side involvement other than a static dispatch. The only GPU -> CPU readback is for the stats in the corner, which are classic React and real HTML, along with the rest of the controls.

I can then build an <FFT> component and drop it inside an async <ComputeLoop>, and it does exactly what it should. The rest is just a handful of <Dispatch> elements and the ordinary headache of writing compute shaders. <Suspense> ensures all the shaders are compiled before dispatching.

While running, the bulk of the tree is inert, with only a handful of reducers triggering on a loop, causing a mere 7 live components to update per frame. The compute dispatch fights with the normal rendering for GPU resources, so there is an auto-batching mechanism that aims for approximately 30-60 FPS.

The display is fully anti-aliased, including the pixelized data. I'm using the usual per-pixel SDF trickery to do this... it's applied as a generic wrapper shader for any UV-based sampler.

It's a good showcase that Use.GPU really is React-for-GPUs with less hassle, but still with all the goodies. It bypasses most of the browser once the canvas gets going, and it isn't just for UI: you can express async compute just fine with the right component design. The robust layout and vector plotting capabilities are just extra on top.

I won't claim it's the world's most elegant abstraction, because it's far too pragmatic for that. But I simply don't know any other programming environment where I could even try something like this and not get bogged down in GPU binding hell, or have to round-trip everything back to the CPU.

* * *

So there you have it: blue and indigo noise à la carte.

What I find most interesting is that the problem of generating noise in the time domain has been recast into shaping and denoising a spectrum in the frequency domain. It starts as white noise, and gets turned into a pre-designed picture. You do so by swapping pixels in the other domain. The state for this process is kept in the phase channel, which is not directly relevant to the problem, but drifts into alignment over time.

Hence I called it Stable Fiddusion. If you swap the two domains, you're turning noise into a picture by swapping frequency bands without changing their values. It would result in a complex-valued picture, whose magnitude is the target, and whose phase encodes the progress of the convergence process.

This is approximately what you get when you add a hidden layer to a diffusion model.

What I also find interesting is that the notion of swaps naturally creates a space that is O(N²) big with only N samples of actual data. Viewed from the perspective of a single step, every pair (A,B) corresponds to a unique information mask in the frequency domain that extracts a unique delta from the same data. There is redundancy, of course, but the nature of the Fourier transform smears it out into one big superposition. When you do multiple swaps, the space grows, but not quite that fast: any permutation of the same non-overlapping swaps is equivalent. There is also a notion of entanglement: frequency bands / pixels are linked together to move as a whole by default, but parts will diffuse into being locked in place.

Phase is kind of the bugbear of the DSP world. Everyone knows it's there, but they prefer not to talk about it unless its content is neat and simple. Hopefully by now you have a better appreciation of the true nature of a Fourier transform. Not just as a spectrum for a real-valued signal, but as a complex-valued transform of a complex-valued input.

During a swap run, the phase channel continuously looks like noise, but is actually highly structured when queried with the right quefrency hashes. I wonder what other things look like that, when you flip them around.


The future of Offpunk: UNIX command-line heaven and packaging hell

A story about how the UNIX philosophy made me develop tools I’m actually proud of and why packaging is holding me back.

Two years ago, I decided that I wanted to be able to browse Gemini while offline. I started to add a permanent cache to Solderpunk’s AV-98, the simplest and first Gemini browser ever. It went surprisingly well. Then, as the excellent forlater.email service went down for a week, I thought that I would add a quick and hackish HTTP support to it. Just a temporary experiment.

The same week, I serendipitously stumbled upon chafa, an image rendering tool which was on my computer because of neofetch. I thought it would be funny to have pictures rendered in webpages in my terminal. Just an experiment to take some funny screenshots, nothing more.

But something really surprising happened: it was working. It was really useful. I was really using it and, after adding support for RSS, I realised that this experiment was actually working better for me than forlater.email and newsboat. Offpunk was born without really thinking about it and became a real project with its own philosophy.

Offpunk was born on Gemini, and I wanted it to keep its minimalistic roots: keeping dependencies under control (making them optional and implementing the underlying feature myself as soon as it makes sense), keeping it simple (one single runnable python script), caring as much as possible about older versions of python, listening to people using it on very minimal systems. I also consciously chose to use only solutions that have stood the test of time. I’ve spent too many years of my life falling for the "new-trendy-technology" and learned from those mistakes. The one-file aspect ensured that it was really easy to use and to hack: open the file, modify something, run it.

I’m not a good developer. Anything more complex than that is too much for my taste. Unless forced, I’ve never used an IDE, never understood complex toolchains nor packaging. I modify files with (neo)vim (without any plugin), compile from the command line and run the resulting binary (not even needing that step with python). Life is too short for making it more complex. I like to play with the code, not to learn tools that would do it for me.

But offpunk.py was getting fat: 4500 lines of organic python which had grown on top of an AV-98 structured to be a test bed for an experimental protocol. The number of people able to understand its code entanglement varied between 0 and 1, depending on the quality of my morning Earl Grey.

I wanted to make life easier for contributors. I also realised that some features I developed might be useful without offpunk. So I stepped into a huge refactoring and managed to split offpunk into several components. My goal was to separate the code into multiple individual components doing one thing and doing it well. And, to my own surprise, I succeeded.

Netcache.py

I called the first component "netcache". Think of netcache as a cached version of wget. If possible, netcache will give you a cached version of the URL you are asking. If no cache or too old and if allowed to go online, netcache will download it.

It means that if you like Offpunk’s core concept but don’t like the interface and want, for example, a GUI, you could write your own browser that would, using netcache, share the cache with Offpunk.

Netcache is currently working just well enough for my needs but could do a lot better. I should, for example, investigate replacing the network code with libcurl and implementing support for multithreaded concurrent downloads.

Ansicat.py

Coloured output in your terminal is done through a standard called ANSI. As I wrote the first HTML to ANSI renderer for offpunk, I started to understand how awful the HTML standard was. Armed with that experience, I started a second renderer and, to be honest, it is actually not that bad. I’m even proud of it.

Ansicat is really useful when in a terminal because it will render HTML and gemtext in a good, readable way. If the optional library python-readability is present, ansicat will try to extract the main content from a web page (and, yes, python-readability is one dependency I would like to reimplement someday).

With netcache and ansicat, you can already do something like:

netcache https://ploum.net | ansicat --format=html

Yes, it works. And yes, as a UNIX junkie, I was completely excited the first time it worked. Look mum, I’m Ken Thompson! Making ansicat a separate tool made me think about adding support for other formats. Like PDF or office documents. How cool would it be to have a single cat command for so many different formats?

Opnk.py

While netcache and ansicat were clear components I wanted to split from Offpunk’s core since the start of the refactoring, another tool appeared spontaneously: opnk.

Opnk (Open-like-a-punk) is basically a wrapper that will run ansicat on any file given. If given a URL, it will ask netcache for the file. The result is displayed in less (after passing through ansicat, of course).

If ansicat cannot open the file, opnk falls back on xdg-open.

That may not look like much, but it proved to be massively useful in my workflow. I already use opnk every day. Each time I want to open a file, I don’t think about the command, I type "opnk". It even replaced cat for many use cases. I’m considering renaming it "opn" to save one character. Using opnk also explains why I want to work on supporting PDF/office documents with ansicat. That would be one less opportunity to leave the terminal.

Offpunk.py

Through this architecture, Offpunk became basically an interface above opnk. And this proved to work well. Many longstanding bugs were fixed, performance and usability were vastly improved.

Everything went so well that I dreamed of releasing offpunk 2.0, netcache, ansicat and opnk while running naked with talking animals in a field of flowers under a rainbow. Was it really Earl Grey in the cup that day?

Packaging Offpunk.py

Now for the bad news.

As expected, the refactoring forced me to break my "one-single-python-file" rule.

I felt guilty towards those people who told me about using offpunk on very minimal systems, sometimes from a USB key. But I thought that this was not a real problem. Instead of one python script, I had four of them (and a fifth file containing some shared code). That should not be much of a problem, should it?

Well, python packaging systems would like to disagree. Flowers fade, the rainbow disappears behind black and heavy clouds while animals start to look at me with a devilish look and surprisingly sharp teeth.

I’ve spent many hours and asked several people about the best way to package multiple python files without making the whole thing a module. Without success. Fortunately, the community is really helpful. David Zaslavsky stepped in on the mailing list to give lots of advice and, as I was getting discouraged, Austreelis started to work really hard to make offpunk both usable directly and packageable. I’m really grateful for their help and their work. But, so far, without clear success. I feel sad about the amount of energy required to address something as simple as "I have 5 python files which depend on each other and I want to be able to launch them separately".

The software is working really well. The refactoring allowed me to fix longstanding bugs and to improve a lot of areas while adding new features (colour themes, anyone?). On my computer, I added four aliases in my zsh config: offpunk, opnk, ansicat and netcache. Each alias runs the corresponding python file. Nothing fancy, and I want to keep it that way. I know for a fact that several users are doing something similar: git clone, then run it from an arbitrary location.

Keeping things as simple as that is the main philosophical goal behind offpunk. It’s an essential part of the project. If people want to use pip or any other tool to mess up their computer configuration, that’s their choice. But it should never be required.

Which means that I’m now in a very frustrating position: Offpunk 2.0 is more than ready from a code point of view. But it cannot be shipped because there’s currently no easy way to package it. The pyproject.toml file has become an obstacle to the whole development process.

I’m contemplating putting everything back in one big file. Or removing the pyproject.toml file from the repository and releasing offpunk "as it is".

Some will call me an old conservative fart for refusing to use one of those gazillion shiny packaging systems. Others will judge me a pretty poor programmer for having done 20 years of Python without ever understanding pip or using an IDE.

They are probably right. What would you seriously expect from someone doing a command-line tool to browse Gemini and Gopher?

But there’s maybe an easier solution than to change my mind and offpunk’s core philosophy. A simple solution that I missed. If that’s the case, don’t hesitate to drop a word on the devel mailing-list, Austreelis and I will be happy to hear about your opinion and your experience.

While you are at it, bug reports and feedback are also welcome. I’ve this odd custom of finding embarrassing bugs only hours after a release. I really hope to do better with offpunk 2.0.

And after we’ve solved that little packaging anecdote together, I will happily return to my bare neovim to code all the ideas I want to implement for 2.1, 2.2 and many more releases to come.

As a writer and an engineer, I like to explore how technology impacts society. You can subscribe by email or by RSS. I value privacy and never share your address.

If you read French, you can support me by buying/sharing/reading my books and subscribing to my newsletter in French or RSS. I also develop Free Software.

September 28, 2023

We now invite proposals for developer rooms for FOSDEM 2024. FOSDEM offers open source and free software developers a place to meet, share ideas and collaborate. Renowned for being highly developer-oriented, the event brings together some 8000+ geeks from all over the world. The twenty-fourth edition will take place Saturday 3rd and Sunday 4th February 2024 in Brussels, Belgium. Developer rooms are assigned to self-organising groups to work together on open source and free software projects, to discuss topics relevant to a broader subset of the community, etc. Most content should take the form of presentations. Proposals involving collaboration…

September 27, 2023

40 years of GNU

Richard Stallman did not want to change the world. He did not want to tilt at windmills. He did not want to reinvent the wheel. Richard Stallman simply wanted his friends, his community, back.

For this bearded, chubby young man, social relations had never been simple. Always buried in books and in love with solving logic puzzles, he had always struggled a bit to find his place. However much he loved company, long discussions and dancing, his interest in mathematics always seemed slightly out of step. His humour, above all, was often taken badly, to the point of shocking or frightening people. It was at the MIT Artificial Intelligence Lab that he finally felt entirely at home. Days and nights in front of a screen, fingers on a keyboard, surrounded by people who, like him, were only looking for problems to solve. To solve in the simplest, most elegant, funniest or most absurd way. For the love of the craft, out of need, or simply for the sake of a silly joke.

RMS, as he introduced himself every time the computer showed him the word "login:", was happy.

But the wind was changing. In 1976, the very young head of an obscure company selling a BASIC compiler had written a long open letter to the community of computer users. In that letter he begged computer hobbyists to stop sharing software, modifying it, copying it. Instead, he argued, software must be bought. Developers must be paid. In short, a line must be drawn between the developers, who get paid, and the users, who pay and have no right to understand how the program works.

If he read it at all, the letter went over Richard's head. What this young William Gates, known as Bill, and his company "Micro-Soft" produced did not interest him at the time. He knew full well that the "hacker" spirit was one of sharing and curiosity. Ken Thompson, the inventor of Unix, had never hidden his desire to share all his experiments. When the lawyers of AT&T, his employer, started to balk, registering the UNIX trademark and then forbidding any sharing, he, Dennis Ritchie, Brian Kernighan and their companions amused themselves by working around every rule. Source code travelled on tapes "forgotten" in an office or even on park benches. The complete UNIX source code, annotated and commented by John Lions as teaching material for his students, could boast of being the most photocopied computer book in the world despite the ban on making copies.

The Bill Gateses of the world and their armies of lawyers could never defeat the hacker spirit. At least, that's what Richard Stallman thought while working on his LISP virtual machine and his Emacs editor.

Until the day he realised that a company, Symbolics, had gradually hired away all of his colleagues. His friends. At Symbolics, they kept working on a LISP virtual machine. But they could no longer share anything with Richard. They had become competitors, an unimaginable concept for the dishevelled hacker. Out of defiance, he started copying and implementing in MIT's LISP machine every new feature Symbolics developed. All by himself, he matched the output of dozens of engineers. He had, of course, no access to the source code and had to make do with Symbolics' documentation to guess how things worked.

The change in atmosphere had been gradual. Richard had lost his friends, his community. He had been forced, against his will, to become a competitor rather than a collaborator. He did not fully realise it. The problem was still fuzzy in his mind until the day a new printer appeared on MIT's premises.

You have to know that, at the time, printers were the size of a bed and had quite a few problems. On the previous one, Richard had rigged up a little system that automatically sent an alert in case of a paper jam. He hadn't thought twice: he had taken the printer's source code and modified it without asking any questions. But, against all expectations, the new printer did not ship with its source code. The computing world was still very small, and Richard had an idea of who at Xerox might have written the software that ran the printer. Taking advantage of a trip, he went to that person's office to ask for a copy.

The conversation was very short. The person was not allowed to share the source code. And if it were shared, Richard would have to sign a non-disclosure agreement. He, in turn, would not have the right to share.

Not allowed to share? NOT ALLOWED TO SHARE?

Isn't sharing the very essence of humanity? Doesn't knowledge rest entirely on intellectual sharing?

The worm Bill Gates had slipped into the fruit was starting to do its work. The world was starting to subscribe to the philosophy that making Bill Gates the richest man in the world mattered more than sharing knowledge. That competition must necessarily win out over collaboration. The hackers had ended up putting on ties and submitting to the lawyers.

If he did nothing, Richard would never get his friends, his community, back. Boiling with anger, he decided to rebuild the hacker community all by himself. To rally it around a project that anyone could share, improve, modify. That no one could appropriate.

He named his project "GNU", the initials of "GNU's Not Unix", and announced it on the Usenet network on September 27, 1983. Forty years ago today.

Happy birthday, GNU.

After this announcement, Richard Stallman set about rewriting every one of the very many programs that made up the Unix system. Alone at first, he created the GNU system from scratch. His only failure was the development of a kernel that would let GNU run on computers without needing a non-GNU system. Richard saw the problem, because, besides coding, he was developing the philosophy of sharing and of free software. He was inventing the foundations of copyleft.

In 1991, with the help of the GNU tools, including the GCC compiler, a young Finn, Linus Torvalds, would create precisely such a kernel from nothing. A kernel that he would put under the copyleft license invented by Stallman.

But that is another story…

Suggested reading:

  • Richard Stallman et la révolution du logiciel libre, by Richard Stallman, Sam Williams and Christophe Masutti
  • The Daemon, the Gnu and the Penguin, by Peter H. Salus
  • UNIX: A History and a Memoir, by Brian Kernighan
  • Lion’s Commentary on UNIX 6th Edition with Source Code, by John Lions
  • Open letter to computer users, by Bill Gates

As an engineer and a writer, I explore the impact of technology on people, both in writing and in my talks.

Receive my writings in French and English directly by email. Your address will never be shared. You can also use my French-language RSS feed or the full RSS feed.

To support me, buy my books (if possible from your local bookshop)! I have just published a collection of short stories that should make you laugh and think.

September 24, 2023

On microcontroller boards such as an Arduino, Raspberry Pi Pico or ESP32 you can connect all kinds of LEDs, buttons and sensors. Many of these boards have a built-in WiFi chip, so you can control them remotely. But sometimes WiFi isn't possible, is too much hassle, or simply isn't needed at all.

Fortunately, most microcontroller boards come with a USB port, and you can also use it to send commands from your computer to the microcontroller or to get information such as sensor data back. This is easily done through a serial interface over USB CDC. Under Windows the device then shows up as a COM port, under Linux as a device such as /dev/ttyACM0, and under macOS as /dev/cu.usbmodem<something>. Software on your computer can then communicate with the microcontroller via this COM port or the appropriate device file.

In the article Zo breid je met microcontrollers je computer uit met extra functies on id.nl I describe how to do this with CircuitPython on a microcontroller board on one side and the Web Serial API on the computer on the other (unfortunately only supported in Chrome). This makes it easy to build a web interface to control an LED on your USB-connected microcontroller board:

/images/microcontroller-usb-webpagina.png

In the same way, you can read out a temperature sensor on a microcontroller board from a web interface:

/images/microcontroller-usb-temperature.png

As long as you define an unambiguous protocol in your serial communication (1 to turn the LED on, 0 to turn it off, ...), you can work out the same approach in all sorts of ways. For example, you don't have to use CircuitPython on the microcontroller side: you can implement exactly the same thing with simple Arduino code. I find the Digispark a handy microcontroller board, for instance. You let it communicate with your computer via the Arduino library DigiCDC.

On the computer side, too, there are countless alternatives to Web Serial in the browser. For instance, you can write a Python program that talks to the serial port via the pySerial library. As long as you make sure both sides use the same protocol, the various alternatives are interchangeable: you can control the same microcontroller board from both your web browser and a Python program.
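
For example, a minimal pySerial sketch; the device path and the "1"/"0" protocol come from the description above, while the baud rate is just an assumption:

import time
import serial   # pip install pyserial

# On Windows this would be something like "COM3", on macOS /dev/cu.usbmodem...
with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as port:
    port.write(b"1")            # turn the LED on
    time.sleep(1)
    port.write(b"0")            # turn the LED off
    print(port.readline())      # e.g. a sensor reading sent back by the board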

I find USB CDC handy because all major operating systems support it without you having to install any special drivers. That's why I use it in my "Basis programmeren" (basic programming) course, in which I teach the students of the Internet of Things associate degree programme to program in Python. In the lesson on serial communication with pySerial, I simply program a few microcontroller boards with my CircuitPython code and hand them out in class, after which I don't have to worry about driver problems. [1] That way I can focus on the programming.

[1]

Originally I wanted to do that with the Digispark, but under Windows this board needs drivers that don't exist for recent Windows versions.

September 17, 2023

FOSDEM 2024 will take place at the ULB on the 3rd and 4th of February 2024. As has become traditional, we offer free and open source projects a stand to display their work "in real life" to the audience. You can share information, demo software, interact with your users and developers, give away goodies, sell merchandise or accept donations. Anything is possible! We offer you: One table (180x80cm) with a set of chairs and a power socket. Fast wireless internet access. A spot on stands.fosdem.org. You can choose whether you want the spot for the entire conference, or simply…

How to migrate Jails from ezjail to BastilleBSD


In my previous blog post, I reviewed BastilleBSD. In this post, we go through the required steps to migrate the Jails from ezjail to BastilleBSD.

ezjail test Jail

To test the Jail migration, we’ll first create a test Jail with ezjail. This test Jail will then be migrated to a BastilleBSD Jail.

Create the test ezjail Jail

We use the ezjail-admin create staftestje001 'vtnet0|<ip>' command to create the test Jail.

root@pi-rataplan:~ # ezjail-admin create staftestje001 'vtnet0|<ip>'
Warning: Some services already seem to be listening on all IP, (including 192.168.1.51)
  This may cause some confusion, here they are:
root     nfsd       93987 5  tcp4   *:2049                *:*
root     nfsd       93987 6  tcp6   *:2049                *:*
root     mountd     92576 6  udp6   *:1014                *:*
root     mountd     92576 7  tcp6   *:1014                *:*
root     mountd     92576 8  udp4   *:1014                *:*
root     mountd     92576 9  tcp4   *:1014                *:*
root     ntpd       88967 20 udp6   *:123                 *:*
root     ntpd       88967 21 udp4   *:123                 *:*
root     rpc.statd  86127 4  udp6   *:654                 *:*
root     rpc.statd  86127 5  tcp6   *:654                 *:*
root     rpc.statd  86127 6  udp4   *:654                 *:*
root     rpc.statd  86127 7  tcp4   *:654                 *:*
root     rpcbind    85696 6  udp6   *:111                 *:*
root     rpcbind    85696 7  udp6   *:702                 *:*
root     rpcbind    85696 8  tcp6   *:111                 *:*
root     rpcbind    85696 9  udp4   *:111                 *:*
root     rpcbind    85696 10 udp4   *:996                 *:*
root     rpcbind    85696 11 tcp4   *:111                 *:*
root@pi-rataplan:~ # 

Review the created Jail.

root@pi-rataplan:~ # ezjail-admin list
STA JID  IP              Hostname                       Root Directory
--- ---- --------------- ------------------------------ ------------------------
ZS  N/A  192.168.1.51    staftestje001                  /usr/jails/staftestje001
root@pi-rataplan:~ #

Start the Jail with ezjail-admin start staftst1

root@pi-rataplan:~ # ezjail-admin start staftst1 
Starting jails: staftst1.
/etc/rc.d/jail: WARNING: Per-jail configuration via jail_* variables  is obsolete.  Please consider migrating to /etc/jail.conf.
root@pi-rataplan:~ # 

Access the console with ezjail-admin console

root@pi-rataplan:~ # ezjail-admin console staftestje001
FreeBSD 13.2-RELEASE-p2 GENERIC

Welcome to FreeBSD!

Release Notes, Errata: https://www.FreeBSD.org/releases/
Security Advisories:   https://www.FreeBSD.org/security/
FreeBSD Handbook:      https://www.FreeBSD.org/handbook/
FreeBSD FAQ:           https://www.FreeBSD.org/faq/
Questions List:        https://www.FreeBSD.org/lists/questions/
FreeBSD Forums:        https://forums.FreeBSD.org/

Documents installed with the system are in the /usr/local/share/doc/freebsd/
directory, or can be installed later with:  pkg install en-freebsd-doc
For other languages, replace "en" with a language code like de or fr.

Show the version of FreeBSD installed:  freebsd-version ; uname -a
Please include that output and any error messages when posting questions.
Introduction to manual pages:  man man
FreeBSD directory layout:      man hier

To change this login announcement, see motd(5).
root@staftestje001:~ # 

Add a user.

root@staftestje001:~ # adduser 
Username: staf
Full name: staf
Uid (Leave empty for default): 
Login group [staf]: 
Login group is staf. Invite staf into other groups? []: wheel
Login class [default]: 
Shell (sh csh tcsh nologin) [sh]: 
Home directory [/home/staf]: 
Home directory permissions (Leave empty for default): 
Use password-based authentication? [yes]: 
Use an empty password? (yes/no) [no]: 
Use a random password? (yes/no) [no]: 
Enter password: 
Enter password again: 
Lock out the account after creation? [no]: no
Username   : staf
Password   : *****
Full Name  : staf
Uid        : 1001
Class      : 
Groups     : staf wheel
Home       : /home/staf
Home Mode  : 
Shell      : /bin/sh
Locked     : no
OK? (yes/no): yes
adduser: INFO: Successfully added (staf) to the user database.
Add another user? (yes/no): no
Goodbye!

Become the test user and create some files.

root@staftestje001:~ # su - staf
You can use aliases to decrease the amount of typing you need to do to get
commands you commonly use.  Examples of fairly popular aliases include (in
Bourne shell style, as in /bin/sh, bash, ksh, and zsh):

	alias lf="ls -FA"
	alias ll="ls -lA"
	alias su="su -m"

In csh or tcsh, these would be

	alias lf ls -FA
	alias ll ls -lA
	alias su su -m

To remove an alias, you can usually use 'unalias aliasname'.  To list all
aliases, you can usually type just 'alias'.
staf@staftestje001:~ $ 
staf@staftestje001:~ $ vi testfile


Migrate the ezjail Jail to BastilleBSD

Stop the ezjail jail

Execute ezjail-admin stop to stop the Jail.

root@pi-rataplan:~ # ezjail-admin stop staftestje001
Stopping jails: staftestje001.
root@pi-rataplan:~ # 

Archive

Use ezjail-admin archive to create a tar dump of the Jail.

root@pi-rataplan:~ # ezjail-admin archive staftestje001
pax: Access/modification time set failed on: ./var/empty <Operation not permitted>
Warning: Archiving jail staftestje001 was not completely successful.\n  Please refer to the output above for problems the archiving tool encountered.\n  You may ignore reports concerning setting access and modification times.\n  You might want to check and remove /usr/jails/ezjail_archives/staftestje001-202308161229.21.tar.gz.Warning: Archiving jail staftestje001 was not completely successful. For a running jail this is not unusual.
root@pi-rataplan:~ # 

The tar file is created at /usr/jails/ezjail_archives

root@pi-rataplan:~ # ls -l  /usr/jails/ezjail_archives
total 267233
-rw-r--r--  1 root  wheel  136712524 Aug 16 12:29 staftestje001-202308161229.21.tar.gz
root@pi-rataplan:~ # 

Import

It’s possible to import the ezjail archive with bastille import.

[root@pi-rataplan ~]# bastille import /usr/jails/ezjail_archives/staftestje001-202308161229.21.tar.gz 
Importing 'staftestje001' from foreign compressed .tar.gz archive.
Preparing ZFS environment...
Extracting files from 'staftestje001-202308161229.21.tar.gz' archive...
tar: Removing leading '/' from member names
Generating jail.conf...
Updating symlinks...
Container 'staftestje001' imported successfully.
[root@pi-rataplan ~]# 

List the Jails.

[root@pi-rataplan ~]# bastille list -a
 JID              State  IP Address           Published Ports  Hostname         Release          Path
 bastille-tst001  Up     192.168.1.50         -                bastille-tst001  13.2-RELEASE-p2  /usr/local/bastille/jails/bastille-tst001/root
 staftestje001    Down   vtnet0|192.168.1.51  -                staftestje001    13.2-RELEASE-p2  /usr/local/bastille/jails/staftestje001/root
[root@pi-rataplan ~]# 

Correct the IP Address

Our archived test Jail is imported.

We defined the interface as part of the ezjail-admin create command. But this ended up in the IP Address configuration.

Let’s see how this is defined in our Jail configuration.

Go to the Jail dataset.

root@pi-rataplan:~ # cd /usr/local/bastille/jails/staftestje001/
root@pi-rataplan:/usr/local/bastille/jails/staftestje001 # 

List the configuration files.

root@pi-rataplan:/usr/local/bastille/jails/staftestje001 # ls
fstab
fstab.ezjail
jail.conf
prop.ezjail-staftestje001-202309032022.27-pi_rataplan-13.2_RELEASE_p2-aarch64
root
root@pi-rataplan:/usr/local/bastille/jails/staftestje001 #

Edit the jail.conf

root@pi-rataplan:/usr/local/bastille/jails/staftestje001 # vi jail.conf
staftestje001 {
  devfs_ruleset = 4;
  enforce_statfs = 2;
  exec.clean;
  exec.consolelog = /var/log/bastille/staftestje001_console.log;
  exec.start = '/bin/sh /etc/rc';
  exec.stop = '/bin/sh /etc/rc.shutdown';
  host.hostname = staftestje001;
  mount.devfs;
  mount.fstab = /usr/local/bastille/jails/staftestje001/fstab;
  path = /usr/local/bastille/jails/staftestje001/root;
  securelevel = 2;

  interface = vtnet0;
  ip4.addr = vtnet0|192.168.1.51;
  ip6 = disable;
}

The interface is defined both in the interface setting and in ip4.addr. Remove the interface from ip4.addr.

  ip4.addr = 192.168.1.51;

Execute bastille list -a to verify.

root@pi-rataplan:/usr/local/bastille/jails/staftestje001 # bastille list -a
 JID              State  IP Address           Published Ports  Hostname         Release          Path
 bastille-tst001  Down   192.168.1.50         -                bastille-tst001  13.2-RELEASE-p2  /usr/local/bastille/jails/bastille-tst001/root
 staftestje001    Down   192.168.1.51   

Verify

Start the Jail with bastille start

root@pi-rataplan:/usr/local/bastille/jails/staftestje001 # bastille start staftestje001
[staftestje001]:
staftestje001: created

root@pi-rataplan:/usr/local/bastille/jails/staftestje001 # 

Open a console on the Jail with bastille console and verify that the test user and files were imported correctly.

[staftestje001]:
Last login: Sun Sep  3 18:02:03 on pts/2
FreeBSD 13.2-RELEASE-p2 GENERIC

Welcome to FreeBSD!

Release Notes, Errata: https://www.FreeBSD.org/releases/
Security Advisories:   https://www.FreeBSD.org/security/
FreeBSD Handbook:      https://www.FreeBSD.org/handbook/
FreeBSD FAQ:           https://www.FreeBSD.org/faq/
Questions List:        https://www.FreeBSD.org/lists/questions/
FreeBSD Forums:        https://forums.FreeBSD.org/

Documents installed with the system are in the /usr/local/share/doc/freebsd/
directory, or can be installed later with:  pkg install en-freebsd-doc
For other languages, replace "en" with a language code like de or fr.

Show the version of FreeBSD installed:  freebsd-version ; uname -a
Please include that output and any error messages when posting questions.
Introduction to manual pages:  man man
FreeBSD directory layout:      man hier

To change this login announcement, see motd(5).
root@staftestje001:~ # su - staf
Need to quickly return to your home directory? Type "cd".
		-- Dru <genesis@istar.ca>
staf@staftestje001:~ $ ls
testfile
staf@staftestje001:~ $ 

Delete the ezjail Jail

The last step is to remove the “old” ezjail.

[root@pi-rataplan ~]# ezjail-admin list
STA JID  IP              Hostname                       Root Directory
--- ---- --------------- ------------------------------ ------------------------
ZS  N/A  192.168.1.51    staftestje001                  /usr/jails/staftestje001
ZR  2    192.168.1.49    stafscm                        /usr/jails/stafscm
ZR  3    192.168.1.45    stafproxy                      /usr/jails/stafproxy
ZR  4    192.168.1.47    stafmail                       /usr/jails/stafmail
ZR  5    192.168.1.41    staffs                         /usr/jails/staffs
ZR  6    192.168.1.85    stafdns                        /usr/jails/stafdns
[root@pi-rataplan ~]# ezjail-admin delete staftestje001
[root@pi-rataplan ~]# 

ezjail-admin delete only removes the Jail configuration; the storage is still there, which might be useful if you want to restore the Jail. And we still have a backup in /usr/jails/ezjail_archives if for some reason we need to restore the old ezjail. (If you want to reclaim the space instead, see the sketch after the listing below.)

[root@pi-rataplan ~]# zfs list | grep -i testje001
zroot/bastille/jails/staftestje001             219M   153G      144K  /usr/local/bastille/jails/staftestje001
zroot/bastille/jails/staftestje001/root        219M   153G      219M  /usr/local/bastille/jails/staftestje001/root
zroot/usr/jails/staftestje001                  219M   153G      219M  /usr/jails/staftestje001
[root@pi-rataplan ~]# 
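
If you’re sure the old ezjail data is no longer needed, the leftover dataset can be destroyed to reclaim the space. A minimal sketch, assuming the dataset name shown above; double-check with zfs list first, because zfs destroy is irreversible:

root@pi-rataplan:~ # zfs destroy -r zroot/usr/jails/staftestje001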

As the procedure seems to work, I’ll continue with the migration of the ezjail Jails to BastilleBSD :-)

Links

September 16, 2023

I just watched The First Turning and the End of Woke, and although I’m someone who wants to see evidence and numbers first, I acknowledge that sociology struggles with difficulties: it isn’t as easy as in, say, physics to obtain those numbers. The same goes for psychology. I acknowledge that psychology doesn’t have it easy.

But psychology acknowledges that too little. Many psychologists mainly want to sell remedies and make a lot of money. That is an affliction (it would be good if psychologists read Spinoza’s Ethica) that sociology suffers from less. I take psychology less and less seriously: they want to make money off the ignorance of ordinary people (that is, their patients; I’m not linking just anything). I make exceptions: I think Dirk De Wachter is an excellent psychologist / psychiatrist. That doesn’t mean that (I think) other psychologists are of equally good will. Psychology should be about the patient, not about money. It will take a long time before that is the case again.

In contrast to psychology, which these days appears to have become ideologically unhinged with, for example, gender ideology (because there are remedies to sell, such as hormone treatments), the sociologist in ‘The First Turning and the End of Woke’ seems to me to have a better approximation of reality. I think psychologists in particular should scratch behind their ears, because things are going from bad to worse.

I’m no expert, since I already spend my expertise and above all my passion time on the craft of programming. So take my opinion with a barrel of salt.

I’m glad I didn’t have a child in the past twenty years: what I have seen is disgusting. The current generation of young adults was, and is, extremely overprotected; they have no idea how to live on their own and they fail at it accordingly.

With their bicycle helmets on, they ride into the abyss. They think the helmet will save them. They should have learned how to live instead, so they wouldn’t have ridden into the abyss in the first place. But life was too dangerous, in the eyes of their overprotective parents and above all of themselves, for them to be allowed to learn that. So: helmet on, tablet on, play a safe little game and keep quiet: that has been the superficial way of raising children, and of being raised, these past years.

Those parents, and the young adults themselves, wanted and still want ever more rules. More bicycle helmets. Rules for everything, including laws for pronouns, something that was utterly absurd barely ten years earlier. Learning how to live is the one thing they don’t do. Being the superficial victim, that they do. They want rules. More rules. For others. So that they themselves are safe. So that they can remain victims. They aren’t real victims, but their superficial ideology allows them to be one anyway.

As a result we now live in the most idiotic form of society imaginable: a kiss from a football coach who was happy about a victory is now completely criminalized. I have no understanding for this whatsoever. I hate it. I don’t hate the kiss; I hate the society that criminalizes it. I sincerely hope this disappears. I’m sorry for my opinion (I’m not sorry at all), but that man did nothing wrong. He was happy and therefore he kissed a player. That is all.

The current generation of young adults, indoctrinated to pursue superficial victimhood, never got the chance to learn how to live.

Partly because of this generation I postponed having a child. This year it is finally happening and my girlfriend is pregnant. I couldn’t wait any longer either, because otherwise I would be too old by the time the little one leaves the house. But it really was in my head: I didn’t want a child because society had become completely idiotic.

Can the current generation of young adults please move aside? So that my child certainly doesn’t have to live in their world.

I grew up in the eighties, and in many ways that was a much better time. For girls as well as for boys. For everyone. What we are experiencing now is regressive. These days I sometimes ‘complain’ to my parents about how dangerous our upbringing was; but actually I’m grateful to them for it: they let me learn how to live. I learned how to survive. They were there when things went wrong, but things didn’t go wrong very often. That is why I’m now a capable and healthy adult. With a finger that apparently was broken after all, because that thumb is a bit crooked now. Oh well. That really isn’t so bad.

I learned gymnastics (and even got good at it) and skateboarding (and even got good at it, and broke my thumb doing it). We dug underground tunnels (yes, really) and we built several tree houses. We went cow tipping and the farmer came after us with his tractor. We literally went ringing doorbells and running away every week. At the end of our (nearly fixed) route we always stopped at the house of a gendarme, who would come running after us. The slowest one got caught, and that cop would then have a chat with the parents when bringing the kid back home. We found it thrilling not to be the slowest. It was probably a good running exercise for the gendarme too: how many criminals did he catch because we kept him in shape every week?!

For us the world was dangerous and challenging. Fun. Novel. Instructive. We fell all the time and we always got back up. Not occasionally but every few days we had bruises, and sometimes a broken arm or leg. But is that really so bad?

Dear current generation of young adults: make room for more of a sense of community instead of your individualism. Make room for freedom, and yes, also for more danger. Danger that gets caught anyway, because all the adults together keep a bit of an eye out.

Let us do things ourselves.

Let us teach ourselves how to live.

ps. At the ultrasound we saw that the little one in my wife’s belly was moving a lot. The sonographer even said that it makes her job a bit harder. But we didn’t really mind. I spontaneously said to the sonographer and to my wife: that will be a skateboarder.

As long as it is in my hands, the little one will be exposed step by step to the dangers of the world. I’ll be there to save and help her or him when needed.

September 10, 2023

Introduction to BastilleBSD

What are “containers”?

Chroot, Jails, containers, zones, LXC, Docker

I use FreeBSD on my home network to serve services like email, git, fileserver, etc. For some other services, I use k3s with GNU/Linux application containers.

The FreeBSD services run as Jails. For those who aren’t familiar with them: FreeBSD Jails started the whole concept of “containers”.

FreeBSD Jails inspired Sun Microsystems to create Solaris zones.

If you want to know more about the history of FreeBSD Jails, Solaris zones and containers on Un!x systems in general, and about the challenges of running containers securely, I recommend the video:

“Papers We Love: Jails and Solaris Zones by Bryan Cantrill”

Sun took containers to the next level with Solaris zones, allowing fine-grained CPU and memory allocation.

On GNU/Linux, LXC was the most popular container framework… till Docker came along.

Application vs system containers

To Docker’s credit, it made the concept of application containers popular.

System containers run the complete operating system and can be used like virtual machines without the overhead.

Application containers run a single application binary inside a container that holds all the dependencies for this application.

FreeBSD Jails

FreeBSD Jails can be used for both “application” containers and “system” containers. The Jail framework separates the host from the Jails with security in mind.

My home network setup

To make the management of Jails easier we have management tools. I started to use ezjail in 2013 after my OpenSolaris system died and Oracle killed OpenSolaris.

ezjail isn’t developed that actively.

BastilleBSD is the most popular Jail management tool for FreeBSD Jails. It supports both application and system containers.

I migrated all the services to Raspberry Pi’s to save electricity, but continued to use ezjail, just to stick to something I knew and that was stable. Migrating to BastilleBSD was still on my to-do list and I finally found the time to do it.

My blog posts are mostly my installation notes that I publish in the hope that they are useful to somebody else.

In this blog post, you’ll find my journey exploring BastilleBSD; in a next blog post, I’ll go through the migration from ezjail to BastilleBSD.

If you are interested in the concepts behind BastilleBSD, I recommend the video:

FreeBSD Fridays: Introduction to BastilleBSD

BastilleBSD exploration

Start

We’ll execute all actions on a virtual machine running FreeBSD on a Raspberry Pi. The Raspberry Pi is running Debian GNU/Linux with the KVM/libvirt hypervisor.

The virtual machine is running FreeBSD 13.2.

root@pi-rataplan:~ # freebsd-version 
13.2-RELEASE-p2
root@pi-rataplan:~ # 
root@pi-rataplan:~ # uname -a
FreeBSD pi-rataplan 13.2-RELEASE-p2 FreeBSD 13.2-RELEASE-p2 GENERIC arm64
root@pi-rataplan:~ # 

Install BastilleBSD

Install

The first step is to install bastille.

root@pi-rataplan:~ # pkg install -y bastille
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
	bastille: 0.9.20220714

Number of packages to be installed: 1

38 KiB to be downloaded.
[1/1] Fetching bastille-0.9.20220714.pkg: 100%   38 KiB  38.8kB/s    00:01    
Checking integrity... done (0 conflicting)
[1/1] Installing bastille-0.9.20220714...
[1/1] Extracting bastille-0.9.20220714: 100%
root@pi-rataplan:~ # 

Enable BastilleBSD

BastilleBSD is not a daemon that runs in the background; the containers are started at boot time when the bastille_enable rc.conf variable is set to YES.

FreeBSD has a nice tool to set system configuration variables: sysrc.

root@pi-rataplan:~ # sysrc bastille_enable=YES
bastille_enable:  -> YES
root@pi-rataplan:~ # 

It’s possible to specify a list of containers in bastille_list; the containers are started in the order of the list.
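
A minimal sketch with sysrc, assuming two hypothetical containers named jail1 and jail2 that should be started in that order:

root@pi-rataplan:~ # sysrc bastille_list="jail1 jail2"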

Configuration

Update config to use ZFS

BastilleBSD can use OpenZFS, but this isn’t enabled by default. When ZFS support is enabled, a new ZFS dataset is created for every container that is created.

root@pi-rataplan:/usr/local/etc/bastille # cd /usr/local/etc/bastille/
root@pi-rataplan:/usr/local/etc/bastille # 
root@pi-rataplan:/usr/local/etc/bastille # vi bastille.conf

To enable OpenZFS support you need to set bastille_zfs_enable to YES and specify the zpool to be used. Please note that YES needs to be in upper case.

## ZFS options
bastille_zfs_enable="YES"                              ## default: ""
bastille_zfs_zpool="zroot"                             ## default: ""
bastille_zfs_prefix="bastille"                         ## default: "${bastille_zfs_zpool}/bastille"
bastille_zfs_options="-o compress=lz4 -o atime=off"    ## default: "-o compress=lz4 -o atime=off"

Bootstrap

Bootstrap a release

In order to start a container you need to bootstrap a FreeBSD release first.

root@pi-rataplan:/usr/local/share/bastille # freebsd-version 
13.2-RELEASE-p2

A FreeBSD release is specified without the patch level.

The command below will bootstrap 13.2-RELEASE on BastilleBSD. The update option also applies the latest patches.

root@pi-rataplan:/usr/local/share/bastille # bastille bootstrap 13.2-RELEASE update
<snip>
Installing updates...
Restarting sshd after upgrade
Performing sanity check on sshd configuration.
Stopping sshd.
Waiting for PIDS: 93019.
Performing sanity check on sshd configuration.
Starting sshd.
Scanning /usr/local/bastille/releases/13.2-RELEASE/usr/share/certs/blacklisted for certificates...
Scanning /usr/local/bastille/releases/13.2-RELEASE/usr/share/certs/trusted for certificates...
 done.
root@pi-rataplan:/usr/local/share/bastille # 

List

You can use the bastille list release command to list the bootstrapped releases.

root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # bastille list release
13.1-RELEASE
13.2-RELEASE
root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # 

The releases are stored in the BastilleBSD releases dataset, /usr/local/bastille/releases in our case.

root@pi-rataplan:/usr/local/bastille # cd releases/
root@pi-rataplan:/usr/local/bastille/releases # ls
13.1-RELEASE	13.2-RELEASE
root@pi-rataplan:/usr/local/bastille/releases # 

The downloaded tar archives are stored in the cache ZFS dataset.

root@pi-rataplan:/usr/local/bastille/releases/13.1-RELEASE # zfs list | grep -i bastille | grep -i cache
zroot/bastille/cache                           334M  93.6G      104K  /usr/local/bastille/cache
zroot/bastille/cache/13.1-RELEASE              165M  93.6G      165M  /usr/local/bastille/cache/13.1-RELEASE
zroot/bastille/cache/13.2-RELEASE              169M  93.6G      169M  /usr/local/bastille/cache/13.2-RELEASE
root@pi-rataplan:/usr/local/bastille/releases/13.1-RELEASE # 
root@pi-rataplan:/usr/local/bastille/releases/13.1-RELEASE # ls -l /usr/local/bastille/cache/13.2-RELEASE
total 345866
-rw-r--r--  1 root  wheel        782 Apr  7 07:01 MANIFEST
-rw-r--r--  1 root  wheel  176939748 Apr  7 07:01 base.txz
root@pi-rataplan:/usr/local/bastille/releases/13.1-RELEASE # 

Verify a release

With bastille verify you can verify a release, to be sure that no files were altered.

root@pi-rataplan:/usr/local/bastille/releases #  bastille verify 13.2-RELEASE
src component not installed, skipped
Looking up update.FreeBSD.org mirrors... 2 mirrors found.
Fetching metadata signature for 13.2-RELEASE from update2.freebsd.org... done.
Fetching metadata index... done.
Inspecting system... done.
root@pi-rataplan:/usr/local/bastille/releases # 

Create your first container

“Thin” vs “thick” Jails.

“Thin” Jails are created by default. With a “thin” Jail the operating system is “shared” with the release, which saves a lot of disk space. It’s possible to convert a container to a “thick” Jail after it has been created.
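
A minimal sketch of such a conversion with the bastille convert subcommand, using the container created later in this post; note that a thick Jail takes a full copy of the base system and therefore uses considerably more disk space:

root@pi-rataplan:~ # bastille convert bastille-tst001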

Network

BastilleBSD has a lot of network options; it can also create dynamic firewall rules to expose services.

See https://docs.bastillebsd.org/en/latest/chapters/networking.html for more information.
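
For example, when a container is attached to an internal network, a host port can be redirected to the container with the bastille rdr subcommand. A hedged sketch, with hypothetical port numbers, assuming pf is enabled on the host with the rdr-anchor that BastilleBSD expects in pf.conf:

root@pi-rataplan:~ # bastille rdr bastille-tst001 tcp 8080 80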

In this example we’ll use a network interface (vtnet0) that is “shared” with the host system.

root@pi-rataplan:/usr/local/bastille/releases # bastille create bastille-tst001 13.2-RELEASE <ip_address> vtnet0
Valid: (<ip_address>).
Valid: (vtnet0).

Creating a thinjail...

[bastille-tst001]:
bastille-tst001: created

[bastille-tst001]:
Applying template: default/thin...
[bastille-tst001]:
Applying template: default/base...
[bastille-tst001]:
[bastille-tst001]: 0

[bastille-tst001]:
syslogd_flags: -s -> -ss

[bastille-tst001]:
sendmail_enable: NO -> NO

[bastille-tst001]:
sendmail_submit_enable: YES -> NO

[bastille-tst001]:
sendmail_outbound_enable: YES -> NO

[bastille-tst001]:
sendmail_msp_queue_enable: YES -> NO

[bastille-tst001]:
cron_flags:  -> -J 60

[bastille-tst001]:
/etc/resolv.conf -> /usr/local/bastille/jails/bastille-tst001/root/etc/resolv.conf

Template applied: default/base

Template applied: default/thin

rdr-anchor not found in pf.conf
[bastille-tst001]:
bastille-tst001: removed

[bastille-tst001]:
bastille-tst001: created

root@pi-rataplan:/usr/local/bastille/releases # 

A new ZFS dataset is created for the Jail.

root@pi-rataplan:/usr/local/bastille/releases/13.1-RELEASE # zfs list | grep -i tst001
zroot/bastille/jails/bastille-tst001          5.61M  93.6G      116K  /usr/local/bastille/jails/bastille-tst001
zroot/bastille/jails/bastille-tst001/root     5.50M  93.6G     5.50M  /usr/local/bastille/jails/bastille-tst001/root
root@pi-rataplan:/usr/local/bastille/releases/13.1-RELEASE # 

The mounted Jail dataset holds the Jail configuration file and fstab that is used by the Jail.

root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # cd /usr/local/bastille/jails/bastille-tst001
root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # ls
fstab		jail.conf	root
root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # 

jail.conf:

root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # cat jail.conf 
bastille-tst001 {
  devfs_ruleset = 4;
  enforce_statfs = 2;
  exec.clean;
  exec.consolelog = /var/log/bastille/bastille-tst001_console.log;
  exec.start = '/bin/sh /etc/rc';
  exec.stop = '/bin/sh /etc/rc.shutdown';
  host.hostname = bastille-tst001;
  mount.devfs;
  mount.fstab = /usr/local/bastille/jails/bastille-tst001/fstab;
  path = /usr/local/bastille/jails/bastille-tst001/root;
  securelevel = 2;

  interface = vtnet0;
  ip4.addr = <ip_address>;
  ip6 = disable;
}
root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001

fstab:

root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # cat /usr/local/bastille/jails/bastille-tst001/fstab
/usr/local/bastille/releases/13.2-RELEASE /usr/local/bastille/jails/bastille-tst001/root/.bastille nullfs ro 0 0
root@pi-rataplan:/usr/local/bastille/jails/bastille-tst001 # 

Interact with containers

list

With bastille list we can list the containers. When no option is given it’ll list the running containers only. To list all containers (stopped and running) you can use the -a option.

Note that bastille list is a wrapper around the jls command and also lists the running ezjail containers.

root@pi-rataplan:/usr/jails/stafproxy/basejail # bastille list
 JID             IP Address      Hostname                      Path
 stafscm         <ip>            stafscm                       /usr/jails/stafscm
 stafproxy       <ip>            stafproxy                     /usr/jails/stafproxy
 stafmail        <ip>            stafmail                      /usr/jails/stafmail
 staffs          <ip>            staffs                        /usr/jails/staffs
 stafdns         <ip>            stafdns                       /usr/jails/stafdns
 bastille-tst001 <ip>            bastille-tst001               /usr/local/bastille/jails/bastille-tst001/root
root@pi-rataplan:/usr/jails/stafproxy/basejail # 

console access

To gain console access you use the bastille console command.

root@pi-rataplan:/usr/jails/stafproxy/basejail # bastille console bastille-tst001
[bastille-tst001]:
root@bastille-tst001:~ # 

Verify the disk space. Because we’re using a “thin” Jail, only 5.5M of disk space is used.

root@bastille-tst001:~ # df -h .
Filesystem                                   Size    Used   Avail Capacity  Mounted on
zroot/bastille/jails/bastille-tst001/root    154G    5.5M    154G     0%    /
root@bastille-tst001:~ # 

Readonly file system

We’re using “thin” Jails. The system binaries are read-only.

[root@bastille-tst001 ~]# ls -l /bin
lrwxr-xr-x  1 root  wheel  14 Aug 16 10:25 /bin -> /.bastille/bin
[root@bastille-tst001 ~]# 
[root@bastille-tst001 ~]# touch /bin/ls
touch: /bin/ls: Read-only file system
[root@bastille-tst001 ~]# 

freebsd-version

[root@bastille-tst001 ~]# freebsd-version 
13.2-RELEASE-p2
[root@bastille-tst001 ~]# uname -a
FreeBSD bastille-tst001 13.2-RELEASE-p2 FreeBSD 13.2-RELEASE-p2 GENERIC arm64
[root@bastille-tst001 ~]# 

Install FreeBSD packages

Bootstrap the pkg command to install packages.

root@bastille-tst001:~ # pkg
The package management tool is not yet installed on your system.
Do you want to fetch and install it now? [y/N]: y
Bootstrapping pkg from pkg+http://pkg.FreeBSD.org/FreeBSD:13:aarch64/quarterly, please wait...
Verifying signature with trusted certificate pkg.freebsd.org.2013102301... done
[bastille-tst001] Installing pkg-1.19.2...
[bastille-tst001] Extracting pkg-1.19.2: 100%
pkg: not enough arguments
Usage: pkg [-v] [-d] [-l] [-N] [-j <jail name or id>|-c <chroot path>|-r <rootdir>] [-C <configuration file>] [-R <repo config dir>] [-o var=value] [-4|-6] <command> [<args>]

For more information on available commands and options see 'pkg help'.
root@bastille-tst001:~ #

I use ansible to manage my homelab; python3 and sudo are required for this, so these are usually the first packages I install.

root@bastille-tst001:~ # pkg install -y sudo python3 bash
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 9 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
	bash: 5.2.15
	gettext-runtime: 0.21.1
	indexinfo: 0.3.1
	libffi: 3.4.4
	mpdecimal: 2.5.1
	python3: 3_3
	python39: 3.9.17
	readline: 8.2.1
	sudo: 1.9.14p3

Number of packages to be installed: 9

The process will require 140 MiB more space.
21 MiB to be downloaded.
[bastille-tst001] [1/9] Fetching indexinfo-0.3.1.pkg: 100%    5 KiB   5.5kB/s    00:01    
[bastille-tst001] [2/9] Fetching mpdecimal-2.5.1.pkg: 100%  292 KiB 299.4kB/s    00:01    
[bastille-tst001] [3/9] Fetching python39-3.9.17.pkg: 100%   17 MiB   2.6MB/s    00:07    
[bastille-tst001] [4/9] Fetching libffi-3.4.4.pkg: 100%   36 KiB  36.6kB/s    00:01    
[bastille-tst001] [5/9] Fetching readline-8.2.1.pkg: 100%  345 KiB 353.1kB/s    00:01    
[bastille-tst001] [6/9] Fetching sudo-1.9.14p3.pkg: 100%    2 MiB   1.6MB/s    00:01    
[bastille-tst001] [7/9] Fetching python3-3_3.pkg: 100%    1 KiB   1.1kB/s    00:01    
[bastille-tst001] [8/9] Fetching bash-5.2.15.pkg: 100%    2 MiB   1.6MB/s    00:01    
[bastille-tst001] [9/9] Fetching gettext-runtime-0.21.1.pkg: 100%  160 KiB 164.0kB/s    00:01    
Checking integrity... done (0 conflicting)
[bastille-tst001] [1/9] Installing indexinfo-0.3.1...
[bastille-tst001] [1/9] Extracting indexinfo-0.3.1: 100%
[bastille-tst001] [2/9] Installing mpdecimal-2.5.1...
[bastille-tst001] [2/9] Extracting mpdecimal-2.5.1: 100%
[bastille-tst001] [3/9] Installing libffi-3.4.4...
[bastille-tst001] [3/9] Extracting libffi-3.4.4: 100%
[bastille-tst001] [4/9] Installing readline-8.2.1...
[bastille-tst001] [4/9] Extracting readline-8.2.1: 100%
[bastille-tst001] [5/9] Installing gettext-runtime-0.21.1...
[bastille-tst001] [5/9] Extracting gettext-runtime-0.21.1: 100%
[bastille-tst001] [6/9] Installing python39-3.9.17...
[bastille-tst001] [6/9] Extracting python39-3.9.17: 100%
[bastille-tst001] [7/9] Installing sudo-1.9.14p3...
[bastille-tst001] [7/9] Extracting sudo-1.9.14p3: 100%
[bastille-tst001] [8/9] Installing python3-3_3...
[bastille-tst001] [8/9] Extracting python3-3_3: 100%
[bastille-tst001] [9/9] Installing bash-5.2.15...
[bastille-tst001] [9/9] Extracting bash-5.2.15: 100%
=====
Message from python39-3.9.17:

--
Note that some standard Python modules are provided as separate ports
as they require additional dependencies. They are available as:

py39-gdbm       databases/py-gdbm@py39
py39-sqlite3    databases/py-sqlite3@py39
py39-tkinter    x11-toolkits/

Let’s install neofetch.

[root@bastille-tst001 ~]# pkg install -y neofetch
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
	neofetch: 7.1.0_1

Number of packages to be installed: 1

79 KiB to be downloaded.
[bastille-tst001] [1/1] Fetching neofetch-7.1.0_1.pkg: 100%   79 KiB  81.0kB/s    00:01    
Checking integrity... done (0 conflicting)
[bastille-tst001] [1/1] Installing neofetch-7.1.0_1...
[bastille-tst001] [1/1] Extracting neofetch-7.1.0_1: 100%
[root@bastille-tst001 ~]# 
[root@bastille-tst001 ~]# neofetch 
  ` `.....---.......--.```   -/    -------------------- 
  +o   .--`         /y:`      +.   OS: FreeBSD 13.2-RELEASE-p2 aarch64 
   yo`:.            :o      `+-    Uptime: 3 days, 14 hours, 13 mins 
    y/               -/`   -o/     Packages: 11 (pkg) 
   .-                  ::/sy+:.    Shell: csh tcsh 6.22.04 
   /                     `--  /    Terminal: /dev/pts/1 
  `:                          :`   CPU: ARM Cortex-A72 r0p3 (4) 
  `:                          :`   Memory: 2193MiB / 3039MiB 
   /                          /
   .-                        -.                            
    --                      -.                             
     `:`                  `:`
       .--             `--.
          .---.....----.

[root@bastille-tst001 ~]# 

Logout

[root@bastille-tst001 ~]# 
exit
root@bastille-tst001:~ # logout
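
Instead of opening a console, packages can also be installed from the host with the bastille pkg wrapper (the same wrapper is used for the package audit further below). A minimal sketch:

root@pi-rataplan:~ # bastille pkg bastille-tst001 install -y sudo python3 bash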

update containers

List the releases.

root@pi-rataplan:~ # bastille list release
13.1-RELEASE
13.2-RELEASE
root@pi-rataplan:~ # 

Update a release to the latest patch level.

root@pi-rataplan:/usr/jails/stafproxy/basejail # bastille update 13.2-RELEASE
src component not installed, skipped
Looking up update.FreeBSD.org mirrors... 2 mirrors found.
Fetching metadata signature for 13.2-RELEASE from update2.freebsd.org... done.
Fetching metadata index... done.
Inspecting system... done.
Preparing to download files... done.

No updates needed to update system to 13.2-RELEASE-p2.
No updates are available to install.
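
Thin Jails pick up the new patch level automatically, because their base system is nullfs-mounted read-only from the release. For thick Jails, bastille update can, as far as I know, also be pointed at a container instead of a release; a hedged sketch with a hypothetical thick container name:

root@pi-rataplan:~ # bastille update my-thick-jail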

Execute commands inside a container.

You can use bastille cmd, which is a wrapper around jexec, to execute a command inside a container.

List running processes.

The syntax is

bastille cmd <container|ALL> <command string>

root@pi-rataplan:~ # bastille cmd ALL ps aux
[bastille-tst001]:
grep: /usr/local/bastille/jails/ALL/fstab: No such file or directory
USER   PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
root 35869  0.0  0.1 12876 2496  -  IJ   10:55   0:00.00 cron: running job (cron)
root 46730  0.0  0.1 12704 2696  -  SsJ  10:25   0:00.01 /usr/sbin/syslogd -ss
root 55608  0.0  0.1 12876 2496  -  IJ   10:55   0:00.00 cron: running job (cron)
root 69222  0.0  0.1 12876 2512  -  IsJ  10:25   0:00.01 /usr/sbin/cron -J 60 -s
root 67674  0.0  0.1 13440 2960  1  R+J  10:55   0:00.00 ps aux
[bastille-tst001]: 0

root@pi-rataplan:~ # 

audit packages

FreeBSD has implemented vulnerability management the correct way: the vulnerability database lives outside of the package management database, as it should.

Unfortunately the FreeBSD vulnerability database isn’t compatible with the OVAL standard.

This makes auditing the installed FreeBSD packages with the pkg audit command somewhat comparable to a security audit with OpenSCAP on a GNU/Linux system.

But this might be a topic for another blog post :-)

root@pi-rataplan:~ # bastille pkg ALL audit -F
[bastille-tst001]:
[bastille-tst001] Fetching vuln.xml.xz: 100%    1 MiB 349.2kB/s    00:03    
0 problem(s) in 0 installed package(s) found.

root@pi-rataplan:~ # 

Have fun!

Links

September 01, 2023

FOSDEM 2024 will take place on Saturday 3rd and Sunday 4th of February 2024. Further details and calls for participation will be announced in the coming days and weeks.

WordPress has made some good progress to speed up site rendering the last couple of years and part of this is thanks to images and iframes having the “loading” attribute with value “lazy”, telling browsers these can be loaded later allowing other (more important) assets to take priority. Especially iframes can consume a lot of bandwidth (think embedded Google Maps or one or more YouTube videos)...

Source

August 31, 2023

Waterloo:

The combined number of men killed or wounded reached nearly 50,000, with close to 25,000 casualties on the French side and approximately 23,000 for the Allied army.

Bakhmut:

Western estimate: 60,000+ casualties (20,000+ killed) Per Ukraine: 100,000+ killed or wounded (20,000–30,000 killed)

August 26, 2023

Today, I published the following diary on isc.sans.edu: “macOS: Who’s Behind This Network Connection?“:

When you must investigate suspicious behavior or work on an actual incident, you could be asked to determine who’s behind a network connection. From a pure network point of view, your firewall or any network security control device/app will tell you that the source of the connection is host « A », « B » or « C ». But investigating further to discover who or which process is the source of the connection (now, at the operating system level)… [Read more]

The post [SANS ISC] macOS: Who’s Behind This Network Connection? appeared first on /dev/random.

August 25, 2023

Today, I published the following diary on isc.sans.edu: “Python Malware Using Postgresql for C2 Communications“:

For modern malware, having access to its C2 (Command and control) is a crucial point. There are many ways to connect to a C2 server using tons of protocols, but today, HTTP remains very common because HTTP is allowed on most networks… I found a malicious Python script that is pretty well obfuscated. The applied technique reduces its VT score to 6/60! It’s based on a mix of Base64- and Hex-encoded data… [Read more]

The post [SANS ISC] Python Malware Using Postgresql for C2 Communications appeared first on /dev/random.

August 23, 2023

Today, I published the following diary on isc.sans.edu: “More Exotic Excel Files Dropping AgentTesla”:

Excel is an excellent target for attackers. The Microsoft Office suite is installed on millions of computers, and people trust these files. If we have the classic xls, xlsx, xlsm file extensions, Excel supports many others! Just check your local registry… [Read more: https://isc.sans.edu/diary/More%20Exotic%20Excel%20Files%20Dropping%20AgentTesla/30150]

The post [SANS ISC] More Exotic Excel Files Dropping AgentTesla appeared first on /dev/random.

August 22, 2023

Today, I published the following diary on isc.sans.edu: “Have You Ever Heard of the Fernet Encryption Algorithm?“:

In cryptography, there is a golden rule that states not to develop your own algorithm because… it will probably be weak and broken! There are strong algorithms (like AES) that do a great job, so why reinvent the wheel? However, there are projects that try to develop new algorithms. One of them is Fernet, described like this… [Read more]

The post [SANS ISC] Have You Ever Heard of the Fernet Encryption Algorithm? appeared first on /dev/random.