The Geek's corner – Recycled Cloud

from vdoreau

A story about how I juggled filesystems on a supposed blackbox and fixed an industrial monitoring solution.

The story

At Recycled Cloud, we host a lot of customers. As we're flexible in our hosting solutions, we get customers with specific needs: customers who couldn't host their solutions with other providers.

One such customer had a SinemaRC image, described as a blackbox by their supplier. The software is entirely self-contained within the OS, which is locked down against any external access. It's kind of like those tiny devices that you plug into your network, configure via a web GUI and then pray that nothing goes wrong inside, because every single one of them is basically a Linux box with some software.

So this was installed on the Recycled Cloud way back when, by someone who has since left. I've only heard tales about it; it is said that the guy who installed SinemaRC would still have nightmares about it if you mentioned it to him.

The problem

This server had run pretty quietly since then. We did not hear about it until the customer called us: “Help! We haven't been able to access our server for a week!”

Since this is a blackbox, we had only provided the cloud infrastructure; nothing was under our control, we didn't even have access to the web GUI. It's clearly one of those obscure pieces of software about which you can hardly find any information by googling it, never mind any documentation.

SinemaRC is Siemens-provided software, so with the help of the customer, we called Siemens technical support. But we quickly realized we wouldn't get help there: the support wasn't technical at all, they only knew very basic configuration.

Investigation

So with this machine on our hands, I decided to dig in, without any help or documentation. After all, it's a Linux box; everything is possible.

I went to our hypervisor, attached a standard Debian image to the machine and booted from it; this is basically the equivalent of booting from a live USB. Once there, I could access the SinemaRC filesystem by mounting it. My first instinct was to figure out what I was working with. It turns out a simple cat /etc/os-release showed me this so-called blackbox was actually just an Ubuntu 18.04. With that in mind, I took a look at /etc/network/interfaces. Bingo, it was configured with an address in 192.168.0.0/24; again, just like those devices you plug in and configure via their factory IP. I was starting to understand the system. So I changed the address and rebooted.
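If you want to reproduce the inspection, it boiled down to something like the following from the Debian rescue system; the device and mount point are assumptions, adapt them to the actual disk layout.

lsblk                              # identify the SinemaRC disk (device names vary)
mount /dev/vda1 /mnt               # hypothetical root partition
cat /mnt/etc/os-release            # reveals plain Ubuntu 18.04
cat /mnt/etc/network/interfaces    # shows the 192.168.0.0/24 factory config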

To my surprise, the machine wouldn't boot anymore; it seems GRUB was broken for some reason. After searching for some time, I stumbled upon this forum post. I booted Debian again, mounted the filesystem with binds to /dev, /proc and /sys as indicated, and then chrooted into it. There I was, almost having control over this OS. Out of curiosity, I tried a few commands like apt, find, ip. They all worked, meaning the system wasn't really stripped down at all; a lot of debugging tools would still be available to me. From this position, I ultimately ran grub-install and update-grub (which were available on SinemaRC ...) and I could boot. At this point, I didn't know yet that I'd have to do it every time I booted into Debian ...
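The repair itself followed the usual chroot recipe from that forum post; the device names here are placeholders and the target disk for grub-install will differ on your setup.

mount /dev/vda1 /mnt               # hypothetical SinemaRC root partition
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt /bin/bash
grub-install /dev/vda              # reinstall GRUB to the disk
update-grub                        # regenerate the GRUB configuration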

Anyway, back to booting to see if simply changing the IP would work. But no: I could see during the boot process that the interfaces file was being overwritten by something. Additionally, the console showed the following message, which was definitely wrong.

Welcome to SINEMA Remote Connect

System startup complete.
Please open WBM over http://192.168.0.2

I thought there would be a simple startup script fetching the network config, updating it in interfaces and displaying it. By finding this script, I would find the source of the configuration and could change it. So I ran something similar to the following to search every file for “SINEMA”:

find / -type f | xargs grep -FHn --color "SINEMA"

Surely there would not be many files containing it, and any that did would be interesting to look into. And oh boy was I right: logfiles, postgresql, dpkg, grub, /etc/sinemarc; this thing was everywhere, and most importantly there was /etc/welcome.py. Indeed, this file was responsible for the console display. Unfortunately, it was static: it was not getting its configuration from an external source, it was a simple loop displaying constant text. Still, I now had something that could run commands while the VM was running. So I did a few tests, starting by catting files to the console, and finally managed to modify the network config this way. At this point, I could ping the VM but not access the interface. It was not a proper way to configure it either, but still, I had hope.
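The live network change injected through that hook boiled down to a handful of iproute2 commands; the interface name and addresses below are placeholders, not the customer's real values.

ip addr flush dev ens3                       # drop the 192.168.0.x factory address
ip addr add 203.0.113.10/24 dev ens3         # placeholder public address
ip route replace default via 203.0.113.1     # placeholder gateway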

I went back to my find command and looked at the results. In the dpkg results, I could see postinst scripts. I decided to look at those; surely there would be some useful information: how the app is installed, where it lives ... Soon enough, I found this:

SINEMARC_PATH="/usr/lib/sinemarc/"

cd $SINEMARC_PATH
django-admin compilemessages --pythonpath=$SINEMARC_PATH --settings=webprj.settings

Super interesting! I now knew it was a Django/Python app and where it lived! I browsed the files for a while; lots of interesting things, including network files. It's at this point I understood that SinemaRC was actually just a Django app managing system services like iptables, firewalld and openvpn. I could see code similar to {% if iface.is_wan %}. SinemaRC was simply generating config files (interfaces, OpenVPN ...) and managing the services accordingly. I felt right at home!

So I went looking for the place generating /etc/network/interfaces, and with a few find and grep commands, I found it: network.py, next to a bunch of openvpn.py, firewall.py etc. I opened it, easily found the interface generation and overrode it with our correct public IP. I rebooted and voila! A fully working SinemaRC installation.
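The search itself was nothing clever; it was roughly of this shape (the patterns are just what worked here, not an exhaustive method).

# Locate the code that writes /etc/network/interfaces inside the Django app
grep -rn "interfaces" /usr/lib/sinemarc/ | grep -v "\.pyc"
find /usr/lib/sinemarc/ -name "network.py"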

Conclusion

In the end, the problem was pretty simple: the box had just reset its IP to 192.168... as a standard fallback for initial configuration. As to why it did that ... maybe an edge case, maybe a bad config push at some point, who knows.

Still, it took me around 12 hours to figure this all out. But I gained a deep knowledge of this thing, knowledge that probably only software engineers at Siemens have.

In agreement with the customer, I gave myself a backdoor by opening the SSH port in the firewall script (while doing this, I figured out it was actually an option in the interface, labeled “Debug login”; go find it ...). I also installed node_exporter to monitor this VM along with the others.
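I won't reproduce the generated firewall script here, but the change amounted to a rule of this kind, plus running Prometheus' node_exporter; both lines are illustrative sketches, not the actual script.

# Hypothetical firewall addition: let SSH through
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# node_exporter on its default port, to be wrapped in a service unit
/usr/local/bin/node_exporter --web.listen-address=":9100"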

This has proved useful multiple times: for debugging VPN connections to devices, for updating the underlying OS (which is not done by default) and for making sure the VM is running smoothly.

 

from vdoreau

#lowlevel #alpine #kernel

Recently, we had to update a router running Alpine Linux. This link is redundant with another Alpine router, so it's a stressful operation but no big deal. Standard procedure: upgrade the system, then reboot to make sure everything is in order and the latest kernel is used. The second router takes over the traffic, so far so good. Until the first router doesn't come back up.
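For the record, the upgrade step itself is the standard Alpine procedure, roughly the following (repositories and kernel flavour depend on the setup).

# Upgrade all packages, including the kernel, then reboot onto it
apk update
apk upgrade
reboot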

We have access to its console, so I connect to see what's happening aaand ... bingo, the router is stuck at boot on the following line from the kernel, right after Alpine took over the boot process.

random: crng: init done

We had already faced that and had prepared for it. Unlike default Alpine, which only keeps the latest kernel, we had a backup kernel on the system, ready for this exact case. So naturally I rebooted, selected the other kernel aaand ... bad luck, still exactly the same. At least the kernel was out of the equation.

At this point I knew I had to go into the initramfs to debug the boot process (I had already done it previously and was able to solve a similar issue this way). On Alpine, the init script is /usr/share/mkinitfs/initramfs-init. Looking at it reveals you can use the single kernel option to tell the init script to spawn a shell before starting the real init process (init on Alpine, systemd on Debian ...).

Here are the relevant lines from the script.

if [ "$SINGLEMODE" = "yes" ]; then
    echo "Entering single mode. Type 'exit' to continue booting."
    sh
fi
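In practice, this means interrupting the bootloader, editing the boot entry and appending single to the kernel command line; the example below is made up, the real root device and options depend on the machine.

# Hypothetical kernel command line with 'single' appended at the end
root=/dev/sda2 modules=sd-mod,ext4 console=ttyS0,115200 single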

From there, you basically have a shell in the initramfs, so a working, minimal Linux system. You can then alter the boot process. After all, init is only a shell script running some commands.

At this point, the only clue I had was the few lines printed after Alpine started booting. Two of them mentioned changing permissions on the /run folder, and the last one was about randomness from the kernel.

I immediately connected the random stuff to boot-time entropy starvation, which we had already experienced in virtual machines, but only as a slowdown, never as a fully stuck boot. I'm not going to explain what boot-time entropy starvation is; if you want to know more, this article from the Debian wiki is a good starting point.

Naturally, I went down this path and found out you can read the available entropy of a system from /proc/sys/kernel/random/entropy_avail. At the time, it gave me something like 80, which is below the 256 mentioned in this forum post.
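Checking it from the initramfs shell is a one-liner; the value in the comment is the ballpark figure I got that day.

# Read the kernel's current entropy estimate (in bits)
cat /proc/sys/kernel/random/entropy_avail
# printed roughly 80 at the time, below the 256 threshold mentioned above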

So here I am, on my way to try to generate more entropy. Now, you may have come across key-generation tools asking you to move your mouse to generate entropy. Well, of course that is not possible here since there is no X server. After trying a few commands and looking around, I catted the available entropy again, and to my surprise ... it had increased! Interestingly, it increased with every command; even just typing gibberish worked. Likely every keystroke adds a little bit of entropy.

I continued this way until I hit 256 bits of entropy, which seems to be the maximum. And at this exact moment, the kernel printed random: crng: init done. Out of curiosity/instinct, I typed exit, which gives control back to the init script so it can continue booting, and ... it worked! The router booted normally.

In the end, this little APU was entropy-starved. Now, this is supposed to be fixed since Linux kernel 5.4, so I cannot entirely explain why it happened. To remedy it, we installed haveged, a userspace daemon that gathers entropy from hardware timing jitter. This is a first step towards a full resolution; we'll need to monitor it to verify it is sufficient.
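On Alpine, installing and enabling it looks roughly like this, assuming the haveged package is available in the configured repositories; picking the boot runlevel rather than default is my own choice here.

# Install haveged and start it early in the boot sequence
apk add haveged
rc-update add haveged boot
rc-service haveged start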


When writing this blog post – and after some sleep – I realized something. At some point during the initial process, I rebooted the router and pinged it continuously to see when it would come back up.

Yay it pings, must be back up, let's ssh!

Oh ssh doesn't respond, let's give it a little bit of time, must still be booting.

Ok weird it's been a long time now, let's ping it again to see if it went down or something.

Still pings, ok so it's taking a suspiciously long amount of time to boot. Let's ssh again just to see if it finished now.

Ok, it works, phew, that was close, it did reboot on its own in the end

Or so I thought.

It's only later that I understood: pinging an interface can generate randomness, since network interrupts feed the entropy pool. So what I did, without even knowing it, was generate enough entropy, exactly as I did in the initramfs by issuing commands.

 