Why did my Linux notebook freeze?
December 6th, 2020 | by Saren Tasciyan | posted in Hacking, Software
Linux has come a long way since Linus Torvalds first introduced it, and today it dominates our digital world, for very good reasons. One place where it still lags behind in terms of usage is desktop and notebook PCs for end users. But this article is not about Linux history; it is about an issue that end users can run into, which I have faced ever since I started using Ubuntu a while ago. Unfortunately, this issue has its roots deep inside the kernel.
Two years ago, I finally switched from Windows to Linux, intending never to go back. In the beginning I faced many issues, which were related to certain settings, drivers, etc., and could be solved easily! One issue, however, puzzled me a lot and was hard to crack. This is the story of that issue.
Once in a while, random applications started to freeze completely and never recovered. In the beginning this seemed to be random, and I thought that the problem was with the individual applications. But what are the chances of multiple applications having the same bug? When it comes to software (especially free or open-source software), I prefer to stay with the mainstream, with as few tweaks or changes as possible. This makes it easier to get help when needed. I wasn’t really looking for adventure; I needed a system I could rely on. But this issue was really annoying. Often, I even had to restart the system forcibly. What didn’t make any sense was that the freezing applications were ones I had routinely used on my Windows 7 notebook without any such issues. And they weren’t exotic applications at all: LibreOffice, Inkscape, Nautilus, etc.
Here it gets more complicated, because multiple independent applications froze, but not Ubuntu itself. The strongest tool of a computer user with issues is simply searching for other people with similar issues and their solutions. But some issues are not well defined (such as “an application freezes”), can have multiple causes (mostly bugs) and have even more potential solutions. Therefore, I couldn’t find any useful solution. And if the cause of your issue affects only a small subset of all the users with similar symptoms, then you are doomed, because all the results you find out there will just be noise to you (for instance, try to solve a Windows Update issue once and you will understand what I mean). Thankfully, askubuntu.com and superuser.com are great sources for quickly finding potential solutions. However, when it comes to common symptoms, they can turn into a mess.
Btw, I found this gem…
After some frustration, I decided to take a more “investigative” approach. Let’s go over it…
I did my research and collected all the information I could find about applications freezing and their potential causes. After talking to colleagues at work, I realized that some other Ubuntu (actually Linux) users there had similar issues. But Ubuntu users outside the workplace generally didn’t seem to have a problem where multiple applications froze on their computers. Somehow, it had something to do with the workplace.
And I started to observe… When and how exactly does this happen? Shortly after that, I realized that it mostly happened when I was back home from work. But how could my computer know anything about its location? It doesn’t have a GPS chip (afaik :-/). Maybe it had something to do with my home Wi-Fi network? I haven’t mentioned this yet, but there was another layer of complexity: my Wi-Fi driver had issues and disconnected once in a while. So I thought the freezes had something to do with the Wi-Fi. But even after I switched to a cable connection, the issue did not go away. (The Wi-Fi issue itself was resolved by installing backported drivers; kudos to the Linux developers!)
I was a bit lucky: even if I killed one of these applications, starting it again caused the same issue. This was an opportunity for testing and experimenting. At this point I had a mind-blowing realization about one of the possible causes that had come up during my research: IO (input/output) operations. IO operations are simply data exchanges between software and a variety of data sources, such as your hard drive, a server on the network, your microphone, a USB stick, etc. I hadn’t considered this option, as multiple applications were affected. Furthermore, I wasn’t doing anything in particular other than trying to open a Writer document on my Desktop or an SVG file stored locally on the hard drive. If there had been an issue with the hard drive, I would have noticed it through more catastrophic effects. However, I was starting to connect the dots. Here are some of them:
- Issues appeared mostly when I had left my workplace’s network.
- Multiple applications were affected.
- It had nothing to do with the drivers (which might otherwise have explained IO problems).
At work, we use file servers to store, back up and archive files. These servers aren’t accessible when you are not “inside” the network. To test this, I connected to the VPN and, surprisingly, some of the frozen applications recovered. And they froze again once I disconnected from the VPN. This didn’t make any sense: I wasn’t trying to reach the servers. When I unmounted the servers, the issue didn’t happen again. So finally, I had a lead, but it still didn’t make sense. Were these applications trying to reach the file servers? With a bit of paranoia, I wanted to know whether these applications were accessing the file servers while I was trying to open files on my local storage.
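To see which mounts could be involved, a quick check is to list the network file systems that are currently mounted. Below is a minimal Python sketch (not something I used at the time) that parses text in the /proc/mounts format and keeps the entries whose file-system type is in a small set of network types. That set and the function name network_mounts are my assumptions; adjust them for your setup, since shares opened through a file browser’s own network support may not show up this way.

```python
"""List mounted network file systems, i.e. mounts that can block a process
when the server behind them becomes unreachable."""

# Assumption: these type names cover the usual network file systems
# (NFS, SMB/CIFS, SSHFS via FUSE); extend the set for your own setup.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs"}


def _unescape(field: str) -> str:
    # /proc/mounts writes a space inside a path as the octal escape \040
    return field.replace("\\040", " ")


def network_mounts(mounts_text: str) -> list[tuple[str, str, str, str]]:
    """Return (source, mount point, fstype, options) for each network mount.

    mounts_text must be in /proc/mounts format: one mount per line, with
    space-separated fields source, mount point, fstype, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, mount_point, fstype, options = fields[:4]
        if fstype in NETWORK_FS_TYPES:
            result.append((_unescape(source), _unescape(mount_point), fstype, options))
    return result
```

On a live system you would pass it the contents of /proc/mounts. For NFS shares, the options column also shows whether a share is mounted hard or soft.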
Thankfully, there is a command called strace, which traces the system calls and signals of a program. You can start a program with strace, and it will show you all the system calls the program makes. The output format can be confusing, and understanding it completely requires a good understanding of Linux and programming. But you don’t need to understand the output completely in order to identify the problem. The very nice thing about it is that the fast-flowing stream of system calls and signals halts right at the point where the application freezes. The last, incompletely printed system call should be a hint for the reason of the freeze…
And there it was: for some reason, my LibreOffice (LO) was trying to access an office file that I had used a while ago. Again, I wasn’t opening that file, but LO tried to access it anyway. Even more obscure was Inkscape trying to access a video file that I had opened with VLC on another network, which wasn’t accessible either. One workaround was to stay connected to the network all the time. Deleting those files from LO’s list of recently used files also helped (there is an XML file somewhere where the recent files are stored). But neither is a long-term solution.
Additionally, when I investigated the running processes with ps aux or htop, nautilus was often stuck in the so-called D state and could not be killed. Yes, it CANNOT be killed. At first, it was hard for me to believe. Why can’t it be killed? Well… ask the kernel developers; it is hard to find anything about it out there. This was asked on StackExchange (https://unix.stackexchange.com/questions/364100/why-cant-we-kill-uninterruptible-d-state-process/).
I think that these applications were getting the list of recently used files from the operating system through similar or identical libraries (GTK?). While doing so, they also try to access those files (IMHO, unnecessary IO). Maybe they only check whether those files exist at all, or whether they can read them. If the network is inaccessible, they don’t get any response. By design, the process waits until it receives one, and if that never happens, it never recovers. The state in which such a process is stuck is called the D state (uninterruptible sleep), which a process enters while waiting for certain IO operations to complete. Interrupting it seems to be a difficult task for the kernel, because the system call is in the middle of an operation that would have to be safely undone. So, “for the most common case scenario”, the developers decided to let processes wait in the D state, forever if necessary.
In Ubuntu, even if you kill a D-state process, its window may disappear from the screen, but the process is likely still present in the system (still visible with ps or htop). To remove it completely, you either need to reconnect to the resource (in this case, the network) or reboot the system. For file share servers, it is good practice to mount the file system with the intr flag, so that signals can interrupt pending operations (for NFS, the nfs(5) man page notes that intr is ignored on kernels after 2.6.25, where only SIGKILL can interrupt a pending operation). In my case, I was using an older method to mount network drives.
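Instead of scrolling through ps or htop, you can list D-state processes directly. The Python sketch below reads /proc/&lt;pid&gt;/stat, whose third field is the process state (D meaning uninterruptible sleep). The function names are hypothetical, and the code only reads /proc, so it cannot do anything about the stuck processes it finds; it just tells you which ones they are.

```python
import os


def parse_stat(stat_line: str) -> tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, command name, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the line is split at the *last* ')'.
    """
    left, _, right = stat_line.rpartition(")")
    pid_text, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_text), comm, state


def d_state_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, command) for processes in uninterruptible sleep ('D')."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories belong to processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling d_state_processes() on a machine in the situation described above would be expected to include nautilus and the other frozen applications.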
When your PC hangs for no apparent reason (without high CPU usage), even for short periods (1-2 s or more), this could be due to hanging IO operations. It reminds me of the times when inserting a CD froze the PC until the drive spun up. Something to keep in mind. Similarly, accessing an idle external hard drive may block your file browser for a few seconds until its disks spin up.
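To check whether a particular path is the culprit, you can time a single metadata call on it without letting the check itself freeze. This is a minimal Python sketch, assuming that a plain os.stat is representative of what the application does (the function name stat_with_timeout is mine): it runs the call in a daemon thread and stops waiting after a timeout. It is only an illustration, not how GTK or LibreOffice actually handle recent files.

```python
import os
import threading
import time


def stat_with_timeout(path: str, timeout: float = 2.0) -> tuple[str, float]:
    """Run os.stat(path) in a helper thread, waiting at most `timeout` seconds.

    Returns (status, elapsed_seconds), where status is "ok", "missing",
    "error" or "timeout". After a timeout the helper thread stays blocked in
    the kernel: it cannot be cancelled, only abandoned. Because it is a daemon
    thread, Python will not wait for it when the program exits.
    """
    outcome: dict[str, str] = {}

    def check() -> None:
        try:
            os.stat(path)
            outcome["status"] = "ok"
        except FileNotFoundError:
            outcome["status"] = "missing"
        except OSError:
            outcome["status"] = "error"  # e.g. permission denied

    start = time.monotonic()
    helper = threading.Thread(target=check, daemon=True)
    helper.start()
    helper.join(timeout)  # returns after `timeout` even if check() is stuck
    elapsed = time.monotonic() - start
    return outcome.get("status", "timeout"), elapsed
```

A path on a reachable local disk should come back as "ok" almost immediately, while a path on an unreachable hard-mounted share would be expected to report "timeout" after roughly the given number of seconds.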