# [SOLVED] How to identify cause of Linux System freeze?



## XEyedBear

The purpose of my post is not to ask somebody to solve a rather vexing problem that I have, but rather to get some advice on a procedure that I can follow now (and in the future) to identify the cause of the problem. Solving the problem will be a later step.

The vexing problem is an unpredictable but frequent total 'freeze' of my Linux system. By 'freeze' I mean that the only way to continue is to hit the system reset switch.

I have said it's a 'Linux' system primarily to distinguish it from Windows XP, which I can alternatively boot on this computer and which NEVER has the freeze problem. (From this I conclude that it is a Linux issue, caused either by a defect in Linux code or by a hardware design 'feature' which Linux objects to but Windows doesn't.)

In fact the Linux I am using is Ubuntu 10.10.

The hardware is an Asus A8V with an AMD Athlon 64 X2 3800+ CPU, 4 sticks of consecutively serial-numbered Corsair Value Select RAM (512 MB DDR400, timings 2.5, 3, 3, 8), and an nVidia Quadro NVS280 PCI-express video card. I also have an M-Audio Audiophile 2496 sound card.

The freeze first appeared after I changed from a single-core to a dual-core CPU some months ago. In the past, the frequency with which the freeze occurs has varied markedly after automatic updates. It used to occur in almost every application I was running, and even after a period of time on an idle machine.

Now the freeze is 100% reproducible in one application in particular: an interactive fiction game (text based, modelled after the Zork trilogy of 30 years ago). I run this game under Wine; it has no problem running natively under Windows. It is probably also reproducible in the Audacity audio recording/editing application, which again runs without problems under Windows.

There does not appear to be a common thread here.

The challenge is knowing where to start looking for the cause of the problem. That implies the need for a procedure. An extensive search with Google has yielded nothing, other than that a lot of Ubuntu users over the past few years have also suffered this intermittent freeze problem, with no resolution.

What advice have you more knowledgeable people got?


----------



## hal8000

*Re: How to identify cause of Linux System freeze?*

Did you have any problems before you changed the CPU?

After you changed the CPU did you change your kernel?
Open a terminal and post the output of:

uname -a

Linux keeps many log files. The problem is that if your machine locks solid, these log files may not get written to.

Here's what to try.
Run your game, or whatever causes the system to freeze, and note the time at which it locks up.

Reboot, then go to System > Administration > Log File Viewer.
On the left-hand side, scroll down to messages. You may see messages.1, messages.2, etc., but the current file is messages.
Scroll down to the date and time your machine locked up and copy about 20 lines. /var/log/messages logs a lot of events, so it is quite lengthy.
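The manual steps above (find the reboot in /var/log/messages and copy the lines just before it) can also be scripted. What follows is a minimal Python sketch, assuming the log is in the rsyslog 4.x format quoted later in this thread, where each boot begins with an `imklog ... started` line followed by an `rsyslogd ... (re)start` line. The function names and marker patterns are my own, not part of any tool.

```python
import re
from collections import deque

# Lines rsyslog 4.x writes when it starts after a (re)boot, as in the log
# quoted in this thread. Adjust them if your own log marks startup differently.
RESTART_MARKERS = (
    re.compile(r"imklog .* started"),
    re.compile(r"rsyslogd: \[origin .*\] \(re\)start"),
)


def is_restart(line):
    return any(p.search(line) for p in RESTART_MARKERS)


def lines_before_restarts(lines, context=20):
    """For each logger restart, return the `context` lines logged just before it."""
    recent = deque(maxlen=context)  # sliding window of the latest entries
    blocks = []
    prev_was_marker = False
    for raw in lines:
        line = raw.rstrip("\n")
        marker = is_restart(line)
        if marker:
            # Only the first of a run of marker lines starts a new boot.
            if not prev_was_marker and recent:
                blocks.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
        prev_was_marker = marker
    return blocks
```

Pass it an open file, e.g. `lines_before_restarts(open("/var/log/messages"))` (you may need root to read the file), and print each returned block. If the machine locked solid before anything more was flushed to disk, the block simply ends at the last entry that was written.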

Audacity is a sound app; does the game have sound?
It could be an audio, chipset or graphics problem; the field is a bit wide at the moment.


----------



## XEyedBear

*Re: How to identify cause of Linux System freeze?*

There were no comparable problems (that is, hard freezes) before the CPU upgrade. There were the usual code failures that one has to expect from time to time in any software, but mostly the kernel just carried on. Immediately after the CPU change the freezes appeared every 10 minutes or so, then, over a period of weeks, reduced to every hour or so.

I tried to convince myself that this was because there were multiple causes for the freeze and these were being addressed, one by one, by kernel and application updates (I no longer hold that view).

Before changing the CPU I posted on the Ubuntu forum to ask if any change to the kernel was required. My prior reading suggested to me that the kernel is already configured to support either 1 or 2 cores 'out of the box' and will adapt itself accordingly. It is not equipped for 'multiprocessor' (as distinct from 'multi-core') support. Replies on the forum assured me that no kernel upgrade specifically for multi-core support was required. And indeed the system booted up just fine (after reassembling the hardware 'correctly' 3 times, ahem!).


Output of uname -a:
Linux Advent 2.6.35-27-generic #48-Ubuntu SMP Tue Feb 22 20:25:29 UTC 2011 i686 GNU/Linux
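For anyone following along, the fields in that `uname -a` line answer the earlier question about the kernel. A small Python sketch that splits such a line, assuming the layout seen here (kernel name, host name, release, build string, machine type, OS); the dictionary keys are my own labels. `SMP` in the build string indicates the kernel was built with multiprocessor support, which is also what a dual-core CPU relies on, and `i686` indicates a 32-bit x86 kernel.

```python
def parse_uname(line: str) -> dict:
    """Split a `uname -a` line into a few labelled fields.

    Assumes the layout shown above:
    kernel-name host release #build-string... machine GNU/Linux
    """
    fields = line.split()
    return {
        "kernel": fields[0],
        "host": fields[1],
        "release": fields[2],
        # The build string (fields 3 .. -3) contains "SMP" when the kernel
        # was compiled with multiprocessor support.
        "smp": "SMP" in fields[3:-2],
        # Field just before the OS name; "i686" means a 32-bit x86 kernel.
        "machine": fields[-2],
    }


print(parse_uname(
    "Linux Advent 2.6.35-27-generic #48-Ubuntu SMP "
    "Tue Feb 22 20:25:29 UTC 2011 i686 GNU/Linux"
))
```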

Output from Log File Viewer, messages, immediately prior to freeze:

This does not really show anything of value, aside from the entry at 15:44:55. In the copied lines that follow, my system froze at 15:48 (according to the system clock display). As you can see there were no 'events' for about 3 minutes after 15:44:55. Most of this time was spent loading the game interpreter under Wine, then loading the game, then restoring a saved game, then entering 2 or 3 commands before the freeze happened. The events starting at 15:49:59 are the system reboot after I pressed the reset switch.

Mar 11 15:43:53 Advent kernel: [ 20.982953] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Mar 11 15:43:53 Advent kernel: [ 20.994812] NFSD: starting 90-second grace period
Mar 11 15:43:53 Advent kernel: [ 20.995217] agpgart-amd64 0000:00:00.0: AGP 3.0 bridge
Mar 11 15:43:53 Advent kernel: [ 20.995232] agpgart-amd64 0000:00:00.0: putting AGP V3 device into 8x mode
Mar 11 15:43:53 Advent kernel: [ 20.995337] nvidia 0000:01:00.0: putting AGP V3 device into 8x mode
Mar 11 15:43:54 Advent kernel: [ 22.286932] EXT4-fs (sdb5): re-mounted. Opts: errors=remount-ro,commit=0
Mar 11 15:43:54 Advent kernel: [ 22.304329] EXT4-fs (sdb7): re-mounted. Opts: commit=0
Mar 11 15:43:54 Advent kernel: [ 22.369717] skge 0000:00:0a.0: eth0: Link is up at 100 Mbps, full duplex, flow control both
Mar 11 15:43:54 Advent kernel: [ 22.369959] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 11 15:43:54 Advent kernel: [ 22.530700] usb 3-3: usbfs: interface 0 claimed by usblp while 'usb' sets config #1
Mar 11 15:43:56 Advent kernel: [ 24.141851] EXT4-fs (sdb5): re-mounted. Opts: errors=remount-ro,commit=0
Mar 11 15:43:56 Advent kernel: [ 24.145609] EXT4-fs (sdb7): re-mounted. Opts: commit=0
Mar 11 15:44:55 Advent kernel: [ 83.144026] Clocksource tsc unstable (delta = -208005225 ns)
Mar 11 15:49:59 Advent kernel: imklog 4.2.0, log source = /proc/kmsg started.
Mar 11 15:49:59 Advent rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="961" x-info="http://www.rsyslog.com"] (re)start
Mar 11 15:49:59 Advent rsyslogd: rsyslogd's groupid changed to 103
Mar 11 15:49:59 Advent rsyslogd: rsyslogd's userid changed to 101
Mar 11 15:49:59 Advent kernel: [ 0.000000] Initializing cgroup subsys cpuset
Mar 11 15:49:59 Advent kernel: [ 0.000000] Initializing cgroup subsys cpu
Mar 11 15:49:59 Advent kernel: [ 0.000000] Linux version 2.6.35-27-generic ([email protected]) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5) ) #48-Ubuntu SMP Tue Feb 22 20:25:29 UTC 2011 (Ubuntu 2.6.35-27.48-generic 2.6.35.11)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-provided physical RAM map:
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 0000000000100000 - 000000007ffb0000 (usable)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 000000007ffb0000 - 000000007ffc0000 (ACPI data)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 000000007ffc0000 - 000000007fff0000 (ACPI NVS)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 000000007fff0000 - 0000000080000000 (reserved)
Mar 11 15:49:59 Advent kernel: [ 0.000000] BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
Mar 11 15:49:59 Advent kernel: [ 0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!


No, no sound. The game is simple text only - not even any graphics, aside from it being a windowed game, of course.

Yes, I agree the problem is too wide at the moment - which is why I am looking for a process.


----------



## hal8000

*Re: How to identify cause of Linux System freeze?*

This line is significant:

Clocksource tsc unstable (delta = -208005225 ns)

This could put the clock sources out of sync and lead to a kernel crash.
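To put that number in perspective: the delta is in nanoseconds, so -208005225 ns is about -0.21 s. As far as I know, the message comes from the kernel's clocksource watchdog, which compares the TSC against another clock source and marks it unstable when the two drift apart. A minimal Python sketch for pulling such entries out of a log, assuming the message text is exactly as printed above (the regex and function name are my own):

```python
import re

# Matches the kernel's clocksource-watchdog message, e.g.
#   Clocksource tsc unstable (delta = -208005225 ns)
UNSTABLE_RE = re.compile(r"Clocksource (\S+) unstable \(delta = (-?\d+) ns\)")


def parse_unstable(line):
    """Return (clocksource name, delta in seconds), or None if the line doesn't match."""
    m = UNSTABLE_RE.search(line)
    if m is None:
        return None
    name, delta_ns = m.group(1), int(m.group(2))
    return name, delta_ns / 1e9


line = ("Mar 11 15:44:55 Advent kernel: [ 83.144026] "
        "Clocksource tsc unstable (delta = -208005225 ns)")
print(parse_unstable(line))
```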

The first thing to check is your system BIOS. Are you overclocking the CPU bus, memory timings or any other hardware?

If not, then you need to add one of these kernel parameters at boot time. In GRUB 2, press Esc, then try each of these lines (one by one) and try the game (at least you have a way of reproducing this error).

clocksource=acpi_pm

clocksource=rtc

clocksource=hpet

On the Athlon 64 the CPU frequency is changed dynamically, so switching the clock source to hpet or rtc should give better timing accuracy. Please post back your findings.
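Before rebooting with each of these, it may help to see which clock sources the running kernel actually offers and which one it is using. Linux exposes both under /sys/devices/system/clocksource/clocksource0/ (`available_clocksource` and `current_clocksource`). A minimal Python sketch, with helper names of my own; as far as I know a `clocksource=` value that is not in the available list has no effect, so `rtc` in particular may not appear on this machine.

```python
from pathlib import Path

# sysfs directory holding the clocksource controls.
CS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(cs_dir=CS_DIR):
    """Return (list of available clock sources, the one currently in use)."""
    available = (cs_dir / "available_clocksource").read_text().split()
    current = (cs_dir / "current_clocksource").read_text().strip()
    return available, current


def candidate_params(available, wanted=("acpi_pm", "rtc", "hpet")):
    """Keep only the suggested clocksource= parameters this kernel can honour."""
    return ["clocksource=" + name for name in wanted if name in available]
```

On the machine itself, `available, current = read_clocksources()` followed by `candidate_params(available)` lists what is worth trying. After booting with a new parameter, `current` should show whether the kernel actually switched.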


----------



## XEyedBear

*Re: How to identify cause of Linux System freeze?*

Now this is getting complicated. According to the ASUS support website, the A8V with an AMD CPU cannot support DDR400 memory at its rated speed when all 4 memory slots are populated and run in dual-channel mode, which is exactly my configuration.

The system reverts to a 1:1 memclock to FSB ratio and the memory runs as DDR200. The system is slow.

I can manually set the memclock to FSB ratio to 2:1 (memory then runs as DDR400), but the system is unreliable - gratifyingly quick, but not usable. The best compromise I have been able to find is to run at a memclock to FSB ratio of 5:3 (DDR333) and set the system frequency to 220 MHz; this results in the memory running at about 180 MHz, according to CPU-Z. This is a lot better than the 100 MHz when the BIOS is set to auto for system frequency and memclock.
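The ratios are easier to compare with the arithmetic written out. A minimal Python sketch, assuming the BIOS ratio means DDR rating : FSB (which is what the DDR200, DDR333 and DDR400 figures above imply) and that the memory clock is half the DDR rating, since DDR transfers data on both clock edges. `memory_speed` is a name made up for this example.

```python
from fractions import Fraction


def memory_speed(fsb_mhz, ratio):
    """Return (DDR rating, memory clock in MHz) for an FSB frequency and BIOS ratio.

    `ratio` is (memclock, FSB) as written in this thread, e.g. (5, 3) for DDR333.
    """
    num, den = ratio
    ddr_rating = Fraction(fsb_mhz) * num / den  # exact, avoids rounding noise
    return float(ddr_rating), float(ddr_rating / 2)


for fsb, ratio in [(200, (1, 1)), (200, (2, 1)), (200, (5, 3)), (220, (5, 3))]:
    rating, clock = memory_speed(fsb, ratio)
    print(f"FSB {fsb} MHz, {ratio[0]}:{ratio[1]} -> DDR{rating:.0f}, {clock:.1f} MHz")
```

At 220 MHz and 5:3 this gives a memory clock of about 183 MHz, in line with the "about 180 MHz" CPU-Z reading. A 5% overclock of the 200 MHz FSB (210 MHz) at 5:3 works out to DDR350.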

However, no combination of settings, including the minimum (DDR200, FSB @ 200 MHz), with the 'clocksource=' parameter set to each of the 3 values you suggest, makes any discernible difference to the freeze behaviour. It still occurs almost immediately within this game. And the 'messages' part of the system log still shows the 'clock unstable' log entry.

I assume I am adding this 'clocksource=' parameter in the correct way: pressing Esc in the grub menu on my system has no effect whatsoever. Pressing the 'e' key presents me with a very simple editor session showing boot commands (I guess) to which I can add the clocksource parameter. I then start the kernel with the keystroke combination 'Ctrl+X'.

Is this correct?


----------



## hal8000

*Re: How to identify cause of Linux System freeze?*





For GRUB 2, hold down the Shift key during bootup, highlight the kernel, press e to edit, append the kernel parameters to the end of the linux line and press Ctrl+X to boot.
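The important detail is that the parameters go on the end of the existing `linux` line, not on a line of their own after the last line of the entry. Edits made in the `e` editor apply to that one boot only. As a text-processing sketch of that manual edit (Python, with a made-up helper name and an illustrative menu entry whose UUID is a placeholder), it looks like this:

```python
def add_kernel_params(entry_text, *params):
    """Append kernel parameters to the `linux` line of a GRUB 2 menu entry.

    grub.cfg often separates `linux` from the kernel path with a tab, so the
    line is recognised by its first word rather than by a "linux " prefix.
    """
    out = []
    for line in entry_text.splitlines():
        words = line.split()
        if words and words[0] == "linux":
            missing = [p for p in params if p not in words]  # skip duplicates
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out)


entry = (
    "\tlinux\t/boot/vmlinuz-2.6.35-27-generic root=UUID=0000 ro quiet splash\n"
    "\tinitrd\t/boot/initrd.img-2.6.35-27-generic"
)
print(add_kernel_params(entry, "hpet=force", "clocksource=hpet"))
```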

Asus may be correct in their statement.
Set the BIOS system clock and memory to Auto, save and boot Ubuntu, and see if the freeze appears. It will be slower than the overclocked 180 MHz, but you need to gain stability first.


If you have stability at 100 MHz, then I would experiment with memory timings; you may have to find a compromise setting that works with both operating systems.

If, for example, you can get the memory running faster, you can try the kernel parameters again and see if they help.

Regarding memory modules, Asus may also have recommended manufacturers. My Gigabyte motherboard has a range of memory modules that have been certified and tested, and as yet I have not overclocked anything.


----------



## XEyedBear

*Re: How to identify cause of Linux System freeze?*

OK, I have some progress to report.

Firstly, I could find no BIOS settings, either manual or Auto, under which the system runs without freezing sooner or later. And I do have memory modules which are on the list of qualified vendors for this particular motherboard. Furthermore, they have consecutive serial numbers, so they are about as close to being identical as manufacturing will allow, I guess.

What is clear is that I was adding the 'clocksource =' option at boot time in a way that was quite incorrect. I learned how to edit the correct boot menu line, rather than just appending that option after the last line of the menu. 

I also found that hpet is not enabled in my BIOS (and there is no option for it), but these timers are on the motherboard and can be forced on by the kernel (option hpet=force). I also learned that hpet is better than any other time source.

With advice from the Ubuntu forum I have now found a way to edit the grub menu so that the options are permanent - I don't have to enter them at every boot.
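The post doesn't say exactly which change was made, but on Ubuntu the usual way to make kernel parameters permanent is to add them to `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub and then run `sudo update-grub` to regenerate grub.cfg. A minimal Python sketch of that edit, working on the file's text (the helper name is hypothetical):

```python
import re

# Matches the uncommented line, e.g.  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_default_params(config_text, *params):
    """Return /etc/default/grub text with `params` added to the default kernel command line."""
    def extend(match):
        words = match.group(2).split()
        words += [p for p in params if p not in words]  # skip duplicates
        return match.group(1) + " ".join(words) + match.group(3)

    new_text, n = CMDLINE_RE.subn(extend, config_text, count=1)
    if n == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX_DEFAULT line found")
    return new_text


example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_default_params(example, "hpet=force", "clocksource=hpet"))
```

Write the returned text back to /etc/default/grub as root, run `sudo update-grub`, reboot, and check /proc/cmdline to confirm the parameters were picked up.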

Having added these options, I have been able to run with the system 5% overclocked, the memory at 5:3 (DDR333, or in practice DDR350!) and carried out a 4-hour run with Audacity recording and the game also running, with more than 10 commands entered without any freeze. This level of performance is acceptable (but if a little is good, a lot must be better and too much is never enough....)

So maybe, with your guidance, we have identified the cause of the problem (unstable timing) as well as having found a solution to the problem. That can't be bad, eh?

Thanks for your help.


----------



## hal8000

*Re: How to identify cause of Linux System freeze?*






You're welcome. Please can you mark the thread as [solved]?

The Linux kernel is better at multitasking than the Windows kernel. This means that hardware that does not work properly or stably under one system can work better with an alternate OS.
Welcome to Linux. The first 6 months will be the hardest, depending on what you want to do. I've been running Linux since 2000 and there's not much I can't do; however, I still multiboot with Windows as I'm also a gamer.


----------

