Bug 143477 - kdm root login ok, user login hangs kdm
Summary: kdm root login ok, user login hangs kdm
Status: RESOLVED NOT A BUG
Alias: None
Product: kdm
Classification: Miscellaneous
Component: general
Version: unspecified
Platform: Slackware Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdm bugs tracker
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-26 10:29 UTC by Robert Hogan
Modified: 2008-05-19 17:30 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Robert Hogan 2007-03-26 10:29:19 UTC
Version:            (using KDE 3.5.6)
Installed from:    Slackware Packages
OS:                Linux

I recently upgraded slackware-current. This involved a big re-org of the X11 installation. Ouch.

I *can*:

* Use KDE by doing startx and startkde as my normal user. KDE works fine under these conditions.
* Log into KDE from KDM as root.

I *can't*:

* Use KDE by logging in as a normal user from KDM.

What happens:

* I type the name and password of the normal user, press 'Login', and KDM hangs. The greeter screen stays up and does not respond to keyboard or mouse input. I have to press Ctrl+Alt+Backspace to restart the display and get a responsive KDM back. This doesn't happen for root.

The problem remains when I try all of the following:

* Logging in with a fresh, empty home folder.
* Clearing out /tmp.
* Changing X11 driver between nv, vga and nvidia.
* Configuring KDM to autologin the normal user. In this case the mouse pointer appears, but the screen stays black and unresponsive.

Nothing meaningful appears in any of:

* /var/log/kdm.log
* /var/log/Xorg.0.log

I've tried running kdm -nodaemon -debug 15, but this has not produced anything beyond the Xorg output.

I've had a look through the KDM source but cannot work out which parts are called when I press Login, so I haven't been able to determine where it might be failing.

Request
-------

The only KDM bug here is that it does not give me anything meaningful when it encounters what is obviously an X11 setup problem. Can you advise me where to look next? Even pointers in the kdm source would be great. I suppose it has to be something permissions-related or something related to the way X is handling log-ins, but I know very little about how either are meant to work.
Comment 1 Robert Hogan 2007-03-26 10:34:12 UTC
This is the typical output in kdm.log:

This is a pre-release version of the X server from The X.Org Foundation.
It is not supported in any way.
Bugs may be filed in the bugzilla at http://bugs.freedesktop.org/.
Select the "xorg" product for bugs you find in this release.
Before reporting bugs in pre-release versions please check the
latest version in the X.Org Foundation git repository.
See http://wiki.x.org/wiki/GitPage for git access instructions.

X Window System Version 1.2.99.901 (1.3.0 RC 1)
Release Date: 4 March 2007
X Protocol Version 11, Revision 0, Release 1.2.99.901
Build Operating System: Slackware 12.0 Slackware Linux Project
Current Operating System: Linux darkstar 2.6.18.8-smp #1 SMP Fri Mar 16 19:37:53 CDT 2007 i686
Build Date: 07 March 2007
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Mon Mar 26 10:03:43 2007
(==) Using config file: "/etc/X11/xorg.conf"
FreeFontPath: FPE "/usr/share/fonts/misc" refcount is 2, should be 1; fixing.
Comment 2 Robert Hogan 2007-03-26 10:35:41 UTC
I've also tried a suggestion found elsewhere: log in from KDM as root and start a new session from there. This did not change the result.
Comment 3 Oswald Buddenhagen 2007-03-26 13:06:31 UTC
you should have a look at syslog, as kdm.log certainly indicates.

maybe one of the startup scripts hangs - that would be simple to find out - just append -x to the #! /bin/sh line and check the system logs and .xsession-errors.
strace-ing kdm might help, too.
Comment 4 Robert Hogan 2007-03-26 13:32:01 UTC
Sorry, I forgot to mention that syslog told me nothing either. I also forgot to mention that nothing gets written to .xsession-errors.

I'll add -x to Xsession and see what happens and also have a go at strace. Get back to you soon.
Comment 5 Robert Hogan 2007-03-26 17:25:10 UTC
A little more experimentation:

* I can log in with XDM.
* When I run KDM directly under strace, the output stops at the point where it seizes up, while it is writing to /sys/bus/pci/devices/0000:00:1e.0/config.

10224 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
10224 sigreturn()                       = ? (mask now [IO])
10224 gettimeofday({1174911433, 732855}, NULL) = 0
10224 gettimeofday({1174911433, 732944}, NULL) = 0
10224 select(256, [1 18 19], NULL, NULL, {1, 691000}) = 0 (Timeout)
10224 setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
10224 gettimeofday({1174911435, 424626}, NULL) = 0
10224 gettimeofday({1174911435, 424730}, NULL) = 0
10224 select(256, [1 18 19], NULL, NULL, {470, 521000}) = ? ERESTARTNOHAND (To be restarted)
10224 --- SIGALRM (Alarm clock) @ 0 (0) ---
10224 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
10224 sigreturn()                       = ? (mask now [IO])
10224 gettimeofday({1174911435, 448798}, NULL) = 0
10224 gettimeofday({1174911435, 448888}, NULL) = 0
10224 select(256, [1 18 19], NULL, NULL, {470, 497000} <unfinished ...>
10222 <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)

The last line shows me ctrl-c'ing kdm.

If you have any suggestions as to commands I should run they'd be very welcome. 
Comment 6 Oswald Buddenhagen 2007-03-26 17:27:21 UTC
try attaching to the session subdaemon (-:0). kdm -h also reveals some interesting flags.
Comment 7 Robert Hogan 2007-03-26 19:39:09 UTC
OK. Got it.

The problem is at line 916 of client.c:

if (!(s = getusershell())) {

results in an infinite loop if your shell is not listed in /etc/shells. It should be:

if (s = getusershell()) == NULL) {

For some reason I had /bin/sh as my shell in /etc/passwd. This was not listed in /etc/shells. This inconsistency was probably the result of some messing around during a traumatic upgrade of slackware-current ;)

I've tested the patch and it tells me I have an unlisted shell, which is the desired behaviour.
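For reference, the check under discussion walks /etc/shells with getusershell() until it either finds the user's login shell or the list is exhausted, at which point getusershell() returns NULL. The sketch below shows that pattern under stated assumptions: it is not the kdm source, the helper name shell_is_listed() is hypothetical, and it assumes a glibc-style <unistd.h> that declares setusershell(), getusershell() and endusershell().

```c
#define _DEFAULT_SOURCE   /* glibc: expose getusershell() and friends */
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `shell` appears in /etc/shells, 0 otherwise.
 * getusershell() returns NULL once the list is exhausted, so this loop
 * terminates as long as the C library honours that contract. */
static int shell_is_listed(const char *shell)
{
    char *s;
    int found = 0;

    setusershell();                         /* rewind to the first entry */
    while ((s = getusershell()) != NULL) {  /* same test as !(s = ...) */
        if (strcmp(s, shell) == 0) {
            found = 1;
            break;
        }
    }
    endusershell();                         /* release the list */
    return found;
}
```

A caller would typically pass pw->pw_shell obtained from getpwnam(). Written this way, the loop can only spin forever if getusershell() never returns NULL.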
Comment 8 Oswald Buddenhagen 2007-03-26 20:46:16 UTC
your parenthesis placement is incorrect. ;)
anyway, this doesn't make sense: ! on a pointer *is* a null-check. mind extracting the compiler call, replacing -o <object> with -S (provided this is gcc) to get the generated assembler code, and diffing the output of the two variants?
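To illustrate the point: in C, !p is 1 exactly when p is a null pointer, so the assignment-in-condition forms !(s = f()) and (s = f()) == NULL are interchangeable. In the sketch below, next_entry() is a hypothetical stand-in for getusershell() that returns each entry of a fixed list and then NULL; both loops stop after the same number of entries.

```c
#include <stddef.h>

/* Hypothetical stand-in for getusershell(): yields each entry, then NULL. */
static const char *const entries[] = { "/bin/bash", "/bin/sh", NULL };
static size_t pos;

static const char *next_entry(void)
{
    return entries[pos] != NULL ? entries[pos++] : NULL;
}

/* Loop written with the ! operator, as in the original kdm line. */
static int count_with_bang(void)
{
    const char *s;
    int n = 0;

    pos = 0;
    for (;;) {
        if (!(s = next_entry()))
            break;
        if (*s != '\0')
            n++;
    }
    return n;
}

/* The same loop with an explicit comparison against NULL. */
static int count_with_cmp(void)
{
    const char *s;
    int n = 0;

    pos = 0;
    for (;;) {
        if ((s = next_entry()) == NULL)
            break;
        if (*s != '\0')
            n++;
    }
    return n;
}
```

Compiled with -S, the two loops would be expected to produce equivalent code for the null test.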
Comment 9 Robert Hogan 2007-03-27 00:12:31 UTC
You're right on all counts.

The real culprit seems to be an /etc/shells file ending with something like 0a 00 rather than 0a, which can happen when you press o in vim to open a new line, then press Esc and save without typing anything on that line.
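A file ending in 0a 00 can be told apart from one ending in a plain 0a by its last two bytes, which is what hexdump -C /etc/shells would show. The helper below, ends_with_newline_nul() (a hypothetical name), performs the same check programmatically with getc(); it is a sketch, not part of kdm or the C library.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the stream ends in "\n\0" (hex 0a 00),
 * 0 otherwise (for example a plain trailing "\n", hex 0a).
 * Reads the whole stream with getc(), remembering the last two bytes. */
static int ends_with_newline_nul(FILE *fp)
{
    int prev = EOF, last = EOF, c;

    while ((c = getc(fp)) != EOF) {
        prev = last;
        last = c;
    }
    return prev == '\n' && last == '\0';
}
```

For example, open the file with fopen("/etc/shells", "rb") and pass the resulting stream in.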

I take it the bug is somewhere in getusershell(), probably in the function below, though I can't see it.

static size_t
readname (char **name, size_t *size, FILE *stream)
{
  int c;
  size_t name_index = 0;

  /* Skip blank space.  */
  while ((c = getc (stream)) != EOF && isspace (c))
    /* Do nothing. */ ;

  for (;;)
    {
      if (*size <= name_index)
        *name = x2nrealloc (*name, size, sizeof **name);
      if (c == EOF || isspace (c))
        break;
      (*name)[name_index++] = c;
      c = getc (stream);
    }
  (*name)[name_index] = '\0';
  return name_index;
}
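One way to test this theory is to compile the function above in isolation and feed it a file ending in 0a 00. The sketch below does that under stated assumptions: x2nrealloc() comes from gnulib and is replaced here by a hypothetical grow_buffer(), and count_entries() is a hypothetical driver that mimics a getusershell()-style loop. With the logic as quoted, a trailing NUL yields one extra entry whose first byte is '\0', after which readname() returns 0 at EOF, so the loop terminates.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for gnulib's x2nrealloc(): doubles the buffer
 * (minimum 16 bytes) and aborts if the allocation fails. */
static char *grow_buffer(char *name, size_t *size)
{
    size_t n = *size ? *size * 2 : 16;
    char *p = realloc(name, n);
    if (p == NULL)
        abort();
    *size = n;
    return p;
}

/* Same logic as the readname() quoted above, with grow_buffer()
 * substituted for x2nrealloc(). Returns 0 only at end of stream. */
static size_t readname(char **name, size_t *size, FILE *stream)
{
    int c;
    size_t name_index = 0;

    while ((c = getc(stream)) != EOF && isspace(c))
        ;                                   /* skip blank space */

    for (;;) {
        if (*size <= name_index)            /* keep room for the terminator */
            *name = grow_buffer(*name, size);
        if (c == EOF || isspace(c))
            break;
        (*name)[name_index++] = (char) c;
        c = getc(stream);
    }
    (*name)[name_index] = '\0';
    return name_index;
}

/* Hypothetical driver: counts entries until readname() returns 0. */
static size_t count_entries(FILE *stream)
{
    char *name = NULL;
    size_t size = 0, count = 0;

    while (readname(&name, &size, stream) > 0)
        count++;
    free(name);
    return count;
}
```

Feeding it "/bin/bash\n/bin/sh\n\0" should report three entries rather than hang.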
Comment 10 Oswald Buddenhagen 2007-03-27 00:32:01 UTC
convincing an editor to insert a null char is usually more work than leaving an empty line. check exactly what is in the file (hexdump).
anyway, unless i'm missing something, even the weirdest char shouldn't cause more than an empty string, but definitely should not interfere with EOF detection (that would be the only explanation for an endless loop i can think of). so unless something is wrong within getc already (quite improbable, i'd think), you're more probably seeing the effects of a memory corruption (some reallocs like to hang under such circumstances) - maybe try valgrinding the thing (kdm -h).
Comment 11 Robert Hogan 2007-03-27 09:53:45 UTC
I'll have a go at valgrinding it tonight or tomorrow.

Have you tried recreating the problem yourself? This would be especially useful if you are using gcc-3.x, since it would confirm that the problem is not specific to the way Slackware was compiled (with gcc-4.x.x).
Comment 12 Robert Hogan 2007-03-27 21:44:42 UTC
Here's the culprit. It's a libc bug:

http://sourceware.org/ml/libc-hacker/2006-12/msg00009.html
Comment 13 Oswald Buddenhagen 2007-03-27 22:13:36 UTC
tough one ... :}