Wednesday, April 03, 2013

My Book Live - Connection Issues and Troubleshooting

I've been noticing issues with my NAS solution, which is a Western Digital My Book Live Personal Cloud Edition.

I keep losing connectivity about 5 minutes after connecting to the NAS, whether through the web-based console or as a mapped drive.  Each time, I get this message:

30001 - Your last operation timed out. Make sure there are no network connectivity issues and try again.

I searched Google for a solution, but all I found was shared pain.

I did find a way to log into the device's command line.  Here's what I did:

  • I put "http://[ip of your MBL NAS]/UI/ssh" into my browser's address bar.
  • I clicked the "enable" button.
  • I shelled into the NAS using PuTTY, with "root/welc0me" as the username/password.

Once I logged in, I immediately ran 'top', because I knew I'd lose the session after 5 or so minutes and wouldn't be able to log in again unless I power-cycled the NAS.  Twonky appeared to be hogging CPU cycles, so I went to the web GUI and disabled it, then watched top again.  The load averages were a bit high before I disabled Twonky (the 1-minute figure was in the 7.xx range).  I watched them drop to the mid-4s, then start rising again.  Top wasn't telling me anything useful, though.

I watched the load average climb to 22.xx before the terminal session degraded to the point that it stopped taking input.
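Since the session tends to die before I can study it, one option is to save the terminal output on the PC side and pull the load figures out afterwards.  Here's a rough sketch of that idea in Python.  It assumes the session text was captured to a file (the name "putty.log" in the comment is just a placeholder), and the function names are my own, not part of any tool on the NAS.

```python
import re

# Matches the "load average: a, b, c" tail printed by w, uptime and top.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def load_series(session_text):
    """Return a list of (1-min, 5-min, 15-min) tuples, in order of appearance."""
    series = []
    for line in session_text.splitlines():
        match = LOAD_RE.search(line)
        if match:
            series.append(tuple(float(value) for value in match.groups()))
    return series


def peak_one_minute(series):
    """Return the highest 1-minute load seen, or None if there were no samples."""
    return max(sample[0] for sample in series) if series else None


if __name__ == "__main__":
    # Header lines copied from a captured session; in practice the text
    # could be read from a saved log, e.g. open("putty.log").read().
    captured = (
        " 22:37:58 up 2 min,  1 user,  load average: 5.03, 1.54, 0.54\n"
        "top - 22:39:10 up 3 min,  1 user,  load average: 7.44, 3.06, 1.14\n"
        " 22:43:11 up 7 min,  2 users,  load average: 5.26, 4.09, 1.99\n"
    )
    samples = load_series(captured)
    for one, five, fifteen in samples:
        print(one, five, fifteen)
    print("peak 1-minute load:", peak_one_minute(samples))
```

This only summarizes what was already captured; it doesn't explain why the load is climbing.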

login as: root
root@xxx.xxx.xxx.xxx's password:
Linux MyBookLive 2.6.32.11-svn70860 #1 Thu May 17 13:32:51 PDT 2012 ppc
Disclaimer: SSH provides access to the network device and all its
content, only users with advanced computer networking and Linux experience
should enable it. Failure to understand the Linux command line interface
can result in rendering your network device inoperable, as well as allowing
unauthorized users access to your network. If you enable SSH, do not share
the root password with anyone you do not want to have direct access to all
the content on your network device.

MyBookLive:~# w
 22:37:58 up 2 min,  1 user,  load average: 5.03, 1.54, 0.54
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    1.00s  0.05s  0.03s w
MyBookLive:~# w
 22:38:10 up 2 min,  1 user,  load average: 5.85, 1.89, 0.67
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.04s  0.02s w
MyBookLive:~# w
 22:38:18 up 2 min,  1 user,  load average: 6.11, 2.07, 0.74
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.04s  0.02s w
MyBookLive:~#
MyBookLive:~#
MyBookLive:~#
MyBookLive:~# top
top - 22:39:10 up 3 min,  1 user,  load average: 7.44, 3.06, 1.14
Tasks:  97 total,   1 running,  96 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.9%us, 17.4%sy, 41.8%ni,  0.0%id,  6.6%wa,  0.3%hi,  2.0%si,  0.0%st
Mem:    253632k total,   242432k used,    11200k free,    41280k buffers
Swap:   500608k total,    42560k used,   458048k free,    52736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4429 root      21   1 21504 8192 3456 S 43.6  3.2   0:45.39 twonkymediaserv
 3936 www-data   4 -16 72704  30m  20m S 11.6 12.4   0:01.09 apache2
 3327 www-data   4 -16 76160  31m  19m S  5.6 12.6   0:02.37 apache2
 3809 www-data   4 -16 72704  33m  23m S  5.6 13.6   0:03.08 apache2
 3326 www-data   4 -16 74944  26m  16m S  1.7 10.7   0:03.34 apache2
 3829 www-data   4 -16 66624  23m  16m S  1.3  9.7   0:01.50 apache2
 4156 www-data   4 -16 69248  25m  17m S  1.3 10.3   0:00.30 apache2
 5071 root       4 -16  5056 3136 2304 D  1.0  1.2   0:00.03 getServiceStart
 4639 root      39  19  5120 3264 1920 D  0.7  1.3   0:03.12 ls
 4641 root      39  19  3776 1792 1344 S  0.7  0.7   0:00.77 tally
 4821 root      20   0  5056 3008 1920 R  0.7  1.2   0:00.34 top
 5067 root       4 -16  5056 3136 2304 D  0.7  1.2   0:00.02 getServiceStart
 2230 root      20   0 31424 3264 2048 S  0.3  1.3   0:00.19 rsyslogd
 2385 root      20   0     0    0    0 D  0.3  0.0   0:00.28 jbd2/sda4-8
 4405 root      20   0 57280 7552 2816 S  0.3  3.0   0:00.94 forked-daapd
 4640 root      39  19  4480 1856 1344 S  0.3  0.7   0:00.48 awk
    1 root      20   0  4352 1984 1600 S  0.0  0.8   0:00.82 init
MyBookLive:~#
MyBookLive:~#
MyBookLive:~#
MyBookLive:~# w
 22:39:15 up 3 min,  1 user,  load average: 7.24, 3.09, 1.16
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    1.00s  0.04s  0.02s w
MyBookLive:~# w
 22:39:16 up 3 min,  1 user,  load average: 7.24, 3.09, 1.16
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.03s  0.01s w
MyBookLive:~# w
 22:39:19 up 3 min,  1 user,  load average: 7.22, 3.16, 1.20
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    1.00s  0.04s  0.02s w
MyBookLive:~# w
 22:39:20 up 3 min,  1 user,  load average: 7.22, 3.16, 1.20
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.03s  0.01s w
MyBookLive:~# w
 22:39:25 up 3 min,  1 user,  load average: 7.36, 3.25, 1.24
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    2.00s  0.04s  0.02s w
MyBookLive:~# w
 22:39:32 up 3 min,  1 user,  load average: 7.09, 3.26, 1.25
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.05s  0.02s w
MyBookLive:~# w
 22:39:39 up 3 min,  1 user,  load average: 6.62, 3.29, 1.28
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.04s  0.01s w
MyBookLive:~# w
 22:40:17 up 4 min,  1 user,  load average: 5.75, 3.43, 1.40
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    1.00s  0.05s  0.02s w
MyBookLive:~# w
 22:40:24 up 4 min,  1 user,  load average: 5.79, 3.52, 1.45
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.05s  0.02s w
MyBookLive:~# w
 22:40:35 up 4 min,  1 user,  load average: 6.11, 3.66, 1.52
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    1.00s  0.05s  0.01s w
MyBookLive:~# w
 22:40:46 up 4 min,  1 user,  load average: 5.85, 3.69, 1.55
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.05s  0.01s w
MyBookLive:~# w
 22:41:00 up 5 min,  1 user,  load average: 5.44, 3.70, 1.59
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.05s  0.01s w
MyBookLive:~# w
 22:41:54 up 5 min,  2 users,  load average: 4.65, 3.75, 1.73
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    0.00s  0.06s  0.02s w
root     pts/1    ron-alien.home   22:41   21.00s  0.17s  0.15s top
MyBookLive:~# w
 22:42:48 up 6 min,  2 users,  load average: 4.90, 3.93, 1.89
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    2.00s  0.09s  0.04s w
root     pts/1    ron-alien.home   22:41    1:15   0.50s  0.48s top
MyBookLive:~#
MyBookLive:~#
MyBookLive:~# w
 22:43:11 up 7 min,  2 users,  load average: 5.26, 4.09, 1.99
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    ron-alien.home   22:37    2.00s  0.07s  0.02s w
root     pts/1    ron-alien.home   22:41    1:39   0.66s  0.64s top
MyBookLive:~# w

Then there is this:


Something isn't quite right with this NAS, but it's going to take a while to figure out what's going on.  Also, it responds well to pings, even when the SSH session is dead and won't recover.  And I still have to back it up.  I think I have 378GB of data on it, some of it crucial (like once-in-a-lifetime pictures).

I don't think the drives are bad, but it may be too early to say that.  I've never seen bad drives ramp up load averages like that.
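One thing worth checking in a captured 'top' listing: on Linux, processes in uninterruptible sleep (state "D", usually waiting on I/O) count toward the load average along with runnable ones, so the load can climb even when nothing looks busy on the CPU.  Here's a small Python sketch that picks those rows out of saved top output.  It assumes the default column layout (PID, USER, PR, NI, VIRT, RES, SHR, S, ...), and the function name is my own invention.

```python
def blocked_tasks(top_output):
    """Return (pid, command) pairs for process rows whose state column is 'D'."""
    blocked = []
    for line in top_output.splitlines():
        fields = line.split()
        # A process row starts with a numeric PID and has at least 12 columns;
        # the state ("S" column) is the 8th field, the command is the 12th onward.
        if len(fields) >= 12 and fields[0].isdigit() and fields[7] == "D":
            blocked.append((int(fields[0]), " ".join(fields[11:])))
    return blocked


if __name__ == "__main__":
    # A few rows copied from a captured top listing.
    captured = (
        "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND\n"
        " 4429 root      21   1 21504 8192 3456 S 43.6  3.2   0:45.39 twonkymediaserv\n"
        " 5071 root       4 -16  5056 3136 2304 D  1.0  1.2   0:00.03 getServiceStart\n"
        " 4639 root      39  19  5120 3264 1920 D  0.7  1.3   0:03.12 ls\n"
        " 2385 root      20   0     0    0    0 D  0.3  0.0   0:00.28 jbd2/sda4-8\n"
    )
    for pid, command in blocked_tasks(captured):
        print(pid, command)
```

A handful of "D" entries in one snapshot doesn't prove anything about the drives by itself; it only shows which processes were blocked at that moment.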

The drive is out of warranty and I'm a bit upset that what's touted as a top-notch home NAS is having such issues, especially considering that it's a WD product.

I'll update this post if and when I have more findings on this issue.

EDIT:  I just checked again after posting and, while the shells aren't dead, they are very slide-show-like.  I checked the load average and it's dropped to 12.94.

EDIT 2:  I got tired of waiting for "apachectl stop" to finish (I think it had actually hung), so I ran "killall -9 apache2", which immediately brought the load down.  The load is currently at 1.09 and has been around that for the last 20 minutes.  So, it's Apache that's killing the NAS.  Note that I tested whether I could reach the NAS shares in the conventional manner (i.e., not through a shell or Apache) and was able to reach them without issue.  I may keep Apache off for the duration (unless I need to access the control panel).