Non-responsive Camera/USB - Resetting tips

The Problem

The two cameras on PAN001 occasionally time out while taking pictures and appear to get stuck in a locked state. It’s not entirely clear what causes this, but it leaves the camera unusable.

The first signs that this is happening often appear in the observing logs as a simple timeout, which currently causes the unit to park itself. You can see an example of that timeout here:

Issuing a gphoto2 --auto-detect command will still show two cameras; however, an attempt to communicate with the locked camera fails with the standard gphoto2 error. A simple way to test this is to try to read the serial number from each camera, i.e.

> gphoto2 --auto-detect
Model                          Port                                            
----------------------------------------------------------
Canon EOS 100D                 usb:002,018     
Canon EOS 100D                 usb:002,017  

> gphoto2 --port usb:002,018 --get-config serialnumber
Label: Serial Number                                                           
Readonly: 0
Type: TEXT
Current: ee04d187fcad471c9c34f7f257792ee1
END

> gphoto2 --port usb:002,017 --get-config serialnumber
# Fails - sorry, forgot to copy error but it is ugly and uninformative.
# It looks like the same error you see when you do not have permissions
# because of the `dialout` permissions group but not exactly the same.
# I will edit this post next time I capture the error.
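
The same check can be scripted so that every detected camera gets polled in one go. The following is only a minimal sketch: the function names are made up for illustration, and it assumes that a locked camera makes gphoto2 exit with a non-zero status or omit the "Current:" line, which has not been confirmed against the actual error.

```python
import subprocess


def parse_auto_detect(output):
    """Return the usb:BBB,DDD port strings from `gphoto2 --auto-detect` output."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # The port is the last column; header and separator lines do not match.
        if fields and fields[-1].startswith('usb:'):
            ports.append(fields[-1])
    return ports


def camera_responds(port, timeout=15):
    """True if the camera on `port` answers a serial number query (assumed criteria)."""
    cmd = ['gphoto2', '--port', port, '--get-config', 'serialnumber']
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0 and 'Current:' in result.stdout


def check_cameras():
    """Map each detected port to True/False depending on whether it responds."""
    try:
        detected = subprocess.run(['gphoto2', '--auto-detect'], capture_output=True,
                                  text=True, timeout=15).stdout
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return {}
    return {port: camera_responds(port) for port in parse_auto_detect(detected)}
```

Calling check_cameras() on the unit would then return something like a port-to-status mapping, with the locked camera showing up as False.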

Running dmesg will display system messages from the kernel, in which case you can see that the USB subsystem is failing:

This points to something other than a simple mechanical problem with the camera itself. It could be a problem with the control computer (an older model NUC in this case) and its USB subsystem, or some kind of lock-up on the arduino that controls the camera box.

There are a number of outstanding Issues related to the arduinos that capture some of the problems. In particular, the millis wrap-around is causing number overflows that prevent values from being read properly from the camera board, so we lose all sensor readings from the camera board a few days after it is successfully reset.
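
For anyone unfamiliar with the wrap-around problem: the Arduino millis() counter is an unsigned 32-bit value, so it rolls over to zero after 2^32 milliseconds. The sketch below is not the arduino code from the Issues (that code is not shown here); it only illustrates, in Python, the usual way to compute an elapsed time that survives one roll-over. The function names are hypothetical.

```python
UINT32_MODULUS = 2 ** 32  # millis() is an unsigned 32-bit counter


def naive_elapsed(start, now):
    """Plain subtraction; goes negative once the counter has wrapped."""
    return now - start


def elapsed_millis(start, now):
    """Milliseconds from `start` to `now`, correct across a single wrap-around."""
    return (now - start) % UINT32_MODULUS


# A timer that started just before the wrap and is read just after it.
start = 4294967000
now = 704
print(naive_elapsed(start, now))   # negative, clearly wrong
print(elapsed_millis(start, now))  # 1000
```

On the arduino itself the equivalent is to keep both values as unsigned long and subtract them, which wraps the same way.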

First attempts to fix this include a simple reboot of the computer; however, because of our wiring setup (at least on PAN001, which is the earliest model and predates the updated camera board), this does not actually power-cycle the camera board arduino. The dmesg output from above was taken after such a reboot, and it’s clear that the problem has not fixed itself.

The Ideal Fix

The normal fix would be to toggle the individual camera lines via the peas_shell. Ideally this would look like:

PEAS > toggle_relay cam_0

Use cam_0 or cam_1 depending on which camera is stuck. If that doesn’t work, one could also try to reset the arduino that controls both cameras (as well as the accelerometer/temp/humidity sensors in the camera box):

PEAS > toggle_relay camera_box

Unfortunately neither of these is working on PAN001 because of a mix of electronics and arduino sketches. This most likely relates to a simple issue where the peas_shell is not sending proper newlines to the arduino, so the toggle commands are silently dropped. (This will hopefully be fixed soon, but it is complicated by the remote nature of PAN001: we don’t want to risk a power shutdown if something goes wrong, so we are being extra cautious and doing it in the lab first.)
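
If the missing newline really is the cause, the fix amounts to always appending an explicit line ending to the command bytes. A minimal sketch follows, assuming the `pin,state` command format used later in this post; relay_command is a hypothetical helper, not something that exists in peas_shell.

```python
def relay_command(pin, state, terminator=b'\r\n'):
    """Build the bytes for a `pin,state` relay command with an explicit line ending."""
    if state not in (0, 1):
        raise ValueError('state must be 0 (off) or 1 (on)')
    return '{},{}'.format(int(pin), int(state)).encode('ascii') + terminator


print(relay_command(5, 0))  # b'5,0\r\n'
```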

The Less-Ideal Fix

One can work around this by skipping the peas_shell and using the POCS rs232 (a.k.a. serial data) utility library directly, which is what peas_shell uses under the hood. Something like (see also the script in the reply to this):

> jupyter-console
Jupyter console 6.0.0

In [1]: from pocs.utils.rs232 import SerialData
In [2]: ser = SerialData('/dev/ttyACM0')

In [3]: ser.ser.readlines()  # I've added some newlines here to be readable
Out[3]:
[b'{"name":"telemetry_board", "ver":"2018-01-14", 
"millis":2520156078, "report_num":7519879, 
"power": {"computer":1, "fan":1, "mount":1, "cameras":0, "weather":1, "main":1}, 
"current": {"main":455,"fan":68,"mount":175,"cameras":105}, 
"amps": {"main":1274.00,"fan":122.40,"mount":315.00,"cameras":105.00}, 
"humidity":22.30, "temp_00":22.70, "temperature":[30.12,18.94,21.25]}\r\n'
]

In [4]: ser.ser.flushInput()  # Flush the input/output on the arduino
In [5]: ser.ser.flushOutput()
In [6]: ser.ser.write(b'5,0\r\n')  # Turn off arduino pin #5

# Wait 30 seconds or so
In [7]: ser.ser.write(b'5,1\r\n')  # Turn on arduino pin #5

Obviously this is not ideal, as it requires knowledge of the underlying arduino port and the pins that each camera is connected to. And, unfortunately, it doesn’t always work anyway.
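
One way to tell whether a toggle had any effect is to decode the telemetry line instead of eyeballing it, since each reading is a single line of JSON. This is only a sketch: the helper names are made up, and it assumes that the "cameras" entry in the "power" block reflects the relay state, which has not been verified on PAN001.

```python
import json


def parse_telemetry(raw_line):
    """Decode one telemetry line (bytes) into a dict, or None if it is incomplete."""
    try:
        reading = json.loads(raw_line.decode('ascii').strip())
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # partial lines are common right after flushing the port
    return reading if isinstance(reading, dict) else None


def cameras_powered(reading):
    """True/False from the "power" block, or None if the field is missing."""
    if not reading:
        return None
    state = reading.get('power', {}).get('cameras')
    return None if state is None else bool(state)
```

With the SerialData object from above this would be used as cameras_powered(parse_telemetry(ser.ser.readline())), retrying when the result is None.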

The Brute Force Fix

It turns out that resetting a USB subsystem is not actually a straightforward thing. The problem is that 5V is always present on the USB port, so it’s hard to kill the power to the arduino (which has the power relays for the cameras).

Fortunately @james.synge has found a script that seems to let us brute force cycle the USB subsystem. This $POCS/scripts/reset_usb_device.py script works with the filesystem path for the USB subsystem, something like /dev/bus/usb/XXX/YYY and seems to work fairly well.

This is of course not ideal either, as you need to discover the USB path. The script provides a way to list these ($POCS/scripts/reset_usb_device.py list) but the output is not totally clear. For PAN001 the arduino is hooked up to a 4-port (unpowered) USB hub and the relevant output from that command (although not the only output) is:

...

path=/dev/bus/usb/002/004
    description=VIA Labs, Inc. VL812 Hub
    manufacturer=
    device=
    search string=VIA Labs, Inc. VL812 Hub 

...

You can then do:

> $POCS/scripts/reset_usb_device.py path /dev/bus/usb/002/004

and the USB subsystem should be reset. You should verify this by running the gphoto2 commands listed above, i.e. --auto-detect followed by --get-config serialnumber.
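
Since the list output is hard to scan by eye, the path lookup described above could be scripted. The sketch below assumes the list output always has the key=value layout shown in the excerpt (a path= line followed by indented fields); the function names are hypothetical and not part of reset_usb_device.py.

```python
def parse_usb_listing(text):
    """Parse `reset_usb_device.py list` style output into a list of dicts."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if '=' not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, value = line.partition('=')
        if key == 'path':
            # A path= line starts a new device entry.
            devices.append({'path': value.strip()})
        elif devices:
            devices[-1][key.strip()] = value.strip()
    return devices


def find_paths(devices, needle):
    """Return paths of devices whose description contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [d['path'] for d in devices if needle in d.get('description', '').lower()]
```

Feeding the captured list output through parse_usb_listing and then calling find_paths(devices, 'VL812') should yield the path to hand to the path subcommand.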

I’ve also written a small script that resets the cameras via the rs232 module as above and could be used after reset_usb_device.py. This might make its way into the repository eventually if it is found useful, but it is attached here.

It requires the port for the arduino to use (/dev/ttyACM2 in this case):

> scripts/reset_cameras.py --verbose --serial_port /dev/ttyACM2
Connecting to /dev/ttyACM2
(b'{"name":"telemetry_board", "millis":2431131718, "report_num":7506912, "ver":'
 b'"2018-01-14", "power": {"computer":1, "fan":1, "mount":1, "cameras":0, "weat'
 b'her":1, "main":1}, "current": {"main":469,"fan":37,"mount":159,"cameras":92}'
 b', "amps": {"main":1313.20,"fan":66.60,"mount":286.20,"cameras":92.00}, "humi'
 b'dity":22.10, "temp_00":22.00, "temperature":[29.56,18.56,20.50]}\r\n')
Flushing input and output
Stopping cameras
Waiting for 30 seconds
Flushing input and output
Starting cameras
(b'56.00}, "humidity":22.10, "temp_00":22.00, "temperature":[29.62,18.56,, "ver'
 b'":"2018-01-14", "power": {"computer":1, "fan":1, "mount":1, "cameras":0, "we'
 b'ather":0, "main":1}, "current": {"main":371,"fan":57,"mount":110,"cameras":4'
 b'7}, "amps": {"main":1038.80,"fan":102.60,"mount":198.00,"cameras":47.00}, "h'
 b'umidity":22.10, "temp_00":22.00, "temperature":[29.62,18.56,20.56]}\r\n')

Conclusion

Hopefully we will be able to work these problems out of our system with the new camera box electronics boards and corresponding updates to the arduino code. And on that note, time to get to the lab!

Script I use for resetting.

 reset_cameras.py 
#!/usr/bin/env python3
"""Power-cycle the cameras via the arduino relay, then reconnect to each camera."""

import time
from pprint import pprint

from pocs.utils.rs232 import SerialData
from pocs.camera import list_connected_cameras
from pocs.camera.canon_gphoto2 import Camera


def main(serial_port, wait_time=30, **kwargs):
    # Extra command line options (e.g. verbose) land in kwargs and are currently unused.

    print('Connecting to {}'.format(serial_port))
    device = SerialData(serial_port)

    # Show one reading so we can see the arduino is talking to us.
    pprint(device.ser.readline())
    print('Flushing input and output')
    device.ser.flushInput()
    device.ser.flushOutput()

    # Pin 5 switches the camera power relay on PAN001: "5,0" is off, "5,1" is on.
    print('Stopping cameras')
    device.ser.write("5,0\n\n".encode())
    print('Waiting for {} seconds'.format(wait_time))
    time.sleep(wait_time)
    print('Flushing input and output')
    device.ser.flushInput()
    device.ser.flushOutput()
    print('Starting cameras')
    device.ser.write("5,1\n\n".encode())
    pprint(device.ser.readline())

    # Reconnect to each camera to confirm it responds again.
    cameras = list_connected_cameras()

    for camera_port in cameras:
        print(f'Creating camera on {camera_port}')
        cam = Camera(port=camera_port)
        cam.connect()
        print(f'Camera: {cam.uid}')
        del cam


if __name__ == '__main__':
    import argparse

    # Get the command line options
    parser = argparse.ArgumentParser(description="Reset the cameras via the arduino")

    parser.add_argument("-p", "--serial_port", dest="serial_port", required=True,
                        help="Serial port to use for resetting cameras")
    parser.add_argument('-v', '--verbose', action='store_true', default=False,
                        help="Print results to stdout")
    args = parser.parse_args()

    main(**vars(args))

Thanks for the writeup.

Does this give us insight into why only one camera works with PAN012 through POCS? We will have the head unit interface board done this week, so we should be able to power cycle whatever we like remotely on PAN012 at that point, and you could do your experiment here remotely.

Looks like this might be a non-issue for PAN012; it was answered in the Build thread for PAN012 - Caltech.