Message-based API Design & Implementation for Components


#1

Hi,

I would like to know more about this project idea. Also, could you please link the repositories related to the project?


#2

Hi,
I’m interested in working on this topic as well.


#3

@james.synge Sorry for being irregular; I have my mid-semester exams next week, so I won’t be able to work actively for a while :frowning_face:


#4

I’m glad to see there is interest in this topic. I’m personally eager to see some decoupling of components, and improving our messaging supports that effort. It also reflects the modern systems-design approach of microservices that communicate via RPCs. We currently use ZeroMQ between the processes started by POCS on a single computer, primarily for reporting weather and safety information from sensors, and for reporting status to the WebUI.

One of the nice features of introducing clearer messages between components is that we can run the various parts of the system as separate processes; this then allows those processes to be distributed. For example, for the Huntsman Telephoto Array (an application of POCS in Australia), they wanted a separate computer to control each of their telescopes (a telephoto lens, camera, focuser and filter wheel), with all of the telescopes on the same mount. To accomplish this they added support to POCS using Pyro4, a Python RPC mechanism. However, I would prefer that our RPC/messaging system be language agnostic, so that we don’t require the client and server (sender and receiver) to be implemented in the same language, only that they agree on the protocol and message formats.

I made a proposal about formalizing our ZeroMQ topic names here: https://github.com/panoptes/POCS/issues/606

There are other technologies that folks used for organizing an observatory control system, and for communicating between components of an observatory. @wtgee explored these broadly here: https://github.com/panoptes/POCS/blob/develop/docs/pocs-alternatives.rst

We’ve also had a debate on the topic earlier in this issue: https://github.com/panoptes/POCS/issues/546


#5

@james.synge
Thanks for the assistance, sir. I read the discussions and the technologies that are being used. So, I suppose PANOPTES is using INDI as of now, right? But for components like the camera, we are required to make an API. Correct me if I am wrong: was ZeroMQ introduced for the camera because of an urgent requirement?


#6

@james.synge Waiting for acknowledgement.


#7

My apologies, my work week has been solid meetings, leaving little time for PANOPTES. I’ll respond on the weekend.


#8

@sushant_mehta we are not using INDI right now although there has been some talk about going in that direction. For things like the Canon DSLR cameras we are using the underlying gphoto2, which is the same thing that INDI uses.


#9

To clarify, we currently use zeromq only to relay non-critical messages, for instance to send status messages to the local web page (PAWS).

Since zeromq is easy to implement, the idea would be to have distributed hardware devices that all communicate over a common protocol. In this way we could have a light-weight camera daemon that is, for example, listening for incoming zeromq messages and, when it receives a command to take_observation (with appropriate parameters), uses the existing camera code to do that. This would allow the camera to be physically separated from the control computer and would also decouple the hardware from the control software (POCS). This kind of distributed system is similar to how INDI works.
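A minimal sketch of what such a daemon might look like with pyzmq is below; the JSON command format and the take_observation() helper are illustrative assumptions, not the existing POCS interface:

```python
# Hypothetical sketch of a light-weight camera daemon using pyzmq (not POCS code).
# The JSON command format and take_observation() are illustrative assumptions.
import zmq


def take_observation(**params):
    """Placeholder for a call into the existing POCS camera code."""
    return {'status': 'ok', 'params': params}


context = zmq.Context()
socket = context.socket(zmq.REP)   # reply socket: one response per request
socket.bind('tcp://*:5555')        # port chosen arbitrarily for this sketch

while True:
    msg = socket.recv_json()       # e.g. {"command": "take_observation", "params": {"exptime": 30}}
    if msg.get('command') == 'take_observation':
        reply = take_observation(**msg.get('params', {}))
    else:
        reply = {'status': 'error', 'reason': 'unknown command'}
    socket.send_json(reply)
```

POCS (or a test script) would then connect a REQ socket to the same port and send the JSON command.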

However, my thinking lately is that we should not be reinventing this wheel with zeromq but instead be using something like grpc to implement the protocol. grpc is robust and widely implemented, and offers a few more RPC-specific features compared to plain zeromq. Since we are basically talking about an RPC, I think this makes sense.

Note that @ahorton has implemented pyro4 for Huntsman, which has worked reasonably well and is largely the same thing; it would work better if we focused more energy on it. Pyro4, however, is limited to Python, whereas grpc is language agnostic, making it a much better candidate.

For GSoC this would look like implementing the basics of grpc on a selected piece of hardware (the Canon DSLR might be easy, as many people have access to the actual hardware; the simulator could also be used).
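To give a sense of scale, the grpc version of that camera daemon might look roughly like the sketch below. The camera_pb2/camera_pb2_grpc modules and the TakeObservation RPC are assumptions (they would be generated by protoc from a small, yet-to-be-written camera.proto); only the grpc server calls are the actual library API:

```python
# Rough sketch of a gRPC camera service (illustrative only, not POCS code).
# camera_pb2 / camera_pb2_grpc are assumed to be generated by protoc from a
# hypothetical camera.proto that defines a Camera service with a TakeObservation RPC.
from concurrent import futures

import grpc

import camera_pb2        # hypothetical generated messages
import camera_pb2_grpc   # hypothetical generated service classes


class CameraServicer(camera_pb2_grpc.CameraServicer):
    def TakeObservation(self, request, context):
        # Here we would call into the existing POCS camera code (or the simulator).
        return camera_pb2.ObservationReply(status='ok')


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    camera_pb2_grpc.add_CameraServicer_to_server(CameraServicer(), server)
    server.add_insecure_port('[::]:50051')   # port chosen arbitrarily
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    serve()
```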

@james.synge @ahorton @anyone curious about your thoughts on grpc.


#10

@wtgee Sir, thank you for the clarifications provided on the project.

So from what I understood, currently zeromq is used to send messages like status messages, weather info, etc. from the robot telescope to PAWS and does not involve any RPCs. And our objective is to create an API that allows us to send RPCs via PAWS so that the robot hardware behaves as we intend it to. Please correct me if I got it wrong.

I also have a doubt related to issue #546 posted by @ahorton sir: what is the advantage of controlling the Huntsman Telescope using multiple systems? I understand that it has multiple cameras and follows the example of the Dragonfly Telescope, but just out of curiosity, I would like to know the advantage of having a distributed control system for the telescope. I also want to know if there is a correlation between the distributed control system and the messaging API that has to be built.

I also wish to know the need for such RPCs. From what I understand, the robot is completely autonomous at the moment. Are RPCs being introduced to bring some human control to the working of the robot?


#11

For background:

  • The Huntsman Telephoto Array has an array (collection) of telescopes on a single mount, so they’re all aimed at the same target, but each telescope has its own camera, focuser and filter wheel. Each of those has communication cables that need to be connected to a computer. As additional telescopes are added (e.g. if the budget allows for it), there may not be enough USB ports, for example, for controlling those devices. We can add USB hubs if we run out of ports, but that increases the wiring complexity and may reduce the reliability of the system.
  • A PANOPTES scope has (typically) two Canon cameras, each with a manual focus lens and no filter wheel. We don’t plan for such a system to grow to many cameras, so we use a single USB hub in the camera box, connected to the two cameras and to an Arduino.

An advantage of having a single (small) computer per scope is that the wiring becomes more consistent. Each telescope needs power for the 4 devices (camera, focuser, filter wheel and computer), and needs a network connection to the primary computer… which could actually be Wi-Fi for reduced cabling. Then each computer needs cables to connect it to the 3 devices it controls. The software on the per-telescope computer can also be simpler. When it is enumerating the available devices, it doesn’t have to figure out which camera is associated with which focuser because there is only one of each.

We’ve seen that complexity even on the two-camera PANOPTES systems: gphoto2 can sometimes get confused about which camera it is talking to, so even in this scenario it could be useful to introduce a computer per camera (e.g. a Raspberry Pi Zero W).

My primary goal with introducing more messaging between components of PANOPTES is to make each part of the system simpler, and for the overall system to be more robust. For example, we’ve sometimes had the problem that communication with one device locks up, but as a result the whole process (pocs_shell) has effectively stopped working. If instead we have separate processes (workers and supervisors, à la one of the Erlang programming models), then the workers can each have a single, well defined job (e.g. communicating with an Arduino or controlling a single camera), and the supervisor(s) can have the job of making sure that the workers they watch over are making suitable progress. For example, if a camera worker process is requested to take a picture for 30 seconds, and two minutes later hasn’t finished, we can assume something is wrong and kill the worker.
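As a hypothetical illustration of that supervisor idea (not existing POCS code), using only the Python standard library:

```python
# Hypothetical supervisor/worker sketch using the standard library (not POCS code).
# The worker stands in for "take a 30 second exposure"; the supervisor terminates
# it if it runs far past the expected time.
import multiprocessing
import time


def camera_worker(exptime):
    """Stand-in for a worker that talks to a single camera."""
    time.sleep(exptime)   # pretend to take the exposure


def supervise(exptime=30, grace=90):
    worker = multiprocessing.Process(target=camera_worker, args=(exptime,))
    worker.start()
    worker.join(timeout=exptime + grace)   # 30 s exposure + 90 s of slack = two minutes
    if worker.is_alive():
        # The worker still hasn't finished: assume something is wrong and kill it.
        worker.terminate()
        worker.join()
        print('camera worker was stuck; terminated, and it can now be restarted')


if __name__ == '__main__':
    supervise()
```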

Such processes can have a variety of methods of communicating, one of which is to have a messaging system, either point to point, multicast or broadcast. ZeroMQ supports all of these. gRPC (suggested by @wtgee) doesn’t support those directly, but there are pub-sub systems that use gRPC to communicate between publisher and broker, and broker and subscriber.
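For example, a broadcast-style status feed using ZeroMQ’s PUB/SUB sockets might look like the following sketch (topic name and port are arbitrary; in practice the publisher and subscriber would be separate processes, shown together here only so the sketch is self-contained):

```python
# Illustrative pyzmq publish/subscribe sketch: broadcast-style status messages.
import time

import zmq

context = zmq.Context()

pub = context.socket(zmq.PUB)                     # e.g. a weather/safety sensor process
pub.bind('tcp://*:6500')

sub = context.socket(zmq.SUB)                     # e.g. POCS or the WebUI
sub.connect('tcp://localhost:6500')
sub.setsockopt_string(zmq.SUBSCRIBE, 'weather')   # receive only the "weather" topic

time.sleep(0.5)                                   # give the subscription time to propagate
pub.send_string('weather {"safe": true}')
print(sub.recv_string())                          # prints: weather {"safe": true}
```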


#12

POCS will still be the primary control system (i.e. the “brains”) and will be responsible for sending any commands to hardware, whether via zeromq or grpc. Any PAWS interaction would be with a running instance of POCS.

This project is really only talking about the interaction between POCS and the hardware components and is not necessarily about communicating with POCS from PAWS (either via a traditional website or a mobile app).


#13

@wtgee, thanks for clarifying that. I should have mentioned the PAWS and POCS communication in my response.


#14

@wtgee, @james.synge, Thank you sirs for answering my questions. Now I get a clearer picture of the project!


#15

So basically, the devices can independently interact with the hardware once gRPC is implemented. As of now I have a DSLR, so I’ll try to use gPhoto2 to interact with it. I’ll update with results soon! Also, I will go through some tutorials and guides for gRPC.
Thanks for the guidance @james.synge @wtgee


#16

What types of services are desirable for cameras (let’s suppose)?


#17

OK, this is just background information, but I feel that I should clarify that the motivation for the distributed (multi-computer) version of POCS for the Huntsman Telescope was not wiring complexity. At the number of devices (cameras, focusers, filter wheels, etc.) involved in Huntsman that issue was manageable. It was in fact all to do with reliability.

While in theory it is possible to run a very large number of USB devices on a single computer, the practical limits are actually much lower, and hard to predict. This is due to resource limitations in both the USB controller hardware and the OS of the computer, and the varying amounts of resources (e.g. endpoints) consumed by each USB device. By trial and error we determined that we could only operate 4 camera systems at a time, whereas the full system was intended to have 10. If we tried to attach more than 4 camera systems some of the USB devices would simply not be detected by the OS, and wouldn’t work at all.

We then discovered that even with only 4 cameras the system was not reliable. During periods when all the cameras were taking images in quick succession (e.g. focusing) we would get occasional image readout errors from the cameras, which required a restart of the whole system to recover from. Reducing the number of cameras from 4 to 3, and then to 2, reduced the frequency of these errors but did not eliminate them. We concluded that the root cause was probably USB bus congestion. We did some stress testing experiments running a single camera system on a Raspberry Pi and found that the errors completely disappeared. At that point we decided that we needed to switch to a distributed control system as soon as possible, and Pyro4 appeared to be the simplest way to do that.

The implementation for Huntsman has the majority of POCS running on a central control computer, with instances of the Camera class & its subcomponents (e.g. Focuser, Filterwheel) running on the separate Raspberry Pis which are attached to the camera hardware. The Pyro4-based wrappers enable the central POCS Observatory object to remotely call the methods of the Camera objects, e.g. Camera.take_exposure(). This implementation works, but is currently awkward to use, doesn’t deliver all the benefits of a distributed system, and only implements the Observatory-Camera interface.
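For context, the Pyro4 pattern is roughly the following simplified sketch (not the actual Huntsman code; the Camera class here is a stand-in for the real POCS Camera class):

```python
# Simplified sketch of the Pyro4 pattern described above (not the actual Huntsman
# code); the Camera class here is a stand-in for the real POCS Camera class.
import Pyro4


@Pyro4.expose
class Camera:
    def take_exposure(self, seconds=1.0):
        # The real implementation calls into the existing POCS camera code.
        return 'exposed for {} s'.format(seconds)


# On the Raspberry Pi attached to the camera hardware:
daemon = Pyro4.Daemon(host='0.0.0.0')   # network-visible Pyro4 daemon
uri = daemon.register(Camera())         # returns a PYRO uri for this object
print(uri)                              # the central computer needs this uri
daemon.requestLoop()                    # serve remote calls forever

# On the central control computer, given that uri:
#   camera = Pyro4.Proxy(uri)
#   camera.take_exposure(seconds=30)
```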

While PANOPTES does not have the same issue of too many USB devices, a move towards a distributed, RPC-based design still offers a number of advantages in terms of reliability and robustness, as @wtgee & @james.synge note. For example, when separate components are controlled by separate processes it becomes much easier to work around or recover from errors by disabling or restarting only the affected part, and this can be done automatically.

A good initial step here would be to use grpc to set up a camera interface in a separate process from the rest of POCS, with remote procedure calls triggering the actions of the camera (e.g. taking an image). This can be done with either the POCS camera simulator or an actual DSLR camera, and should make use of the existing POCS Camera class interface (no need to interface with gPhoto2 directly; POCS already has code for doing that). It also wouldn’t matter if this is done on one computer or across several; for grpc it’s all the same.
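Concretely, the Observatory side would then hold a grpc stub instead of a local Camera object and call it over the network. A hypothetical client-side sketch (the generated camera_pb2/camera_pb2_grpc modules and the RPC/message names are assumptions, matching the server sketch earlier in this thread):

```python
# Hypothetical client-side sketch matching the gRPC server sketched earlier in this
# thread; camera_pb2 / camera_pb2_grpc and the RPC/message names are assumptions.
import grpc

import camera_pb2
import camera_pb2_grpc

# The camera process could be on the same machine or on a Raspberry Pi on the network.
channel = grpc.insecure_channel('localhost:50051')
stub = camera_pb2_grpc.CameraStub(channel)

# The over-the-network equivalent of calling Camera.take_exposure() locally.
reply = stub.TakeObservation(camera_pb2.ObservationRequest(exptime=30.0))
print(reply.status)
```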


#18

Thank you for the clarification @ahorton sir.

So should we go on with using gRPC or ZeroMQ? I have been looking into ZeroMQ lately and understood that it is developed with distributed messaging in mind. Since ZeroMQ is already used in the project for sending the sensor data to PAWS, wouldn’t it be apt to use the same for the RPCs? Or would gRPC be a better choice?


#19

I’m deferring to @wtgee’s judgement on the choice of grpc over ZeroMQ (or Pyro4). The interfaces between POCS and hardware components do seem best suited to a remote procedure call based API, though, and grpc is specifically designed for RPC. My understanding is that ZeroMQ, on the other hand, is a general purpose distributed messaging system, and while it would be possible to build an RPC system from ZeroMQ it would involve more ‘reinventing the wheel’ than using a dedicated RPC library.


#20

@wtgee @ahorton @james.synge hello!
As of now I have made a basic API with gRPC.
gRPC seems to be one of the right choices for implementing the idea.
However, as just suggested by @ahorton, I came across zerorpc, and it shows properties similar to gRPC, so I’m a bit unsure right now about what to choose.