# TensorFlow

To know whether a human is present in a frame, we use the TensorFlow Object Detection API with a pre-trained model.

This is very different from, and more powerful than, alarm systems based on traditional image processing and/or motion sensors (usually passive infrared sensors). With this solution, the usual sources of noise do not trigger the alarm (see the sketch after this list):

- your pet moving.
- a huge variation of light.
- birds and all sorts of animals, if you monitor your courtyard.
- ...
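
To make this concrete, here is a minimal sketch of the alarm condition, assuming the detector's class ids follow the COCO label map (person = 1, bird = 16, cat = 17, dog = 18); the threshold and names are illustrative, not the project's actual code.

```python
# A minimal sketch of the alarm condition, assuming COCO class ids.
PERSON = 1
SCORE_THRESHOLD = 0.5  # arbitrary value for this sketch

def should_trigger(detections) -> bool:
    """detections: iterable of (class_id, score) pairs from the detector."""
    # A pet or a bird is detected with a non-person class, and a variation
    # of light produces no detection at all, so neither fires the alarm.
    return any(class_id == PERSON and score >= SCORE_THRESHOLD
               for class_id, score in detections)
```

For example, a cat detected with high confidence (`should_trigger([(17, 0.92)])`) does not fire the alarm, while a person does (`should_trigger([(1, 0.87)])`).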

TIP

The system has been monitoring 4 courtyards and 3 homes since January 2020 without a single false positive.

# The model and data provenance

We use a pre-trained model provided by the TensorFlow team. You can download it here.

Sadly, the documentation for this model is gone. Here is some useful information:

The model has the following inputs and outputs:

```text
One input:
  image: a float32 tensor of shape [1, height, width, 3] containing the
  *normalized* input image.
  NOTE: See the `preprocess` function defined in the feature extractor class
  in the object_detection/models directory.

Four outputs:
  detection_boxes: a float32 tensor of shape [1, num_boxes, 4] with box locations
  detection_classes: a float32 tensor of shape [1, num_boxes] with class indices
  detection_scores: a float32 tensor of shape [1, num_boxes] with class scores
  num_boxes: a float32 tensor of size 1 containing the number of detected boxes
```
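
Here is a minimal sketch of how a frame could be fed through such a model, assuming it is a TensorFlow SavedModel whose serving signature exposes exactly the names listed above; the path and the `serving_default` key are assumptions.

```python
# A minimal sketch, assuming the model is a TensorFlow SavedModel whose
# serving signature matches the inputs/outputs listed above. The path and
# the `serving_default` key are assumptions.
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("path/to/saved_model")
infer = model.signatures["serving_default"]

def run_detection(frame_rgb: np.ndarray):
    """Return (class_id, score, box) tuples for one RGB frame."""
    # Build the [1, height, width, 3] float32 batch the model expects;
    # normalization is assumed to match the model's `preprocess` step.
    batch = tf.convert_to_tensor(frame_rgb[np.newaxis, ...], dtype=tf.float32)
    outputs = infer(image=batch)
    n = int(outputs["num_boxes"][0])
    classes = outputs["detection_classes"][0][:n].numpy().astype(int)
    scores = outputs["detection_scores"][0][:n].numpy()
    boxes = outputs["detection_boxes"][0][:n].numpy()
    return list(zip(classes, scores, boxes))
```

Feeding the `(class_id, score)` part of these tuples to a filter like the `should_trigger` sketch above is enough to decide whether the alarm should fire.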

TIP

The goal would be to train the model only on the labels we actually use: human, and maybe a few more in the future. But I don't have the required hardware. Still, it works like a charm and does not even burn your Raspberry Pi!

TIP

We tried other SSD models, but the results look the same (performance/accuracy).

# Performance considerations

When we google "object detection with Raspberry Pi", we find a lot of articles about "live" processing at the maximum fps the device can handle.

It might be useful for some applications to process frames at a high fps rate, but that is not our case! In real life the Flash does not exist: people take seconds, even minutes, to move. So processing at 20-30 fps is no better than processing at 1-2 fps. This is why the system analyzes frames (with the Object Detection API and thus the heavy machine-learning model) at 1 frame per second, which is sufficient. Thanks to this, we can analyze frames from multiple devices and run the whole core application on a single recommended Raspberry Pi! This is truly amazing because it makes the hardware setup affordable!
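
As a sketch of this pacing, assuming OpenCV (`cv2`) reads the camera stream and with `detect` as a hypothetical placeholder for the heavy model call:

```python
# A minimal sketch of analyzing ~1 frame per second, assuming OpenCV.
import time
import cv2

ANALYSIS_INTERVAL = 1.0  # seconds between analyzed frames

def detect(frame):
    """Hypothetical placeholder for the object-detection call."""
    return []

capture = cv2.VideoCapture(0)
last = 0.0
while True:
    ok, frame = capture.read()  # keep draining the stream at camera fps...
    if not ok:
        break
    now = time.monotonic()
    if now - last >= ANALYSIS_INTERVAL:
        last = now
        detect(frame)  # ...but run the heavy model on only ~1 frame/second
```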

Source

One user has 3 dumb cameras (Raspberry Pi Zero) and one smart camera, all processed by a single Raspberry Pi 4 with 4 GB of RAM, and it works like a charm!

# Where is the code?

You can find all related code in the object_detection folder of the smart-camera. This folder is totally independent from the rest of the software.

To change the model: go to download_model.py and change the URL and the hash. Make sure the outputs of the model are the same! The expected inputs/outputs are documented above.

WARNING

If you don't change the hash, the system will keep the old model, because the hash still matches the local file.
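
For illustration, the check behind this warning could look roughly like the following sketch; the URL, hash, and file name are hypothetical, not the actual content of download_model.py.

```python
# A minimal sketch of a hash-gated download: the model is fetched only
# when the local file is missing or its hash no longer matches, which is
# why keeping the old hash keeps the old model. All names are hypothetical.
import hashlib
import pathlib
import urllib.request

MODEL_URL = "https://example.com/model.tar.gz"  # hypothetical URL
MODEL_HASH = "expected-sha256-hex-digest"       # hypothetical hash
MODEL_PATH = pathlib.Path("model.tar.gz")

def sha256(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

if not MODEL_PATH.exists() or sha256(MODEL_PATH) != MODEL_HASH:
    urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
```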