Installing caffe SSD on Arch

Exploring artificial intelligence possibilities in late 2021 has already led me through multiple hoops. Some things already work, some can be optimized for better performance, and some do not appear to work at all.

One such thing is installing caffe. It is not your average daily coffee, and not even the hashtag #cofe you might find floating around social media. No, this CAFFE is the Convolutional Architecture for Fast Feature Embedding. The current definition on the home page states the following:

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.

So it is a community-maintained open-source project, quite well accepted in the field, as it brings some unique advantages to deep learning that are not necessarily present in the other contenders.

AUR and caffe

The official GitHub repository is BVLC/caffe but at the time of writing, the actual last code commit to the master branch was #99bd9979 on 21 Aug 2018.

The caffe from this repository is available in aur/caffe and aur/caffe-git. Their PKGBUILDs are practically identical. Trust me, I checked thoroughly. You can do the diff yourself by comparing the PKGBUILDs of caffe and caffe-git (links are to snapshots from the time of writing).

The only difference is the source commit, obviously, as is the norm with AUR. In case you are not familiar with the conventions there, it goes like this: the caffe package from AUR is tied to a specific release. At the time of writing, that is 1.0, as the source blob is published along with the actual release. On the other hand, caffe-git (or generally any package ending with -git, for that matter) uses the latest commit from the default branch, generally master. And there is a difference of 136 commits affecting 100 files, as can be seen in this diff, again with a snapshot from the time of writing.

There are a lot of other caffe versions in the AUR, but I omit them as they all target specific GPU hardware, like NVIDIA CUDA. I will focus only on the CPU version of caffe. For me, the git version builds and works nicely, but the release version does not. The error that halts the build process for the release caffe ends with the following:

CXX tools/extract_features.cpp
CXX/LD -o .build_release/tools/extract_features.bin
/usr/bin/ld: .build_release/tools/extract_features.o: in function `int feature_extraction_pipeline<float>(int, char**)':
extract_features.cpp:(.text._Z27feature_extraction_pipelineIfEiiPPc[_Z27feature_extraction_pipelineIfEiiPPc]+0x37c): undefined reference to `caffe::Net<float>::CopyTrainedLayersFrom(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
collect2: error: ld returned 1 exit status
make: *** [Makefile:638: .build_release/tools/extract_features.bin] Error 1
make: Leaving directory '/home/peterbabic/.cache/yay/caffe/src/caffe-1.0'

After a lot of experimenting (read below), this error eventually went away, and I was able to build the release version as well when testing backward. I probably got some OpenCV dependencies exactly right over time, but I cannot pinpoint what changed, although knowing it would be helpful. Anyway, this is basically the least feature-rich version of caffe of all the explored ones, so it is not such a big deal.

Stirring caffe with a fork

As I said moments ago, the official caffe version does not really appear to be maintained anymore. But this does not mean the actual need for improvements disappeared. There is significant development happening in a fork of caffe, in the ssd branch of the repository weiliu89/caffe. SSD here does not stand for the storage at all. Instead, it stands for Single Shot Detector, or more specifically Single Shot MultiBox Detector. SSD was developed to reduce the computation resources needed, so the model could run on embedded devices, such as autonomous vehicles. The SSD's primary author, Wei Liu, started the fork during his Google internship and it looks like it made a dent in the world, too.

I wanted to use the SSD-enabled caffe fork for some experimenting, but whatever I tried, I couldn't find a way to make it build. The errors looked like this:

In file included from /usr/include/c++/11.1.0/ext/string_conversions.h:41,
                 from /usr/include/c++/11.1.0/bits/basic_string.h:6594,
                 from /usr/include/c++/11.1.0/string:55,
                 from ./include/caffe/util/hdf5.hpp:4,
                 from src/caffe/util/hdf5.cpp:1:
/usr/include/c++/11.1.0/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
   75 | #include_next <stdlib.h>
      |               ^~~~~~~~~~
compilation terminated.
make: *** [Makefile:580: .build_release/src/caffe/util/hdf5.o] Error 1

The above error is seemingly related to the build process using the -isystem parameter versus just the -I parameter. Some relevant references can be found here, here and subsequently here, then here, here and even here, but the list could go on for long.

With the above problem solved, another one showed up:
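To make the -isystem versus -I difference a bit more concrete, here is a minimal Python sketch that rewrites every `-isystem DIR` pair in a list of compiler flags into `-IDIR`. The function name and the sample flags are made up for illustration, and which exact line of the Makefile needs the change is not something I pin down here. As far as I understand it, the background is that GCC ignores a -I for a directory that is already a standard system directory, whereas -isystem reorders the search path and that can break the `#include_next <stdlib.h>` seen in the log.

```python
def isystem_to_include(flags):
    """Return a copy of flags with each '-isystem DIR' pair rewritten as '-IDIR'.

    Only the separated two-token form is handled; everything else is kept as is.
    """
    result = []
    tokens = iter(flags)
    for flag in tokens:
        if flag == "-isystem":
            directory = next(tokens, None)
            if directory is None:
                # Dangling -isystem without a directory: leave it untouched.
                result.append(flag)
            else:
                result.append("-I" + directory)
        else:
            result.append(flag)
    return result


if __name__ == "__main__":
    # Hypothetical flags, not copied from the real caffe build.
    sample = ["-O2", "-isystem", "/usr/include", "-I./include"]
    print(" ".join(isystem_to_include(sample)))
    # prints: -O2 -I/usr/include -I./include
```

This only illustrates the flag change itself; the real fix still has to be applied wherever the build system assembles its include flags.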

src/caffe/util/im_transforms.cpp:2:10: fatal error: opencv2/highgui/highgui.hpp: No such file or directory
    2 | #include <opencv2/highgui/highgui.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:580: .build_release/src/caffe/util/im_transforms.o] Error 1

And the tightly related one as well, just truncated:

src/caffe/util/io.cpp:13:10: fatal error: opencv2/core/core.hpp: No such file or directory

This was further solved by tweaking INCLUDE_DIRS, as hinted here, here and here. I think at this point I was able to build caffe with make.
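Before touching INCLUDE_DIRS, it helps to know which directory actually contains the headers the compiler complains about. The following is a small sketch in Python that checks a few candidate directories for the two headers from the errors above. The candidate list is my assumption (on Arch I would expect the OpenCV 4 headers under /usr/include/opencv4), and the helper name is made up.

```python
from pathlib import Path

# Assumed candidate include roots; adjust to your system.
CANDIDATES = ["/usr/include", "/usr/include/opencv4", "/usr/local/include"]


def find_header_roots(header, candidates):
    """Return the candidate directories that contain the given relative header path."""
    return [str(root) for root in candidates if (Path(root) / header).is_file()]


if __name__ == "__main__":
    for header in ("opencv2/core/core.hpp", "opencv2/highgui/highgui.hpp"):
        roots = find_header_roots(header, CANDIDATES)
        print(header, "->", roots if roots else "not found")
```

Any directory reported here is a candidate for INCLUDE_DIRS; a header reported as not found points to a missing package rather than to a wrong path.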

A note on CMake and OpenCV 4

However, before I got the make build working, for which working PKGBUILDs are readily available in AUR, as discussed above, I was able to make the community-supported CMake build run. But not before I got over problems like:

CAP_PROP_POS_FRAMES was not declared in this scope.

More details can be found here. Some patching was needed first, as explained here, here, then here and very briefly here.

I know that the CMake build and OpenCV 4 are not coupled, but I did not note the exact error messages to put them here for reference. The patches applied simply work for make and CMake builds alike, and Arch has had OpenCV 4 in its repositories for quite some time already, so I put them here together.

Even though I got the CMake build working sooner, the official make build is much more neatly organized and compiles everything caffe offers, including documentation, all of which is utilized in the PKGBUILD, whereas CMake just builds the binaries. So I decided not to pursue the CMake path further.

A note on atlas-lapack and NumPy

There is a convoluted world of mathematical libraries bearing the names BLAS, Atlas, lapack, lapacke, OpenBLAS and Intel MKL. The features they offer overlap to some degree, and in many Linux distributions the user can decide which implementation to use. This situation appears not to be too great on Arch, however.

For instance, caffe expects the Atlas implementation by default, but getting there on Arch is very hard, or maybe even impossible at this point, as I could not get it to work at all. Atlas is only available from AUR as the atlas-lapack package and is a total pain to install. Not to mention that the build itself took the better part of a day to finish on my machine, once I was even able to start it. I do not recommend going down this path at all!

The reason is that other Python packages cease to work with this implementation:

ImportError: libopenblas.so.3: cannot open shared object file: No such file or directory

I am not too sure about the exact relation to NumPy, because I was at this step even before I made the CMake build work, but I believe there was some connection. Another error I encountered was:

Could NOT find NumPy (missing:  NUMPY_INCLUDE_DIR NUMPY_VERSION) (Required is at least version "1.7.1")

The problems I've had during this phase are discussed here and here. The conclusion is to avoid using the Atlas implementation, as OpenBLAS is proven to work. Note that I have not experimented with Intel MKL at all yet, but it is the third contender in this area supported by caffe.
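Both errors above come down to something not being found: a shared BLAS library in one case and the NumPy headers in the other. A quick way to see the state of the system is sketched below in Python. It relies on the standard `ctypes.util.find_library` and on `numpy.get_include()`; the function names and the list of library names are my own choice for illustration, and the output does not tell you which implementation caffe itself will link against.

```python
import ctypes.util


def blas_report(names=("openblas", "atlas", "cblas", "lapack")):
    """Map each short library name to what the system resolves it to, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def numpy_report():
    """Return NumPy's version and header directory, or the import error message."""
    try:
        import numpy
    except ImportError as exc:
        # This is where a missing libopenblas.so.3 would typically surface.
        return {"error": str(exc)}
    return {"version": numpy.__version__, "include_dir": numpy.get_include()}


if __name__ == "__main__":
    for name, resolved in blas_report().items():
        print(name, "->", resolved)
    print("numpy ->", numpy_report())
```

If NumPy imports fine, the reported include directory is the kind of path a build would need for the NumPy headers; if it fails with the libopenblas message, the BLAS setup is the thing to fix first.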

A note on draw_net.py

Here we are getting to the core of this not too interesting story. The reason I wanted the SSD version of caffe to work was actually a related script shipped alongside it, called draw_net.py, used to visualize the caffe model specified in a .prototxt file. Visualizing it like this makes it easier to understand what is going on inside the model. This script is available in vanilla caffe as well, but when applied to an SSD model like Mobilenet-SSD, it terminates with the error:

google.protobuf.text_format.ParseError: 1177:3 : Message type "caffe.LayerParameter" has no field named "permute_param".

The solution to this is obviously to extend the parameters the draw_net.py script is able to process with the parameters used in the SSD model in the first place, which means installing the SSD branch of caffe. This turned out to be far more complicated than I previously thought (it took me almost a week). However, even after a successful build of the SSD branch, draw_net.py still showed an error:

AttributeError: 'google.protobuf.pyext._message.RepeatedScalarConta' object has no attribute '_values'

The solution is to patch the draw.py source file, as described here. Now I was finally able to fully visualize the model. What a ride.
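The error message says that the protobuf container has no `_values` attribute, which is a private detail that not every protobuf implementation exposes. The general idea of the fix, as I understand it, is to use the public sequence interface (`len()` and indexing) instead. Here is a minimal sketch of that idea in Python; the helper name is hypothetical and it is not the literal patch to draw.py.

```python
def first_or_default(repeated, default):
    """Return the first element of a repeated field, or default when it is empty.

    Uses only len() and indexing, so it does not depend on private
    attributes such as _values.
    """
    return repeated[0] if len(repeated) else default


if __name__ == "__main__":
    # Plain lists stand in for protobuf repeated fields here.
    print(first_or_default([3], 1))  # prints: 3
    print(first_or_default([], 1))   # prints: 1
```

Plain lists are used in the example only because they behave the same way for `len()` and indexing; with a real model the argument would be a repeated field of a layer parameter.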

The package caffe-ssd is now available in AUR.
