Exploring artificial intelligence possibilities in late 2021 has already made me jump through multiple hoops. There are things that simply work, things that can be optimized for better performance, and things that do not appear to work at all.
One such thing is installing caffe. It is not your average daily coffee, and not even the hashtag #cofe you might find floating around social media. No, this CAFFE is Convolutional Architecture for Fast Feature Embedding. The current definition on the home page states the following:
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
So it is a community-maintained open-source project, quite well accepted in the deep learning world, as it brings some unique advantages not necessarily present in the other contenders.
AUR and caffe
The official GitHub repository is BVLC/caffe but at the time of writing, the actual last code commit to the master branch was #99bd9979 on 21 Aug 2018.
The caffe from this repository is available in aur/caffe and aur/caffe-git. Their PKGBUILDs are practically identical. Trust me, I checked thoroughly. You can do the diff yourself by comparing the PKGBUILDs of caffe and caffe-git (links are to snapshots from the time of writing).
The only difference there is the source commit, obviously, as is the norm with AUR. In case you are not familiar with the conventions there, it goes like this: the caffe package from AUR is tied to a specific release. At the time of writing, it is 1.0, as the source blob is released along with the actual release. On the other hand, caffe-git (or generally any package ending with -git, for that matter) uses the latest commit from the default branch, generally master. And there is a difference of 136 commits, affecting 100 files, as can be seen in this diff, with a snapshot from the time of writing, again.
There are a lot of other caffe versions in the AUR, but I omit them as they all target some specific GPU hardware, like NVIDIA CUDA. I will focus only on the CPU version of caffe. For me, the git version builds and works nicely, but the release version does not. The error that halts the build process for the release caffe ends with the following:
CXX tools/extract_features.cpp
CXX/LD -o .build_release/tools/extract_features.bin
/usr/bin/ld: .build_release/tools/extract_features.o: in function `int feature_extraction_pipeline<float>(int, char**)':
extract_features.cpp:(.text._Z27feature_extraction_pipelineIfEiiPPc[_Z27feature_extraction_pipelineIfEiiPPc]+0x37c): undefined reference to `caffe::Net<float>::CopyTrainedLayersFrom(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
collect2: error: ld returned 1 exit status
make: *** [Makefile:638: .build_release/tools/extract_features.bin] Error 1
make: Leaving directory '/home/peterbabic/.cache/yay/caffe/src/caffe-1.0'
After a lot of experimenting (read below), this error eventually went away, and when testing backward I was able to build this release version as well. I probably got some OpenCV dependencies exactly right over time, but I cannot pinpoint what changed, although knowing it would be helpful. Anyway, this is basically the least feature-rich version of caffe of all the explored ones, so it is not such a big deal.
Stirring caffe with a fork
As I said moments ago, the official caffe version does not really appear to be maintained anymore. But this does not mean the actual need for improvements disappeared. There is significant development happening in the fork of caffe in the ssd branch of the repository weiliu89/caffe. And SSD here does not stand for the storage at all. Instead, it stands for Single Shot Detector, or more specifically Single Shot MultiBox Detector. SSD was developed to reduce the computational resources needed, so the model could run on embedded devices, such as those in autonomous vehicles. SSD's primary author, Wei Liu, started the fork during his Google internship and it looks like it made a dent in the world, too.
I wanted to use the SSD-enabled caffe fork for some experimenting, but whatever I tried, I couldn't find a way to make it build. The errors looked like this:
In file included from /usr/include/c++/11.1.0/ext/string_conversions.h:41,
from /usr/include/c++/11.1.0/bits/basic_string.h:6594,
from /usr/include/c++/11.1.0/string:55,
from ./include/caffe/util/hdf5.hpp:4,
from src/caffe/util/hdf5.cpp:1:
/usr/include/c++/11.1.0/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
75 | #include_next <stdlib.h>
| ^~~~~~~~~~
compilation terminated.
make: *** [Makefile:580: .build_release/src/caffe/util/hdf5.o] Error 1
The above error is seemingly related to the build process using the -isystem parameter versus just the -I parameter. Some relevant references can be found here, here and subsequently here, then here, here and even here, but the list might go on for long.
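If the remedy is indeed to pass the include directories with -I rather than -isystem, the transformation itself is small. The following is a minimal sketch, assuming the compiler flags are available as a plain list of strings; the helper name isystem_to_include is hypothetical, and this is an illustration of the idea rather than the actual change made to the caffe Makefile.

```python
def isystem_to_include(flags):
    """Return a copy of flags with every '-isystem DIR' pair rewritten as '-IDIR'."""
    result = []
    pending = False  # True right after a bare "-isystem" token
    for flag in flags:
        if pending:
            # This token is the directory that belonged to -isystem.
            result.append("-I" + flag)
            pending = False
        elif flag == "-isystem":
            pending = True
        else:
            result.append(flag)
    if pending:
        raise ValueError("-isystem given without a directory")
    return result


# Example flags are made up for illustration.
flags = ["-O2", "-isystem", "/usr/include", "-Wall"]
print(isystem_to_include(flags))
```

All other flags pass through untouched and keep their order, so the rewritten list can be handed to the compiler as before.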
With the above problem solved, another one showed up:
src/caffe/util/im_transforms.cpp:2:10: fatal error: opencv2/highgui/highgui.hpp: No such file or directory
2 | #include <opencv2/highgui/highgui.hpp>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:580: .build_release/src/caffe/util/im_transforms.o] Error 1
And a closely related one as well, truncated here:
src/caffe/util/io.cpp:13:10: fatal error: opencv2/core/core.hpp: No such file or directory
This was further solved by tweaking INCLUDE_DIRS, as hinted here, here and here. I think at this point I was able to build caffe with make.
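To find out which directory would have to be added to INCLUDE_DIRS, it helps to check where the missing headers actually live. The following is a minimal sketch; find_header is a hypothetical helper and the candidate directories are assumptions (OpenCV 4 commonly places its headers under /usr/include/opencv4), so adjust them to the local system.

```python
from pathlib import Path


def find_header(header, include_dirs):
    """Return the first directory in include_dirs that contains header, or None."""
    for directory in include_dirs:
        if (Path(directory) / header).is_file():
            return str(directory)
    return None


# Candidate directories are assumptions, not taken from the caffe build files.
candidates = ["/usr/include", "/usr/include/opencv4", "/usr/local/include"]
for header in ("opencv2/core/core.hpp", "opencv2/highgui/highgui.hpp"):
    print(header, "->", find_header(header, candidates))
```

A result of None for every candidate suggests the OpenCV development files are not installed at all, which is a different problem than a missing include path.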
A note on CMake and OpenCV 4
However, before I was able to get the make build working, for which working PKGBUILDs are readily available in AUR, as discussed above, I was able to make the community-supported CMake build run. But not before I got over problems like:
CAP_PROP_POS_FRAMES was not declared in this scope.
More details can be found here. Some patching was needed first, explained here, here, then here and very briefly here.
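One known class of OpenCV 4 breakage is the removal of the legacy C-style constants in favor of names in the cv namespace. The sketch below illustrates that kind of rename on source text; it is not the content of the patches linked above, the modernize helper is hypothetical, and the mapping only lists a few renames that are certain.

```python
import re

# Legacy C-style constants and their OpenCV 3/4 C++ replacements.
RENAMES = {
    "CV_LOAD_IMAGE_COLOR": "cv::IMREAD_COLOR",
    "CV_LOAD_IMAGE_GRAYSCALE": "cv::IMREAD_GRAYSCALE",
    "CV_CAP_PROP_POS_FRAMES": "cv::CAP_PROP_POS_FRAMES",
    "CV_CAP_PROP_FRAME_COUNT": "cv::CAP_PROP_FRAME_COUNT",
}

# Match whole identifiers only, so longer names containing these are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def modernize(source):
    """Replace legacy OpenCV constants in a piece of C++ source text."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


# The sample line is made up for illustration.
line = "cap.set(CV_CAP_PROP_POS_FRAMES, frame_id);"
print(modernize(line))
```

Whether a given error needs such a rename, a namespace qualifier or an additional include has to be decided per case, which is why the references above remain the authoritative description of what was patched.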
I know that the CMake build and OpenCV 4 are not coupled, but I did not note down the exact error messages to put them here for reference. The patches applied simply work for make and CMake builds alike, and Arch has had OpenCV 4 in its repositories for quite some time already, so I put them here together.
Even though I was able to build with CMake sooner, the official make build is much more neatly organized and compiles everything caffe offers, including documentation, and all of that is utilized in the PKGBUILD, whereas CMake just builds the binaries, so I decided not to pursue the CMake path further.
A note on atlas-lapack and NumPy
There is a convoluted world of mathematical libraries bearing the names BLAS, Atlas, lapack, lapacke, OpenBLAS and Intel MKL. The features they offer overlap to some degree, and in many Linux distributions the user can decide which implementation to use. The situation does not appear to be too great on Arch, however.
For instance, caffe expects the Atlas implementation by default, but getting there on Arch is very hard, or maybe even impossible at this point, as I could not get it to work at all. Atlas is only available from AUR as the atlas-lapack package and is a total pain to get installed. Not to mention that the build itself took the better part of a day to finish on my machine, once I was even able to start it. I do not recommend going down this path at all!
The reason is that other Python packages cease to work with this implementation:
ImportError: libopenblas.so.3: cannot open shared object file: No such file or directory
I am not too sure about the exact relation to NumPy, because I was at this step even before I made the CMake build work, but I believe there was some connection. Another error I encountered was:
Could NOT find NumPy (missing: NUMPY_INCLUDE_DIR NUMPY_VERSION) (Required is at least version "1.7.1")
The problems I've had during this phase are discussed here and here. The conclusion is to avoid using the Atlas implementation, as OpenBLAS is proven to work. Note that I have not experimented with Intel MKL at all yet, but it is the third contender in this area supported by caffe.
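Before swapping BLAS implementations, it can be useful to see which ones the system can currently resolve, since the ImportError above is about libopenblas not being found. The following is a minimal sketch using the standard library; available_blas is a hypothetical helper and the library names passed to it are assumptions about how the implementations are usually named.

```python
from ctypes.util import find_library


def available_blas(names=("openblas", "atlas", "cblas", "mkl_rt")):
    """Map each library name to what the system lookup resolves, or None."""
    return {name: find_library(name) for name in names}


for name, resolved in available_blas().items():
    print(f"{name:10} {resolved or 'not found'}")
```

If openblas shows up as not found after installing an alternative implementation, Python packages linked against it will fail to import in the way shown above.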
A note on draw_net.py
Here we are getting to the core of this not too interesting story. The reason I wanted the SSD version of caffe to work was actually a related script shipped alongside it, called draw_net.py, used to visualize the caffe model specified in the .prototxt file. Visualizing it like this makes it easier to understand what is going on inside the model. This script is available in vanilla caffe as well, but when applied to an SSD model like Mobilenet-SSD it terminates with the error:
google.protobuf.text_format.ParseError: 1177:3 : Message type "caffe.LayerParameter" has no field named "permute_param".
The solution to this is obviously to extend the parameters the draw_net.py script is able to process with the parameters used in the SSD model in the first place, thus installing the SSD branch of caffe. This turned out to be exponentially more complicated than I previously thought (it took me almost a week). However, even after a successful build of the SSD branch, draw_net.py still showed an error:
AttributeError: 'google.protobuf.pyext._message.RepeatedScalarConta' object has no attribute '_values'
The solution is to patch the draw.py source file, as described here.
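The AttributeError comes from code reaching into the private _values attribute of a protobuf repeated field, which the container named in the error message does not provide. The following is a minimal sketch of the general remedy, assuming only the first element of the field is needed: rely on the public sequence interface (len() and indexing) instead. The helper first_or_default is hypothetical and is not the literal patch referenced above; plain lists stand in for the repeated fields.

```python
def first_or_default(repeated, default):
    """Return the first element of a repeated field, or default when it is empty."""
    # len() and indexing are part of the public sequence interface,
    # so no private attribute such as _values is touched.
    return repeated[0] if len(repeated) else default


kernel_size = [3]  # stand-in for a populated repeated field
stride = []        # stand-in for an empty repeated field
print(first_or_default(kernel_size, 1))
print(first_or_default(stride, 1))
```

Because only public behavior is used, the same expression works no matter which protobuf backend is installed.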
Now I was finally able to fully visualize the model. What a ride.
The package caffe-ssd is now available in AUR.