Saturday, February 24, 2018

Building a Reclaimed Kubernetes Cluster

Thanks to family hand-me-downs, over the passing years I have become a repository of unwanted laptops. Some of them barely boot anymore, but three-quarters of them have two cores and 2 GB of RAM or more. One actually had four cores and 8 GB of RAM... making it a veritable workhorse. I could make them disposable workstations, but instead I wove them together and created a personal Kubernetes cluster.

The base OS for the nodes is Ubuntu's latest LTS release. Rather than using Conjure on MaaS to set up the cluster (which required an isolated network for bootp and DNS and meh), I leveraged Kubespray's flurry of Ansible scripts to prep an inventory of machines over SSH. This ended up being surprisingly low impact and worked perfectly for the use case of building a test lab with piecemeal hardware.

Laptops work just fine as server nodes with a few tweaks:
  • Even if you use the server distribution of Ubuntu, laptop events such as closing the lid will still result in a suspend/hibernate/resume action. Edit /etc/systemd/logind.conf and set HandleLidSwitch=ignore so the laptop keeps running when closed, then restart the login service:
    sudo vi /etc/systemd/logind.conf
    sudo service systemd-logind restart
  • The display will remain on once you start ignoring LidSwitch events - run a script at startup to turn the display off and save energy.
  • Even if you are running in console mode, NVIDIA Optimus laptops will go nuts and seemingly run the discrete and on-chip GPUs nonstop, overheating the machine. Install Ubuntu's Bumblebee packages to prevent this:
    sudo apt-get install bumblebee bumblebee-nvidia primus linux-headers-generic
  • As with all Kubernetes nodes, disable swap by commenting out the partition in /etc/fstab. Since you will no longer need to resume from hibernate mode on the laptop, swap can be safely disabled.
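
The lid-switch and swap tweaks above can be scripted. What follows is a minimal sketch rather than the exact steps used for this cluster: the function names ignore_lid_switch and comment_out_swap are hypothetical, it assumes GNU sed as shipped with Ubuntu, and it takes the file path as an argument so you can try it on copies before touching anything under /etc.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Force HandleLidSwitch=ignore in a logind.conf-style file,
# whether the existing entry is commented out, set to something else, or absent.
ignore_lid_switch() {
  conf="$1"
  if grep -qE '^#?HandleLidSwitch=' "$conf"; then
    sed -i -E 's/^#?HandleLidSwitch=.*/HandleLidSwitch=ignore/' "$conf"
  else
    echo 'HandleLidSwitch=ignore' >> "$conf"
  fi
}

# Comment out every active swap entry in an fstab-style file.
comment_out_swap() {
  fstab="$1"
  sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}
```

Once the result looks right on a copy, the same functions can be run as root against /etc/systemd/logind.conf and /etc/fstab. Running sudo swapoff -a afterwards turns swap off immediately without waiting for a reboot.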

Once you have the laptops prepped and the latest updates applied, you will need to make sure each node has a copy of python-netaddr installed:

sudo apt-get install python-netaddr

Ansible issues its commands over SSH, so ensure you have keyfile-based authentication set up from the machine you will be running Kubespray on to each of the nodes. If you don't already have an SSH key generated (for example, if you will run Kubespray on the master node), then you can generate a passwordless one via ssh-keygen. After that, copy the public key to each node with:

ssh-copy-id node1
ssh-copy-id node2
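
With more than a couple of nodes, the key setup can be wrapped in a loop. This is a rough sketch under a few assumptions: the helper names and the COPY_CMD override are hypothetical (the override is only there to allow a dry run), and the default key path is the usual ~/.ssh/id_rsa.

```shell
#!/bin/sh
# Sketch only: function names and COPY_CMD are illustrative, not part of any tool.

# Generate a passwordless RSA key unless one already exists at the given path.
ensure_ssh_key() {
  key_file="${1:-$HOME/.ssh/id_rsa}"
  [ -f "$key_file" ] || ssh-keygen -t rsa -N "" -f "$key_file"
}

# Copy the public key to each node named on the command line.
# Set COPY_CMD=echo to print the node list instead of contacting anything.
copy_key_to_nodes() {
  for node in "$@"; do
    "${COPY_CMD:-ssh-copy-id}" "$node" || return 1
  done
}
```

For example, ensure_ssh_key && copy_key_to_nodes node1 node2 does the same work as the two commands above, stopping at the first node that refuses the key.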

After that, the machine you are running Kubespray on will need Ansible installed. I ran Kubespray on the master node to keep things simple - so on that Ubuntu box I issued:

sudo apt-add-repository ppa:ansible/ansible
sudo apt-get update
sudo apt-get install ansible
git clone
cp -rfp inventory/sample inventory/mycluster

This will:
  1. Install Ansible on the box
  2. Download the Ansible scripts from Kubespray
  3. Create a new Ansible inventory called "mycluster" that is a clone of the Kubespray sample
An important thing to remember is that you address nodes by straight IP address - not by hostname. This is especially important with Ansible scripts because the node's hostname may well change as part of the installation process. If your nodes are fetching their IP address via a DHCP server, ensure the DHCP server has static IP allocations for your nodes.

Once you have all the IP addresses for your nodes, set them in your inventory file. An easy way to do this at the command line is:

declare -a IPS=(
CONFIG_FILE=inventory/mycluster/hosts.ini python3 contrib/inventory_builder/ ${IPS[@]}

Verify the inventory is correct by cracking open inventory/mycluster/hosts.ini - if you want to change hostnames, now is the time.
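
Since nodes should be addressed by IP rather than hostname, part of that verification can be automated. The sketch below assumes the inventory carries ansible_host=... entries, which is the usual Ansible convention but worth confirming against your own hosts.ini. The helper names is_ipv4 and non_ip_hosts are made up for this example.

```shell
#!/bin/sh
# Sketch only: assumes inventory lines contain ansible_host=<address>.

# Succeed only for dotted-quad IPv4 addresses with every octet in 0-255.
is_ipv4() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# Print each ansible_host value in the given inventory that is not an IPv4 address.
# No output means every node is addressed by IP.
non_ip_hosts() {
  grep -o 'ansible_host=[^[:space:]]*' "$1" | cut -d= -f2 | sort -u |
  while read -r addr; do
    is_ipv4 "$addr" || echo "$addr"
  done
}
```

Running non_ip_hosts inventory/mycluster/hosts.ini before launching the playbook should print nothing. Anything it does print is a hostname that slipped into the inventory.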

I would recommend having Kubespray build a kubectl configuration file automagically for you. To have this generated as an artifact, change inventory/mycluster/group_vars/k8s-cluster.yml to have the following entry set:

kubeconfig_localhost: true

After these tweaks you should be ready to launch Kubespray's Ansible playbook. Note that Ubuntu's convention is to have you operate as a normal user and sudo all of your commands, so you will need to use Ansible's --become parameter:

ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml --ask-become-pass --become

At this point Kubespray will try its best to get a cluster up and running on the nodes specified in your inventory file. At the very end Kubespray will provide you with a kubectl configuration file in artifacts/admin.conf, which you can then copy or merge into another workstation's ~/.kube/config file.

Once you have the Kubernetes configuration file set on your workstation, you can use it to fetch an authtoken to get into the Kubernetes Dashboard. The proper way to do this is to generate a new system secret that has the appropriate permissions to interrogate the running cluster... but the lazy way is to just steal the token used by Kubernetes' namespace controller.

I'm lazy, so first I list all the secrets in the kube-system namespace:

kubectl -n kube-system get secrets

And then fetch the token for the namespace controller:

kubectl -n kube-system describe secret namespace-controller-token-???

So that I can use it to log in to the web dashboard:

kubectl proxy &
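
The secret lookup above involves reading the random suffix off the listing by hand. A small filter can pick the name out instead. This is only a sketch: first_secret_named is a hypothetical helper, and it assumes the secret name sits in the first column of the kubectl get secrets output.

```shell
#!/bin/sh
# Sketch only: reads `kubectl get secrets` output on stdin and prints the first
# secret whose name starts with the given prefix.
first_secret_named() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1; exit }'
}

# Usage against a live cluster (needs kubectl and a working kubeconfig):
#   name=$(kubectl -n kube-system get secrets | first_secret_named namespace-controller-token-)
#   kubectl -n kube-system describe secret "$name"
```

The prefix match avoids hard-coding the suffix, which differs on every cluster.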

Now you should have a working cluster you can mess with!

So that the laptops were properly ventilated, I placed each vertically into a metal document sorter from an office supply store. This gives me a nifty vertical rack for the laptops that has plenty of air circulation and allows me to route cables out of the way.

I've constructed one weird frankencluster - but it works!

Tuesday, January 23, 2018

Your Documents Under the Magnifying Glass

A few years ago I moved my household administrivia to a paperless system. Instead of stacking file folders deep with bills and statements, everything would be scanned & shredded. This greatly helped with storage space - but in a couple of years I ended up with a network drive filled with over 3,000 PDFs, images and documents. Bear in mind the majority of these are scanned documents - so the contents are images instead of machine-readable text. Everything was dumped into a single directory and files were named based on the timestamp of when they were scanned, so organizing documents into folders and sub-folders would take hours.

Instead of burning hours sorting documents I started burning hours building a simple set of applications that would read document metadata, attempt to convert the images to text, group documents by common letterhead and then provide a simple search interface over all of it. Since optical character recognition is hit-and-miss, any full-text search should permit proximate indexing and searching to allow for fuzzy matches.

In the end I created two apps: DocMag and DocIndex. DocMag serves as the search front-end and allows users to perform full-text searches on scanned documents, label them with tags and automagically group other documents with the same letterhead or logo. The interface is pretty spartan and uses Spring Boot to build a straightforward integration into Elasticsearch. DocIndex is the batch process that crawls a filesystem and parses the documents using OCR, generates thumbnails, tags similar documents using computer vision-based template matching, and stores document metadata within Elasticsearch.

DocMag was created in Groovy using Spring Boot (Spring Web, Spring Data, etc). I did this mainly to understand how Spring Boot's conventions translated over to the Groovy world... it had been quite a while since I had worked with Grails. It turns out that Groovy, Spring Boot and Thymeleaf complemented each other quite well and make for fairly simple web development.

DocIndex was created with Spring Boot and Java 9 initially. I griped in an earlier post about my problems with Java 9's dependency management, so instead I fell back to the lambda expressions and work queue management within Java 8. This permits multithreaded parsing of discovered files, which then allows for vertically scaling document indexing by adding cores. Horizontal scaling should be possible by replacing the in-memory work queue with a proper shared message broker. There is a "reminder" issue I've already filed to migrate to a proper broker so this can be done sometime in the future.

Both DocMag and DocIndex are published as containers on Docker Hub. This was especially necessary with DocIndex, as it relied heavily on native libraries for Tesseract OCR and OpenCV. OpenCV was the most contentious - each Linux distribution has a different version of OpenCV, and the version changes quite rapidly. Building containers for distribution allowed me to ensure users got the correct version of native libraries that worked well with their Java bindings.

Another nice feature of the containerized deployment model was composition - I was able to pair the correct revision of Elasticsearch, conditionally include Kibana, and provide a simple web application firewall by placing DocMag behind modsecurity and Apache. Network connections could be maintained between Elasticsearch, modsecurity, and DocMag without any of these interconnects leaking to the "outside" world, allowing me to do things such as only expose modsecurity to outside traffic and only permitting DocMag to receive requests through modsecurity. Elasticsearch could be hidden as well, only available on the internal network managed by Docker Compose.

Deployment can be relatively straightforward; since everything is deployed to Docker Hub as a container, one should just need to download the docker-compose.yml file and issue export DOCUMENT_HOST_DIR=/mnt/documents && docker-compose up -d. This should provision a single-node Elasticsearch instance, start DocMag behind modsecurity, and begin indexing with DocIndex.
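
Because the whole stack hinges on DOCUMENT_HOST_DIR pointing at real documents, a quick guard before bringing it up can save a confusing first run. The check below is a sketch, not part of DocMag. check_document_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: refuse to continue unless the document directory exists and is readable.
check_document_dir() {
  dir="${1:-${DOCUMENT_HOST_DIR:-}}"
  if [ -z "$dir" ]; then
    echo "DOCUMENT_HOST_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
    echo "$dir is not a readable directory" >&2
    return 1
  fi
}

# Usage:
#   export DOCUMENT_HOST_DIR=/mnt/documents && check_document_dir && docker-compose up -d
```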

If you are stuck digging through mountains of scanned documents, give DocMag a try. Ease of installation is one of its primary goals - so let me know if you find any issues getting it running!

Wednesday, December 20, 2017

Java Jigsaw Puzzles DevOps

Oh man that's a catchy blog title.

For the past couple o' weeks, my after-hours project has been trying out building webapps and batch jobs using the combo of Java 9, Spring Boot 2 milestone releases, Elasticsearch 6.1 and Docker Edge with Docker Compose. Just because I was in a WAF frame of mind I added modsecurity as a web application firewall in front of the app so I could learn a bit more about building WAF rules with Apache 2.

It was a fun lil' exercise, but in the end I found that all the cutting edge releases simply wouldn't play nicely with each other.

One painful exercise was trying to get Java 9 distributions to work within a Docker container just as it would within my desktop environment. Project Jigsaw is an oft-cited future feature of Java that build engineers have been asking for to end the myriad of JavaEE / Java ME / Java Desktop / Java Server distributions. It should help containerization by allowing svelte JRE installations to bootstrap within a minimal OS. However... this new way of distributing JREs with modular components creates yet another dependency management headache for builds. Once you begin writing manifest elements for Jigsaw + Java 9, every library and its mother now needs to be managed by your manifest as well. It's enough to drive you nuts.

Let's say you don't want to jump into building modular JARs yet and just build traditional JARs that don't use Jigsaw dependency management. Well... Ubuntu's OpenJRE 9 distribution doesn't automatically inject some Java 9 foundation libraries (such as javax.image), while Oracle's JDK does. If you use an Oracle JDK locally to develop, things may appear just fine, but then you need to perform some command-line overrides for things to work on an OpenJRE 9 build. To make things more hairy, it seems that OpenJDK and Oracle have built implementations that might be runtime compatible but are NOT compatible from a build & deployment standpoint. Command-line arguments are vastly different, even though manifest formats are the same. That makes building standard build & deployment scripts a pain, as well as local testing. Distributing Oracle's JRE within a container is just too fraught for me to attempt - so I stick to distribution with OpenJDK instead.

I ended up burning too much time trying to get a consistent build between my streamlined Ubuntu-powered Docker container and my local MacOS development environment, so I punted back to Java 8. While Java 9 had some nice memory management features and some syntactic sugar, what I really needed was Lambda and Stream support. Java 8 was sufficient for this in both Oracle and OpenJDK-land.

The combo of Spring Boot 2 (milestone 7) and Elasticsearch 6.1.0 was another mix that simply didn't pan out. The Java libraries for Elasticsearch 6 had a few signature changes across the API which were entirely incompatible with Spring Data Elasticsearch, and the protocol between ES 5 and 6 did not appear to be compatible. I'm sure this will get patched up in short order within the Spring project, however until then I had to fall back to Elasticsearch 5.6.4. I wanted to stick with Spring Boot conventions as closely as possible, so I did not go native just for ES 6 support.

In the end... I do have a fully containerized solution using Spring Boot 2, Java 8, Elasticsearch 5.6.4, and modsecurity. Getting WAF protection, a single-node ES cluster, a web front-end and an indexing batch process running in the background all happens with:

export DOCUMENT_HOST_DIR=/mnt/documents && docker-compose up -d

...and that's it! Containers are also available at Docker Hub and require thankfully LITTLE dependency management.

Monday, May 15, 2017

Climate Change By The Dollar

One of my lil' neuroses is ensuring that I reduce my energy usage year over year. To make sure I'm following a downward trend, I've been tracking the dollar cost for energy and water bills. Assuming that cost per unit does not go down year over year (which so far has been true), this should be a reflection of overall energy use.

Note that the large hills on the graph spurred on by heating bills (both water and central air) are shrinking each year. Air conditioning during the summer months is showing small increases. Over the past three years I have also replaced all light fixtures with LED lighting - which does help drop the constant spend month over month.

It is interesting that while both winters and summers are getting warmer, heating the house expends much more energy than cooling the house, providing an overall downward trend. Water use is also beginning to spike due to the lawn irrigation system, which is why I created the Sprinkler Switch project to only water when no rain has occurred recently or is forecast to occur that day.

This is an indirect measure of how our climate is changing, and only represents a four year sample size. The trends are still quite visible - and demonstrate how evolutions in home heating could significantly reduce energy consumption.

Saturday, February 04, 2017

Alarm Clock Hacking by Blocks

A little over two years ago I built an alarm clock intended for hacking by kids, using a web-based Python IDE. When I tested the lessons, I found that kids didn't like messing with Python and only learned enough to get things barely working. Yet, when it came to Scratch Jr or the desktop version of Scratch, they would spend hours at a time. I needed to find a more approachable way to code.

Recently I discovered Blockly, a product from Google for Education. With that framework you can code by blocks and use its transcoder to output JavaScript, Python, Lua, Dart or (ugh) PHP. The transcoder runs entirely client-side, and the output is human-readable - well indented and even commented.

Writing custom blocks turned out to be an easy thing, so I created blocks to modify the LED display, send audio out to a speaker, or react to button presses. Now you can use blocks to program the clock, while retaining all the functionality present in the older Python interface.

If I was going to redo the Hack Clock, this time I wanted to have a presentable site with full hardware and software lessons, for both Python and Blockly. I revamped the Hack Clock website, completed the Python lessons that I left incomplete last time, wrote new Blockly lessons for the new IDE, and completely re-did the hardware how-tos. Lesson writing took up the lion's share of time, since they all needed new images and better testing.

Another bit o' feedback I had received was that installing the Hack Clock software was too much of a pain. I tried to make this a bit easier this time by offering releases within a Debian pkg, although you still needed to use apt to install dependencies. Still, this cuts down installation from over an hour to about ten minutes... and most of those ten minutes are spent twiddling your thumbs while you wait for packages to download and install.

The hardware needed tweaking as well. It turns out the Raspberry Pi headphone jack is just a PWM pin hack and it seemed that GStreamer sometimes just couldn't grok it. The headphone jack was never a complete solution either - it required a discrete amplifier to power speakers, and soldering wires onto a 1/8" jack is a GIGANTIC pain. To make the audio hardware easier to cope with, I moved away from the headphone jack to Adafruit's I2S decoder and amplifier. It provided better audio and cleaner installation without increasing my part count or price. It has proven out to be easier for everyone so far.

The old Hack Clock had another embarrassing flaw: it could only handle one button input and couldn't manage output at all. That drove me nuts and was probably the second biggest thing I wanted to fix. With the latest release the Hack Clock can handle as many buttons as you have GPIO pins, and you can also drive output pins as "switches" in code. The code-by-blocks IDE could deal with buttons and switches as simple function blocks - which meant reacting to user input became much easier to code.

Once things were ready, I installed the Hack Clock software in a mission-critical environment: kids' rooms. So far things have gone well; audio has been more reliable than with the headphone jack, and they have been able to tweak the software more easily than with Python. One bit I noticed this round however: kids don't like looking down to read something, then looking back to code it. The next generation Hack Clock should have an interactive demo to guide through the lessons so they never have to glance away from the IDE.

I'd love to hear what other people experience when they try to get the Hack Clock running as well. A hardware list is posted on Hackaday, and all the instructions are at Let me know what you think!

Thursday, December 15, 2016

Arcade Addiction

Ah, who can forget playing Pac-Man at the Pizza Hut. Or Joust waiting for a pizza at Noble Roman's. Or DigDug at Pizza King. Come to think of it... I ate a lot of pizza as a kid.

Fast forward to Christmas of 2014 - I purchased an arcade cocktail cabinet from Rec Room Masters. After it was assembled in Ikea-like fashion I mounted an old monitor, discarded 2.1 speakers and a Raspberry Pi 3 inside of the chassis. Nifty.

One oddity was that I didn't want to shell out all the cash for every single button in a panel... so I needed a cap for each remaining hole. Luckily I had access to a 3D printer, so was able to remix a hole cap on Thingiverse and print black caps to fill the gaps.

I wanted the Raspberry Pi to sit a bit out of the way, so I screwed it into the VESA mount that the monitor rested on. After sawing an Adafruit perma-protoboard in half I was able to craft some custom headers that allow ribbon cables to connect from the Raspberry Pi and join with header posts for the joystick pins and buttons. This allowed for much better cable management and room for the subwoofer & speakers underneath.

I wasn't interested in installing a coin door on one side - so I kept it wide open and instead had the cabinet door facing the center of the room. Little had I expected that cats would LOVE climbing in the open gap of the cabinet... and ripping cables off my Pi. I shoved the open side against the wall - allowing the extension cord to conveniently poke through - and now the only access is through the swinging door on the opposite side.

On the software side, joysticks and buttons are mapped through mk_arcade_joystick_rpi, an archaically named but amazingly useful module that allows GPIO pins to become joystick inputs recognized by Linux. It took some work in order to have libretro recognize these buttons; many of them had to be remapped. However, libretro quickly became my go-to MAME emulator and now supports controls on both sides of the cocktail cabinet.

I had to perform some slight modifications to the RetroPie display setup to rotate the screen 90˚, but luckily so many cocktail cabinet titles were programmed for this 4:3 aspect ratio. Titles are working flawlessly now, and I can host two-player action by flipping a few emulated dip switches.

One thing I found interesting was how MAME distributions were entirely dependent on the exact name of your zip file. In addition, each ZIP was a true manifestation of the on-board arcade ROMs - in that sometimes a US distribution or second edition game actually piggybacked on top of a previous ROM. In this same way, two ZIPs were sometimes required to run a single title: one for the older ROM, one for the later version. I ended up combining the two ZIP archives into a single one - in this way older ROM images were still injected as a dependency, while the ZIP name was that of the older title and was still executed correctly.
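
Merging two ROM archives like this can be done with the stock zip and unzip tools. The sketch below assumes both archives are flat (no sub-directories), and merge_rom_zips is a hypothetical helper name. It leaves the choice of output file name to the caller, since that name is what the emulator keys on.

```shell
#!/bin/sh
# Sketch only: overlay two flat ROM archives into a single new archive.
merge_rom_zips() {
  older_zip="$1"
  later_zip="$2"
  out_zip="$3"
  work=$(mktemp -d) || return 1
  # Unpack the older ROM set first, then overlay the later revision on top.
  unzip -q -o "$older_zip" -d "$work" || return 1
  unzip -q -o "$later_zip" -d "$work" || return 1
  # -j drops the temp directory prefix so files land at the archive root.
  zip -q -j "$out_zip" "$work"/* || return 1
  rm -rf "$work"
}

# Usage (file names are placeholders):
#   merge_rom_zips older.zip later.zip merged.zip
```

Files with the same name in both archives end up as the copy from the second archive, because it is unpacked last.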

Pizza night at the household now takes on an entirely new meaning. A few slices and a frosty beverage help me appreciate Ms. Pac-Man in a whole new light.

Wednesday, October 07, 2015

Raspberry Pi Finally Conquers Userland

Raspberry Pi developers have had quite a coup on their hands these past few weeks. The "official" Raspberry Pi Linux distribution Raspbian was just upgraded to Debian 8, or "Jessie." This provides a huge number of wins - the 4.1 release of the Linux kernel, latest glibc and build chain updates, more native packages (like Node.JS and wiringPi), and device trees. Oh, sweet device trees.

While the current Raspbian distribution still relies on wiringPi 2.24, the most recent 2.29 version has a much nicer way of addressing GPIO in userspace by exposing the GPIO ports in /dev/gpiomem. All too often Raspberry Pi developers run GPIO apps as root to access the array of general purpose I/O pins; however, this leads to all the lovely security holes and vulnerabilities that privileged access brings. You never want Apache or Python or any user-created apps running as root - so instead you must find a way to export these ports and allow unprivileged users to access them. Traditionally this has been done using wiringPi's export utility; however, the latest gpiomem exposure seems to be much cleaner.
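
A quick way to see whether an unprivileged account can actually use the device is to test its permissions before starting the app. This is a minimal sketch with a hypothetical helper name, can_use_gpio. It only checks read/write access and does not touch any pins.

```shell
#!/bin/sh
# Sketch only: succeed when the current user can read and write the device node.
can_use_gpio() {
  dev="${1:-/dev/gpiomem}"
  [ -r "$dev" ] && [ -w "$dev" ]
}

# Usage on a Pi, run as the same user the app will run as:
#   can_use_gpio || echo "this user cannot open /dev/gpiomem; check its group membership"
```

On a stock install the device is typically owned by a dedicated group (gpio, as far as I recall; treat that as an assumption and confirm with ls -l /dev/gpiomem), so adding the service account to that group is usually all that is needed.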

With Jessie I've been able to significantly cut the complexity of installing Garage Security and Sprinkler Switch. I don't need to manually install wiringPi, Node.JS, Video4Linux and a number of other packages. Things seem to largely "just work" as one might expect of a modern distro. One example is that Motion has been updated and appears to be pre-packaged on Raspbian, and the necessary Video4Linux bcm2835-v4l2 kernel module properly creates a /dev/video0 device. CPU utilization appears to be much lower with the current stack, and it appears that I can just tweak Motion's configs to save videos in an HTML 5-friendly way rather than transcoding them with a script.

Garage Security and Sprinkler Switch are being updated now for Jessie and testing is underway... the new Jessie builds are looking very promising so far.