Friday 19 August 2016

Docker 1.12 and Crate (Redux)

So, thanks to a tip-off I got on that previous blog post, I managed to solve the little problem I was having the other day with getting a Crate cluster running properly under Docker Engine 1.12. But not without creating one more...

Sunday 14 August 2016

Docker 1.12 and Crate

I've been very busy at work recently... Seattle for DockerCon, San Francisco to meet with partners/clients and then New York to give a presentation to a meetup group hosted by Packet. Not to mention the stuff I've been working on back here in Berlin. One of the things I have been working on, when I have a spare moment, is updating my knowledge of Docker since the release of Docker Engine 1.12. Of course, Crate.io is producing official documentation. Before that comes out, however, here is what I have been playing with.

Saturday 14 May 2016

Two Years Of Akonadi: 2014, 2015

Not so long ago, I blogged about KDEPIM and how contributions varied between 2014 and 2015. One of the comments on that blog post mentioned Akonadi. Now Akonadi has its own git repo; actually, it has about 14. So I decided to take a very quick glance into that, too.

Friday 22 April 2016

Speeding Up Crate By Doing Nothing

Since I started working at Crate.IO I have, of course, had to spend some time getting to know the technology. For me, the overall Crate story is very compelling: distributed SQL at NoSQL scale. How the technology clusters and distributes load is also very cool. In order to learn a little about the technology for myself I have been working on a specific "game"... What can I tweak in order to maximise the input rate? Here's what I have found out so far...

Sunday 17 April 2016

Two Years Of KDEPIM: 2014, 2015

It's been a while since I delved into KDE. With me no longer working at KDAB, my contributions are really restricted to helping out in KDE e.V. and, even then, normally only when someone specifically asks for me. KDE, of course, continues to be a community I care about very much. I wanted to do a very basic health check of my old stomping ground: KDEPIM.

Thursday 31 March 2016

"...then you win"? Some Thoughts On Bash For Windows

If you have not seen it yet, take four minutes out of your life to watch the announcement that Bash is going to be natively supported on Windows 10. During the video you're probably going to notice some confused-looking folk in the audience. Heck, if I was at the Build Conference I'd be confused, too.

In fact, when I first heard this story yesterday, it was such a non-event in my life that I chortled a little and thought nothing more of it. I don't even remember who it was who told me. I just remember laughing and then going about my business... because I assumed this was a joke. Then I realised that this was real. Now that I've had time to think on it, this, clearly, is not a joke. But it is a little weird.

Sunday 20 March 2016

Crate.IO: Anatomy Of A New Career

The eagle-eyed of you will have noticed that I have recently started a new career at Crate.IO. After 6 years inside the KDAB family (including Kolab Systems, KDAB UK and KDAB Germany) this came as a surprise to many people. KDAB gave me a lot of freedom to achieve some long-lasting successes as well as some failures for me to learn from. I will always be grateful. But it was time to move on. Here is how I ended up where I did.

What Do You Want?

1. Find an excellent team.

Having left KDAB I wanted to take the time to evaluate what I was trying to achieve with my career. I was never much of a programmer, but I like to think I have an excellent overview of software engineering as a whole. Whilst I did not want to be hacking (and no employer should want that, either) I did want to be kicking down doors. I wanted to kick down the doors that the other engineers did not even realise were in their way. This type of servant leadership is hugely enjoyable when you are working with the right team...

2. Product company, not consulting.

My career has been split between product and consulting work. Most recently, at KDAB, I was part of an incredible team of consultants. Those guys are solving problems with C++/Qt/OpenGL that most people would be outright scared of. The problem with consulting, though, is the lack of ownership. KDAB's engineers would achieve glorious results but not get any glory. Not only that, but often they would be working on multiple problems for multiple clients and not get the chance to feel a sense of real ownership. All software engineering is a team sport, both consulting and product work. But there is something awesome about being part of a team with a long-term product vision and ownership.

3. Free Software angle.

I have always worked in or around Free Software. This is something hugely important to me. Free Software was the subject of my PhD and has been at the heart of every job I've had. There is a certain sense of "doing it for a greater good" which really adds some good vibes to the work environment.

4. In Berlin.

I love Berlin. I might tongue-in-cheek write about the brokenness every now and then but, fundamentally, this is my kinda town. I've been here for almost two years, but I still feel like I'm new in town. In the grand scheme of things, I am. My time here is nowhere near done and so any new job had to keep me here. If this meant regular travel, so be it. Long commute? For the right job, perhaps. But with a large tech startup scene in Berlin this seemed unnecessary.

Joining Crate.IO

I wrote recently about my experience of attending the Crate.IO Snow Sprint. This gave me a great opportunity to meet the Crate team, learn about the technology and eat some phenomenal Leberkäse.

The Team

Needless to say, all of my boxes were ticked. I was impressed by the leadership team. The CEO and CTO were both very open about the company. This was not a recruitment sales pitch. Instead I was given the warts-and-all story of where the company and technology were at and where I might fit in to help. They are also generally cool guys with great humour under the pressure of leading a startup. Sadly, I did not get too much time to spend with the two team leaders during the sprint. What time I did spend with them gave me the impression that we all saw the world in very similar ways and that we were going to work well together. Everyone else was very welcoming of an outsider, had plenty of time to spare for me to help me understand the tech and generally made me feel like part of the team. Which I wasn't. As a nice touch, after deciding to evaluate how I might fit in, I even had the opportunity to speak with the newest member of the board, Ari Helgason. It was very nice to hear the investor's voice. And to hear that voice genuinely excited about the technology...

The Technology

...we definitely need to talk about the technology. If I was going to dedicate myself to a product company, the product had to be something epic. Something I cared about at least as much as Kolab. Crate is a departure from what I have done in the past but shares one crucial aspect that I care about most... solving a real problem. At this time I am not going into too much detail about Crate and what it does. What I will say is this: as a database, Crate helps to solve some seriously complex problems in the Big Data space, yet it is so easy to deploy that, at the Snow Sprint, I had a three-node cluster running on my laptop, with a simple schema, importing data, within minutes of first sitting down to play with it. SQL with scaling, resilience, containerisation and such ease of use... it's a pretty compelling story.

Looking Forward

In future posts I will write about precisely what it is I am doing at Crate.IO. For now there is a lot of hard work ahead of me and I am definitely going to enjoy the ride. I have already started to submit presentations on Crate to various conferences, so I hope to see you soon and we can talk about what I'm up to.
I recently moved this blog away from a self-hosted Wordpress (just too much hassle to keep maintained). The good news is that this career change will present me with the opportunity to go back to some of the metrics-related content I used to work on before. So, if you have been following my nonsense for some time, you (hopefully) will be pleased to know that more oversized data visualisations are on their way.

Saturday 19 March 2016

Rush Hour Berlin: A Public Transport Survival Guide

Berliners love to complain about the public transport; it's in their blood. Within the city there are two major players: the BVG, which operates the buses, trams and U-Bahn; and Deutsche Bahn, which operates the S-Bahn. The advice I am dropping here is, I hope, an invaluable survival guide.

Waiting

You can always tell the most experienced users of public transport because they strategically place themselves in the correct position to get straight onto the vehicle they are waiting for. When the vehicle arrives don't make too much effort to get out of the way of people getting off. Bonus points for standing front-and-centre of the door and blocking their way. If you see some idiots actually waiting at the side to let people off, don't wait for them. Ultra-bonus points can be earned for barging straight past them as soon as you can, more if people are still trying to get off.

Getting On

Now this is most important. Simply getting onto the public transport is so easy to get wrong. As you board your tram/train/whatever you should look around to see if there is some space for you to stand or sit in. Then (and this is the crucial bit) just stand there. Don't bother to move. Simply huddle by the door with all the other commuters. You get bonus points for blatantly ignoring space further into the vehicle and ultra-bonus points for blocking the entrance of another passenger behind you, who is also trying to get on.

Getting Off

It is rush hour so, inevitably, your vehicle is going to be busy. Don't let that bother you. When you arrive at your exit there is a simple procedure for disembarking. Firstly, don't bother to wait. What a complete waste of your time that is! Instead, start pushing your way to the front immediately. Don't bother to check if the person in front is also trying to get off. If someone is blocking your way, first you should check to see if they are even capable of moving for you. Yes? No? It does not really matter; squeeze past them and squash them into an uncomfortable position regardless.

Payment

This is one aspect of the public transport that most people fail to understand at first. In theory, you are supposed to pay. Many people actually get annual passes from their work or school. If you're one of the other folk that are supposed to pay, don't bother. Occasionally you might encounter someone with an ID and a tricorder who grunts at you. They probably want to see the ticket you do not have. No problem. Assume your finest non-German accent and in your native tongue (the further away from German, the better) say "Good morrow, sir. By happenstance I have just had my pocketbook purloined. Can you assist me with directions to the constabulary?"

Tarif Zones

Super-simple. Berlin basically only has two zones: Zones A + B and (confusingly) Zone C. Almost any ticket covers you for both Zones A + B. Zone C is special. Why? Because it is home to Schönefeld Airport and the mythical Berlin Brandenburg Airport. Both are side-by-side and are at the first stop in Zone C. That's right, you need the extra-expensive ticket to go one stop when arriving at or departing from the airport. Because: fuck you, tourists. If you already have a valid ticket for Zones A + B (Why? Did you not read the previous section of this blog post?) then all you need is to buy an Anschlussfahrausweis. In the one stop between the airport and the freedom of Zones A + B you are almost guaranteed to have this magical little ticket checked. The correct procedure is to insert the ticket into the mouth of the person with the tricorder.

Tuesday 9 February 2016

Book: The Nature of Software Development

OK, so I still owe you a further review of Agile Estimating and Planning. I've got plenty of other books I want to talk about, too. At FOSDEM I picked up these two gems: User Story Mapping by Jeff Patton and The Nature of Software Development by Ron Jeffries. This blog post is about the latter, which I have just finished reading.

Reading To Learn

Every year I pick up one or two books at the O'Reilly stand at FOSDEM and this year was no different. Having heard a lot about it, I picked up a copy of "User Story Mapping". I've not gotten around to reading this yet, but I'm looking forward to it. I also picked up a copy of "The Nature of Software Development", a book I did not know I needed until I had a copy in my hands.

Whilst I am a huge fan of Scrum, I appreciate that being agile is more important than following any particular agile process. Ron Jeffries sees this as the "natural" way of software development and I tend to agree. Whenever I talk to people about Scrum/agility one of the things I always struggle with is describing "why". Describing "mechanical Scrum" (the raw details of the process) is normally not sufficient for getting the most out of the team. A decent agile coach really needs their team to understand "why", too.

For certain activities I have good "why" stories. Ron's book gave me all the "why"s for everything else. And that is what makes this book great.

Books With Pictures

So I need to come right out with it: I love books with pictures. My imagination is a little lazy like that. This book has a picture on every page to help illustrate the concepts being discussed and that is marvellous.

The book is separated into two parts. In the first part Ron shares his thoughts on specific aspects of "value" and how to deliver it, including understanding just what value is and understanding that it should be delivered a feature at a time. The second half of the book is a collection of essays that help to expand upon topics covered briefly in the first half: how to refactor systematically in agile projects, how to scale agility, etc.

The second part of the book has some interesting insights. But, for me, it is the first part that really contained all the gems... all the "why" information that I had been looking for. Take, for example, "why bother estimating?".

Example Of "Why?": Estimations.

When I trained to become a Scrum Master (and throughout my career) I was trained to value estimations. We estimate all sorts of things all the time in the development process. We need to know how long something is going to take. Or (more likely) how much effort is involved. Or how much it will cost.

Why?

Ron is a proponent of the NoEstimates movement. Having read this book and understood the topic better, so now am I. If we keep user stories uniformly (and predictably) small and we are always focusing on delivering the highest-value user stories immediately, why should we care about estimations? Estimations are hard, to the point that we should always accept that they are wrong. So is there anything really to be gained from them? Read this book to find out.

Before reading this book I never would have dared to try NoEstimates development. Now that I understand the topic better, I'd happily give it a try to see what the team felt about it. Perhaps estimation will become another "choose to do it or not" feature of the process, just like the Daily Scrum.

In Conclusion

The writing is clear and simple. The structure (background issues followed by in-depth essays) flows well and encourages you to keep reading: I managed to get through the book in only two sittings and a couple of hours.

Much of the content should be familiar to anyone with experience of following agile processes. It should definitely be comfy ground for anyone with Scrum Master/Product Owner experience. The insight and experience of Ron on the topics covered really show in the text: a coach's coach.

Want to hone your agile coaching skillz? Read this book.

Tuesday 26 January 2016

Crate Snow Sprint: Day 1 (Stashing Git Metadata)

[If you haven't done so already, take a quick look at what I wrote yesterday about the Crate Snow Sprint].

So yesterday was the first "real" day of the Snow Sprint. I used the time to start implementing a very rudimentary metric processing tool. To recap: I want to build a service-oriented system with arbitrarily scalable components. In the cloud. (Bingo!). Whilst this is largely a demonstration of Crate and an opportunity for me to learn about that technology, it is a serious project that I would like to see grow into something meaningful.

So here I write about the nonsense code I wrote yesterday and what I have managed to achieve in prototyping my intended solution.

Bitten By Python

Many moons have passed since I was last paid to program in Python. The language has changed in some crucial ways since then, but not so much that it is hard to update your thinking from Python 2. I always find Python a real joy to work with, so updating my own thinking really was not a chore.
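
The crucial changes I mean are the usual suspects. A trivial refresher, with my own examples rather than anything from the sprint code:

    # Python 3: print is a function, division is true division, str is unicode.
    print("commits:", 42)   # Python 2: print "commits:", 42
    ratio = 7 / 2           # 3.5 in Python 3; 3 in Python 2 (7 // 2 floors in both)
    text = "naïve"          # unicode by default in Python 3; use b"..." for bytes
    print(ratio, text)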

As well as refamiliarising myself with Python, my aim for yesterday was to get my head around the basics of certain technologies I was planning to use as part of this project, namely Flask and SQLAlchemy.

What Am I Building?

If you have yet to go back and read yesterday's blog post, now is a good time. Yesterday I envisaged a system with 4 distinct component services, each independently scalable:

  • Crate: Used as storage for Git metadata (who committed, what, when and in which project) and metric results (which metric gave what result, for which project and at what time). Depending on the number of projects and metrics we could potentially get close to "big data" territory. The reason for opting for Crate is the scaling: a system like this is going to be inserting/retrieving data concurrently in great volume. The clustering of Crate will really help with this.
  • Metric Services: Invoked according to how often they need to be run (typically hourly, daily or weekly). These services will grab metadata from Crate in order to run a metric. Simple metrics could be implemented in Python or, for more computationally intensive metrics, in C with Python bindings. These services are not constantly processing. They should only be run if a user of the system queries for metric result data that is not in the results cache.
  • Git Services: Invoked according to how timely the data for any given project must be. These services clone the project repo, run git log, parse the output and then stash the metadata of each commit in Crate before killing the clone (see the parsing sketch after this list).
  • Frontend Services: These provide REST API and web for manipulating the whole setup.
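
To make the Git Services item concrete, here is a minimal sketch of the kind of git-log parsing involved. The pretty-format string and the choice of fields are my own illustration, not the actual metre code:

    import subprocess

    # A separator byte that will not appear in commit metadata (illustrative choice).
    SEP = "\x1f"

    def commit_metadata(repo_path):
        """Yield (sha, author email, ISO timestamp) for every commit in a clone."""
        fmt = "--pretty=format:%H{0}%ae{0}%aI".format(SEP)
        log = subprocess.check_output(["git", "log", fmt], cwd=repo_path)
        for line in log.decode("utf-8").splitlines():
            sha, author, timestamp = line.split(SEP)
            yield sha, author, timestamp

    for sha, author, timestamp in commit_metadata("/tmp/some-clone"):
        print(sha, author, timestamp)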

Well... The Crate part of this is easy enough. Crate exists and works. And is extremely simple to deploy (read: "plays extremely well with Docker"). Yesterday my aim was to create a pipeline for getting metadata from Git into Crate.

A Few Things To Show Off

Firstly, you can go and grab my code on Github: https://github.com/therealpadams/metre

What you will find in there:

  • create_commits.py: A script for invoking git-log, parsing the output and then calling the REST API for inserting the metadata.
  • models.py: Contains all the classes for mapping in SQLAlchemy as well as creating the table. In this case just one table, for storing commit metadata.
  • pipelines.py: Provides the functions called by the REST API. At this point I have simple functions for inserting commits and closing the transaction.
  • requirements.txt: Should be familiar to anyone who has worked with Python virtualenv and pip... contains a list of the dependencies to be installed.
  • urls.py: Provides the REST API using Flask. At this moment it exposes one simple function for receiving log metadata and stashing it in Crate (a rough sketch follows this list).
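
For the curious, here is a hedged sketch of roughly what such an endpoint looks like. The route, payload fields and table name are my own guesses, and I use the raw crate-python driver for brevity where the real code goes through the models.py SQLAlchemy mappings:

    from crate import client  # the crate-python driver
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/commits", methods=["POST"])
    def add_commit():
        # Field names are illustrative; the real schema lives in models.py.
        commit = request.get_json()
        cursor = client.connect("http://localhost:4200").cursor()
        cursor.execute(
            "INSERT INTO commits (sha, author_email, committed_at, project) "
            "VALUES (?, ?, ?, ?)",
            [commit["sha"], commit["author_email"],
             commit["committed_at"], commit["project"]],
        )
        return jsonify({"status": "ok"})

    if __name__ == "__main__":
        app.run()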

Right now, you can run the urls script and fire in data from the create_commits script... and it will fail. I will fix this all before the end of the day. If all goes well, by the end of the day there will be a simple metric (commits per day?) script in the repo, too.

This is far, far, far from great code. At the moment my aim is simply to learn the technologies I am working with and have a play with the overall pipeline for the data. After this playground edition is complete, I will stash it somewhere by itself in the repo and go about hacking "the real thing". Although that is not likely to happen until after FOSDEM.

Colophon:

Another header photo taken at the Crate Snow Sprint. Again, thanks to Crate.io for sponsoring my trip to Austria.

Monday 25 January 2016

Crate Snow Sprint: Day 0 (I Need Help)

Now for something a little different: Crate. Thanks to the kind sponsorship of Crate.io, I am attending the annual Snow Sprint. This is an event that has been in existence for many years (certainly over a decade); originally it was a get-together for Zope/Plone developers. Those of you with very long memories might remember that I used to be part of the Plone community and even worked for Zope Europe Association at one point. The Plone Snow Sprint is still a "thing". But, with former Zope/Plone developers involved in the company/community, the Crate Snow Sprint is also a "thing".

Getting To Know Crate

Crate is a high-performance, distributed database. Very easy to deploy (think: "the database storage for Docker") and designed to help manage subsets of very large datasets (Clusters of 100s of nodes? Why not!? Petabytes of data? Hells yeah! Hundreds of billions of rows? Come get some.)

From the marketing blurb:

Crate has been designed to be a highly distributed high performing database. Before Crate organizations had to compromise on performance if they wanted to keep the ease of use benefits of using SQL stores, or move to a No-SQL store and deal with the complexities of the query languages and rewriting their code. With Crate you get the best of both worlds: the No-SQL performance you require and the SQL syntax you want. Crate can be used for a variety of use cases, from a classic but scalable SQL database, to advanced usage incorporating full text search, geo shape and analytics support.

Big Data, of course, means we are not talking "classic" SQL. No foreign key support, for example; foreign keys just do not scale very well. Instead, all related data should just get fired into the same table. What you lose in space efficiency you gain in speed. Lots of speed.
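
As a hedged illustration of what that denormalisation looks like in practice (the table and column names here are mine, not from any real schema), the author details simply get repeated on every commit row instead of being normalised out behind a foreign key:

    from crate import client

    cursor = client.connect("http://localhost:4200").cursor()

    # One wide table: no separate authors table, no foreign key, just repeated data.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS commits (
            sha STRING PRIMARY KEY,
            project STRING,
            author_name STRING,
            author_email STRING,
            committed_at TIMESTAMP
        ) CLUSTERED INTO 6 SHARDS
    """)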

Why Do You Care About Crate, Paul?

The Big Data space is full of incredible technology and is evolving fast. With the growth in Big Data we have seen a growth in technologies to enable Big Data; containers are a huge part of this. If you are not clued-in on containers, go take a look at what Wikipedia has to say on the subject. Until now containerisation (that's a word, right?) has really been focused on application deployment. Crate is the primary contender as the storage for containers, allowing extremely easy deployments in order to arbitrarily grow the clusters. As I said, rolling out hundreds of nodes in a Crate deployment is really not hard to do.

So why do I care? Who here remembers the SQO-OSS project? The premise of this project was simple: create a system to measure the quality of a piece of Free Software, where "quality" was defined by any arbitrary collection of metrics by the user. The system maintained clones of SVN repositories and regularly ensured they were up-to-date. Metric scripts would then be run against these repositories and an SQL store would keep the results which could then be viewed through a web client.

This project, like almost every EC-funded project I have ever worked on, was a successful failure. We built exactly what we intended to build and helped develop knowledge on software quality (metrics), and we have an extraordinary number of publications to prove it. I say "failure", however, because the architecture of the system we built was heavily tied to the hardware purchased for the project (i.e. one seriously fat SPARC server from Sun). Arbitrary scaling of the datastore, processing and front-end? Never taken into consideration. Ironically, the backbone of the tool we developed was Equinox, which could have enabled such an architecture. We really did not make best use of that technology, however.

The... result.... was....... very........... slow.

Why do I care about Crate? Because I want to fix ^ that problem. In short: I want to create an arbitrarily scalable solution for metric processing, result caching and visualisation in a timely fashion.

So What Do You Have In Mind, Paul?

I want to create a solution that is arbitrarily scalable to the needs of those using it. To that end I envisage four discrete components that can be scaled according to need:

  • Front-end nodes using Python + Flask.

REST and web front-ends for data input, retrieval and visualisation. A typical deployment will not need many of these, I guess. However, a public deployment of the system for a popular Free Software project might well need extra oomph here. In the case of data retrieval, data will be grabbed from the results cache or, if the result is not available yet, a metric processing node will be triggered.

  • Data entry nodes.

These would simply be responsible for gathering the metadata from (probably just git, at first) repositories and entering the data in the (potentially very large, but hardly "Big Data") table in the data storage. Python + SQLAlchemy here. These would need to be scaled up with the number of projects being analysed.

  • Metric processing nodes.

These will be scripts that process data from the metadata store and stash the result in the results cache. I mostly envisage Python for basic metrics, or C/C++ libraries for anything heavyweight, with dispy for process distribution (a minimal sketch of such a metric follows this list).

  • Storage nodes.

Err... Crate.
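
To show what I mean by a metric processing node, here is a minimal sketch of the daily-commit-count idea, written against a hypothetical commits table (Crate's date_trunc does the bucketing server-side); the names and schema are mine for illustration:

    from crate import client

    def commits_per_day(project):
        """A trivial metric: daily commit counts straight from the metadata store."""
        cursor = client.connect("http://localhost:4200").cursor()
        cursor.execute(
            "SELECT date_trunc('day', committed_at), count(*) "
            "FROM commits WHERE project = ? "
            "GROUP BY date_trunc('day', committed_at) "
            "ORDER BY 1",
            [project],
        )
        return cursor.fetchall()

    for day, total in commits_per_day("metre"):
        print(day, total)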

Where Am I At?

Well, it is a long time since I did any "real" Python programming. Or any programming for that matter. So I'm starting from the very beginning and getting my head around the technology stack that I am envisioning.

As part of my work at the Snow Sprint, yesterday, I started work on some basic code for data entry using Python and SQLAlchemy, which works very nicely with Crate. The fruits of my labour can be found on Github (promise not to laugh!). By the end of today (Day 1) I hope to have a nice pipeline: git repo -> metadata -> data entry node -> crate. Then tomorrow I will implement a basic metric (daily commit count?) and my work for the Snow Sprint will be done. At the moment I have no documentation tucked away in there; I will sort this out sometime after the sprint.

For me the next step will be to really nail down the basic architecture, ensuring ease of deployment and scaling. Once I have my head around that, building the actual v0.1 system will not be too hard, I think. At least not for very simple metrics.

Call For Help

At the moment the system I am building is nothing more than an interesting demo for Crate (it is really showcasing simplicity rather than, say, scalability, since this will never be a real Big Data application). However, as someone who cares about software/community/developer metrics I would love to see this mini-project turn into something real. If you have an interest in helping me develop this into a real tool, something that developers could really benefit from, I would be very happy to hear from you and get this thing moving. I have not picked a license yet, because I want to engage the whole project team in that discussion, if I manage to grow one! :) But, most definitely, Free Software.

If you have an interest in metrics and would like to talk to me about this project, feel free to reach me by any means sensible (all my social media/email links are at the top of the page). Alternatively, if you are there, feel free to grab me for a chat at either the FLOSS Community Metrics Meeting or FOSDEM later this week.

Colophon:

The header image on this page is a photo I took during the setup of the snow sprint. As you might imagine, it is not enough (or sensible) for us to make use of the router provided in the chalet. Lots of cabling everywhere for our own network, of course! I will post more photos later.