When I'm diving into a new codebase, I always start by analyzing the project structure, and my favorite tool for that is tree. However, not every project is perfectly balanced: some files and folders tend to be more popular and contain much more code than others. It looks like yet another incarnation of the Pareto principle.

So, when tree's capabilities aren't enough, I jump to cloc. This tool is much more powerful: it shows nice textual statistics on the number of lines of code and the programming languages used, either for the whole project or for each file individually.

However, some projects are really huge, and a nice visualization would be truly helpful! This is where FlameGraph comes in. What if we feed cloc's output for the Kubernetes codebase to FlameGraph? Thanks to the author of this article for the original cloc-to-flamegraph one-liner:

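# Clone FlameGraph into ~/FlameGraph, where the pipeline below looks for flamegraph.pl
git clone https://github.com/brendangregg/FlameGraph ~/FlameGraph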
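# Fetch the Kubernetes sources into the classic GOPATH layout
# (--depth=1: only the latest snapshot is needed for counting lines)
git clone --depth=1 https://github.com/kubernetes/kubernetes "$(go env GOPATH)/src/github.com/kubernetes/kubernetes"

cd "$(go env GOPATH)/src/github.com/kubernetes/kubernetes"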

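# cloc prints one tab-separated row per file: language, filename, blank, comment, code.
# Skip cloc's leading header lines, keep the filename and code-line count, and turn
# "/" into ";" so flamegraph.pl treats every directory as a stack frame.
cloc --csv-delimiter="$(printf '\t')" --by-file --quiet --csv . | \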
    sed '1,2d' | \
    cut -f 2,5 | \
    sed 's/\//;/g' | \
    ~/FlameGraph/flamegraph.pl \
        --width=3600 \
        --height=32 \
        --fontsize=8 \
        --countname=lines \
        --nametype=package \
    > kubernetes.html

# open is macOS-specific; on Linux use xdg-open instead
open kubernetes.html
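The core of the one-liner is the cut/sed pair in the middle, which turns cloc rows into the "folded stacks" format that flamegraph.pl consumes: semicolon-separated frames followed by a count. The sketch below wraps just those two stages in a hypothetical helper, cloc_to_folded, and runs it on two invented rows laid out like cloc's tab-delimited --by-file output. The function name, file names, and counts are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the cut/sed stages of the one-liner, as a function.
# Input: tab-separated cloc rows (language, filename, blank, comment, code),
#        with cloc's header lines already removed.
# Output: "frame;frame;file<TAB>count", the folded format flamegraph.pl reads.
cloc_to_folded() {
    cut -f 2,5 |        # keep only the filename and the code-line count
        sed 's/\//;/g'  # path separators become stack-frame separators
}

# Two invented rows: language, filename, blank, comment, code
printf 'Go\t./pkg/kubelet/kubelet.go\t10\t20\t300\nGo\t./cmd/kubeadm/app.go\t1\t2\t30\n' | cloc_to_folded
```

Running it prints `.;pkg;kubelet;kubelet.go` with 300 and `.;cmd;kubeadm;app.go` with 30, each separated from its count by a tab. flamegraph.pl merges rows that share a prefix, so each directory frame ends up as wide as the total number of code lines beneath it.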

What I learned about the Kubernetes codebase in 5 minutes:

  • It's about 4 000 000 lines of (mostly Go) code, excluding comments and blank lines.
  • Nearly half of these lines come from the vendor folder.
  • Five vendor packages (golang.org/x, google.golang.org/api/compute, vmware, azure, aws) account for up to 60% of the vendor folder.
  • Around a quarter of the source code comes from the staging folder. This folder is a staging area for packages that have been split into their own repositories. It seems there is a plan to make the main Kubernetes repo thinner.
  • Around one-third of the staging folder belongs to various APIs (admission, authorization, autoscaling, core, etc.).
  • kubectl mostly lives in the staging folder, i.e., it has almost entirely been moved to a separate repo.
  • kubelet and kube-proxy (still?) live in pkg.
  • kubeadm is the largest tenant of the cmd folder, even though it accounts for only 1.2% of the source code.
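The flame graph shows each frame's line count and share when you hover over it. For a plain-text cross-check of shares like these, one option is to aggregate the same filename/count stream (what `cut -f 2,5` produces in the one-liner) per top-level directory. The awk sketch below does that; the helper name lines_by_top_dir and the sample rows are hypothetical, and it assumes paths relative to the repository root, optionally prefixed with ./ as in the folded output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum code lines per top-level directory and print shares.
# Input: "path<TAB>code_lines" rows, e.g. the output of `cut -f 2,5` above.
# Output: "dir<TAB>lines<TAB>share%", sorted by line count, largest first.
lines_by_top_dir() {
    awk -F '\t' '
        # skip rows with an empty path or a non-numeric count
        $1 != "" && $2 ~ /^[0-9]+$/ {
            path = $1
            sub(/^\.\//, "", path)            # drop a leading "./"
            n = split(path, parts, "/")
            top = (n > 1) ? parts[1] : "."    # files at the root go under "."
            sum[top] += $2
            total += $2
        }
        END {
            if (total == 0) exit              # nothing to report
            for (d in sum) printf "%s\t%d\t%.1f%%\n", d, sum[d], 100 * sum[d] / total
        }
    ' | sort -t "$(printf '\t')" -k2,2nr
}

# Invented sample rows
printf './vendor/a.go\t600\n./pkg/b.go\t300\n./main.go\t100\n' | lines_by_top_dir
```

For the sample input it prints vendor (600 lines, 60.0%), pkg (300 lines, 30.0%), and . (100 lines, 10.0%), largest first. The empty-path and numeric-count filter is there so that stray header or summary-like rows do not get counted as a directory.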

Give it a chance and see what interesting insights you'll find!
