Anyone who works in a field related to computer software, ends up running a bunch of CLI (command line interface) commands pretty often. The longer you stay in the field, the larger is your command set. With new systems being invented everyday, the complexity of commands being run is also increasing. Day by day, I am finding myself running more and more complex commands. That's when I decided I need something to manage these commands for me. Shell history is great, but it is not portable and reliable. So, I have built pin, a portable commands organizer. Pin is available for use at https://github.com/harshadjs/pin. In this post, I'll go over some of the key design choices I made while building this tool.

Portability and the Ease of First Use

I often run complex commands on resource constrained or ill configured environments which don't retain any state (ephemeral VMs) or minimalstic chrooted environments - such as a custom kernel running with minimal debian image with no network configuration under qemu virtualization. This often means installing a collection of libraries and dependencies is not feasible / convenient. Therefore, the primary goals I had in my mind while designing pin were portability and the ease of first use.

So, given the above constraints, how does pin provide ease of first use and portability?

Pin is just one shell script that relies on very basic shell programs (such as base64 or sha1sum) for doing most of its work. Therefore, pin can be run in almost any environment where you have shell access; and given that you are trying to remember your shell commands, you … well - do have shell access.

This makes pin installation a very simple process. Just use wget or curl to download pin script and you are all set to use it! For environments where internet connectivity is not available (often inside qemu while testing custom kernel), you can still paste the compressed and base64 encoded version of the pin script, which gets decompressed and decoded by simple shell commands allowing you to use pin.

Self Identifying Objects

Another design choice that I made was to store commands as files in the following format which are named as the SHA1 hash of the command (CMD) itself:

CMD="<base64 encoding of "find . -type f -iname *.[ch] | etags -">"
DESC="Generate emacs tags"
TYPE=cmd

This implies that there's inherent dedpulication of all the commands stored in pin. Also, self-identification of the object prevents accidental overwrites of command files.

Output Caching

Lastly, even though shell is very portable and requires the least number of dependencies in order to be used, it is definitely slower compared to say - a python script or a C/C++ compiled program. Since in the write path, pin mostly performs writes or updates to fixed set of files, the slowness of shell doesn't affect the write path much.

However, there's a slight impact on the read path. For example, the pin grep command which searches for occurence of a term in the database of commands (which are base64 encoded) can be slow if the number of commands is very high. In order to deal with this problem, pin implements output caching which allows it to significantly reduce read path latencies. pin stores output of grep queries in /tmp folder. There's some more work needed to maintain and perhaps configure the size of the cache, but in my experience, the number of such commands that I run in a reboot cycle has not resulted in any significant bloating of the cache.

Links to this note

jekyll  tech 
comments powered by Disqus