Trade Ideas Blog

Simple Effective Shell Scripting

Apr 9, 2016

Shell scripting can be wonderful.  Microsoft is in the news for adding Linux shell scripting to Windows 10.  In honor of that, let’s take a look at Linux shell scripts in action.

Windows 10 now includes the Linux shell prompt.


I’m not talking about those huge, completely unreadable scripts that we’ve all run across.  I’m talking about that list of instructions written in English that you eventually turned into a shell script.  You got tired of doing the same boring thing over and over, yet you still had to pay attention to make sure you did it right.  Of course, if you have multiple servers, and they should be identical, these shell scripts are essential.


Let’s take a specific example.  This script copies files from the build directory and restarts the server once the new files are in place.  We have several variations on this theme.  This particular one is for our microservices proxy.


#!/bin/tcsh


mv -f micro_proxy micro_proxy.1
cp micro_proxy_devel micro_proxy


mv init.tcl init.tcl.1
cp init.tcl_devel init.tcl


echo "Verifying that init.tcl is current."
cvs -n update -A ../../source/micro_proxy/init.tcl


# Check the config file.  If it fails the first part will print a message.
# Otherwise, kill the running one and let the loop in ./start handle it.
# If there was nothing running, print an error.
./micro_proxy -i test_config=init.tcl && killall micro_proxy


It’s not that long.  The whole thing is listed above.  Now let’s look at it in detail.


#!/bin/tcsh
Start by running the tcsh shell.  There are plenty of arguments over which shell is the best.  I mostly use tcsh because I’ve got some history with it and I know it best.


mv -f micro_proxy micro_proxy.1
This may be the single most important step.  Even if you’re not using a shell script, you should still do this.  micro_proxy is the name of my executable.  micro_proxy.1 is the last version that’s been running for a while.  If something goes wrong on a production server I need to have the previous version handy so I can roll back ASAP.  I don’t want to look at CVS or a debugger.  After rolling back I can explore the problem in detail.
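The rollback itself is not shown in the original script, so here is a minimal sketch of what it might look like.  It is written in POSIX sh rather than tcsh, because tcsh has no functions.  The function name rollback and the .bad suffix are assumptions made for illustration only.

```sh
# Hypothetical rollback helper: put the ".1" copy saved by the upgrade
# script back in place.  Not part of the original script.
rollback() {
    name=$1
    if [ ! -e "$name.1" ]; then
        echo "No backup found for $name" >&2
        return 1
    fi
    # Keep the failing build for later inspection (".bad" is an assumed suffix).
    if [ -e "$name" ]; then
        mv -f "$name" "$name.bad"
    fi
    mv -f "$name.1" "$name"
}
```

Under those assumptions, something like "rollback micro_proxy && killall micro_proxy" would restore the previous executable and then let the restart loop pick it up.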


cp micro_proxy_devel micro_proxy
Here’s the obvious part.  Copy the executable file.  micro_proxy_devel is a symbolic link to the source/build/test directory.  A different shell script created the symbolic links, log directory, and other things that can’t be stored in CVS.  I ran that once, right after I used CVS to create this directory.


mv init.tcl init.tcl.1
cp init.tcl_devel init.tcl
Do the same thing for our config file.  We often update the config file and the executable at the same time.  Even if only one changed, it doesn’t hurt to copy them both.
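Since the same two steps are applied to both files, they could be folded into one helper.  This is only a sketch in POSIX sh (tcsh has no functions), and install_with_backup is a made-up name.  Unlike the original lines, it skips the backup step when there is nothing to back up yet.

```sh
# Hypothetical helper: save the current file as FILE.1, then copy the new one in.
install_with_backup() {
    src=$1
    dest=$2
    # Save the version that has been running, if there is one.
    if [ -e "$dest" ]; then
        mv -f "$dest" "$dest.1" || return 1
    fi
    # cp follows a symbolic link given as the source, so a link such as
    # micro_proxy_devel is copied as a regular file.
    cp "$src" "$dest"
}
```

With that in place, the four lines would shrink to "install_with_backup micro_proxy_devel micro_proxy" and "install_with_backup init.tcl_devel init.tcl".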


echo "Verifying that init.tcl is current."
cvs -n update -A ../../source/micro_proxy/init.tcl
This is a relatively new addition.  This has worked so well I copied it to several other upgrade scripts.


The config file should be taken from CVS.  It’s easy to forget to update from CVS.  In some cases a config file is a shared resource, so it might not be in the same directory as the rest of the source files.  If you don’t copy the right files to production that always causes confusion.


Notice the -n flag.  That means to report what CVS would normally report, but don’t actually make any changes.  I type “cvs -n update” or “cvs -nq update” all the time to get status.  This gives you a report similar to what most CVS GUIs will give you.  In short, it says nothing if the file is already up to date, but prints a message if the file needs to be updated.  Unfortunately CVS doesn’t offer this information using a return code, so whoever runs the script has to notice the message and react to it.
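If you wanted a return code anyway, one option is to treat any output at all as “not current.”  The sketch below is in POSIX sh, not tcsh, and both function names are invented for this example.  It relies only on the behavior described above: silence means up to date.

```sh
# Succeeds (exit 0) only when the report on stdin has no non-blank lines.
report_is_clean() {
    ! grep -q '[^[:space:]]'
}

# Hypothetical wrapper around the check in the script.  Error messages
# are folded into the report, so a CVS failure also counts as "not clean".
verify_current() {
    cvs -nq update -A "$1" 2>&1 | report_is_clean
}
```

A script could then run "verify_current ../../source/micro_proxy/init.tcl || echo WARNING" instead of relying on someone to spot the message.  The drawback is that the message itself is hidden, so you would probably want to print it as well.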


Notice the -A flag.  Normally if you tell CVS to grab a specific version of a file, that’s sticky.  Every time you call cvs update again, it assumes you want to keep the same version.  cvs update -A means you want the latest version.  Of course, in this script it’s just a warning.  If you did a sticky update from CVS, this will remind you.  You can decide on your own whether or not to ignore it.
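For the curious, CVS records stickiness in the CVS/Entries file of the working directory, as the last slash-separated field of each line (a T followed by a tag or revision, or a D followed by a date).  The sketch below reads that field directly.  It is an illustration in POSIX sh, not something the upgrade script does, and sticky_of is a made-up name.  Running “cvs status” on the file shows the same information in its “Sticky Tag” line.

```sh
# Print the sticky tag or date recorded for FILE in an Entries file.
# Exits 0 if the file is sticky, 1 otherwise.
# Entries lines look like: /name/revision/timestamp/options/tagdate
sticky_of() {
    entries=$1
    file=$2
    awk -F/ -v f="$file" '
        $2 == f && $6 != "" { print $6; found = 1 }
        END { exit !found }
    ' "$entries"
}
```

For example, "sticky_of CVS/Entries init.tcl" would print something like T1.4 if someone had earlier run "cvs update -r 1.4 init.tcl" in that directory.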


Think about what that last paragraph is saying.  One day I wanted an old config file so I used CVS to find it.  I totally forgot I did that.  (Or maybe one of my colleagues did it!)  Later I wanted to upgrade to the latest build.  Every time I told it to upgrade, the upgrade script worked without any warnings or errors.  I couldn’t figure out why this one production server was failing while the test server and the other production servers worked.  I really got burned.  Copying tested code to a production server should not be exciting!  Eventually I found the problem.  And I added these two lines to the script so that problem would never burn me again.


# Check the config file.  If it fails the first part will print a message.
# Otherwise, kill the running one and let the loop in ./start handle it.
# If there was nothing running, print an error.
./micro_proxy -i test_config=init.tcl && killall micro_proxy
Next we verify that the config file is valid.  The program always tests the config file as soon as it starts.  If there’s a serious error it will print something to the standard output and immediately exit.  That’s great in test, but what about production?  In production I want to know if there’s a problem with the new server before I stop the old server.


That last test is unique to this server.  Most of the config files are simple enough that they don’t need this.


After the test you see &&.  That’s the only conditional in this script.  (Notice the word “simple” in the title of this article.)  If the test fails, && will not run the second command.  If the test succeeds, then we restart the server.
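To see the && behavior in isolation, here is a small sketch in POSIX sh.  The two helper functions are stand-ins invented for this example so that it runs anywhere.  They are not the real micro_proxy check or the real killall.

```sh
# Stand-in validator: here "valid" just means the file is not empty.
check_config() {
    [ -s "$1" ] || { echo "bad config: $1"; return 1; }
}

# Stand-in for "killall micro_proxy".
restart_server() {
    echo "restarting"
}

# The restart runs only if the check exits with status 0.
upgrade_step() {
    check_config "$1" && restart_server
}
```

Calling upgrade_step on a non-empty file prints “restarting”.  Calling it on an empty file prints only the “bad config” message, and the restart never happens.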


We use killall to restart the server.  That’s a common trick.  Most of our production servers run in a loop.  There’s another shell script that starts running as soon as the machine reboots.  That script does some one-time setup, then starts the server.  If the server dies for any reason, the script records that in a log file, sleeps for about a second, then starts the server again.  That loop is helpful for a lot of reasons.  Restarting the server becomes trivial.
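The ./start script is not shown here, so the following is only a guess at the shape of such a loop, written in POSIX sh.  The function name supervise, the MAX_RUNS argument, and the RESTART_DELAY variable are all assumptions.  MAX_RUNS exists only so the sketch can be tried out safely.  A real start script would presumably loop forever.

```sh
# supervise LOGFILE MAX_RUNS COMMAND [ARGS...]
# Run COMMAND; whenever it exits, log that, pause, and run it again.
supervise() {
    logfile=$1
    max_runs=$2
    shift 2
    runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        status=0
        "$@" || status=$?
        runs=$((runs + 1))
        echo "$(date) '$1' exited with status $status" >> "$logfile"
        # Pause so that a server that dies instantly does not spin the CPU.
        sleep "${RESTART_DELAY:-1}"
    done
}
```

With a loop like this in place, killall simply makes the current run exit, and the next pass through the loop starts the freshly copied executable.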


This script covers the common case.  I almost always use it to do an upgrade.  But there are so many options.  For example, what if I manually change the config file in place, rather than copying a tested version from somewhere else?  I know at a bare minimum I want to run the syntax check on the config file before I restart.  I might not remember that off the top of my head.  What do I do?  I look at the script to see what it normally does.  It’s like a checklist.  I skip most of it.  I copy and paste the last line directly to the command prompt.

This shell script serves as documentation and a checklist, in addition to being a runnable script.  This doesn’t just save time.  This helps me avoid making the same mistake twice.  Shell scripting is good.  Life is good.