<shell-discuss@opensolaris.org>
.
This document describes the shell coding style used for all the SMF script changes integrated into (Open)Solaris.
All new SMF shell code should conform to this coding standard, which is intended to match our existing C coding standard.
When in doubt, think "What would be the C-style equivalent?" and "What does the POSIX (shell) standard say?"
Similar to cstyle, the basic format is that all lines are indented by TABs or eight spaces, and continuation lines (which in the shell end with "\") are indented by an equivalent number of TABs and then an additional four spaces, e.g.

    cp foo bar
    cp some_realllllllllllllllly_realllllllllllllly_long_path \
        to_another_really_long_path

The encoding used for the shell scripts is either ASCII or UTF-8; alternative encodings are only allowed when the application requires this.
Shell comments are preceded by the '#' character. Place single-line comments in the right-hand margin. Use an extra '#' above and below the comment in the case of multi-line comments:

    cp foo bar      # Copy foo to bar

    #
    # Modify the permissions on bar. We need to set them to root/sys
    # in order to match the package prototype.
    #
    chown root bar
    chgrp sys bar

The proper interpreter magic for your shell script should be one of these:
#!/bin/sh
    Standard Bourne shell script.
#!/bin/ksh -p
    Standard Korn shell 88 script. You should always write ksh scripts with -p so that ${ENV} (if set by the user) is not sourced into your script by the shell.
#!/bin/ksh93
    Standard Korn shell 93 script (-p is not needed since ${ENV} is only used for interactive shell sessions).
Harden your script against unexpected (user) input, including command line options, filenames with blanks (or other special characters) in the name, or file input
Use builtin commands if the shell provides them. For example, ksh93s+ (ksh93, version 's+') delivered with Solaris (as defined by PSARC 2006/550) supports the following builtins:
basename, cat, chgrp, chmod, chown, cmp, comm, cp, cut, date, dirname, expr, fds, fmt, fold, getconf, head, id, join, ln, logname, mkdir, mkfifo, mv, paste, pathchk, rev, rm, rmdir, stty, tail, tee, tty, uname, uniq, wc, sync
Those builtins can be enabled via $ builtin name_of_builtin # in shell scripts. Note that ksh93 builtins implement exact POSIX behaviour, while some commands in the Solaris /usr/bin/ directory implement pre-POSIX behaviour. Add /usr/xpg6/bin/:/usr/xpg4/bin before /usr/bin/ in ${PATH} to test whether your script works with the XPG6/POSIX versions.
Use blocks and not subshells if possible, e.g. use $ { print "foo" ; print "bar" ; } # instead of $ (print "foo" ; print "bar") # - blocks are faster since they do not need to save the subshell context (ksh93) or trigger a shell child process (Bourne shell, bash, ksh88 etc.).
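Besides speed, the two forms differ in variable scope, which is worth keeping in mind when converting one into the other. The following is a minimal sketch with illustrative variable names (not taken from any SMF script); it uses only POSIX constructs so it should behave the same in all shells mentioned above:

```shell
count=0

# Block: runs in the current shell, so the assignment is kept.
{ count=$((count + 1)) ; }
after_block=${count}

# Subshell: runs in a copy of the environment, so the assignment is lost.
( count=$((count + 10)) )
after_subshell=${count}

printf 'after block: %s, after subshell: %s\n' "${after_block}" "${after_subshell}"
```

Both values are 1: the block changed "count", the subshell did not.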
Use long options for "set", for example instead of $ set -x # use $ set -o xtrace # to make the code more readable.
Use $(...) instead of `...` - `...` is an obsolete construct in ksh+POSIX sh scripts and $(...) is a cleaner design, requires no escaping rules, allows easy nesting etc.
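The nesting argument can be illustrated with a small sketch; the path is only an example value:

```shell
path="/usr/xpg4/bin/sh"

# Obsolete form: the inner backticks must be escaped.
parent_old=`basename \`dirname "${path}"\``

# Preferred form: nests without any escaping.
parent_new="$(basename "$(dirname "${path}")")"

printf '%s %s\n' "${parent_old}" "${parent_new}"
```

Both variables end up holding "bin", but only the second form stays readable when a further level of nesting is added.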
${ ...;}-style command substitutions
ksh93 has support for an alternative version of command substitutions with the syntax ${ ...;} which does not run in a subshell.
Always put the result of $( ... ) or ${ ...;} in quotes (e.g. foo="$( ... )" or foo="${ ...;}") unless there is a very good reason for not doing it.
Scripts should always set their PATH to make sure they do not use alternative commands by accident (unless the value of PATH is well-known and guaranteed to be set by the caller).
Scripts should make sure that commands in optional packages are really there, e.g. add a "precheck" block in scripts to avoid later failure when doing the main job.
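One possible shape for such a "precheck" block is sketched below. The function name "precheck" and the command name "no_such_tool_xyz" are made up for illustration; "command -v" is the POSIX way to ask whether a command can be found.

```shell
# Hypothetical helper: prints the names of all requested commands that
# cannot be found via ${PATH}; prints nothing if all are present.
precheck()
{
        missing=""
        for cmd in "$@" ; do
                if ! command -v "${cmd}" >/dev/null 2>&1 ; then
                        missing="${missing}${missing:+ }${cmd}"
                fi
        done
        printf '%s' "${missing}"
}

# "no_such_tool_xyz" is a placeholder for a command from an optional package.
not_found="$(precheck ls cat no_such_tool_xyz)"
if [ -n "${not_found}" ] ; then
        printf 'missing commands: %s\n' "${not_found}" >&2
fi
```

A real script would typically abort at this point instead of only printing a message.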
Check how boolean values are used in your application.
For example:

    mybool=0
    # do something
    if [ $mybool -eq 1 ] ; then do_something_1 ; fi

could be rewritten like this:

    mybool=false    # (valid values are "true" or "false", pointing
                    # to the builtin equivalents of /bin/true or /bin/false)
    # do something
    if ${mybool} ; then do_something_1 ; fi

or

    integer mybool=0        # values are 0 or 1
    # do something
    if (( mybool==1 )) ; then do_something_1 ; fi

Shell scripts operate on characters and not bytes. Some locales use multiple bytes (called "multibyte locales") to represent one character
ksh93 has support for binary variables which explicitly operate on bytes, not characters. This is the only allowed exception.
Think about whether your application has to handle file names or variables in multibyte locales and make sure all commands used in your script can handle such characters (e.g. lots of commands in Solaris's /usr/bin/ are not able to handle such values - either use ksh93 builtin constructs (which are guaranteed to be multibyte-aware) or commands from /usr/xpg4/bin/ and/or /usr/xpg6/bin).
Only use external filters like grep/sed/awk/etc. if a significant amount of data is processed by the filter or if benchmarking shows that the use of builtin commands is significantly slower (otherwise the time and resources needed to start the filter are out of proportion to the amount of data being processed, creating a performance problem).
For example:

    if [ "$(echo "$x" | egrep '.*foo.*')" != "" ] ; then do_something ; fi

can be re-written using ksh93 builtin constructs, saving several |fork()|+|exec()|'s:

    if [[ "${x}" == ~(E).*foo.* ]] ; then do_something ; fi

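Where the script has to stay within plain POSIX sh and "[[ ... ]]" is not available, a "case" statement gives a comparable fork-free match. This is a sketch of that alternative, not the ksh93 form recommended above; the function name is hypothetical.

```shell
# Returns 0 if the argument contains "foo", without starting grep.
contains_foo()
{
        case "$1" in
        *foo*)  return 0 ;;
        *)      return 1 ;;
        esac
}

x="some foo value"
if contains_foo "${x}" ; then
        matched=true
else
        matched=false
fi
printf 'matched=%s\n' "${matched}"
```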
If the first operand of a command is a variable, use -- for any command that accepts this as end of arguments to avoid problems if the variable expands to a value starting with -.
At least print, /usr/bin/fgrep, /usr/bin/grep, /usr/bin/egrep support -- as "end of arguments"-terminator.
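A short sketch of the problem this avoids, using grep and an illustrative pattern value that happens to look like an option:

```shell
pattern="-v"

# Without "--" grep would treat the value of ${pattern} as an option.
match="$(printf '%s\n' '-v is verbose' 'other line' | grep -- "${pattern}")"

printf '%s\n' "${match}"
```

With "--" in place, grep searches for the literal text "-v" and returns only the first input line.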
Use $ export FOOBAR=val # instead of $ FOOBAR=val ; export FOOBAR # - this is much faster.
Use a subshell (e.g. $ ( mycmd ) #) around places which use set -- $(mycmd) and/or shift unless the variable affected is either a local one or it is guaranteed that this variable will no longer be used (be careful with loadable functions, e.g. ksh/ksh93's autoload).
Be careful with using TABs in script code; they are not portable between editors or platforms.
If you use ksh93, use $'\t' to include TABs in sources, not the TAB character itself.
If you have multiple points where your application exits with an error message, create a central function for this, e.g.

    if [ -z "$tmpdir" ] ; then
            print -u2 "mktemp failed to produce output; aborting."
            exit 1
    fi
    if [ ! -d $tmpdir ] ; then
            print -u2 "mktemp failed to create a directory; aborting."
            exit 1
    fi

should be replaced with

    function fatal_error
    {
            print -u2 "${progname}: $*"
            exit 1
    }
    # do something (and save ARGV[0] to variable "progname")
    if [ -z "$tmpdir" ] ; then
            fatal_error "mktemp failed to produce output; aborting."
    fi
    if [ ! -d "$tmpdir" ] ; then
            fatal_error "mktemp failed to create a directory; aborting."
    fi

Think about using $ set -o nounset # by default (or at least during the script's development phase) to catch errors where variables are used when they are not set (yet), e.g.

    $ (set -o nounset ; print ${foonotset})
    /bin/ksh93: foonotset: parameter not set

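The same check can be exercised from within a script. The sketch below runs each expansion in a subshell so that the failure does not end the calling script; the variable name is arbitrary and assumed to be unset.

```shell
unset surely_unset_variable

# Plain expansion of an unset variable fails under nounset.
if ( set -o nounset ; : "${surely_unset_variable}" ) 2>/dev/null ; then
        plain_ok=yes
else
        plain_ok=no
fi

# With a default value the same expansion is accepted under nounset.
if ( set -o nounset ; : "${surely_unset_variable:-fallback}" ) 2>/dev/null ; then
        default_ok=yes
else
        default_ok=no
fi

printf 'plain: %s, with default: %s\n' "${plain_ok}" "${default_ok}"
```

The "${name:-default}" form is a convenient way to keep nounset enabled for variables that are legitimately optional.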
Avoid using eval unless absolutely necessary. Subtle things can happen when a string is passed back through the shell parser. You can use name references to avoid uses such as eval $name="$value".
Use += instead of manually adding strings/array elements, e.g.

    foo=""
    foo="${foo}a"
    foo="${foo}b"
    foo="${foo}c"

should be replaced with

    foo=""
    foo+="a"
    foo+="b"
    foo+="c"

Use source instead of '.' (dot) to include other shell script fragments - the new form is much more readable than the tiny dot and a failure can be caught within the script.
Use $"..." instead of gettext ... "..." for strings that need to be localized for different locales. gettext will require a fork()+exec() and reads the whole catalog each time it's called, creating a huge overhead for localisation (and the $"..." is easier to use, e.g. you only have to put a $ in front of the catalog and the string will be localised).
If you don't expect to expand files, you can do set -f (set -o noglob) as well. This way the need to use "" is greatly reduced.
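A minimal sketch of the effect; noglob is switched on inside a command substitution here only so that the setting does not leak into the rest of the script:

```shell
pattern='*'

# In the subshell globbing is off, so the unquoted expansion stays literal.
literal="$(set -o noglob ; printf '%s\n' ${pattern})"

printf '%s\n' "${literal}"
```

The result is the single character "*" rather than the list of files in the current directory.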
Unless you want to do word splitting, put IFS= at the beginning of a command. This way spaces in file names won't be a problem. You can do IFS='delims' read -r line to override IFS just for the read command. However, you can't do this for the set builtin.
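The per-command override can be sketched as follows. The here-document stands in for one line of /etc/passwd, and the variable names are illustrative:

```shell
saved_ifs="${IFS}"

# IFS is changed only for this one "read".
IFS=':' read -r name passwd uid rest <<EOF
root:x:0:0:Super-User:/root:/sbin/sh
EOF

printf 'name=%s uid=%s rest=%s\n' "${name}" "${uid}" "${rest}"
```

After the read, ${IFS} still holds its previous value, and the last variable ("rest") receives the remainder of the line including its delimiters.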
Set the message locale (LC_MESSAGES) if you process output of tools which may be localised.
Example 1. Set LC_MESSAGES when testing for specific output of the /usr/bin/file utility:

    # set french as default message locale
    export LC_MESSAGES=fr_FR.UTF-8
    ...
    # test whether the file "/tmp" has the filetype "directory" or not
    # we set LC_MESSAGES to "C" to ensure the returned message is in english
    if [[ "$(LC_MESSAGES=C file /tmp)" = *directory ]] ; then
            print "is a directory"
    fi

The environment variable LC_ALL always overrides any other LC_* environment variables (and LANG, too), including LC_MESSAGES. If there is a chance that LC_ALL may be set, replace LC_MESSAGES with LC_ALL in the example above.
Clean up after yourself. For example ksh/ksh93 have an EXIT trap which is very useful for this.
Note that the EXIT trap is executed for a subshell and each subshell level can run its own EXIT trap, for example

    $ (trap "print bam" EXIT ; (trap "print snap" EXIT ; print "foo"))
    foo
    snap
    bam

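A common use of the EXIT trap is removing scratch files. The following is a sketch under the assumption that mktemp(1) with the -d option is available; all variable names are illustrative.

```shell
# The work happens in a subshell; its EXIT trap removes the scratch
# directory no matter how the subshell ends.
scratch_path="$(
        scratch="$(mktemp -d)" || exit 1
        trap 'rm -rf "${scratch}"' EXIT
        : >"${scratch}/work"
        printf '%s\n' "${scratch}"
)"

if [ -d "${scratch_path}" ] ; then
        cleanup_state="left behind"
else
        cleanup_state="removed"
fi
printf '%s: %s\n' "${scratch_path}" "${cleanup_state}"
```

By the time the outer script inspects the path, the directory has already been removed by the subshell's trap.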
Explicitly set the exit code of a script, otherwise the exit code from the last command executed will be used which may trigger problems if the value is unexpected.
Use functions to break up your code into smaller, logical blocks.
Do not use function names which are reserved keywords (or function names) in C/C++/JAVA or the POSIX shell standard (to avoid confusion and/or future changes/updates to the shell language).
It is highly recommended to use ksh style functions (function foo { ... }) instead of Bourne-style functions (foo() { ... }) if possible (and local variables instead of spamming the global namespace).
The handling of old-style Bourne functions versus ksh functions is one of the major differences between ksh88 and ksh93 - ksh88 allowed variables to be local for Bourne-style functions while ksh93 conforms to the POSIX standard and provides no separate scope for variables declared in Bourne-style functions, i.e. such variables end up in the scope of the caller.
Example (note that "integer" is an alias for "typeset -li"):

    # new style function with local variable
    $ ksh93 -c 'integer x=2 ; function foo { integer x=5 ; } ; print "x=$x" ; foo ; print "x=$x" ;'
    x=2
    x=2
    # old style function with an attempt to create a local variable
    $ ksh93 -c 'integer x=2 ; foo() { integer x=5 ; } ; print "x=$x" ; foo ; print "x=$x" ;'
    x=2
    x=5

usr/src/lib/libshell/common/COMPATIBILITY says about this issue:

    Functions, defined with name() with ksh-93 are compatible with the POSIX standard, not with ksh-88. No local variables are permitted, and there is no separate scope. Functions defined with the function name syntax, maintain compatibility. This also affects function traces.

(This issue also affects /usr/xpg4/bin/sh in Solaris 10 because it is based on ksh88. This is a bug.)
Explicitly set the return code of a function - otherwise the exit code from the last command executed will be used which may trigger problems if the value is unexpected.
The only allowed exception is if a function uses the shell's errexit mode to leave a function, subshell or the script if a command returns a non-zero exit code.
To match cstyle, the shell token equivalent to the C "{" should appear on the same line, separated by a ";", as in:

    if [ "$x" = "hello" ] ; then
            echo $x
    fi

    if [[ "$x" = "hello" ]] ; then
            print $x
    fi

    for i in 1 2 3; do
            echo $i
    done

    for ((i=0 ; i < 3 ; i++)); do
            print $i
    done

    while [ $# -gt 0 ]; do
            echo $1
            shift
    done

    while (( $# > 0 )); do
            print $1
            shift
    done

DO NOT use the test builtin. Sorry, executive decision.
In our Bourne shell, the test built-in is the same as the "[" builtin (if you don't believe me, try "type test" or refer to usr/src/cmd/sh/msg.c).
So please do not write:

    if test $# -gt 0 ; then

instead use:

    if [ $# -gt 0 ] ; then

Use "[[ expr ]]" instead of "[ expr ]" if possible since it avoids going through the whole pattern expansion/etc. machinery and adds additional operators not available in the Bourne shell, such as short-circuit && and ||.
Use "(( ... ))" instead of "[ expr ]" or "[[ expr ]]" for arithmetic expressions.
Example: Replace

    i=5
    # do something
    if [ $i -gt 5 ] ; then

with

    i=5
    # do something
    if (( i > 5 )) ; then

Use POSIX arithmetic expressions to test for exit/return codes of commands and functions. For example turn
if [ $? -gt 0 ] ; then
into
if (( $? > 0 )) ; then
Make sure that your shell has a "true" builtin (like ksh93) when executing endless loops like $ while true ; do do_something ; done # - otherwise each loop cycle runs a |fork()|+|exec()|-cycle to run /bin/true.
It is permissible to use && and || to construct shorthand for an "if" statement in the case where the if statement has a single consequent line:

    [ $# -eq 0 ] && exit 0

instead of the longer:

    if [ $# -eq 0 ]; then
            exit 0
    fi

Recall that "if" and "while" operate on the exit status of the statement to be executed. In the shell, zero (0) means true and non-zero means false.
The exit status of the last command which was executed is available in the $? variable. When using "if" and "while", it is typically not necessary to use $? explicitly, as in:

    grep foo /etc/passwd >/dev/null 2>&1
    if [ $? -eq 0 ]; then
            echo "found"
    fi

Instead, you can more concisely write:

    if grep foo /etc/passwd >/dev/null 2>&1; then
            echo "found"
    fi

Or, when appropriate:

    grep foo /etc/passwd >/dev/null 2>&1 && echo "found"

Names of variables local to the current script which are not exported to the environment should be lowercase while variable names which are exported to the environment should be uppercase.
The only exception is global constants (=global readonly variables, e.g. $ float -r M_PI=3.14159265358979323846 # (taken from <math.h>)) which may be allowed to use uppercase names, too.
Uppercase variable names should be avoided because there is a good chance of naming collisions with special variable names used by the shell (e.g. PWD, SECONDS etc.).
Do not use variable names which are reserved keywords in C/C++/JAVA or the POSIX shell standard (to avoid confusion and/or future changes/updates to the shell language).
The Korn Shell and the POSIX shell standard have many more reserved variable names than the original Bourne shell. All these reserved variable names are spelled uppercase.
Always use '{'+'}' when using variable names longer than one character unless a simple variable name is followed by a blank, /, ;, or $ character (to avoid problems with array, compound variables or accidental misinterpretation by users/shell).

    print "$foo=info"

should be rewritten to

    print "${foo}=info"

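The misinterpretation is easiest to see when the text following the variable consists of characters that are valid in a name. A small sketch with illustrative variable names:

```shell
unset foo_size          # make sure the longer name is not set
foo="disk"

without_braces="$foo_size"      # expands the (unset) variable "foo_size"
with_braces="${foo}_size"       # expands "foo", then appends "_size"

printf 'without=[%s] with=[%s]\n' "${without_braces}" "${with_braces}"
```

Without braces the result is empty; with braces it is "disk_size".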
Always put variables into quotes when handling filenames or user input, even if the values are hardcoded or the values appear to be fixed. Otherwise at least two things may go wrong:
a malicious user may be able to exploit a script's inner workings to inject his/her own code
a script may (fatally) misbehave for unexpected input (e.g. file names with blanks and/or special symbols which are interpreted by the shell)
As an alternative a script may set IFS='' ; set -o noglob to turn off the interpretation of any field separators and the pattern globbing.
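The effect of a missing pair of quotes can be made visible by counting arguments. The helper "count_args" is hypothetical and written in the portable name() form only so that the sketch runs under any POSIX shell:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args()
{
        printf '%s\n' "$#"
}

file="my report.txt"

unquoted="$(count_args ${file})"        # split at the blank: two arguments
quoted="$(count_args "${file}")"        # kept together: one argument

printf 'unquoted=%s quoted=%s\n' "${unquoted}" "${quoted}"
```

A command such as rm or cp would see two separate file names in the unquoted case.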
For example the following is very inefficient since it transforms the integer values to strings and back several times:

    a=0
    b=1
    c=2
    # more code
    if [ $a -lt 5 -o $b -gt $c ] ; then do_something ; fi

This could be rewritten using ksh constructs:

    integer a=0
    integer b=1
    integer c=2
    # more code
    if (( a < 5 || b > c )) ; then do_something ; fi

Store lists in arrays or associative arrays - this is usually easier to manage.
For example:

    x="
    /etc/foo
    /etc/bar
    /etc/baz
    "
    echo $x

can be replaced with

    typeset -a mylist
    mylist[0]="/etc/foo"
    mylist[1]="/etc/bar"
    mylist[2]="/etc/baz"
    print "${mylist[@]}"

or (ksh93-style append entries to a normal (non-associative) array)

    typeset -a mylist
    mylist+=( "/etc/foo" )
    mylist+=( "/etc/bar" )
    mylist+=( "/etc/baz" )
    print "${mylist[@]}"

Arrays may be expanded using two similar subscript operators, @ and *. These subscripts differ only when the variable expansion appears within double quotes. If the variable expansion is between double quotes, "${mylist[*]}" expands to a single string with the value of each array member separated by the first character of the IFS variable, and "${mylist[@]}" expands each element of the array to a separate string.
Example 2. Difference between [@] and [*] when expanding arrays

    typeset -a mylist
    mylist+=( "/etc/foo" )
    mylist+=( "/etc/bar" )
    mylist+=( "/etc/baz" )
    IFS=","
    printf "mylist[*]={ 0=|%s| 1=|%s| 2=|%s| 3=|%s| }\n" "${mylist[*]}"
    printf "mylist[@]={ 0=|%s| 1=|%s| 2=|%s| 3=|%s| }\n" "${mylist[@]}"

will print:

    mylist[*]={ 0=|/etc/foo,/etc/bar,/etc/baz| 1=|| 2=|| 3=|| }
    mylist[@]={ 0=|/etc/foo| 1=|/etc/bar| 2=|/etc/baz| 3=|| }

Use compound variables or associative arrays to group similar variables together.
For example:

    box_width=56
    box_height=10
    box_depth=19
    echo "${box_width} ${box_height} ${box_depth}"

could be rewritten to ("associative array"-style)

    typeset -A -E box=( [width]=56 [height]=10 [depth]=19 )
    print -- "${box[width]} ${box[height]} ${box[depth]}"

or ("compound variable"-style)

    box=(
            float width=56
            float height=10
            float depth=19
    )
    print -- "${box.width} ${box.height} ${box.depth}"

The behaviour of "echo" is not portable (e.g. System V, BSD, UCB and ksh93/bash shell builtin versions all slightly differ in functionality) and should be avoided if possible. POSIX defines the "printf" command as replacement which provides more flexible and portable behaviour.
"print" and not "echo" in Korn Shell scripts
Korn shell scripts should prefer the "print" builtin which was introduced as replacement for "echo".
Use $ print -- "${varname}" # when there is the slightest chance that the variable "varname" may contain symbols like "-". Or better use "printf" instead, for example

    integer fx
    # do something
    print $fx

may fail if "fx" contains a negative value. A better way may be to use

    integer fx
    # do something
    printf "%d\n" fx

Use redirect and not exec to open files - exec will terminate the current function or script if an error occurs while redirect just returns a non-zero exit code which can be caught.
Example:

    if redirect 5</etc/profile ; then
            print "file open ok"
            head <&5
    else
            print "could not open file"
    fi

Each of the redirections in a sequence such as $ echo "foo" >xxx ; echo "bar" >>xxx ; echo "baz" >>xxx # triggers an |open()|,|write()|,|close()|-sequence. It is much more efficient (and faster) to group the redirection into a block, e.g. $ { echo "foo" ; echo "bar" ; echo "baz" ; } >xxx #
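Written out over several lines, the grouped form looks like the sketch below. The output file is created with mktemp(1) here purely for the sake of a self-contained example; a real script would use its own target file.

```shell
outfile="$(mktemp)" || exit 1

# One open()/close() pair for all three commands.
{
        echo "foo"
        echo "bar"
        echo "baz"
} >"${outfile}"

contents="$(cat "${outfile}")"
rm -f "${outfile}"

printf '%s\n' "${contents}"
```

The file ends up with the same three lines as the ungrouped variant, but it is opened only once.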
Avoid the creation of temporary files and store the values in variables instead if possible.
Example:

    ls -1 >xxx
    for i in $(cat xxx) ; do
            do_something ;
    done

can be replaced with

    x="$(ls -1)"
    for i in ${x} ; do
            do_something ;
    done

ksh93 supports binary variables (e.g. typeset -b varname) which can hold any value.
If you create more than one temporary file, create a unique subdir for these files and make sure the dir is writable. Make sure you clean up after yourself (unless you are debugging).
When opening a file use {n}<file, where n is an integer variable, rather than specifying a fixed descriptor number.
This is highly recommended in functions to prevent fixed file descriptor numbers from interfering with the calling script.
Example 3. Open a network connection and store the file descriptor number in a variable

    function cat_http
    {
            integer netfd
    ...
            # open TCP channel
            redirect {netfd}<>"/dev/tcp/${host}/${port}"

            # send HTTP request
            request="GET /${path} HTTP/1.1\n"
            request+="Host: ${host}\n"
            request+="User-Agent: demo code/ksh93 (2007-08-30; $(uname -s -r -p))\n"
            request+="Connection: close\n"
            print "${request}\n" >&${netfd}

            # collect response and send it to stdout
            cat <&${netfd}

            # close connection
            exec {netfd}<&-
    ...
    }

Use inline here documents, for example

    command <<< $x

rather than

    print -r -- "$x" | command

Use the -r option of read to read a line. You never know when a line will end in \ and without a -r multiple lines can be read.
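The difference can be reproduced with two input lines of which the first ends in a backslash. This is a sketch with made-up input data:

```shell
# Two input lines; the first one ends in a backslash.
input="$(printf 'one\\\ntwo')"

without_r="$(printf '%s\n' "${input}" | { read line ; printf '%s' "${line}" ; })"
with_r="$(printf '%s\n' "${input}" | { IFS= read -r line ; printf '%s' "${line}" ; })"

printf 'without -r: %s\nwith -r:    %s\n' "${without_r}" "${with_r}"
```

Without -r the backslash acts as a line continuation and both lines are joined into "onetwo"; with -r the first line is returned unchanged, including its trailing backslash.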
Print compound variables using print -C varname or print -v varname to make sure that non-printable characters are correctly encoded.
Put the command name and arguments before redirections. You can legally do $ > file date # instead of $ date > file # but don't do it.
Enable the gmacs editor mode before reading user input using the read builtin to enable the use of cursor+backspace+delete keys in the edit line.
Example 5. Prompt user for a string with gmacs editor mode enabled
Use builtin (POSIX shell) arithmetic expressions instead of expr, bc, dc, awk, nawk or perl.
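As a small illustration, the two statements below compute the same value; only the first needs an external process. Variable names are illustrative.

```shell
i=5

# External command: costs a fork()+exec() for a single addition.
via_expr="$(expr "${i}" + 1)"

# Builtin arithmetic expansion: evaluated inside the shell itself.
via_builtin="$((i + 1))"

printf '%s %s\n' "${via_expr}" "${via_builtin}"
```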
ksh93 supports C99-like floating-point arithmetic including special values such as +Inf, -Inf, +NaN, -NaN.
Use floating-point arithmetic expressions if calculations may trigger a division by zero or other exceptions - floating point arithmetic expressions in ksh93 support special values such as +Inf/-Inf and +NaN/-NaN which can greatly simplify testing for error conditions, e.g. instead of a trap or explicit if ... then... else checks for every sub-expression you can check the results for such special values.
Example:

    $ ksh93 -c 'integer i=0 j=5 ; print -- "x=$((j/i)) "'
    ksh93: line 1: j/i: divide by zero
    $ ksh93 -c 'float i=0 j=-5 ; print -- "x=$((j/i)) "'
    x=-Inf

Use printf "%a" when passing floating-point values between scripts or as output of a function to avoid rounding errors when converting between bases.
Example:

    function xxx
    {
            float val
            (( val=sin(5.) ))
            printf "%a\n" val
    }
    float out
    (( out=$(xxx) ))
    xxx
    print -- $out

This will print:

    -0.9589242747
    -0x1.eaf81f5e09933226af13e5563bc6p-01

Put constant values into readonly variables.
For example:

    float -r M_PI=3.14159265358979323846

or

    float M_PI=3.14159265358979323846
    readonly M_PI

Avoid string to number and/or number to string conversions in arithmetic expressions to avoid performance degradation and rounding errors.
Example 6. (( x=$x*2 )) vs. (( x=x*2 ))

    float x
    ...
    (( x=$x*2 ))

will convert the variable "x" (stored in the machine's native |long double| datatype) to a string value in base10 format, apply pattern expansion (globbing), then insert this string into the arithmetic expression and parse the value, which converts it into the internal |long double| datatype format again.
This is both slow and generates rounding errors when converting the floating-point value between the internal base2 and the base10 representation of the string.
The correct usage would be:

    float x
    ...
    (( x=x*2 ))

i.e. omit the '$' because it is (at least) redundant within arithmetic expressions.
Example 7. x=$(( y+5.5 )) vs. (( x=y+5.5 ))

    float x
    float y=7.1
    ...
    x=$(( y+5.5 ))

will calculate the value of y+5.5, convert it to a base-10 string value and assign the value to the floating-point variable x, which will convert the string value back to the internal |long double| datatype format.
The correct usage would be:

    float x
    float y=7.1
    ...
    (( x=y+5.5 ))

i.e. this will save the string conversions and avoid any base2-->base10-->base2-conversions.
Set LC_NUMERIC when using floating-point constants to avoid problems with radix-point representations which differ from the representation used in the script, for example the de_DE.* locales use ',' instead of '.' as default radix point symbol.
For example:

    # Make sure all math stuff runs in the "C" locale to avoid problems with alternative
    # radix point representations (e.g. ',' instead of '.' in de_DE.*-locales). This
    # needs to be set _before_ any floating-point constants are defined in this script.
    if [[ "${LC_ALL}" != "" ]] ; then
            export \
                LC_MONETARY="${LC_ALL}" \
                LC_MESSAGES="${LC_ALL}" \
                LC_COLLATE="${LC_ALL}" \
                LC_CTYPE="${LC_ALL}"
            unset LC_ALL
    fi
    export LC_NUMERIC=C
    ...
    float -r M_PI=3.14159265358979323846

The environment variable LC_ALL always overrides all other LC_* variables, including LC_NUMERIC. The script should always protect itself against custom LC_NUMERIC and LC_ALL values as shown in the example above.