Examples with awk: A short introduction

ArticleCategory:[Article Category]

UNIX Basics

AuthorImage:[Author Photo]

[Photo of the Author]

TranslationInfo:[Author Name]

original in es Javier Palacios Bermejo es to en Javier Palacios Bermejo

AboutTheAuthor:[Über den Autor]

Abstract:[Abstract]

ArticleIllustration:[Article head Illustration]

[Illustration]

ArticleBody:[Article Body]

The idea of writing this text came to me after reading a couple of articles by Guido Socher published in _LF_. One of them, about find and related commands, showed me that I am apparently not the only one who still uses the command line instead of pretty GUIs that never let you know how things are really done (the path Windows took years ago). The other article was about regular expressions, which are only briefly mentioned here, but which you need to know to get the most out of awk and the other commands I wanted to talk about when writing this article (mainly sed and grep).

The key question is whether this command is really useful. The answer is yes. It can be useful for an ordinary user, depending on the kind of work he does, but as an administration tool commands like this are invaluable. Just look through /var/yp/Makefile or the initialization scripts of any system to see why.

Introduction to awk

My first news of awk is old enough to be almost forgotten. A colleague needed to work with some really big outputs from a small Cray and was looking at the many possible ways of classifying them. The manual page for awk on the Cray was really small; he said that awk looked very good for the task, but that he had not been able to get anywhere with it.
A long time later it came into my life again through a casual comment from another colleague, in another place, who used it to extract the first column of a table:
awk '{print $1}' file
Easy, isn't it? The same simple task would take a small program in C or in any other compiled or interpreted language.

Once we have learned how to extract a column, we can do things such as renaming files (though nothing too elaborate) with sequences like
ls -1 pattern | awk '{print "mv "$1" "$1".new"}' | sh
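
Because the generated text goes straight to sh, it may help to look at it before running it. The following is only a minimal sketch of that habit, assuming a scratch directory and two hypothetical files, a.dat and b.dat, that are not part of the original example:

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.dat b.dat          # hypothetical file names

# Step 1: print the commands instead of running them.
preview=$(ls -1 *.dat | awk '{print "mv "$1" "$1".new"}')
echo "$preview"

# Step 2: once the preview looks right, feed the same text to sh.
echo "$preview" | sh
renamed=$(ls -1)
echo "$renamed"
```

Dropping the final "| sh" is usually all it takes to turn any of these pipelines into a dry run.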

And more. Adding sed or grep makes the previous example more powerful.

  1. Renaming within the name
    ls -1 *old* | awk '{print "mv "$1" "$1}' | sed s/old/new/2 | sh
    (although in some cases it will fail, as with file_old_and_old)

  2. Removing only files (this can be done with rm alone, but what if rm is aliased to 'rm -r'?)
    ls -l * | grep -v drwx | awk '{print "rm "$9}' | sh
    (again, it can fail with strange names or access permissions)

  3. Removing only directories
    ls -l | grep '^d' | awk '{print "rm -r "$9}' | sh
    (I think this works in every case where the names contain no white space, and it can also be done with ls -p | grep /$ | ...)
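
The ls -p variant hinted at in item 3 could be completed roughly as follows. This is a sketch under assumptions: the directory names run1 and run2 and the file notes.txt are hypothetical, and names containing white space would still break it.

```shell
# Work in a scratch directory so nothing real is removed.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir run1 run2            # hypothetical directories
touch notes.txt            # a plain file that must survive

# ls -p appends "/" to directory names; grep keeps only those lines,
# and awk builds one "rm -r" command per directory.
cmds=$(ls -p | grep '/$' | awk '{print "rm -r "$0}')
echo "$cmds"

# Run the generated commands and list what is left.
echo "$cmds" | sh
left=$(ls -1)
echo "$left"
```

The advantage over ls -l is that nothing depends on which column the name ends up in.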

When, for example, the same calculations are repeated with different initial parameters and we want to select some of the output files for further processing, these tools help more than a little (actually, they are the only help I know of).

Actually, although we will call it that, awk is not the kind of thing usually called a command or an instruction, in the same way that gcc is not. awk is a programming language, with a syntax close to C in many respects, whose interpreter is invoked with the command awk.

As for the syntax of the command itself, its own help says everything:

# gawk --help
Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...
        gawk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options:
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
        -m[fr] val
        -W compat               --compat
        -W copyleft             --copyleft
        -W copyright            --copyright
        -W help                 --help
        -W lint                 --lint
        -W lint-old             --lint-old
        -W posix                --posix
        -W re-interval          --re-interval
        -W source=program-text  --source=program-text
        -W traditional          --traditional
        -W usage                --usage
        -W version              --version

Report bugs to bug-gnu-utils@prep.ai.mit.edu,
with a Cc: to arnold@gnu.ai.mit.edu
I will just mention that, instead of single-quoting (') the program on the command line, we can write it into a file and call it with the -f option, and that variables defined on the command line with -v var=val add some versatility to the programs we write.
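
As an illustration of those two options, here is a minimal sketch. The file names data.txt and big.awk, the variable min and the sample values are all assumptions made up for the example.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical data file: a name and a value per line.
printf 'alpha 3\nbeta 12\ngamma 7\n' > data.txt

# The program lives in its own file; "min" is not set inside it.
cat > big.awk <<'EOF'
# Print the name of every record whose second field exceeds min.
$2 > min + 0 { print $1 }
EOF

# -f reads the program from the file, -v sets min before it starts.
big=$(awk -v min=5 -f big.awk data.txt)
echo "$big"
```

The same program file can then be reused with a different threshold just by changing the value given to -v.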

Awk is, roughly speaking, a language oriented towards handling tables, that is, information that can be grouped into fields and records as in traditional databases, with the advantage that the definition of a record (and of a field) is extremely flexible.
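
To make fields and records a little more concrete, here is a small sketch. The two sample lines follow the /etc/passwd layout but are invented for the example; NR and NF are awk's built-in record number and field count.

```shell
# Hypothetical sample in the /etc/passwd layout (fields separated by ":").
sample='root:x:0:0:root:/root:/bin/sh
ana:x:1000:1000:Ana Lopez:/home/ana:/bin/bash'

# -F sets the field separator; each input line is one record.
out=$(printf '%s\n' "$sample" |
  awk -F: '{ print NR ": " $1 " uses " $NF " (" NF " fields)" }')
echo "$out"
```

With ":" as the separator, the space inside "Ana Lopez" is just part of the fifth field.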

But awk is more powerful than that. It is designed to work with one-line records, but that restriction can be relaxed. To go deeper into some of its aspects, we are going to look at some illustrative (and real) examples.
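
One way of relaxing the one-line restriction is to change the record separator RS. The sketch below assumes a hypothetical address list in which records are separated by blank lines and every line is one field; the names are invented.

```shell
# Hypothetical address list: blank lines separate the records.
list='Ana Lopez
Madrid

Luis Perez
Sevilla'

# RS="" selects paragraph mode (a blank line ends a record);
# FS="\n" makes every line of the record one field.
cities=$(printf '%s\n' "$list" |
  awk 'BEGIN { RS = ""; FS = "\n" } { print $2 ": " $1 }')
echo "$cities"
```

Each record now spans two lines, and $1 and $2 are whole lines rather than words.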

Programs like these take only 5 minutes of thinking and 5 more of writing (or more than 20 minutes without thinking, using trial and error, which is the more amusing way).
If there is a less time-consuming way, I would like to know about it.

I have used awk for many other tasks (such as the automatic generation of web pages with information from simple databases), and I know enough about programming to be sure that a lot more can be done with it, including many things I have never thought of.
Just let your imagination fly.

A problem

(and a solution)

The only problem with awk is its need for perfectly tabular data, without any holes: with its default field splitting it cannot handle the very common fixed-width columns. If we generate the awk input ourselves, this is not a problem: choosing something really unusual to separate the fields, and defining it with the FS variable, should be enough. But if we only have the input, it becomes a real problem, because some fields can contain the field-separator character (and, as FS is usually white space, the presence of names can be uncomfortable). Look, for example, at the table
1234  HD 13324  22:40:54 ....
1235  HD12223   22:43:12 ....
awk cannot handle that table directly. Entries like these are sometimes unavoidable, and they are very common, because data is not always typed in consistently. But even in this case, if only one column behaves this way, not everything is lost (if anybody knows how to deal with more than one such column in the general case, please tell me). I once had to deal with a table very close to the one above. The second column was a name containing a variable amount of white space and, as usual, I needed to sort on a later column. Some attempts with sort +/-n.m ran into the same problem with the embedded spaces.
Then I suddenly realized that the column I wanted to sort on was the last one, and that awk knows how many fields the current record contains, so accessing the last one was enough (sometimes $9, sometimes $11, but always $NF). A couple of trials got me where I wanted to be:
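{
  printf "%s", $NF    # print the last field first, as plain text
  $NF = ""            # empty it so that it is not printed twice
  printf " %s\n", $0  # then print the rest of the record
}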
The output is the same as the input, but with the last column moved to the first place, so it can be sorted without any problem. Obviously, the method is easily applied to the third field from the end, or to the one following a control field that always has the same value because it was the key used to extract our subtable from a bigger database...
Once again, just let your imagination fly.
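
Put together with sort, the whole trick could look like the sketch below. The table is hypothetical: its first columns imitate the example above, and the last column holds made-up integer values to sort on.

```shell
# Hypothetical table: the name in column 2 may or may not contain a space,
# so the number of fields changes from line to line.
table='1234  HD 13324  22:40:54  75
1235  HD12223   22:43:12  61
1236  HD 998    22:41:07  93'

# Save the last field, empty it, trim the trailing blank this leaves,
# print the saved key in front of the record, and sort numerically on it.
sorted=$(printf '%s\n' "$table" |
  awk '{ key = $NF; $NF = ""; sub(/ +$/, ""); print key, $0 }' |
  sort -n)
echo "$sorted"
```

Note that assigning to $NF makes awk rebuild the record, so the runs of blanks in the input come out as single spaces.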

Working over matched lines

Deeper awk

Conclusions

Certainly, awk may not be as powerful as many other tools designed with similar goals. But it has the big advantage that in a really short time it allows us to write programs that, although perhaps used only once, are fully tailored to our needs, which in many cases are very simple.
A clear example, though not one directly involving awk, is substituting a string within a text file: with very elementary notions of sed we can do it on any Unix system in any imaginable circumstance, because we do not even need a text editor, not even vi. On the other hand, system files such as /etc/passwd and many others are very easily processed with awk, with nothing else involved.
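
Both ideas can be sketched in a few lines. Everything here is an assumption made for illustration: the files app.conf and passwd.sample, their contents, and the strings being replaced.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical configuration file.
printf 'host=oldbox\nport=80\n' > app.conf

# Replace a string with sed alone: write to a new file, then move it
# into place, so no editor and no in-place option is needed.
sed 's/oldbox/newbox/g' app.conf > app.conf.tmp && mv app.conf.tmp app.conf
conf=$(cat app.conf)
echo "$conf"

# List login names and shells from a file in the /etc/passwd layout.
printf 'root:x:0:0:root:/root:/bin/sh\nana:x:1000:1000:Ana:/home/ana:/bin/bash\n' > passwd.sample
shells=$(awk -F: '{ print $1, $7 }' passwd.sample)
echo "$shells"
```

On a real system the awk line would simply be pointed at /etc/passwd instead of the sample file.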

And of course awk is not the best. There are several scripting languages with far greater capabilities that share the trait of being interpreted, which is what allows very short development times for projects with no great ambitions beyond getting the job done. But awk keeps the advantage of always being available on any installation, however minimal it may be.

Additional information

Very basic commands of this kind are not usually well documented, but you can always find something by looking around.

Usually, all books on Unix mention this command, but only some of them treat it in enough detail to give useful information. The best we can do is browse every book we come across, because you never know where useful information can be found.