Original in Spanish by Javier Palacios Bermejo; translated from Spanish to English by Javier Palacios Bermejo.
Originally, the idea of writing this text came to me after reading a
couple of articles published in _LF_ and written by Guido Socher. One of them,
about
find and related commands,
showed me that I was apparently not the only one still using the command
line, instead of pretty GUIs that never let you know how
things are really done (the path Windows took years ago).
The other article was about
regular expressions,
which, although only briefly mentioned in this article, you need to know
in order to get the most out of awk
and some other commands (mainly sed
and
grep
) that I wanted to talk about when writing this article.
The key question is whether this command is really useful. And the answer
is yes. It can be useful for a normal user, depending on the kind of
work he does but, as an administration tool, commands like this are
invaluable. Just walk through /var/yp/Makefile
or the initialization scripts of any system to
see why.
awk
My first news of it is old enough to be forgotten. A colleague
needed to work with some really big outputs from a small Cray, and he was
looking at various possibilities for classifying them. The manual page for awk
on
the Cray was really small, but he said that awk looked very well suited for that
task, although with so little documentation it was impossible to work with.
A long time later, it came into my life again, by means of a casual comment
(another place, another colleague) from somebody who used it to extract the
first column from a table:
awk '{print $1}' file
Easy, isn't it? This simple task would need a not-so-small amount of
programming in C or any other compiled or interpreted language.
Once we have learned the lesson of extracting a column, we can do
things such as renaming files (although not much more than that) using
sequences like
ls -1 pattern | awk '{print "mv "$1" "$1".new"}' | sh
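For instance, if the pattern matches two hypothetical files called
report1.txt and report2.txt, the awk stage simply generates the following
commands, which the final sh then executes:

mv report1.txt report1.txt.new
mv report2.txt report2.txt.new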
And more. By also using sed
or grep,
the previous example
becomes more powerful:
ls -1 *old* | awk '{print "mv "$1" "$1}' | sed s/old/new/2 | sh
ls -l * | grep -v drwx | awk '{print "rm "$9}' | sh
ls -l | grep '^d' | awk '{print "rm -r "$9}' | sh
ls -p | grep /$ | ...
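In the first of those pipelines, by the way, the trailing 2 in
sed s/old/new/2 is what keeps the source name intact: it replaces only the
second occurrence of old on each line, as a quick test with a hypothetical
file name shows:

echo 'mv data_old data_old' | sed s/old/new/2
mv data_old data_new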
When, for example, the same calculations are repeated with different initial parameters and we want to select some of the output files for additional processing, these tools help more than a little (actually, they are the only help I know of).
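As a minimal sketch of that situation (the file names, the column and the
criterion are my assumptions, not taken from any real run): print the name of
every result file whose last line carries a negative value in its second
column, producing a list ready for further processing.

for f in run_*.out ; do
    awk 'END { if ( $2 < 0 ) print FILENAME }' $f
done > files_to_process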
Actually, although we will use that name, awk
is not the kind of
thing usually called a command or an instruction, in the same way that
gcc
is not. awk
is a programming language, with a syntax
close to C in many respects, whose interpreter is invoked with the command
awk
.
About the syntax of the command itself, everything has been said:

# gawk --help
Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...
       gawk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options:
       -f progfile             --file=progfile
       -F fs                   --field-separator=fs
       -v var=val              --assign=var=val
       -m[fr] val
       -W compat               --compat
       -W copyleft             --copyleft
       -W copyright            --copyright
       -W help                 --help
       -W lint                 --lint
       -W lint-old             --lint-old
       -W posix                --posix
       -W re-interval          --re-interval
       -W source=program-text  --source=program-text
       -W traditional          --traditional
       -W usage                --usage
       -W version              --version

Report bugs to bug-gnu-utils@prep.ai.mit.edu,
with a Cc: to arnold@gnu.ai.mit.edu

Let us just mention that, instead of simply quoting (') the program on the
command line, we can write it into a file and call it with the option -f,
and that variables can be defined on the command line with -v var=val.
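A minimal illustration of both options (the program file first.awk,
containing just { print col, $1 }, and the variable name col are
hypothetical):

awk -v col='ID:' -f first.awk file

which prints the value given to col followed by the first field of every
line of file.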
Awk is, roughly speaking, a language oriented towards managing tables: information that can be grouped into fields and records, in the manner of the more traditional databases, but with the advantage that the definition of a record (and of a field) is extremely flexible.
But awk
is more powerful than that. It is designed to work with one-line records, but
even that point can be relaxed. In order to dig into some of its aspects, we
are going to look at some illustrative (and real) examples.
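For instance, one classic way of relaxing that point (a standard awk idiom,
not an example from the original text) is to treat blocks of lines separated
by blank lines as records, and the individual lines as fields:

awk 'BEGIN { RS = "" ; FS = "\n" } { print $1 }' addresses

With RS set to the empty string, every paragraph becomes a single record, and
the one-liner above prints the first line of each block (the file name
addresses is, again, hypothetical).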
A typical example: converting a table of data into LaTeX format. If you know
some LaTeX (and awk
well), it's not too difficult, although it could get
boring:
BEGIN {
    printf "LaTeX preamble"
    printf "\\begin{tabular}"
    printf "{|c|c|...|c|}"
}
{
    printf $1" & "
    printf $2" & "
    . . .
    printf $n" \\\\ "
    printf "\\hline"
}
END {
    print "\\end{document}"
}
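Filling in the sketch, a complete version for a three-column table could look
like this (the minimal document preamble is my choice, not part of the
original):

BEGIN {
    print "\\documentclass{article}"
    print "\\begin{document}"
    print "\\begin{tabular}{|c|c|c|}"
    print "\\hline"
}
{ print $1" & "$2" & "$3" \\\\ \\hline" }
END {
    print "\\end{tabular}"
    print "\\end{document}"
}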
On another occasion, a program returned its results as several tables in a
single output file, and I used awk
for slicing it into separate files.
Obviously, I needed to take advantage of some characteristics of the output.
( $1 == "====>" ) {
    NomObj = $2
    TotObj = $4
    if ( TotObj > 0 ) {
        FS = "|"
        for ( cont=0 ; cont<TotObj ; cont++ ) {
            getline
            print $2 $4 $5 $3 >> NomObj
        }
        FS = " "
    }
}
NOTE: Actually, the object name was not returned and the program was slightly more complicated, but this is meant to be an illustrative example.
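For reference, input shaped like the following (entirely my guess at the
format of that program's output) would send both data lines, with their
fields reordered, to a file called NGC224:

====> NGC224 with 2 entries
alpha1 | 00:40:00 | +41:00:00 | 3.4 | remark
alpha2 | 00:42:44 | +41:16:09 | 4.1 | remark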
BEGIN {
    BEGIN_MSG = "From"
    BEGIN_BDY = "Precedence:"
    MAIN_KEY = "Subject:"
    VALIDATION = "[MONTH REPORT]"
    HEAD = "NO"; BODY = "NO"; PRINT = "NO"
    OUT_FILE = "Month_Reports"
}
{
    if ( $1 == BEGIN_MSG ) {
        HEAD = "YES"; BODY = "NO"; PRINT = "NO"
    }
    if ( $1 == MAIN_KEY ) {
        if ( $2 == VALIDATION ) {
            PRINT = "YES"
            $1 = ""; $2 = ""
            print "\n\n"$0"\n" > OUT_FILE
        }
    }
    if ( $1 == BEGIN_BDY ) {
        getline
        if ( $0 == "" ) {
            HEAD = "NO"; BODY = "YES"
        } else {
            HEAD = "NO"; BODY = "NO"; PRINT = "NO"
        }
    }
    if ( BODY == "YES" && PRINT == "YES" ) {
        print $0 >> OUT_FILE
    }
}
Maybe we are the administrators of a mailing list. Maybe, from time to time,
some special messages are submitted to the list (for example, monthly reports)
with some specific format (a subject like '[MONTH REPORT] month, dept'). And,
suddenly, at the end of the year, we decide to put all these messages
together, setting the rest aside.
This task can be done by running the awk program above over the mail spool.
Making each report be written to an individual file means three extra lines of code (one possible version is sketched after the note below), and making each department's reports go to individual files means only a few extra characters.
NOTE: This example assumes that the mail spool is structured the way I think it is. Actually, I don't know the real format, but this program works on my installation (again, in some strange cases it could fail).
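For the record, one possible reading of those three extra lines, placed
inside the block that checks VALIDATION (the counter name REPORT is mine):

REPORT = REPORT + 1
OUT_FILE = "Month_Reports." REPORT
print "\n\n"$0"\n" > OUT_FILE

so that every matching message starts a fresh output file instead of being
appended to a common one.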
Programs like these take only 5 minutes of thinking and 5 more of writing (or
more than 20 minutes without thinking, using trial and error,
in the funniest way).
If there is a less time-consuming way of doing it, I would like to know.
I've used awk
for many other tasks (automatic generation of web pages with
information from simple databases), and I know enough about programming to
be sure that a lot of things can be done with it, even many that I've never
thought about.
Let your imagination fly.
A problem (and a solution)
The only problem with awk is its need for perfectly tabular data,
without any holes: it cannot cope with the all-too-common fixed-width columns.
If we generate the awk input ourselves, this is not a
problem: choosing something really unusual to mark the fields, and defining it
with the FS variable, should be enough. But if all we have is the
input, this becomes a real problem, because some fields may contain the
field-separator character (and, as FS is usually white space,
the presence of names can be uncomfortable). Look, for example, at the table

1234   HD 13324   22:40:54   ....
1235   HD12223    22:43:12   ....

That cannot be worked out with awk. Entries like this are
sometimes necessary, and they are very common, because data typing is rarely
homogeneous. But, even in this case, if we have only one of those
columns, not everything is lost (if anybody knows how to deal with more than
one such column in the general case, please tell me). Once I needed to deal with such
a table, quite close to the one described above. The second column was a name, with
a non-fixed amount of white space, and, as usual, I needed to sort on
a later column.
Some trials using sort +/-n.m showed the same problem with the
embedded spaces.
And, suddenly, I realized that the column I wanted to sort on was the last one;
and that awk knows how many fields are contained in the current
record, so accessing the last one was enough (sometimes $9 , sometimes
$11 , but always $NF ). A couple of trials took me
where I wanted to go:
{
    printf $NF
    $NF = ""
    printf " "$0"\n"
}

And we obtain an output equal to the input, but with the last column moved to first place, so we can sort without any problem. Obviously, this
method is easily applied to the third field from the end, or to the one next to
that control field which has the same value in every row because it was the key
for our subtable, extracted from a bigger database...
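Putting it all together, the whole job becomes a single pipeline (the file
name table and the numeric sort are my assumptions about the original data):

awk '{ printf $NF ; $NF = "" ; printf " "$0"\n" }' table | sort -n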
Just let your imagination fly again.
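Incidentally, when the troublesome columns really are fixed-width, there is a
more general workaround that the story above does not use: cutting the record
by character position with substr, instead of trusting FS. A sketch, assuming
the name occupies columns 6 to 14 of the table shown earlier:

{ name = substr($0, 6, 9) ; print name }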
awk
Certainly, it might not be as powerful as many other tools designed with
similar goals. But it has the big advantage that, in a really short time, it
allows you to write programs that, although maybe one-shot ones, are fully
tailored to your needs and, in many cases, very simple.
A clear example, not directly involving awk
, is substituting strings
within a text file: with really elementary notions of sed
we
can do it on any unix system under any thinkable circumstance, because we
don't even need a text editor. Not even vi
. On the other hand,
system files such as /etc/passwd
and many others are very easily
worked with awk
, without involving anything else.
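As a minimal sketch of both remarks (the strings and the temporary file name
are placeholders): replacing a string everywhere in a file armed only with
sed, and listing login names and shells from /etc/passwd with awk:

sed 's/old/new/g' file > file.tmp && mv file.tmp file
awk -F: '{ print $1, $7 }' /etc/passwd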
And of course awk
is not the best. There are several scripting languages
with far greater capabilities, sharing the common characteristic of being
interpreted, which is what allows ridiculously short development times for
projects with no great ambitions beyond effectiveness. But awk
still has the advantage
of always being available on any installation, however minimal it may
be.
This kind of very basic command doesn't tend to be well documented, but you can always find something by looking around.
man awk
Usually, all books on unix mention this command, but only some of them treat it in enough detail to give useful information. The best we can do is browse any book we come across, because you never know where useful information may be found.