awk
command¶
In 1977, a programming language-level tool for processing text, named' awk', was born at Bell Labs. The name comes from the first letters of the last names of three famous people:
- Alfred Aho
- Peter Weinberger
- Brian Kernighan
Similar to shell (bash, csh, zsh, and ksh), awk
has derivatives with the development of history:
awk
: Born in 1977 Bell Labs.nawk
(new awk): It was born in 1985 and is an updated and enhanced version ofawk
. It was widely used with Unix System V Release 3.1 (1987). The old version ofawk
is calledoawk
(old awk).gawk
(GNU awk): It was written by Paul Rubin in 1986. The GNU Project was born in 1984.mawk
: was written in 1996 by Mike Brennan, the interpreter of the awk programming language.jawk
: Implementation ofawk
in JAVA
In the GNU/Linux operating system, the usual awk
refers to gawk
. However, some distributions, such as Ubuntu or Debian, use mawk
as their default awk
.
In the Rocky Linux 8.8, awk
refers to gawk
.
Shell > whereis awk
awk: /usr/bin/awk /usr/libexec/awk /usr/share/awk /usr/share/man/man1/awk.1.gz
Shell > ls -l /usr/bin/awk
lrwxrwxrwx. 1 root root 4 4月 16 2022 /usr/bin/awk -> gawk
Shell > rpm -qf /usr/bin/awk
gawk-4.2.1-4.el8.x86_64
For information not covered, see the gawk manual.
Although awk
is a tool for processing text, it has some programming language features:
- variable
- process control (loop)
- data type
- logical operation
- function
- array
- ...
The working principle of awk
: Similar to relational databases, it supports processing fields (columns) and records (rows). By default, awk
treats each line of a file as a record and places these records in memory for line-by-line processing, with a portion of each line treated as a field in the record. By default, delimiters to separate different fields use spaces and tabs, while numbers represent different fields in the row record. To reference multiple fields, separate them with commas or tabs.
A simple example that is easy to understand:
Shell > df -hT
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|Filesystem | Type | Size | Used | Avail | Use% | Mounted | on |←← 1 (first line)
|devtmpfs | devtmpfs | 1.8G | 0 | 1.8G | 0% | /dev | |←← 2
|tmpfs | tmpfs | 1.8G | 0 | 1.8G | 0% | /dev/shm | |←← 3
|tmpfs | tmpfs | 1.8G | 8.9M | 1.8G | 1% | /run | |←← 4
|tmpfs | tmpfs | 1.8G | 0 | 1.8G | 0% | /sys/fs/cgroup | |←← 5
|/dev/nvme0n1p2 | ext4 | 47G | 2.6G | 42G | 6% | / | |←← 6
|/dev/nvme0n1p1 | xfs | 1014M | 182M | 833M | 18% | /boot | |←← 7
|tmpfs | tmpfs | 364M | 0 | 364M | 0% | /run/user/0 | |←← 8 (end line)
Shell > df -hT | awk '{print $1,$2}'
Filesystem Type
devtmpfs devtmpfs
tmpfs tmpfs
tmpfs tmpfs
tmpfs tmpfs
/dev/nvme0n1p2 ext4
/dev/nvme0n1p1 xfs
tmpfs tmpfs
# $0: Reference the entire text content.
Shell > df -hT | awk '{print $0}'
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 1.8G 0 1.8G 0% /dev
tmpfs tmpfs 1.8G 0 1.8G 0% /dev/shm
tmpfs tmpfs 1.8G 8.9M 1.8G 1% /run
tmpfs tmpfs 1.8G 0 1.8G 0% /sys/fs/cgroup
/dev/nvme0n1p2 ext4 47G 2.6G 42G 6% /
/dev/nvme0n1p1 xfs 1014M 182M 833M 18% /boot
tmpfs tmpfs 364M 0 364M 0% /run/user/0
Instructions for using awk
¶
The usage of awk
is - awk option 'pattern {action}' FileName
pattern: Find specific content in the text action: Action instruction { }: Group some instructions according to specific patterns
option | description |
---|---|
-f program-file --file program-file |
Reading awk program source files from files |
-F FS | Specify the separator for separating fields. The 'FS' here is a built-in variable in awk , with default values of spaces or tabs |
-v var=value | variable assignment |
--posix | Turn on compatibility mode |
--dump-variables=[file] | Write global variables in awk to a file. If no file is specified, the default file is awkvars.out |
--profile=[file] | Write performance analysis data to a specific file. If no file is specified, the default file is awkprof.out |
pattern | description |
---|---|
BEGIN{ } | An action that is performed before all row records are read |
END{ } | An action that is performed after all row records are read |
/regular expression/ | Match the regular expression for each input line record |
pattern && pattern | Logic and operation |
pattern || pattern | Logic or operation |
!pattern | Logical negation operation |
pattern1,pattern2 | Specify the pattern range to match all row records within that range |
awk
is powerful and involves a lot of knowledge, so some of the content will be explained later.
printf
commands¶
Before formally learning awk
, beginners need to understand the command printf
.
printf
:format and print data. Its usage is -printf FORMAT [ARGUMENT]...
FORMAT:Used to control the content of the output. The following common interpretation sequences are supported:
- \a - alert (BEL)
- \b - backspace
- \f - form feed
- \n - new line
- \r - carriage return
- \t - horizontal tab
- \v - vertical tab
- %Ns - The output string. The N represents the number of strings, for example:
%s %s %s
- %Ni - Output integers. The N represents the number of integers of the output, for example:
%i %i
- %m.nf - Output Floating Point Number. The m represents the total number of digits output, and the n represents the number of digits after the decimal point. For example:
%8.5f
ARGUMENT: If it is a file, you need to do some preprocessing to output correctly.
Shell > cat /tmp/printf.txt
ID Name Age Class
1 Frank 20 3
2 Jack 25 5
3 Django 16 6
4 Tom 19 7
# Example of incorrect syntax:
Shell > printf '%s %s $s\n' /tmp/printf.txt
/tmp/printf.txt
# Change the format of the text
Shell > printf '%s' $(cat /tmp/printf.txt)
IDNameAgeClass1Frank2032Jack2553Django1664Tom197
# Change the format of the text
Shell > printf '%s\t%s\t%s\n' $(cat /tmp/printf.txt)
ID Name Age
Class 1 Frank
20 3 2
Jack 25 5
3 Django 16
6 4 Tom
19 7
Shell > printf "%s\t%s\t%s\t%s\n" a b c d 1 2 3 4
a b c d
1 2 3 4
No print
command exists in RockyLinux OS. You can only use print
in awk
, and its difference from printf
is that it automatically adds a newline at the end of each line. For example:
Shell > awk '{printf $1 "\t" $2"\n"}' /tmp/printf.txt
ID Name
1 Frank
2 Jack
3 Django
4 Tom
Shell > awk '{print $1 "\t" $2}' /tmp/printf.txt
ID Name
1 Frank
2 Jack
3 Django
4 Tom
Basic usage example¶
-
Reading
awk
program source files from filesShell > vim /tmp/read-print.awk #!/bin/awk {print $6} Shell > df -hT | awk -f /tmp/read-print.awk Use% 0% 0% 1% 0% 6% 18% 0%
-
Specify delimiter
Shell > awk -F ":" '{print $1}' /etc/passwd root bin daemon adm lp sync ... Shell > tail -n 5 /etc/services | awk -F "\/" '{print $2}' awk: warning: escape sequence `\/' treated as plain `/' axio-disc 35100 pmwebapi 44323 cloudcheck-ping 45514 cloudcheck 45514 spremotetablet 46998
You can also use words as delimiters. Parentheses indicate this is an overall delimiter, and "|" means or.
Shell > tail -n 5 /etc/services | awk -F "(tcp)|(udp)" '{print $1}' axio-disc 35100/ pmwebapi 44323/ cloudcheck-ping 45514/ cloudcheck 45514/ spremotetablet 46998/
-
Variable assignment
Shell > tail -n 5 /etc/services | awk -v a=123 'BEGIN{print a}{print $1}' 123 axio-disc pmwebapi cloudcheck-ping cloudcheck spremotetablet
Assign the value of user-defined variables in bash to awk's variables.
Shell > ab=123 Shell > echo ${ab} 123 Shell > tail -n 5 /etc/services | awk -v a=${ab} 'BEGIN{print a}{print $1}' 123 axio-disc pmwebapi cloudcheck-ping cloudcheck spremotetablet
-
Write awk's global variables to a file
Shell > seq 1 6 | awk --dump-variables '{print $0}' 1 2 3 4 5 6 Shell > cat /root/awkvars.out ARGC: 1 ARGIND: 0 ARGV: array, 1 elements BINMODE: 0 CONVFMT: "%.6g" ENVIRON: array, 27 elements ERRNO: "" FIELDWIDTHS: "" FILENAME: "-" FNR: 6 FPAT: "[^[:space:]]+" FS: " " FUNCTAB: array, 41 elements IGNORECASE: 0 LINT: 0 NF: 1 NR: 6 OFMT: "%.6g" OFS: " " ORS: "\n" PREC: 53 PROCINFO: array, 20 elements RLENGTH: 0 ROUNDMODE: "N" RS: "\n" RSTART: 0 RT: "\n" SUBSEP: "\034" SYMTAB: array, 28 elements TEXTDOMAIN: "messages"
Later, we will introduce what these variables mean. To review them now, jump to variables.
-
BEGIN{ } and END{ }
Shell > head -n 5 /etc/passwd | awk 'BEGIN{print "UserName:PasswordIdentification:UID:InitGID"}{print $0}END{print "one\ntwo"}' UserName:PasswordIdentification:UID:InitGID root:x:0:0:root:/root:/bin/bash bin:x:1:1:bin:/bin:/sbin/nologin daemon:x:2:2:daemon:/sbin:/sbin/nologin adm:x:3:4:adm:/var/adm:/sbin/nologin lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin one two
-
--profile option
Shell > df -hT | awk --profile 'BEGIN{print "start line"}{print $0}END{print "end line"}' start line Filesystem Type Size Used Avail Use% Mounted on devtmpfs devtmpfs 1.8G 0 1.8G 0% /dev tmpfs tmpfs 1.8G 0 1.8G 0% /dev/shm tmpfs tmpfs 1.8G 8.9M 1.8G 1% /run tmpfs tmpfs 1.8G 0 1.8G 0% /sys/fs/cgroup /dev/nvme0n1p2 ext4 47G 2.7G 42G 6% / /dev/nvme0n1p1 xfs 1014M 181M 834M 18% /boot tmpfs tmpfs 363M 0 363M 0% /run/user/0 end line Shell > cat /root/awkprof.out # gawk profile, created Fri Dec 8 15:12:56 2023 # BEGIN rule(s) BEGIN { 1 print "start line" } # Rule(s) 8 { 8 print $0 } # END rule(s) END { 1 print "end line" }
Modify the awkprof.out file.
Shell > vim /root/awkprof.out BEGIN { print "start line" } { print $0 } END { print "end line" } Shell > df -hT | awk -f /root/awkprof.out start line Filesystem Type Size Used Avail Use% Mounted on devtmpfs devtmpfs 1.8G 0 1.8G 0% /dev tmpfs tmpfs 1.8G 0 1.8G 0% /dev/shm tmpfs tmpfs 1.8G 8.9M 1.8G 1% /run tmpfs tmpfs 1.8G 0 1.8G 0% /sys/fs/cgroup /dev/nvme0n1p2 ext4 47G 2.7G 42G 6% / /dev/nvme0n1p1 xfs 1014M 181M 834M 18% /boot tmpfs tmpfs 363M 0 363M 0% /run/user/0 end line
-
Match rows (records) through regular expressions
Shell > cat /etc/services | awk '/[^0-9a-zA-Z]1[1-9]{2}\/tcp/ {print $0}' sunrpc 111/tcp portmapper rpcbind # RPC 4.0 portmapper TCP auth 113/tcp authentication tap ident sftp 115/tcp uucp-path 117/tcp nntp 119/tcp readnews untp # USENET News Transfer Protocol ntp 123/tcp netbios-ns 137/tcp # NETBIOS Name Service netbios-dgm 138/tcp # NETBIOS Datagram Service netbios-ssn 139/tcp # NETBIOS session service ...
-
Logical operations (logical and, logical OR, reverse)
logical and: && logical OR: || reverse: !
Shell > cat /etc/services | awk '/[^0-9a-zA-Z]1[1-9]{2}\/tcp/ && /175/ {print $0}' vmnet 175/tcp # VMNET
Shell > cat /etc/services | awk '/[^0-9a-zA-Z]9[1-9]{2}\/tcp/ || /91{2}\/tcp/ {print $0}' telnets 992/tcp imaps 993/tcp # IMAP over SSL pop3s 995/tcp # POP-3 over SSL mtp 1911/tcp # rndc 953/tcp # rndc control sockets (BIND 9) xact-backup 911/tcp # xact-backup apex-mesh 912/tcp # APEX relay-relay service apex-edge 913/tcp # APEX endpoint-relay service ftps-data 989/tcp # ftp protocol, data, over TLS/SSL nas 991/tcp # Netnews Administration System vsinet 996/tcp # vsinet maitrd 997/tcp # busboy 998/tcp # garcon 999/tcp # #puprouter 999/tcp # blockade 2911/tcp # Blockade prnstatus 3911/tcp # Printer Status Port cpdlc 5911/tcp # Controller Pilot Data Link Communication manyone-xml 8911/tcp # manyone-xml sype-transport 9911/tcp # SYPECom Transport Protocol
Shell > cat /etc/services | awk '!/(tcp)|(udp)/ {print $0}' discard 9/sctp # Discard discard 9/dccp # Discard SC:DISC ftp-data 20/sctp # FTP ftp 21/sctp # FTP ssh 22/sctp # SSH exp1 1021/sctp # RFC3692-style Experiment 1 (*) [RFC4727] exp1 1021/dccp # RFC3692-style Experiment 1 (*) [RFC4727] exp2 1022/sctp # RFC3692-style Experiment 2 (*) [RFC4727] exp2 1022/dccp # RFC3692-style Experiment 2 (*) [RFC4727] ltp-deepspace 1113/dccp # Licklider Transmission Protocol cisco-ipsla 1167/sctp # Cisco IP SLAs Control Protocol rcip-itu 2225/sctp # Resource Connection Initiation Protocol m2ua 2904/sctp # M2UA m3ua 2905/sctp # M3UA megaco-h248 2944/sctp # Megaco-H.248 text ...
-
Locates consecutive lines by string and prints them
Shell > cat /etc/services | awk '/^ntp/,/^netbios/ {print $0}' ntp 123/tcp ntp 123/udp # Network Time Protocol netbios-ns 137/tcp # NETBIOS Name Service
Info
Start range: stop matching when the first match is encountered. End range: stop matching when the first match is encountered.
Built-in variable¶
Variable name | Description |
---|---|
FS | The delimiter of the input field. The default is space or tab |
OFS | The delimiter of the output field. The default is space |
RS | The delimiter of the input row record. The default is a newline character (\n) |
ORS | The delimiter of output row record. The default is a newline character (\n) |
NF | Count the number of fields in the current row record |
NR | Count the number of row records. After each line of text is processed, the value of this variable will be +1 |
FNR | Count the number of row records. When the second file is processed, the NR variable continues to add up, but the FNR variable is recounted |
ARGC | The number of command line arguments |
ARGV | An array of command line arguments, with subscript starting at 0 and ARGV[0] representing awk |
ARGIND | The index value of the file currently being processed. The first file is 1, the second file is 2, and so on |
ENVIRON | Environment variables of the current system |
FILENAME | Output the currently processed file name |
IGNORECASE | Ignore case |
SUBSEP | The delimiter of the subscript in the array, which defaults to "\034" |
-
FS and OFS
Shell > cat /etc/passwd | awk 'BEGIN{FS=":"}{print $1}' root bin daemon adm lp sync
You can also use the -v option to assign values to variables.
Shell > cat /etc/passwd | awk -v FS=":" '{print $1}' root bin daemon adm lp sync
The default output delimiter is a space when using commas to reference multiple fields. You can, however, specify the output delimiter separately.
Shell > cat /etc/passwd | awk 'BEGIN{FS=":"}{print $1,$2}' root x bin x daemon x adm x lp x
Shell > cat /etc/passwd | awk 'BEGIN{FS=":";OFS="\t"}{print $1,$2}' # or Shell > cat /etc/passwd | awk -v FS=":" -v OFS="\t" '{print $1,$2}' root x bin x daemon x adm x lp x
-
RS and ORS
By default,
awk
uses newline characters to distinguish each line recordShell > echo -e "https://example.com/books/index.html\ntitle//tcp" https://example.com/books/index.html title//tcp Shell > echo -e "https://example.com/books/index.html\ntitle//tcp" | awk 'BEGIN{RS="\/\/";ORS="%%"}{print $0}' awk: cmd. line:1: warning: escape sequence `\/' treated as plain `/' https:%%example.com/books/index.html title%%tcp %% ← Why? Because "print"
-
NF
Count the number of fields per line in the current text
Shell > head -n 5 /etc/passwd | awk -F ":" 'BEGIN{RS="\n";ORS="\n"} {print NF}' 7 7 7 7 7
Print the fifth field
Shell > head -n 5 /etc/passwd | awk -F ":" 'BEGIN{RS="\n";ORS="\n"} {print $(NF-2)}' root bin daemon adm lp
Print the last field
Shell > head -n 5 /etc/passwd | awk -F ":" 'BEGIN{RS="\n";ORS="\n"} {print $NF}' /bin/bash /sbin/nologin /sbin/nologin /sbin/nologin /sbin/nologin
Exclude the last two fields
Shell > head -n 5 /etc/passwd | awk -F ":" 'BEGIN{RS="\n";ORS="\n"} {$NF=" ";$(NF-1)=" ";print $0}' root x 0 0 root bin x 1 1 bin daemon x 2 2 daemon adm x 3 4 adm lp x 4 7 lp
Exclude the first field
Shell > head -n 5 /etc/passwd | awk -F ":" 'BEGIN{RS="\n";ORS="\n"} {$1=" ";print $0}' | sed -r 's/(^ )//g' x 0 0 root /root /bin/bash x 1 1 bin /bin /sbin/nologin x 2 2 daemon /sbin /sbin/nologin x 3 4 adm /var/adm /sbin/nologin x 4 7 lp /var/spool/lpd /sbin/nologin
-
NR and FNR
Shell > tail -n 5 /etc/services | awk '{print NR,$0}' 1 axio-disc 35100/udp # Axiomatic discovery protocol 2 pmwebapi 44323/tcp # Performance Co-Pilot client HTTP API 3 cloudcheck-ping 45514/udp # ASSIA CloudCheck WiFi Management keepalive 4 cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System 5 spremotetablet 46998/tcp # Capture handwritten signatures
Print the total number of lines in the file content
Shell > cat /etc/services | awk 'END{print NR}' 11473
Print the content of line 200
Shell > cat /etc/services | awk 'NR==200' microsoft-ds 445/tcp
Print the second field on line 200
Shell > cat /etc/services | awk 'BEGIN{RS="\n";ORS="\n"} NR==200 {print $2}' 445/tcp
Print content within a specific range
Shell > cat /etc/services | awk 'BEGIN{RS="\n";ORS="\n"} NR<=10 {print NR,$0}' 1 # /etc/services: 2 # $Id: services,v 1.49 2017/08/18 12:43:23 ovasik Exp $ 3 # 4 # Network services, Internet style 5 # IANA services version: last updated 2016-07-08 6 # 7 # Note that it is presently the policy of IANA to assign a single well-known 8 # port number for both TCP and UDP; hence, most entries here have two entries 9 # even if the protocol doesn't support UDP operations. 10 # Updated from RFC 1700, ``Assigned Numbers'' (October 1994). Not all ports
Comparison between NR and FNR
Shell > head -n 3 /etc/services > /tmp/a.txt Shell > cat /tmp/a.txt # /etc/services: # $Id: services,v 1.49 2017/08/18 12:43:23 ovasik Exp $ # Shell > cat /etc/resolv.conf # Generated by NetworkManager nameserver 8.8.8.8 nameserver 114.114.114.114 Shell > awk '{print NR,$0}' /tmp/a.txt /etc/resolv.conf 1 # /etc/services: 2 # $Id: services,v 1.49 2017/08/18 12:43:23 ovasik Exp $ 3 # 4 # Generated by NetworkManager 5 nameserver 8.8.8.8 6 nameserver 114.114.114.114 Shell > awk '{print FNR,$0}' /tmp/a.txt /etc/resolv.conf 1 # /etc/services: 2 # $Id: services,v 1.49 2017/08/18 12:43:23 ovasik Exp $ 3 # 1 # Generated by NetworkManager 2 nameserver 8.8.8.8 3 nameserver 114.114.114.114
-
ARGC and ARGV
Shell > awk 'BEGIN{print ARGC}' log dump long 4 Shell > awk 'BEGIN{print ARGV[0]}' log dump long awk Shell > awk 'BEGIN{print ARGV[1]}' log dump long log Shell > awk 'BEGIN{print ARGV[2]}' log dump long dump
-
ARGIND
This variable is mainly used to determine the file the
awk
program is working on.Shell > awk '{print ARGIND,$0}' /etc/hostname /etc/resolv.conf 1 Master 2 # Generated by NetworkManager 2 nameserver 8.8.8.8 2 nameserver 114.114.114.114
-
ENVIRON
You can reference operating systems or user-defined variables in
awk
programs.Shell > echo ${SSH_CLIENT} 192.168.100.2 6969 22 Shell > awk 'BEGIN{print ENVIRON["SSH_CLIENT"]}' 192.168.100.2 6969 22 Shell > export a=123 Shell > env | grep -w a a=123 Shell > awk 'BEGIN{print ENVIRON["a"]}' 123 Shell > unset a
-
FILENAME
Shell > awk 'BEGIN{RS="\n";ORS="\n"} NR=FNR {print ARGIND,FILENAME"---"$0}' /etc/hostname /etc/resolv.conf /etc/rocky-release 1 /etc/hostname---Master 2 /etc/resolv.conf---# Generated by NetworkManager 2 /etc/resolv.conf---nameserver 8.8.8.8 2 /etc/resolv.conf---nameserver 114.114.114.114 3 /etc/rocky-release---Rocky Linux release 8.9 (Green Obsidian)
-
IGNORECASE
This variable is useful if you want to use regular expressions in
awk
and ignore case.Shell > awk 'BEGIN{IGNORECASE=1;RS="\n";ORS="\n"} /^(SSH)|^(ftp)/ {print $0}' /etc/services ftp-data 20/tcp ftp-data 20/udp ftp 21/tcp ftp 21/udp fsp fspd ssh 22/tcp # The Secure Shell (SSH) Protocol ssh 22/udp # The Secure Shell (SSH) Protocol ftp-data 20/sctp # FTP ftp 21/sctp # FTP ssh 22/sctp # SSH ftp-agent 574/tcp # FTP Software Agent System ftp-agent 574/udp # FTP Software Agent System sshell 614/tcp # SSLshell sshell 614/udp # SSLshell ftps-data 989/tcp # ftp protocol, data, over TLS/SSL ftps-data 989/udp # ftp protocol, data, over TLS/SSL ftps 990/tcp # ftp protocol, control, over TLS/SSL ftps 990/udp # ftp protocol, control, over TLS/SSL ssh-mgmt 17235/tcp # SSH Tectia Manager ssh-mgmt 17235/udp # SSH Tectia Manager
Shell > awk 'BEGIN{IGNORECASE=1;RS="\n";ORS="\n"} /^(SMTP)\s/,/^(TFTP)\s/ {print $0}' /etc/services smtp 25/tcp mail smtp 25/udp mail time 37/tcp timserver time 37/udp timserver rlp 39/tcp resource # resource location rlp 39/udp resource # resource location nameserver 42/tcp name # IEN 116 nameserver 42/udp name # IEN 116 nicname 43/tcp whois nicname 43/udp whois tacacs 49/tcp # Login Host Protocol (TACACS) tacacs 49/udp # Login Host Protocol (TACACS) re-mail-ck 50/tcp # Remote Mail Checking Protocol re-mail-ck 50/udp # Remote Mail Checking Protocol domain 53/tcp # name-domain server domain 53/udp whois++ 63/tcp whoispp whois++ 63/udp whoispp bootps 67/tcp # BOOTP server bootps 67/udp bootpc 68/tcp dhcpc # BOOTP client bootpc 68/udp dhcpc tftp 69/tcp
Operator¶
Operator | Description |
---|---|
(...) | Grouping |
$n | Field reference |
++ -- |
Incremental Decreasing |
+ - ! |
Mathematical plus sign Mathematical minus sign Negation |
* / % |
Mathematical multiplication sign Mathematical division sign Modulo operation |
in | Elements in an array |
&& || |
Logic and Operations Logical OR operation |
?: | Abbreviation of conditional expressions |
~ | Another representation of regular expressions |
!~ | Reverse Regular Expression |
Note
In the awk
program, the following expressions will be judged as false:
- The number is 0;
- Empty string;
- Undefined value.
Shell > awk 'BEGIN{n=0;if(n) print "Ture";else print "False"}'
False
Shell > awk 'BEGIN{s="";if(s) print "True";else print "False"}'
False
Shell > awk 'BEGIN{if(t) print "True";else print "Flase"}'
False
-
Exclamation point
Print odd rows:
Shell > seq 1 10 | awk 'i=!i {print $0}' 1 3 5 7 9
Question
Why? Read the first line: Because "i" is not assigned a value, so "i=!i" indicates TRUE. Read the second line: At this point, "i=!i" indicates FALSE. And so on, the final printed line is an odd number.
Print even rows:
Shell > seq 1 10 | awk '!(i=!i)' # or Shell > seq 1 10 | awk '!(i=!i) {print $0}' 2 4 6 8 10
Note
As you can see, sometimes you can ignore the syntax for the "action" part, which by default is equivalent to "{print $0}".
-
Reversal
Shell > cat /etc/services | awk '!/(tcp)|(udp)|(^#)|(^$)/ {print $0}' http 80/sctp # HyperText Transfer Protocol bgp 179/sctp https 443/sctp # http protocol over TLS/SSL h323hostcall 1720/sctp # H.323 Call Control nfs 2049/sctp nfsd shilp # Network File System rtmp 1/ddp # Routing Table Maintenance Protocol nbp 2/ddp # Name Binding Protocol echo 4/ddp # AppleTalk Echo Protocol zip 6/ddp # Zone Information Protocol discard 9/sctp # Discard discard 9/dccp # Discard SC:DISC ...
-
Basic operations in mathematics
Shell > echo -e "36\n40\n50" | awk '{print $0+1}' 37 41 Shell > echo -e "30\t5\t8\n11\t20\t34" 30 5 8 11 20 34 Shell > echo -e "30\t5\t8\n11\t20\t34" | awk '{print $2*2+1}' 11 41
It can also be used in the "pattern":
Shell > cat -n /etc/services | awk '/^[1-9]*/ && $1%2==0 {print $0}' ... 24 tcpmux 1/udp # TCP port service multiplexer 26 rje 5/udp # Remote Job Entry 28 echo 7/udp 30 discard 9/udp sink null 32 systat 11/udp users 34 daytime 13/udp 36 qotd 17/udp quote ... Shell > cat -n /etc/services | awk '/^[1-9]*/ && $1%2!=0 {print $0}' ... 23 tcpmux 1/tcp # TCP port service multiplexer 25 rje 5/tcp # Remote Job Entry 27 echo 7/tcp 29 discard 9/tcp sink null 31 systat 11/tcp users ...
-
You can use the bash command in the awk program, for example:
Shell > echo -e "6\n3\n9\n8" | awk '{print $0 | "sort"}' 3 6 8 9
Info
Please pay attention! You must use double quotes to include the command.
-
Regular expression
Here, we cover basic examples of regular expressions. You can use regular expressions on row records.
Shell > cat /etc/services | awk '/[^0-9a-zA-Z]1[1-9]{2}\/tcp/ {print $0}' # Be equivalent to: Shell > cat /etc/services | awk '$0~/[^0-9a-zA-Z]1[1-9]{2}\/tcp/ {print $0}'
If the file has a large amount of text, regular expressions can also be used for fields, which will help improve processing efficiency. The usage example is as follows:
Shell > cat /etc/services | awk '$0~/^(ssh)/ && $2~/tcp/ {print $0}' ssh 22/tcp # The Secure Shell (SSH) Protocol sshell 614/tcp # SSLshell ssh-mgmt 17235/tcp # SSH Tectia Manager Shell > cat /etc/services | grep -v -E "(^#)|(^$)" | awk '$2!~/(tcp)|(udp)/ {print $0}' http 80/sctp # HyperText Transfer Protocol bgp 179/sctp https 443/sctp # http protocol over TLS/SSL h323hostcall 1720/sctp # H.323 Call Control nfs 2049/sctp nfsd shilp # Network File System rtmp 1/ddp # Routing Table Maintenance Protocol nbp 2/ddp # Name Binding Protocol ...
Flow control¶
-
if statement
The basic syntax format is -
if (condition) statement [ else statement ]
Example of a single branch use of an if statement:
Shell > cat /etc/services | awk '{if(NR==110) print $0}' pop3 110/udp pop-3
The condition is determined as a regular expression:
Shell > cat /etc/services | awk '{if(/^(ftp)\s|^(ssh)\s/) print $0}' ftp 21/tcp ftp 21/udp fsp fspd ssh 22/tcp # The Secure Shell (SSH) Protocol ssh 22/udp # The Secure Shell (SSH) Protocol ftp 21/sctp # FTP ssh 22/sctp # SSH
Double branch:
Shell > seq 1 10 | awk '{if($0==10) print $0 ; else print "False"}' False False False False False False False False False 10
Multiple branches:
Shell > cat /etc/services | awk '{ \ if($1~/netbios/) {print $0} else if($2~/175/) {print "175"} else if($2~/137/) {print "137"} else {print "no"} }'
-
while statement
The basic syntax format is -
while (condition) statement
Traverse and print out the fields of all row records.
Shell > tail -n 2 /etc/services cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System spremotetablet 46998/tcp # Capture handwritten signatures Shell > tail -n 2 /etc/services | awk '{ \ i=1; while(i<=NF){print $i;i++} }' cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System spremotetablet 46998/tcp # Capture handwritten signatures
-
for statement
The basic syntax format is -
for (expr1; expr2; expr3) statement
Traverse and print out the fields of all row records.
Shell > tail -n 2 /etc/services | awk '{ \ for(i=1;i<=NF;i++) print $i }'
Print the fields for each row of records in reverse order.
Shell > tail -n 2 /etc/services | awk '{ \ for(i=NF;i>=1;i--) print $i }' System Management WiFi CloudCheck ASSIA # 45514/tcp cloudcheck signatures handwritten Capture # 46998/tcp spremotetablet
Print each line of records in the opposite direction.
Shell > tail -n 2 /etc/services | awk '{ \ for(i=NF;i>=1;i--) {printf $i" "}; print "" }' System Management WiFi CloudCheck ASSIA # 45514/tcp cloudcheck signatures handwritten Capture # 46998/tcp spremotetablet
-
break statement and continue statement
The comparison between the two is as follows:
Shell > awk 'BEGIN{ \ for(i=1;i<=10;i++) { if(i==3) {break}; print i } }' 1 2
Shell > awk 'BEGIN{ \ for(i=1;i<=10;i++) { if(i==3) {continue}; print i } }' 1 2 4 5 6 7 8 9 10
-
exit statement
You can specify a return value in the range of [0,255]
The basic syntax format is -
exit [expression]
Shell > seq 1 10 | awk '{ if($0~/5/) exit "135" }' Shell > echo $? 135
Array¶
array: A collection of data with the same data type arranged in a certain order. Each data in an array is called an element.
Like most programming languages, awk
also supports arrays, which are divided into indexed arrays (with numbers as subscripts) and associative arrays (with strings as subscripts).
awk
has a lot of functions, and the functions related to arrays are:
-
length(Array_Name) - Get the length of the array.
-
Custom array
Format -
Array_Name[Index]=Value
Shell > awk 'BEGIN{a1[0]="test0" ; a1[1]="s1"; print a1[0]}' test0
Get the length of the array:
Shell > awk 'BEGIN{name[-1]="jimcat8" ; name[3]="jack" ; print length(name)}' 2
Store all GNU/Linux users in an array:
Shell > cat /etc/passwd | awk -F ":" '{username[NR]=$1}END{print username[2]}' bin Shell > cat /etc/passwd | awk -F ":" '{username[NR]=$1}END{print username[1]}' root
Info
The numeric subscript of an
awk
array can be a positive integer, a negative integer, a string, or 0, so the numeric subscript of anawk
array has no concept of an initial value. This is not the same as arrays inbash
.Shell > arr1=(2 10 30 string1) Shell > echo "${arr1[0]}" 2 Shell > unset arr1
-
Delete array
Format -
delete Array_Name
-
Delete an element from an array
Format -
delete Array_Name[Index]
-
Traversal array
You can use the for statement, which is suitable for cases where the array subscript is unknown:
Shell > head -n 5 /etc/passwd | awk -F ":" ' \ { username[NR]=$1 } END { for(i in username) print username[i],i } ' root 1 bin 2 daemon 3 adm 4 lp 5
If the subscript of an array is regular, you can use this form of the for statement:
Shell > cat /etc/passwd | awk -F ":" ' \ { username[NR]=$1 } END{ for(i=1;i<=NR;i++) print username[i],i } ' root 1 bin 2 daemon 3 adm 4 lp 5 sync 6 shutdown 7 halt 8 ...
-
Use "++" as the subscript of the array
Shell > tail -n 5 /etc/group | awk -F ":" '\ { a[x++]=$1 } END{ for(i in a) print a[i],i } ' slocate 0 unbound 1 docker 2 cgred 3 redis 4
-
Use a field as the subscript of an array
Shell > tail -n 5 /etc/group | awk -F ":" '\ { a[$1]=$3 } END{ for(i in a) print a[i],i } ' 991 docker 21 slocate 989 redis 992 unbound 990 cgred
-
Count the number of occurrences of the same field
Count the number of occurrences of the same IPv4 address. Basic idea:
- First use the
grep
command to filter out all IPv4 addresses - Then hand it over to the
awk
program for processing
Shell > cat /var/log/secure | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk ' \ { a[$1]++ } END{ for(v in a) print a[v],v } ' 4 0.0.0.0 4 192.168.100.2
Info
a[$1]++
is equivalent toa[$1]+=1
Count the number of occurrences of words regardless of case. Basic idea:
- Split all fields into multiple rows of records
- Then hand it over to the
awk
program for processing
Shell > cat /etc/services | awk -F " " '{for(i=1;i<=NF;i++) print $i}' Shell > cat /etc/services | awk -F " " '{for(i=1;i<=NF;i++) print $i}' | awk '\ BEGIN{IGNORECASE=1;OFS="\t"} /^netbios$/ || /^ftp$/ {a[$1]++} END{for(v in a) print a[v],v} ' 3 NETBIOS 18 FTP 7 ftp Shell > cat /etc/services | awk -F " " '{ for(i=1;i<=NF;i++) print $i }' | awk '\ BEGIN{IGNORECASE=1;OFS="\t"} /^netbios$/ || /^ftp$/ {a[$1]++} END{for(v in a) \ if(a[v]>=5) print a[v],v} ' 18 FTP 7 ftp
You can first filter specific row records and then perform statistics, such as:
Shell > ss -tulnp | awk -F " " '/tcp/ {a[$2]++} END{for(i in a) print a[i],i}' 2 LISTEN
- First use the
-
Print lines based on the number of occurrences of a specific field
Shell > tail /etc/services aigairserver 21221/tcp # Services for Air Server ka-kdp 31016/udp # Kollective Agent Kollective Delivery ka-sddp 31016/tcp # Kollective Agent Secure Distributed Delivery edi_service 34567/udp # dhanalakshmi.org EDI Service axio-disc 35100/tcp # Axiomatic discovery protocol axio-disc 35100/udp # Axiomatic discovery protocol pmwebapi 44323/tcp # Performance Co-Pilot client HTTP API cloudcheck-ping 45514/udp # ASSIA CloudCheck WiFi Management keepalive cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System spremotetablet 46998/tcp # Capture handwritten signatures Shell > tail /etc/services | awk 'a[$1]++ {print $0}' axio-disc 35100/udp # Axiomatic discovery protocol
Reverse:
Shell > tail /etc/services | awk '!a[$1]++ {print $0}' aigairserver 21221/tcp # Services for Air Server ka-kdp 31016/udp # Kollective Agent Kollective Delivery ka-sddp 31016/tcp # Kollective Agent Secure Distributed Delivery edi_service 34567/udp # dhanalakshmi.org EDI Service axio-disc 35100/tcp # Axiomatic discovery protocol pmwebapi 44323/tcp # Performance Co-Pilot client HTTP API cloudcheck-ping 45514/udp # ASSIA CloudCheck WiFi Management keepalive cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System spremotetablet 46998/tcp # Capture handwritten signatures
-
Multidimensional array
The
awk
program does not support multi-dimensional arrays, but support for multi-dimensional arrays is achievable through simulation. By default, "\034" is the delimiter for the subscript of a multidimensional array.Please note the following differences when using multidimensional arrays:
Shell > awk 'BEGIN{ a["1,0"]=100 ; a[2,0]=200 ; a["3","0"]=300 ; for(i in a) print a[i],i }' 200 20 300 30 100 1,0
Redefine the delimiter:
Shell > awk 'BEGIN{ SUBSEP="----" ; a["1,0"]=100 ; a[2,0]=200 ; a["3","0"]=300 ; for(i in a) print a[i],i }' 300 3----0 200 2----0 100 1,0
Reorder:
Shell > awk 'BEGIN{ SUBSEP="----" ; a["1,0"]=100 ; a[2,0]=200 ; a["3","0"]=300 ; for(i in a) print a[i],i | "sort" }' 100 1,0 200 2----0 300 3----0
Count the number of times the field appears:
Shell > cat c.txt A 192.168.1.1 HTTP B 192.168.1.2 HTTP B 192.168.1.2 MYSQL C 192.168.1.1 MYSQL C 192.168.1.1 MQ D 192.168.1.4 NGINX Shell > cat c.txt | awk 'BEGIN{SUBSEP="----"} {a[$1,$2]++} END{for(i in a) print a[i],i}' 1 A----192.168.1.1 2 B----192.168.1.2 2 C----192.168.1.1 1 D----192.168.1.4
Built-in function¶
Function name | Description |
---|---|
int(expr) | Truncate as an integer |
sqrt(expr) | Square root |
rand() | Returns a random number N with a range of (0,1). The result is not that every run is a random number, but that it remains the same. |
srand([expr]) | Use "expr" to generate random numbers. If "expr" is not specified, the current time is used as the seed by default, and if there is a seed, the generated random number is used. |
asort(a,b) | The elements of the array "a" are reordered (lexicographically) and stored in the new array "b", with the subscript in the array "b" starting at 1. This function returns the number of elements in the array. |
asorti(a,b) | Reorder the subscript of the array "a" and store the sorted subscript in the new array "b" as an element, with the subscript of the array "b" starting at 1. |
sub(r,s[,t]) | Use the "r" regular expression to match the input records, and replace the matching result with "s". "t" is optional, indicating a replacement for a certain field. The function returns the number of replacements - 0 or 1. Similar to sed s// |
gsub(r,s[,t]) | Global replacement. "t" is optional, indicating the replacement of a certain field. If "t" is ignored, it indicates global replacement. Similar to sed s///g |
gensub(r,s,h[,t]) | The "r" regular expression matches the input records and replaces the matching result with "s". "t" is optional, indicating a replacement for a certain field. "h" represents replacing the specified index position |
index(s,t) | Returns the index position of the string "t" in the string "s" (the string index starts from 1). If the function returns 0, it means it does not exist |
length([s]) | Returns the length of "s" |
match(s,r[,a]) | Test whether the string "s" contains the string "r". If included, return the index position of "r" within it (string index starting from 1). If not, return 0 |
split(s,a[,r[,seps]]) | Split string "s" into an array "a" based on the delimiter "seps". The subscript of the array starts with 1. |
substr(s,i[,n]) | Intercept the string. "s" represents the string to be processed; "i" indicates the index position of the string; "n" is the length. If you do not specify "n", it means to intercept all remaining parts |
tolower(str) | Converts all strings to lowercase |
toupper(str) | Converts all strings to uppercase |
systime() | Current timestamp |
strftime([format[,timestamp[,utc-flag]]]) | Format the output time. Converts the timestamp to a string |
-
int function
Shell > echo -e "qwer123\n123\nabc\n123abc123\n100.55\n-155.27" qwer123 123 abc 123abc123 100.55 -155.27 Shell > echo -e "qwer123\n123\nabc\n123abc123\n100.55\n-155.27" | awk '{print int($1)}' 0 123 0 123 100 -155
As you can see, the int function only works for numbers, and when encountering a string, converts it to 0. When encountering a string starting with a number, truncates it.
-
sqrt function
Shell > awk 'BEGIN{print sqrt(9)}' 3
-
rand function and srand function
The example of using the rand function is as follows:
Shell > awk 'BEGIN{print rand()}' 0.924046 Shell > awk 'BEGIN{print rand()}' 0.924046 Shell > awk 'BEGIN{print rand()}' 0.924046
The example of using the srand function is as follows:
Shell > awk 'BEGIN{srand() ; print rand()}' 0.975495 Shell > awk 'BEGIN{srand() ; print rand()}' 0.99187 Shell > awk 'BEGIN{srand() ; print rand()}' 0.069002
Generate an integer within the range of (0,100):
Shell > awk 'BEGIN{srand() ; print int(rand()*100)}' 56 Shell > awk 'BEGIN{srand() ; print int(rand()*100)}' 33 Shell > awk 'BEGIN{srand() ; print int(rand()*100)}' 42
-
asort function and asorti function
Shell > cat /etc/passwd | awk -F ":" '{a[NR]=$1} END{anu=asort(a,b) ; for(i=1;i<=anu;i++) print i,b[i]}' 1 adm 2 bin 3 chrony 4 daemon 5 dbus 6 ftp 7 games 8 halt 9 lp 10 mail 11 nobody 12 operator 13 polkitd 14 redis 15 root 16 shutdown 17 sshd 18 sssd 19 sync 20 systemd-coredump 21 systemd-resolve 22 tss 23 unbound Shell > awk 'BEGIN{a[1]=1000 ; a[2]=200 ; a[3]=30 ; a[4]="admin" ; a[5]="Admin" ; \ a[6]="12string" ; a[7]=-1 ; a[8]=-10 ; a[9]=-20 ; a[10]=-21 ;nu=asort(a,b) ; for(i=1;i<=nu;i++) print i,b[i]}' 1 -21 2 -20 3 -10 4 -1 5 30 6 200 7 1000 8 12string 9 Admin 10 admin
Info
Sorting rules:
- Numbers have higher priority than strings and are arranged in ascending order.
- Arrange strings in ascending dictionary order
If you are using the asorti function, the example is as follows:
Shell > awk 'BEGIN{ a[-11]=1000 ; a[-2]=200 ; a[-10]=30 ; a[-21]="admin" ; a[41]="Admin" ; \ a[30]="12string" ; a["root"]="rootstr" ; a["Root"]="r1" ; nu=asorti(a,b) ; for(i in b) print i,b[i] }' 1 -10 2 -11 3 -2 4 -21 5 30 6 41 7 Root 8 root
Info
Sorting rules:
- Numbers have priority over strings
- If a negative number is encountered, the first digit from the left will be compared. If it is the same, the second digit will be compared, and so on
- If a positive number is encountered, it will be arranged in ascending order
- Arrange strings in ascending dictionary order
-
sub function and gsub function
Shell > cat /etc/services | awk '/netbios/ {sub(/tcp/,"test") ; print $0 }' netbios-ns 137/test # NETBIOS Name Service netbios-ns 137/udp netbios-dgm 138/test # NETBIOS Datagram Service netbios-dgm 138/udp netbios-ssn 139/test # NETBIOS session service netbios-ssn 139/udp Shell > cat /etc/services | awk '/^ftp/ && /21\/tcp/ {print $0}' ftp 21/tcp ↑ ↑ Shell > cat /etc/services | awk 'BEGIN{OFS="\t"} /^ftp/ && /21\/tcp/ {gsub(/p/,"P",$2) ; print $0}' ftp 21/tcP ↑ Shell > cat /etc/services | awk 'BEGIN{OFS="\t"} /^ftp/ && /21\/tcp/ {gsub(/p/,"P") ; print $0}' ftP 21/tcP ↑ ↑
Just like the
sed
command, you can also use the "&" symbol to reference already matched strings.Shell > vim /tmp/tmp-file1.txt A 192.168.1.1 HTTP B 192.168.1.2 HTTP B 192.168.1.2 MYSQL C 192.168.1.1 MYSQL C 192.168.1.1 MQ D 192.168.1.4 NGINX # Add a line of text before the second line Shell > cat /tmp/tmp-file1.txt | awk 'NR==2 {gsub(/.*/,"add a line\n&")} {print $0}' A 192.168.1.1 HTTP add a line B 192.168.1.2 HTTP B 192.168.1.2 MYSQL C 192.168.1.1 MYSQL C 192.168.1.1 MQ D 192.168.1.4 NGINX # Add a string after the IP address in the second line Shell > cat /tmp/tmp-file1.txt | awk 'NR==2 {gsub(/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/,"&\tSTRING")} {print $0}' A 192.168.1.1 HTTP B 192.168.1.2 STRING HTTP B 192.168.1.2 MYSQL C 192.168.1.1 MYSQL C 192.168.1.1 MQ D 192.168.1.4 NGINX
-
index function
Shell > tail -n 5 /etc/services axio-disc 35100/udp # Axiomatic discovery protocol pmwebapi 44323/tcp # Performance Co-Pilot client HTTP API cloudcheck-ping 45514/udp # ASSIA CloudCheck WiFi Management keepalive cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System spremotetablet 46998/tcp # Capture handwritten signatures Shell > tail -n 5 /etc/services | awk '{print index($2,"tcp")}' 0 7 0 7 7
-
length function
# The length of the output field Shell > tail -n 5 /etc/services | awk '{print length($1)}' 9 8 15 10 14 # The length of the output array Shell > cat /etc/passwd | awk -F ":" 'a[NR]=$1 END{print length(a)}' 22
-
match function
Shell > echo -e "1592abc144qszd\n144bc\nbn" 1592abc144qszd 144bc bn Shell > echo -e "1592abc144qszd\n144bc\nbn" | awk '{print match($1,144)}' 8 1 0
-
split function
Shell > echo "365%tmp%dir%number" | awk '{split($1,a1,"%") ; for(i in a1) print i,a1[i]}' 1 365 2 tmp 3 dir 4 number
-
substr function
Shell > head -n 5 /etc/passwd root:x:0:0:root:/root:/bin/bash bin:x:1:1:bin:/bin:/sbin/nologin daemon:x:2:2:daemon:/sbin:/sbin/nologin adm:x:3:4:adm:/var/adm:/sbin/nologin lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin # I need this part of the content - "emon:/sbin:/sbin/nologin" Shell > head -n 5 /etc/passwd | awk '/daemon/ {print substr($0,16)}' emon:/sbin:/sbin/nologin Shell > tail -n 5 /etc/services axio-disc 35100/udp # Axiomatic discovery protocol pmwebapi 44323/tcp # Performance Co-Pilot client HTTP API cloudcheck-ping 45514/udp # ASSIA CloudCheck WiFi Management keepalive cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System spremotetablet 46998/tcp # Capture handwritten signatures # I need this part of the content - "tablet" Shell > tail -n 5 /etc/services | awk '/^sp/ {print substr($1,9)}' tablet
-
tolower function and toupper function
Shell > echo -e "AbcD123\nqweR" | awk '{print tolower($0)}' abcd123 qwer Shell > tail -n 5 /etc/services | awk '{print toupper($0)}' AXIO-DISC 35100/UDP # AXIOMATIC DISCOVERY PROTOCOL PMWEBAPI 44323/TCP # PERFORMANCE CO-PILOT CLIENT HTTP API CLOUDCHECK-PING 45514/UDP # ASSIA CLOUDCHECK WIFI MANAGEMENT KEEPALIVE CLOUDCHECK 45514/TCP # ASSIA CLOUDCHECK WIFI MANAGEMENT SYSTEM SPREMOTETABLET 46998/TCP # CAPTURE HANDWRITTEN SIGNATURES
-
Functions that deal with time and date
What is a UNIX timestamp? According to the development history of GNU/Linux, UNIX V1 was born in 1971, and the book "UNIX Programmer's Manual" was published on November 3 of the same year, which defines 1970-01-01 as the reference date of the start of UNIX.
The conversion between a timestamp and a natural date time in days:
Shell > echo "$(( $(date --date="2024/01/06" +%s)/86400 + 1 ))" 19728 Shell > date -d "1970-01-01 19728days" Sat Jan 6 00:00:00 CST 2024
The conversion between a timestamp and a natural date time in seconds:
Shell > echo "$(date --date="2024/01/06 17:12:00" +%s)" 1704532320 Shell > echo "$(date --date='@1704532320')" Sat Jan 6 17:12:00 CST 2024
The conversion between natural date time and UNIX timestamp in
awk
program:Shell > awk 'BEGIN{print systime()}' 1704532597 Shell > echo "1704532597" | awk '{print strftime("%Y-%m-%d %H:%M:%S",$0)}' 2024-01-06 17:16:37
I/O statement¶
Statement | Description |
---|---|
getline | Read the next matching row record and assign it to "$0". The return value is 1: Indicates that relevant row records have been read. The return value is 0: Indicates that the last line has been read The return value is negative: Indicates encountering an error |
getline var | Read the next matching row record and assign it to the variable "var" |
command | getline [var] | Assign the result to "$0" or the variable "var" |
next | Stop the current input record and perform the following actions |
Print the result | |
printf | See here |
system(cmd-line) | Execute the command and return the status code. 0 indicates that the command was executed successfully; non-0 indicates that the execution failed |
print ... >> file | Output redirection |
print ... | command | Print the output and use it as input to the command |
-
getline
Shell > seq 1 10 | awk '/3/ || /6/ {getline ; print $0}' 4 7 Shell > seq 1 10 | awk '/3/ || /6/ {print $0 ; getline ; print $0}' 3 4 6 7
Using the functions we learned earlier and the "&" symbol, we can:
Shell > tail -n 5 /etc/services | awk '/45514\/tcp/ {getline ; gsub(/.*/ , "&\tSTRING1") ; print $0}' spremotetablet 46998/tcp # Capture handwritten signatures STRING1 Shell > tail -n 5 /etc/services | awk '/45514\/tcp/ {print $0 ; getline; gsub(/.*/,"&\tSTRING2") } {print $0}' axio-disc 35100/udp # Axiomatic discovery protocol pmwebapi 44323/tcp # Performance Co-Pilot client HTTP API cloudcheck-ping 45514/udp # ASSIA CloudCheck WiFi Management keepalive cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System spremotetablet 46998/tcp # Capture handwritten signatures STRING2
Print even and odd lines:
Shell > tail -n 10 /etc/services | cat -n | awk '{ if( (getline) <= 1) print $0}' 2 ka-kdp 31016/udp # Kollective Agent Kollective Delivery 4 edi_service 34567/udp # dhanalakshmi.org EDI Service 6 axio-disc 35100/udp # Axiomatic discovery protocol 8 cloudcheck-ping 45514/udp # ASSIA CloudCheck WiFi Management keepalive 10 spremotetablet 46998/tcp # Capture handwritten signatures Shell > tail -n 10 /etc/services | cat -n | awk '{if(NR==1) print $0} { if(NR%2==0) {if(getline > 0) print $0} }' 1 aigairserver 21221/tcp # Services for Air Server 3 ka-sddp 31016/tcp # Kollective Agent Secure Distributed Delivery 5 axio-disc 35100/tcp # Axiomatic discovery protocol 7 pmwebapi 44323/tcp # Performance Co-Pilot client HTTP API 9 cloudcheck 45514/tcp # ASSIA CloudCheck WiFi Management System
-
getline var
Add each line of the b file to the end of each line of the C file:
Shell > cat /tmp/b.txt b1 b2 b3 b4 b5 b6 Shell > cat /tmp/c.txt A 192.168.1.1 HTTP B 192.168.1.2 HTTP B 192.168.1.2 MYSQL C 192.168.1.1 MYSQL C 192.168.1.1 MQ D 192.168.1.4 NGINX Shell > awk '{getline var1 <"/tmp/b.txt" ; print $0 , var1}' /tmp/c.txt A 192.168.1.1 HTTP b1 B 192.168.1.2 HTTP b2 B 192.168.1.2 MYSQL b3 C 192.168.1.1 MYSQL b4 C 192.168.1.1 MQ b5 D 192.168.1.4 NGINX b6
Replace the specified field of the c file with the content line of the b file:
Shell > awk '{ getline var2 < "/tmp/b.txt" ; gsub($2 , var2 , $2) ; print $0 }' /tmp/c.txt A b1 HTTP B b2 HTTP B b3 MYSQL C b4 MYSQL C b5 MQ D b6 NGINX
-
command | getline [var]
Shell > awk 'BEGIN{ "date +%Y%m%d" | getline datenow ; print datenow}' 20240107
Tip
Use double quotes to include Shell command.
-
next
Earlier, we introduced the break statement and the continue statement, the former used to terminate the loop, and the latter used to jump out of the current loop. See here. For next, when the conditions are met, it will stop the input recording that meets the conditions and continue with subsequent actions.
Shell > seq 1 5 | awk '{if(NR==3) {next} print $0}' 1 2 4 5 # equivalent to Shell > seq 1 5 | awk '{if($1!=3) print $0}'
Skip eligible line records:
Shell > cat /etc/passwd | awk -F ":" 'NR>5 {next} {print $0}' root:x:0:0:root:/root:/bin/bash bin:x:1:1:bin:/bin:/sbin/nologin daemon:x:2:2:daemon:/sbin:/sbin/nologin adm:x:3:4:adm:/var/adm:/sbin/nologin lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin # equivalent to Shell > cat /etc/passwd | awk -F ":" 'NR>=1 && NR<=5 {print $0}'
Tip
"next" cannot be used in "BEGIN{}" and "END{}".
-
system function
You can use this function to call commands in the Shell, such as:
Shell > awk 'BEGIN{ system("echo nginx http") }' nginx http
Tip
Please note to add double quotes when using the system function. If not added, the
awk
program will consider it a variable of theawk
program.Shell > awk 'BEGIN{ cmd1="date +%Y" ; system(cmd1)}' 2024
What if the Shell command itself contains double quotes? Using escape characters - "\", such as:
Shell > egrep "^root|^nobody" /etc/passwd Shell > awk 'BEGIN{ system("egrep \"^root|^nobody\" /etc/passwd") }' root:x:0:0:root:/root:/bin/bash nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
Another example:
Shell > awk 'BEGIN{ if ( system("xmind &> /dev/null") == 0 ) print "True"; else print "False" }' False
-
Write the output of the
awk
program to a fileShell > head -n 5 /etc/passwd | awk -F ":" 'BEGIN{OFS="\t"} {print $1,$2 > "/tmp/user.txt"}' Shell > cat /tmp/user.txt root x bin x daemon x adm x lp x
Tip
">" indicates writing to the file as an overlay. If you want to write to the file as an append, please use ">>". Reminder again, you should use double quotation marks to include the file path.
-
pipe character
See here
-
Custom functions
syntax -
function NAME(parameter list) { function body }
. Such as:Shell > awk 'function mysum(a,b) {return a+b} BEGIN{print mysum(1,6)}' 7
Concluding remarks¶
If you have specialized programming language skills, awk
is relatively easy to learn. However, for most sysadmins with weak programming language skills (including the author), awk
can be very complicated to learn. For information not covered, please refer to here.
Thank you again for reading.
Author: tianci li