This article is a compilation of several interesting, unique command-line tricks that should help you squeeze more juice out of your system, improve your situational awareness of what goes on behind the curtains of the desktop, plus some rather unorthodox solutions that will melt the proverbial socks off your kernel.
Follow me for a round of creative administrative hacking.
1. Run top in batch mode
top is a handy utility for monitoring the utilization of your system. It is invoked from the command line and it works by displaying lots of useful information, including CPU and memory usage, the number of running processes, load, the top resource hitters, and other useful bits. By default, top refreshes its report every 3 seconds.
Most of us use top in this fashion; we run it inside the terminal, look at the statistics for a few seconds and then graciously quit and continue our work.
But what if you wanted to monitor the usage of your system resources unattended? In other words, let some system administration utility run and collect system information and write it to a log file every once in a while. Better yet, what if you wanted to run such a utility only for a given period of time, again without any user interaction?
There are many possible answers:
You could schedule a job via cron.
You could run a shell script that runs ps every X seconds or so in a loop, incrementing a counter until the desired number of iterations elapsed. But you would also need uptime to check the load and several other commands to monitor disk utilization and whatnot.
Instead of going wild trying to patch together a script, there's a much, much simpler solution: top in batch mode.
top can be run non-interactively, in batch mode. Time delay and the number of iterations can be configured, giving you the ability to dictate the data collection as you see fit. Here's an example:
top -b -d 10 -n 3 >> top-file
We have top running in batch mode (-b). It's going to refresh every 10 seconds, as specified by the delay (-d) flag, for a total count of 3 iterations (-n). The output will be sent to a file. A few screenshots:
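If you want this to happen truly unattended, you can combine it with the cron idea mentioned above. Here's a minimal sketch of a crontab entry that collects three 10-second samples at the top of every hour; the log path and timings are only examples, adjust to taste:
0 * * * * top -b -d 10 -n 3 >> "$HOME/top-samples.log" 2>&1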
And that does the trick. Speaking of writing to files ...
2. Write to more than one file at once with tee
In general, with static data, this is not a problem. You simply repeat the write operation. With dynamic data, again, this is not that much of a problem. You capture the output into a temporary variable and then write it to a number of files. But there's an easier and faster way of doing it, without redirection and repetitive write operations. The answer: tee.
tee is a very useful utility that duplicates pipe content. Now, what makes tee really useful is that it can append data to existing files, making it ideal for writing periodic log information to multiple files at once.
Here's a great example:
ps | tee file1 file2 file3
That's it! We're sending the output of the ps command to three different files! Or as many as we want. As you can see in the screenshots below, all three files were created at the same time and they all contain the same data. This is extremely useful for constantly changing output, which you must preserve in multiple instances without typing the same commands over and over like a keyboard-loving monkey.
Now, if you wanted to append data to files, that is, periodically update them, you would use the -a flag, like this:
ps | tee -a file1 file2 file3 file4
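Naturally, this combines nicely with the batch-mode top trick from the first section; for instance, the following collects three samples and appends them to two log files at once (the file names are arbitrary):
top -b -d 10 -n 3 | tee -a top-log1 top-log2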
3. Unleash the accounting power with pacct
Did you know that you can log the completion of every single process running on your machine? You may even want to do this, for security, statistical purposes, load optimization, or any other administrative reason you may think of. By default, process accounting (pacct) may not be activated on your machine. You might have to start it:
/usr/sbin/accton /var/account/pacct
Once this is done, every single process will be logged. You can find the logs under /var/account. The log itself is in binary form, so you will have to use a dumping utility to convert it to human-readable form. To this end, you use the dump-acct utility.
dump-acct pacct
The output may be very long, depending on the activity on your machine and whether you rotate the logs, which you should, since the accounting logs can inflate very quickly.
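If you only want a quick peek rather than the full dump, you can trim the output with standard tools, for instance (run from /var/account, or give the full path):
dump-acct pacct | tail -n 20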
And there you go, the list of all processes run on our host since the moment we activated the accounting. The output is printed in nice columns and includes the following, from left to right: process name, user time, system time, effective time, UID, GID, memory, and date. Other ways of starting accounting include the following:
/etc/init.d/psacct start
Or:
/etc/init.d/acct start
In fact, starting accounting using the init script is the preferred way of doing things. However, you should note that accounting is not a service in the typical form. The init script does not look for a running process - it merely checks for the lock file under /var. Therefore, if you turn the accounting on/off using the accton command, the init scripts won't be aware of this and may report false results.
By the way, turning accounting off with accton is done like this:
/usr/sbin/accton
When no file is specified, the accounting is turned off. When the command is run against a file, as we've demonstrated earlier, the accounting process is started. You should be careful when activating/deactivating the accounting and stick to one method of management, either via the accton command or using the init scripts.
4. Dump utmp and wtmp logs
Like pacct, you can also dump the contents of the utmp and wtmp files. Both these files provide login records for the host. This information may be critical, especially if applications rely on the proper output of these files to function.
Being able to analyze the records gives you the power to examine your systems inside and out. Furthermore, it may help you diagnose problems with logins, for example, via VNC or ssh, non-console and console login attempts, and more.
You can dump the logs using the dump-utmp utility. There is no dump-wtmp utility; the former works for both.
You can also do the following:
dump-utmp /var/log/wtmp
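The same utility also reads the live utmp file; assuming the usual /var/run/utmp location, that would be:
dump-utmp /var/run/utmp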
Here's what the sample file looks like:
5. Monitor CPU and disk usage with iostat
Would you like to know how your hard disks behave? Or how well your CPU churns? iostat is a utility that reports statistics for CPU and I/O devices on your system. It can help you identify bottlenecks and mis-tuned kernel parameters, allowing you to boost the performance of your machine.
On some systems, the utility will be installed by default. Ubuntu 9.04, for example, requires that you install the sysstat package, which, by the way, contains several more goodies that we will soon review:
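On Ubuntu, installing it is a one-liner:
sudo apt-get install sysstat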
Then, we can start monitoring the performance. I will not go into detail about what each bit of displayed information means, but I will focus on one item: the first output reported by the utility is the average statistics since the last reboot.
Here's a sample run of iostat:
iostat -x 10 10
The utility runs 10 times, every 10 seconds, reporting extended (-x) statistics. Here's what the sample output to terminal looks like:
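You can also narrow the report to specific devices; for example, the following reports extended statistics for a single disk, every 5 seconds, 3 times (sda here is just a placeholder device name):
iostat -dx sda 5 3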
6. Monitor memory usage with vmstat
vmstat does a similar job, except it works with the virtual memory statistics. For Windows users, please note the term virtual does not refer to the pagefile, i.e. swap. It refers to the logical abstraction of memory in the kernel, which is then translated into physical addresses.
vmstat reports information about processes, memory, paging, block IO, traps, and CPU activity. Again, it is very handy for detecting problems with system performance. Here's a sample run of vmstat:
vmstat 10 10
The utility runs 10 times, reporting every 10 seconds. For example, we can see that our system has taken some swap, but it's not doing much with it; there's approx. 35MB of free memory and very little I/O activity, as there are no blocked processes. The CPU utilization spikes from just a few percent to almost 90% before calming down.
Nothing especially exciting, but in critical situations, this kind of information can be invaluable.
7. Combine the power of iostat and vmstat with dstat
dstat aims to replace vmstat, iostat and ifstat combined. It also offers exporting data into .csv files that can then be analyzed using spreadsheet software. dstat uses a pleasant color output in the terminal:
Plus you can make really nice graphs. The spike in the graph comes from opening the Firefox browser, for instance.
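As a quick sketch, the following collects CPU, disk and network statistics every 5 seconds, 10 times, and writes them to a CSV file as well as the screen (the file name is arbitrary):
dstat -cdn --output dstat.csv 5 10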
8. Collect, report or save system activity information with sar
sar is another powerful, versatile utility. It is a sort of jack of all trades when it comes to monitoring and logging system activity. sar can be very useful for trying to analyze strange system problems where normal logs like boot.msg, messages or secure under /var/log do not yield too much information. sar writes the daily statistics into log files under /var/log/sa. As we did before, we can monitor CPU utilization, every 2 seconds, 10 times:
sar -u 2 10
Or you may want to monitor disk activity (10 iterations, every 5 seconds):
sar -d 5 10
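sar can also report memory utilization, and it can read back the daily log files it keeps under /var/log/sa. For example (the second command assumes a file for the 15th of the month; the number matches the day):
sar -r 5 10
sar -u -f /var/log/sa/sa15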
Now for some really cool stuff ...
9. Create UDP server-client - version 1
Here's something radical: create a small UDP server that listens on a port. Then configure a client to send information to the server. All this without root access!
Configure server with netcat
netcat is an incredibly powerful utility that can do just about anything with TCP or UDP connections. It can open connections, listen on ports, scan ports, and much more, all this with both IPv4 and IPv6.
In our example, we will use it to create a small UDP server on one of the non-service ports. This means we won't need root access to get it going.
netcat -l -u -p 42000
Here's what we did:
-l tells netcat to listen, -u tells it to use UDP, -p specifies the port (42000).
We can indeed verify with netstat:
netstat -tulpen | grep 42000
And we have an open port:
Configure client
Now we need to configure the client. The big question is how to tell our process to send data to a remote machine, to a UDP port? The answer is quite simple: open a file descriptor that points to the remote server. Here's the actual BASH script that we will use to test our connection:
The most interesting bit is the line that starts with exec.
exec 104<> /dev/udp/192.168.1.143/$1
We created a file descriptor 104 that points to our server. Now, it is possible that the file descriptor number 104 might already be in use, so you may want to check first with lsof or randomize the choice of the descriptor. Furthermore, if you have a name resolution mechanism in place, you can use a hostname instead of an IP. If you wanted to use a TCP connection, you would use /dev/tcp.
The choice of the port is defined by the $1 variable, passed as a command-line argument. You can hard code it - or make everything configurable by the user at runtime. The rest of the code is unimportant; we do something and then send information to our file descriptor, without really caring what it is. Again, we need no root access to do this.
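For completeness, here's a minimal sketch of what such a client script could look like. The server IP, the descriptor number and the payload are only examples; the port is still taken from the first command-line argument:
#!/bin/bash
# /dev/udp/... is a bash feature, not a real device file
exec 104<> /dev/udp/192.168.1.143/$1

# send something to the server; here, a timestamp every 5 seconds, ten times
for i in $(seq 1 10); do
    date +"%m-%d-%H:%M:%S" >&104
    sleep 5
done

# close the descriptor when we're done
exec 104>&-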
Test connection
Now, we can see the server-client connection in action. Our server is an Ubuntu 8.10 machine, while our client is a Fedora 11 machine. We ran the script on the client:
And watched the command line on the server:
10. Configure UDP server-client - version 2
The limitation with the exercise above is that we do not have control over some of the finer aspects of our connection. Furthermore, the connection is limited to a single end-point. If one client connects, others will be refused. To make things more exciting, we can improve our server. Instead of using netcat, we will write one of our own - in Perl.
Perl is a powerful programming language, very flexible, very neat. I must admit I have only recently begun dabbling in it, so do not expect any miracles, but here's one way of creating a UDP server in Perl - there are tons of other implementations available, better, smarter, faster, and more elegant.
The code is very simple. First, let's take a look at the entire file and then examine sections of code. Here it is:
#!/usr/bin/perl

use IO::Socket;

my $server_port = 50060;

$server = IO::Socket::INET->new(LocalPort => $server_port,
                                Proto => "udp")
    or die "Could not create UDP server on port $server_port : $@\n";

my $datagram;
my $MAXSIZE = 16384; # buffer size

while (my $data = $server->recv($datagram, $MAXSIZE))
{
    print $datagram;
    my $logdate = `date +"%m-%d-%H:%M:%S"`;
    chomp($logdate);
    my $filename = "file.$logdate";
    open(FD, ">", "$filename");
    print FD $datagram;
    close(FD);
}
close($server);
The code begins with the standard Perl declaration. If you want extra debugging, you can add the -w flag. If you want to use strict code, then you may also want to add the use strict; declaration. I warmly recommend this.
The next important bit is this one:
use IO::Socket;
This one tells Perl to use the IO::Socket object interface. You can also use IO::Socket::INET specifically for Internet domain sockets. For more information, please check the official Perl documentation.
The next bit is the creation of the socket, i.e. server:
my $server_port = 50060;

$server = IO::Socket::INET->new(LocalPort => $server_port,
                                Proto => "udp")
    or die "Could not create UDP server on port $server_port : $@\n";
We are trying to open the local UDP port 50060. If this cannot be done, the script will die with a rather descriptive message.
Next, we define a variable that will take incoming data (datagram) and the buffer size. The buffer size might be limited by the network implementation or network restrictions on your router/switch or the kernel itself, so some values might not work for you.
And then, we have the server doing some hard work. It prints the data to the screen. But it also creates a log file with a time stamp and prints the data to the file as well.
The beauty of this implementation is that the server permits multiple incoming connections. Of course, you will have to decide how you want to differentiate the data sent by different clients, whether by a message header or by using additional IO::Socket::INET attributes like PeerAddr.
On the client side, nothing changes.
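A quick way to test it from the client is to reuse the file descriptor trick from before; the IP address is the one from our example, and only the port changes to 50060:
exec 104<> /dev/udp/192.168.1.143/50060
echo "hello from the client" >&104
exec 104>&-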
Thanks to dedoimedo.com for the great article.
Regular Expressions and Extended Pattern Matching
Here is a table of the Solaris (around 1991) commands that allow you to specify regular expressions:
Utility     Regular Expression Type
vi          Basic
sed         Basic
grep        Basic
csplit      Basic
dbx         Basic
dbxtool     Basic
more        Basic
ed          Basic
expr        Basic
lex         Basic
pg          Basic
nl          Basic
rdist       Basic
awk         Extended
nawk        Extended
egrep       Extended
EMACS       EMACS Regular Expressions
PERL        PERL Regular Expressions
The Anchor Characters: ^ and $
Most UNIX text facilities are line oriented. Searching for
patterns that span several lines is not easy to do. You see, the end of line
character is not included in the block of text that is searched. It is a
separator. Regular expressions examine the text between the separators. If you
want to search for a pattern that is at one end or the other, you use anchors. The character
"^" is the starting anchor, and the character "$" is the
end anchor. The regular expression "^A" will match all lines that
start with a capital A. The expression "A$" will match all lines that
end with the capital A. If the anchor characters are not used at the proper end
of the pattern, then they no longer act as anchors. That is, the "^"
is only an anchor if it is the first character in a regular expression. The
"$" is only an anchor if it is the last character. The expression
"$1" does not have an anchor. Neither is "1^". If you need
to match a "^" at the beginning of the line, or a "$" at
the end of a line, you must escape the special characters with a
backslash. Here is a summary:
Pattern     Matches
^A          "A" at the beginning of a line
A$          "A" at the end of a line
A^          "A^" anywhere on a line
$A          "$A" anywhere on a line
^^          "^" at the beginning of a line
$$          "$" at the end of a line
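To see the anchors in action, you can feed grep a file of text (file here stands for any file you have handy):
grep '^A' file     # lines that begin with "A"
grep 'A$' file     # lines that end with "A"
grep '\$' file     # lines containing a literal "$" anywhere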
The use of "^" and "$" as indicators of the beginning or end of a line is a convention other utilities use. The vi editor uses these two characters as commands to go to the beginning or end of a line. The C shell uses "!^" to specify the first argument of the previous line, and "!$" is the last argument on the previous line.
It is one of those choices that other utilities go along with to maintain consistency. For instance, "$" can refer to the last line of a file when using ed and sed. cat -e marks end of lines with a "$". You might see it in other programs as well.
Matching a character with a character set
The simplest character set is a character. The regular
expression "the" contains three character sets: "t,"
"h" and "e". It will match any line with the string
"the" inside it. This would also match the word "other". To
prevent this, put spaces before and after the pattern:
" the ". You can combine the string with an anchor. The
pattern "^From: " will match the lines of a mail message that
identify the sender. Use this pattern with grep to print every address in your
incoming mail box:
grep '^From: ' /usr/spool/mail/$USER
Some characters have a special meaning in regular expressions.
If you want to search for such a character, escape it with a backslash.
Match any character with .
The character "." is one of those special
meta-characters. By itself it will match any character, except the end-of-line
character. The pattern that will match a line with a single character is
^.$
Specifying a Range of Characters with [...]
If you want to match specific characters, you can use the square
brackets to identify the exact characters you are searching for. The pattern
that will match any line of text that contains exactly one number is
^[0123456789]$
This is verbose. You can use the hyphen between two characters
to specify a range:
^[0-9]$
You can intermix explicit characters with character ranges. This
pattern will match a single character that is a letter, number, or underscore:
[A-Za-z0-9_]
Character sets can be combined by placing them next to each other. If you wanted to search for a word that
- Started with a capital letter "T".
- Was the first word on a line
- The second letter was a lower case letter
- Was exactly three letters long, and
- The third letter was a vowel
the regular expression would be "^T[a-z][aeiou] ".
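To try it with grep (file stands for any text file):
grep '^T[a-z][aeiou] ' file    # matches lines beginning with e.g. "The ", "Tie " or "Two "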
Exceptions in a character set
You can easily search for all characters except those in square
brackets by putting a "^" as the first character after the
"[". To match all characters except vowels use "[^aeiou]".
Like the anchors in places that can't be considered an anchor,
the characters "]" and "-" do not have a special meaning if
they directly follow "[". Here are some examples:
Regular Expression     Matches
[]                     The characters "[]"
[0]                    The character "0"
[0-9]                  Any number
[^0-9]                 Any character other than a number
[-0-9]                 Any number or a "-"
[0-9-]                 Any number or a "-"
[^-0-9]                Any character except a number or a "-"
[]0-9]                 Any number or a "]"
[0-9]]                 Any number followed by a "]"
[0-9-z]                Any number, or any character between "9" and "z"
[0-9\-a\]]             Any number, or a "-", an "a", or a "]"
|
Repeating character sets with *
The third part of a regular expression is the modifier. It is
used to specify how many times you expect to see the previous character set. The
special character "*" matches zero
or more copies. That is, the
regular expression "0*" matches zero
or more zeros, while the expression "[0-9]*" matches zero or more
numbers.
This explains why the pattern "^#*" is useless, as it
matches any number of "#'s" at the beginning of the line, including zero. Therefore this will match
every line, because every line starts with zero or more "#'s".
At first glance, it might seem that starting the count at zero
is stupid. Not so. Looking for an unknown number of characters is very
important. Suppose you wanted to look for a number at the beginning of a line,
and there may or may not be spaces before the number. Just use
"^ *" to match zero or more spaces at the beginning of the line.
If you need to match one or more, just repeat the character set. That is,
"[0-9]*" matches zero or more numbers, and "[0-9][0-9]*"
matches one or more numbers.
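As a concrete example, this finds lines that start with a number, with or without leading spaces (file is any text file):
grep '^ *[0-9][0-9]*' file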
Matching a specific number of sets with \{ and \}
You can continue the above technique if you want to specify a
minimum number of character sets. You cannot specify a maximum number of sets
with the "*" modifier. There is a special pattern you can use to
specify the minimum and maximum number of repeats. This is done by putting
those two numbers between "\{" and "\}". The backslashes
deserve a special discussion. Normally a backslash turns off the special meaning for a character. A
period is matched by a "\." and an asterisk is matched by a
"\*".
If a backslash is placed before a "<,"
">," "{," "}," "(," "),"
or before a digit, the backslash turns
on a special meaning. This
was done because these special functions were added late in the life of regular
expressions. Changing the meaning of "{" would have broken old expressions.
This is a horrible crime punishable by a year of hard labor writing COBOL
programs. Instead, adding a backslash added functionality without breaking old
programs. Rather than complain about the asymmetry, view it as evolution.
Having convinced you that "\{" isn't a plot to confuse
you, an example is in order. The regular expression to match 4, 5, 6, 7 or 8
lower case letters is
[a-z]\{4,8\}
Any numbers between 0 and 255 can be used. The second number may
be omitted, which removes the upper limit. If the comma and the second number
are omitted, the pattern must be duplicated the exact number of times specified
by the first number.
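For example, with grep (file is any text file):
grep '[a-z]\{4,8\}' file     # 4 to 8 consecutive lower case letters
grep '^A\{4\}B' file         # lines starting with "AAAAB"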
You must remember that modifiers like "*" and
"\{1,5\}" only act as modifiers if they follow a character set. If
they were at the beginning of a pattern, they would not be a modifier. Here is
a list of examples, and the exceptions:
Regular Expression     Matches
*                      Any line with an asterisk
\*                     Any line with an asterisk
\\                     Any line with a backslash
^*                     Any line starting with an asterisk
^A*                    Any line
^A\*                   Any line starting with an "A*"
^AA*                   Any line if it starts with one "A"
^AA*B                  Any line with one or more "A"'s followed by a "B"
^A\{4,8\}B             Any line starting with 4, 5, 6, 7 or 8 "A"'s followed by a "B"
^A\{4,\}B              Any line starting with 4 or more "A"'s followed by a "B"
^A\{4\}B               Any line starting with "AAAAB"
\{4,8\}                Any line with "{4,8}"
A{4,8}                 Any line with "A{4,8}"
Matching words with \< and \>
Searching for a word isn't quite as simple as it at first
appears. The string "the" will match the word "other". You
can put spaces before and after the letters and use this regular expression:
" the ". However, this does not match words at the
beginning or end of the line. And it does not match the case where there is a
punctuation mark after the word.
There is an easy solution. The characters "\<" and
"\>" are similar to the "^" and "$" anchors,
as they don't occupy the position of a character. They do "anchor" the
expression so that it only matches on a word boundary. The pattern to
search for the word "the" would be "\<[tT]he\>". The
character before the "t" must be either a new line character, or
anything except a letter, number, or underscore. The character after the
"e" must also be a character other than a number, letter, or
underscore or it could be the end of line character.
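With a grep that supports these anchors (GNU grep does, as does Solaris with the newer library mentioned below), the search looks like this:
grep '\<[tT]he\>' file     # "the" or "The" as a word, but not "other" or "theme"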
Backreferences - Remembering patterns with \(, \) and \1
Another pattern that requires a special mechanism is searching
for repeated words. The expression "[a-z][a-z]" will match any two
lower case letters. If you wanted to search for lines that had two adjoining
identical letters, the above pattern wouldn't help. You need a way of
remembering what you found, and seeing if the same pattern occurred again. You
can mark part of a pattern using "\(" and "\)". You can
recall the remembered pattern with "\" followed by a single digit.
Therefore, to search for two identical letters, use "\([a-z]\)\1".
You can have 9 different remembered patterns. Each occurrence of "\("
starts a new pattern. The regular expression that would match a 5 letter
palindrome, (e.g. "radar"), would be
\([a-z]\)\([a-z]\)[a-z]\2\1
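Both patterns can be tried directly with grep (file is any text file):
grep '\([a-z]\)\1' file                     # any doubled lower case letter, e.g. the "ll" in "hello"
grep '\([a-z]\)\([a-z]\)[a-z]\2\1' file     # a 5 letter palindrome such as "radar"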
Potential Problems
That completes a discussion of the Basic regular expression.
Before I discuss the extensions the extended expressions offer, I wanted to mention
two potential problem areas.
The "\<" and "\>" characters were
introduced in the vi editor. The other programs didn't have
this ability at that time. Also the "\{min,max\}" modifier
is new and earlier utilities didn't have this ability. This made it difficult
for the novice user of regular expressions, because it seemed each utility had
a different convention. Sun has retrofitted the newest regular expression
library to all of their programs, so they all have the same ability. If you try
to use these newer features on other vendors' machines, you might find they
don't work the same way.
The other potential point of confusion is the extent of the
pattern matches. Regular expressions match the longest possible pattern. That
is, the regular expression
A.*B
matches "AAB" as well as
"AAAABBBBABCCCCBBBAAAB". This doesn't cause many problems using grep, because an oversight in a
regular expression will just match more lines than desired. If you use sed, and your patterns get
carried away, you may end up deleting more than you wanted to.
Extended Regular Expressions
Two programs use the extended regular expression: egrep and awk.
With these extensions, those special characters preceded by a backslash no
longer have the special meaning: "\{" , "\}",
"\<", "\>", "\(", "\)" as well as
the "\digit". There is a very good reason for this, which I
will delay explaining to build up suspense.
The character "?" matches 0 or 1 instances of the
character set before, and the character "+" matches one or more
copies of the character set. You can't use the \{ and \} in the extended
regular expressions, but if you could, you might consider the "?" to
be the same as "\{0,1\}" and the "+" to be the same as
"\{1,\}".
By now, you are wondering why the extended regular expressions
are even worth using. Except for two abbreviations, there are no advantages, and
a lot of disadvantages. Therefore, examples would be useful.
The three important characters in the expanded regular
expressions are "(", "|", and ")". Together, they
let you match a choice of patterns. As an example, you can use egrep to print all From: and Subject: lines from your incoming mail:
egrep '^(From|Subject): ' /usr/spool/mail/$USER
All lines starting with "From:" or
"Subject:" will be printed. There is no easy way to do this with the
Basic regular expressions. You could try "^[FS][ru][ob][mj]e*c*t*: "
and hope you don't have any lines that start with "Sromeet:".
Extended expressions don't have the "\<" and "\>"
characters. You can compensate by using the alternation mechanism. Matching the
word "the" in the beginning, middle, end of a sentence, or end of a
line can be done with the extended regular expression:
(^| )the([^a-z]|$)
There are two choices before the word, a space or the beginning
of a line. After the word, there must be something besides a lower case letter
or else the end of the line. One extra bonus with extended regular expressions
is the ability to use the "*," "+," and "?"
modifiers after a "(...)" grouping. The following will match "a
simple problem," "an easy problem," as well as "a
problem".
egrep "a[n]?
(simple|easy)? problem"
data
I promised to explain why the backslash characters don't work in
extended regular expressions. Well, perhaps the "\{...\}" and
"\<...\>" could be added to the extended expressions. These are
the newest addition to the regular expression family. They could be added, but
this might confuse people if those characters are added and the
"\(...\)" are not. And there is no way to add that functionality to
the extended expressions without changing the current usage. Do you see why?
It's quite simple. If "(" has a special meaning, then "\("
must be the ordinary character. This is the opposite of the Basic regular
expressions, where "(" is ordinary, and "\(" is special.
The usage of the parentheses is incompatible, and any change could break old
programs.
If the extended expression used "(...|...)" as regular
characters, and "\(...\|...\)" for specifying alternate patterns,
then it is possible to have one set of regular expressions that has full
functionality. This is exactly what GNU emacs does, by the way.
The rest of this is random notes.
Regular Expression     Class       Type             Meaning
.                      all         Character Set    A single character (except newline)
^                      all         Anchor           Beginning of line
$                      all         Anchor           End of line
[...]                  all         Character Set    Range of characters
*                      all         Modifier         Zero or more duplicates
\<                     Basic       Anchor           Beginning of word
\>                     Basic       Anchor           End of word
\(..\)                 Basic       Backreference    Remembers pattern
\1..\9                 Basic       Reference        Recalls pattern
+                      Extended    Modifier         One or more duplicates
?                      Extended    Modifier         Zero or one duplicate
\{M,N\}                Basic       Modifier         M to N duplicates
(...|...)              Extended    Alternation      Shows alternation
\(...\|...\)           EMACS       Alternation      Shows alternation
\w                     EMACS       Character Set    Matches a letter in a word
\W                     EMACS       Character Set    Opposite of \w
POSIX character sets
POSIX added newer and more portable ways to search for character sets. Instead of using [a-zA-Z] you can replace 'a-zA-Z' with [:alpha:], or, to be more complete, replace [a-zA-Z] with [[:alpha:]]. The advantage is that this will match international character sets. You can mix the old style and new POSIX styles, such as
grep '[1-9[:alpha:]]'
Here is the full list:
Character Group     Meaning
[:alnum:]           Alphanumeric
[:cntrl:]           Control character
[:lower:]           Lower case character
[:space:]           Whitespace
[:alpha:]           Alphabetic
[:digit:]           Digit
[:print:]           Printable character
[:upper:]           Upper case character
[:blank:]           Whitespace, tabs, etc.
[:graph:]           Printable and visible characters
[:punct:]           Punctuation
[:xdigit:]          Hexadecimal digit
Note that some people use [[:alpha:]] as a notation, but the outer '[...]' specifies a character set.
Perl Extensions
Regular Expression     Type             Meaning
\t                     Character Set    tab
\n                     Character Set    newline
\r                     Character Set    return
\f                     Character Set    form feed
\a                     Character Set    alarm (bell)
\e                     Character Set    escape
\033                   Character Set    octal character
\x1B                   Character Set    hex character
\c[                    Character Set    control character
\l                     Character Set    lowercase next character
\u                     Character Set    uppercase next character
\L                     Character Set    lowercase until \E
\U                     Character Set    uppercase until \E
\E                     Character Set    end case modification
\Q                     Character Set    quote metacharacters until \E
\w                     Character Set    Match a "word" character
\W                     Character Set    Match a non-word character
\s                     Character Set    Match a whitespace character
\S                     Character Set    Match a non-whitespace character
\d                     Character Set    Match a digit character
\D                     Character Set    Match a non-digit character
\b                     Anchor           Match a word boundary
\B                     Anchor           Match a non-(word boundary)
\A                     Anchor           Match only at beginning of string
\Z                     Anchor           Match only at end of string, or before newline
\z                     Anchor           Match only at end of string
\G                     Anchor           Match only where previous m//g left off
Example of a PERL extended, multi-line regular expression:
m{ \(
     (               # Start group
       [^()]+        # anything but '(' or ')'
       |             # or
       \( [^()]* \)
     )+              # end group
   \)
 }x
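One way to try it from the command line, against any text file (the file name is just a placeholder), is as a Perl one-liner; it prints every line that contains a parenthesized group, allowing one level of nested parentheses:
perl -ne 'print if m{ \( ( [^()]+ | \( [^()]* \) )+ \) }x' file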