Weight: 2
Description: Candidates should be able to apply filters to text streams.
Objectives
Send text files and output streams through text utility filters to modify the output using standard UNIX commands found in the GNU textutils package.
Terms
- bzcat
- cat
- cut
- head
- less
- md5sum
- nl
- od
- paste
- sed
- sha256sum
- sha512sum
- sort
- split
- tail
- tr
- uniq
- wc
- xzcat
- zcat
Streams
In UNIX world a lot of data is in TEXT form. Log files, configurations, data, etc. Filtering this data means taking an input stream of text and performing some conversion on the text before sending it to an output stream. In this context, a streams is nothing more than "a sequence of bytes that can be read or written using library functions that hide the details of an underlying device from the application".
In simple words, a text stream is an input of text from a keyboard, a file, a network device, ... which can be viewed, changed, examined, and ... via text util commands.
Modern programming environments and shells (including bash) use three standard I/O streams:
- stdin is the standard input stream, which provides input to commands.
- stdout is the standard output stream, which displays output from commands.
- stderr is the standard error stream, which displays error output from commands
Here we are talking about the stdin and viewing or manipulating it via different commands and utilities. You will see more about these streams and will see how we can combine commands to PIPE inputs and outputs of different commands in chapter 103.4.
Viewing commands
cat
This command simply outputs its input stream (or the filename you give it). As you saw in the previous section. As with most commands, if you do not give input to it, it will read the data from the keyboard.
jadi@funlife:~/w/lpic/101$ cat > mydata
test
this is the second line
bye
jadi@funlife:~/w/lpic/101$ cat mydata
test
this is the second line
bye
When entering the input via the keyboard,
ctrl+d
will end the stream.
You can also provide more than one input file name:
jadi@funlife:~/w/lpic/101$ cat mydata directory_data
test
this is the second line
bye
total 0
-rw-rw-r-- 1 jadi jadi 0 Jan 4 17:33 12
-rw-rw-r-- 1 jadi jadi 0 Jan 4 17:33 62
-rw-rw-r-- 1 jadi jadi 0 Jan 4 17:33 neda
-rw-rw-r-- 1 jadi jadi 0 Jan 4 17:33 jadi
-rw-rw-r-- 1 jadi jadi 0 Jan 4 17:33 you
-rw-rw-r-- 1 jadi jadi 0 Jan 4 17:34 amir
-rw-rw-r-- 1 jadi jadi 0 Jan 4 17:37 directory_data
Some common cat switches are -n
to show line numbers, -s
to squeeze blanks, -T
to show tabs, and -v
to show non-printing characters.
bzcat, xzcat, zcat, gzcat
There are used to directly cat
the bz, xz, and Z & gz compressed files. These let you see the contents of compressed files without uncompressing them first.
less
This is a powerful tool to view larger text files. It can paginate, search and move in text files.
There is another command called
more
. It's more familiar for people coming from the DOS environment and not very common in the Linux world. Do not use it. Remember:less
is more thanmore
.
Some less common commands are as follows.
Command | Usage |
---|---|
q | Exit |
/foo | Searches for foo |
n | Next (search) |
N | Previous (search) |
?foo | Search backward for foo |
G | Go to end |
nG | Go to line n |
PageUp, PageDown, UpArrow, DownArrow | You guess! |
od
This command _dump_s files (Shows files in formats other than text). Normal behavior is OctalDump (showing in base 8):
jadi@funlife:~/w/lpic/101$ od mydata
0000000 062564 072163 072012 064550 020163 071551 072040 062550
0000020 071440 061545 067543 062156 066040 067151 005145 074542
0000040 005145
0000042
Not good enough for normal human beings. Let's use some switches:
- -t will tell what format to print:
-t a
for showing only named characters-t c
for showing escaped chars. You can summarize the two above to-a
and-c
- -A for choosing how to present offset field:
-A d
for Decimal,-A o
for Octal,-A x
for hex-A n
for None
od
is very useful to find problems in your text files - Say finding out if you are using tabs or correct line endings.
Choosing parts of files
split
Will split the files. It is very useful for transferring HUGE files on smaller media (say splitting a 3TB file into 8GB parts and moving them to another machine with a USB Disk).
jadi@funlife:~/w/lpic/101$ cat mydata
hello
this is the second line
but as you can see we are
still writing
and this is getting longer
.
.
and longer
and longer!
jadi@funlife:~/w/lpic/101$ ls
mydata
jadi@funlife:~/w/lpic/101$ split -l 2 mydata
jadi@funlife:~/w/lpic/101$ ls
mydata xaa xab xac xad xae
jadi@funlife:~/w/lpic/101$ cat xab
but as you can see we are
still writing
- By default, split uses xaa, xab, xac, ... for output file names. It can be changed with
split -l 2 mydata output
which split mydata into outputaa, outputab, ...; 2 lines per file. - the
-l 2
splits 2 lines per file. It is possible to use-b 42
to split every 42 bytes or even-n 5
to force 5 output files. - If you want numeric output (x00, x01, ..) use
-d
option.
Need to join these files?
cat
them withcat x* > originalfile
.
head and tail
Will show the beginning (head) or end (tail) of text files. By default, it will show 10 lines but you can change it by -n20
or -20
.
tail -f
follows the new lines which are being written at the end of the file. Very useful.
cut
The cut
command will cut one or more columns from a file. Good for separating fields:
Lets cut the first field of a file.
jadi@funlife:~/w/lpic/101$ cat howcool
jadi 5
sina 6
rubic 2
you 12
jadi@funlife:~/w/lpic/101$ cut -f1 howcool
jadi
sina
rubic
you
Default delimiter is TAB. use
-dx
to change it to "x" or-d' '
to change it to space
It is also possible to cut fields 1, 2, and 3 with -f1-3
or only characters with index 4, 5, 7, 8 from each line -c4,5,7,8
.
Modifying streams
nl
This command is for showing line numbers.
jadi@funlife:~/w/lpic/101$ nl mydata | head -3
1 hello
2 this is the second line
3 but as you can see we are
cat -n
will also number lines.
sort & uniq
Will sorts its input(s).
jadi@funlife:~/w/lpic/101$ cat uses
you fedora
jadi ubuntu
rubic windows
neda mac
jadi@funlife:~/w/lpic/101$ cat howcool
jadi 5
sina 6
rubic 2
you 12
jadi@funlife:~/w/lpic/101$ sort howcool uses
jadi 5
jadi ubuntu
neda mac
rubic 2
rubic windows
sina 6
you 12
If you want a reverse sort, use the -r
switch.
If you want to sort NUMERICALLY (so 9 is lower than 19), use
-n
.
And the uniq
removes duplicate entries from its input. Normal behavior is removing only the duplicated lines but you can change its behavior, for example -f1
switch forces it not to check the first field.
jadi@funlife:~/w/lpic/101$ uniq what_i_have.txt
laptop
socks
tshirt
ball
socks
glasses
jadi@funlife:~/w/lpic/101$ sort what_i_have.txt | uniq
ball
glasses
laptop
socks
tshirt
jadi@funlife:~/w/lpic/101$
As you can see, the input HAVE TO BE sorted for uniq to work.
uniq has great switches:
jadi@funlife:~/w/lpic/101$ cat what_i_have.txt
laptop
socks
tshirt
ball
socks
glasses
jadi@funlife:~/w/lpic/101$ sort what_i_have.txt | uniq -c #show count of each item
1 ball
1 glasses
1 laptop
2 socks
1 tshirt
jadi@funlife:~/w/lpic/101$ sort what_i_have.txt | uniq -u #show only non-repeated items
ball
glasses
laptop
tshirt
jadi@funlife:~/w/lpic/101$ sort what_i_have.txt | uniq -d #show only repeated items
socks
paste
The paste command pastes lines from two or more files side-by-side! You cannot do this in a general text editor with ease!
jadi@funlife:~/w/lpic/101$ cat howcool
jadi 5
sina 6
rubic 2
you 12
jadi@funlife:~/w/lpic/101$ cat uses
you fedora
jadi ubuntu
rubic windows
neda mac
jadi@funlife:~/w/lpic/101$ paste howcool uses
jadi 5 you fedora
sina 6 jadi ubuntu
rubic 2 rubic windows
you 12 neda mac
tr
The tr
command translates characters in the stream. For example, tr 'ABC' '123'
will replace A with 1, B with 2, and C with 3 in the provided stream. It is a pure filter and does not accept the input file name. If needed you can pipe the cat with it (see chapter 103.4).
jadi@funlife:~/w/lpic/101$ cat mydata
hello
this is the second line
but as you can see we are
still writing
and this is getting longer
.
.
and longer
and longer!
jadi@funlife:~/w/lpic/101$ cat mydata | tr 'and' 'AND'
hello
this is the second liNe
but As you cAN see we Are
still writiNg
AND this is gettiNg loNger
.
.
AND loNger
AND loNger!
Note: all 'a's are replaced with 'A'.
sed
sed is stream editor. It is POWERFUL and can do things that are not far from magic! Just like most of the tools we've seen far now, sed can work as a filter or take input from a file. Sed is a great tool for replacing text with using regular expressions . If you need to replace A with B only once in each line in a stream just issue sed 's/A/B/'
:
jadi@funlife:~/w/lpic/101$ cat uses
you fedora
jadi ubuntu
rubic windows
neda mac
jadi@funlife:~/w/lpic/101$ sed 's/ubuntu/debian/' uses
you fedora
jadi debian
rubic windows
neda mac
jadi@funlife:~/w/lpic/101$
The pattern for changing EVERY occurrence of A to B in a line is sed 's/A/B/g'
.
Remember escape characters? They also work here and this will remove every new line from a file and will replace it with a space:
jadi@funlife:~/w/lpic/101$ cat mydata
hello
this is the second line
but as you can see we are
still writing
and this is getting longer
.
.
and longer
and longer!
jadi@funlife:~/w/lpic/101$ sed 's/ /\t/g' mydata > mydata.tab
jadi@funlife:~/w/lpic/101$ cat mydata.tab
hello
this is the second line
but as you can see we are
still writing
and this is getting longer
.
.
and longer
and longer!
Getting stats
wc
The wc
is word count. It counts the lines, words, and bytes in the input stream.
jadi@funlife:~/w/lpic/101$ wc mydata
9 25 121 mydata
It is very common to count the line numbers with
-l
switch.
-
You should know that if you put -
instead of a filename, the data will be replaced from the pipe (or keyboard stdin).
jadi@funlife:~/w/lpic/101$ wc -l mydata | cat mydata - mydata
hello
this is the second line
but as you can see we are
still writing
and this is getting longer
.
.
and longer
and longer!
9 mydata
hello
this is second line
but as you can see we are
still writing
and this is getting longer
.
.
and longer
and longer!
Hashing
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. There are different hashes and we use them for different purposes. For example, a site may hash your password in its database to keep it secure (and check the hash of provided password with a hash it already has in DB during logins) a site may provide the hash of a file so you can be sure that you've downloaded the correct file and ...
The hashing algorithms covered in LPIC1 are:
- md5sum
- sha256sum
- sha512sum
You can check any file (or input streams hash with something like this):
jadi@ocean:~$ md5sum /tmp/myfile.txt
8183aa57a23658efe7ba7aebe60816bc /tmp/myfile.txt
jadi@ocean:~$ sha256sum /tmp/myfile.txt
7ddcfda184b55ee06b0c81e0ad136b1aa4a86daeb1078bcaeccc246eb2c8693b /tmp/myfile.txt
jadi@ocean:~$ sha512sum /tmp/myfile.txt
79e5d789528e5e55fc1bddcb381afd56e896b1b452347a76777fb38d76c9754278700036f35df2a53c4d53d3e3623538a8b9ed155a3fd5275e667bdbf3c0b359 /tmp/myfile.txt
As you can see, sha512sum
creates a longer hash which is more secure.
← 103.1 Work on the command line | 103.3 Perform basic file management → |