sort(1) — Commands

OSF

NAME

sort - Sorts or merges files

SYNOPSIS

sort [-Abcdfimnru] [-o output_file] [-tcharacter] [-T directory]
[-y[kilobytes]] [-zrecord_size] [+fskip][.cskip] [-fskip][.cskip]
[-bdfinr] ... file

The sort command sorts lines in its input files and writes the result to standard output.

FLAGS

-A
Sorts on a byte-by-byte basis using each character’s encoded value. On some systems, extended characters are treated as negative values and therefore sort before ASCII characters.

-b
Ignores leading spaces and tabs in sort key comparisons (that is, when +fskip.cskip and -fskip.cskip are used).

-c
Checks that the input is sorted according to the ordering rules specified in the flags. Displays nothing unless the file is not sorted.

-d
Sorts according to the collating sequence. Only letters, digits, and spaces are considered in comparisons.

-f
Merges uppercase and lowercase letters. Case is not considered in the sorting.

-i
Sorts only by characters in the ASCII range octal 040-0176 (all printable characters and the space character) in nonnumeric comparisons.

-m
Merges only (assumes sorted input).

-n
Sorts any initial numeric strings (consisting of optional spaces, optional dashes, and 0 (zero) or more digits with an optional decimal point) by arithmetic value. The -n flag automatically specifies the -b flag when you use sort keys.

-o output_file
Directs output to output_file instead of standard output. output_file can be the same as one of the input files.

-r
Reverses the order of the specified sort.

-tcharacter
Sets field separator character to character. To specify the tab character as the field separator, you must enclose it in ’ ’ (single quotes).

-T directory
Places all the temporary files that are created in directory.

-u
Suppresses all but one in each set of equal lines. Ignored characters (such as leading tabs and spaces) and characters outside of sort keys are not considered in this type of comparison.

-ykilobytes
Starts the sort command using kilobytes of main storage and adds storage as needed. (If kilobytes is less than the minimum storage size or greater than the maximum, the minimum or maximum is used instead.) If the -y flag is omitted, the sort command starts with the default storage size; -y0 starts with minimum storage, and -y (with no value) starts with the maximum storage. The amount of storage used by the sort command has a significant impact on performance. Sorting a small file in a large amount of storage is wasteful.

-zrecord_size
Prevents abnormal termination if lines being sorted are longer than the default buffer size can handle. When the -c or -m flags are specified, the sorting phase is omitted and a system default size buffer is used. If sorted lines are longer than this size, sort terminates abnormally. The -z option specifies that the longest line be recorded in the sort phase so that adequate buffers can be allocated in the merge phase. record_size must be a value in bytes equal to or greater than the number of bytes in the longest line to be merged.
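The ordering flags above can be compared side by side with a short sketch. The demo function and its sample inputs are hypothetical. Setting LC_ALL=C is an assumption that forces byte-order collation so the results do not depend on the local collating sequence.

```shell
# Sort the given items (one per line) with the given flags.
# LC_ALL=C is an assumption: it forces byte-order collation.
# Usage: demo FLAGS ITEM...
demo() {
    flags=$1
    shift
    printf '%s\n' "$@" | LC_ALL=C sort $flags
}

default_order=$(demo '' b A C)          # uppercase before lowercase
fold_case=$(demo -f b A C)              # -f: case is not considered
dictionary=$(demo -d a-c ab)            # -d: the dash is not considered
numeric=$(demo -n 10 9 -3 2.5)          # -n: compared by arithmetic value
reverse=$(demo -r apple banana cherry)  # -r: order is reversed
unique=$(demo -u apple banana apple)    # -u: one line per set of equal lines

printf 'default:    %s\n' "$(echo $default_order)"
printf 'fold case:  %s\n' "$(echo $fold_case)"
printf 'dictionary: %s\n' "$(echo $dictionary)"
printf 'numeric:    %s\n' "$(echo $numeric)"
printf 'reverse:    %s\n' "$(echo $reverse)"
printf 'unique:     %s\n' "$(echo $unique)"
```

With byte-order collation, the default order is A C b, while -f gives A b C. The -d flag places ab before a-c because the dash takes no part in the comparison.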

DESCRIPTION

The sort command treats all of its input files as one file when it performs the sort. A - (dash) in place of a filename specifies standard input. If you do not specify a filename, it sorts standard input.
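As a minimal sketch of the dash convention, the following combines a named file with standard input in a single sort. The file name saved and its contents are hypothetical, and LC_ALL=C is an assumption used to make the ordering predictable.

```shell
# Create a scratch file so that no existing data is touched.
tmpdir=$(mktemp -d)
printf 'banana\n' > "$tmpdir/saved"

# The dash tells sort to read standard input in addition to the named file;
# both inputs are treated as one file.
combined=$(printf 'cherry\napple\n' | LC_ALL=C sort "$tmpdir/saved" -)
printf '%s\n' "$combined"

rm -r "$tmpdir"
```

The lines from both sources come out interleaved in a single sorted list: apple, banana, cherry.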

The sort command can handle a variety of collation rules typically used in Western European languages, including primary/secondary sorting, one-to-two character mapping, N-to-one character mapping, and ignore-character mapping. To summarize briefly:

Primary/Secondary Sorting
In this system, a group of characters all sort to the same primary location. If there is a tie, a secondary sort is applied. For example, in French, the plain and accented a’s all sort to the same primary location. If two strings collate to the same primary location, the secondary sort goes into effect. These words are in correct French order:

abord
âpre
après
âpreté
azur

One-to-Two Character Mappings
This system requires that certain single characters be treated as if they were two characters. For example, in German, the ß (scharfes-S) is collated as if it were ss.

N-to-One Character Mappings
Some languages treat a string of characters as if it were one single collating element. For example, in Spanish, the ch and ll sequences are treated as their own elements within the alphabet. (ch comes between c and d in the alphabet, and ll comes between l and m.)

Ignore-Character Mappings
In some cases, certain characters may be ignored in collation. For example, if - were defined as an ignore-character, the strings re-locate and relocate would sort to the same place.

The results that you get from sort depend on the collating sequence as defined by the current setting of the LC_COLLATE environment variable. The configuration files for collation and character classification information are /usr/lib/nls/loc/locale.

The sort key is the part of each line used for sorting. The default sort key is the entire line.

Two numbers, fskip and cskip, specify each end of the sort key, as follows:

The fskip variable specifies the number of fields to skip from the beginning of the input line, and the cskip variable specifies the number of additional characters to skip to the right beyond that point. For both the starting point (+fskip.cskip) and the ending point (-fskip.cskip) of a sort key, fskip is measured from the beginning of the input line, and cskip is measured from the last field skipped. If you omit .cskip, .0 is assumed. If you omit fskip, 0 (zero) is assumed. If you omit the ending field specifier (-fskip.cskip), the end of the line is the end of the sort key.

You can supply more than one sort key by repeating +fskip.cskip and -fskip.cskip. In cases where you specify more than one sort key, keys specified further to the right on the command line are compared only after all earlier keys are sorted. For example, if the first key is to be sorted in numerical order and the second according to the collating sequence, all strings that start with the number 1 are sorted according to the collating order before the strings that start with the number 2. Lines that are identical in all keys are sorted with all characters significant. You can also specify different flags for different sort keys in multiple sort keys.

A field is one or more characters bounded by the beginning of a line and the current field separator, or one or more characters bounded by the field separator on either side. The space character is the default field separator.

Lines longer than 1024 bytes are truncated by sort. The maximum number of fields on a line is 10.

Japanese Language Support

When using the -i flag, the sort command sorts only by printable characters in nonnumeric comparisons.

EXAMPLES

1. To perform a simple sort, enter:

sort fruits

This displays the contents of fruits sorted in ascending lexicographic order. This means that the characters in each column are compared one by one, including spaces, digits, and special characters. For instance, if fruits contains the text:

banana
orange
Persimmon
apple
%%banana
apple
ORANGE

then sort fruits displays:

%%banana
ORANGE
Persimmon
apple
apple
banana
orange

This order follows from the fact that in the ASCII collating sequence, symbols (such as %) precede uppercase letters, and all uppercase letters precede the lowercase letters. If you are using a different collating order, your results may be different.
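To reproduce this ordering on a system whose default locale is not ASCII-based, the collating sequence can be forced for a single command. The sketch below assumes the LC_ALL environment variable is honored and that the C locale collates by byte value; the scratch directory is hypothetical.

```shell
# Build the fruits file from Example 1 in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/fruits" <<'EOF'
banana
orange
Persimmon
apple
%%banana
apple
ORANGE
EOF

# Assumption: LC_ALL=C selects byte-order (ASCII) collation for this command only.
sorted=$(LC_ALL=C sort "$tmpdir/fruits")
printf '%s\n' "$sorted"

rm -r "$tmpdir"
```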

2. To group lines that contain uppercase and special characters with similar lowercase lines, and remove duplicate lines, enter:

sort -d -f -u fruits

The -u flag tells sort to remove duplicate lines, making each line of the file unique. This displays:

apple
%%banana
orange
Persimmon

Note that not only was the duplicate apple removed, but banana and ORANGE were removed as well. The -d flag told sort to ignore symbols, so %%banana and banana were considered to be duplicate lines and banana was removed. The -f flag told sort not to differentiate between uppercase and lowercase, so ORANGE and orange were considered to be duplicate lines and ORANGE was removed. When the -u flag is used with input that contains nonidentical lines that are considered by sort (due to other flags) to be duplicates, there is no way to predict which lines sort will keep and which it will remove.

3. To sort as in Example 2, but remove duplicates unless capitalized or punctuated differently, enter:

sort -u +0 -d -f +0 fruits

Flags appearing between sort key specifiers apply only to the specifier preceding them. There are two sorts specified in this command line. +0 -d -f specifies the first sort, of the same type done with -d -f in Example 2. Then +0 performs another comparison to distinguish lines that are not actually identical. This prevents -u, which applies to both sorts because it precedes the first sort key specifier, from removing lines that are not exactly identical to other lines. Given the fruits file shown in Example 1, the added +0 distinguishes %%banana from banana and ORANGE from orange. However, the two instances of apple are exactly identical, so one of them is deleted.

apple
%%banana
banana
ORANGE
orange
Persimmon

4. To specify a new field separator, enter:

sort -t: +1 vegetables

This sorts vegetables, comparing the text that follows the first colon on each line. -t: tells sort that colons separate fields. +1 tells sort to ignore the first field and to compare from the start of the second field to the end of the line. If vegetables contains:

yams:104
turnips:8
potatoes:15
carrots:104
green beans:32
radishes:5
lettuce:15

then sort -t: +1 vegetables displays:

carrots:104
yams:104
lettuce:15
potatoes:15
green beans:32
radishes:5
turnips:8

Note that the numbers are not in ascending order. This is because a lexicographic sort compares each character from left to right. In other words, 3 comes before 5 so 32 comes before 5.

5. To sort on more than one field, enter:

sort -t: +1 -2 -n +0 -1 -r vegetables

This performs a numeric sort on the second field (+1 -2 -n) and then, within that ordering, sorts the first field in reverse collating order (+0 -1 -r). The output looks like this:

radishes:5
turnips:8
potatoes:15
lettuce:15
green beans:32
yams:104
carrots:104

The lines are sorted in numeric order; when two lines have the same number, they appear in reverse collating order.
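Many current sort implementations no longer accept the +fskip and -fskip notation. As a sketch only, and assuming the POSIX -k option (which this page does not document) is available, +1 -2 -n corresponds to -k 2,2n and +0 -1 -r corresponds to -k 1,1r. LC_ALL=C is also an assumption, used to fix the collating order.

```shell
# Build the vegetables file from Example 4 in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/vegetables" <<'EOF'
yams:104
turnips:8
potatoes:15
carrots:104
green beans:32
radishes:5
lettuce:15
EOF

# Assumption: -k FIELD,FIELD names the key by field number (counting from 1),
# so -k 2,2n is a numeric key on field 2 and -k 1,1r is a reversed key on field 1.
by_count=$(LC_ALL=C sort -t: -k 2,2n -k 1,1r "$tmpdir/vegetables")
printf '%s\n' "$by_count"

rm -r "$tmpdir"
```

The result matches the output shown for this example: lines are in numeric order on the second field, and lines with equal numbers appear in reverse order of the first field.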

6. To replace the original file with the sorted text, enter:

sort -o vegetables vegetables

The -o vegetables flag stores the sorted output in the file vegetables.
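A minimal sketch of sorting a file in place follows. The file contents are hypothetical, and LC_ALL=C is an assumption used to make the ordering predictable.

```shell
# Create an unsorted scratch copy of a vegetables file.
tmpdir=$(mktemp -d)
printf 'yams\ncarrots\nturnips\n' > "$tmpdir/vegetables"

# -o may name one of the input files, so the file is replaced by its sorted form.
LC_ALL=C sort -o "$tmpdir/vegetables" "$tmpdir/vegetables"

in_place=$(cat "$tmpdir/vegetables")
printf '%s\n' "$in_place"

rm -r "$tmpdir"
```

Shell redirection is not a substitute here: with sort vegetables > vegetables, the shell truncates the file before sort reads it.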

7. To collate using Spanish rules, set the LC_COLLATE (or LANG) environment variable to a Spanish locale and then use sort in the regular way. For example, enter:

sort sp.words

The input file named sp.words contains the following Spanish words:

dama
loro
chapa
canto
mover
chocolate
curioso
llanura

The sorted file looks like this:

canto
curioso
chapa
chocolate
dama
loro
llanura
mover

If you sort the file using ASCII collation rules, the output looks like this:

canto
chapa
chocolate
curioso
dama
llanura
loro
mover

FILES

/usr/lib/nls/loc/locale
Configuration files.

RELATED INFORMATION

Commands: comm(1), ctab(1), join(1), uniq(1).

Files: ctab(4).

OSF/1 User’s Guide.
