What are field separators in operating-programming languages (such as Bash)?

In Bash, IFS is a built-in shell variable whose name stands for "Internal Field Separator". One commonly quoted description says that it "determines how Bash recognizes fields, or word boundaries, when it interprets character strings".

Its default value is whitespace (space, tab, and newline), but you can change it to whatever you need.
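
Because the default value consists of invisible characters, it can help to print it in a quoted form. A minimal sketch, assuming it runs in a fresh Bash session where IFS has not been changed yet (the variable name `ifs_repr` is only for illustration):

```shell
# %q asks Bash's printf to quote the value so that invisible characters become readable.
ifs_repr=$(printf '%q' "$IFS")
echo "IFS is: $ifs_repr"
# With the default IFS this prints: IFS is: $' \t\n'
```

The `$' \t\n'` notation is Bash's way of writing a string made of a space, a tab, and a newline.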

To give an example, using the default value, the following commands:

text="a:b c-d e/f"
for word in $text; do echo "Word: $word"; done

would output:

Word: a:b
Word: c-d
Word: e/f

Note that the spaces were used to split the string into fields/"words", so each iteration of the for loop gets one part of the split results.
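
Keep in mind that this splitting only happens when the variable is expanded without double quotes. A small sketch to illustrate the difference, where `count_args` is a hypothetical helper written just for this example:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

text="a:b c-d e/f"

unquoted=$(count_args $text)    # unquoted: the value is split on IFS (default whitespace)
quoted=$(count_args "$text")    # quoted: no splitting, the whole string is one argument

echo "unquoted: $unquoted"      # unquoted: 3
echo "quoted: $quoted"          # quoted: 1
```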

But if we change IFS:

IFS=':-/'
text="a:b c-d e/f"
for word in $text; do echo "Word: $word"; done

Now the output is:

Word: a
Word: b c
Word: d e
Word: f

By setting IFS=':-/', I'm saying that :, - and / are the characters that mark field/"word" boundaries, so the result is quite different. Note that the spaces are no longer separators, so b c and d e are each treated as a single field/"word".

If we change to IFS=':', only the : character will be considered, and the result would be only 2 fields: a and b c-d e/f.
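
Assigning IFS on its own line changes it for everything that runs afterwards in the same shell. If that is not wanted, one common alternative (a sketch, not the only option) is to put the assignment in front of a single command, so it applies only to that command. The array name `parts` is just an illustration:

```shell
text="a:b c-d e/f"

# The IFS=':' prefix applies only to this one read command.
# -r keeps backslashes literal, -a stores the resulting fields in the array "parts".
IFS=':' read -r -a parts <<< "$text"

echo "Number of fields: ${#parts[@]}"    # Number of fields: 2
for part in "${parts[@]}"; do
    echo "Field: [$part]"                # Field: [a], then Field: [b c-d e/f]
done
```

After the read command finishes, IFS still has the value it had before that line.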

IFS is used by other commands, such as read:

IFS=':'
echo "abc:def" | (read x y; echo "x=$x y=$y")
# output is "x=abc y=def"
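
When read is given fewer variable names than there are fields, the last variable receives the rest of the line, separators included. A short sketch with a made-up sample line:

```shell
line="alice:x:1000"    # made-up sample data, not from any real file

# Only two names for three fields: "rest" gets everything after the first separator.
IFS=':' read -r first rest <<< "$line"

echo "first=$first rest=$rest"    # first=alice rest=x:1000
```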

It also affects the output of the special parameter $* (which expands to all the positional parameters, such as the command-line arguments of a script) when it is expanded inside double quotes. Suppose I have this simple script:

#!/bin/bash
echo "Args: $*"

If I run this script: script.sh a b c, the output will be Args: a b c.
But if I change it to:

#!/bin/bash
IFS=':'
echo "Args: $*"

The first character of IFS will be used in the output, and displayed between the fields, so the output will be: Args: a:b:c.
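
This behavior is sometimes used to join values with a delimiter. A minimal sketch, where `join_by` is a hypothetical function name and the delimiter is assumed to be a single character:

```shell
# join_by is a hypothetical helper: join_by DELIM ARG...
join_by() {
    local IFS="$1"        # local, so the caller's IFS is left untouched
    shift                 # drop the delimiter, leaving only the values
    printf '%s\n' "$*"    # "$*" joins the values with the first character of IFS
}

joined=$(join_by ',' a b c)
echo "$joined"            # a,b,c
```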

One detail regarding whitespace versus non-whitespace characters: if IFS contains whitespace characters, a sequence of one or more of them is treated as a single separator, and leading or trailing whitespace is dropped. A non-whitespace character in IFS behaves differently: each occurrence is a separator on its own, so two in a row produce an empty field. Example:

# text with 4 spaces before "c", and a trailing space at the end
text='a::b    c '
# IFS is just a space
IFS=' '
for word in $text; do echo "Word: [$word]"; done

In this case, IFS is just a space, but a sequence of one or more spaces is considered to be a single separator, so the output is:

Word: [a::b]
Word: [c]

If we set IFS=':', the field separator is now a non-whitespace character, so each : counts as a separate separator (and the spaces are no longer separators at all). The output changes to:

Word: [a]
Word: []
Word: [b    c ]

The second field is an empty string, because each : is another separator, and :: is considered "an empty string between two :".
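
The same difference can be checked by counting fields. A small sketch using read with arrays (the array names are only for illustration):

```shell
# Two consecutive ':' characters delimit an empty field between them.
IFS=':' read -r -a colon_fields <<< "a::b"

# Four consecutive spaces collapse into a single separator.
IFS=' ' read -r -a space_fields <<< "a    b"

echo "colon fields: ${#colon_fields[@]}"    # colon fields: 3
echo "space fields: ${#space_fields[@]}"    # space fields: 2
```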

But anyway, the field separator can be whatever you need, not limited to the ones defined in the question.

posted about 4 years ago

CC BY-SA 4.0

hkotsubo‭

1 comment thread

and to learn about Field Separators in awk see https://www.gnu.org/software/gawk/manual/gawk.html#Fie... (1 comment)
