sed and the \t character

Dear Grymoire,
I am trying to convert a pipe character into a tab. I tried
sed y/\|/\t/ <in >out
but it did not work. Instead of a tab, I got a ‘t’. What did I do wrong?

I thought this was a good lesson, as there are several misunderstandings a beginner might have when they write a shell script.

The first lesson concerns meta-characters and quoting, which I covered in my tutorial.

A meta-character is simply a character that has a special meaning.

The character ‘t’ is a normal character. The character ‘\’ is a special character. It has a special meaning. However, the meaning of this character changes depending on how and where it is used. You have to know the context. ‘|’ is another example. It may or may not be a meta-character.

So what was wrong with ‘\|’ and ‘\t’?

To break this problem down into steps, we need some basic concepts.

In Unix shell scripts, the shell is the interface between you and the computer. When you write a shell script, and that script calls a program like sed, there are two steps:

  1. The shell reads the shell script.
  2. The shell passes the characters it sees to the program.

Let’s take the above script as an example. How does the shell see the script? The easiest way to find out is to insert echo as the first command on the line.

We also have to remove the <in >out so we can see what happens. So the new command is simply:

echo sed y/\|/\t/

And this command echoes the following to the screen:

sed y/|/t/

So the pipe character is seen by the shell and echoed, but both backslashes are gone, and sed is handed a plain ‘t’ where the tab should be. What happened?
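echo works for this experiment, but some versions of echo interpret backslash sequences themselves, which can muddy the picture. A minimal sketch of a more neutral probe, assuming a POSIX sh: printf '%s\n' prints each argument exactly as the shell delivered it, one per line. The function name show_args is hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Print each argument exactly as the shell delivered it, one per line.
# Unlike some echo implementations, printf '%s\n' never reinterprets \t.
show_args() {
    printf '%s\n' "$@"
}

# The unquoted command from the question: the shell removes both backslashes,
# so sed would receive the two arguments "sed" and "y/|/t/".
show_args sed y/\|/\t/
```

Running it prints sed on one line and y/|/t/ on the next, confirming that the backslashes never reach sed.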

Normally, the shell sees the ‘|’ character as a special character. It’s a meta-character because of this special meaning: it tells the shell to start a new process, take the output of the previous command, and send it to the new process.

Remember that the shell has three kinds of quoting: the single quote, the double quote, and the backslash.

So when you quote a special character, it becomes a special kind of special character.

In other words, it’s normal. Crazy rules, but that’s how it works. Quoting special characters in the shell makes them normal and ordinary, just like the letters a-z.
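As a quick illustration, here is a sketch assuming a POSIX sh. All three quoting styles turn the pipe into an ordinary character, so each command prints the same three characters a|b. Without any quoting, `printf '%s\n' a|b` would instead pipe printf's output into a command named b.

```shell
#!/bin/sh
# Three ways to quote the pipe; each one makes '|' an ordinary character.
printf '%s\n' 'a|b'    # single quotes
printf '%s\n' "a|b"    # double quotes
printf '%s\n' a\|b     # backslash
```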

So the ‘\|’ is correct. So is using single or double quotes. So why didn’t ‘\t’ work? Well, on some systems, sed will output either a ‘t’ or a ‘\t’ instead of a tab. What is happening?

The second lesson is this:

To understand what is a meta-character, you have to be familiar with the program that is looking at the character.

The shell sees ‘\t’ as a backslash followed by ‘t’. The backslash quotes the ‘t’, and since a quoted ‘t’ is still just an ordinary ‘t’, the shell passes a plain ‘t’ to sed.

What does sed do with ‘\t’? Well, it depends.

Some versions of sed treat ‘\t’ to be the same as ‘t’. Others treat ‘\t’ as a tab.
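If you are not sure which kind of sed you have, you can ask it. The sketch below, assuming a POSIX sh, single-quotes the script so the backslash reaches sed, then compares the result against a real tab. The function name sed_understands_tab and the test string 'a|b' are hypothetical. A sed that does not understand ‘\t’ may print something else or report an error; the error message is discarded here.

```shell
#!/bin/sh
# Succeeds if this system's sed turns \t inside y/// into a real tab.
sed_understands_tab() {
    tab=$(printf '\t')
    # Single quotes keep the shell from removing the backslash.
    result=$(printf 'a|b\n' | sed 'y/|/\t/' 2>/dev/null)
    [ "$result" = "a${tab}b" ]
}

if sed_understands_tab; then
    printf 'this sed treats \\t as a tab\n'
else
    printf 'this sed does not treat \\t as a tab\n'
fi
```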

The person who sent me this problem was trying to get a tab character into this part of the script. Well, there are two approaches:

  1. Let sed convert ‘\t’ into a tab
  2. Let the shell convert ‘\t’ into a tab, and pass this into sed.

This causes a problem if neither the shell nor sed interprets ‘\t’ as a tab character. There are some programs that do treat ‘\t’ as a tab character. These include

  • The C programming language in a string
  • The printf(1) command, which is based on the C language
  • perl
  • awk
  • The tr(1) command
  • Some versions of the echo(1) command.
  • … and several other computer languages.
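Here is a sketch, assuming a POSIX sh with printf, tr and awk available, showing three of the programs from the list above each turning the two characters ‘\’ and ‘t’ into a real tab. The single quotes keep the shell from stripping the backslash before the program sees it. The last line shows that tr alone can do the original pipe-to-tab conversion without sed.

```shell
#!/bin/sh
# Each of these programs understands \t and prints "a<TAB>b".
printf 'a\tb\n'                    # printf(1), C-style escapes
printf 'a b\n' | tr ' ' '\t'       # tr(1)
awk 'BEGIN { print "a\tb" }'       # awk string escapes

# tr can also convert every pipe into a tab directly:
printf 'one|two|three\n' | tr '|' '\t'
```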

But the shell does not. And sed may (or may not). So what’s the answer?

It’s quite simple.

If you want sed to insert a tab, you may have to include the tab character in the script.

In other words, the answer is to use the following:

sed 'y/|/       /' <in >out

Here the character between the ‘/…/’ delimiters is a tab character. Unfortunately, I cannot show a tab character there, because it looks like zero or more spaces, depending upon the tab stops in your terminal. Besides, you are reading this in a browser, which cannot show tabs faithfully either.

So how do you get a tab character in the script? If you are using an editor, you can edit a file that contains the script and insert a tab character.

This seemingly simple task can be tricky. Some editors convert tabs to spaces. This can depend upon the settings in your Unix terminal.

To make sure you have a tab character in the script, use ‘od -c’ to display the file. For example, if you had the script

#!/bin/sh
sed 'y/|/       /'

and typed in od -c script, you would see

0000000   #   !   /   b   i   n   /   s   h  \n   s   e   d       '   y
0000020   /   |   /  \t   /   '  \n
0000027

If you see spaces instead of \t at that position, your editor or terminal settings may be converting tabs to spaces.

A second way to get it to work is to enter the script from the command line. The Control-V character tells the terminal handler to “quote” the next character, so if the terminal is converting tabs to spaces, typing Control-V and then Tab may let you enter a real tab character.
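Reading od output by eye works, but a count is quicker when checking several files. A minimal sketch, assuming a POSIX sh; count_tabs is a hypothetical helper name. For the two-line script above it should print 1 if the tab survived and 0 if your editor replaced it with spaces.

```shell
#!/bin/sh
# Print how many real tab characters a file contains.
count_tabs() {
    # tr -cd '\t' deletes every character except tabs, wc -c counts
    # what is left, and the final tr strips padding some wc versions add.
    tr -cd '\t' < "$1" | wc -c | tr -d ' '
}
```

Usage: count_tabs script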

The third way is to use another program to create the tab character. tr(1) understands ‘\t’, so let it create the tab for you. One way is to set a variable to a tab character, and then use this variable inside double quotes:

#!/bin/sh
tab=`echo a | tr a '\t'`
sed "y/|/$tab/" <in >out

This format is useful if you want to publish a script on the web. That’s because the script does not contain any non-printing characters. This makes it easy to cut and paste a script. Copying a tab character and pasting it into another window may not work.
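printf(1), also on the list of programs that understand ‘\t’, can build the tab just as well. Here is a sketch of that variant, assuming a POSIX sh; pipe_to_tab is a hypothetical name. It keeps the same property of containing only printable characters, and uses $(...), the POSIX form of command substitution, in place of backquotes.

```shell
#!/bin/sh
# printf(1) creates the tab; command substitution keeps it, since only
# trailing newlines are stripped.
tab=$(printf '\t')

pipe_to_tab() {
    # Double quotes let the shell substitute $tab before sed sees the script.
    sed "y/|/$tab/"
}
```

Usage: pipe_to_tab <in >out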

There is one more point:

Don’t change characters when you don’t need to

In other words, instead of changing the pipe character to a tab, just use the pipe character as the field separator. Awk allows this.

Just use

awk -F'|' '{print $3}'

or whatever.
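A sketch of both cases, assuming a POSIX awk. A single-character field separator such as ‘|’ is taken literally. If tab-separated output really is needed later, setting OFS and assigning a field to itself makes awk rebuild the record with the new separator; the -v assignment processes the ‘\t’ escape.

```shell
#!/bin/sh
# Read '|'-separated fields directly; no conversion needed.
printf 'one|two|three\n' | awk -F'|' '{ print $3 }'

# If tab-separated output is required, rebuild the record with OFS.
printf 'one|two|three\n' | awk -F'|' -v OFS='\t' '{ $1 = $1; print }'
```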

This entry was posted in Shell Scripting.

9 Responses to sed and the \t character

  1. motey says:

    The articles are great and very informative.
    I learn a lot.
    Thank you.

    In other words, instead of changeing the pipe character ti a tab
    sed -e 's/changeing/changing/' -e 's/ ti / to /'
    Just being a spelling checker.

    [Fixed! Thanks ]

  2. Ed says:

    Many thanks! i got here from the sed tutorial… great job! well done!!

  3. Michael says:

    Error in last sed listing:

    Not:

    sed "y/|/$tab/' out

    Instead:

    sed "y/|/$tab/" out

    [Corrected. Thanks !]

  4. Michael says:

    HTML ate the -gt and -lt signs. But the point was that the last quote in the listing should be double-quotes.

  5. @Grymoire:
    Hi! Good post. Just a minor point: assuming your implementation of sed does convert ‘\t’ to a tab, as GNU sed does, then you can just use single quotes to prevent your shell from seeing anything:

    sed 'y/|/\t/' <in >out

    This goes back to one of the first statements you made in the very beginning of your tutorial:

    However, I recommend you do use quotes. If you have meta-characters in the command, quotes are necessary. And if you aren’t sure, it’s a good habit

    Indeed, this is a good habit. 🙂

  6. Jim says:

    Great writeups on sed & regexs! It’s too bad tr(1) can only be used as a filter, unlike its bigger bros. sed & awk. This limits its command forms, e.g., you can’t say:

    $ tr '\t' '|' a-very-tabby-file | less

    I wanted to point out what I believe is a small error in your regex page (you never know w/ these things!) Shouldn’t this construction be labeled “Basic” instead of “Extended”?

    {M,N} Extended Modifier M to N Duplicates

    BTW, you’re featured twice on my scripts link page:
    http://www.fastnlight.com/links/script.html

    • Grymoire says:

      Jim,
      tr can always accept input using the STDIN redirect character:
      $ tr '\t' '|' <a-very-tabby-file | less
      Also {M,N} was not included in the original Regex expressions. Sed didn't have this feature years ago. Things may have changed. I'll look into it. Thanks.

  7. Lorens Kockum says:

    Hi,

    Let me first thank you very very much for the very best sed tutorial
    I’ve found.

    I printed the Jul 15 2009 version, and I have kept it at hand
    since then, annotating it when I found anything at all. I
    intended to send you the corrections, but now when I sit down
    to do so, I find that in the current version most of the
    corrections have already been applied 🙂 There are some minor
    typos left in the text, probably nothing I’d have bothered to
    nitpick if there hadn’t been that echo and < at the same time,
    but here they are. They are in the order of the text.

    Thank you again.

    s/CRTL-I/CTRL-I/

    s/The next line line is "CD/The next line is "CD/

    In the paragraph following sed_delete_between_two_words.sh,
    there is an unbalanced opening (

    s/it has it's own programming language/it has its own programming language/

    s/^please forgive errors/Please forgive errors/
    # Well, yes 🙂

    s/my tutorial on Other of/my tutorial on sed. Other of/

    s/occured/occurred/
    # http://www.thefreedictionary.com/occurred

    Oh and BTW the Postscript link on http://www.grymoire.com is broken, http://www.grymoire.com/Postscript doesn't work without adding /index.html

    [Thanks – It’s fixed now! Bruce]

  8. Alvin says:

    This is one of the best Unix tutorials covering some tricky stuff in sed, I would like to thank you a million, is there a source for this good stuff where can I get more thanks in advance.
