Command Line Tutorial

Let’s go back to the files that include sequences from Brugia. If you don’t remember which these were, use grep and the wildcard * to get that information.

We have them now in three separate files, but maybe we want to look at them all together, or have a single file to give to a program like SeaView or BLAST. The fasta format can hold multiple sequences, as long as each one starts with a header line that has a > plus the sequence identifier, followed by a line break and then the sequence data. For example:

>seq_1
CAGCTAGTCGATGCTAGCTAGCTAGTCGAGC
>seq_2
GCTAGCTAGCTAGCTAGCTAGCTGATCGATG

Let’s make a single fasta file that contains the three Brugia sequences. The cat command will concatenate multiple files into one. Its syntax is

cat (list of files separated by a space) > (name of output file)

The parentheses are for clarity – don’t type them!

So try…

cat KP760414.fasta KP760415.fasta KP760416.fasta > brugia.fasta

This should make a new file called brugia.fasta with the three sequences in it.
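If you want to check that a concatenation kept every sequence, one quick way is to count the header lines, since each sequence has exactly one line starting with >. The sketch below is only an illustration: it builds three tiny stand-in files in a throwaway directory, and the file names and sequences in it are made up for the demo, not the real Brugia files.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)

# Three stand-in fasta files (hypothetical names, made-up sequences).
printf '>seq_a\nCAGCTAGTCG\n' > "$demo_dir/a.fasta"
printf '>seq_b\nGCTAGCTAGC\n' > "$demo_dir/b.fasta"
printf '>seq_c\nTTAGCTAGGA\n' > "$demo_dir/c.fasta"

# Concatenate them into one file, just like the cat command above.
cat "$demo_dir/a.fasta" "$demo_dir/b.fasta" "$demo_dir/c.fasta" > "$demo_dir/combined.fasta"

# Count the header lines: grep -c counts lines that match, and ^> means
# "a line that begins with >".
seq_count=$(grep -c '^>' "$demo_dir/combined.fasta")
echo "combined.fasta holds $seq_count sequences"
```

Running the same grep -c '^>' on brugia.fasta should report one header per file you concatenated.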

Use the more command we learned to look at all the sequences in the new file!

One last important Unix command is rm, for remove. Now that you have concatenated the three Brugia sequences, you don’t need the individual fasta files anymore, and you want to avoid cluttering up your computer. If you do bioinformatics, sooner or later you will run out of space on your computer, even if you have one with tons of storage. The rm command is here to help! Remember, the files we concatenated are KP760414.fasta, KP760415.fasta, and KP760416.fasta. Type this into your terminal:

ls KP760414.fasta

If you are pointing to the correct folder, the computer should print the file name back. That means it exists. Now type

rm KP760414.fasta

and try the ls KP760414.fasta command again. Here’s a neat trick – just hit the up arrow and you can scroll through all your previous commands. When you find the one you want, hit return and it will run again! If you run ls KP760414.fasta again you should see:

ls: KP760414.fasta: No such file or directory

This response means that the file KP760414.fasta no longer exists. The rm command removed it.

Go ahead and remove the other two files that have Brugia sequences.

The rm command can be a little dangerous, because it deletes the file from your disk right away instead of moving it to the trash, so the file can’t be recovered. It is especially dangerous if you use it with the wildcard *. Look at all the files in your directory with ls. Then try:

rm KP*

Look at all the files in your directory with ls again – with one command you removed all the files that began with KP. If you wanted to keep some of them, you would be out of luck!
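One habit that can take some of the risk out of wildcards is to preview the pattern with ls before handing the same pattern to rm. The sketch below shows the idea in a throwaway directory; the file names are invented for the demo.

```shell
# Set up a scratch directory with two KP files and one unrelated file
# (all names are made up for this demo).
demo_dir=$(mktemp -d)
touch "$demo_dir/KP000001.fasta" "$demo_dir/KP000002.fasta" "$demo_dir/notes.txt"

# Step 1: preview exactly which files the pattern matches.
ls "$demo_dir"/KP*

# Step 2: only once that list looks right, reuse the same pattern with rm.
rm "$demo_dir"/KP*

# notes.txt does not match KP*, so it survives.
remaining=$(ls "$demo_dir")
echo "left over: $remaining"
```

Another option is rm -i KP*, which asks for confirmation before removing each file.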

Now move into the parent directory of the Unix folder. Do you remember how?

Once you are there, you are in the Bioinformatics folder. Type ls and see the Unix folder. Type ls * to see all the files inside the Unix folder. Try removing the Unix folder with:

rm Unix

You should get a message:

rm: Unix: is a directory

and if you type ls you see that Unix is still there. rm specifically removes files, but leaves directories alone. This is a good safety measure, since you don’t want to accidentally remove a folder full of important files! If you *really* want to remove a folder, there is an option for rm. Adding the -r option (for recursive) will recursively remove all the files inside Unix, and then the Unix folder itself. As you might imagine, this is dangerous and should only be used if you are quite sure you want to get rid of everything! Try it now.

rm -r Unix

Run this command and then look in your Bioinformatics folder. Unix and all the files in it are now gone. Now cd up one level and remove the Bioinformatics folder, too.
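Because rm -r cannot be undone, it can help to list what a folder contains before deleting it. Here is a minimal sketch of that routine, run on a scratch folder called Practice (a hypothetical name used only for this demo) so nothing important is at risk.

```shell
# Build a scratch folder with two empty files inside it.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/Practice"
touch "$demo_dir/Practice/one.txt" "$demo_dir/Practice/two.txt"

# Plain rm refuses to delete a directory: it prints an error (hidden here
# with 2>/dev/null) and leaves the folder in place.
if ! rm "$demo_dir/Practice" 2>/dev/null; then
    echo "plain rm left the folder alone"
fi

# Look at everything inside the folder before deleting it.
ls -R "$demo_dir/Practice"

# Remove the folder and all of its contents.
rm -r "$demo_dir/Practice"

# Confirm that the folder no longer exists.
if [ ! -e "$demo_dir/Practice" ]; then
    echo "Practice is gone"
fi
```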

Here is a list of all the commands you have learned:

  • cd
  • mkdir
  • mv
  • cp
  • ls
  • man
  • unzip
  • more
  • tail
  • grep
  • cat
  • rm
  • and the shortcuts and wildcards / ~ . .. *

Now go find your own files to play with and have fun!