Introduction
In the genomics era, bioinformatics is essential for transforming huge omics datasets of raw sequence data into meaningful biological information for all branches of the life sciences, including aquaculture. Most bioinformatics tasks are carried out on the Linux operating system (OS). Linux is a stable, multi-user, and multi-tasking system for servers, desktops, and laptops. It is particularly suited to working with large text files. Many Linux commands can be combined in various ways to amplify the power of the command line. Moreover, Linux provides the greatest level of flexibility for the development of bioinformatics applications. The majority of bioinformatics programs and packages are developed on the Linux OS. Although most programs can be compiled to run on Microsoft Windows systems, it is generally more convenient to install and use them on Linux systems. Therefore, familiarity with and understanding of basic Linux commands is essential for bioinformatic analysis. In this chapter, we provide an introduction to the Linux OS and its basic command line tools.
An operating system (OS) is a suite of programs that makes the computer work. It manages computer hardware and software resources and provides common services for computer programs. Examples of popular modern OSs include Microsoft Windows, Linux, macOS, iOS, BSD, Android, BlackBerry OS, and Chrome OS. Most of these examples are UNIX-based or UNIX-like; Microsoft Windows is the notable exception.
The UNIX OS was developed in the late 1960s and first released in 1971 by AT&T Bell Labs. It has been under continuous development ever since. UNIX was proprietary, however, which hindered its wide academic use. Researchers at the University of California, Berkeley developed an alternative to AT&T Bell Labs' UNIX OS, called the Berkeley Software Distribution (BSD). BSD is an influential operating system, from which several notable OSs such as Sun's SunOS and Apple's macOS are derived. In 1991, Linus Torvalds began developing a non-commercial replacement for UNIX, which eventually became the Linux OS. Linux was released as free open source software: its underlying source code is publicly available and can be freely distributed and modified. Linux is now used in numerous areas, from embedded systems to supercomputers. It is the most common OS powering web servers around the world. Many Linux distributions have been developed, such as Red Hat, Fedora, Debian, SUSE, and Ubuntu. Each distribution has the Linux kernel at its core, but builds on top of that with its own selection of other components, depending on the target users of the distribution. From the perspective of end users, there is no big difference between Linux and UNIX. Both use the same shells (e.g., bash, ksh, csh) and the same development tools, such as Perl, PHP, Python, and the GNU C/C++ compilers. However, because Linux is free and open source, it has a particularly active support community.
Linux is well known for its command line interface (CLI), although it also has a graphical user interface (GUI). Similar to Microsoft Windows, the GUI provides the user with an easy-to-use environment. On desktop systems, the most common way to interact with a Linux OS is via a GUI. In general, the GUI is powered by a derivative of the X Window System, commonly referred to as “X11.” A desktop environment runs on top of X11 and supplies the menus, icons, and windows used to interact with the system. KDE (the default desktop for openSUSE) and GNOME (the default desktop for Ubuntu) are two of the most popular desktop environments. On a modern Linux OS, although the GUI provides the graphical “user-friendliness,” the “unhandy” text-based CLI is where the true power resides. In the field of bioinformatics, almost all applications are executed from the CLI.
Linux is a stable, multi-user, and multi-tasking system for servers, desktops, and laptops. It is particularly suited to working with large text files because it has a large number of powerful commands that specialize in processing text files. Most of these commands can be further combined in various ways to amplify the power of the command line. In the genomics era, with sequencing data accumulating explosively, bioinformatics has become a scientific discipline of its own. Bioinformatics relies heavily on the Linux OS because it mostly works with text files containing nucleotide and amino acid sequences. Moreover, Linux provides the greatest level of flexibility for the development of bioinformatics applications. The majority of bioinformatics programs and packages are developed on Linux-based systems. Although most bioinformatics programs can be compiled to run on Microsoft Windows systems, it is more convenient to install and use them on Linux-based systems.
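As a minimal sketch of how text-processing commands can be combined, the following example counts the sequences and the total number of bases in a small FASTA file. The file contents and variable names (fasta, n_seqs, n_bases) are hypothetical and chosen only for illustration; the commands used (grep, tr, wc) are standard on Linux and other UNIX-like systems.

```shell
# Create a small example FASTA file (made-up sequences, for illustration only).
fasta=$(mktemp)
printf '>seq1\nACGTACGT\n>seq2\nGGCC\n>seq3\nATATAT\n' > "$fasta"

# Count sequences: every FASTA record starts with a header line beginning with ">".
n_seqs=$(grep -c '^>' "$fasta")

# Combine commands with pipes: drop the header lines, delete the newlines,
# then count the remaining characters to get the total number of bases.
# The final tr strips padding spaces that some wc implementations print.
n_bases=$(grep -v '^>' "$fasta" | tr -d '\n' | wc -c | tr -d ' ')

echo "sequences: $n_seqs"
echo "bases: $n_bases"

# Remove the temporary file.
rm -f "$fasta"
```

Each command does one small job, and the pipe symbol (|) passes the output of one command to the next. For this example file, the script reports 3 sequences and 18 bases.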
In this chapter, we introduce the Linux OS and its basic command lines. All commands introduced here for Linux are also valid for UNIX and other UNIX-like OSs. This chapter functions as a boot cam...