Code your own simple shell in C language step by step
What is a shell?
A shell is a program that takes the commands the user types on the keyboard and passes them to the operating system, which executes them through the kernel. It also checks whether the commands the user enters are valid, for example whether the requested program exists.
The prompt:
The first step is to create an infinite loop that is always ready to take a command and prints the shell's prompt symbol, such as "($) ". This is what runs the shell in interactive mode. In non-interactive mode, the commands come from a pipe or a file instead of the keyboard, for example when you type echo "/bin/ls" | ./hsh (where ./hsh is your shell executable).
Keep the stdio.h library in mind here. It contains the standard input and output functions, for example functions to read the commands the user types and to write the results to the screen. When a program starts, three standard streams are opened for it: stdin, stdout and stderr.
Each of these is a FILE stream associated with a file descriptor (0, 1 and 2 respectively). stdout is typically line-buffered when it is connected to a terminal, while stderr is not fully buffered, so error messages appear right away. We will be working with these streams inside our loop.
What happens when we type ls -l *.c in the shell terminal?
Recall that a shell is the command interpreter between the user and the computer: when you write a command in the terminal, the shell translates it for the computer, and then the command is executed.
So what happens when you type ls *.c?
The ls command lists the files and directories in the current working directory. In this input, however, there is a wildcard, the asterisk, placed before the '.c'. The asterisk matches any sequence of characters, including an empty one, so *.c matches every name that ends in '.c'. It is the shell, not ls, that expands the pattern: before running ls, the shell replaces *.c with the list of matching file names in the current directory, so ls only lists files whose names end in '.c'.
To understand this command line, you need to split it.
First, we have 'ls', the command for listing files. ls displays all the files and directories located in the directory or folder that you indicate. If you don't indicate a folder, ls lists the files and folders of the current directory.
Then we have "-l", which selects the long listing format: it displays permissions, number of links, owner, group, size, modification time and name.
The output from "ls -l" summarizes the most important information about each file on a single line. If the specified pathname is a directory, ls displays information about every file in that directory (one file per line). It precedes this list with a status line that indicates the total number of file system blocks occupied by the files in the directory (in 512-byte units on traditional Unix systems, or 1024-byte units if the -k option is used; GNU ls uses 1024-byte units by default). Following is a sample of the output along with an explanation:
Then we have '*', the wildcard character that matches any sequence of characters in a file name. '.c' is the fixed part of the name, so the combination '*.c' represents all the files whose names end in '.c'. Similarly, if the characters were in the order 'c*', the pattern would represent all the files whose names start with 'c', followed by anything else.
So the result of typing and running ls -l *.c at the shell prompt is a long listing of all the files in the current directory whose names end in '.c'.
Take that “ls -l” string and parse it for the kernel
Once your prompt is running, the next step is to use the getline() and strtok() functions. getline() reads the string the user types on stdin into a character buffer, and strtok() splits that buffer into tokens according to the delimiters you choose, for example " ", "-", or whatever you need.
It is really important to remember that the array of tokens (toks) must be allocated with malloc, grown with realloc when it runs out of room, and released with free, because it needs memory reserved to store a char pointer to each token.
Depending on whether the first word is a built-in command or an external command, we execute different logic. First we capture the entire command line and create a double pointer array (char **), with each token stored as a char pointer, followed by a NULL terminator at the end of the array. We do this by first counting the tokens in the line and adding one for the NULL terminator. Next we use the strtok function (see man strtok if you need more details) with " " as the delimiter. We iterate in a loop until the token returned by strtok is NULL. On each iteration we malloc enough space for the token, which means we need to know its exact length (plus one byte for the terminating '\0') before mallocing. Then we use the standard string functions to copy each token into the newly malloced space, or, in our case, our own string functions that we built.
Aliases in Linux shell commands:
An alias is a (usually short) name that the shell translates into another (usually longer) name or command. Aliases allow you to define new commands by substituting a string for the first token of a simple command. They are typically placed in the ~/.bashrc (bash) or ~/.tcshrc (tcsh) startup files so that they are available to interactive subshells.
Environment search for PATH:
Once all aliases and special characters have been expanded, the shell checks the first word of the command to see whether it is a built-in before looking for a program in the PATH. If it is not a built-in, the shell reads your PATH variable and checks each directory listed in it for the command, in this case, whether ls is located in any of those directories. (Note that ls is not a built-in: it is an external program, usually /bin/ls or /usr/bin/ls, which is exactly why the shell has to search the PATH for it.) You can see all of your environment variables by typing env into your shell; look for your PATH variable.
Example Output:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Your shell program looks through each directory in that list (the directories are separated by a :) to see if the ls command exists there. When recreating the shell, we used a function called stat to check whether the ls command existed: we appended /ls to each PATH directory and passed the resulting path to stat to see whether the executable file exists in that directory. If it did not exist in any of them, we printed an error message similar to the one the normal shell prints.
Finding the PATH:
If the user types just ls, without the full path of the directory where the command lives, our simple shell won't work, because execve does not search the PATH by itself. This is why we need another condition that finds all the directories in the PATH variable, appends /ls to each one, and then uses stat to check whether the file exists. If it does exist, we use execve to execute the program.
For example, let’s take the PATH variable used above:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Example structure:
char *all_directories[] = {"/usr/local/sbin/ls", "/usr/local/bin/ls", "/usr/sbin/ls", "/usr/bin/ls", "/sbin/ls", "/bin/ls", NULL};
You'll have to loop through your data structure, check whether each file exists using stat, and, if it does, execute the program using execve. But how does the program continue to run and output a new prompt that waits for a new command?
Creating New Processes:
Before reading on, run man fork and read the manual page. For extra credit, type pstree -pn into your terminal. This shows all the processes running on your computer as a tree, along with their process IDs. For your program to persist, you'll need two things: a while loop that runs forever, or until a certain condition is met such as reaching end of file, and the ability to create a new child process from the parent process associated with your executable file (./hsh in our example) during each iteration of that loop.
To get an interactive example, clone this repo, move to the test_folder and compile the program like so:
gcc -Wall -Werror -Wextra -pedantic environment.c error_message.c free_it_all.c helper_functions.c strtok_example.c prompt.c
The PPID, which is the process ID of the parent process associated with the ./a.out file, should stay the same for every command you run. It only changes when you exit the program and execute it again, so it stays constant for the whole duration of your executable. The PID of the child process, which is spawned from the parent for each command, should be a new process ID every time you enter a command or script into the simple shell.
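Tying it all together, each time you enter a command, a new child process needs to be created, and the parent process needs to wait for the child process to finish before proceeding (see man wait for more details). The child's copy of any malloc'd memory disappears when execve replaces its memory image; if execve fails, the child should free what it can and exit right away. The parent, however, still owns its own allocations for that command, such as the token array, so it must free them after wait returns and before printing the next prompt.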
If you have any questions or comments, feel free to add them below. Thanks for your time.