Introduction to Operating Systems

CS 3113, Fall 2018, Project 2, Due 10/18/2018

For this project, you will build on what you learned in Project 0 and on your code from Project 1 to extend your shell. Your code for Project 2 will do the following:

Add commands for directory management.
Replace calls to system() with the fork-exec pattern.
Add I/O redirection to the command line.

Remember to read this specification in full. Please post questions in the Project 2 Talk discussion board. For private questions, email cs3113@googlegroups.com.

Grading Criteria

Task	Percent
Code compiles with `make clean` and `make all`	10%
Documentation: Proper functional-level and inline documentation. README is thorough and complete.	40%
Correctness: This will be assessed by giving your code a range of inputs and matching the expected output.	50%
Total	100%

Below is an implementation checklist for your convenience.

Submission

Your code, executable, makefile and README must all be on your instance in the /projects/2/ directory. Please note that this location is NOT under your home directory. You must also submit your code as project2.tar.gz in Canvas.

Directory Management

mimic and morph

In this project, you will extend the abilities of the mimic and morph commands from project 1. mimic and morph should both still copy files or directories from one location to another; the morph command should also remove the old files and directories once they are copied. Your code should open the appropriate files and move the bytes; do not use the mv or rm shell calls. Directories supplied may or may not have the full path. Also, directories may or may not have a trailing slash. This breaks down to the following:

`[src]`	`[dst]`	Description	Comment
existing file	existing file	Success	The file is copied and has the same name as `[dst]`
existing directory	existing file	Error	You cannot write a directory to a file.
missing file	existing file	Error	Nothing to be copied.
missing directory	existing file	Error	Nothing to be copied.
existing file	existing directory	Success	`[src]` should be written into the `[dst]` directory. The file name of `[src]` is maintained.
existing directory	existing directory	Success	If `-r` is supplied, the `[src]` directory and all of its contents are copied into the directory `[dst]`. If `-r` is not supplied, and the directory is empty, the empty folder can be copied into `[dst]`; if the `[src]` folder is non-empty and `-r` is not supplied, the command should fail.
missing file	existing directory	Error	No source to be copied.
missing directory	existing directory	Error	No source to be copied.
existing file	missing file	Success	Will create a new file with the name `[dst]` assuming the location is valid.
existing directory	missing file	Error	You cannot write a directory to a file.
missing file	missing file	Error	Nothing to be copied.
missing directory	missing file	Error	Nothing to be copied.
existing file	missing directory	Error	You cannot write to missing location.
existing directory	missing directory	Success	If the parent of `[dst]` exists and `-r` is supplied, `[src]` is copied with the name given in `[dst]`. If `-r` is not supplied and the directory is empty, the empty directory should be copied under the parent of `[dst]` with the name given in `[dst]`. If `-r` is not supplied and the directory is non-empty, the command should fail with an error.
missing file	missing directory	Error	You cannot write a missing file.
missing directory	missing directory	Error	Both parameters are missing.

Examples

mimic foo/bar.txt /foobar/baz will result in the file /foobar/baz/bar.txt, assuming that /foobar/baz and foo/bar.txt already exist.

mimic -r foo/ /foobar/baz/ will result in all of the contents of foo/ being copied to /foobar/baz, assuming that both directories already exist. Note that this is a recursive copy, so any subdirectories of foo/ will also be copied.
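As a rough illustration of "open the appropriate files and move the bytes", the sketch below copies one regular file with open, read and write, and uses stat to tell directories from files. The names copy_file and is_directory, the 4096-byte buffer and the 0644 mode are assumptions made for this example and are not part of the specification; recursion, the directory rows of the table and the removal step of morph are not shown.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy the bytes of the regular file src into dst (created or truncated).
   Returns 0 on success, -1 on error. copy_file is a hypothetical helper. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;                      /* missing or unreadable source */

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;                      /* e.g. the parent of dst is missing */
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* write may transfer fewer bytes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            off += w;
        }
    }

    close(in);
    if (close(out) < 0 || n < 0)        /* n < 0 means read failed */
        return -1;
    return 0;
}

/* Returns 1 if path names an existing directory, 0 otherwise. */
int is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

With helpers like these, the "existing file / existing directory" row could be handled by checking is_directory on [dst] and, if it is a directory, appending the file name of [src] to [dst] before calling copy_file.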

mkdirz and rmdirz

In addition to the commands above, you should create mkdirz and rmdirz that create and remove directories, respectively.

The rmdirz [path] command is only expected to work on empty directories (otherwise, give an error).

The mkdirz [path] command must be given a [path] that does not already exist. The parent of [path] must exist.

Examples

mkdirz foo/bar/baz will create directory baz in foo/bar/ relative to the current working directory (the latter must already exist).

rmdirz foo/bar will remove directory bar from foo only if bar is empty.
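One possible shape for these two commands is a thin wrapper around the mkdir and rmdir system calls, which already enforce the rules above: mkdir fails if the path exists or its parent is missing, and rmdir fails on a non-empty directory. The function names, the 0755 mode and the error messages are assumptions for this sketch; how errors should be reported is up to your shell.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a single directory (not recursive).
   Returns 0 on success, -1 on error. Hypothetical helper. */
int mkdirz_sketch(const char *path)
{
    if (mkdir(path, 0755) == -1) {
        /* EEXIST: path already exists; ENOENT: the parent does not exist */
        fprintf(stderr, "mkdirz: %s: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}

/* Remove a single empty directory.
   Returns 0 on success, -1 on error. Hypothetical helper. */
int rmdirz_sketch(const char *path)
{
    if (rmdir(path) == -1) {
        /* ENOTEMPTY (EEXIST on some systems): the directory is not empty */
        fprintf(stderr, "rmdirz: %s: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}
```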

Extending your shell

Previously, you wrote a simple shell that looped, reading a line from standard input and checking the first word of the input line. While you are at it, you might as well put the name of the current working directory in the shell prompt! If the current working directory is /projects/2/, the shell prompt must look like the following: /projects/2==>.
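A minimal sketch of building such a prompt with getcwd follows. The helper name format_prompt and the PATH_MAX fallback value are assumptions for illustration. getcwd returns the path without a trailing slash (except for the root directory), which matches the /projects/2==> form.

```c
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* fallback value; an assumption for this sketch */
#endif

/* Write "<cwd>==>" into out. Returns 0 on success, -1 on error.
   format_prompt is a hypothetical helper. */
int format_prompt(char *out, size_t outsize)
{
    char cwd[PATH_MAX];

    if (getcwd(cwd, sizeof cwd) == NULL)
        return -1;                       /* e.g. buffer too small */

    int n = snprintf(out, outsize, "%s==>", cwd);
    if (n < 0 || (size_t)n >= outsize)
        return -1;                       /* output buffer too small */
    return 0;
}
```

The shell loop would then print the resulting string and flush stdout before reading the next line.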

Adding `fork` and `exec` to the shell

So far, our shell has used the system call to pass on command lines to the default system shell for execution. Since we need to control what open files and file descriptors are passed to these processes (i/o redirection), we need more control over their execution.

To do this we need to use the fork and exec system calls. fork creates a new process that is a clone of the existing one by just copying the existing one. The only thing that is different is that the new process has a new process ID and the return from the fork call is different in the two processes.

The exec system call reinitializes that process from a designated program; the program changes while the process remains! Make sure you read the notes on fork and exec below and the text.

Your TODOs

1. In your program, replace the use of system() with fork and exec. This includes the commands that are sent to the shell.

2. You will now need to parse the incoming command line more fully so that you can set up the argument array (char *argv[] in the exec examples below). N.B. remember to malloc/strdup and to free memory you no longer need!

3. You will find that while a system function call only returns after the program has finished, the use of fork means that two processes are now running in the foreground. In most cases you will not want your shell to ask for the next command until the child process has finished. This can be accomplished using the wait or waitpid functions (see below for more detail). For example:

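switch (pid = fork()) {
   case -1:                // fork failed: no child was created
      syserr("fork");      // syserr (not shown) is assumed to report the error and exit
   case 0:                 // child: replace this process image with the command
      execvp(args[0], args);
      syserr("exec");      // reached only if execvp fails
   default:                // parent: wait unless the child runs in background
      if (!dont_wait)
         waitpid(pid, &status, WUNTRACED);
}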

In the above example, if you wanted to run the child process ‘in background’ (i.e., the parent does not wait for the child to finish before continuing its execution), the flag dont_wait would be set and the shell would not wait for the child process to terminate.

4. The commenting in the above examples is minimal. In the projects you will be expected to provide more descriptive commentary!
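For TODO 2, one way to build the NULL-terminated argument array is sketched below. It splits on spaces, tabs and newlines with strtok and copies each token with strdup, so every string can later be freed. The names split_line and free_args and the starting capacity of 8 are assumptions for this example; quoting and redirection tokens are not treated specially here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Split line into a heap-allocated, NULL-terminated argument array.
   Returns NULL on allocation failure. Hypothetical helper. */
char **split_line(const char *line)
{
    char *copy = strdup(line);          /* strtok modifies its input */
    if (copy == NULL)
        return NULL;

    size_t cap = 8, n = 0;
    char **args = malloc(cap * sizeof *args);
    if (args == NULL) {
        free(copy);
        return NULL;
    }

    for (char *tok = strtok(copy, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (n + 1 >= cap) {             /* keep room for the NULL terminator */
            cap *= 2;
            char **bigger = realloc(args, cap * sizeof *args);
            if (bigger == NULL)
                goto fail;
            args = bigger;
        }
        args[n] = strdup(tok);
        if (args[n] == NULL)
            goto fail;
        n++;
    }
    args[n] = NULL;                     /* execvp needs this terminator */
    free(copy);
    return args;

fail:
    for (size_t i = 0; i < n; i++)
        free(args[i]);
    free(args);
    free(copy);
    return NULL;
}

/* Release an array returned by split_line. */
void free_args(char **args)
{
    if (args == NULL)
        return;
    for (size_t i = 0; args[i] != NULL; i++)
        free(args[i]);
    free(args);
}
```

An empty or blank line yields an array whose first element is NULL, which the shell loop can detect before calling fork.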

I/O Redirection

Your project shell must support i/o-redirection on both stdin and stdout. i.e. the command line:

programname arg1 arg2 < inputfile > outputfile

will execute the program programname with arguments arg1 and arg2, the stdin FILE stream replaced by inputfile and the stdout FILE stream replaced by outputfile. For more cases on how I/O redirection should be handled, please visit the textbook website: http://www.tldp.org/LDP/intro-linux/html/sect_05_01.html.

With output redirection, if the redirection character is > then the outputfile is created if it does not exist and truncated if it does. If the redirection token is >> then outputfile is created if it does not exist and appended to if it does.

Note: you can assume that the redirection symbols, < , > and >> will be delimited from other command line arguments by white space - one or more spaces and/or tabs. This condition and the meanings for the redirection symbols outlined above and in the project may differ slightly from that of the standard shell.

Examples

filez > filelist.txt will execute your command filez in the current working directory and send its output to the file filelist.txt (if the file existed before the call, then the file contents will first be truncated).

wc < project2.c > word_count.txt will pass the unrecognized command (wc) to the bash shell. The input to this command will be project2.c; the output will be placed in word_count.txt.

~~ditto This is my sentence > my_sentence.txt will place the specified string into the file my_sentence.txt.~~

I/O Redirection Implementation

I/O redirection is accomplished in the child process immediately after the fork and before the exec command. At this point, the child has inherited all the file handles of its parent and still has access to a copy of the parent memory. Thus, it will know if redirection is to be performed, and, if it does change the stdin and/or stdout file streams, this will only affect the child and not the parent.

You can use open to create file descriptors for inputfile and/or outputfile and then use dup or dup2 to replace either the stdin descriptor (STDIN_FILENO from unistd.h) or the stdout descriptor (STDOUT_FILENO from unistd.h).

However, the easiest way to do this is to use freopen. This function is one of the three functions you can use to open a standard I/O stream.

#include <stdio.h>

FILE *fopen(const char *pathname, const char * type);

FILE *freopen(const char * pathname, const char * type, FILE *fp);

FILE *fdopen(int filedes, const char * type);

// All three return: file pointer if OK, NULL on error

The differences in these three functions are as follows:

fopen opens a specified file.
freopen opens a specified file on a specified stream, closing the stream first if it is already open. This function is typically used to open a specified file as one of the predefined streams, stdin, stdout, or stderr.
fdopen takes an existing file descriptor (obtained from open, etc) and associates a standard I/O stream with that descriptor - useful for associating pipes etc with an I/O stream.

The type string is the standard open argument:

`type`	Description
`r` or `rb`	open for reading
`w` or `wb`	truncate to 0 length or create for writing
`a` or `ab`	append; open for writing at the end of file, or create for writing
`r+` or `r+b` or `rb+`	open for reading and writing
`w+` or `w+b` or `wb+`	truncate to 0 length or create for reading and writing
`a+` or `a+b` or `ab+`	open or create for reading and writing at end of file

where b as part of type allows the standard I/O system to differentiate between a text file and a binary file.

Thus:

freopen("inputfile", "r", stdin);

would open the file inputfile and use it to replace the standard input stream, stdin.
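Extending that call to both streams, a child process could apply its redirections as in the sketch below before calling exec. The helper name redirect_streams and its append parameter are assumptions for illustration; the mode strings "w" and "a" correspond to the > and >> rules described earlier.

```c
#include <stdio.h>

/* Redirect stdin and/or stdout of the calling process.
   infile or outfile may be NULL to leave that stream untouched;
   append != 0 selects ">>" behaviour instead of ">".
   Returns 0 on success, -1 on error. Hypothetical helper. */
int redirect_streams(const char *infile, const char *outfile, int append)
{
    if (infile != NULL && freopen(infile, "r", stdin) == NULL)
        return -1;                       /* e.g. inputfile does not exist */
    if (outfile != NULL &&
        freopen(outfile, append ? "a" : "w", stdout) == NULL)
        return -1;                       /* e.g. no permission to create */
    return 0;
}
```

This is meant to run in the child only. Calling it in the parent would redirect the shell's own streams, and a failed freopen leaves the stream closed, so the child should terminate if it returns -1.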

You should also use the access function to check whether the files exist:

#include <unistd.h>

int access(const char *pathname, int mode);

// Returns: 0 if OK, -1 on error

The mode is the bitwise OR of any of the constants below:

mode	Description
R_OK	test for read permission
W_OK	test for write permission
X_OK	test for execute permission
F_OK	test for existence of file
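As an illustration, the checks a shell might make before redirecting could look like the following. The helper names input_ok and output_ok are hypothetical. Since a file can change between the check and the later open, the result of open or freopen should still be checked as well.

```c
#include <unistd.h>

/* Returns 1 if infile exists and is readable, 0 otherwise. */
int input_ok(const char *infile)
{
    return access(infile, R_OK) == 0;
}

/* Returns 1 if outfile is either missing (it will be created later)
   or exists and is writable, 0 otherwise. */
int output_ok(const char *outfile)
{
    if (access(outfile, F_OK) != 0)
        return 1;                 /* does not exist yet */
    return access(outfile, W_OK) == 0;
}
```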

~~Looking at the project specification, stdout redirection should also be possible for the internal commands: dir, environ, help.~~

Extra Info

env processes

Each process has an environment associated with it. The environment strings are usually of the form: name=value (standard NULL terminated strings) and are referenced by an array of pointers to these strings. This array is made available to a process through the C Run Time library as:

extern char **environ; // NULL terminated array of char *

While an application can access the environment directly through this array, some functions are available to access and manipulate the environment:

#include <stdlib.h>

char *getenv(const char *name);

// Returns pointer to value associated with name, NULL if not found.

getenv returns a pointer to the value of a name=value string. You should use getenv to fetch a specific value from the environment rather than accessing environ directly. getenv is supported by both the ANSI C and POSIX standards. In addition to fetching the value of an environment variable, sometimes it is necessary to set a variable. You may want to change the value of an existing variable, or add a new variable to the environment.

#include <stdlib.h>

int putenv(char *str);

int setenv(const char *name, const char *value, int rewrite);

//Both return: 0 if OK, non-zero on error

int unsetenv(const char *name);

//Returns: 0 if OK, -1 on error

putenv takes a string of the form name=value and places it in the environment list. If the name already exists, its old definition is first removed.
setenv sets name to value. If name already exists, its old definition is first removed if rewrite is non-zero. Otherwise the value is not overwritten.
unsetenv removes any definition of name.

You may need to use putenv with the environment value left blank to unset an environment object. i.e. putenv("myname=")
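The short sketch below contrasts the two access styles described above: walking environ directly and looking up a single name with getenv. The helper names count_environment and env_or_default are assumptions for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

extern char **environ;   /* NULL-terminated array of "name=value" strings */

/* Count the strings in the environment by walking environ directly. */
size_t count_environment(void)
{
    size_t n = 0;
    for (char **e = environ; e != NULL && *e != NULL; e++)
        n++;
    return n;
}

/* Return the value of name, or fallback if name is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, after setenv("MYNAME", "one", 1), a call to setenv("MYNAME", "two", 0) leaves the value as "one" because rewrite is zero, while passing 1 replaces it; MYNAME is only a placeholder name.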

getcwd

#include <unistd.h>

char *getcwd(char *buf, size_t size);

//Returns: buf if OK, NULL on error

Every process has a current working directory which can be set using chdir. While chdir can use a relative pathname argument, there is a need for a function to derive the absolute pathname of the directory. getcwd performs this function.

The function is passed the address of a buffer, buf, and its size. The buffer must be large enough to accommodate the full absolute pathname plus a terminating NULL byte, or an error is returned.

Some implementations of getcwd allow the first argument buf to be NULL, in which case the function calls malloc to allocate size number of bytes dynamically. This is not part of the POSIX standard and should be avoided.

Enter the man getcwd command in your bash shell to get a more detailed description of this function.

fork

Process creation in UNIX is achieved by means of the kernel system call, fork(). When a process issues a fork request, the operating system performs the following functions (in kernel mode):

It allocates a slot in the process table for the new process
It assigns a unique process ID to the child process
It makes a copy of the parent’s process control block
It makes a copy of the process image of the parent (with the exception of any shared memory)
It increments counters for any files owned by the parent to reflect that an additional process now also owns those files
It assigns the child process to a Ready to Run state
It returns the ID number of the child to the parent process and a 0 value to the child process - this function is called once and returns twice!

#include <sys/types.h>
#include <unistd.h>

pid_t fork(void);

//Returns: 0 in child, process ID of child in parent, -1 on error

The fork system call creates a new process that is essentially a clone of the existing one. The child is a complete copy of the parent. For example, the child gets a copy of the parent’s data space, heap and stack. Note that this is a copy. The parent and child do not share these portions of memory. The child also inherits all the open file handles (and streams) of the parent with the same current file offsets.

The parent and child processes are essentially identical except that the new process has a new process ID and the return value from the fork call is different in the two processes:

The “parent” process gets the new process ID of the “child” returned from the fork call. If, for some reason, the process cannot be cloned, then -1 is returned.
The “child” process is returned 0 (zero) from the fork call.
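To make the "copy, not shared" point concrete, the sketch below forks once, lets the child change its own copy of a variable, and passes that value back through the exit status. The function name fork_copy_demo and the numbers used are assumptions chosen for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork once. The child changes its copy of counter and reports it through
   its exit status. On success, *parent_value holds the parent's (unchanged)
   counter and *child_value the child's. Returns 0 on success, -1 on error. */
int fork_copy_demo(int *parent_value, int *child_value)
{
    int counter = 1;
    int status;

    pid_t pid = fork();
    if (pid == -1)
        return -1;                 /* fork failed, no child exists */

    if (pid == 0) {                /* child: fork returned 0 */
        counter += 41;             /* modifies the child's copy only */
        _exit(counter);            /* exit status carries the value back */
    }

    /* parent: fork returned the process ID of the child */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    *parent_value = counter;       /* still 1 in the parent */
    *child_value = WEXITSTATUS(status);
    return 0;
}
```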

exec

To actually load and execute a different program, the fork request is used first to generate the new process. The kernel system call: exec(char* programfilename) is then used to load a new program image over the forked process:

exec identifies the required memory allocation for the new program and alters the memory allocation of the process to accommodate it
The program is loaded into memory and execution is commenced at the start of the main() routine.

The exec system call reinitializes a process from a designated program; the program changes while the process remains! The exec call does not change the process ID and process control block (apart from memory allocation and current execution point); the process inherits all the file handles etc. that were currently open before the call.

Without fork, exec is of limited use; without exec, fork is of limited use (A favorite exam question is to ask in what circumstances you would/could use these functions on their own. Think about it and be prepared to discuss these scenarios).

exec variants:

System Call	Argument Format	Environment Passing	PATH search
execl	list	auto	no
execv	array	auto	no
execle	list	manual	no
execve	array	manual	no
execlp	list	auto	yes
execvp	array	auto	yes

#include <unistd.h>
 
int execl(path,arg0,arg1,...,argn,null)
   char *path;     // path of program file
   char *arg0;     // first arg (file name) 
   char *arg1;     // second arg (1st command line parameter)
    ...
   char *argn;     // last arg
   char *null;     // NULL delimiter 
 
int execv(path,argv)
   char *path;
   char *argv[];   // array of ptrs to args,last ptr = NULL 
 
int execle(path,arg0,arg1,...,argn,null,envp)
   char *path;     // path of program file
   char *arg0;     // first arg (file name) 
   char *arg1;     // second arg (1st command line parameter)
    ...
   char *argn;     // last arg
   char *null;     // NULL delimiter
   char *envp[];   // array of ptrs to environment strings
                   // last ptr = NULL
 
int execve(path,argv,envp)
   char *path;
   char *argv[];   // array of ptrs to args,last ptr = NULL
   char *envp[];   // array of ptrs to environment strings
                   // last ptr = NULL
 
int execlp(file,arg0,arg1,...,argn,null)

int execvp(file,argv)
// All six return -1 on error, no return on success

In the first four exec functions, the executable file has to be referenced by a relative or absolute pathname. The last two search the directories listed in the PATH environment variable for the specified filename.

Example of use of fork and exec

switch (fork()){
   case -1:             // fork error
      syserr("fork");
   case 0:              // continue execution in child process
      execlp("pgm","pgm",NULL);
      syserr("execlp"); // will only return on exec error
}                       // continue execution in parent process 
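A fuller, self-contained version of this pattern is sketched below, with perror standing in for syserr and with the parent waiting for the child. The helper name run_command and the exit code 127 for a failed exec are assumptions (127 follows a common shell convention); the important part is that the child terminates instead of continuing as a second copy of the shell.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the NULL-terminated argument array argv and wait for it.
   Returns the exit status of the child, or -1 if it could not be run or
   did not exit normally. Hypothetical helper. */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    switch (pid) {
    case -1:                        /* fork error: still in the parent */
        perror("fork");
        return -1;
    case 0:                         /* child: replace the process image */
        execvp(argv[0], argv);
        perror("execvp");           /* reached only if execvp failed */
        _exit(127);                 /* do not fall back into the shell loop */
    default:                        /* parent: wait for this child */
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
}
```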

wait

When a process terminates, either normally or abnormally, the parent is notified by the kernel sending the parent the SIGCHLD signal. The parent can choose to ignore the signal (the default) or it can provide a function that is called when the signal occurs. The system provides functions wait or waitpid that can

block (if all of its children are still running), or
return immediately with the termination status of a child (if a child has terminated and is waiting for its termination status to be fetched), or
return immediately with an error (if it doesn’t have any child processes).

#include <sys/types.h>
#include <sys/wait.h>

pid_t wait(int *statloc);

pid_t waitpid(pid_t pid, int *statloc, int options);

//Both return: process ID if OK, -1 on error; waitpid returns 0 if WNOHANG is set and no child has changed state

The differences between the two functions are:

wait can block the caller until a child process terminates, while waitpid has an option that prevents it from blocking.
waitpid doesn’t wait for the first child to terminate - it has a number of options that control which process it waits for.

If a child has already terminated and is a zombie, wait returns immediately with that child’s status. Otherwise it blocks the caller until a child terminates. If the caller blocks and has multiple children, wait returns when one terminates - the function returns the process ID of the particular child.

Both functions store an integer status in *statloc. The format of this status is implementation dependent; macros for interpreting it (such as WIFEXITED and WEXITSTATUS) are defined in <sys/wait.h>.

The pid parameter of waitpid specifies the set of child processes for which to wait.

pid == -1: waits for any child process. In this respect, waitpid is equivalent to wait
pid == 0: waits for any child process in the process group of the caller
pid > 0: waits for the process with process ID pid
pid < -1: waits for any process whose process group id equals the absolute value of pid.

The options for waitpid are a bitwise OR of any of the following options:

WNOHANG: the call should not block if there are no processes that wish to report status
WUNTRACED: children of the current process that are stopped due to a SIGTTIN, SIGTTOU, SIGTSTP, or SIGSTOP signal also have their status reported
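As an example of combining pid == -1 with WNOHANG, a shell that runs children in the background could collect the ones that have already finished, without blocking, before printing each prompt. The helper name reap_finished_children is an assumption for this sketch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collect every child that has already terminated, without blocking.
   Returns the number of children reaped. Hypothetical helper. */
int reap_finished_children(void)
{
    int status, reaped = 0;
    pid_t pid;

    /* pid == -1: any child; WNOHANG: return 0 instead of blocking.
       The loop ends on 0 (children still running) or -1 (no children). */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        reaped++;
    return reaped;
}
```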

dup and dup2

An existing file descriptor (filedes) is duplicated by either of the following functions:

#include <unistd.h>
 
int dup(int filedes);
 
int dup2(int filedes, int filedes2);
// Both return: new file descriptor if OK, -1 on error

The new file descriptor returned by dup is guaranteed to be the lowest numbered available file descriptor. Thus by closing one of the standard file descriptors (STDIN_FILENO, STDOUT_FILENO, or STDERR_FILENO, normally 0, 1 and 2 respectively) immediately before calling dup, we can guarantee (in a single threaded environment!) that the new descriptor returned by dup will be allotted that freed number.

With dup2 we specify the value of the new descriptor with the filedes2 argument and it is an atomic call. If filedes2 is already open, it is first closed. If filedes equals filedes2 , then dup2 returns filedes2 without closing it.
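The following sketch shows how open and dup2 could implement stdout redirection in a child process, with O_TRUNC for > and O_APPEND for >>. The names redirect_stdout and write_redirected, the 0644 mode, and the use of write in place of a real exec are assumptions made to keep the example self-contained.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Make stdout of the calling process refer to path.
   append != 0 gives ">>" behaviour, otherwise ">" (truncate).
   Returns 0 on success, -1 on error. Hypothetical helper. */
int redirect_stdout(const char *path, int append)
{
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    int fd = open(path, flags, 0644);
    if (fd == -1)
        return -1;
    if (fd != STDOUT_FILENO) {      /* dup2 is a no-op if they are equal */
        if (dup2(fd, STDOUT_FILENO) == -1) {
            close(fd);
            return -1;
        }
        close(fd);                  /* STDOUT_FILENO now refers to the file */
    }
    return 0;
}

/* Fork a child that writes msg (len bytes) to its redirected stdout.
   Returns the exit status of the child, or -1 on error. */
int write_redirected(const char *path, int append, const char *msg, size_t len)
{
    int status;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child: only its descriptors change */
        if (redirect_stdout(path, append) == -1)
            _exit(1);
        _exit(write(STDOUT_FILENO, msg, len) == (ssize_t)len ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In a shell, the write call in the child would be replaced by execvp, and the parent's stdout is untouched because the descriptor change happens after fork.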

Addenda

2018-10-04

mkdirz does not execute recursively. If the parent does not exist, then this is an error
rmdirz does not execute recursively. If the directory contains other files/directories, then this is an error
All commands that you have implemented in your shell ignore STDIN (though the user might pipe a file to them). However, commands passed to the bash shell might use data from STDIN. For example:
- cat < myfile will pass cat to the shell and pipe myfile to its STDIN
Our morph/mimic implementation

2018-10-06

Test cases used for project 1

2018-10-08

You only need to consider redirects for external commands. filez and other internal commands will not be paired with redirects.
When a batchfile is passed, you still need to copy the command to stdout, as in project 1.
The -r in the mimic and morph commands must be the ~~second argument to be valid~~ next token after the mimic or morph command (e.g. mimic -r out.txt dir/).
You only need to consider regular files and directories. You don’t need to consider symlinks, hard links, etc.

2018-10-10

The src location in morph and mimic may not be the current directory, parent directory, or a glob. In other terms, the following [src] locations are not acceptable: ., .., /, \*txt.
We’ve added the first testcases for project2: Batchfile: morph1.txt –> Expected output: morph1.out.txt

2018-10-14

We removed the extra requirement from the specification stating that we will test internal commands for redirection.
The erase command from project 1 will not be performed recursively; it should only work on individual files.
We added two test cases, one that tests directory commands and another that tests fork commands.
- dirz1.txt –> dirz1.out.txt
- fork1.txt –> fork1.out.txt
The filez command should not recognize the -r flag.
If morph/mimic is called with the -r flag, it should still work as normal.

2018-10-15

It is a good idea to use structured debug statements in your code. We will not read your stderr. Here is a file with the macros that I use to interleave debugging/strace statements in my code: Util.h.
We removed the “cat” command from the fork test. We won’t compare the ids returned.

2018-10-18

Removed the example ditto call with redirection from the spec.
If you use the underlying shell to make calls you will get points off when we check your code. You should not use the underlying shell or calls to system(). You don’t need to use the execl("/bin/sh", "sh", "-c", ...) command; execvp should be enough.
We will add esc to the end of each test case.

2018-10-19

Updated the line, if both [src] and [dst] exist, the [src] folder is non-empty and -r is not supplied, the command should fail. That is, morph and mimic cannot work on non-empty src directories without the recursive flag.

2018-10-22

Removed the filez example because we will not test this case.

2018-10-23

Hints

execvp(cmd, args): args[0] must be the command (i.e., the same as cmd).
execvp(cmd, args): the args[i] after the last argument must be set to NULL.
execvp() does not return when it succeeds, because the process image is replaced by the new program. However, if there is an error (e.g., the command does not exist), execvp() returns -1 and the process continues to run the code of your shell. If this is a child process, you must make sure that it terminates.

2018-10-24

Added a couple more sample outputs: newtestcases.tar
