CS 3113 Fall 18

This is the web page for Operating Systems at the University of Oklahoma.

Introduction to Operating Systems

CS 3113, Fall 2018, Project 2, Due 10/18/2018

For this project, you will build on what you learned in Project 0 and on your code from Project 1 to extend your shell. Your code for Project 2 will do the following:

  1. Add commands for directory management.
  2. Replace calls to system() with the fork-exec pattern.
  3. Add I/O redirection to the command line.

Remember to read this specification in full. Please post questions in the Project 2 Talk discussion board. For private questions, email cs3113@googlegroups.com.

Grading Criteria

Task | Percent
Code compiles with make clean and make all | 10%
Documentation: Proper functional-level and inline documentation. README is thorough and complete. | 40%
Correctness: This will be assessed by giving your code a range of inputs and matching the expected output. | 50%
Total | 100%

Below is an implementation checklist for your convenience.

Submission

Your code, executable, makefile and README must all be on your instance in the /projects/2/ directory. Please note that this location is NOT under your home directory. You must also submit your code as project2.tar.gz on Canvas.

Directory Management

mimic and morph

In this project, you will extend the abilities of the mimic and morph commands from Project 1. Both mimic and morph should still copy files or directories from one location to another; morph should also remove the old files and directories once they are copied. Your code should open the appropriate files and move the bytes; do not use the mv or rm shell calls. Directories supplied may or may not have the full path, and they may or may not have a trailing slash. This breaks down to the following:

[src] | [dst] | Result | Comment
existing file | existing file | Success | The file is copied and has the same name as [dst].
existing directory | existing file | Error | You cannot write a directory to a file.
missing file | existing file | Error | Nothing to be copied.
missing directory | existing file | Error | Nothing to be copied.
existing file | existing directory | Success | [src] should be written into the [dst] directory. The file name of [src] is maintained.
existing directory | existing directory | Success | If -r is supplied, the [src] directory and all of its contents are copied into the directory [dst]. If -r is not supplied and the directory is empty, the empty folder can be copied into [dst]; if the [src] folder is non-empty and -r is not supplied, the command should fail.
missing file | existing directory | Error | No source to be copied.
missing directory | existing directory | Error | No source to be copied.
existing file | missing file | Success | Will create a new file with the name [dst], assuming the location is valid.
existing directory | missing file | Error | You cannot write a directory to a file.
missing file | missing file | Error | Nothing to be copied.
missing directory | missing file | Error | Nothing to be copied.
existing file | missing directory | Error | You cannot write to a missing location.
existing directory | missing directory | Success | If the parent exists and -r is supplied, [src] will be copied with the name of [dst]. If -r is not supplied and the directory is empty, the empty directory should be copied under the parent of [dst] with the name given in [dst]. If -r is not supplied and the directory is non-empty, the command should throw an error.
missing file | missing directory | Error | You cannot write a missing file.
missing directory | missing directory | Error | Both parameters are missing.
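
The "open the appropriate files and move the bytes" requirement can be sketched as below. This is only one possible approach, not the required implementation: the function name copy_file, the 4096-byte buffer and the 0644 permission bits are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the bytes of src into dst, creating dst if it
 * is missing and truncating it if it exists.
 * Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t nread;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((nread = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < nread) {          /* write may be partial */
            ssize_t n = write(out, buf + done, (size_t)(nread - done));
            if (n < 0) {
                close(in);
                close(out);
                return -1;
            }
            done += n;
        }
    }

    close(in);
    close(out);
    return nread < 0 ? -1 : 0;          /* nread < 0 means read failed */
}
```

Under these assumptions, morph could call the same helper and then remove the source with unlink once the copy has succeeded.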

Examples

mimic foo/bar.txt /foobar/baz will result in the file /foobar/baz/bar.txt, assuming that /foobar/baz and foo/bar.txt already exist.

mimic -r foo/ /foobar/baz/ will result in all of the contents of foo/ being copied to /foobar/baz, assuming that both directories already exist. Note that this is a recursive copy, so any subdirectories of foo/ will also be copied.
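
The first example depends on noticing that [dst] is an existing directory and keeping the file name of [src]. A minimal sketch of that decision follows; the helper names is_directory and target_path are hypothetical, and the sketch only covers the case where [src] is a file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if path names an existing directory, else 0. */
int is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical helper: write into out the path that a file copy from src
 * to dst should create. If dst is an existing directory, the file name of
 * src is appended to it (with or without a trailing slash on dst).
 * Returns 0 on success, -1 if out is too small. */
int target_path(const char *src, const char *dst, char *out, size_t outlen)
{
    int n;

    if (is_directory(dst)) {
        const char *name = strrchr(src, '/');
        name = name ? name + 1 : src;            /* file name part of src */
        size_t len = strlen(dst);
        const char *sep = (len > 0 && dst[len - 1] == '/') ? "" : "/";
        n = snprintf(out, outlen, "%s%s%s", dst, sep, name);
    } else {
        n = snprintf(out, outlen, "%s", dst);    /* dst is the file name */
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```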

mkdirz and rmdirz

In addition to the commands above, you should create mkdirz and rmdirz that create and remove directories, respectively.

The rmdirz [path] command is only expected to work on empty directories (otherwise, give an error).

The mkdirz [path] command must be given a [path] that does not already exist. The parent of [path] must exist.

Examples

mkdirz foo/bar/baz will create directory baz in foo/bar/ relative to the current working directory (the latter must already exist).

rmdirz foo/bar will remove directory bar from foo only if bar is empty.
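
One way to get this behaviour is to lean on the mkdir and rmdir system calls, which already fail when the path exists, when the parent is missing, or when the directory is not empty. The sketch below assumes that approach; the function names do_mkdirz and do_rmdirz and the 0755 permission bits are illustrative choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path. mkdir fails if path already exists
 * or if its parent directory is missing.
 * Returns 0 on success, -1 on error. */
int do_mkdirz(const char *path)
{
    if (mkdir(path, 0755) != 0) {
        perror("mkdirz");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: remove path. rmdir fails if the directory is
 * not empty.
 * Returns 0 on success, -1 on error. */
int do_rmdirz(const char *path)
{
    if (rmdir(path) != 0) {
        perror("rmdirz");
        return -1;
    }
    return 0;
}
```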

Extending your shell

Previously, you wrote a simple shell that looped reading a line from standard input and checked the first word of the input line. While you are at it, you might as well put the name of the current working directory in the shell prompt! If the current working directory is /projects/2/, the shell prompt must look like this: /projects/2==>
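
A minimal sketch of building such a prompt is shown below. The helper names make_prompt and print_prompt are hypothetical, and the use of PATH_MAX as the buffer size is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: format "<cwd>==>" into out, dropping any trailing
 * slash from cwd (the root directory "/" is kept as is).
 * Returns 0 on success, -1 if out is too small. */
int make_prompt(const char *cwd, char *out, size_t outlen)
{
    size_t len = strlen(cwd);
    while (len > 1 && cwd[len - 1] == '/')
        len--;
    int n = snprintf(out, outlen, "%.*s==>", (int)len, cwd);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical helper: print the prompt for the current directory. */
void print_prompt(void)
{
    char cwd[PATH_MAX];
    char prompt[PATH_MAX + 4];

    if (getcwd(cwd, sizeof cwd) != NULL &&
        make_prompt(cwd, prompt, sizeof prompt) == 0) {
        fputs(prompt, stdout);
        fflush(stdout);     /* no newline follows, so flush explicitly */
    }
}
```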

Adding fork and exec to the shell

So far, our shell has used the system() function to pass command lines on to the default system shell for execution. Since we need to control which open files and file descriptors are passed to these processes (I/O redirection), we need more control over their execution.

To do this we need to use the fork and exec system calls. fork creates a new process that is a clone of the existing one by just copying the existing one. The only thing that is different is that the new process has a new process ID and the return from the fork call is different in the two processes.

The exec system call reinitializes that process from a designated program; the program changes while the process remains! Make sure you read the notes on fork and exec below and the text.

Your TODOs

1. In your program, replace the use of system() with fork and exec. This includes the commands that are sent to the shell.

2. You will now need to parse the incoming command line more fully so that you can set up the argument array (char *argv[] in the exec examples below). N.B. remember to malloc/strdup and to free memory you no longer need!

3. You will find that while a system() function call only returns after the program has finished, the use of fork means that two processes are now running in the foreground. In most cases you will not want your shell to ask for the next command until the child process has finished. This can be accomplished using the wait or waitpid functions (see below for more detail). For example:

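switch (pid = fork()) {
   case -1:                // fork failed
      syserr("fork");      // syserr is assumed to report the error and exit
      break;
   case 0:                 // child: replace the process image
      execvp(args[0], args);
      syserr("exec");      // reached only if execvp fails
      break;
   default:                // parent
      if (!dont_wait)      // wait unless running in background
         waitpid(pid, &status, WUNTRACED);
}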

In the above example, if you wanted to run the child process ‘in background’ (i.e., the parent does not wait for the child to finish before continuing its execution), the flag dont_wait would be set and the shell would not wait for the child process to terminate.

4. The commenting in the above examples is minimal. In the projects you will be expected to provide more descriptive commentary!
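
Putting TODOs 1 to 3 together, a sketch of parsing a line and running it in the foreground might look like the following. The names parse_line, free_args, run_command and MAX_ARGS are hypothetical, the 64-argument limit is arbitrary, and the exit code 127 for a failed exec is a common shell convention rather than a requirement of this project.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 64   /* arbitrary limit for this sketch */

/* Split line on spaces, tabs and newlines into a NULL-terminated,
 * heap-allocated argument array. Returns NULL if allocation fails. */
char **parse_line(const char *line)
{
    char **argv = malloc((MAX_ARGS + 1) * sizeof *argv);
    char *copy = strdup(line);        /* strtok modifies its input */
    int argc = 0;

    if (argv == NULL || copy == NULL) {
        free(argv);
        free(copy);
        return NULL;
    }
    for (char *tok = strtok(copy, " \t\n");
         tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " \t\n")) {
        argv[argc] = strdup(tok);
        if (argv[argc] == NULL)
            break;
        argc++;
    }
    argv[argc] = NULL;
    free(copy);
    return argv;
}

/* Free an array returned by parse_line. */
void free_args(char **argv)
{
    if (argv == NULL)
        return;
    for (int i = 0; argv[i] != NULL; i++)
        free(argv[i]);
    free(argv);
}

/* Run argv in a child process and wait for it. Returns the child's exit
 * status, or -1 if it could not be started or did not exit normally. */
int run_command(char *const argv[])
{
    int status;
    pid_t pid;

    if (argv[0] == NULL)              /* empty command line */
        return 0;
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                   /* child */
        execvp(argv[0], argv);
        perror("exec");               /* reached only if execvp fails */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit rather than exit after a failed exec so that it does not flush a second copy of the parent's buffered output.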

I/O Redirection

Your project shell must support I/O redirection on both stdin and stdout. That is, the command line:

programname arg1 arg2 < inputfile > outputfile

will execute the program programname with arguments arg1 and arg2, the stdin FILE stream replaced by inputfile and the stdout FILE stream replaced by outputfile. For more cases of how I/O redirection should be handled, please visit the textbook website: http://www.tldp.org/LDP/intro-linux/html/sect_05_01.html.

With output redirection, if the redirection character is > then the outputfile is created if it does not exist and truncated if it does. If the redirection token is >> then outputfile is created if it does not exist and appended to if it does.

Note: you can assume that the redirection symbols, < , > and >> will be delimited from other command line arguments by white space - one or more spaces and/or tabs. This condition and the meanings for the redirection symbols outlined above and in the project may differ slightly from that of the standard shell.
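
Because the redirection symbols are whitespace-delimited, they arrive as separate tokens and can be picked out of the argument array before exec. The sketch below assumes that design; struct redirect and extract_redirects are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of the redirections found on a command line. */
struct redirect {
    const char *in_file;    /* target of <, or NULL */
    const char *out_file;   /* target of > or >>, or NULL */
    int append;             /* 1 if >> was used, 0 otherwise */
};

/* Scan a NULL-terminated argv, record any redirections in r, and remove
 * each operator and its file name from argv in place.
 * Returns 0 on success, -1 if an operator has no file name after it. */
int extract_redirects(char **argv, struct redirect *r)
{
    int out = 0;

    r->in_file = NULL;
    r->out_file = NULL;
    r->append = 0;

    for (int i = 0; argv[i] != NULL; i++) {
        int is_in  = strcmp(argv[i], "<") == 0;
        int is_out = strcmp(argv[i], ">") == 0;
        int is_app = strcmp(argv[i], ">>") == 0;

        if (is_in || is_out || is_app) {
            if (argv[i + 1] == NULL)
                return -1;
            if (is_in) {
                r->in_file = argv[i + 1];
            } else {
                r->out_file = argv[i + 1];
                r->append = is_app;
            }
            i++;                        /* skip the file name too */
        } else {
            argv[out++] = argv[i];      /* keep ordinary arguments */
        }
    }
    argv[out] = NULL;
    return 0;
}
```

If the tokens were allocated with strdup, the caller still owns the operator and file name strings that are dropped from argv here and should free them separately.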

Examples

filez > filelist.txt will execute your command filez in the current working directory and send its output to the file filelist.txt (if the file existed before the call, then the file contents will first be truncated).

wc < project2.c > word_count.txt will pass the unrecognized command (wc) to the bash shell. The input to this command will be project2.c; the output will be placed in word_count.txt.

ditto This is my sentence > my_sentence.txt will place the specified string into the file my_sentence.txt.

I/O Redirection Implementation

I/O redirection is accomplished in the child process immediately after the fork and before the exec command. At this point, the child has inherited all the filehandles of its parent and still has access to a copy of the parent memory. Thus, it will know whether redirection is to be performed, and if it does change the stdin and/or stdout file streams, this will only affect the child and not the parent.

You can use open to create file descriptors for inputfile and/or outputfile and then use dup or dup2 to replace either the stdin descriptor (STDIN_FILENO from unistd.h) or the stdout descriptor (STDOUT_FILENO from unistd.h).
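
The open plus dup2 route can be sketched as a single helper that the child calls between fork and exec. The name redirect_fd and the 0644 permission bits are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path with the given flags and make target_fd
 * refer to it. Returns 0 on success, -1 on error.
 *
 * Possible calls in the child, before exec:
 *   redirect_fd(in,  O_RDONLY, STDIN_FILENO);                       for <
 *   redirect_fd(out, O_WRONLY | O_CREAT | O_TRUNC, STDOUT_FILENO);  for >
 *   redirect_fd(out, O_WRONLY | O_CREAT | O_APPEND, STDOUT_FILENO); for >>
 */
int redirect_fd(const char *path, int flags, int target_fd)
{
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;
    if (dup2(fd, target_fd) < 0) {
        close(fd);
        return -1;
    }
    if (fd != target_fd)
        close(fd);          /* the duplicate is all the child needs */
    return 0;
}
```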

However, the easiest way to do this is to use freopen. This function is one of the three functions you can use to open a standard I/O stream.

#include <stdio.h>

FILE *fopen(const char *pathname, const char * type);

FILE *freopen(const char * pathname, const char * type, FILE *fp);

FILE *fdopen(int filedes, const char * type);

// All three return: file pointer if OK, NULL on error

The differences in these three functions are as follows:

  1. fopen opens a specified file.
  2. freopen opens a specified file on a specified stream, closing the stream first if it is already open. This function is typically used to open a specified file as one of the predefined streams, stdin, stdout, or stderr.
  3. fdopen takes an existing file descriptor (obtained from open, etc) and associates a standard I/O stream with that descriptor - useful for associating pipes etc with an I/O stream.

The type string is the standard fopen mode argument:

type Description
r or rb open for reading
w or wb truncate to 0 length or create for writing
a or ab append; open for writing at the end of file, or create for writing
r+ or r+b or rb+ open for reading and writing
w+ or w+b or wb+ truncate to 0 length or create for reading and writing
a+ or a+b or ab+ open or create for reading and writing at end of file

where b as part of type allows the standard I/O system to differentiate between a text file and a binary file.

Thus:

freopen("inputfile", "r", stdin);

would open the file inputfile and use it to replace the standard input stream, stdin.
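
The same call covers output redirection by choosing the type string from the redirection token. The following is a small sketch under that assumption; redirect_stdin and redirect_stdout are hypothetical names, and both are meant to be called in the child before exec.

```c
#include <stdio.h>

/* Hypothetical helper: replace stdin with the contents of path.
 * Returns 0 on success, -1 on error. */
int redirect_stdin(const char *path)
{
    return freopen(path, "r", stdin) == NULL ? -1 : 0;
}

/* Hypothetical helper: replace stdout with path. Type "w" creates or
 * truncates the file (the > token); type "a" creates or appends to it
 * (the >> token). Returns 0 on success, -1 on error. */
int redirect_stdout(const char *path, int append)
{
    return freopen(path, append ? "a" : "w", stdout) == NULL ? -1 : 0;
}
```

Note that when freopen fails, the original stream has already been closed, so the child should report the error on stderr and exit rather than continue.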

You should also use the access function to check whether the files exist:

#include <unistd.h>

int access(const char *pathname, int mode);

// Returns: 0 if OK, -1 on error

The mode is the bitwise OR of any of the constants below:

mode Description
R_OK test for read permission
W_OK test for write permission
X_OK test for execute permission
F_OK test for existence of file
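
As a small illustration, the checks could be wrapped in helpers like these (file_exists and is_readable are hypothetical names):

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical helper: 1 if path exists, 0 otherwise. */
int file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Hypothetical helper: 1 if path exists and can be read by this
 * process, 0 otherwise. */
int is_readable(const char *path)
{
    return access(path, R_OK) == 0;
}
```

A shell might call is_readable on inputfile before forking so that a missing input file is reported without starting a child.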

Looking at the project specification, stdout redirection should also be possible for the internal commands: dir, environ, help.


Extra Info

env processes

Each process has an environment associated with it. The environment strings are usually of the form name=value (standard null-terminated strings) and are referenced by an array of pointers to these strings. This array is made available to a process through the C Run Time library as:

extern char **environ; // NULL terminated array of char *

While an application can access the environment directly through this array, some functions are available to access and manipulate the environment:

#include <stdlib.h>

char *getenv(const char *name);

// Returns pointer to value associated with name, NULL if not found.

getenv returns a pointer to the value of a name=value string. You should use getenv to fetch a specific value from the environment rather than accessing environ directly. getenv is supported by both the ANSI C and POSIX standards. In addition to fetching the value of an environment variable, sometimes it is necessary to set a variable. You may want to change the value of an existing variable, or add a new variable to the environment.

#include <stdlib.h>

int putenv(char *str);

int setenv(const char *name, const char *value, int rewrite);

int unsetenv(const char *name);

// All three return: 0 if OK, non-zero on error

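On systems without unsetenv, you may need to fall back to putenv with the environment value left blank, i.e. putenv("myname="). Note that this sets the variable to an empty string rather than removing it from the environment.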
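
A short sketch of these calls in use follows. The helper names print_environment and env_or_default are hypothetical; print_environment shows the kind of loop an environ built-in might use.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

extern char **environ;   /* NULL terminated array of char * */

/* Hypothetical helper: print every name=value string to out. */
void print_environment(FILE *out)
{
    for (char **e = environ; *e != NULL; e++)
        fprintf(out, "%s\n", *e);
}

/* Hypothetical helper: value of name, or fallback if name is not set.
 * A variable set to the empty string counts as set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, after setenv("MYNAME", "", 1) the call env_or_default("MYNAME", "none") returns an empty string, while after unsetenv("MYNAME") it returns "none" (MYNAME is just an example variable).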

getcwd

#include <unistd.h>

char *getcwd(char *buf, size_t size);

//Returns: buf if OK, NULL on error

Every process has a current working directory which can be set using chdir. While chdir can use a relative pathname argument, there is a need for a function to derive the absolute pathname of the directory. getcwd performs this function.

The function is passed the address of a buffer, buf, and its size. The buffer must be large enough to accommodate the full absolute pathname plus a terminating null byte, or an error is returned.

Some implementations of getcwd allow the first argument buf to be NULL, in which case the function calls malloc to allocate size number of bytes dynamically. This is not part of the POSIX standard and should be avoided.

Enter the man getcwd command in your bash shell to get a more detailed description of this function.

fork

Process creation in UNIX is achieved by means of the kernel system call, fork(). When a process issues a fork request, the operating system performs the following functions (in kernel mode):

  1. It allocates a slot in the process table for the new process
  2. It assigns a unique process ID to the child process
  3. It makes a copy of the parent’s process control block
  4. It makes a copy of the process image of the parent (with the exception of any shared memory)
  5. It increments counters for any files owned by the parent to reflect that an additional process now also owns those files
  6. It assigns the child process to a Ready to Run state
  7. It returns the ID number of the child to the parent process and a 0 value to the child process - this function is called once and returns twice!

#include <sys/types.h>
#include <unistd.h>

pid_t fork(void);

//Returns: 0 in child, process ID of child in parent, -1 on error

The fork system call creates a new process that is essentially a clone of the existing one. The child is a complete copy of the parent. For example, the child gets a copy of the parent’s data space, heap and stack. Note that this is a copy. The parent and child do not share these portions of memory. The child also inherits all the open file handles (and streams) of the parent with the same current file offsets.

The parent and child processes are essentially identical except that the new process has a new process ID and the return value from the fork call is different in the two processes: the child receives 0, and the parent receives the process ID of the child.
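
The point that the child works on a copy of the parent's memory can be demonstrated with a short sketch. The function name parent_value_after_child_write is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demonstration: the child changes its own copy of counter,
 * and the parent then reports the value it still sees.
 * Returns the parent's value of counter, or -1 on error. */
int parent_value_after_child_write(void)
{
    int counter = 1;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                    /* child: fork returned 0 */
        counter = 42;                  /* changes the child's copy only */
        _exit(counter == 42 ? 0 : 1);
    }
    /* parent: fork returned the child's process ID */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return counter;                    /* unchanged by the child's write */
}
```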

exec

To actually load and execute a different program, the fork request is used first to generate the new process. The kernel system call exec(char* programfilename) is then used to load a new program image over the forked process.

The exec system call reinitializes a process from a designated program; the program changes while the process remains! The exec call does not change the process ID and process control block (apart from memory allocation and current execution point); the process inherits all the file handles etc. that were currently open before the call.

Without fork, exec is of limited use; without exec, fork is of limited use. (A favorite exam question is to ask in what circumstances you would/could use these functions on their own. Think about it and be prepared to discuss these scenarios.)

exec variants:

System Call | Argument Format | Environment Passing | PATH search
execl | list | auto | no
execv | array | auto | no
execle | list | manual | no
execve | array | manual | no
execlp | list | auto | yes
execvp | array | auto | yes

#include <unistd.h>
 
int execl(const char *path, const char *arg0, ... /* , (char *) NULL */);
   // path: path of program file
   // arg0: first arg (file name)
   // arg1 ... argn: command line parameters
   // the argument list must end with a NULL pointer

int execv(const char *path, char *const argv[]);
   // argv: array of ptrs to args, last ptr = NULL

int execle(const char *path, const char *arg0, ... /* , (char *) NULL, char *const envp[] */);
   // same argument list as execl, followed by
   // envp: array of ptrs to environment strings, last ptr = NULL

int execve(const char *path, char *const argv[], char *const envp[]);
   // argv: array of ptrs to args, last ptr = NULL
   // envp: array of ptrs to environment strings, last ptr = NULL

int execlp(const char *file, const char *arg0, ... /* , (char *) NULL */);

int execvp(const char *file, char *const argv[]);

// All six return -1 on error, no return on success

In the first four exec functions, the executable file has to be referenced either relatively or absolutely by the pathname. The last two search the directories in the PATH environment variable for the filename specified.

Example of use of fork and exec

switch (fork()){
   case -1:             // fork error
      syserr("fork");
   case 0:              // continue execution in child process
      execlp("pgm","pgm",NULL);
      syserr("execlp"); // reached only if execlp fails
}                       // continue execution in parent process 

wait

When a process terminates, either normally or abnormally, the parent is notified by the kernel sending the parent the SIGCHLD signal. The parent can choose to ignore the signal (the default) or it can provide a function that is called when the signal occurs. The system provides the functions wait and waitpid, which a parent can call to collect the termination status of a child.

#include <sys/types.h>
#include <sys/wait.h>

pid_t wait(int *statloc);

pid_t waitpid(pid_t pid, int *statloc, int options);

//Both return: process ID if OK, -1 on error (waitpid may also return 0 when WNOHANG is set and no child has changed state)

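The two functions differ as follows:

wait blocks the caller until any child terminates. If a child has already terminated and is a zombie, wait returns immediately with that child’s status. If the caller blocks and has multiple children, wait returns when one terminates, and the function returns the process ID of that particular child.

waitpid lets the caller choose which child to wait for (the pid parameter) and whether to block (the options parameter).

Both functions store an integer status in *statloc. The format of this status is implementation dependent, so decode it with the macros defined in <sys/wait.h>, such as WIFEXITED and WEXITSTATUS.

The pid parameter of waitpid specifies the set of child processes for which to wait: a positive pid waits for that specific child, and -1 waits for any child (as wait does).

The options for waitpid are either 0 or a bitwise OR of option constants such as WNOHANG (return immediately instead of blocking if no child has changed state) and WUNTRACED (also return if a child has stopped).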
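
Since the status format is implementation dependent, it is decoded with the macros from <sys/wait.h>. The sketch below shows one way to do so after a waitpid call with WUNTRACED; wait_and_describe is a hypothetical name and the message wording is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork and _exit, used by the caller */

/* Hypothetical helper: wait for the child pid (also returning if it has
 * stopped) and write a short description of what happened into buf.
 * Returns 0 on success, -1 if waitpid fails. */
int wait_and_describe(pid_t pid, char *buf, size_t len)
{
    int status;

    if (waitpid(pid, &status, WUNTRACED) < 0)
        return -1;

    if (WIFEXITED(status))
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFSTOPPED(status))
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    else
        snprintf(buf, len, "unknown status");
    return 0;
}
```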

dup and dup2

An existing file descriptor (filedes) is duplicated by either of the following functions:

#include <unistd.h>
 
int dup(int filedes);
 
int dup2(int filedes, int filedes2);
// Both return: new file descriptor if OK, -1 on error

The new file descriptor returned by dup is guaranteed to be the lowest numbered available file descriptor. Thus by closing one of the standard file descriptors (STDIN_FILENO, STDOUT_FILENO, or STDERR_FILENO, normally 0, 1 and 2 respectively) immediately before calling dup, we can guarantee (in a single threaded environment!) that the duplicate of filedes will be allotted that freed number.

With dup2 we specify the value of the new descriptor with the filedes2 argument and it is an atomic call. If filedes2 is already open, it is first closed. If filedes equals filedes2, then dup2 returns filedes2 without closing it.
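
Internal commands run inside the shell process itself, so redirecting their output means changing the shell's own stdout and then putting it back. One possible sketch, using dup to save the old descriptor and dup2 to swap and restore it, is shown below. The names stdout_to_file and restore_stdout are hypothetical, as is the 0644 permission value.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: point stdout at path (appending if append is
 * non-zero, truncating otherwise).
 * Returns a duplicate of the old stdout descriptor, or -1 on error. */
int stdout_to_file(const char *path, int append)
{
    fflush(stdout);                        /* empty the stdio buffer first */
    int saved = dup(STDOUT_FILENO);
    if (saved < 0)
        return -1;

    int fd = open(path,
                  O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC), 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);
    return saved;
}

/* Hypothetical helper: undo stdout_to_file using the descriptor it
 * returned. Returns 0 on success, -1 on error. */
int restore_stdout(int saved)
{
    fflush(stdout);                        /* push pending output to the file */
    int rc = dup2(saved, STDOUT_FILENO) < 0 ? -1 : 0;
    close(saved);
    return rc;
}
```

The calls to fflush matter because stdio buffers output in user space; without them, text printed before the swap could end up in the wrong place.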

Addenda

2018-10-04

2018-10-06

2018-10-08

2018-10-10

2018-10-14

2018-10-15

2018-10-18

2018-10-19

2018-10-22

2018-10-23

Hints

2018-10-24

Added a couple more sample outputs: newtestcases.tar

