The N Commandments for using the Internet Archive
Our group has
researcher access to the Internet
Archive, which permits us to work on the Archive's cluster.
Over the past several years, we have evolved some rules of good
citizenship that help to avoid problematic situations,
e.g. accidentally taking over a large chunk of the internal bandwidth
on the Archive's network. This Web page attempts to institutionalize some of
that knowledge. It's titled "The N Commandments" because, like the
Archive itself, the list is likely to need updating.
(List initially contributed by Michael Subotin.)
- Be sure not to run a big job before you've learned how to monitor and
kill it
- Avoid running processes on homeserver (an automatic check in the
scripts might be useful)
- Avoid jobs that have multiple hosts read or write large amounts of data
to /home (use /tmp/your_username directories instead)
- Avoid heavy I/O activity on /home in general
- Be careful not to run one p2 job inside another (an automatic check
in the scripts might be useful)
- When you use ctrl-C to kill a p2 job, it leaves pipe files
(extension .p2tmp) in the /tmp (or /tmp/your_username?) directory, and
if you kill a sort process, it may leave behind a temporary file there.
Please check for these files and clean them up once in a while.
- Be careful about exceeding disk quota on any of the disks at run time
- Be sure to nice your processes
- Be sure not to leave any processes running inadvertently (check the
hosts you've been working with before logging off)
- Be sure to clean up the files you've placed
in /tmp/your_username directories on all hosts (particularly large files)
- For the sake of your own sanity, if a job seems to be running forever,
check whether an I/O or ssh breakdown on some host is holding it back
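
Several of the rules above suggest automatic checks in scripts. The following is a minimal sketch of what such checks might look like, not an established part of our tooling. It assumes the login host's short name is literally "homeserver" and that leftover pipe files end in .p2tmp as described above; the function names (on_forbidden_host, stale_files, prepare_job), the niceness of 10, and the one-day age cutoff are all hypothetical choices made for illustration.

```python
import os
import socket
import sys
import time
from pathlib import Path

# Assumption: the login host's short name is literally "homeserver".
FORBIDDEN_HOSTS = {"homeserver"}


def on_forbidden_host(hostname=None):
    """Return True if the short hostname is one we should not compute on."""
    name = hostname if hostname is not None else socket.gethostname()
    return name.split(".")[0] in FORBIDDEN_HOSTS


def stale_files(directory, suffix=".p2tmp", min_age_seconds=24 * 3600, now=None):
    """List files under `directory` that end in `suffix` and are at least
    `min_age_seconds` old. Nothing is deleted; the caller decides what to do."""
    now = time.time() if now is None else now
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*" + suffix)
        if p.is_file() and now - p.stat().st_mtime >= min_age_seconds
    )


def prepare_job(niceness=10):
    """Refuse to run on the home server, then lower this process's priority."""
    if on_forbidden_host():
        sys.exit("Refusing to run on homeserver; pick a compute host instead.")
    os.nice(niceness)  # Unix only: increases the nice value of this process
```

A script could call prepare_job() before starting any heavy work, and something like stale_files("/tmp/your_username") could be run on each host before logging off to list leftovers for review. The sketch deliberately only lists files rather than deleting them, since a file that looks stale may still belong to a running job.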
Questions? Contact Philip Resnik at lastname _AT_ umd.edu.