The Wayback Machine - https://web.archive.org/web/20200619105237/https://github.com/oduwsdl/warrick
Recover lost websites from the Web Infrastructure

Files (name, latest commit message, commit date):
MAKEFILE Initial Commits for git repo Jun 25, 2012
StoredResources Initial Commits for git repo Jun 25, 2012
TEST_FILES Initial Commits for git repo Jun 25, 2012
TestResources Initial Commits for git repo Jun 25, 2012
WebRepos Initial Commits for git repo Jun 25, 2012
Yahoo Initial Commits for git repo Jun 25, 2012
CachedUrls.pm Initial Commits for git repo Jun 25, 2012
Dockerfile docker: Pin ubuntu version Mar 22, 2020
INSTALL Perl package dependency installation separated from the main INSTALL … Feb 18, 2015
Logger.pm Initial Commits for git repo Jun 25, 2012
MAKEFILE_LOGFILE.log Initial Commits for git repo Jun 25, 2012
MAKEFILE_recoveryLog.out Initial Commits for git repo Jun 25, 2012
MASTERTESTFILE.log Initial Commits for git repo Jun 25, 2012
MementoThread.pm Initial Commits for git repo Jun 25, 2012
README.md Verified functionality, rm'd msg in README May 7, 2018
TEST Initial Commits for git repo Jun 25, 2012
UrlUtil.pm Initial Commits for git repo Jun 25, 2012
ai_timegates.o Initial Commits for git repo Jun 25, 2012
aweu_timegates.o Initial Commits for git repo Jun 25, 2012
b_timegates.o Initial Commits for git repo Jun 25, 2012
bl_timegates.o Initial Commits for git repo Jun 25, 2012
cache.o Initial Commits for git repo Jun 25, 2012
can_timegates.o Initial Commits for git repo Jun 25, 2012
cdlib_timegates.o Initial Commits for git repo Jun 25, 2012
curl.exe Initial Commits for git repo Jun 25, 2012
di_timegates.o Initial Commits for git repo Jun 25, 2012
eu_timegates.o Initial Commits for git repo Jun 25, 2012
g_timegates.o Initial Commits for git repo Jun 25, 2012
getWCpage.py Initial Commits for git repo Jun 25, 2012
ia_timegates.o Initial Commits for git repo Jun 25, 2012
loc_timegates.o Initial Commits for git repo Jun 25, 2012
mcurl.pl Initial Commits for git repo Jun 25, 2012
mementoParser.pm Initial Commits for git repo Jun 25, 2012
nara_timegates.o Initial Commits for git repo Jun 25, 2012
perl_package_dep_installer.sh Added two more perl dependencies Feb 18, 2015
resetCache.sh Initial Commits for git repo Jun 25, 2012
timegates.o Initial Commits for git repo Jun 25, 2012
uk_timegates.o Initial Commits for git repo Jun 25, 2012
warrick.pl Fix - add missing nr argument to bind to existing functionality May 7, 2018
wc_timegates.o Initial Commits for git repo Jun 25, 2012
wiki_timegates.o Initial Commits for git repo Jun 25, 2012
wikia_timegates.o Initial Commits for git repo Jun 25, 2012
y_timegates.o Initial Commits for git repo Jun 25, 2012

README.md

Warrick

The website reconstructor

Dependencies

  • Perl5 or later
  • cURL
  • Python
  • Perl libraries: HTML::TagParser, LinkExtractor, Cookies, Status, and Date, and the URI library

Installation

Install Warrick's dependencies on the command line by running:

./INSTALL

Test the installation by running:

./TEST

This will recover a web page and compare it to a master copy.

For further options and information on using Warrick, run:

perl warrick.pl --help

This version of Warrick has been redesigned to reconstruct lost websites from the Web Infrastructure using Memento.
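Memento (RFC 7089) lets a client ask an archive's TimeGate for the archived copy closest to a chosen time by sending an `Accept-Datetime` request header in HTTP-date (GMT) format. The sketch below only illustrates that header mechanic; it is not Warrick's own code (Warrick is written in Perl). The function name `timegate_request`, the TimeGate URL in the usage line, and the choice of a HEAD request are all assumptions for illustration.

```python
import email.utils
import urllib.request
from datetime import datetime, timezone


def timegate_request(timegate_url: str, target: datetime) -> urllib.request.Request:
    """Build (but do not send) a Memento TimeGate request.

    Hypothetical helper: per RFC 7089 the desired archival time goes in an
    Accept-Datetime header, formatted as an HTTP-date in GMT.
    """
    if target.tzinfo is None:
        raise ValueError("target must be timezone-aware")
    # format_datetime(usegmt=True) requires a UTC datetime, so convert first.
    http_date = email.utils.format_datetime(target.astimezone(timezone.utc), usegmt=True)
    # Assumption: a HEAD request is enough to read the TimeGate's redirect.
    return urllib.request.Request(
        timegate_url,
        headers={"Accept-Datetime": http_date},
        method="HEAD",
    )
```

For example, `timegate_request("https://timegate.example.org/http://example.com/", datetime(2012, 6, 1, 12, 0, tzinfo=timezone.utc))` yields a request whose `Accept-Datetime` value is `Fri, 01 Jun 2012 12:00:00 GMT`. Note that `urllib` stores header names via `str.capitalize()`, so the header is read back with `get_header("Accept-datetime")`; HTTP header names are case-insensitive on the wire.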

Recovery Process Details

This program creates several files that provide information or log data about the recovery.

For a given recovery RECO_NAME, Warrick creates three files: RECO_NAME_recoveryLog.out, PID_SERVERNAME.save, and logfile.o. These are created for every recovery job. RECO_NAME_recoveryLog.out is created in the Warrick home directory. It reports every URI recovered, the location of the archived copy that was recovered (the memento), and the location on the local machine where the file was saved, in the following format:

  • ORIGINAL URI => MEMENTO URI => LOCAL FILE

Lines prefixed with "FAILED" indicate that ORIGINAL URI could not be recovered.
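Because the recovery log is line-oriented, it is easy to post-process. The following is a minimal sketch, assuming each line separates its fields with `=>` exactly as shown above and that a failed line starts with the literal token `FAILED` followed by optional separator characters (the exact layout after `FAILED` is an assumption). `RecoveryRecord`, `parse_recovery_line`, and `parse_recovery_log` are hypothetical names, not part of Warrick.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveryRecord:
    original_uri: str
    memento_uri: Optional[str]  # None when the log line has no memento field
    local_file: Optional[str]   # None when the log line has no local file field
    failed: bool


def parse_recovery_line(line: str) -> Optional[RecoveryRecord]:
    """Parse one 'ORIGINAL URI => MEMENTO URI => LOCAL FILE' line (hypothetical helper)."""
    line = line.strip()
    if not line:
        return None
    failed = line.startswith("FAILED")
    if failed:
        # Assumption: drop the FAILED token plus any spaces, colons, or tabs after it.
        line = line[len("FAILED"):].lstrip(" :\t")
    fields = [field.strip() for field in line.split("=>")]
    memento = fields[1] if len(fields) > 1 and fields[1] else None
    local = fields[2] if len(fields) > 2 and fields[2] else None
    return RecoveryRecord(fields[0], memento, local, failed)


def parse_recovery_log(path: str) -> List[RecoveryRecord]:
    """Read a RECO_NAME_recoveryLog.out file, skipping blank lines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [rec for rec in map(parse_recovery_line, fh) if rec is not None]
```

Splitting on `=>` keeps the parser tolerant of extra spacing, at the cost of misreading any URI that itself contains `=>`, which should be rare.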

PID_SERVERNAME.save is the saved status file. It is stored in the recovery directory and contains the information needed to resume a suspended recovery job, along with statistics for the recovery, such as the number of resources that could not be recovered and the number recovered from each archive. logfile.o is a temporary file that can be treated as disposable; it contains the headers of the last recovered resource.
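The internal layout of PID_SERVERNAME.save is not described here, so the sketch below does not read it. Instead it recomputes comparable statistics (failure count and recoveries per archive) from the lines of RECO_NAME_recoveryLog.out, assuming the `=>` format described earlier and treating the memento URI's hostname as a stand-in for "the archive". The function name `recovery_stats` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlsplit


def recovery_stats(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Return (failed_count, recovered-per-archive-host counts) from log lines."""
    failed = 0
    per_archive: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("FAILED"):
            failed += 1
            continue
        fields = [field.strip() for field in line.split("=>")]
        if len(fields) >= 2 and fields[1]:
            # Assumption: the memento URI's host identifies the archive it came from.
            per_archive[urlsplit(fields[1]).hostname or "unknown"] += 1
    return failed, per_archive
```

Passing an open file object works directly, e.g. `recovery_stats(open("RECO_NAME_recoveryLog.out"))`, since iterating a file yields its lines.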

History

  • Modified by Justin F. Brunelle (@jbrunelle) at Old Dominion University - 2011
  • Created by Frank McCown (@fmccown) at Old Dominion University - 2006

Contact

We want to know if you have used Warrick to reconstruct your lost website. If you have successfully recovered your site or would like to assist in further development and improvements to Warrick, please open a GitHub issue and/or contact jbrunelle@cs.odu.edu.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

The GNU General Public License can be seen here: http://www.gnu.org/copyleft/gpl.html
