wget and then clean up links on a php website

I am helping a friend set up a WordPress website. She found a website whose structure and look she likes. I wanted to capture that site, links and individual pages included, as a reference for what she likes and for how to build her site.

I won’t copy it, but the layout, colors, and content make a good guide to what we want to end up with.

The first step was to download the site…

wget -rk -np http://www.webaddress.com/
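
For reference: -r recurses through the site, -k converts the links in the downloaded pages so they point at the local copies, and -np keeps wget from climbing into parent directories. If your wget is recent enough, -p (fetch images and stylesheets too) and -E (append .html to pages not already named that way) are worth a look; with -E and -k together, wget may handle the renaming below for you…

wget -rk -np -p -E http://www.webaddress.com/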

Once I had downloaded the site, I found that most of the pages wget grabbed were .php files: their names end in .php rather than .html, so a browser opening the local copy from disk won’t necessarily render them as pages. Making the mirror browsable takes two steps.

First, rename the *.php files to .php.html…

for i in *.php
do
  # quote the variables so filenames with spaces survive
  mv "$i" "$i.html"
done
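
One caveat: wget -r saves pages into a directory tree, and the loop above only renames files in the directory it runs from. A sketch of the same rename across the whole tree, assuming filenames without embedded newlines…

find . -type f -name '*.php' | while read -r f
do
  # append .html to every downloaded php page in the tree
  mv "$f" "$f.html"
done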

Then correct the links to point to the new pages…

grep -rl '\.php' *.html | xargs perl -pi~ -e 's/\.php/.php.html/g'
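
Here -p makes perl process the files line by line, -i~ edits them in place while leaving backups with a ~ suffix, and the /g flag rewrites every .php link on a line rather than just the first. The escaped dot matters too: unescaped, .php matches any character followed by php. For a tree with HTML files in subdirectories, a find-based sketch (the lookahead keeps an accidental second run from producing .php.html.html)…

find . -type f -name '*.html' -print0 | xargs -0 perl -pi~ -e 's/\.php(?!\.html)/.php.html/g'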

This gave us a working reference copy of the website locally.
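
As a rough sanity check, this should print nothing once every link has been rewritten…

grep -rl '\.php"' --include='*.html' .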


—doug