Wednesday, November 29, 2006

Backup Your Blog - Beta

I use, and recommend, HTTrack to mirror your blog. HTTrack is very versatile: it will copy any number of blogs, or other websites, in one job, and lets you browse the contents of each mirrored website locally, as if you were accessing it online.

Unfortunately, HTTrack mirroring may be problematic with Blogger Beta. When I started using HTTrack, I could back up a couple of my blogs in 5 to 10 minutes. I added more blogs to the backup job, and backup time went up to half an hour or so, using 100 MB or so of disk space. That was still a negligible amount of time and disk space, considering the results.

As I started developing this blog, and especially when I started using labels in my posts, backup time and disk space increased dramatically. The last mirror that I ran, 2 weeks ago, took 14 hours and used 3.7 GB of disk space.

Today, in a comment for Backup Your Blog, I see

Well, i tried to use HTTrack but unfortunately it goes looping and tries to download over 200 MB of content, which my blog never has, as i only have 80+ posts.

I went searching for a solution, and in the HTTrack forum, I found one query, Blogger Beta Problems

i try to mirror my Blogger Beta blog but somehow there are some problems related to this task. If i just mirror it without any additional settings then it starts downloading about 200MB worth of sites, mirroring the index file many times but with different number codes (e.g. index25987.html), also there is a folder 'feeds' which keeps growing during the process, filling up quickly.

followed by

i too am having problems, only since i upgraded to beta blog. now it takes up to 90 minutes, & still is not complete, the no of files updated gets to a certain % then startes to decrease instead of increase & i am getting virtual memory errors.

I don't think that this is, necessarily, a design problem in HTTrack. I did spend some time looking through the Options in the HTTrack job, and I am fairly certain that there is a setting in there that is relevant to the problem. Finding that setting, though, would be easier with help from an HTTrack expert.

I hope that one becomes available soon.

1 comment:

Paul Rohde said...

I ran into the same problem with HTTrack when I switched to Blogger Beta.

When I noticed that it seemed to be looping on "search" and "widget" items, I just added these scan rules (under "set options"):


That made the backup manageable. The only side effect I see is that, in the backup version, the individual post titles do not appear in the archive menus. The posts are all there, though, and can still be displayed when the month is clicked in the archive menu.

Paul R
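
The scan rules themselves are not shown in the comment above, so they are not reproduced here. As a rough illustration of the idea only, the sketch below shows how wildcard exclusion patterns can keep "search" and "widget" URLs out of a list of pages to fetch. The patterns, the is_excluded and filter_urls helpers, and the example.blogspot.com URLs are all assumptions made for illustration; they are not Paul's rules, and they are not written in HTTrack's own scan rule syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns, chosen only to illustrate the idea.
# These are NOT the scan rules from the comment, which are not shown there.
EXCLUDE_PATTERNS = ["*/search*", "*widget*"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL matches at least one exclusion pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def filter_urls(urls, patterns=EXCLUDE_PATTERNS):
    """Keep only the URLs that no exclusion pattern matches."""
    return [url for url in urls if not is_excluded(url, patterns)]


# Made-up URLs in the general shape of a Blogger blog.
urls = [
    "http://example.blogspot.com/2006/11/some-post.html",
    "http://example.blogspot.com/search/label/backup",
    "http://example.blogspot.com/search?updated-max=2006-11-01",
    "http://example.blogspot.com/2006_11_01_archive.html",
]

for url in filter_urls(urls):
    print(url)
```

With these made-up inputs, only the post page and the monthly archive page are kept. The label and "search" pages, which are the kind of URL that seemed to loop, are dropped before they would be fetched.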