Desires of a Backup System by Bob Hyatt

You invited me to write more on this topic. I believe I will do so, but in parts, rather than in a 20-page university paper. I will start with an overview of the ‘desired’ home server backup, vs what seems to be available.

For most homes and small businesses, the primary desires / goals are (in my view):

A. Secure backups. In this case I add, secure from failure for at least 5 years!

B. Easy to do backups. I do NOT mean automatically scheduled backups. They automatically back up things you are NOT interested in, or that will be useless to you if there is a problem. For example, do you really want to back up all the Operating System files, settings, etc.? Those are things that change a lot, and will occupy a lot of space on your backup drives. I do NOT.

C. Allow you to determine what is important to You!
Not every photo or video is important to you. Not every PDF is valuable to you. Not every MP3 or FLAC is valuable to you. So you need a way to differentiate between valued and chaff. And for the ‘tool’ to remember. OR for you to keep the data you want to keep separately from the chaff. Then have the ‘tool’ look only at the desired data.

D. Allow you to add storage, as needed, easily. Adding a new hard drive should not be ‘a hope and a prayer’ kind of task.

E. If I should desire to change from one backup product or tool to another, the existing one should NOT act as a barrier or gatekeeper (as in keeping your data prisoner).

F. The running of the backup device or server should NOT wear out the disks, ever! The issue with the video I described in the earlier ‘letter’ here was that the one always busy was attempting to make the ‘distribution’ of data on the drives “perfect”.

If you are a corporation or a government agency, such wearing out of the disks would be considered ‘cost of doing business’. And they have the budget for this task. A home user or a small office does NOT have a budget for this willful destruction. Further, “Perfect distribution” is NOT what a home user or small office is looking for. They are looking for a safe place to put their data, and the exact location is not important.

Home or small business users will seldom read the backups. Perhaps, not ‘seldom’. But the reality is you will do plenty of writes to update the backup. In some cases you might do more reads (if you share the data with media players, other computers on your network, etc).

One more difference between the products that are most sold in this arena? Large organizations are also looking for quick access to the data. That is, the home and small business use this kind of thing for ‘backup or archival’ purposes, whereas the large organizations use the data put onto such a device as production data, and they expect lots of access and lots of updates.

SAN and / or NAS systems in the large organizations are for processing production data, not for backups. So you begin the issues with a fundamental difference in the use of these devices. This is partly why the components tend to be expensive.

The good news for the home or small business is the time tested components used by the large organizations, while expensive, are likely to last longer than your need.

If I get some feedback on this introduction, I will introduce ‘task vs tool’ in the next ‘letter’ I write.

Archival File Storage

More people these days have important computer data that they want to keep.  More (most?) household business is done with computer files now.  Hobbies and interests are likely to involve computer files.  And, the all important family photos and videos are now digital.

Around the year 2000 I stopped using 35mm film, as digital cameras were getting good enough to be interesting.  I also realized that the capacity of hard disk space was growing faster than the resolution of the cameras, so I should be able to keep all my photos immediately available on the computer’s drive, as opposed to storing them in drawers full of discs and having to put in the right disc to get the desired files.

We want to keep these files safe and sound.

So what exactly does “safe” mean?

It is easy to understand the threat of a drive dying, media being damaged or unreadable after being stored for a long time, and accidentally deleting or saving-over the wrong file.

Having multiple copies protects against these things.  Automated replication/backing up won’t necessarily protect you against destroying files by mistake, but there are specific ways of guarding against that which I’ll discuss in another post.  It’s replication and redundancy that I want to discuss here.

Let me relate an anecdote.  One time I wanted to do something with an old project I had worked on previously, but when I opened the file I found it was filled with gibberish!  I looked at my backup, and it was the same.  Whatever had happened to the file had occurred some time ago, without being noticed at the time.  I continued backing up the now-corrupted file, and a good one was now beyond my retention period for the backups.

Fortunately, I found a copy on another disc where I had made an ad-hoc backup copy of the work and put it in a drawer to be forgotten.  Many years later, large data centers report that “silent corruption” affects more stored data than previously realized.  Just as with my backups, a RAID-5 verification won’t detect anything amiss.

So, I worry about the integrity of all my saved photos, old projects, and whatever else I’ve saved for a long time.

I’ve thought about ways to perform a cryptographic hash of each file’s contents, to be used as a checksum.  Repeating this will verify that the files are still readable and unchanged.  For example, when putting backup files on a disc I’ve used a command-line script to generate a text file of filenames and hashes, and include that on the disc too.

With files being stored on always-available hard drives, it is possible to automatically and periodically check these.  For example, do it just before performing a backup so you don’t back up a corrupted file, and you are alerted to restore it instead!  This is complicated by the fact that some files are changed on purpose, and how can an automated tool know which are not expected to be changed and which are really in current use?  Also, with arbitrary files — large numbers of files in a deep directory tree — it is more difficult to store the hash of each file.

So, I’ve never gotten around to making an automated system that does this.

My plans and dabbling had been to use XML to store information for each file, including the date/time stamp, hash, and where it was last backed up to and when, and any necessary overrides to the backup policy.  That would then be used to perform incremental backups, and re-checking would be part of the back-up process.

This problem can be simplified in light of current habits, and to just address one single issue.  It won’t track backup generations and such, but will only give the hash of each file.  The directory will be purely for archival files, so any change at all other than adding files will be something to yell about.  And finally, don’t worry about bloat in the hash file due to repeating the directory name over and over — it will be insignificant compared to the files anyway, or can be handled by running a standard compression tool on the file.

Originally, my hash generation program for media was done using the 4NT command shell.  However, the tool has evolved in ways that I don’t care for, so I stopped buying upgrades.  Now it (the plain command-line version) is known as TCC/LE, and this free version blocks the MD5 hashing function.  So, I’ve lost features I’ve previously paid for because I don’t want to buy features I don’t want.  I suppose I could dig out and preserve an old copy, but then that wouldn’t be something useful to you unless you also bought the tool, and only on Windows.

Writing such a file is/was trivial.  Just write the formatted directory listing to a file, along with a note of exactly the command arguments used.  (Include the hash and the file name as a relative (not fully-qualified) name, and skip the hash file itself).  Now recall that this was to be placed on saved media, which would then not change again.  So, repeating the same command on the top directory of the media would generate exactly the same file.  If a dumb byte-by-byte compare turned up anything, then any common DIFF tool would be used to turn up details.

An example sha256.txt file from 2007. This is placed on the backup media along with the files. Note that the file begins with instructions on how it was created, so it may be re-run to check the files for corruption even if you don’t remember how you did it.

Result of running:
     pdir /(f z @sha256[*]) *.tib |tee sha256.txt


Pluto-1.tib 2186522624 B5D19101E8821CE886E56738A9207E00E6C324DB4EC9A01111E9469A6FA2C233
Pluto-2.tib 2301180416 8DF3BDD0F4A1390A56537BE0D1BD93628BD1EB52D15B49BCFC272C7869C2CC53
Pluto-3.tib 2959220224 0E1FA34E0DCB9D1EE480EE75F1394DE9C95347D247A84EE52C746F175DC579D9
Pluto-41.tib 4660039168 9813846E427B4AFF8996A6AE275E4F9DB7C4897A46362A7B2FCA849A7E948E8F
Pluto-42.tib   11684864 2ADD81B304A19EE753EA8E868562B95A959214897F4846905CD7CD65F23EB817
Pluto-51.tib 4660039168 224B85F067026F0360E4ECEEB03E1EE49CE751BF37180453C6732935032BE0C9
Pluto-52.tib 1935769600 F10764774D18A1469CD48B4A605575D21DE558DE3129201F0DE1E82CCD6D6D1B
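
A listing like the one above does not have to come from 4NT. The following is a minimal Python sketch, not the original script: the function names `sha256_of_file` and `build_manifest` are made up for illustration, and the output only mimics the name, size, hash layout of the 2007 example. It uses relative names and skips the hash file itself, as described earlier.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def build_manifest(top, manifest_name="sha256.txt"):
    """Return sorted 'relative/name size HASH' lines for every file under top."""
    lines = []
    for folder, _subdirs, files in os.walk(top):
        for name in files:
            full = os.path.join(folder, name)
            # Relative name with forward slashes, so the listing is the
            # same no matter where the media is mounted.
            rel = os.path.relpath(full, top).replace(os.sep, "/")
            if rel == manifest_name:
                continue  # skip the hash file itself
            lines.append(f"{rel} {os.path.getsize(full)} {sha256_of_file(full)}")
    return sorted(lines)
```

Writing `"\n".join(build_manifest(top))` to `sha256.txt` in the top directory would give a file that can be regenerated later and compared byte-by-byte. Sorting the lines is a choice made here so that the result does not depend on the order in which the operating system lists the directory.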

As a check of a live replication destination, it needs to be fully automated and handle added files, and allow me to confirm any deletion was done on purpose but then remove the matching hash data.

So, the most difficult part is reading in the hash data.  Exactly how that’s approached would depend on the programming language and/or framework used.  It doesn’t have to be XML, but can be minimally simple.
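
To give a feel for how small the reading-in step can be, here is a hedged Python sketch. It is only one possible approach and not the tool I intend to ship: the names `read_manifest` and `compare` are hypothetical, and it assumes the plain "name size hash" text layout of the 2007 example rather than XML. It reports added, missing, and changed files; deciding whether a deletion was on purpose is left to the person reading the report.

```python
import hashlib
import os


def file_sha256(path):
    """Return the uppercase hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def read_manifest(text):
    """Parse 'name size HASH' lines into {name: (size, hash)}."""
    entries = {}
    for line in text.splitlines():
        # Split from the right, because file names may contain spaces.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # instruction lines at the top, blank lines, etc.
        name, size, digest = parts
        entries[name] = (int(size), digest.upper())
    return entries


def compare(top, recorded, manifest_name="sha256.txt"):
    """Return (added, missing, changed) lists of relative names under top."""
    current = set()
    for folder, _subdirs, files in os.walk(top):
        for name in files:
            rel = os.path.relpath(os.path.join(folder, name), top)
            rel = rel.replace(os.sep, "/")
            if rel != manifest_name:
                current.add(rel)
    known = set(recorded)
    added = sorted(current - known)
    missing = sorted(known - current)
    changed = sorted(
        rel for rel in current & known
        if file_sha256(os.path.join(top, rel)) != recorded[rel][1]
    )
    return added, missing, changed
```

In an archive-only directory, anything in `missing` or `changed` is something to yell about, while entries in `added` just need their hashes appended to the file.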

As I’ve implied, I want to produce such a tool (at long last!) for my own use and as something I can give away and encourage everyone else to use, too.  It should be available on all platforms, not just Windows.  So, what should I write it in?

Using Perl would be very easy, as it has common library code available for performing the hash and for traversing a directory structure, and also makes it dead simple to read and parse the text file containing the results to compare against.  The only real drawback would be the difficulty for non-technical Windows users to install, since Perl is not included on Windows by default.

Writing it in standard C++ using common portable library code means that it could be compiled for any platform to produce an executable that can be fully stand-alone.  Assuming that the SHA-256 code is obtainable easily enough, it would still need code to traverse the directory and to suck back in the results; the very things that are trivial in Perl.

To be continued…

How much would you pay for the universe?

This is inspiring.  Neil deGrasse Tyson could run for president.

This goes with it.  I actually found this first and followed the link to Tyson.

Watch in the Home Theater for best experience (or at least full screen 1080p, with headphones).  I liked the music so much that I bought the album it’s from, as a gift for my wife to listen to in the car on her traffic-jammed commute.

As it so happens, today I received a letter from The Planetary Society.  They are campaigning for action on the budget for NASA.  You can sign a petition at http://action.planetary.org.

Censorship is Wrong

This post is not to preach in support of the free exchange of ideas, as you can find that stated eloquently from many others.  Rather, I want to give some practical advice.

TOR

A few years ago, I was visiting a country that is famous for having a “firewall” that censors access to arbitrary sites on the whim of whoever is running it.  On that particular day, the blocked list included that fount of human knowledge, Wikipedia.  In fact, here is a link to TOR’s description on Wikipedia, as an example of how prevalent it is for me to do so!

I installed a Firefox browser extension that automatically tried TOR if a URL was blocked.  I don’t see that listed on currently supported extensions, but I see several proxy-switcher extensions that are generic.  If you have TOR installed, then any of those would let you change your browser settings to use TOR, or back to normal settings, by clicking an icon on the status bar.

Since the list of available or favored extensions will be always changing, I won’t list the top results that I see today.  Rather, just search for “proxy” in Firefox Extensions and look at anything that says it will Select, Switch, Toggle, etc. the proxy settings.

That would still require that TOR be installed first.  If you look at TOR’s website, you will also find Tor Browser Bundle, in many languages.  This will, with a single installer, automatically install TOR and a browser pre-configured to use it.  It may be stored on a USB stick and hidden away, and used on any machine.

Finding “unpublished” Tor entry points is the best way to stay reliably connected to the whole world, without cost.  New ones are added as they become blocked.

VPN

A VPN is normally used to connect your home PC to your office LAN.  However, the same function allows you to privately connect to another machine for any purpose.  Some companies sell accounts on computers located in friendly places such as Zurich, Amsterdam, Copenhagen, etc.  If you set up your computer to log in to this as your “office”, you can use any networking software normally, and it goes through the friendly remote location.

These are sold for a variety of purposes, including aggregating bandwidth and caching content, for users in areas with poor connections.  I would suppose some are set up specifically to combat censorship, in which case advertising them would just make them easy to block, so you’ll have to find out about them from someone who’s already on it.  In general, the paid products I know about (assuming they are not themselves blocked) offer high quality of service and no loss of bandwidth (though increased latency), and a level of service and support that goes with being a professional product.  Covert or free systems set up by activists might be more difficult to use and have low bandwidth.  For example, if I were to set up such a server on my home PC and give the address and password to relatives somewhere else, then everyone using it would be going through my home PC and sharing my total bandwidth.

You-Tube Performance

My original concern was to access Wikipedia, and if the page took a long time to get through TOR, I would just wait for it to load and then read the page when it finished.  But streaming video is another story:  If it loads slowly, you cannot watch it.

So, here is some advice for You-Tube, using a couple of Firefox extensions.

Download the Video

Even with uncensored Internet access, I sometimes have problems with the network performance, which prevents me from watching a video.  I originally supposed that pausing the player would let it read ahead, and I’d watch after it buffered most of it.  But the Flash player they use does not work that way at all!  If you pause, it stops reading from the network!  It reads and plays one little chunk at a time, and does not buffer ahead.

What I really needed was to download the video file, and then watch it once I had successfully downloaded the whole thing.  This is also useful when the You-Tube player has any other kinds of problems, such as wrong aspect ratio, refusing to go full screen, or sound out of sync.  I download the file and play it on a more capable player.

Finally, for watching videos while traveling, including on a long flight, I needed to store several hours of content to watch without reliable network access at all.  The Android player has a “preload” feature, but it doesn’t have controls to force preloading or tell me when it’s done so, and furthermore it checks the site again before playing the preloaded file, so it must have a connection anyway!

You-Tube does not provide a download feature on their web page.  But, I found numerous Firefox extensions that provide this.  Many of them are poor in various ways, or are built around server-based conversion features I’m not interested in.  I wanted something that would just save the stream to a file, nothing more.  I found exactly that with the extension called “Download YouTube Videos as MP4” by a user called ialc.  This is based on a Greasemonkey script that you could run in a different browser, or adapt and change it to suit your needs.

I’ve used it for all the reasons I’ve explained, and find it simple and useful, and it just saves the file, without involving other servers or installing complicated software.  I strongly endorse it.  In fact, I am wary of any of the other extensions that offer You-Tube downloading.

Customize the You-Tube Page

Even if a video is provided as a file on another site, such as the author’s download page, it can be handy to access the You-Tube page anyway.  In particular, you can read the description, read and respond to comments, and add your ratings!

So, if you access a You-Tube page on a connection that is too poor to watch video, with the intent of using the Download button described earlier, and adding your comments, you might still find that the network connection chokes when it starts to play automatically when the page is loaded.

I tried a few Firefox extensions for You-Tube, that improve or customize the page in various ways.  I recall one feature was to “disable auto-play”, so it doesn’t start playing the video immediately when a page loads.  I ended up not using those extensions, so I don’t recall exactly which one that was!  But browsing through the extensions now, I see Stop YouTube AutoPlay by Nikola Kovacs, which is a single-purpose extension which does exactly that.  It has an option for stopping autoplay on background tabs, so I may load that on my machine for everyday use.  It’s terrible when the browser is restarted and all the tabs reload, and every one of them starts playing at the same time!

For simple changes to the You-Tube page, Greasemonkey scripts might be a better choice.

Maybe you can buffer, after all?

The SmartVideo extension by Ashish for Firefox will allow the player to buffer the video before you start playing.  It also states, “you can opt to defer video initialization video until you click on it” which I think means that it won’t do anything at all (not play, not buffer) until you tell it to.  This works for embedded players on other pages, too!

You-Tube Specific Proxy

In writing the previous section, I came upon this extension:  ProxTube by Malte Götz.  It appears to be ad-ware and was written because GEMA (Society for musical performing and mechanical reproduction rights in Germany) is making it difficult for You-Tube to provide service in Germany.  I don’t know how well it works from other countries.

Unicode on St. Patrick’s Day

I have a key to type ⌘ on my keyboard (though it doesn’t work in the MCE editor!  So I copied it from Notepad++ after typing there) which is handy for writing things like “Type ⌘B to rebuild the project.”  This is technically called the Place of Interest Sign.  But searching for clover in BabelMap gives me U+1F340 which may show up as (🍀) if you have a suitable font installed. There is also a Shamrock, (☘), with three leaves.
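
If you don’t have BabelMap handy, the same lookup can be done from a script. This is just a small sketch using Python’s standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(char):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for char in ("\u2318", "\U0001F340", "\u2618"):
    print(describe(char))
# U+2318 PLACE OF INTEREST SIGN
# U+1F340 FOUR LEAF CLOVER
# U+2618 SHAMROCK
```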