Thursday, June 14, 2007

Network Detour

Taking a quick detour to work on multi-homed storage arrays as I think this might be quite handy, with applications from off-site backup, to decreasing access times for valuable data (like FreeBSD ISOs). I mean to duplicate the specialized functionality of the Andrew File System, without quite so much focus on coding and maintaining one unified file-system -- here I hope to demonstrate that Andrew goes too far, and he does so at his own peril.

In other words, we can fudge it without losing very much functionality. What we will lose is customer-oriented attention to detail in service presentation: no longer will this have the appearance, to random users, of being a unified file-system. I was hoping to keep random users in the loop as much as possible, but at this point I've been struggling with some of these issues for so long that I'm going to throw up my hands and answer, "This is my storage cluster. Ask for help if you need anything."

I'd rather administer a distributed storage cluster than a team of researchers anyway! So for now I will endeavor to fill my terabytes with something of value, and to make it accessible through the regular channels, whatever those may turn out to be. Mirror listings for Ubuntu, and for FreeBSD as well, are in the sixth layer delicious notes for the day. It will probably be a few days before I can work out the kinks in the scripting and data inventory process.

I think I'll add a website with links to the data and a nice writeup about this joint interest of Tuesday Studios, Venture Creations, and the Rochester Institute of Technology. Hopefully I can justify some of the money that I have spent, and maybe if I get lucky I can get my activities tacked onto somebody's budget without too much struggle.

Today I am pretty sure that either of these commands can be used reliably inside of a cron job to create a mirror:

$ rsync -vaz --delete [rsync url] [local directory]
$ lftp -c mirror [remote [local]]

There are mechanisms to reverse the direction of this operation; see man rsync or lftp -c help mirror for more details. As long as we are only mirroring things that are already public data, this is all you need to duplicate the effect of the AFS feature of remotely replicating read-only volumes.
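
For the cron job itself, here is a minimal sketch of a wrapper that keeps a slow mirror run from overlapping with the next scheduled one. This is only one way to do it, not a tested production setup: the function name mirror_once, the lock path, the mirror URL, and the crontab schedule are all made up for illustration.

```shell
#!/bin/sh
# Sketch of a cron-friendly mirror wrapper. Every name, path, and URL
# here is a hypothetical placeholder, not part of any real setup.

# mirror_once LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so two overlapping runs cannot both proceed.
# Caveat: if the job is killed mid-run, the stale lock directory has to
# be removed by hand before the next run will start.
mirror_once() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "mirror: lock $lockdir exists, skipping this run" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}

# When called with a source URL and a local directory, mirror with the
# same rsync options as above (minus -v, to keep cron mail quiet).
if [ "$#" -ge 2 ]; then
    mirror_once /var/tmp/example-mirror.lock \
        rsync -az --delete "$1" "$2"
fi

# Hypothetical crontab entry, once a day at 03:17:
# 17 3 * * * /usr/local/bin/mirror.sh rsync://mirror.example.org/pub/ /srv/mirror/pub/
```

The wrapper passes the mirror command's exit status through, so cron can still report failures, and it returns 75 when it skips a run because the lock is held.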

Unfortunately, from where I stand today, it looks like there will never be a means of simple public read/write access to such a large disk. There are too many copyright issues, and this is risky business -- if we're not careful about access control, we might even create something of value that doesn't technically belong to anyone, and then where will we be? Oh yeah...

This sort of policy can be enforced through other means, like access control lists on the file-system itself, or some other restriction implemented through the protocol or the server. The next step for these gigantic public mirrors is to actually make them publicly accessible and start advertising through the usual channels.

I think that I will configure both anonymous FTP and HTTP access, but I suspect that I will change my mind and use HTTP only when I get lazy. What is the value of FTP? Hmm... a convenient interface. In fact, if I configure one machine with access to all of the pieces-parts of the whole storage cluster, then this might start looking like a unified file space again.

It's probably worth keeping FTP around.

3 comments:

Kingdon said...

I almost forgot about BitTorrent! And Slackware of course. I wanted to mirror Slackware ISOs, and it's possible to do this through BitTorrent.

Now if I can dig up a nice and easy command line torrent client for Ubuntu it will be a miracle.

Kingdon said...

Ha! BitTorrent is actually so convenient that my Slackware mirror has been already done for ages. Checking md5sums now, and I think I'll add a Fedora ISO mirror to the list.

Kingdon said...

Sure enough I learned something while hooking up the Fedora mirror. This is not a complaint. *Retrains self* This is not a complaint.