Showing posts with label data mining. Show all posts

Monday, February 9, 2009

Tryouts: XulRunner Data Grab

So, I've got this hunk of data, a little over 1GB, that I'm pretty sure is going to be essential to my business. What matters isn't the 1GB size, or even the particular data I'm working with, but the fact that I need access to it from whatever machine I'm using today.

We've got two users on the same machine collaborating on that data using Unison File Sync. How's it going? One user wants to use Subversion, the other wants to use Git. Integration is a pain that we don't want to tackle now. So, we're giving each user their own copy of the data, with their own exclusive write permissions on that copy, and they can share that data with anyone. Either they share passwords, or they share a third-party data store on another machine, and that store handles the access control rights between these two folks.

Unison supports most popular file transfer mechanisms. Our hunk is exported via FTP to Kingdon's own location on the house machine, /export/ftp/home/kingdon, which sits inside a chroot jail so FTP users can't get out without connecting by another protocol. The firewall is responsible for making sure that only appropriate users reach the machine by any protocol, whether FTP, SSH, or IMAP. So there are actually plenty of different ways in, but the VSFTPD process on this port is restricted to data inside /export/ftp because it was started inside a chroot jail.

The firewall isn't doing its job reliably. The first firewall gets its IP address from the cable modem and doles out a static IP on a private subnet to the second firewall, which assigns a static IP to the house machine on a second private subnet. The setup exposes the whole internal network to anyone with an 802.11 wireless client device that knows how to read, and forwards port 21 (FTP) past both firewalls to the house machine, so users on the outside with a username and password can access FTP and that big storage area, with about 120GB of space, on /export/ftp.

kpb1363@hilly:~/spring2006$ ftp house.tuesdaystudios.com
ftp: connect: Connection timed out

Bummer. Try again tomorrow? I'll look at the firewall and figure out what's up when I get home.
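
While waiting to get home, a quick way to narrow this down is to look at how the connection fails. Here is a minimal Python sketch, not anything I actually ran at the time; the function name port_status and the five-second timeout are my own assumptions. It only classifies the result of a plain TCP connection attempt.

```python
import socket

def port_status(host, port, timeout=5.0):
    """Classify a TCP connection attempt (hypothetical helper name).

    Returns 'open', 'timeout', 'refused', or 'unreachable'.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # Something answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # DNS failure, no route to host, and similar errors.
        return "unreachable"
```

Calling port_status("house.tuesdaystudios.com", 21) from the outside would be the equivalent of the ftp attempt above. As a rough rule of thumb, a "timeout" suggests the packets are being dropped somewhere along the way (a forwarding rule is a likely suspect), while "refused" suggests they reach a machine where no FTP daemon is listening.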

Ruby on Rails: http://getontracks.org/downloads/index

Tracks has a new version out! I'm going to deploy it on my ArchLinux host, currently represented by irie-arch.tuesdaystudios.com, and I wanted to put up a page there that describes the services exposed, including pricing info, in case someone asks for their own copy from me. Maybe I'll use Instiki for this.

Meet our customer and his ad-hoc server farm (two machines in one) that live in Rochester on a University network, with a nearby home-based backup server for super cheap.

Wednesday, June 25, 2008

I'm talking about Data Mining

I always imagined that my Del.icio.us Inbox should be my primary Inbox, and that if a person wanted to say something to me, they could simply put it on a web page and "say it out loud," you know what I mean?

Some web developer types are taking over my office at Tuesday Studios this week, and I am about to start wondering just how much free time they've got... I'm not fluent in any web frameworks, but I've got lots of ideas, and I bought a book on Groovy and Grails, the Java-based dynamic language and its web framework, but there are diversions...

My business partner at Tuesday Studios is a Ruby pro, and I am finishing my Computer Science degree in November (God willing) with some classes in Haskell and C#, so unfortunately Grails will have to wait.

Web Developers? Send me your portfolio!
Get an account, and tag your stuff with "for:yebyen"

http://del.icio.us/yebyen

Wednesday, April 23, 2008

The Sixth Layer: A Business Blog

Building an example of how to build a churning queue for blog posting bandits on the internet... this Blogger blog on blogspot.com is a good example of a content channel, the basic "news output," the first step to recognition in any sort of publishing company. The most you can ask for in the realm of consumable content (here we're talking newspaper articles, not video games) is a content author who provides content in text, audio, video, and URL formats.

Visit the Posts (Atom) feed link on the bottom of the front page and find the FeedBurner RSS frontend which shows the content in this blog: I am generally not posting MP3 or Video content, but the capability is there between Blogger and FeedBurner to add media files for a rich feed viewer such as iTunes to recognize and retrieve. You can include podcast content in your blog postings!
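
For the curious, media files show up in an Atom feed as link elements with rel="enclosure", and that is what a rich feed viewer looks for. A small Python sketch of pulling those links out follows. The sample feed is made up for illustration (the example.com URLs are placeholders, not this blog's feed), and the function name enclosures is my own.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 elements.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

def enclosures(atom_xml):
    """Return (entry title, href, MIME type) for each enclosure link in a feed."""
    root = ET.fromstring(atom_xml)
    found = []
    for entry in root.iter(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="")
        for link in entry.findall(ATOM_NS + "link"):
            # Only links marked rel="enclosure" carry attached media.
            if link.get("rel") == "enclosure":
                found.append((title, link.get("href"), link.get("type")))
    return found

# Made-up sample feed: one entry with an MP3 attached, one text-only entry.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>Episode one</title>
    <link rel="alternate" href="http://example.com/post/1"/>
    <link rel="enclosure" type="audio/mpeg" href="http://example.com/ep1.mp3"/>
  </entry>
  <entry>
    <title>Text only</title>
    <link rel="alternate" href="http://example.com/post/2"/>
  </entry>
</feed>"""

print(enclosures(SAMPLE))
```

Only the first entry comes back, since the second has no enclosure link; a podcast aggregator makes essentially the same distinction.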

Text articles with embedded content can be viewed in a web browser and podcasts can be scanned in a podcast aggregator. Now, what about those URL-based posts with titles, text summaries and tag data: they're all coming from a particular account on the del.icio.us service, and if you have the account name you can request the set of tags by URL. But, what if you want to edit these tags?

What if you want to collect a group of posts, and tag them all a certain way? What if you want to drill down inside a large set of already tagged posts, and update the tags for a subset? The interface to handle this behavior has already been developed with tagging for Gmail, but the feature described here is not available on this database... yet.
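
To make the idea concrete, here is a rough Python sketch of the drill-down-and-retag operation on an in-memory list of posts. This is not the del.icio.us API, just an illustration of the behavior I want; the helper names select and retag, the post layout, and the sample URLs and tags are all made up.

```python
def select(posts, required_tags):
    """Drill down: return the posts that carry every tag in required_tags."""
    required = set(required_tags)
    return [p for p in posts if required <= p["tags"]]

def retag(posts, add=(), remove=()):
    """Update tags in place on each given post: add some, drop others."""
    for p in posts:
        p["tags"] = (p["tags"] | set(add)) - set(remove)

# Made-up sample data: each post is a URL plus a set of tags.
posts = [
    {"url": "http://example.com/a", "tags": {"ruby", "rails"}},
    {"url": "http://example.com/b", "tags": {"ruby", "toread"}},
    {"url": "http://example.com/c", "tags": {"haskell", "toread"}},
]

# Find everything tagged both "ruby" and "toread", then swap "toread" for "reviewed".
retag(select(posts, ["ruby", "toread"]), add=["reviewed"], remove=["toread"])
```

Only the second post matches both tags, so only its tags change; the other two are left alone. A real version would also have to write the changes back to the service, which is the part that doesn't exist yet.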

I'm on a mission to build this feature into my workflow. Anybody want to stop me? Please, I'm begging you, tell me this feature already exists before I write it myself!

Saturday, June 9, 2007

Data Retention Plan

You want a data retention plan? I'll give you a data retention plan. Open up your organization! Depend on free services! Put everything you own on the internet, and you will be just fine! Trust me ;-)

In related news, there's a new class of student companies joining us at Venture Creations, the business incubator sponsored by RIT, which houses Tuesday Studios and the server room! In reality I do have some servers of my own, and I have really put a lot of consideration into data retention over the last 8-10 years as I have watched my own precious data occasionally slip through the cracks when tossing an old computer to the curb. First step is to centralize your data and then give it to me... I'll take good care of it for you!

If that doesn't sound attractive to you, then my next recommendation is to give your data to Google. If you're worried about data mining, no, I'm sure it'll be MUCH safer with Google than in my own server rooms... the world's foremost experts in data mining? Nah, I really don't think they'd try anything too fishy.

Still, if that sounds too much like charity, or you like the idea of doing things in house, I am always happy to pay a visit and perform a consultation, or give a tour of my own server room. I could fill at least 1 hour's time talking about where I keep my data, and the measures that I employ to keep it safe.

In fact I will even go so far as to say, free consultation. Bear in mind this may be a limited time offer; I could get very busy tomorrow. You know, this is not a contract to work at no cost for the rest of my days.

I was going to start digging up information on those new companies at the incubator! Tune into my del.icio.us feed for the scoop, I'm going to see what Google already knows. Hopefully I'll meet some of these people soon and post again to tell you how cool their services are; I don't like to interrupt people when they're working, and everyone tends to look busy when I walk by their desks. I wonder if that's because my card says President?