I have been playing with Amazon EC2 for a couple of months now.
I have a blog dedicated to creating various databases (commercial and otherwise). At the moment the aim is to get Oracle RAC running on EC2.
So I have been creating various Amazon Machine Images (AMIs). Under the hood, EC2 runs on the Xen virtualization software.
Rather than write a generalised blog, I am trying to be more specific with content. So this blog will cover various EC2 builds testing out the many and varied datamining/Business Intelligence (BI) software out there.
I am currently reviewing the basics of Bizgres (an extension of PostgreSQL) and its commercial arm, Greenplum.
While surfing the web over the two or so days since the seed of this idea formed, I have also looked at StreamSQL, which is a real-time streaming extension to standard SQL. It also has a commercial arm, StreamBase. More about streams shortly.
So what is the idea?
Why not use Amazon as a testing ground for providing a web service based on the above technology and more?
For example:
Instead of requiring a dedicated server for checking clickstream data, why not upload your web access logs (or even real-time clicks) to Amazon S3 or an equivalent web storage service, and then combine that with specific virtual instances (processing nodes) which can fire up and shut down as required?
The processing nodes morph from a stream collection tool into a data miner based on Bizgres, then morph again to produce output viewable in something like Yale.
Amazon also has a message queue service, so there is nothing to stop us writing a director node which spawns and reaps the above service nodes as required, passing the dataset being processed from node to node like a car on a production line.
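To make the production line idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under some loud assumptions: an in-memory deque stands in for Amazon's message queue, plain functions stand in for the virtual instances, and the "access log" is a made-up two-column format (client address, then path). The names collect, mine, render and director are hypothetical and do not come from any of the products mentioned above.

```python
from collections import Counter, deque


# Hypothetical stage functions. Each one stands in for a role that a
# processing node would take on; none of this touches EC2, S3 or Bizgres.
def collect(raw_lines):
    """Stream collection stage: parse raw lines into (client, path) pairs."""
    records = []
    for line in raw_lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            records.append((parts[0], parts[1]))
    return records


def mine(records):
    """Data mining stage: count hits per path."""
    return Counter(path for _, path in records)


def render(counts):
    """Output stage: turn the counts into a sorted plain-text report."""
    return [f"{path} {hits}" for path, hits in sorted(counts.items())]


STAGES = [("collect", collect), ("mine", mine), ("render", render)]


def director(job_queue):
    """Take jobs off the queue and move each dataset through every stage."""
    results = {}
    log = []
    while job_queue:
        job_id, dataset = job_queue.popleft()
        for name, stage in STAGES:
            # A real director would start an instance here and shut it
            # down afterwards; this sketch only records the two events.
            log.append(f"spawn {name} for {job_id}")
            dataset = stage(dataset)
            log.append(f"reap {name} for {job_id}")
        results[job_id] = dataset
    return results, log


# Assumed sample data in the made-up two-column log format.
jobs = deque([
    ("job-1", [
        "10.0.0.1 /index.html",
        "10.0.0.2 /about.html",
        "10.0.0.1 /index.html",
    ]),
])
report, events = director(jobs)
```

Here report["job-1"] holds the finished report lines and events lists each spawn and reap in order. In a real build the queue would be the hosted message service and each stage would run on its own instance, but the control flow of the director would look much the same.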
Interested in helping out?
Either post a comment or email me at paulmoen at gmail dot com
Have Fun
Paul
Tuesday, July 3, 2007
What is VM datamine