SQS, S3, EC2, DB1?
I am trying to cover several bases with this post, so I hope it makes sense. As you are probably aware, we have been toying with and analysing Amazon Web Services (AWS). We are doing so for a number of reasons, including building for what we consider to be a future platform. In the short term we have been looking at migrating existing web applications hosted in traditional environments (co-hosting, etc.) to the virtual servers and services provided by AWS.
In examining most of these existing web applications, we find that many rely on a database backend, usually MySQL or PostgreSQL. Although it is fairly trivial to configure an EC2 instance (an Amazon virtual server), there is a key difference from traditional servers: when the virtual instance goes down or is switched off, all data stored on it is lost, because the instance is virtual. AWS is instead built around the expectation that persistent storage lives in the highly redundant and reliable S3 infrastructure. That makes sense for files, but not when an application uses a database for storage.
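One common stopgap for this is to treat the instance's local database as disposable and periodically snapshot it to durable storage. The sketch below is hypothetical and only illustrates the idea: SQLite stands in for MySQL/PostgreSQL, and an in-memory `DurableStore` class stands in for S3 (a real setup would use the database's own dump tool and an S3 client). Note what it shows: anything written after the last snapshot is lost when the instance dies, which is exactly why a proper database service is wanted.

```python
from __future__ import annotations

import sqlite3


class DurableStore:
    """Hypothetical stand-in for S3: an object store that outlives any instance."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes | None:
        return self._objects.get(key)


def snapshot(conn: sqlite3.Connection, store: DurableStore, key: str) -> None:
    """Dump the whole local database as SQL text and upload it as one object."""
    dump = "\n".join(conn.iterdump())
    store.put(key, dump.encode("utf-8"))


def restore(store: DurableStore, key: str) -> sqlite3.Connection:
    """On a fresh (empty) instance, rebuild the database from the last snapshot."""
    conn = sqlite3.connect(":memory:")
    data = store.get(key)
    if data is not None:
        conn.executescript(data.decode("utf-8"))
    return conn


# Usage: the in-memory connection plays the role of the instance's local disk.
store = DurableStore()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('alice')")
db.commit()
snapshot(db, store, "backups/app.sql")

db.execute("INSERT INTO users (name) VALUES ('bob')")  # written after the snapshot
db.commit()
db.close()  # the instance "goes down": local data is gone

recovered = restore(store, "backups/app.sql")
print(recovered.execute("SELECT name FROM users").fetchall())  # [('alice',)]
```

The snapshot interval is the trade-off: shorter intervals lose less data but cost more transfer, and no interval gives the guarantees of a replicated database.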
None of this is new to AWS; it has been the case for some time, and the AWS team has engaged its users in discussion about what they would like to see to resolve these operational limitations of EC2. More or less all participants agree that some sort of highly centralised, redundant and resilient database should be provided as an additional service. This database web service might provide JDBC/ODBC or language-specific drivers that running EC2 instances could use over the AWS network; let's call it DB1. With DB1 in place, AWS would cover most classic application patterns. To summarise those services:
1. SQS - Simple Queue Service - for the messaging pattern
2. S3 - Simple Storage Service - for the file resource pattern
3. EC2 - Elastic Compute Cloud - for the computing node pattern
4. DB1 - Database Singularity - for the database or two-tier pattern
Obviously we are missing this last piece, and even my conversation with Mike Culver of AWS revealed nothing of their plans (hermetically sealed lips), although we know they are listening and that they want to provide for these patterns.
So for now AWS comes up a little short as far as our migration plans go, but we have plenty of time for that transition. We also have some web apps that are not database-reliant and will be able to move over. The bigger short-term problem for Amazon, however, is that the communities it is working hard to satisfy, such as Ruby on Rails (ROR) developers, depend heavily on the database model. One of the breakthroughs of ROR was abstracting the database itself, letting the developer talk Ruby (via Active Record) to their domain objects. Without being able to use Active Record out of the box in a reliable fashion, the enchantment that is ROR becomes a less magical experience. This is not confined to ROR, of course: most web app frameworks across all of the popular languages and tools are affected in a similar manner.
I wouldn't let this post diminish the incredible value Amazon is delivering with AWS; for so many situations it is simply revolutionary as a backend model. Additionally, anything that means we have fewer servers and data centers to look after is a good thing, because we can concentrate on innovation. For now we will just have to use clusters and clever workarounds to take advantage of what's on the table with AWS. Our biggest surprise to date is that neither Microsoft nor Google has brought anything even remotely competitive with AWS to market, and whatever happened to IBM's and Sun's utility computing infrastructures? Even though AWS isn't perfect, it's a million miles ahead of any other offering and provides complete openness for languages and tools, whether that's ROR, PHP, Grails, Python or Java. The innovation I see happening around AWS (think of dynamically deployed nodes as instances, like domain objects) looks set to shape the next generation of web app and service development. If you haven't looked at it yet, you really should do so as soon as possible.
Re: SQS, S3, EC2, DB1?
Came to this post via your comment here: http://blog.circleshare.com/index.php?/archives/51-All-a-Twitter.html
I've been doing some on-paper calculations to try to figure out whether applications like Twitter could run cost-effectively on AWS.
I agree that persistence is the big problem. The current S3 model seems to work only for file-intensive systems like photo storage sites.
It would be very difficult to run an application like Twitter, with its frequent small appends, on S3 given its write-once architecture.
I think Amazon is very close to developing the killer hosting environment with AWS, but they need a better general storage solution.
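The append problem can be made concrete with a little arithmetic. On a write-once store an object cannot be modified in place, so every append means fetching the whole object and writing back a new, slightly larger copy. The sketch below is a hypothetical model, not S3's API: `WriteOnceStore` is an in-memory stand-in that only supports whole-object puts, and the 140-byte message and 1,000 appends are illustrative numbers. It counts bytes written and compares them with an append-only log that writes each record once.

```python
from __future__ import annotations


class WriteOnceStore:
    """Hypothetical stand-in for S3: objects are immutable, only whole-object PUTs."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.bytes_written = 0

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = bytes(data)
        self.bytes_written += len(data)

    def get(self, key: str) -> bytes:
        return self.objects.get(key, b"")


def append(store: WriteOnceStore, key: str, record: bytes) -> None:
    """No in-place append: read the whole object, then rewrite it with the record added."""
    store.put(key, store.get(key) + record)


store = WriteOnceStore()
message = b"x" * 140  # one short status update (illustrative size)
n = 1000
for _ in range(n):
    append(store, "timeline/user42", message)

log_bytes = n * len(message)  # an append-only log writes each record exactly once
print(store.bytes_written, log_bytes)  # 70070000 140000
```

The rewrite cost grows quadratically with the number of appends (here 140 × n(n+1)/2 bytes), while a log grows linearly, which is why small frequent updates sit badly on an immutable object store.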
Re: SQS, S3, EC2, DB1?
Actually, we have completely changed the way we write web apps and moved away from the Rails-type MVC/DB model. This enables us to run our services on AWS and other competing and emerging infrastructures; it just involves rethinking the solutions. The big upside is greater scalability, simplicity and better support for emerging business models. I will be able to talk about this more soon, after we come through our limited betas.