Cloud and failure

Despite all the cloud talk and where I live, it is like the cloud mecca, for enterprises it is still quite new and many are just starting to think about it. A hard lesson that many of us learn (and partly how we amass our scars) is to design for failures. For those, who run things in their enterprises data center, are quite spoilt I think. Failures are rare, and if machines or state goes down, moving to another one isn’t really a big deal (of course it is a little more complex, and not to say, there isn’t any down time, or business loss, etc.).

When thinking about a cloud migration (hybrid or otherwise) – a key rule is that you are guaranteed to have failures – at many aspects, and those cannot be exceptional conditions, but rather the normal design and expected behavior. As a result, you app/services/API/whatever needs to be designed for failure. And not only how your loosely couple your architecture to be able to handle these situations, but also, how the response isn’t a binary (yay, or a fancy 404); but rather a degraded experience, where your app/service/API/whatever still performs albeit in a deprecated mode.

Things that can throw one off, and is food for thought (not exhaustive, or on any particular order):

  • Managing state (when failures is guaranteed)
  • Latency – cloud is fast, but slower than your internal data center; you know – physics. 🙂 How are your REST API’s handling latency, and are they degrading performance?
  • “Chatiness” – how talkative, are your things on the wire? And how big is the payload?
  • Rollback, or fall forward?
  • Lossy transfers (if data structure sizes are large)
  • DevOps – mashing up of Developers, and Operations (what some call SRE) – you own the stuff you build, and, responsible for it.
  • AutoScale – most think this is to scale up, but it also means to scale down when resources are not needed.
  • Physical deployments – Regional deployment vs. global ones – there isn’t a right or wrong answer, it frankly depends on the service and what you are trying to do. Personally, I would lean towards regional first.
  • Production deployment strategies – various ways to skin a cat and no one is right or wrong per se (except, please don’t do a basic deployment) – that is suicide. I am use to A/B testing, but also what is now called Blue/Green deployment. Read up more here. And of course use some kind of a deployment window (that works for your business) – this allows you and your team to watch what is going on, and take corrective actions if required.
  • Automate everything you can; yes its not free, but you recoup that investment pretty quick; and will still have hair on the scalp!
  • Instrument – if you can’t measure it, you can’t fix it.

Again, not an exhaustive list, but rather meant to get one thinking. There are also some inherent assumptions – e.g. automation and production deployment suggests, there is some automated testing in place; and a CI/CD strategy and supporting tools.

Bottom line – when it comes to cloud (or any other distributed architecture), the best way to avoid failure is to fail constantly!

Is rand() harmful?

​I saw this awesome presentation on why rand() is considered harmful. When you need a random number, don’t call rand() and especially don’t say rand() % 100! This presentation will explain why that’s so terrible, and how C++11’s header can make your life so much easier.

If you need uniqueness and non-deterministic, especially on the context of security or crypto then you need to think about a few things. For example the frequency, non-uniform distribution, and not using a pseudo random number generator (such as Mersenne twister) and not a linear congruential generator.

Automated Code Reviews with Visual Studio?

I have been thinking of doing some code ‘smelliness’ test, and am keen to automate code reviews (as much as possible).

I am interested to know what tools have you guys used? I want to use the tools to find the low hanging fruits and know off the 80% of things and then we manually look at the more interesting aspects, which the tools don’t (or can’t) pick up.

Ideally, I would like this as an add-in to Visual Studio, which can run as part of a build and depending on how one configures it, can get to a gated check-in and/or work-items being created in TFS which then can be assigned and tracked.

What I am thinking is to complement the likes of FxCop, the built-in Visual Studio tools. There was TeamReview which I had looked at some point in the past, but we never got it running successfully. I have not had a chance to see it since then.

Someone has also attempted some of this via this, but it does not seem to go anywhere.

Surely, there someone has already build this which we can look into?

My TechEd Presentation – Building cross-platform Modern Apps – the Design perspective

TechEd a couple of months ago was really fun and I am grateful to Microsoft folks for giving me an opportunity to be both part of the Keynote and also have a slot in the Architecture track. Sorry it has taken me a very long time to upload my TechEd talk “Building cross-platform Modern Apps: the Design perspective”. But as they say better late than never.  😎

You can download a copy of my presentation – Xamarin – Building cross-platform.

Feel free to ping me if you have any questions.

Fallacies of Distributed Computing

I was reading something and came across these fallacies of Distributed Computing which all beginners (to distributed computing) have. Oh how we all learn.

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn’t change
  6. There is only one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Concurrency and CEP

The sooner we all understand the Concurrency ≠ CEP (Complex Event Processing), the better the world will be!

CEP is generally used when we implement real-time systems (of course that is not the only area where CEP is used). Real-time does not mean concurrent or for that matter high-performing system.

Of course there are correlations, but at the same time they are fundamentally different paradigms.

What I am working on today? Optimisation Algorithms

I often get the question – a what am I working on today? Some of the things I can’t discuss in an open forum, but some I can. Those that I can, I thought it was best to share via my blog and do quick small posts on it. Will this become a new series? Well time will tell – depends on how much bandwidth I will have.

This weekend, I am researching Optimisation Algorithms – both Deterministic and Probabilistic. Specifically interested in Swarm Intelligence (which are a type of Monte Carlo algorithm) and in that in Ant Colony and Particle Swarm routing are the main ones I am researching.

Occasionally Connected Architecture

When implementing an occasionally connected architecture for a solution, there are three fundamental requirements:

  1. Part of the overall solution, some smart client is deployed and installed on the desktop and a web only approach is not possible. The main rational being that a smart client can work in a disconnected mode which of course with a web application is not possible.
  2. Underlying infrastructure needs to be in place to support this. Infrastructure is not specifically networks and servers, but also both the operational environment and the user’s environment and machine. The operational environments need to allow things such as: data caching, local storage of user data, user profile details, etc.
  3. More robust exception management process – this is not only about handling errors but also understanding the fact that the application is in a disconnected state and needs to do things differently.

When designing an occasionally connected application, there are two design approaches that one can take – data centric or service oriented.

  1. Data Centric – Applications had a RDBMS of some sort installed locally and use the built-in capabilities of that RDBMS to propagate and sync data including resolving any conflicts.
    1. Server publishes data, which a client subscribes to and is copied locally. The conflict resolution (as changes can be both on the server or client) needs to be agreed upfront.
    2. Generally the database’s built-in conflict resolution is used – this makes it simpler for the application as one does not need to build this in the application.
    3. As there is only one data repository, the data convergence is guaranteed between the client and the server.
    4. Both the client and the server are tightly coupled.
    5. As a database needs to run locally, machines with small footprints or devices such as mobile phones will not be able to run this.
    6. If deployment is an issue then there is more work required here.
  2. Service-Oriented – Applications use the SOA paradigm and store information in messages which are queued (when disconnected) and send to the server when connected for processing.
    1. The client can interact with any service required and focuses on the service requests instead of the local data i.e. are loosely coupled.
    2. No local RDBMS required; of course some state information would still need to be saved.
    3. Better when needs to interact outside of the firewall (e.g. Internet or Intranet)
    4. Deployment is still required, but is simpler.

For Data centric application, from a design perspective the following aspects should be factored in:

  1. Application needs to be aware of the merge-replication schemes that are implemented as the application needs to optimise for data updates and conflicts.
  2. As a result, ACID properties are not used for transactions; instead a pub-sub model is implemented.

On the other hand, for Service-oriented apps, the application design should address the following:

  • Application has to implement asynchronous communication.
  • Overall solution needs to keep all the network interactions simple and cannot be complex.
  • Application needs to add data caching capabilities
  • Application needs to implement robust connection management (e.g. Manual vs. Automatic)
  • Implement a store-and-forward mechanism such as using MSMQ.
  • Application needs to implement a robust data and business rule conflict manager.
  • Interacting with CRUD like Web services.
  • The application and the work can be logically broken into “chunks” to allow one using a task-based approach.
  • The application should be able to handle both forward and reverse dependencies which in turn could be complex business logic.

As a high level guide, a data centric approach should be used when:

  • One can deploy a database instance on the client.
  • The application can function in a two-tier environment.
  • One can tightly couple the client to the server through data schema definitions and communication protocol.
  • There is a need for built-in change tracking and synchronization.
  • One wants to rely on the database to handle data reconciliation conflicts and minimize the amount of custom reconciliation code that needs to be written.
  • There is no need to interact with multiple disparate services.
  • Users are able to connect to a database directly through a LAN/VPN/IPsec.

And, a service oriented approach should be taken when:

  • One wants to decouple the client and server to allow independent versioning and deployment.
  • There is a need for more control and flexibility over data reconciliation issues.
  • The delivery team has expertise to write more advanced application infrastructure code.
  • There is a need for a lightweight client footprint.
  • The applications can be structured into a service-oriented architecture.
  • There is a need for specific business functionality (for example, custom business rules and processing, flexible reconciliation, and so on).
    • Note: One might also need to look at a few good rules engine if this is the case.
  • One needs control over the schema of data stored on the client and flexibility that might be different from the server.
  • The application needs to interact with multiple services using different communication technologies (Web services, Message Queuing, RPC, etc.).
  • There is a need for a custom security scheme.
  • The application needs to operate outside of the firewall.

C++ Message queuing options?

I am thinking of implementing a queue in one of the projects I am working on right now (sorry cannot go into more details until it gets published – hopefully in a few months). Anywyas, this is in C++ which needs to run on Ubuntu and my queueing experience (with C++ or otherwise) is only with MSMQ which is brilliant, but does not help me here as that run only on Windows. I also cannot use something like STL Queue as this will need to run across a number of machines and trying to sync between them would a royal pain. In other words, this needs to be distributed and async “loose” messaging. 🙂

I am already using MOOS, so one option is for me to continue to use that – however this is for another part of the application and it might be easier for me to use something else (still need to think it through a little more).

These are the requirements (these are must haves!). Also if it makes a difference I am using CDT for this project.

  • Needs to be able to run on Ubuntu 9.04 (and higher)
  • Needs to be Open Source (cannot be commercial)
  • Needs to be able to store messages “offline”
  • Needs to be able to run on TCP with minimal dependencies. It would be nice not to have a whole bunch of underlying dependencies.
  • Preferably be easy to use (as a consumer) – I don’t have much time to read through loads of documentation just to get my head around the underlying object model and how to use it.
  • C++ support (if it was not obvious until now)

I did a little research online and came across the following, and wanted to get some feedback:

  • ActiveMQ – seems like it has good C++ support via CMS (C++ Messaging Service).
  • Amazon SQS –  not sure how good the C++ support is. If there is no library per se that I can use, then writing things around REST APIs might be more painful. Also I suddenly have a dependency to be able to go to the public internet. Also it is not free (though there is a free 100K messages / month).
  • MQ4CPP – seems quite amateurish (kudos to the guy writing it though – seems like an interesting project to pick up when once has time).
  • RabbitMQ – I know some guys used this at work (though that was using it in .NET); nothing for C++, but there some C experimental code; overall does not inspire confidence (in the context of C++).
  • OpenAMQ – seems quite interesting and also has a C++ API based on its WireAPI.
  • Anything else??

At face value it seems like this is down to ActiveMQ and OpenAMQ. Just looking at the quick samples between the two ActiveMQ seems like more C++ friendly and easier to use compared to OpenAMQ. Of course this is just the first impression and I could be completely wrong – it is not like I have had a chance to play with this (yet anyways).

Does anyone have any experience and feedback on this matter? Feel free to comment on this post, or tweet me.

MOOS

I don’t think many people have heard of MOOS (which stands of Mission Oriented Operating Suite); I have been working with it for the past few months as part of my dissertation. And I must admit, the more I play with it, the more impressed I am. It is quite simple and yet powerful.

Whilst MOOS’s roots are in robotics (MRG) and embedded systems, I wonder if I can extend it to use it some of the grid computing scenarios. Maybe implement a pMapReduce or pHadoop? Or perhaps a .NET implementation. Hmm, just need some time. If you need a robust, cross platform communication layer then check out MOOS.

Analysis of Algorithms

If you were interested in algorithms and interested in some mathematical foundations for algorithm analysis? For example if you are interested in proof techniques, probability, Amortization analysis techniques, Case studies and Asymptotic notions (such as Big-Oh, Big-Omega, Little-oh, little-omega, Big-Theta) then check out these lecture notes (in ppt, 224kb) from California State University.

Good UML Tool (and free too)

Love it or hate it UML is important for anyone involved (Architect/Developer/Whoever) – either you need to create designs based on UML and you need to understand that someone else has. Sure it has its challenges and for some specific things there are better solutions (DSL's – more on that some other day). I am "old school" and over the years have used Rational Rose (or whatever IBM has renamed it to since buying Rational out).

But as are aware Rational tools are very pricey and I was on the look out for something reasonable. Most people have heard of EA (no, not the game company) which is a decent tool and not too expensive (at least the Desktop edition).

Of course if you are stingy like me and want something completely free then check out StarUML which is an open source UML/MDA platform and is pretty decent. When you work with it enough you will find some annoying things – most of which has workarounds. There are one or two things it just cannot do, but I only have been using it for a few months now and am quite impressed. Is it better than the likes of EA or Rose, nope but it is pretty damn good for the price.

image

Microsoft Architect Insight Conference

Microsoft hosts an annual event in the UK called the Architect Insight Conference. I am one of the speakers this year and will be presenting on “.NET 3.0 in the Enterprise”. This is a pretty good event and I would recommend it if you have not been to one of these. You can check out the Agenda here and if you want, register here and you can find out more information on the Speakers here. Here is a blurb from Microsoft on what can you expect to hear:

You’ll be able to engage in the technology debate with thought-provoking in-depth sessions from customer and partner architects along with members of the UK Microsoft Architect Council such as Avanade, Capgemini, Conchango and Solidsoft.

The agenda is split into seven tracks covering Enterprise, Real World, Identity, Lifecycle, SaaS, Collaboration and Dynamic Systems; the main content sessions will be 75 or 150 minutes in length to accommodate the different formats and levels of interaction that may be required.

There will be an emphasis on architectural ‘investigation’ through the use of small focus groups, as well as a structured networking clinic where individuals from similar vertical business backgrounds can discuss and work through a particular problem domain.

One man's feature is another man's bloat – The Java Performance Debate

Firstly this is not a Java bashing and I don’t preach to say .NET/C++, etc is faster. However, based on what I have seen it sure is slow – slow like a snail. Maybe its the time that takes to load the VM or maybe it Swing – gurk! I like how Andy puts it – “One man’s feature is another man’s bloat”. He has a very objective article on the area which are slow, what Sun is doing to address it and what the main issues (with the developers) are – who don’t know how to use it. He talks about the Memory, JVM, Desktop and Java2D

CLR: Under the Hood

The CLR team has a couple of slides from their roadshow where they talk about two tracks, one discusses what happens insight the CLR, if you have some of the books recommended in the presentation, none of this would be new to you. It covers things like the IL which is the abstract representation of an execution semantic and how that is represented using an abstract stack machine, where we consecutively execute each instruction, using the stack as the evaluation of that execution and how this stack abstraction works. And two, there is a discussion on perf engineering including the GC, costs and pitfalls, etc.

On a different note, been out and about on a few days of holiday with family visiting, but its good to be back now. 🙂