Long-running processes and RESTful APIs.

I have to create a RESTful API that can accept a request for a resource that may take some time to find as it’s the result of a complex deterministic model. The ASP.NET Web API is hosted on an Azure Web Role and exposes a RESTful interface to the clients. The clients are not web browsers, in this case, but other applications.

On the initial GET, or “fetch”, of a particular resource, the very long query string (which carries all of the data the model run requires) is answered with a 202 Accepted. A 304 Not Modified is then returned on each subsequent request for the same resource, until finally the new resource lands in the server’s cache, placed there by a background process that monitors a queue fed in turn by the model engine. At that point we return the new resource with a 200 OK. Since the result already exists, in a sense, and we are only taking some time to fetch it for the client, the cache is not being updated with a result; rather, it is caching the resource from the application.

In REST a resource has an id. In this case the id is the URI, or the ETag (the Entity Tag being a hash of the URI). This is not a “process” like a POST or “insert” of a resource; nothing in the data is being changed. It is just a very slow request.

Request Response Ping Pong

The initial request immediately returns a 202 Accepted. This is better than holding the connection open for up to the 4 minutes allowed by the Azure Load Balancer as that would be expensive and we would run the risk of overloading our application.

The 202 response carries with it an ETag (Entity Tag), which is the token the client can use to make another request to the application. An ETag identifies a resource and is intended to be used for caching. We are caching our resource, and its current state is empty.

The client then presents the ETag in the If-None-Match header. This is the specified way to check for changes to a resource. If the state of the resource has not changed, the application returns a 304 Not Modified. If the resource has changed, which in practice means the application has completed its run and dumped the result in the cache, then the application returns the current state of the resource. The ETag is returned again, should the client wish to request the resource once more, along with a 200 OK to indicate the end of the request.
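A minimal sketch of this flow as a Web API controller. The `IResultCache` and `IModelQueue` interfaces are hypothetical names invented here for illustration (the original code is not shown); the ETag is a hash of the request URI, as described above.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Security.Cryptography;
using System.Text;
using System.Web.Http;

public class ResultsController : ApiController
{
    private readonly IResultCache cache; // hypothetical
    private readonly IModelQueue queue;  // hypothetical

    public ResultsController(IResultCache cache, IModelQueue queue)
    {
        this.cache = cache;
        this.queue = queue;
    }

    public HttpResponseMessage Get()
    {
        // the ETag identifies the resource: a hash of the request URI.
        var etag = Hash(this.Request.RequestUri.AbsoluteUri);

        if (this.Request.Headers.IfNoneMatch.FirstOrDefault() == null)
        {
            // first request: queue the model run, answer 202 with the ETag.
            this.queue.Enqueue(this.Request.RequestUri.Query);
            var accepted = new HttpResponseMessage(HttpStatusCode.Accepted);
            accepted.Headers.ETag = new EntityTagHeaderValue(etag);
            return accepted;
        }

        var result = this.cache.Get(etag);
        if (result == null)
        {
            // the background process has not filled the cache yet.
            return new HttpResponseMessage(HttpStatusCode.NotModified);
        }

        // done: return the resource, and the ETag once more.
        var ok = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(result)
        };
        ok.Headers.ETag = new EntityTagHeaderValue(etag);
        return ok;
    }

    private static string Hash(string uri)
    {
        using (var sha = SHA256.Create())
        {
            var bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(uri));
            return "\"" + Convert.ToBase64String(bytes) + "\"";
        }
    }
}
```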

Note that the above is only example code that has been stripped of some conditional checks and guard clauses for readability.

Windows Azure Cloud Storage using the Repository pattern

Repository pattern instead of an ORM but with added Unit of Work and Specification patterns

When querying Azure Tables you will usually use the .NET client to the RESTful interface. The .NET client provides a familiar ADO.NET syntax that is easy to use and works wonderfully with LINQ. To prevent the access code becoming scattered throughout your codebase you should be collecting it into some kind of DAL. You should also be thinking about the testability of your code, and the simplest way to provide this is to put interfaces in front of your data access code. Okay, so there’s nothing earth-shattering here, but getting the patterns together and learning to use Azure Tables to their best is probably new to you or your project.


What do you want to provide to every object that needs a backing store? I’d suggest searching and saving so here are the two methods every repository is going to need.

public interface IRepository<TEntity> where TEntity : TableServiceEntity
{
    IEnumerable<TEntity> Find(params Specification<TEntity>[] specifications);

    void Save(TEntity item);
}


What about getting back a particular entity, making changes and saving that back? The first thing to note is that in Azure Tables an entity is stored in the properties of a Table row *but* other entities may also be stored in the same Table. So think entity and not table, which is different to how you would normally think of a repository.
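To make that concrete, here is a small sketch (the property names are illustrative, not from the original project) of two unrelated entity types sharing one Azure Table:

```csharp
// The table merely groups rows; it does not impose a schema,
// so different entity types can live side by side in it.
public class Entity : TableServiceEntity
{
    public string Payload { get; set; } // illustrative property
}

public class AuditEntry : TableServiceEntity
{
    public string Message { get; set; } // illustrative property
}

// Both attach to the same table name, which is why the repository
// is built around the entity type rather than the table:
// context.AttachTo("table", new Entity { ... });
// context.AttachTo("table", new AuditEntry { ... });
```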

Let’s say for this example I want to be able to get a single entity or a range of entities, to delete a given entity, and even to page through a range of entities.

To keep the code cleaner I’m going to pass in the parameters as already formed predicates for my where clause. There’s little advantage to using the Specification pattern here other than I think it makes the code a little more explicit.
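The `Specification<TEntity>` class itself is not shown in the post, so here is an assumed minimal shape: it simply wraps a predicate expression that the repository can apply as a where clause.

```csharp
using System;
using System.Linq.Expressions;

// A minimal Specification sketch (assumed shape, for illustration).
public class Specification<TEntity>
{
    public Specification(Expression<Func<TEntity, bool>> predicate)
    {
        this.Predicate = predicate;
    }

    public Expression<Func<TEntity, bool>> Predicate { get; private set; }
}

// A concrete specification matching the usage later in the post.
public class ByPartitionKeySpecification : Specification<Entity>
{
    public ByPartitionKeySpecification(string partitionKey)
        : base(e => e.PartitionKey == partitionKey)
    {
    }
}
```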

public interface IEntityRepository : IRepository<Entity>
{
    void Delete(Entity item);

    Entity GetEntity(params Specification<Entity>[] specifications);

    IEnumerable<Entity> GetEntities(
        params Specification<Entity>[] specifications);

    IEnumerable<Entity> GetEntitiesPaged(
        string partitionKey, int pageIndex, int pageSize);
}

public class EntityRepository : RepositoryBase, IEntityRepository
{
    public EntityRepository(IUnitOfWork context)
        : base(context, "table")
    {
    }

    public void Save(Entity entity)
    {
        // Insert or Merge Entity aka Upsert (>= v.1.4).
        // In case we are already tracking the entity we must
        // first detach for the Upsert to work.
        this.Context.Detach(entity);
        this.Context.AttachTo(this.Table, entity);
        this.Context.UpdateObject(entity);
    }

    public void Delete(Entity entity)
    {
        this.Context.DeleteObject(entity);
    }

    public Entity GetEntity(
        params Specification<Entity>[] specifications)
    {
        return this.Find(specifications).FirstOrDefault();
    }

    public IEnumerable<Entity> GetEntities(
        params Specification<Entity>[] specifications)
    {
        // e.g. new ByPartitionKeySpecification("partitionKey")
        return this.Find(specifications);
    }

    public IEnumerable<Entity> GetEntitiesPaged(
        string partitionKey, int pageIndex, int pageSize)
    {
        var results = this.Find(
            new ByPartitionKeySpecification(partitionKey));

        return results.Skip(pageIndex * pageSize).Take(pageSize);
    }

    public IEnumerable<Entity> Find(
        params Specification<Entity>[] specifications)
    {
        IQueryable<Entity> query =
            this.Context.CreateQuery<Entity>(this.Table);

        query = specifications.Aggregate(
            query, (current, spec) => current.Where(spec.Predicate));

        return query.ToArray();
    }
}

It’s easy enough to pass in a context for your repository following the Unit of Work pattern. You can create this quite simply (see TableStorageContext following). You have to define which Table your entity is stored in, and you want both that and your context as properties of your class. I find it cleaner to manage (and easier for the next developer to implement) if that work is done in a base class, RepositoryBase.

public class RepositoryBase
{
    public RepositoryBase(IUnitOfWork context, string table)
    {
        if (context == null)
        {
            throw new ArgumentNullException("context");
        }

        if (string.IsNullOrEmpty(table))
        {
            throw new ArgumentNullException(
                "table", "Expected a table name.");
        }

        this.Context = context as TableServiceContext;
        this.Table = table;

        // belt-and-braces code -
        // ensure the table is there for the repository.
        if (this.Context != null)
        {
            var cloudTableClient =
                new CloudTableClient(
                    this.Context.BaseUri.ToString(),
                    this.Context.StorageCredentials);
            cloudTableClient.CreateTableIfNotExist(this.Table);
        }
    }

    protected TableServiceContext Context { get; private set; }

    protected string Table { get; private set; }
}

So now we actually get to the meat of the matter and implement our TableServiceContext methods for the CRUD functionality we need. In this example I’ve a single Save method that uses the ‘Upsert’ (InsertOrMerge) functionality available in Azure since v.1.4 (2011-08). The Find method is there for convenience – if it doesn’t suit your query then simply don’t use it.


public class TableStorageContext : TableServiceContext, IUnitOfWork
{
    // Constructor allows for setting up a specific
    // connection string (for testing).
    public TableStorageContext(string connectionString = null)
        : base(
            BaseAddress(connectionString),
            CloudCredentials(connectionString))
    {
        this.SetupContext();
    }

    // NOTE: the implementation of Commit may vary depending on
    // your desired table behaviour.
    public void Commit()
    {
        try
        {
            // Insert or Merge Entity aka Upsert (>=v.1.4) uses
            // SaveChangesOptions.None to generate a merge request.
            this.SaveChangesWithRetries(SaveChangesOptions.None);
        }
        catch (DataServiceRequestException exception)
        {
            var dataServiceClientException =
                exception.InnerException as DataServiceClientException;
            if (dataServiceClientException != null
                && dataServiceClientException.StatusCode ==
                    (int)HttpStatusCode.Conflict)
            {
                // a conflict may arise on a retry where it
                // succeeded so this is ignored.
                return;
            }

            throw;
        }
    }

    public void Rollback()
    {
        // TODO: clean up context.
    }

    private static string BaseAddress(string connectionString)
    {
        return CloudStorageAccount(connectionString)
            .TableEndpoint.ToString();
    }

    private static StorageCredentials CloudCredentials(
        string connectionString)
    {
        return CloudStorageAccount(connectionString).Credentials;
    }

    private static CloudStorageAccount CloudStorageAccount(
        string connectionString)
    {
        var cloudConnectionString =
            connectionString ??
            RoleEnvironment.GetConfigurationSettingValue(
                "CloudConnectionString");
        var cloudStorageAccount =
            Microsoft.WindowsAzure.CloudStorageAccount.Parse(
                cloudConnectionString);
        return cloudStorageAccount;
    }

    private void SetupContext()
    {
        /* this retry policy will introduce a greater delay if
         * there are retries than the original setting of 3 retries
         * in 3 seconds but it will then show up a problem with
         * the system without the system failing completely. */
        this.RetryPolicy = RetryPolicies.RetryExponential(
            RetryPolicies.DefaultClientRetryCount,
            RetryPolicies.DefaultClientBackoff);

        // don't throw a DataServiceRequestException when
        // a row doesn't exist.
        this.IgnoreResourceNotFoundException = true;
    }
}

In my ServiceDefinition config I have a CloudConnectionString. This has to be parsed to get the endpoint and account details before I can create the TableServiceContext. A couple of static methods do the job. This object also implements the Commit and Rollback methods for the Unit of Work. My Commit is implementing ‘Upsert’ so you may want it to be different or you may want to have different implementations of TableStorageContext that you can pass in to your Repository class depending on how it needs to talk to storage.
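Putting the pieces together, usage might look like this. It is a sketch under the assumptions above; the Entity key values are illustrative.

```csharp
// Create the unit of work (context) and hand it to the repository.
var context = new TableStorageContext();
var repository = new EntityRepository(context);

// Stage an upsert, then commit the unit of work.
repository.Save(new Entity
{
    PartitionKey = "model-runs", // illustrative keys
    RowKey = "run-42"
});
context.Commit();

// Query by partition key, paged, via the Specification pattern.
var entities = repository.GetEntitiesPaged("model-runs", 0, 10);
```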

Further Architectural Options

I favour Uncle Bob’s Clean Architecture and as such I wouldn’t expose my Repository classes to other modules. I would wrap them in a further service layer that would receive and pass back Model objects. Cloud Table Storage is much more flexible than relational database storage but you have to think about it quite differently and the structure of your code will be very different to what you may be used to.

I’ve placed the Repository project on github: WindowsAzureRepository.