Wednesday, July 23, 2008

Discovering Distributed Caching: Velocity

I had a great opportunity in the past week to work with a couple of great minds on a proof of concept for distributed caching.  I had first heard of distributed caching just this month while listening to a Deep Fried Bytes podcast.  They were discussing challenges and solutions for scaling extremely high-traffic websites, and had implemented a distributed caching product called Memcached.  I found it interesting and put distributed caching on my to-learn stack.

So this week I got to dig in a little bit and find out how this stuff worked.  I was primarily researching Velocity, which is Microsoft's implementation of distributed caching.  Here are my notes:

There is really no single best strategy for caching.  How you implement it depends very much on your specific need.  Generally speaking, the pattern for accessing cached objects is to look up your object by key and check for null: if it is null, pull the object from your secondary source (DB, file, etc.); if it is not null, use the cached object.
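Here is a minimal sketch of that lookup pattern in Python.  It is not the Velocity API: SimpleCache is a hypothetical dict-backed stand-in for a cache client, and get_or_load and load_from_source are names I made up for illustration.  Writing the loaded object back into the cache is my own assumption; it is a common addition, but the pattern above only describes the read.

```python
class SimpleCache:
    """Hypothetical stand-in for a cache client (a plain dict, not Velocity)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Returns None on a cache miss, mirroring the "check for null" step.
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value


def get_or_load(cache, key, load_from_source):
    """Look up key in the cache; on a miss, fall back to the secondary source."""
    value = cache.get(key)
    if value is None:
        # Cache miss: pull from the secondary source (DB, file, etc.).
        value = load_from_source(key)
        # Assumption: store the result so the next lookup is a hit.
        cache.put(key, value)
    return value
```

One caveat with this shape: a source value that is legitimately None can never be cached, so every lookup for it goes back to the secondary source.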

Cache Patterns

One cool thing about Velocity is the flexibility to choose between 3 types of cache patterns.
1) Partitioned - great for scalability.  Add a machine to the cache cluster (group of cache servers) and you have more memory for the objects to be distributed over.

2) Replicated - great for throughput.  This pattern will synchronize your cache to each server so each machine within the cluster has an identical set of cached objects.  This approach tends to be quicker because your objects are always living on the server that you request them from - but it limits you in terms of how much memory you have available to you.

3) Local - best performance.  This cache lives within your application process just like a static dictionary would.  It is faster because the objects are not serialized when they are put into the cache.

The uncool part about these 3 options is that only one is available in the current CTP release.  You'll have to wait for options 2 and 3.
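To make the difference between the partitioned and replicated patterns concrete, here is a toy sketch in Python.  Everything in it is an assumption for illustration: the class names are hypothetical, each "server" is just an in-memory dict, and the hash-modulo rule for picking a server is simply the easiest scheme to show, not how Velocity actually places objects.

```python
import hashlib


def node_for_key(key, node_names):
    """Pick one node for a key by hashing it (illustrative scheme only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return node_names[int(digest, 16) % len(node_names)]


class PartitionedCache:
    """Each object lives on exactly one node, so adding nodes adds capacity."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.stores = {name: {} for name in self.node_names}

    def put(self, key, value):
        self.stores[node_for_key(key, self.node_names)][key] = value

    def get(self, key):
        return self.stores[node_for_key(key, self.node_names)].get(key)


class ReplicatedCache:
    """Every node holds every object, so any node can answer any read."""

    def __init__(self, node_names):
        self.stores = {name: {} for name in node_names}

    def put(self, key, value):
        # A write is copied to all nodes, so capacity is one node's memory.
        for store in self.stores.values():
            store[key] = value

    def get(self, key, node_name):
        return self.stores[node_name].get(key)
```

The trade-off shows up directly: the partitioned version stores one copy per object, while the replicated version stores one copy per server.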

Locking

Velocity supports pessimistic locking and optimistic versioning.

Optimistic versioning stores a version with each cached item to track changes.  When an object is retrieved from the cache, its version is returned along with it.  An update of the cached item succeeds only if the version of the passed-in cache item is the same as the one stored in the cache.
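A small Python sketch of the version check described above.  This is not Velocity's API: VersionedCache and put_if_unchanged are hypothetical names, and using an integer counter that starts at 0 for missing items is my own assumption.

```python
class VersionedCache:
    """Hypothetical cache that keeps a version number next to each item."""

    def __init__(self):
        self._items = {}  # key -> (value, version)

    def get(self, key):
        """Return (value, version); a missing key reads as (None, 0)."""
        return self._items.get(key, (None, 0))

    def put_if_unchanged(self, key, value, expected_version):
        """Store value only if nobody has updated the item since it was read."""
        _, current_version = self._items.get(key, (None, 0))
        if current_version != expected_version:
            return False  # Someone else got there first; caller must re-read.
        self._items[key] = (value, current_version + 1)
        return True
```

If two clients read the same item at the same version, only the first update succeeds; the second is rejected and has to fetch the item again before retrying.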

Pessimistic locking allows you to lock an object in the cache and keep others from updating the object from underneath you.
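And a matching sketch for pessimistic locking, again in plain Python with made-up names (LockingCache, ItemLockedError, get_and_lock, put_and_unlock are all hypothetical).  It assumes a single process and leaves out lock timeouts, which a real cache would need.

```python
import uuid


class ItemLockedError(Exception):
    """Raised when a caller touches a locked item without the right handle."""


class LockingCache:
    """Hypothetical cache where a reader can lock an item until it writes back."""

    def __init__(self):
        self._items = {}
        self._locks = {}  # key -> handle of the current lock holder

    def put(self, key, value):
        if key in self._locks:
            raise ItemLockedError(key)
        self._items[key] = value

    def get_and_lock(self, key):
        """Return (value, handle) and block other writers until unlocked."""
        if key in self._locks:
            raise ItemLockedError(key)
        handle = uuid.uuid4().hex
        self._locks[key] = handle
        return self._items.get(key), handle

    def put_and_unlock(self, key, value, handle):
        """Write the new value and release the lock; requires the lock handle."""
        if self._locks.get(key) != handle:
            raise ItemLockedError(key)
        self._items[key] = value
        del self._locks[key]
```

Unlike the optimistic approach, the conflict is caught before the second client does any work, at the cost of making everyone else wait or fail while the lock is held.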

Querying

One challenge I found: say you have a set of objects (people, for instance, as a List<People>) and you want to find a person in that list based on their email address.  That is hard because I stored each person object in the cache with a key based on the object's ID.  So to find a person by email address, I have to pull back ALL people and then iterate through the list in local memory.  What I need is a way to query the cached objects on the cache side, before anything is pulled back.  I've read that this is a feature that will be implemented in later CTPs of Velocity.

For now, I created a simple app that shows one possible solution to querying for objects using Tag objects from the API.  Check out the source code here.
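To give a rough idea of the tag approach without the original source, here is a Python sketch.  It is not Velocity's Tag API: TaggedCache, get_by_tag and the "email:..." tag format are hypothetical names chosen for illustration.  The idea is that the cache keeps a reverse index from tag to keys, so a lookup by email does not have to pull back every person.

```python
from collections import defaultdict


class TaggedCache:
    """Hypothetical cache that indexes items by tag as well as by key."""

    def __init__(self):
        self._items = {}                       # key -> value
        self._tags_by_key = {}                 # key -> set of tags
        self._keys_by_tag = defaultdict(set)   # tag -> set of keys

    def put(self, key, value, tags=()):
        # Drop tags from any earlier version of this item so the index
        # never points at a tag the item no longer carries.
        for old_tag in self._tags_by_key.get(key, set()):
            self._keys_by_tag[old_tag].discard(key)
        self._items[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key):
        return self._items.get(key)

    def get_by_tag(self, tag):
        """Return every cached value carrying the tag, ordered by key."""
        return [self._items[k] for k in sorted(self._keys_by_tag.get(tag, ()))]
```

Usage would look like cache.put("person:1", person, tags=["email:ann@example.com"]) followed by cache.get_by_tag("email:ann@example.com"), where the person is still stored under its ID-based key.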

Learning More

http://msdn.microsoft.com/en-us/library/cc645013.aspx
This post has a really good visual of the caching architecture:
http://www.25hoursaday.com/weblog/2008/06/06/VelocityADistributedInMemoryCacheFromMicrosoft.aspx

Saturday, July 19, 2008

Learning As A Developer

Something that I've struggled with ever since I started cutting code is knowing where to concentrate my efforts when I go home at night and want to learn something new. It's great to download the latest shiny bits of technology and look around to see what's cool, but what I'm really interested in is learning things ahead of time, so that when they are needed on my team I'll be able to step up and provide value immediately.

Since I'm on a team that practices Agile development principles, as a developer I am responsible for a vertical slice of development from the user interface down to the database. The gap between user interface and database is so huge, so what the heck do I concentrate on? I could become really proficient in jQuery or ASP.Net MVC so I can be a superstar UI coder, or I could learn NHibernate or SubSonic inside out so I can rock out with the database. But no matter how much I learn about those, they could become worthless to my project if we decide to switch them out for shinier bits.

So as I've gone through a few iterations of learning frameworks and code libraries, the one thing that I've found to be true is that learning the concrete implementation of software is not going to benefit me in the long term. What is valuable is reading the code (and API) and seeing how the problems are being solved. Some call these "design patterns". But really it's looking at the problem and seeing how others have naturally converged on solving it in an elegant, easy to understand way. After you've seen the problem solved a few different ways you begin to see these patterns. So when you pick up the latest shiny bits of the latest MVC framework or remoting software, you're going to immediately start looking for the patterns you've seen before and start asking questions about their specific implementation. I think this makes "just in time learning" much easier.

So in summary, I think the best way to learn software and how to solve software problems is to read a lot of code. Identify key patterns of problem solving in other people's code. Don't concentrate on their API, but concentrate on the abstract ideas that they're using to solve their problems in a robust and elegant way.

Friday, July 18, 2008

Software Development Values

Software is amazing.  People have studied software development since the dawn of the computing age, and yet we still continue to struggle to get it exactly right.  There is even controversy over "when is it right?".  This is one of the things that makes software development so much fun for me.  There is no prescribed best way to do almost anything, so you've gotta rely on your gut feeling on so much... well, that and the hard knocks you endure along the way. 

How can we know that we're making the right decisions and building the best software that we can?  While reading over the Agile Manifesto tonight it became a little clearer to me.  If we define our values before making any decisions, we can allow those values to guide us in any decision we make; and in the end we're going to be better off for it.  Here is the manifesto, in all its glory:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.

To take that a step further, here are some things that I as a developer value:

  1. Readability and maintainability over development speed.
  2. Object oriented principles over procedural coding styles.
  3. Providing business value over implementing new technologies.
  4. Testable code over faster development.
  5. Thinking/Designing over writing code.
  6. Communicating over assuming.
  7. Clarity over cleverness.
  8. Development speed over code performance.