01-Real World Scaling and Deployment

Homework required in preparation for this lecture

  • None: Just keep working on your Incubator Project. 3 days to go!
  • Housekeeping
    • Career advice virtual office hours. Come reserve 30 minutes with me some afternoon this week
    • Rails lecture is not going to happen due to lack of time

Learning Objectives

  1. Scaling vs. Performance for mobile products
  2. Patterns of Scaling Problems
  3. Techniques on the client
  4. Techniques on the server

Scaling for Mobile Products

  • Real world examples: cafeteria flow chart
  • Performance
    • Different from scaling
    • Performance is what one user experiences while using the application
    • Different from web sites, where some sites can be faster than others
    • Mobile apps are expected to be fast
    • Perception
      • Can you 'fool' the user into thinking the app is faster than it is?
      • Feedback: spinners etc
    • Reality: Do any potentially slow operation in a background thread
    • Optimization:
      • Worst sin: optimizing early. Why?
      • Optimization is the search for bottlenecks. What's a bottleneck? Refer back to the cafeteria example.
      • Moving target. When you eliminate/improve one bottleneck, it just reveals the next one. You make starting the dashboard activity faster, so that now you can notice that drawing the map overlay is slow.
    • Important: Measurement
  • Scaling
    • "How many X per minute can you do?" (e.g. user logins, bottle fetches, swipes to the next fashion photo)
    • Different from response time: "How long does it take to accomplish Y?" Related but different
    • Scaling has to do with the load on the servers
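
The difference between response time ("how long does Y take?") and throughput ("how many X per minute?") can be made concrete with a small measurement helper. This is only a sketch: the class name ScalingMeter and its two methods are hypothetical and not part of the lecture, and the operation being timed is whatever Runnable you pass in.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not from the lecture): keeps response time and
// throughput as two separate measurements.
final class ScalingMeter {

    // Response time: elapsed nanoseconds for a single run of the operation.
    static long timeOnceNanos(Runnable operation) {
        long start = System.nanoTime();
        operation.run();
        return System.nanoTime() - start;
    }

    // Throughput: operations per minute, given how many operations
    // completed during the elapsed time.
    static double perMinute(long completedOperations, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        double minutes = elapsedNanos / (double) TimeUnit.MINUTES.toNanos(1);
        return completedOperations / minutes;
    }
}
```

For example, a server that completes 30 logins in 30 seconds is running at 60 logins per minute, no matter whether each individual login took 100 ms or a full second. The two numbers are related, but you have to measure each one separately.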

Patterns of scaling problems and solutions

  • Load on the servers. Some scenarios, one or more of:
    • Too many clients (Android devices) asking the server to do operation O
    • Individual clients asking the server to do operation P too often
    • Operation Q is time consuming for the server to satisfy
  • Solutions can be
    • Add an identical server to handle operations O, P or Q
    • Send operation O to one server and operation P to another server
    • Why are so many clients asking for O? Can we reduce the number?
    • Why would a client ask for operation P so often? Can we reduce that?
    • Is there a way to make operation Q faster to satisfy?
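
One possible way to stop an individual client from asking for operation P too often is a client-side throttle. The sketch below is an illustration under assumptions: RequestThrottle and tryAcquire are hypothetical names, and the clock is passed in so the behavior can be checked without waiting for real time to pass.

```java
import java.util.function.LongSupplier;

// Hypothetical client-side throttle (not from the lecture): limits how often
// the client is allowed to send "operation P" to the server.
final class RequestThrottle {
    private final long minIntervalMillis;
    private final LongSupplier clock;   // injectable time source, in milliseconds
    private boolean hasSent = false;
    private long lastSentMillis;

    RequestThrottle(long minIntervalMillis, LongSupplier clock) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    // Returns true if the request may be sent now, false if it should be skipped
    // because the previous request was sent too recently.
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (hasSent && now - lastSentMillis < minIntervalMillis) {
            return false;
        }
        hasSent = true;
        lastSentMillis = now;
        return true;
    }
}
```

In an app this might be created as new RequestThrottle(5_000, System::currentTimeMillis) and consulted before each refresh. Whether skipping a request is acceptable depends on the operation, which is exactly the "why is the client asking so often?" question above.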

Techniques for the Client: Caching

  • Simplest: caching of different kinds
  • Caching in the mobile app itself

Pseudo Code
// without caching:
    public float getItemPrice(int itemId) {
        String urlBase = "http://vogueable.com/items/";
        // Every call makes a round trip to the server.
        Item itemFromServer = ServerProxy.getItem(urlBase + itemId);
        return itemFromServer.price;
    }

// Now with caching:
    private Integer cachedItem = null;   // null until the first fetch
    private float cachedPrice;
    public float getItemPrice(int itemId) {
        String urlBase = "http://vogueable.com/items/";
        if (cachedItem != null && cachedItem == itemId) {
            return cachedPrice;
        } else {
            Item itemFromServer = ServerProxy.getItem(urlBase + itemId);
            // Assign the fields; redeclaring them as locals would leave the cache empty.
            cachedItem = itemId;
            cachedPrice = itemFromServer.price;
            return cachedPrice;
        }
    }

    • But this can be improved further. 
    • What pattern of calls does the above example assume?
    • What bugs might this introduce?

Pseudo Code
// With even better caching
    // Java generics need the boxed types Integer and Float, not int and float.
    private final HashMap<Integer, Float> priceHash = new HashMap<>();
    public float getItemPrice(int itemId) {
        String urlBase = "http://vogueable.com/items/";
        if (priceHash.containsKey(itemId)) {
            return priceHash.get(itemId);
        } else {
            Item itemFromServer = ServerProxy.getItem(urlBase + itemId);
            float price = itemFromServer.price;
            priceHash.put(itemId, price);
            return price;
        }
    }

    • What bugs are hidden in this code?
    • What assumptions does this kind of caching make about the scaling problem?
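
Two issues the hash map version leaves open are that a cached price never expires and that the map grows without limit. One possible way to address both, sketched here under assumptions, is a bounded cache with a maximum age per entry. ExpiringPriceCache, CachedPrice, the injected clock, and the fetchFromServer function (standing in for the ServerProxy call) are all hypothetical names, and the size and age limits are values you would have to choose for your own app.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.function.LongSupplier;

// Hypothetical sketch: a price cache that forgets old entries and has a size limit.
final class ExpiringPriceCache {

    // A cached price together with the time it was fetched.
    private static final class CachedPrice {
        final float price;
        final long fetchedAtMillis;

        CachedPrice(float price, long fetchedAtMillis) {
            this.price = price;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final long maxAgeMillis;
    private final LongSupplier clock;                  // time source, in milliseconds
    private final IntFunction<Float> fetchFromServer;  // stands in for the server call
    private final Map<Integer, CachedPrice> entries;

    ExpiringPriceCache(int maxEntries, long maxAgeMillis,
                       LongSupplier clock, IntFunction<Float> fetchFromServer) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.fetchFromServer = fetchFromServer;
        // An access-ordered LinkedHashMap drops the least recently used entry
        // whenever the map grows beyond maxEntries.
        this.entries = new LinkedHashMap<Integer, CachedPrice>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, CachedPrice> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized float getItemPrice(int itemId) {
        long now = clock.getAsLong();
        CachedPrice cached = entries.get(itemId);
        if (cached != null && now - cached.fetchedAtMillis < maxAgeMillis) {
            return cached.price;                       // fresh enough: no server call
        }
        float price = fetchFromServer.apply(itemId);   // missing or stale: ask the server
        entries.put(itemId, new CachedPrice(price, now));
        return price;
    }
}
```

The trade-off is deliberate: a shorter maximum age means fresher prices but more load on the server, so the right value depends on how often prices really change.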

Techniques for the server: Caching

  • Next: Caching on the server
    • It might be that the server has to work too hard to respond to the request
    • We can use a similar technique on the server
      • For example, what if every time a user wants to see the total price of the items on their wishlist, the app asks the server to recalculate and return it?
    • Ask yourself: how often does a user ask for the total price of items on their wishlist?
  • Next: Caching in your network
    • "Key-Value" servers
    • Rely on REST URLs
    • Each time a given URL is requested (e.g. http://abc/def/1/def), the cache returns the same stored value without bothering the application server
    • Very popular one: Memcached
    • Also see: Memcached HowTo

Pseudo Code
function get_foo(foo_id)
    foo = memcached_get("foo:" . foo_id)
    return foo if defined foo

    foo = fetch_foo_from_database(foo_id)
    memcached_set("foo:" . foo_id, foo)
    return foo
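
The same look-up-then-fetch pattern can be sketched in Java for the wishlist total example. This is an illustration under assumptions: WishlistTotalCache and its methods are hypothetical names, a ConcurrentHashMap stands in for a key-value server such as Memcached (no real Memcached client API is used), and the loadPricesInCents function stands in for the database query.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical server-side cache for wishlist totals.
final class WishlistTotalCache {
    // Stand-in for a key-value store such as Memcached.
    private final Map<String, Long> store = new ConcurrentHashMap<>();
    // Stand-in for the database query that returns a user's item prices in cents.
    private final Function<String, List<Long>> loadPricesInCents;

    WishlistTotalCache(Function<String, List<Long>> loadPricesInCents) {
        this.loadPricesInCents = loadPricesInCents;
    }

    private static String keyFor(String userId) {
        return "wishlist_total:" + userId;
    }

    // Return the cached total if there is one; otherwise compute it and cache it.
    long getTotalCents(String userId) {
        Long cached = store.get(keyFor(userId));
        if (cached != null) {
            return cached;
        }
        long total = 0;
        for (long price : loadPricesInCents.apply(userId)) {
            total += price;
        }
        store.put(keyFor(userId), total);
        return total;
    }

    // Call whenever the user's wishlist changes, so the next read recomputes the total.
    void invalidate(String userId) {
        store.remove(keyFor(userId));
    }
}
```

The invalidate call is the part that is easy to forget: without it, a user who adds an item would keep seeing the old total until the cached value is removed some other way.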

  • Next: Caching in "the cloud"
    • Content Delivery Networks (wikipedia)
    • Intercept HTTP requests closer to the client and deliver cached results
    • Saves on network latency

Techniques for the server: Load Balancing

  • Spread the load across multiple servers
  • Scale up vs. Scale out
    • Scale up: bigger and bigger computers and disks
    • Scale out: lots and lots of regular-size servers, each with a special job
    • Note: Google, Amazon, Facebook, etc. all scale out!
  • 3-Tier Architecture
    • Load balancer
      • Web Tier
      • App Tier
      • DBMS Tier
  • Purpose specific architecture
    • Funnel requests, depending on what they are for
    • Problem: Session management

Techniques for the server: More advanced

  • Database partitioning and replication
  • Queueing
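
Both ideas can be illustrated in a few lines. This is only a sketch under assumptions: ScalingSketches, shardFor, and drain are hypothetical names, partitioning by user id is just one possible choice of key, and a production worker would normally run on its own thread rather than being drained on demand.

```java
import java.util.concurrent.BlockingQueue;

// Hypothetical sketches of partitioning and queueing.
final class ScalingSketches {

    // Partitioning: decide which database shard holds a given user's rows.
    // floorMod keeps the result in 0..shardCount-1 even for negative ids.
    static int shardFor(long userId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    // Queueing: requests are accepted immediately by adding a job to the queue;
    // this method does the actual work later, at the worker's own pace.
    // Returns the number of jobs processed.
    static int drain(BlockingQueue<Runnable> queue) {
        int processed = 0;
        Runnable job;
        while ((job = queue.poll()) != null) {
            job.run();
            processed++;
        }
        return processed;
    }
}
```

With four shards, user 10 lands on shard 2. A queue does not make the work faster; it lets the server absorb a burst of requests without making every client wait for the slow part.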


Summary

  • Don't optimize too early. Your assumptions are often wrong
  • If your mobile app is slow, it may be a client performance problem or an overall scalability problem
  • They are different; treat them differently