
Three Web Servers and a Database walked into a bar …

So, here’s the deal. A friend of mine posed this scenario (hypothetical or not, I don’t know and I didn’t ask!).

There is a site that has three Web Servers, all of them pointing to one Database. She said that the site wasn’t working and asked how I would help troubleshoot this.

Yes – clear as mud.

I asked if there was a Load Balancer diverting traffic; at first she said she did not know, but later changed it to “No.”

It sounded like that was immaterial.

So I suggested that it might be good to temporarily stop two Web Servers and allow the traffic to go through only one server. To this she asked how the customers would shop then.

Yes, I was surprised: since the site was not working, how would it matter if anyone shopped or not? But she clarified that she meant the site was slow!

Interesting twist!

Now, as I began to wonder aloud, I figured the issue might lie with one of these three:

Network

Application Code

Database

Let’s look at each of these:

Network

My next follow-up was to find out whether the three Web Servers were at the same location or in geographically diverse locations. The rationale is that if they are in geographically different locations and the issue happens regardless, then the network might not be the issue.
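One way to put numbers behind this is to time each Web Server directly and compare. The following is only a minimal sketch: the server names, the sample counts, and the "twice the fastest server" threshold are my assumptions, not anything my friend told me.

```python
import statistics
import time
from typing import Callable, Dict, List


def median_latency(fetch: Callable[[], object], samples: int = 5) -> float:
    """Call fetch() several times and return the median elapsed time in seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slow_servers(latencies: Dict[str, float], factor: float = 2.0) -> List[str]:
    """Return the servers whose latency is more than `factor` times the fastest one."""
    fastest = min(latencies.values())
    return sorted(name for name, value in latencies.items() if value > factor * fastest)


# Hypothetical measurements in seconds, one per Web Server.
measured = {"A": 0.20, "B": 0.21, "C": 0.90}
outliers = slow_servers(measured)
```

In a real check, `fetch` could be something like `lambda: urllib.request.urlopen(url).read()` pointed at each server in turn. If one server stands out, its location or network path is worth a look; if all three are equally slow, something they share (the code or the Database) looks more likely.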

Application Code

The common aspect to all three Web Servers would be the Application Code that runs the site. It might be worthwhile to see if the same issue is happening in other non-Production environments. If it is, that might suggest that the Application Code is a likely candidate.

Database

Since all Web Servers point to the same Database, a DB that is not properly indexed or is poorly designed might also cause the issues.
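To illustrate what "not properly indexed" looks like, here is a small sketch using SQLite purely as a stand-in, since I do not know which database the site uses. The `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for `sql` as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail column is the readable part.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the plan reports a full table scan.
plan_before = query_plan(conn, sql, (42,))

# After adding the index, the plan reports a search that uses it.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, and other databases have their own equivalents of this command, but the idea carries over: a scan on a large, frequently queried table is a good place to start looking.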

As a Technical Program Manager, I have faced all kinds of issues where there is not a lot of information available. The situation is riddled with ambiguity, which poses an additional challenge. Sometimes you do not get “closure.” The issue might occur suddenly and then go away; this does not happen often, but I have seen it happen.

Coming back to my friend, she and I discussed these and debated for some time. She is still looking for some answers.

How would you go about identifying the root cause in the above problem to help my friend? ♦

4 thoughts on “Three Web Servers and a Database walked into a bar …”

    1. Good question – neither did I ask nor was I told about it. Might be inconsequential in the larger scheme of things.


  1. I would approach it this way:
    Assumptions:
    The Web Servers are also Application servers
    Given
    We have 3 Web Servers

    Observation:
    1. The configuration itself does not seem production grade; i.e. it would not be able to handle heavy traffic. The reasons could be that it's a lightweight application, an application that is not mission critical, financial constraints, or a combination of these.

    2. Since the site is working, but is slow, the problem is not functional. It is non-functional: a performance issue.

    Approach:
    Since the application is slow anyway, the approach is to divide and conquer.
    Let's say the Web Servers are A, B and C and the database server is D.
    First we focus on the webservers, using the starvation technique.
    Decide an order (A, then B and then C) for the webservers.
    Stop all requests on A. When the traffic on A becomes zero, disable its exposure to the public domain.
    Observe the average response time of the application. If it decreases, then server A is the issue. If not, run some load tests to see if there is performance deterioration; if there is, then it's either the code or the Database.
    Once work is done on one server, with the incoming traffic on the server still turned off, add the server back to the public domain. Then turn on the incoming traffic on it.
    Repeat this for each server.
    Code Performance – If performance logs are present then turn them on, or else insert timer logs in the code to measure the performance of the various parts of the code.
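The timer logs mentioned above could look something like the sketch below. It is only an illustration: the `perf` logger name and the `load_product_page` handler are made up, and a real site would wrap its own functions.

```python
import functools
import logging
import time

logger = logging.getLogger("perf")


def timed(func):
    """Log how long each call to `func` takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper


@timed
def load_product_page(product_id):
    # Hypothetical handler; stands in for a real part of the site.
    time.sleep(0.01)
    return {"product_id": product_id}


page = load_product_page(7)
```

Wrapping the main request handlers and the database calls this way should show which part of a slow request takes the time.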

    Run a query plan on the DB to see how it's running after every transaction.

    This should surface the problem for you to fix.
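The decision step of the starvation technique above can be sketched as follows. It assumes you have collected response times (in milliseconds) with all servers live and again with each server drained in turn; the sample numbers and the 20% improvement threshold are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def suspect_server(
    baseline_ms: List[float],
    drained_ms: Dict[str, List[float]],
    improvement: float = 0.2,
) -> Optional[str]:
    """Return the server whose removal improved the average response time the most.

    Returns None if no removal improved the average by at least `improvement`
    (a fraction of the baseline), which points at the code or the Database.
    """
    baseline = mean(baseline_ms)
    best_name, best_gain = None, 0.0
    for name, samples in drained_ms.items():
        gain = (baseline - mean(samples)) / baseline
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name if best_gain >= improvement else None


# Hypothetical samples: response times with all servers live, then with each one drained.
all_live = [900, 1000, 1100]
with_one_drained = {
    "A": [950, 1000],
    "B": [400, 500],
    "C": [990, 1010],
}
suspect = suspect_server(all_live, with_one_drained)
```

With these made-up numbers, draining B is the only change that clearly helps, so B would be the server to examine first. If no server stands out, the function returns `None` and attention moves to the code and the Database.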

