My Secret Life as a Spaghetti Coder
In an effort to start a discussion about scaling (and hopefully diffuse some information on it), last week I told the story of Origin Shabamtech and Gulfomatic's Solutioneers, which left an open question: how, according to the myth, did Gulfomatic solve Shabamtech's scaling woes?

After telling the story, I asked,
I want to figure out the mystery. Do you have any ideas? How would you determine what's causing the site to crash? What might you look at? What might you do to fix it?
A typical infrastructure diagram, from Force10 Networks.

Let's recount what we know about the situation:
  • The site was making a ton of money.
  • There was an application running it.
  • There was no source code for the application, and no vendor to contact to get it.
  • The application had to access a database by knowing the DB file location.
  • Gulfomatic's Solutioneers tried several potential solutions before finding the one that worked.
That's all we know initially. What does that tell us?

  • Our solution needs to be implemented quickly, and we have money to spend if we need to.
  • Even if the application itself is the cause of our problems, we can't change it because we lack the source code, so our solution has to be done outside of the app itself.
  • Whatever solution we come up with has to take into account the fact that we're using a file path to access a database.
  • We can work methodically, trying the simplest solutions first and revising them as more information presents itself.
But where do we go from here? The answer came from shag in the comments to the original post:
any clue as to what we're working with here? what kind of app are we talking about? what are the pieces in the puzzle that makes it go? what are the symptoms? what kind of access do we have? what kind of layers do we have? what kind of hardware do we have? what kind of os do we have?

i think we need to address what we are dealing with prior to the how to resolve the problem.

that being said, from a high level (and i mean like jupiter), some basic things are:

- identify symptoms
- scour logs
We start by identifying the symptoms. In this case, the application is crashing. What are the potential bottlenecks in the system that might cause it to crash? Are we lacking bandwidth? Is the web server crawling to a halt? Is the application itself using too many resources? Can the DB handle the load being thrown its way? Maybe the combination of these things is just causing the computer to crash.

Reading the logs and monitoring the different processes will probably give you an idea of which one is the culprit.

Check the application logs if it has them. Check the web server and DB logs. Check your OS logs. Just like in programming, you cannot just make changes in random places to improve performance. You need to analyze the system's behavior to find the bottleneck, which will tell you where your changes will be most effective.
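To make the log-scouring step a little more concrete, here is a minimal sketch in Python. It assumes a log line shape of "date time LEVEL message", which is my invention for illustration; the story never tells us what the real logs look like, and the sample lines are made up. The idea is just to bucket error-level lines by minute so that a spike before a crash stands out.

```python
import re
from collections import Counter

# Assumed (hypothetical) line shape: "2008-11-12 21:40:03 ERROR message..."
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} (\w+) ")

def errors_per_minute(lines, levels=("ERROR", "FATAL")):
    """Count lines at the given levels, bucketed by minute."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group(2) in levels:
            counts[match.group(1)] += 1
    return counts

def busiest_minutes(counts, top=3):
    """Return the minutes with the most errors, worst first."""
    return counts.most_common(top)

# Made-up sample lines, only to show the shape of the input.
sample = [
    "2008-11-12 21:39:58 INFO request served",
    "2008-11-12 21:40:03 ERROR db file locked",
    "2008-11-12 21:40:07 ERROR db file locked",
    "2008-11-12 21:40:41 FATAL out of memory",
    "2008-11-12 21:41:02 ERROR db file locked",
]
counts = errors_per_minute(sample)
print(busiest_minutes(counts))
```

Run the same function over the web server, DB, and OS logs and compare the busiest minutes. If they line up, you have a time window worth a closer look.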

That gives us several potential bottlenecks:
  • Insufficient computational ability in processor speed, disk space / speed, or memory. Completely a hardware problem.
  • The web server + database + application combined are just too much for the current hardware to handle.
  • The database itself just cannot handle the number of requests being sent its way.
  • The application (which lacks source code!) is the bottleneck. It hogs processor cycles and memory like a squirrel hoards nuts.
(What did I miss?)
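One rough way to tell these scenarios apart is to look at how much of the machine each process is using when things go south. The Python sketch below is only that, a sketch: the process names, the numbers, and both thresholds (90 percent for "saturated", 60 percent for "dominant") are assumptions I picked for illustration, not measurements from the story.

```python
def classify(usage, saturated=90, dominant=0.6):
    """Guess which bottleneck a snapshot of per-process usage points to.

    usage maps a process name to the percent of the machine's capacity
    (CPU or memory, say) that the process is using. Thresholds are arbitrary.
    """
    total = sum(usage.values())
    if total < saturated:
        return "machine not saturated; look elsewhere"
    name, share = max(usage.items(), key=lambda item: item[1])
    if share / total >= dominant:
        return f"{name} is the likely bottleneck"
    return "combined load is too much for this hardware"

# Hypothetical snapshots taken shortly before a crash.
print(classify({"web server": 10, "database": 15, "application": 70}))
print(classify({"web server": 30, "database": 35, "application": 30}))
print(classify({"web server": 10, "database": 10, "application": 10}))
```

A single snapshot can mislead, so in practice you would want several samples over time before trusting the answer.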

Given what we know about the situation, what would you do in each scenario?



Comments

Do we know anything more than we did a couple of days ago? I suppose I will have to answer some of my own questions.

We now know that our db connection is file based. This would lean towards foxpro, access, sql server, sqlite, derby, txt, xml....

We know there is a web interface.

We know we are dealing with compiled code (due to lack of source code). That leans towards a .net or jsp flavor.

I think this info is important, as the approach to the solution may be different for each (besides my knowledge of each is different).

Hardware:
I would think this is the easiest. You should be able to add to the hardware in the current box. In many cases, you should also be able to copy it to another box and test it as an option.

One box to many:
This could be tricky. Where does the db reside? Is it on the same disk as the OS? Is there a configuration setting that would allow me to change the location? If we can change the drive, or it is not on the OS drive, we should be able to add another box and place the db on it. We can access the disk via an ssh drive or through a hardware connection such as fiber channel. We should be able to even split off the web service and the backend process.

DB handling:
This could be tricky. Is it the DB, or the DB configuration? Could it be an index issue? Perhaps we could migrate from Access to SQL Server. Perhaps we could add a layer and let the file based call be a proxy to a new db. Either my knowledge is limited, or there are too many unknowns to be able to address this.

Application:
Again, this depends on the language/type of the application. There are different tools for different code. I am certain there is a decompiler for each though. I would absolutely use this if necessary to get under the hood. I would first look for application logs. I would turn to a performance monitor to find out whether it is network calls or a certain thread spawning off. I would also want to peek at the db, to see what type of activities were going on at the time it ran out of control.

As I said before, Google is our friend when it comes to problems. I tend to search for every piece of information that has a clue. It may be something I know a lot about, but I always find there are things I didn't know (which can lead to an aha moment). The things I didn't know from the start usually lead me down several other threads... if nothing else, I add to my toolbox a lot of knowledge that I would not have researched otherwise.

I was hoping someone else would comment and I could see what I was missing. You are going to have to be a little less ambiguous for me. I do better with flashing neon lights.... I know...

Posted by shag on Nov 12, 2008 at 09:40 PM UTC - 6 hrs

This is a somewhat mysterious blog post (ambiguous) and in my opinion shag hit the nail on the head, "scour the logs". My approach on these sorts of issues is to work from the inside out, look at existing logs, create enhanced logging - metrics logging - verbose GC logging - scour those logs. Use SeeFusion, wrap the JDBC driver (there may not be one in this case), capture the SeeFusion data to a database, scour those logs and database output. The devil is in the details!

Posted by Mike Brunt on Nov 12, 2008 at 11:02 PM UTC - 6 hrs

@shag - Your assessment is basically what I'm looking for.

@shag and Mike - Regarding the ambiguity: While the story grew out of a real life situation, I'm trying to keep it general to illuminate basic strategies and refine them as we go along.

Hope you guys don't mind too much. =)

I'll throw some more questions out next week.

Posted by Sammy Larbi on Nov 13, 2008 at 05:32 PM UTC - 6 hrs

@sammy, i'm going to keep my eye on you... i failed my own procedure... i should have googled. your fake name gave it away... once i finally looked it up.

the sad thing is i can't say, "people still use that?" i just co-inherited an app that is completely done in the all in one db.

on top of everything else, i must have been asleep at the wheel... i completely missed the dll reference and the reference to the point-straight-to-the-file-non-ODBC database. i guess many of my questions didn't make sense at that point. although not knowing anything, those were the questions i would ask.

wow... VulpesPro makes someone a lot of money... even today...

Posted by shag on Nov 13, 2008 at 07:36 PM UTC - 6 hrs
