Incidents logged against Oracle & SQL database systems. Part 3

Jon Cowling 05-Feb-2016 14:33:04

Here is the third blog in a five-part series written by dsp Managed Services Practice Head Marcus Cowles. In this series, Marcus discusses the top incidents logged against database systems.

Carrying on from the previous blogs, here are our top 5 incidents:

  1. Database maintenance
  2. Backups
  3. Performance
  4. Connectivity
  5. Patching

This episode focuses on Performance.

Performance is something of a misnomer in IT. Users and administrators alike often complain about ‘performance issues’ without specifying exactly what their problem is. Part of the problem may lie in a misunderstanding of the word itself: ‘performance’ on its own gives no indication of a positive or negative experience, so it always has to be qualified, as in ‘the performance of the system is good’ or ‘performance is bad’. A very common mistake is to say that an environment “is not performant”, but unfortunately that word is not standard English (there is a French word, but it means ‘effective’).

But somehow, when someone says “performance is a problem”, we all seem to grasp what they are talking about, even though the sentence itself conveys almost nothing. Semantics? You might think so, but consider it in a support context, where time is money, productivity is important and clear communication of an issue is paramount before any resolution action can be undertaken. The last thing you want during a P1 issue is a misunderstanding between two people with a poor command of the language.

So, what is performance? What makes it good, and how can you tell if it is bad?

Performance is subjective, and as such it should always be compared to a baseline to establish whether it is good or bad. So the first step in any performance investigation or analysis is to establish the following (an example of querying a baseline is sketched after this list):

- What is the expected performance and how is that measured?
- What has the performance been in the past and is it being measured by the same metrics now?
- What is the impact of the current performance?
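
To make the baseline idea concrete, here is a minimal sketch of how historical figures might be pulled from an Oracle database (the equivalent exercise on SQL Server would use its own dynamic management views). It assumes the AWR history views, which require the Diagnostics Pack licence, and uses the ‘SQL Service Response Time’ metric purely as an example:

```sql
-- A week of hourly averages for one response-time metric, so that today's
-- figures can be set against an established baseline rather than a feeling.
SELECT TRUNC(begin_time, 'HH24')  AS sample_hour,
       ROUND(AVG(average), 2)     AS avg_value,
       ROUND(MAX(maxval), 2)      AS peak_value,
       MAX(metric_unit)           AS unit
FROM   dba_hist_sysmetric_summary
WHERE  metric_name = 'SQL Service Response Time'   -- example metric only
AND    begin_time  > SYSDATE - 7
GROUP  BY TRUNC(begin_time, 'HH24')
ORDER  BY sample_hour;
```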

Once the above have been established, a meaningful plan can be put together. Theoretically, every system could be made faster than it currently is, but the nature of IT means that if you were to fine-tune a system to an infinite degree, a new piece of technology or protocol would have been developed by the time the tuning was complete, and the investigation would have to start all over again to accommodate it. There is a realistic limit to how far we can tune a system or process, and because of this it is vitally important to set goals before any investigation commences; otherwise we will be stuck in an infinite loop of performance tuning with no idea of when to stop!

Realistically, how can we help to improve and maintain performance? There are many options in the database world and many different components that can be tuned to work together more effectively, but a few simple guidelines can be followed to help maintain good performance:

Correct configuration - Follow best practices
- Industry best practices have been developed through a combination of logic, evolution and experimentation; there is a reason they exist and a reason people respect them, so they should be implemented or followed wherever possible

Control and identify change
- ‘Nothing changes without something changing’ – philosophical rubbish? Well, in an IT environment a change in behaviour, whether positive or negative, is brought about by a change in something else. So if performance has dropped to some degree, the first question should be “what has changed since performance was acceptable?”
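
One practical way to answer that question on an Oracle system is to compare the initialisation parameters captured in two AWR snapshots (again Diagnostics Pack territory; the snapshot IDs below are placeholders, taken in practice from DBA_HIST_SNAPSHOT):

```sql
-- Parameters whose value differs between a known-good snapshot and one
-- taken after the complaint: a quick, factual answer to "what changed?".
SELECT old.parameter_name,
       old.value AS old_value,
       new.value AS new_value
FROM   dba_hist_parameter old
JOIN   dba_hist_parameter new
       ON  new.parameter_name  = old.parameter_name
       AND new.instance_number = old.instance_number
WHERE  old.snap_id = 100    -- placeholder: snapshot while performance was fine
AND    new.snap_id = 200    -- placeholder: snapshot after the complaint
AND    old.value  <> new.value;
```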

Analyse - Compare behaviour
- Performance problems can occur periodically, perhaps triggered by regular, expected events such as a backup or batch process (which counts as a change, because the backup is not always running). If the information is available, the behaviour of the system should be compared to a time when performance was known to be acceptable; this leads to knowledge-based investigation and avoids the biggest pitfall of all – anecdotal analysis.
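
Oracle ships a ready-made tool for exactly this comparison: the AWR ‘compare periods’ report, which sets two snapshot ranges side by side. A minimal sketch, in which the DBID, instance number and snapshot IDs are placeholders (DBA_HIST_SNAPSHOT lists the real ones):

```sql
-- Compare a known-good window with the problem window in one report.
-- Equivalent to running $ORACLE_HOME/rdbms/admin/awrddrpt.sql interactively.
SELECT output
FROM   TABLE(DBMS_WORKLOAD_REPOSITORY.awr_diff_report_text(
           1234567890, 1, 100, 101,    -- placeholders: baseline DBID/inst/snaps
           1234567890, 1, 200, 201));  -- placeholders: problem window
```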

Facts - Avoid anecdotal analysis
- Everyone knows what anecdotal analysis is, even if they do not realise it. It sounds like this: “the system seems slow!”. By itself that sentence is useless to a technical team looking to maintain good performance – where do they look? What action do they take? Do they just go changing things at random? None of the above: the action taken must be controlled and targeted at the area of concern. For example, the same user could report “I’m running a budget report which normally takes 5 minutes, but it’s been running for 2 hours.” Now we have something at which to direct our investigation, and the user has also provided a benchmark of acceptable performance, so we can proceed confidently with a measurable objective (in this case, to get the same report running in 5 minutes again).
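
With a specific complaint like that, the investigation can be targeted immediately. A hedged sketch on Oracle: find the user’s active session and the statement it is currently running (the account name REPORT_USER is a placeholder for the affected user):

```sql
-- The slow report's session and the SQL it is executing right now,
-- rather than guessing and changing things at random.
SELECT s.sid, s.serial#, s.username, s.status,
       s.last_call_et  AS seconds_in_current_call,
       q.sql_id, q.sql_text
FROM   v$session s
JOIN   v$sql     q ON  q.sql_id       = s.sql_id
                   AND q.child_number = s.sql_child_number
WHERE  s.username = 'REPORT_USER'   -- placeholder for the affected user
AND    s.status   = 'ACTIVE';
```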

Monitoring - Early warning of potential performance issues
- As with all technical systems, there are certain hot-spots which should be constantly monitored to ensure that everything is running properly. Think of your car: you wouldn’t omit the fuel gauge and still expect to maintain good performance of your vehicle! In that scenario you would run out of fuel, investigate the situation and then refuel the car once that was established to be the cause. Far more sensible is to identify pro-actively that the fuel is getting low, recognise that it presents a risk (but not yet an actual issue) and resolve the problem before it happens. Again, we all do this in everyday life. Actively monitoring systems in the same way can stop performance issues before they start, and most monitoring solutions can also summarise or aggregate the data they collect, providing useful trend analysis that helps compare behaviour now with a time when it was acceptable.
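
As a simple sketch of the ‘fuel gauge’ idea on an Oracle database: flag any tablespace that is over 90% full before it becomes an outage (this deliberately ignores autoextend datafiles to keep the example short; a production check would account for them):

```sql
-- Tablespaces running low on space: a risk today, an incident tomorrow.
SELECT d.tablespace_name,
       ROUND(100 * (1 - f.free_bytes / d.total_bytes), 1) AS pct_used
FROM   (SELECT tablespace_name, SUM(bytes) AS total_bytes
        FROM   dba_data_files
        GROUP  BY tablespace_name) d
JOIN   (SELECT tablespace_name, SUM(bytes) AS free_bytes
        FROM   dba_free_space
        GROUP  BY tablespace_name) f
       ON f.tablespace_name = d.tablespace_name
WHERE  f.free_bytes / d.total_bytes < 0.10   -- less than 10% free
ORDER  BY pct_used DESC;
```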

So how can we tackle ‘performance issues’? We are certainly not unarmed, but there are several pitfalls to avoid, and processes (such as asking users to back up their issue with some facts) that can make the investigation far more worthwhile.