Disaster Recovery
Dec 20

Bad Things CAN Happen

I was conversing with a colleague of mine who was working with some Oracle DBAs who were deciding to abandon Oracle’s Recovery Manager and replace it with a 3rd party disk-imaging ‘backup’ solution. Not augment RMAN, but replace it entirely.

I was really surprised. Really, REALLY surprised!

After mulling over all the concerns, I put together some items you may want to consider before heading down this path:

  • Are you operating in ARCHIVELOG mode? If you are not, YOU WILL LOSE DATA.
  • If you are in ARCHIVELOG mode – what happens to the old archivelogs? Deleting them before the next RMAN level 0 backup renders the ones you keep useless (except for log mining).
  • If you are in NOARCHIVELOG mode, how far back can you troubleshoot unauthorized data modification or application error? How quickly do your redo logs switch? – Multiply that by the number of groups you have, and you have your answer.
  • How do you address block corruption (logical AND physical) without RMAN? With a RMAN-based DR solution, block recovery takes ONE command. No data loss, no downtime. If you take a snapshot using 3rd party tools – Your backups now have that same block corruption. Where do you go from there?
  • If disk space is an issue, do you use the AS COMPRESSED BACKUPSET argument to reduce backup size? Do you pack the archivelogs into daily level 1 backups? I’ve found ways to optimize our Oracle RMAN backups so we can cover 2 weeks with the same disk space that used to cover 2 days.
  • How do you monitor for block corruption? (Waiting for something to break is not valid instrumentation) I check for block corruption automatically, every day, by using RMAN and building it into my daily database backup scripts.
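To make the last two points concrete, the corruption check can be folded into the nightly backup itself. A minimal sketch of the idea (adapt the options to your own environment):

RMAN> BACKUP AS COMPRESSED BACKUPSET
      CHECK LOGICAL
      INCREMENTAL LEVEL 0
      DATABASE PLUS ARCHIVELOG;

CHECK LOGICAL makes RMAN verify each block for logical corruption as it reads it, recording anything it finds in V$DATABASE_BLOCK_CORRUPTION. And if corruption does turn up, the one-command block recovery mentioned above looks like this:

RMAN> BLOCKRECOVER CORRUPTION LIST;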

NOTE: Logical corruption happens. Even on a SAN, even on a VM. VMs can crash, power can be lost. I’ve experienced 2 incidents of block corruption in the past quarter. Of course, since I built the Disaster Recovery system around RMAN, we caught the corruption the next day and fixed it with ZERO downtime and ZERO data loss.

RMAN enables Point-in-Time Recovery (PITR) - ALL disk-imaging backup solutions lack this capability. If you are relying solely on a snapshot backup, you will lose all the data since the last snapshot.

Without tablespace PITR, you have to roll ALL the data in the database back. If you have multiple instances and are using a server snapshot with no RMAN, ALL the databases on that server will lose data! This is usually not acceptable.
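For comparison, a database point-in-time recovery with RMAN takes only a few commands. A sketch, with a hypothetical target timestamp:

RMAN> RUN {
        SET UNTIL TIME "to_date('2008-12-19 14:00','YYYY-MM-DD HH24:MI')";
        RESTORE DATABASE;
        RECOVER DATABASE;
      }
RMAN> ALTER DATABASE OPEN RESETLOGS;

And with tablespace PITR (RECOVER TABLESPACE ... UNTIL TIME ...), a single tablespace can be rolled back while the rest of the database stays current.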

Lastly, how much testing have you done with the snapshot solution? REAL testing. Have you taken a snapshot during continuous data change? We tried snapshotting the database server using 3 different pieces of software. NONE of them reliably took a consistent, usable snapshot of the database. Sometimes one did - if we were lucky, and the DB was quiet. Is it acceptable to only sometimes get your client’s or company’s data restored?

Remember, the key is a multi-layered DR strategy (disk imaging and snapshotting IN CONJUNCTION with RMAN is incredibly effective!) and continuous REAL WORLD testing.

As a parting shot, in case you were wondering: the ‘DBAs’ had decided to rely solely on a disk-imaging backup solution not because they felt it had more to offer, or because it had been tested and shown to be more effective, but because they felt RMAN was difficult to use…

Brian Fedorko

Nov 15

“GUIs are for look’n, the Command Line is for Doin’” – that is some of the best mentoring advice I have received, or could give, as a data storage professional, and it is true to this day!

GUIs (Graphical User Interfaces) have made enterprise-class databases much more accessible, and have made viewing data and corralling vital stats wonderfully pleasant and simple. MySQL Enterprise Monitor and Oracle Enterprise Manager include some excellent, time-saving ‘advisers’ that simplify tuning tasks as well. They have come a long way, and their utility is undeniable.

But, as data storage professionals, we are expected to be able to restore and return the system to operational capacity when things go badly. Usually, this is where we need the skills to ‘pop open the hood’.

Just as a good drummer behind their kit should be able to do with their feet whatever they can do with their hands, a good DBA should be able to perform any GUI action at the command line as well. This is critically important because:

  • The GUI exposes only a subset of the CLI’s capabilities, utilities, and tools
  • The GUI is a separate piece of software, often with additional dependencies, that can break while leaving the database up and available.

Remember, of all the duties a DBA is asked to perform, there is one that we must do correctly and effectively EVERY time - Data Recovery. Data loss is absolutely unacceptable. So, you must honestly ask yourself: if the database goes down, the GUI is unusable, and the data must be recovered, can I do it at the command line? If not, developing that skill set should be your immediate focus. If you cannot recover your company’s or client’s data because you couldn’t ‘point n’ click‘ your way through the process, your company can lose a fortune – and it will, most likely, cost you your job!
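As a self-check, the core of a command-line restore is shorter than many people fear. A minimal RMAN sketch, assuming your backups and control files are intact:

RMAN> CONNECT TARGET /
RMAN> STARTUP MOUNT;
RMAN> RESTORE DATABASE;
RMAN> RECOVER DATABASE;
RMAN> ALTER DATABASE OPEN;

Practice it on a test instance until it is muscle memory; real recoveries are never this tidy, but every one of them starts here.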

Oracle Enterprise Manager is a great example. It is extremely useful, but in my experience, extremely delicate. It cannot withstand being cloned or moved to a different server, and it can break with any ungraceful handling of its repository, inside the database. Chances are, if the database is in dire straits, EM will not be there.

Will you be ready?

Brian Fedorko

Jun 06

So you had a bad day...

I’ve always advised my clients: If you choose to outsource your Disaster Recovery (DR), or any other integral, data-drenched portion of your IT domain, you should deposit all the savings directly into a high-yield fund. That fund will be crucial when you have to deal with the litigation, remediation, and PR nightmare that accompanies your customers’ lost and compromised personal data.

Some 3rd party DR providers and Software as a Service (SaaS) vendors tout big savings…

But do you know who is working for them?
What audit records are kept about your data?
Who really has access to them?
Who are they accredited to?
How often is the site security reviewed?
How quickly can they detect an intrusion?
Can they detect data theft by an insider?

The list goes on. All of these WILL affect the total cost of utilizing this type of solution.

But why try to minimize the drain on funds a DR site represents when you can move it in-house and turn it into a revenue generator? I came across this story which details a savvy company that did just that and stands to save $750,000 PER YEAR.

Here are some ideas to turn your liability into a ROI generator:

  • DR Sites do not have to be across the country! 100-150 miles will put you on a different local power grid, and save your company thousands in travel and per diem alone.
  • You can utilize your in-house personnel and corporate knowledge to make informed decisions on maintenance!
  • You can use resources made obsolete during capital replacement, slashing stand-up costs
  • By virtualizing, you can host many Virtual Machines with less hardware.
  • The brightest CIOs will utilize their disaster recovery site for: real-time replication of data AND applications, testing and development, and production load balancing.

All this, and your company retains sole strategic control over the operation, run by employees who have a stake in your success. And that is priceless.

More to come…

Brian Fedorko

May 31

Like responsibility, it grows!

The goldfish always grows to the size of the bowl. If you’re a DBA goldfish, you’ll probably script out repetitive tasks until the bowl gets bigger. And then they feed you more databases from various business areas, and you grow some more. How is that for a strained analogy?

Any Oracle DBA has been there - after your initial herd of databases is stable, happy, and well-fed, people notice. And then you reap the true reward of good work: more work! Unfortunately, this is usually when someone fishes up a stove-piped database that has become very important internally. You know, the one put together by someone who left 2 years ago. No Critical Patch Updates, one or two control files, and the telling 5Mb redo logs that switch every 10 seconds. But you gladly take it in anyway…

A bit of work and now the database is chugging along like a champ! Tuned, optimized, mirrored, multiplexed, in ARCHIVELOG mode, and integrated into your RMAN backup scripting.
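For reference, the hardening steps above look roughly like this at the command line (the file paths here are hypothetical, not from any real system):

SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT
SQL> ALTER DATABASE ARCHIVELOG;
SQL> ALTER DATABASE OPEN;
SQL> -- multiplex the redo logs:
SQL> ALTER DATABASE ADD LOGFILE MEMBER 'K:\ORADATA\REDO01B.LOG' TO GROUP 1;
SQL> -- mirror the control files (takes effect on the next restart):
SQL> ALTER SYSTEM SET control_files='J:\ORADATA\CONTROL01.CTL','K:\ORADATA\CONTROL02.CTL' SCOPE=SPFILE;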

Everything seems fine, but is it?

Surely you could easily and successfully recover if you had to this very minute, right?

Maybe.

Is logging of all operations enforced on this database, or at least in the user’s tablespace? Use the following to find out:

select FORCE_LOGGING from V$DATABASE;
select TABLESPACE_NAME, FORCE_LOGGING from DBA_TABLESPACES;

If forced logging is not, or cannot be, applied to the database, there is a risk that NOLOGGING operations may have been performed on the database’s objects. Common operations run under NOLOGGING are index builds and rebuilds, direct-path inserts, direct loads with SQL*Loader, and partition manipulation. Once a NOLOGGING operation has been performed, we cannot roll forward past that change in that tablespace! If the tablespace only contains indexes, we’ll suffer downtime while the indexes rebuild and return the database to a reasonable level of performance. If it contains objects holding data, we risk losing every transaction since the NOLOGGING operation.
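If that risk is unacceptable, forced logging can be switched on at either level; the fix itself is a one-liner (the tablespace name below is illustrative):

SQL> ALTER DATABASE FORCE LOGGING;
SQL> ALTER TABLESPACE users FORCE LOGGING;

Be aware that forcing logging will slow the bulk operations that relied on NOLOGGING, so coordinate with the application owners first.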

A good first line of defense is to include REPORT UNRECOVERABLE in your RMAN backup scripts and stay on top of the logs - or test for the expected return and pipe the results to your dashboard or monitoring software, like Big Brother by Quest. This will catch all manner of problems before they become critical:

RMAN> report unrecoverable;
Report of files that need backup due to unrecoverable operations
File Type of Backup Required Name
---- ----------------------- -----------------------------------
4    full or incremental     X:\ORADATA\DATA01\TESTDB\TEST01.DBF

Here’s a quick script I wrote to find when the last NOLOGGING operation occurred (Note: Output has been edited for page fit):

set LINESIZE 120
set PAGESIZE 40
DEFINE LINE1= 'LAST NON-LOGGED OPERATIONS'
DEFINE LINE2= 'Check the Change Numbers and times against your backups to determine'
DEFINE LINE3= 'if non-logged operations have occurred'
TTITLE Skip 3 CENTER LINE1 SKIP 2 LINE2 SKIP 1 LINE3 SKIP 2
BTITLE CENTER "BFBlog.TheDatabaseShop.com"
COLUMN DBF_NAME FORMAT A40 WORD_WRAPPED
COLUMN TS_NAME FORMAT A15 WORD_WRAPPED
select  d.NAME as DBF_NAME,
t.NAME as TS_NAME,
d.UNRECOVERABLE_CHANGE# as NOLOG_CHNG#,
to_char(d.UNRECOVERABLE_TIME, 'Dy DD-Mon-YYYY HH24:MI:SS') as NOLOG_TIME
from V$DATAFILE d join V$TABLESPACE t
on d.TS# = t.TS#
order by t.NAME;

Output:

LAST NON-LOGGED OPERATIONS

Check the Change Numbers and times against your backups to determine
if non-logged operations have occurred

DBF_NAME             TS_NAME   NOLOG_CHNG# NOLOG_TIME
-------------------- --------- ----------- ------------------------
J:\...\SYSTEM01.DBF  SYSTEM    0
J:\...\UNDOTBS01.DBF UNDOTBS1  0
J:\...\SYSAUX01.DBF  SYSAUX    0
J:\...\TEST01.DBF    TEST      6271597     Tue 02-Jun-2008 18:30:46
J:\...\USERS01.DBF   USERS     0

After that, just make sure your last Level 0 backup is newer than the times listed, and be aware that Point-in-Time Recovery will be limited: you cannot recover to any point between a NOLOGGING operation and the next Level 0 backup.
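Better still, take a fresh backup of the affected files as soon as a non-logged operation is detected, which closes that unrecoverable window. Using the datafile number from the REPORT UNRECOVERABLE output (file 4 here is just the example from that report):

RMAN> BACKUP INCREMENTAL LEVEL 0 DATAFILE 4;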

Be sure to set up lines of communication and coordination in the future, so the risk of not being able to recover the entire database to the last transaction is reduced.

Brian Fedorko