Saturday 14 September 2013

QUALITY STAGE IN DATASTAGE



Data Quality Challenges

  •  Different or inconsistent standards in structure, format or values
  • Missing data, default values
  • Spelling errors, data in wrong fields
  • Buried information
  • Data myopia
  • Data anomalies


Quality Stage

Quality Stage is a tool intended to deliver high quality data required for success in a
range of enterprise initiatives including business intelligence, legacy consolidation and
master data management. It does this primarily by identifying components of data that
may be in columns or free format, standardizing the values and formats of those data,
using the standardized results and other generated values to determine likely duplicate
records, and building a “best of breed” record out of these sets of potential duplicates.
Through its intuitive user interface, Quality Stage substantially reduces the time and cost of
implementing Customer Relationship Management (CRM), data warehouse/business
intelligence (BI), data governance, and other strategic IT initiatives, and maximizes their
return on investment by ensuring data quality.

With Quality Stage it is possible, for example, to construct consolidated customer and
household views, enabling more effective cross-selling, up-selling, and customer
retention, and to help to improve customer support and service, for example by
identifying a company's most profitable customers. The cleansed data provided by Quality Stage allows creation of business intelligence on individuals and organizations for research, fraud detection, and planning.

Out of the box Quality Stage provides for cleansing of name and address data and some
related types of data such as email addresses, tax IDs and so on. However, Quality Stage
is fully customizable to be able to cleanse any kind of classifiable data, such as
infrastructure, inventory, health data, and so on.

Quality Stage Heritage

The product now called Quality Stage has its origins in a product called INTEGRITY from a
company called Vality. Vality was acquired by Ascential Software in 2003 and the
product renamed to Quality Stage. This first version of Quality Stage reflected its
heritage (for example it only had batch mode operation) and, indeed, its mainframe
antecedents (for example file name components limited to eight characters).
Ascential did not do much with the inner workings of Quality Stage which was, after all,
already a mature product. Ascential’s emphasis was to provide two new modes of
operation for Quality Stage. One was a “plug-in” for Data Stage that allowed data
cleansing/standardization to be performed (by Quality Stage jobs) as part of an ETL data
flow. The other was to provide for Quality Stage to use the parallel execution technology
(Orchestrate) that Ascential had as a result of its acquisition of Torrent Systems in 2001.
IBM acquired Ascential Software at the end of 2005. Since then the main direction has
been to put together a suite of products that share metadata transparently and share a
common set of services for such things as security, metadata delivery, reporting, and so
on. In the particular case of Quality Stage, it now shares a common Designer client with
Data Stage: from version 8.0 onwards Quality Stage jobs run as, or as part of, Data Stage
jobs, at least in the parallel execution environment.
QualityStage Functionality

Four tasks are performed by QualityStage; they are investigation, standardization,
matching and survivorship. We need to look at each of these in turn. Under the covers
QualityStage incorporates a set of probabilistic matching algorithms that can find
potential duplicates in data despite variations in spelling, numeric or date values, use of
non-standard forms, and various other obstacles to performing the same tasks using
deterministic methods. For example, if you have what appears to be the same
employee record where the name is the same but date of hire differs by a day or two, a
deterministic algorithm would show two different employees whereas a probabilistic
algorithm would show the potential duplicate.
(Deterministic means “absolute” in this sense; either something is equal or it is not.
Probabilistic leaves room for some degree of uncertainty; a value is close enough to be
considered equal. Needless to say, the degree of uncertainty used within QualityStage
is configurable by the designer.)
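As a rough illustration of the difference (a minimal sketch in Python, not QualityStage's actual algorithm; the field names, the three-day tolerance and the records are invented for the example), a deterministic comparison is a strict equality test, while a probabilistic-style comparison tolerates small variations:

    from datetime import date

    def deterministic_match(a, b):
        # Absolute: every field must be exactly equal.
        return a["name"] == b["name"] and a["hire_date"] == b["hire_date"]

    def tolerant_match(a, b, max_day_gap=3):
        # Leaves room for uncertainty: same name (ignoring case) and
        # hire dates within a few days count as a likely duplicate.
        same_name = a["name"].strip().upper() == b["name"].strip().upper()
        close_date = abs((a["hire_date"] - b["hire_date"]).days) <= max_day_gap
        return same_name and close_date

    rec1 = {"name": "John Smith", "hire_date": date(2012, 3, 1)}
    rec2 = {"name": "john smith", "hire_date": date(2012, 3, 2)}

    print(deterministic_match(rec1, rec2))   # False - the dates differ by a day
    print(tolerant_match(rec1, rec2))        # True  - close enough to flag as a potential duplicate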


Investigation

By investigation we mean inspection of the data to reveal certain types of information
about those data. There is some overlap between Quality Stage investigation and the
kinds of profiling results that are available using Information Analyzer, but not so much
overlap as to suggest the removal of functionality from either tool. Quality Stage can
undertake three different kinds of investigation.

Features

  • Data investigation is done using the Investigate stage
  • This stage analyzes each record, field by field, for its content and structure
  • Free-form fields are broken up into individual tokens and then analyzed
  • Provides frequency distributions of distinct values and patterns
  • Each investigation phase produces pattern reports, word frequency reports and word classification reports. The reports are located in the data directory on the server.



Investigate methods









Character Investigation


Single-domain fields

  • Entity Identifiers:
     e.g. ZIP codes, SSNs, Canadian postal codes
  • Entity Clarifiers:
     e.g. name prefix, gender, and marital status

Multiple-domain fields

  • Large free-form fields such as multiple address fields






Character discrete investigation: looks at the characters in a single field (domain) to
report what values or patterns exist in that field. For example a field might be expected
to contain only codes A through E. A character discrete investigation looking at the
values in that field will report the number of occurrences of every value in the field (and
therefore any out of range values, empty or null, etc.). “Pattern” in this context means
whether each character is alphabetic, numeric, blank or something else. This is useful in
planning cleansing rules; for example a telephone number may be represented with or
without delimiters and with or without parentheses surrounding the area code, all in
the one field. To come up with a standard format, you need to be aware of what
formats actually exist in the data. The result of a character discrete investigation (which
can also examine just part of a field, for example the first three characters) is a
frequency distribution of values or patterns – the developer determines which.
Character concatenate investigation is exactly the same as character discrete
investigation except that the contents of more than one field can be examined as if they
were in a single field – the fields are, in some sense, concatenated prior to the
investigation taking place. The results of a character concatenate investigation can be
useful in revealing whether particular sets of patterns or values occur together.
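To make this concrete, here is a minimal sketch (in Python, not QualityStage itself) of a character discrete investigation over a telephone-number field: each character is mapped to a symbol for alphabetic, numeric or blank, special characters are kept as themselves, and a frequency distribution of the resulting patterns is produced. The symbols and sample values are illustrative only, not QualityStage's exact notation.

    from collections import Counter

    def char_pattern(value):
        # Map each character to a symbol: a = alphabetic, n = numeric,
        # b = blank; any other character is kept as itself.
        out = []
        for ch in value:
            if ch.isalpha():
                out.append("a")
            elif ch.isdigit():
                out.append("n")
            elif ch == " ":
                out.append("b")
            else:
                out.append(ch)
        return "".join(out)

    phones = ["(555) 123-4567", "555-123-4567", "5551234567", "555 123 4567"]
    for pattern, count in Counter(char_pattern(p) for p in phones).most_common():
        print(pattern, count)
    # (nnn)bnnn-nnnn 1
    # nnn-nnn-nnnn   1
    # nnnnnnnnnn     1
    # nnnbnnnbnnnn   1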

Word investigation is probably the most important of the three for the entire
QualityStage suite, performing a free-format analysis of the data records. It performs
two different kinds of task: one is to report which words/tokens are already known, in
terms of the currently selected “rule set”; the other is to report how those words are to
be classified, again in terms of the currently selected “rule set”. Word investigation has
no overlap with Information Analyzer (the data profiling tool).
Rule Set

A rule set includes a set of tables that list the “known” words or tokens. For example,
the GBNAME rule set contains a list of names that are known to be first names in Great
Britain, such as Margaret, Charles, John, Elizabeth, and so on. Another table in the
GBNAME rule set contains a list of name prefixes, such as Mr, Ms, Mrs and so on, that
can not only be recognized as name prefixes (titles, if you prefer) but can in some cases
reveal additional information, such as gender.
When a word investigation reports about classification, it does so by producing a
pattern. This shows how each known word in the data record is classified, and the order
in which each occurs. For example, under the USNAME rule set the name WILLIAM F.
GAINES III would report the pattern FI?G – the F indicates that “William” is a known first
name, the I indicates the “F” is an initial, the ? indicates that “Gaines” is not a known
word in context, and the G indicates that “III” is a “generation” – as would be “Senior”,
“IV” and “fils”. Punctuation may be included or ignored.
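A minimal sketch of how such a pattern could be produced (the classification table below is a tiny invented sample, not the real USNAME rule set, and the handling of initials and unknown words is simplified):

    # Invented classification table: token -> class
    CLASSIFICATION = {
        "WILLIAM": "F",   # known first name
        "JOHN": "F",
        "III": "G",       # generation
        "SENIOR": "G",
        "JR": "G",
    }

    def classify(token):
        token = token.rstrip(".").upper()
        if token in CLASSIFICATION:
            return CLASSIFICATION[token]
        if len(token) == 1 and token.isalpha():
            return "I"    # a single letter is treated as an initial
        return "?"        # word not known in this context

    def word_pattern(name):
        return "".join(classify(tok) for tok in name.split())

    print(word_pattern("WILLIAM F. GAINES III"))   # FI?G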
Rule sets also come into play when performing standardization (discussed below).
Classification tables contain not only the words/tokens that are known and classified,
but also contain the standard form of each (for example “William” might be recorded as
the standard form for “Bill”) and may contain an uncertainty threshold (for example
“Felliciity” might still be recognizable as “Felicity” even though it is misspelled in the
original data record). Probabilistic matching is one of the significant strengths of
QualityStage.
Investigation might also be performed to review the results of standardization,
particularly to see whether there are any unhandled patterns or text that could be
better handled if the rule set itself were tweaked, either with improved classification
tables or through a mechanism called rule set overrides.


Standardization

Standardization, as the name suggests, is the process of generating standard forms of
data that might more reliably be matched. For example, by generating the standard
form “William” from “Bill”, there is an increased likelihood of finding the match
between “William Gates” and “Bill Gates”. Other standard forms that can be generated
include phonetic equivalents (using NYSIIS and/or Soundex), and something like
“initials” – maybe the first two characters from each of five fields.
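As an illustration of one such phonetic encoding, here is a simplified implementation of classic Soundex (a sketch for explanation only; QualityStage's own NYSIIS and Soundex routines may treat edge cases differently):

    def soundex(word):
        # Simplified classic Soundex: keep the first letter, map the remaining
        # letters to digits, drop vowels and H/W/Y, collapse repeated codes,
        # and pad or truncate the result to four characters.
        groups = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3",
                  "L": "4", "MN": "5", "R": "6"}

        def code(ch):
            for letters, digit in groups.items():
                if ch in letters:
                    return digit
            return ""                       # vowels and H, W, Y carry no code

        word = "".join(ch for ch in word.upper() if ch.isalpha())
        if not word:
            return ""
        result, prev = word[0], code(word[0])
        for ch in word[1:]:
            digit = code(ch)
            if digit and digit != prev:     # skip repeats of the same code
                result += digit
            if ch not in "HW":              # H and W do not break a repeat
                prev = digit
        return (result + "000")[:4]

    print(soundex("GATES"))        # G320
    print(soundex("William"))      # W450
    print(soundex("Felicity"))     # F423
    print(soundex("Felliciity"))   # F423 - same code despite the misspelling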
Each standardization specifies a particular rule set. As well as word/token classification
tables, a rule set includes specification of the format of an output record structure, into
which original and standardized forms of the data, generated fields (such as gender) and
reporting fields (for example whether a user override was used and, if so, what kind of
override) may be written.
It may be that standardization is the desired end result of using Quality Stage. For
example street address components such as “Street” or “Avenue” or “Road” are often
represented differently in data, perhaps differently abbreviated in different records.
Standardization can convert all the non-standard forms into whatever standard format
the organization has decided that it will use.
This kind of Quality Stage job can be set up as a web service. For example, a data entry
application might send in an address to be standardized. The web service would return
the standardized address to the caller.
More commonly standardization is a preliminary step towards performing matching.
More accurate matching can be performed if standard forms of words/tokens are
compared than if the original forms of these data are compared.
Standardization Process Flow

Delivered Rule Sets Methodology in Standardization


Example: Country Identifier Rule Set


Example: Domain Pre-processor Rule Set

Example: Domain Specific Rule Set



Logic for NAME Rule Set

  • Set variables for process option delimiters
  • Process the most common patterns first
  • Simplify the patterns
  • Check for common patterns again
  • Check for multiple names
  • Process organization names
  • Process individual names
  • Default processing (based on process options)
  • Post process subroutine to populate matching fields

Logic of ADDR Rule Sets
  • Process the most common patterns first
  • Simplify the patterns
  • Check for common patterns again
  • Call subroutines for each secondary address element
  • Check for street address patterns
  • Post process subroutine to populate matching fields
Logic of AREA Rule Sets
  • Process input from right to left
  • Call subroutines for each sub-domain (i.e. country name, post code, province, city)
  • Post process subroutine to populate matching fields





Rule Sets

  • Rule Sets are standardization processes used by the Standardize Stage and have three required components:
     1. Classification Table – Contains the key words that provide special context, their standard value, and their user-defined class
     2. Dictionary File – Defines the output columns that will be created by the standardization process
     3. Pattern-Action File – Drives the logic of the standardization process and decides how to populate the output columns
  • Optional rule set components:
     – User Overrides
     – Reference Tables


Standardization Example


Parsing (the Standardization Adventure Begins…)

  • The standardization process begins by parsing the input data into individual data elements called tokens
  • Parsing parameters are provided by the pattern-action file
  • Parsing parameters are two lists of individual characters:
     – SEPLIST – any character in this list will be used to separate tokens
     – STRIPLIST – any character in this list will be removed
  • The SEPLIST is always applied first
  • Any character that is in the SEPLIST and not in the STRIPLIST will be used to separate tokens and will also become a token itself
  • The space character should be included in both lists
  • Any character that is in both lists will be used to separate tokens but will not become a token itself
     – The best example of this is the space character: one or more spaces are stripped, but a space still indicates where one token ends and another begins (a sketch of this behaviour follows below)
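A minimal sketch of this SEPLIST/STRIPLIST behaviour (in Python, illustrative only, not the actual QualityStage parser; the character lists are deliberately tiny):

    def parse(value, seplist, striplist):
        # Characters in SEPLIST end the current token; they also become
        # tokens themselves unless they appear in STRIPLIST as well.
        # Characters only in STRIPLIST are simply removed.
        tokens, current = [], ""
        for ch in value:
            if ch in seplist:
                if current:
                    tokens.append(current)
                    current = ""
                if ch not in striplist:
                    tokens.append(ch)
            elif ch not in striplist:
                current += ch
        if current:
            tokens.append(current)
        return tokens

    # Space separates tokens and is stripped; '.' is stripped outright;
    # ',' separates tokens and is also kept as a token of its own.
    print(parse("P.O. BOX 12, MAIN ST.", seplist=" ,", striplist=" ."))
    # ['PO', 'BOX', '12', ',', 'MAIN', 'ST']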


Parsing (Chinese, Japanese, Korean)

  • The parser behaves differently if the locale setting is Chinese, Japanese, or Korean
  • Spaces are not used to divide tokens so each character, including a space, is considered a token
  • Spaces are represented by underscores (_) in the pattern
  • The Classification file allows multiple characters to be classified together
  • Latin characters are transformed to double byte representations

Classification
  • Parsing separated the input data into individual tokens
  • Each token is basically either an alphabetic word, a number, a special character, or some mixture
  • Classification assigns a one-character tag (called a class) to each and every individual parsed token to provide context
  • First, key words that can provide special context are classified
     – Provided by the standardization rule set classification table
     – Since these classes are context specific, they vary across rule sets
  • Next, default classes are assigned to the remaining tokens
     – These default classes are always the same regardless of the rule set used
  • Lexical patterns are assembled from the classification results
     – A concatenated string of the classes assigned to the parsed tokens





Classification Example


Default Classes

  Class   Description
  ^       A single numeric token
  +       A single unclassified alpha token
  ?       One or more consecutive unclassified alpha tokens
  >       Leading numeric mixed token (e.g. 2B, 88WR)
  <       Trailing numeric mixed token (e.g. B2, WR88)
  @       Complex mixed token (e.g. NOT2B, C3PO, R2D2)
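A rough sketch of how these default classes could be assigned to individual tokens (illustrative only; the real assignment is done by the QualityStage parser, and the ? class arises at the pattern level when consecutive unclassified alpha tokens are collapsed):

    import re

    def default_class(token):
        # Approximate the default class for a token that was not found
        # in the classification table.
        if token.isdigit():
            return "^"                                  # single numeric token
        if token.isalpha():
            return "+"                                  # single unclassified alpha token
        if re.fullmatch(r"[0-9]+[A-Za-z]+", token):
            return ">"                                  # leading numeric mixed, e.g. 2B, 88WR
        if re.fullmatch(r"[A-Za-z]+[0-9]+", token):
            return "<"                                  # trailing numeric mixed, e.g. B2, WR88
        return "@"                                      # complex mixed, e.g. NOT2B, C3PO, R2D2

    for tok in ["123", "MAIN", "2B", "88WR", "B2", "WR88", "NOT2B", "C3PO", "R2D2"]:
        print(tok, default_class(tok))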

Default Classes (Special Characters)
  • Some special characters are “reserved” for use as default classes that describe token values that are not actual special character values
     – For example: ^ + ? > < @ (as described above)
  • However, if a special character is included in the SEPLIST but omitted from the STRIPLIST, then the default class for that special character becomes the special character itself, and in this case the default class does describe an actual special character value
     – For example: periods (.), commas (,), hyphens (-)
     – It is important to note this can also happen to the “reserved” default classes (for example: ^ = ^ if ^ is in the SEPLIST but omitted from the STRIPLIST)
  • Also, if a special character is omitted from both the SEPLIST and STRIPLIST (and it is surrounded by spaces in the input data), then the “special” default class of ~ (tilde) is assigned
     – If not surrounded by spaces, then the appropriate mixed token default class is assigned (for example: P.O. = @ if . is omitted from both lists)



Default Class (NULL Class)
  • Has nothing to do with NULL values
  • The NULL class is a special class
     – Represented by a numeric zero (0)
     – The only time that a number is used as a class
  • Tokens classified as NULL are unconditionally removed
  • Essentially, the NULL class does to complete tokens what the STRIPLIST does to individual characters
  • Therefore, you will never see the NULL class represented in the assembled lexical patterns

Classification Table
Classification tables contain three required space-delimited columns:
    1. Key word that can provide special context
    2. Standard value for the key word
       – The standard value can be either an abbreviation or an expansion
       – The pattern-action file determines whether the standard value is used
    3. Data class (one-character tag) assigned to each key word

Classification Table Example
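As a purely hypothetical illustration (the tokens, standard values, and classes below are invented and are not an excerpt from any delivered rule set), each line of a classification table holds a key word, its standard value, and its class:

    WILLIAM   WILLIAM   F
    BILL      WILLIAM   F
    MISTER    MR        P
    STREET    ST        T
    AVENUE    AVE       T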

Tokens in the Classification Table
  • A common misconception by new users is assuming that every input alpha token should be classified by the classification table
     – Unclassified != Unhandled (i.e. unclassified tokens can still be processed)
  • The classification table is intended for key words that provide special context, which means context essential to the proper processing of the data
  • General requirements for tokens in the classification table:
     – Tokens with standard values that need to be applied (within proper context)
        • Tokens that require standard values, especially standard abbreviations, will often map directly into their own dictionary columns
        • This does not mean that every dictionary column requires a user-defined class
     – Tokens with both a high individual frequency and a low set cardinality
        • Low set cardinality means that the token belongs to a group of related tokens that have a relatively small number of possible values, and therefore the complete token group can be easily maintained in the classification table
        • If set cardinality is high, adjacent tokens can often provide the necessary context
What is a Dictionary File?
  • Defines the output columns created by the standardization rule set
  • When data is moved to these output columns, it is called “bucketing”
  • The order that the columns are listed in the dictionary file defines the order the columns appear in the standardization rule set output
  • Dictionary file entries are used to automatically generate the column metadata available for mapping on the Standardize Stage output link
Dictionary File Example

Dictionary File Fields (Output Columns)
  • Standardization can prepare data for all of its uses, and therefore most dictionary files contain three types of output columns:
     1. Business Intelligence
        – Usually comprised of the parsed and standardized input tokens
     2. Matching
        – Columns specifically intended to facilitate more effective matching
        – Commonly includes phonetic coding fields (NYSIIS and SOUNDEX)
     3. Reporting
        – Columns specifically intended to assist with the evaluation of the standardization results

Standard Reporting Fields in the Dictionary File
  • Unhandled Pattern – the lexical pattern representing the unhandled data
  • Unhandled Data – the tokens left unhandled (i.e. unprocessed) by the rule set
  • Input Pattern – the lexical pattern representing the parsed and classified input tokens
  • Exception Data – placeholder column for storing invalid input data (an alternative to deletion)
  • User Override Flag – indicates whether or not a user override was applied (default = NO)
What is a Pattern-Action File?
  • Drives the logic of the standardization process
  • Configures the parsing parameters (SEPLIST/STRIPLIST)
  • Configures the phonetic coding (NYSIIS and SOUNDEX)
  • Populates the standardization output structures
  • Written in Pattern-Action Language, which consists of a series of patterns and associated actions structured into logical processing units called pattern-action sets
  • Each pattern-action set consists of:
     – One line containing a pattern, which is tested against the current data
     – One or more lines of actions, which are executed if the pattern tests true
  • Pattern-Action Set Example (a conceptual sketch follows below)
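As a conceptual sketch only (written in Python rather than actual Pattern-Action Language, with the pattern notation and column names mirroring the examples elsewhere in this post), a pattern-action set pairs one pattern test with the actions that bucket tokens into dictionary columns:

    # Emulates one pattern-action set for input like "50 WASHINGTON STREET",
    # whose lexical pattern is ^ | ? | T (number, unknown word, street type).
    STANDARD = {"STREET": "ST", "AVENUE": "AVE"}    # invented standard values

    def pattern_action_set(tokens, classes, output):
        # The pattern line is tested against the current data...
        if "".join(classes) == "^?T":
            # ...and the action lines run only because the pattern tested true.
            output["HouseNumber"] = tokens[0]                 # copy token 1 as-is
            output["StreetName"] = tokens[1]                  # copy token 2 as-is
            output["StreetSuffixType"] = STANDARD[tokens[2]]  # copy the standard value of token 3
            return True
        return False

    out = {}
    pattern_action_set(["50", "WASHINGTON", "STREET"], ["^", "?", "T"], out)
    print(out)   # {'HouseNumber': '50', 'StreetName': 'WASHINGTON', 'StreetSuffixType': 'ST'}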



Pattern-Action File Structure


Standardization vs. Validation
  • In QualityStage, standardization and validation describe different, although related, types of processing
  • Validation extends the functionality of standardization
  • For example: 50 Washington Street, Westboro, Mass. 01581
     – Standardization can parse, identify, and re-structure the data as follows:
        • House Number = 50
        • Street Name = WASHINGTON
        • Street Suffix Type = ST
        • City Name = WESTBORO
        • State Abbreviation = MA
        • Zip Code = 01581
     – Validation can verify that the data describes an actual address and can also:
        • Correct City Name = WESTBOROUGH
        • Append ZIP+4 Code = 1013
     – Validation provides this functionality by matching against a database
How to Deal with Unhandled Data?
  • There are two reporting fields in all delivered rule sets:
     – Unhandled Data
     – Unhandled Pattern
  • To identify and review unhandled data:
     – Use an Investigate stage on the Unhandled Data and Unhandled Pattern columns
     – Use an SQA stage on the output of the Standardize stage
  • Unhandled data may represent the entire input or a subset of the input
  • If there is no unhandled data, it does not necessarily mean the data was processed correctly
  • Some unhandled data does not need to be processed, if it doesn’t belong to that domain
  • The processing of a rule set may be modified through overrides or Pattern-Action Language
User Overrides
  • Most standardization rule sets are enabled with user overrides
  • User overrides provide the user with the ability to make modifications without directly editing the classification table or the pattern-action file
  • User overrides are:
     – Entered via simple GUI screens
     – Stored in specific objects within the rule set
  • Classification overrides can be used to add classifications for tokens not in the classification table, or to replace existing classifications already in the classification table
  • The following pattern/text override objects are called based on logic in the pattern-action file:
     – Input pattern
     – Input text
     – Unhandled pattern
     – Unhandled text
Domain Specific Override Example
Classification Override

Input Text Override



Input Pattern Override

User Modification Subroutines
  • There are two subroutines in each delivered rule set that are specifically for users to add Pattern-Action Language
  • User modifications within the pattern-action file:
     – Input Modifications
        • This subroutine is called after the Input User Overrides are applied but before any of the rule set pattern actions are checked
     – Unhandled Modifications
        • This subroutine is called after all the pattern actions are checked and the Unhandled User Overrides are applied

Pattern Action Language
What is Matching?

Matching is the real heart of Quality Stage. Different probabilistic algorithms are
available for different types of data. Using the frequencies developed during
investigation (or subsequently), the information content (or “rarity value”) of each value
in each field can be estimated. The less common a value, the more information it
contributes to the decision. A separate agreement weight or disagreement weight is
calculated for each field in each data record, incorporating both its information content
(likelihood that a match actually has been found) and its probability that a match has
been found purely at random. These weights are summed across the fields in the record to
come up with an aggregate weight that can be used as the basis for reporting that a
particular pair of records probably are, or probably are not, duplicates of each other.
There is a third possibility, a “grey area” in the middle, which Quality Stage refers to as
the “clerical review” area – record pairs in this category need to be referred to a human
to make the decision because there is not enough certainty either way. Over time the
algorithms can be tuned with things like improved rule sets, weight overrides, different
settings of probability levels and so on so that fewer and fewer “clericals” are found.
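To illustrate the arithmetic (a minimal sketch of Fellegi–Sunter-style weighting, which underlies probabilistic record linkage generally rather than QualityStage's exact implementation; the m and u probabilities and the cutoffs below are invented), each field contributes an agreement weight of log2(m/u) when it agrees and a disagreement weight of log2((1-m)/(1-u)) when it does not, and the summed weight is compared against match and clerical-review cutoffs:

    import math

    # m = probability the field agrees given the records truly match
    # u = probability the field agrees purely at random (invented values)
    FIELDS = {
        "surname":   {"m": 0.95, "u": 0.01},
        "birthdate": {"m": 0.90, "u": 0.05},
        "zipcode":   {"m": 0.85, "u": 0.10},
    }

    def field_weight(field, agrees):
        m, u = FIELDS[field]["m"], FIELDS[field]["u"]
        if agrees:
            return math.log2(m / u)              # agreement weight (positive)
        return math.log2((1 - m) / (1 - u))      # disagreement weight (negative)

    def classify_pair(agreements, match_cutoff=7.0, clerical_cutoff=2.0):
        total = sum(field_weight(f, a) for f, a in agreements.items())
        if total >= match_cutoff:
            return round(total, 1), "match"
        if total >= clerical_cutoff:
            return round(total, 1), "clerical review"
        return round(total, 1), "non-match"

    print(classify_pair({"surname": True, "birthdate": True, "zipcode": False}))
    # (8.2, 'match') - two rare agreements outweigh one disagreement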
Matching makes use of a concept called “blocking”, which is an unfortunately-chosen
term that means that potential sets of duplicates form blocks (or groups, or sets) which
can be treated as separate sets of potentially duplicated values. Each block of potential
duplicates is given a unique ID, which can be used by the next phase (survivorship) and
can also be used to set up a table of linkages between the blocks of potential duplicates
and the keys to the original data records that are in those blocks. This is often a
requirement when de-duplication is being performed, for example when combining
records from multiple sources, or generating a list of unique addresses from a customer
file, et cetera.
More than one pass through the data may be required to identify all the potential
duplicates. For example, one customer record may refer to a customer with a street
address but another record for the same customer may include the customer’s post
office box address. Searching for duplicate addresses would not find this customer; an
additional pass based on some other criteria would also be required. Quality Stage does
provide for multiple passes, either fully passing through the data for each pass, or only
examining the unmatched records on subsequent passes (which is usually faster).
Matching vs. Lookups, Joins, and Merges
  • Within Information Server, multiple stages offer capability that can be considered matching, for example:
     – Lookup
     – Join
     – Merge
     – Unduplicate Match
     – Reference Match
  • Lookups, Joins, and Merges typically use key attributes, exact match criteria, or matches to a range of values or simple formats
  • The Unduplicate Match stage and the Reference Match stage offer probabilistic matching capability
There are two types of match stage:
  • Unduplicate Match – locates and groups all similar records within a single input data source. This process identifies potential duplicate records, which might then be removed.
  • Reference Match – identifies relationships among records in two data sources. An example of many-to-one matching is matching the ZIP codes in a customer file against a list of valid ZIP codes; more than one record in the customer file can have the same ZIP code.
Blocking Step
  • Blocking provides a method of limiting the number of pairs to examine. When you partition data sources into mutually exclusive and exhaustive subsets and only search for matches within a subset, the process of matching becomes manageable.
  • Basic blocking concepts (a sketch follows below):
     – Blocking partitions the sources into subsets that make computation feasible
     – Block size is the single most important factor in match performance
     – Blocks should be as small as possible without causing block overflows; smaller blocks are more efficient than larger blocks during matching
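A minimal sketch of blocking (illustrative only; the block key of postal code plus the first letter of the surname is an invented choice):

    from collections import defaultdict
    from itertools import combinations

    records = [
        {"id": 1, "surname": "GATES", "zip": "98052"},
        {"id": 2, "surname": "GATES", "zip": "98052"},
        {"id": 3, "surname": "SMITH", "zip": "98052"},
        {"id": 4, "surname": "SMYTH", "zip": "01581"},
    ]

    def block_key(rec):
        # Invented block key: ZIP code plus the first letter of the surname.
        return (rec["zip"], rec["surname"][:1])

    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    # Candidate pairs are generated only within each block, never across the
    # whole file, which keeps the number of comparisons manageable.
    for key, members in blocks.items():
        for a, b in combinations(members, 2):
            print(key, a["id"], b["id"])   # only records 1 and 2 share a block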
Reference Match Stage
  • The Reference Match stage identifies relationships among records. This match can group records that are being compared in different ways, as follows:
     – One-to-many matching
     – Many-to-one matching

One-to-many matching
  • Identifies all records in one data source that correspond to a record for the same individual, event, household, or street address in a second data source.
  • Only one record in the reference source can match one record in the data source, because the matching applies to individual events. E.g. finding the same individual by comparing SSNs in a voter registration list and a department of motor vehicles list.

Many-to-one matching
  • Multiple records in the data file can match a single record in the reference file.
     E.g. matching a transaction data source to a master data source allows many transactions for one person in the master data source.

The Reference Match stage delivers up to six outputs, as follows:
     – Match contains matched records for both inputs
     – Clerical has records that fall in the clerical range for both inputs
     – Data Duplicate contains duplicates in the data source
     – Reference Duplicate contains duplicates in the reference source
     – Data Residual contains records that are non-matches from the data input
     – Reference Residual contains records that are non-matches from the reference input







Survivorship

As the name suggests survivorship is about what becomes of the data in these blocks of
potential duplicates. The idea is to get the “best of breed” data out of each block, based
on built-in or custom rules such as “most frequently occurring non-missing value”,
“longest string”, “most recently updated” and so on.
The data that fulfill the requirements of these rules can then be handled in a couple of
ways. One technique is to come up with a “master record” – a “single version of the
truth” – that will become the standard for the organization. Another possibility is that
the improved data could be populated back into the source systems whence they were
derived; for example, if one source were missing date of birth, it could be populated
because the date of birth was obtained from another source (or from more than one). If this
is not the requirement (perhaps for legal reasons), then a table containing the linkage
between the source records and the “master record” keys can be created, so that the
original source systems can also refer to the “single version of the truth”, and
vice versa.
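A minimal sketch of survivorship rules (illustrative only; the rules and columns are invented examples of the kinds of built-in rules described above):

    from collections import Counter

    # One block of potential duplicates for a single entity (invented data).
    block = [
        {"name": "W. GATES",      "dob": None,         "updated": "2013-01-02"},
        {"name": "WILLIAM GATES", "dob": "1955-10-28", "updated": "2012-07-15"},
        {"name": "BILL GATES",    "dob": "1955-10-28", "updated": "2013-06-30"},
    ]

    def longest_string(values):
        values = [v for v in values if v]
        return max(values, key=len) if values else None

    def most_frequent_non_missing(values):
        values = [v for v in values if v]
        return Counter(values).most_common(1)[0][0] if values else None

    def most_recently_updated(records, column):
        return max(records, key=lambda r: r["updated"])[column]

    # Build the "best of breed" master record, one rule per column.
    master = {
        "name": longest_string(r["name"] for r in block),
        "dob": most_frequent_non_missing(r["dob"] for r in block),
        "updated": most_recently_updated(block, "updated"),
    }
    print(master)
    # {'name': 'WILLIAM GATES', 'dob': '1955-10-28', 'updated': '2013-06-30'}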


Address Verification and Certification

Quality Stage can do more (than simple matching). Address verification can be
performed; that is, whether or not the address is a valid format can be reported. Out of
the box address verification can be performed down to city level for most countries. For
an extra charge, an additional module for world-wide address verification (WAVES) can
be purchased, which will give address verification down to street level for most
countries.
For some countries, where the postal systems provide appropriate data (for example
CASS in the USA, SERP in Canada, and AMAS/DPID in Australia), address certification can be
performed: in this case, an address is given to Quality Stage and looked up against a
database to report whether or not that particular address actually exists. These
modules carry an additional price, but that includes IBM obtaining regular updates to
the data from the postal authorities and providing them to the Quality Stage licensee.

New Address Verification Module






Summary


  • IBM is planning to release the next version of its InfoSphere QualityStage Worldwide Address Verification module (AVI v10)
     – Release time frame is Q4 2012
     – AVI v10 will have superior functionality and coverage over the current AVI v8.x module
     – AVI v10 will leverage new address/decoding reference data
     – AVI v10 will have broad support for various Information Server versions
  • For current AVI v8.x customers only:
     – AVI v8.x will have continued support until the end of Dec. 2013
        • Address reference data for AVI v8.x has been discontinued by the vendor and ends in Dec. 2013
     – AVI v10 will include a migration utility for automated migration from AVI v8.x to AVI v10
     – For comparison, AVI v10 and AVI v8.x can run side by side (for development)




Information Server / Operating System support matrix for AVI v10








Stage Icon and Location





Quality Stage Benefits

Quality Stage provides the most powerful, accurate matching available, based on
probabilistic matching technology; it is easy to set up and maintain, and it provides the
highest match rates available in the market.
An easy-to-use graphical user interface (GUI) with an intuitive, point-and-click interface
for specifying automated data quality processes – data investigation, standardization,
matching, and survivorship – reduces the time needed to deploy data cleansing
applications.

Quality Stage offers a thorough data investigation and analysis process for any kind of
free formatted data. Through its tight integration with Data Stage and other Information
Server products it also offers fully integrated management of the metadata associated
with those data.

There exists rigorous scientific justification for the probabilistic algorithms used in
Quality Stage; results are easy to audit and validate.
Worldwide address standardization, verification, and enrichment capabilities – including
certification modules for the United States, Canada, and Australia – add to the value of
cleansed address data.

Domain-agnostic data cleansing capabilities handle product data, phone numbers,
email addresses, birth dates, events, and other comment and descriptive fields.
Common data quality anomalies, such as data in the wrong field or data
spilling over into the next field, can be identified and addressed.
Extensive reporting provides metrics that yield business intelligence about the data and
help tune the application for quality assurance.

Service-oriented architecture (SOA) enablement with InfoSphere Information Services
Director allows you to leverage data quality logic built using IBM InfoSphere
Information Server and publish it as an “always on, available everywhere” service in an
SOA – in minutes.

The bottom line is that Quality Stage helps to ensure that systems deliver accurate,
complete, trusted information to business users both within and outside the enterprise.
