Monday, February 4, 2008

Informatica 7.1: Points to ponder with code pages

The problem:

The source data is in ISO 8859-1 and we use Informatica for the ETL. The source holds data from 11 different character sets, and since ISO 8859-1 is a single-byte encoding it ends up using the same code for characters from different character sets. For example, one code "x" may stand for "man" in Japanese and "woman" in Korean, which creates ambiguity in the database between the Japanese and Korean data.

At present the target is also ISO 8859-1, so the same "x" is sent to the target, and each user's system renders it per its own language setting: a Korean user reads it as "woman", while a Japanese user reads it as "man". Now we are going to move the target database to UTF-8 and run Informatica in Unicode mode. The source is still ISO 8859-1 and, as explained, may use the same code for different characters, whereas UTF-8 assigns a unique code to every character. So when we load into the UTF-8 target, we must identify what each end user requires and give them the exact data.

In the earlier setup a Korean end user simply saw "x"; UTF-8, by contrast, generates a unique code for every character. So before loading into the target we need to tell Informatica to run in a mode that supports all the character sets and assigns each character its own unique code.
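To see the ambiguity concretely, here is an illustration in Python (not part of the original scenario): one and the same byte decodes to entirely different characters depending on which single-byte code page you read it with. The byte alone cannot tell you which reading was intended.

```python
# The single byte 0xC4 means different things under different
# single-byte code pages - the raw byte cannot tell us which.
raw = b"\xc4"

print(raw.decode("latin-1"))   # Western European reading: 'Ä'
print(raw.decode("cp1251"))    # Cyrillic reading: 'Д'
print(raw.decode("cp1253"))    # Greek reading: 'Δ'
```

This is exactly the "same code, different character" problem described above, just with Western code pages instead of Japanese/Korean ones.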

My intention:-
We can either decide the code pages at the time of running the sessions, or do a one-time conversion to a UTF-8 flat file and load that into the target.

I know the problem may still seem hazy. Let's make it a little clearer before putting down the solution.

A database named ABCD is defined to support only one character set (ISO-8859-1), yet it is being populated with data from multiple character sets like Shift-JIS, Big5, GB2312, etc. We accept that the ordering of the data follows ISO-8859-1.
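Why does the database accept foreign bytes at all? A small Python illustration (my own, not from the original scenario): bytes from a multi-byte character set like Shift-JIS can be stored as ISO 8859-1 without any error, because every byte value is a valid latin-1 character, but the text's identity is silently lost.

```python
# Japanese text encoded in Shift-JIS...
sjis_bytes = "日本".encode("shift_jis")

# ...read back as if it were ISO 8859-1: decoding never fails,
# since latin-1 accepts every byte value, but the result is mojibake.
as_latin1 = sjis_bytes.decode("latin-1")
print(as_latin1)  # gibberish, yet no error was raised

# The original text is recoverable only if we know the true charset:
assert as_latin1.encode("latin-1").decode("shift_jis") == "日本"
```

This is how ABCD ends up full of text whose character set nobody can tell just by looking at the stored bytes.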

Slowly, as time passes, ABCD ends up holding text data in many different languages and character sets, and it becomes tough to identify which language and character set a given piece of text belongs to. The UTF-8 encoding of Unicode keeps any current US-ASCII text unchanged (the vast majority of our text data), but stores characters from other character sets in 2, 3, or 4 byte units.
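That byte-length behaviour of UTF-8 is easy to verify (Python used just for illustration):

```python
# UTF-8 keeps US-ASCII as single bytes and uses 2-4 bytes elsewhere.
samples = ["A", "é", "中", "𝄞"]  # ASCII, Latin-1, CJK, beyond the BMP
for ch in samples:
    print(ch, len(ch.encode("utf-8")))  # prints 1, 2, 3, 4 bytes
```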

Now there is a requirement to transfer data from ABCD to another database named EFGH, which is in Unicode. So we need to be able to identify the character set of every text string. Let's assume we have done that as well.
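Identifying the character set of arbitrary bytes is genuinely hard; as a rough sketch of the idea (my own illustration, not an Informatica feature), one can try the candidate encodings in priority order and keep the first one that decodes cleanly. Strict multi-byte encodings must come before latin-1, which never fails:

```python
# Naive character-set guess: try candidates in order, return the
# first that decodes without error. Purely heuristic - several
# encodings can accept the same bytes, so results need review.
CANDIDATES = ["shift_jis", "big5", "gb2312", "iso-8859-1"]

def guess_charset(raw_bytes):
    for enc in CANDIDATES:
        try:
            raw_bytes.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

print(guess_charset("日本".encode("shift_jis")))  # shift_jis
```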

The question is how to perform that data transfer through INFA 7.1.

That can be done with INFA. Just keep the following things in mind:

1/ Check the character set of your source database ( select * from nls_database_parameters ).
2/ Check the character set of your target database ( select * from nls_database_parameters ).
3/ Check the data movement mode set for the Informatica Server you are going to assign in your workflow
( go to the config file you pass while starting the Informatica Server on UNIX ).

# Determine one of the two server data movement modes: UNICODE or ASCII.
# If not specified, ASCII data movement mode is assumed.
# ASCII: PowerServer processes single-byte characters and does not perform
# code page conversion.
# UNICODE: processes 2 bytes per character. Enforces code page validation.
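To check the mode without opening the config file by hand, a small script can pull out the setting. This is only a sketch: the key name `DataMovementMode` is my assumption based on the PowerCenter 7.x config convention, so verify it against your own `pmserver.cfg`.

```python
# Sketch: report the server data movement mode from a pmserver.cfg-style
# file. The key name "DataMovementMode" is assumed - check your install.
def data_movement_mode(cfg_text):
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "DataMovementMode":
            return value.strip().upper()
    return "ASCII"  # per the comments above, ASCII is the default

sample_cfg = """
# Determine one of the two server data movement modes: UNICODE or ASCII.
DataMovementMode=UNICODE
"""
print(data_movement_mode(sample_cfg))  # UNICODE
```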

Set it to UNICODE; only then will end users get the full data, otherwise there will be corrupt data.
If you change the setting, restart the Informatica Server service afterwards.
If you have set all those things right, there is nothing to worry about: users should see the data as per their locale.
You may face some LM_ errors while loading data through INFA. In that case, reply to me with the relevant portion of the error log, like:
MAPPING> CMN_1569 Server Mode: [UNICODE] CMN_1570 Server Codepage: [ISO
MAPPING> 8859-1 Western European]
If needed, disable code page validation.
