Wednesday, January 17, 2024

Data Exchange - DRCRSplit for Delimited Files

I recently had to solve a problem during an implementation and, as I've done before, I'm adding the solution here so I can find it again.

An incoming trial balance file is delimited and has two columns for the amount: a debit column and a credit column. Each record has a value in one of these but not both. If the file were fixed format instead of delimited, the built-in import format expression DRCRSplit would handle the two columns. With that function, you specify the midpoint of the overall character count. In a delimited file the column widths vary from record to record, so there is no consistent midpoint and the function doesn't work.

Like some of you reading this, perhaps, I did a Google search on this topic and found multiple Oracle forum posts, CloudCustomerConnect posts, etc. asking the same question over several years, but I didn't find an easy solution. One posted answer combined the DRCRSplit function with the Column function but, as someone else pointed out, that solution causes data problems. So, below is what I did. There may be a more elegant way of solving this problem, but this solution worked for me.

I mapped the debit and credit balance columns to separate attribute fields, and I mapped the account column to the regular amount field as a placeholder. Each record needs a numeric amount; otherwise, it gets dropped on import. I'm thinking the year could be used as the placeholder amount if it were available on each record. I'm also guessing that one of the amount columns would probably work in conjunction with the no zero suppress expression, although I haven't tried either alternative.



Next, I used a SQL map on the entity to (a) pass the entity through and (b) populate the amount. Below, the entity portion isn't shown. The AMOUNTX field (the target amount) is populated from the debit column if the debit is not equal to zero, and from the credit column if the debit is equal to zero. Anything placed in an attribute field is stored as a string, so the TO_NUMBER function converts the string back to a number. If the incoming numbers contained quotes, spaces, commas used as thousands separators, or non-standard decimal characters, that could be addressed with either a format model on TO_NUMBER or one of the REGEXP functions.
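
For anyone who can't see the screenshot, here is a rough sketch of the kind of expression described above. It is not a copy of the original script. It assumes the debit column was imported to ATTR1 and the credit column to ATTR2; those column names are my assumption, so substitute whichever attribute fields your import format uses. It also assumes the credit is loaded as it appears in the file. If credits need to be negative in the target, the sign would have to be flipped in the ELSE branch.

```sql
-- Sketch only: ATTR1 = debit, ATTR2 = credit (assumed attribute columns).
-- Expression for the value of AMOUNTX:
-- use the debit when it is non-zero, otherwise fall back to the credit.
CASE
  WHEN NVL(TO_NUMBER(ATTR1), 0) <> 0 THEN TO_NUMBER(ATTR1)
  ELSE NVL(TO_NUMBER(ATTR2), 0)
END

-- Variant for amounts such as "1,234.56" or values with stray quotes or spaces:
-- strip everything except digits, the decimal point, and the minus sign
-- before converting. Assumes a period is the decimal character.
CASE
  WHEN NVL(TO_NUMBER(REGEXP_REPLACE(ATTR1, '[^0-9.-]', '')), 0) <> 0
    THEN TO_NUMBER(REGEXP_REPLACE(ATTR1, '[^0-9.-]', ''))
  ELSE NVL(TO_NUMBER(REGEXP_REPLACE(ATTR2, '[^0-9.-]', '')), 0)
END
```

The NVL wrappers are there because an empty debit or credit column arrives as NULL, and a NULL comparison would otherwise be neither true nor false. In the first expression the CASE still falls through to the credit in that situation, but the NVL makes the intent explicit and keeps the result from being NULL when both columns are empty.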





I attached this SQL to the entity dimension mapping. If the entity dimension needs more mapping and the SQL gets in the way of using the other mapping types (explicits, etc.), then attach the SQL to another dimension, like Data Source. Any of them will work, so the SQL can be placed wherever it is least in the way.