Calc - Matching Up Text Fields

Hi all,

I download two files from my banks that have transaction descriptions that
look like so (the one with a * to the left is from Bank #2 - the rest are
from Bank #1):
      PURCHASE AT MY GROCER #NNN, CHARLOTTE, NC ON 050913 FROM CARD#:
XXXXXXXXXXXXNNNN * MY GROCER #NNN CHARLOTTE NC
      PURCHASE AT BIG BOX NNNN, CHARLOTTE, NC ON 050413 FROM CARD#:
XXXXXXXXXXXX1113 MERCHANT PAYMENT BIG BOX ANNNN AT LOC NNNNNN BIG BOX
ANNNN PINEVILLE Charlotte NC

Notice the way two transactions at BIG BOX are referred to in two different
ways - one includes a date, while the other does not. Also, all series of
"N"s & "X"s above are numbers.

My database has payee names like so:
entity_id | entity_name
01 | My Grocer
02 | Big Box

I want to prep the bank data and create a fresh CSV file, so as to cull out
some of the unnecessary data and create a fresh CSV file for importing into
my back-end DB. Ideally, I can automagically match the entity name from
the description field to the entity name in my DB. My question is (or
maybe questions are):

If I have the ID in ColA and the Name in ColB (from my DB), I want to put
the description list (from the banks) in ColE and be able to match payee
names and have the ID copied to ColD. Even if I have to create a separate
column for each bank's description style, that's still gotta be better than
trying to manually match up transaction data from my bank to what I have in
my DB.

I thought about VLOOKUP, but that seems to not work at all when I try it
using text fields. I can make that work in another project where I have
barcodes. In that case, the barcodes make a solid primary key field to
work with. In this case, I have no such animal - and need to match the
text fields in order to get that PK. So perhaps VLOOKUP is not the best
way. Is there another function that will let me do this, and how does it
work?

Thanks!

I download two files from my banks that have transaction descriptions that look like so (the one with a * to the left is from Bank #2 - the rest are from Bank #1):
      PURCHASE AT MY GROCER #NNN, CHARLOTTE, NC ON 050913 FROM CARD#:
XXXXXXXXXXXXNNNN * MY GROCER #NNN CHARLOTTE NC
      PURCHASE AT BIG BOX NNNN, CHARLOTTE, NC ON 050413 FROM CARD#:
XXXXXXXXXXXX1113 MERCHANT PAYMENT BIG BOX ANNNN AT LOC NNNNNN BIG BOX
ANNNN PINEVILLE Charlotte NC

Notice the way two transactions at BIG BOX are referred to in two different ways - one includes a date, while the other does not. Also, all series of "N"s & "X"s above are numbers.

All this doesn't help much until we know exactly what you need to extract from all this.

My database has payee names like so:
entity_id | entity_name
01 | My Grocer
02 | Big Box

I want to prep the bank data and create a fresh CSV file, so as to cull out some of the unnecessary data and create a fresh CSV file for importing into my back-end DB. Ideally, I can automagically match the entity name from the description field to the entity name in my DB. My question is (or maybe questions are):

If I have the ID in ColA and the Name in ColB (from my DB), I want to put the description list (from the banks) in ColE and be able to match payee names and have the ID copied to ColD. Even if I have to create a separate column for each bank's description style, that's still gotta be better than trying to manually match up transaction data from my bank to what I have in my DB.

I thought about VLOOKUP, but that seems to not work at all when I try it using text fields.

So exactly what did you try - and what happened when you did?

What do you have in column E - the entire record from the bank, not just the "entity name" you want to search for? In that case, the first thing you probably have to do is to strip out just the entity name into a separate column: let's say column F. How you do that depends on the precise possibilities in the bank data. The next problem is that VLOOKUP() can retrieve values only from the right of matched cells, whereas your "entity ID" is currently to the left of the entity name. Since you haven't mentioned column C, I'm going to take that over for this purpose. In C1 enter =A1 and fill this down the list of entities: you now have a copy of the ID list to the right of the names - where they need to be.

Now in D1 enter
=VLOOKUP(F1;B$1:C$10;2;0)
and fill that down column D.

I trust this helps.

Brian Barker

Thanks Brian,

<SNIP>

All this doesn't help much until we know exactly what you need to extract
from all this.

I hope it became more clear further down that I really just need the entity
names.

<SNIP>

If I have the ID in ColA and the Name in ColB (from my DB), I want to put

the description list (from the banks) in ColE and be able to match payee
names and have the ID copied to ColD. Even if I have to create a separate
column for each bank's description style, that's still gotta be better than
trying to manually match up transaction data from my bank to what I have in
my DB.

I thought about VLOOKUP, but that seems to not work at all when I try it
using text fields.

What do you have in column E - the entire record from the bank, not just
the "entity name" you want to search for? In that case, the first thing
you probably have to do is to strip out just the entity name into a
separate column: let's say column F. How you do that depends on the
precise possibilities in the bank data. The next problem is that VLOOKUP()
can retrieve values only from the right of matched cells, whereas your
"entity ID" is currently to the left of the entity name. Since you haven't
mentioned column C, I'm going to take that over for this purpose. In C1
enter =A1 and fill this down the list of entities: you now have a copy of
the ID list to the right of the names - where they need to be.

Now in D1 enter
=VLOOKUP(F1;B$1:C$10;2;0)
and fill that down column D.

I trust this helps.

Yes, I actually set it up that way. Playing with it a bit more, I see I

do need to strip out the entity names from the description. Is it possible
to do that in Calc?

My function: =VLOOKUP(F2,$A$2:$B$187,2,0)
I see now that, if I delete the whole description, and keep just the
entity name, I get the results I want. Again, I just need to figure out
how to extract that bit from the bank's description. Any thoughts on that?

Regards,
Don

As I mentioned, that depends on the precise possibilities in the bank data. If the relevant data is always in the same columns of the relevant records, you can probably extract it using MID(); otherwise you may need to do a more complicated search through the text.

Brian Barker

Unfortunately, one bank throws everything except for the date and the
amount into a single "description" column. It would still be simpler if
they did not include the actual transaction date in that column, as it
means the description changes... well... daily. Grrrr....

On 2013/08/01 11:31 AM, Don Parris wrote:

Unfortunately, one bank throws everything except for the date and the
amount into a single "description" column. It would still be simpler if
they did not include the actual transaction date in that column, as it
means the description changes... well... daily. Grrrr....

On Thu, Aug 1, 2013 at 12:27 AM, Brian Barker <b.m.barker@btinternet.com>wrote:

At 21:35 31/07/2013 -0400, Don Parris wrote:

Again, I just need to figure out how to extract that bit from the bank's
description. Any thoughts on that?

As I mentioned, that depends on the precise possibilities in the bank
data. If the relevant data is always in the same columns of the relevant
records, you can probably extract it using MID(); otherwise you may need to
do a more complicated search through the text.

Brian Barker

--
On 2013/08/01 11:31 AM, Don Parris wrote:

Unfortunately, one bank throws everything except for the date and the
amount into a single "description" column. It would still be simpler if
they did not include the actual transaction date in that column, as it
means the description changes... well... daily. Grrrr....

On Thu, Aug 1, 2013 at 12:27 AM, Brian Barker <b.m.barker@btinternet.com>wrote:

yes, left() or mid() is quicker if the contents are fixed. In Don's worse case scenario, where the bank's records have descriptions that are random, unpredictable in length and sequence, and change daily, but at least have predictable strings (eg store name) somewhere in them, try

=IF(ISERR(FIND(LOWER(VLOOKUP(D$3,List1,2)),Data.$D4))=1,"",D$3)

The key elements in the formula are *find *(to tag the transaction) and *vlookup *(to return the entity), with if(), iserr() and lower() acting in support.
where
a) the formula is repeated for each string in a table alongside the transaction, with each column "trying it's luck" and the last column identifying the ID (by summing the row of possible ID's where non-responses =0)
b) iserr()=1 is used to identify if the transaction contains the key entity (if the field lacks the text, the cell blanks out, if it has the text, it returns the ID of the entity)
c) if() is used to tag the transaction with the entity's code
d) lower() is used because find() is case sensitive, and ties to the instruction to lower-casify the descriptions in sheet 2 (data) (equally use upper(), though it makes for wider description fields)
e) find() is used to see if the description contains the text or not (the actual position of the text is irrelevant)
f) vlookup() refers to List1, which is the table of entities and their code
g) d$3 is the entity ID at the head of the column, where multiple columns apply their own string to the description

_Working example__
_https://docs.google.com/file/d/0B6LXy9sguZVkcXRBbGUxQVVvT1k/edit?usp=sharing gives a working example for 1000 transactions and 31 strings
1. Enter the text string in sheet 1 (summary) - the example allocates the entity ID and allows for 31 entities, where the first 7 are colour coded
2. Drop the bank data in sheet 2 (data)*. Remember to **|format|change case => lower case the descriptions* - the example allows for 1000 transactions
3. See the processing in sheet 3 (analysis) - the 1000 transactions are auto-tagged for 31 texts, the 1st 7 are also colourised
4. Back to sheet 1 (summary), where the entitles are totalled for debits and credits in the month you choose and their frequency counted.

The summary page sub-totals the transactions using dsum
=IF($H4=0,"",DSUM(Result,F$3,$C3:$C4))
There are some embellishments, such as |validation (to show or hide non-key entities and to limit the summary to a month or not)

The sheets are protected against changes but there is no password. To edit, just undo the protection.
I hope this helps

Hi Errol,

Many thanks for the example. I will play with this over the weekend to see
how it works. I will drop a note (probably next week) to let you know how
it goes. I've been blogging about my project here and there, so maybe this
will be another step I can blog about. My post about connecting LOBase to
PostgreSQL is still the top-ranked story on my site, so hopefully I can add
some more LO action. :slight_smile:

No - as you'll know, I wrote none of that. Please take care when replying not to misattribute contributions.

Thanks.

Brian Barker

I'm sorry for offending you Brian.

I looked at the many contributions you have made on this list (and others) over the years. I'd like to take this chance to publicly honour you for caring for and committing to a community made up mostly of strangers far away.
Thank you for your ongoing sacrifice.

Errol

Hi :slight_smile:
+1
Regards from
Tom :slight_smile:

No - as you'll know, I wrote none of that. Please take care when replying not to misattribute contributions.

I'm sorry for offending you Brian.

Oh, not at all: absolutely no offence taken. It just seemed sensible to set the archive record straight - including giving you credit for your contribution.

I looked at the many contributions you have made on this list (and others) over the years. I'd like to take this chance to publicly honour you for caring for and committing to a community made up mostly of strangers far away.
Thank you for your ongoing sacrifice.

Too much, too much. But how kind!

Brian Barker