cancel lines with the (almost) the same timestamp

Hello,

I download big raw data containing that is supposed to show minute per
minute quotation of an indice (SP500). The problem is that some quotations
have been included twice in the same minute (at some seconds of distance),
for example here

2013-10-11 13:47:51;1701.0601;1701.21;1701;1701;696500
2013-10-11 13:47:57;1701.02;1701.21;1700.99;1700.99;644100
2013-10-11 13:48:51;1700.97;1700.97;1700.7;1700.7;496500
2013-10-11 13:49:51;1700.67;1700.67;1700.53;1700.53;369700
2013-10-11 13:49:57;1700.6801;1700.6801;1700.52;1700.52;374800
2013-10-11 13:50:57;1700.51;1700.51;1700.35;1700.37;441100

I tried to clean that manually but it appears to be a huge work. I'd like to
have a smarter way to ask calc to keep only one data of each minute time,
like this :

2013-10-11 13:47:51;1701.0601;1701.21;1701;1701;696500
2013-10-11 13:48:51;1700.97;1700.97;1700.7;1700.7;496500
2013-10-11 13:49:51;1700.67;1700.67;1700.53;1700.53;369700
2013-10-11 13:50:51;1700.47;1700.47;1700.35;1700.35;444900

The hard part is that not all minutes have twins (erasing one out of two
would'nt do) : here only 13:49 was twined.

How could that be solved ?

Let's imagine your values are in columns A to F.

o I'm guessing that, if you need only one value per minute, you don't need to know the seconds value of the timestamp. In a spare column, enter
=ROUNDDOWN(A5*24*60)/24/60
and fill or copy that down the column. This will round your timestamps to complete minutes. Now copy this column and paste it back over column A (replacing your original values), but using Paste Special... instead of ordinary Paste and ensuring "Paste all" and Formulas are *not* ticked in the Paste Special dialogue.

o At a convenient place, perhaps elsewhere on the sheet or on another sheet, create a list of times containing only one value per minute. (I'm going to assume that these values are also in column A starting at row n.) To do this, enter the first two values manually (here 2013-10-11 13:47:00 and 2013-10-11 13:48:00), select both, and then fill them down the column as necessary. Now all you need to do is to harvest the appropriate values for subsequent columns.

o In Bn, enter
=VLOOKUP($An;$A$1:$F$999;COLUMN();0)
(with the real row number in place of "n" and the real final row number in place of "999", of course). First fill this formula down column B. Then select all the relevant cells in column B (containing this formula, as modified) and fill these across columns C to F.

How does it work? Then VLOOKUP() function searches the first column of the range of original data for the first occurrence of the appropriate timestamp. When it finds a match, it copies the value from the appropriate later column. Using COLUMN() in place of an explicit column number allows the formula to adjust automatically for later columns. (If your cleaned data are not in columns A to F, you will need to modify this parameter to COLUMN()-1 or whatever.) The final parameter needs to be zero (or FALSE) to indicate that if there is no value for that minute, you will (presumably) want to see an error condition instead of a value unhelpfully harvested from a nearby minute.

I trust this helps.

Brian Barker

I download big raw data containing that is supposed to show minute
per minute quotation of an index (SP500). The problem is that some
quotations have been included twice in the same minute (at some
seconds of distance)

I tried to clean that manually but it appears to be a huge work. I'd
like to have a smarter way to ask calc to keep only one data of each
minute time

How could that be solved ?

Let's imagine your values are in columns A to F.

o I'm guessing that, if you need only one value per minute, you don't
need to know the seconds value of the timestamp. In a spare column,
enter
=ROUNDDOWN(A5*24*60)/24/60

Would =FLOOR(A5,1/1440) accomplish the same thing and if so wouldn't it be more efficient?

This will round your
timestamps to complete minutes. Now copy this column and paste it
back over column A

Why? I would think that both columns could be retained thus eliminating a manual operation.

I would also think, Brian, that a pivot table involving this modified time and averaging the market quotes would replace your following steps but I may know too little about pivot tables.

When collecting data I dislike copying and pasting (special or not). I especially dislike wiping out formulas. I am guessing that the original poster is somehow gathering the data automatically ("automagically") and it is being viewed in semi real time. How this is being accomplished I don't know.