Date will not format or sort when imported into calc (ods)

Am 09.09.2012 11:54, Tinkerer wrote:

Use the extension CT2N.
It Converts Text to Numbers.

Simples.

Tink.

Nobody needs to install any extension to convert between numbers and text. CT2N may fail in this situation (dmy vs. mdy).
If all the dates have been imported as text, a simple regex replacement converts in both directions.
It would be better to simply import correct values instead of fixing wrongly imported ones.

James knows how to import correct values but he expects automagic number formatting. The program should analyse the assumed formatting of the input strings and apply the corresponding number format codes to the respective target cells. Spreadsheets don't do that.
Formatting has zero relevance for the tasks spreadsheets are designed for. Only values (numbers) are relevant.

James Majeske wrote:

I receive dates in three different formats; big endian [31 Dec 2012], mixed
endian [Dec 31, 2012], and small endian [2012-12-31] (also known as
ISO8601).

With US loale setting and "detect special numbers" any of the above input strings yield cell value 41274 preformatted to show 41274 as "12/31/12" which is obviously and unambiguously the correct value. This value can be formatted at will and sorted by its numeric value.

In a spreadsheet I get all the information I require and it is easy to get
additional information if needed. I can add, modify, and delete headers both
horizontally and vertically. I can add, modify, or delete cells, rows, or
columns at will. As long as I leave the original data alone, I can add as
many calculated columns or rows as I wish. One of my spreadsheets is a
complete double entry bookkeeping system. I can use split or freeze to keep
the column and row titles visible while scrolling through the data of the
various accounts.
According to what I read last night, you must be very careful or you may
compromise essential data. The implication was that it would be easier and
safer to create a new database than to modify it. From the
https://wiki.documentfoundation.org/Documentation/Publications documentation
:
"Fields can be added or deleted, but adding a field requires you to enter
the data for that one field for every existing record with an entry for that
field."
"Deleting a field deletes all the data once contained in that field."
"Changing the field type of a field can lead to data being lost either
partially or completely."
"Deleting a table removes all of the data contained in every field of the
table."
I will continue reading, but so far it seems that a spreadsheet has all the
advantages and none of the disadvantages of a database.
My knowledge of databases is very limited. If there is any advantage in
using a database over a spreadsheet, please share it with me.

When I change the file extension (file_name.ct2n), I find no difference in
how the file is imported into a spreadsheet.

With the advice given earlier, I have found several methods that do what I
need to get done with no problems. My previous lack of knowledge was easily
corrected and I am confident in my ability to adapt the input to conform to
the requirements of its intended function.

Quote: [With US loale setting and "detect special numbers" any of the above
input strings yield cell value 41274 preformatted to show 41274 as
"12/31/12" which is obviously and unambiguously the correct value. This
value can be formatted at will and sorted by its numeric value.]

You are correct in that 12/31/12 is not ambiguous, but that is a special
case. When each element is unique and without knowing the source preference,
there are six different dates that may be generated from "10/11/12":
2010-11-12
2010-12-11
2011-10-12
2011-12-10
2012-10-11
2012-11-10

This is why, when not using ISO8601, I choose to use a four digit year AND a
month name. No matter the order of the elements, there can be no ambiguity.

In a spreadsheet I get all the information I require and it is easy to get
additional information if needed. I can add, modify, and delete headers both
horizontally and vertically. I can add, modify, or delete cells, rows, or
columns at will. As long as I leave the original data alone, I can add as
many calculated columns or rows as I wish. One of my spreadsheets is a
complete double entry bookkeeping system. I can use split or freeze to keep
the column and row titles visible while scrolling through the data of the
various accounts.
According to what I read last night, you must be very careful or you may
compromise essential data. The implication was that it would be easier and
safer to create a new database than to modify it. From the
https://wiki.documentfoundation.org/Documentation/Publications documentation
:
"Fields can be added or deleted, but adding a field requires you to enter
the data for that one field for every existing record with an entry for that
field."
"Deleting a field deletes all the data once contained in that field."
"Changing the field type of a field can lead to data being lost either
partially or completely."
"Deleting a table removes all of the data contained in every field of the
table."
I will continue reading, but so far it seems that a spreadsheet has all the
advantages and none of the disadvantages of a database.
My knowledge of databases is very limited. If there is any advantage in
using a database over a spreadsheet, please share it with me

      It seems to me that there ought to be advantages to using databases. After all, there are college courses of databases. Oracle has a database program (better known as a database management system that is very complex). The text book for it is 700+ pages. This is used in many of the large businesses. This textbook is known to give students headaches. So, is it possible that the things you see as advantages and disadvantages just might not be the whole story?
      The following is the link to the draft folder for Base. You can download the database use with Chapter 1 of the Base Guide. There is also a link to the database (Budget.odb) used for Chapters 2-4 of the Base Guide. Perhaps the latter could show the advantages of using a database instead of a spreadsheet.

http://www.odfauthors.org/libreoffice/english/base-guide/draft-lo3.4

      In the past, I have used a spreadsheet to keep track of my finances. Now I use a database which I prefer. Why? Size is one of the reasons. Each sheet get physically larger as data is entered, a database does not. Spreadsheet: data is entered into individual sheets based upon the specific data. Database: the entries are made in a single form. Spreadsheet: data must be linked from one sheet to another (I had links between data with a sheet). Database: This is done using table and field names.
      To me, one of many advantages of a database is the query. You can tell it to get data from specific fields, and it will display it in a table format. This does not take up physical space like it would in a spreadsheet. Another is what can be done with the "simple" sum function in a query. For example, my financial database has a field named, Amount. In this field I enter all of the expenses and income that I have. I create the query to find the sum for the Amount field. I also tell the query to separate this sum so that I get a subtotal for each of my Accounts (I have 9). So I now have 9 subtotals of the Amount field. I also tell the query to list the names in the Accounts field alphabetically. In the end, the query's output is 2 columns by 9 columns: first column contains the names of the accounts listed alphabetically, and the second column contains the balance for each account listed.
      This is the SQL statement for the query in Budget.odb (it only has 5 accounts):

SELECT "Account", SUM( "Amount" ) AS "Account Balance" FROM "Data" WHERE "Account" IS NOT NULL GROUP BY "Account" ORDER BY "Account"

      This is the output:

Account

Account Balance

Bank of America

-600

Bank of Tennessee

-541

Cash

20.48

Wamu MasterCard

649.84

Wells Fargo VISA

218.45

      I use the same structure for the balances of my accounts. The only difference is that in Budget.odb, the Account field has 5 entries (hence 5 rows of subtotals) whereas mine has 9 entries (hence 9 rows of subtotals).
      It just occurred to me another advantage: the Data Source window which you open in Writer or Calc using the F4 key. You can see the query output without opening the database.

--Dan

I am beginning to realize that the input files are consistent with themselves.
The problem is not being able to define an import format, so as to attain the same result every time.

I use several different data acquisition hardware resulting in 4 different formats of data. Only one file actually imports using all default setting correctly.

Forget about guessing what is correct and allow saving and loading of user "import formats" on the import filter screen.

The following is a reply to my query from a private source:

Data bases are for fixed data columns and standardized input methods and
require manipulation by sql type query tools. Unless you are getting
automatic feeds of significant amounts of data from your world wide
operations I don't see what it would do for you. If you are putting together
a business then a data base would be a good thing even if it is simple, but
those are available pre-canned and some are open source. Unless you are
doing something far more complex than I imagine spread sheets should serve
you well and spread sheets have reporting capabilities.

In reality, the reason just about every company went to data bases was that
it centralized and standardized the data it needed and made it available to
each different level of management in order that they could do their jobs
and analyze their results.

The complexity of data bases arises from how data is loaded and validated
and the number of users, which is generally every employee of a corporation.
Big companies have petabytes or more of data to manage and parallel and
redundant servers are required to ensure business continuity.

The other use for data bases is like firefox which has it's own mysql
database built in to manage stuff like cookies and cache and run under
control of the firefox program.

Regardless, a lot of small companies keep some data on incomplete data bases
that they got free or cheap and accumulate, compile, analyze and report
their data from spread sheets for which they either write macros or use the
spread sheet's native reporting tools.

To demonstrate the similarities check out this video tutorial.
<http://www.youtube.com/watch?v=yO2oQhinZ4Y>

What it implies of course is that if you have many spread sheets and they
are formatted in such a way as to correlate inter-relationships you could
import them all into your data base and manipulate that data into a report.
Fancy data base features include data entry windows for individual
contributors and complex reporting engines that can generate charts and
graphs as well as specific answers to pre-determined questions.

That seems a fair summary. Perhaps I included too much information and
obfuscated the problem.

Am 10.09.2012 01:49, JAMES MAJESKI wrote:

You are correct in that 12/31/12 is not ambiguous, but that is a special
case. When each element is unique and without knowing the source preference,
there are six different dates that may be generated from "10/11/12":
2010-11-12
2010-12-11
2011-10-12
2011-12-10
2012-10-11
2012-11-10

Your input data have 4 digit years. Before exporting csv from a spreadsheet you have to take care of unambiguous number formatting.
Again, there are much better applications to load, edit and save csv.
The Base component can be used as import configurator for pre-formatted spreadsheet templates, some spreadsheet geeks prefer macro programming.

Am 10.09.2012 18:56, JAMES MAJESKI wrote:

The following is a reply to my query from a private source:

Data bases are for fixed data columns and standardized input methods and
require manipulation by sql type query tools. Unless you are getting
automatic feeds of significant amounts of data from your world wide
operations I don't see what it would do for you. If you are putting together
a business then a data base would be a good thing even if it is simple, but
those are available pre-canned and some are open source. Unless you are
doing something far more complex than I imagine spread sheets should serve
you well and spread sheets have reporting capabilities.

You want a standardized input method to generate text files with ISO dates. Unlike its users, a spreadsheet does not care about formatting attributes. The only thing that really counts is the cell value.

Databases are designed to do what most of today's spreadsheet users try so desparately try to do in spreadsheets. Databases are ubiquitous online and offline.
Valid CSV files are exported from databases and they are destined to be imported back into other databases. Each row represents a record, each record has the same amount of fields, each field has a distinct data type.
A spreadsheet has no records nor fields and the only data types are text, number and boolean (in Calc even the booleans are numbers).
There are various ways to load csv into a spreadsheet so you can process the imported data by means of spreadsheet formulas. Simple statistics, projections, what-if-scenarios are the most typical spreadsheet applications based on database data. You pull it into a sheet and let the spreadsheet do what spreadsheets use to do (which is not text editing).

Base is our database component. It is very, very underdeveloped, nevertheless underestimated. Even the worst database tool does a better database job than the best spreadsheet can do.

James

CT2N is a Libre extension, not a file extension

http://extensions.libreoffice.org/extension-center?getCategories=&getCompatibility=any&sort_on=positive_ratings&path=%2FLibreOffice-Extensions-and-Templates%2Fextension-center&portal_type=PSCProject&SearchableText=CT2N

Load this into the Libre Extensions Manager in the Calc Tools menu
A new icon will appear in your tool bar.
Click on it to convert highlighted cells, or whole sheet.

Tink.

Am 10.09.2012 19:40, Andreas Säger wrote:

Base is our database component. It is very, very underdeveloped,
nevertheless underestimated. Even the worst database tool does a better
database job than the best spreadsheet can do.

Since nobody ever uses Base for plain text databases, here is another approach without Base and without macro programming:
1) Prepare a sheet template and format entire columns to your liking, according to the incoming fields.
2) Open the text file in a text editor.
3) Copy all.
4) Open a new spreadsheet from the prepared template, paste and fill out the text import dialog (US English, special numbers, delimiter).

Two digit years have always been a problem. I always presume that the use two
digit years was obsolete after the Y2K publicity, but bad habits continue.
We are no longer in the era of eighty column punch cards, so there is no
excuse for two digit years.

ISO8601 is the international standard, so it is not ambiguous. In other
formats, using four digit years and month names are not ambiguous no matter
the element order. Any other formats require a time consuming examination
for clues as to element order or an explanation from the source.

My input data may be in any of the formats. Once I determine the order of
the elements and resolved two digit years, I can easily convert to ISO8601.

My research has convinced me that I do not have neither the time nor the
resources to set up and maintain a database. I might consider it if all of
the data were received in the same layout, but the layout is as varied as
are the sources. Since I am the only one that is using the data, a
spreadsheet serves me best as I am able to freely add, delete, and modify
the layout, format, calculations, etc. Something a for which a database has
never been designed.

If I had many collaborators to work on the project, perhaps a database may
be of more use, but since I do not, I do not need the added headache of
trying to setup and maintain an additional level of complexity.

This is the conclusion I have drawn based upon the available documentation
on the internet and conversations with people that have worked with their
company's database. They all say it does not work until the processes are
resolved to the point that they need not change. One person explained the
problems that occurred when their processes did require a major
modification. After weeks of attempting to make the required modifications,
it was decided to set up another database. Then they went through months of
consultation and headache to get the essential data transferred from the old
database into the new one.

A database is a good tool to keep track of large amounts of data and
tracking processes, but unless it is a big company that has unchanging
processes, a database may be more of a liability than a help. With my data
constantly evolving, a database would need to be modified on a regular basis
and I am unwilling to invest the extra time to do so. All my data fits on a
spreadsheet and is easily modified as the sources are added or deleted as
each source has a different idea as to how the data should be presented.

Am 12.09.2012 07:23, JAMES MAJESKI wrote:

Two digit years have always been a problem. I always presume that the use two
digit years was obsolete after the Y2K publicity, but bad habits continue.
We are no longer in the era of eighty column punch cards, so there is no
excuse for two digit years.

A spreadsheet does not store 2-digit years nor 4 digits. All spreadsheets store numbers and nothing but numbers (or text). A date in a spreadsheet program is an integer day number (unless it is a string). Day zero is 1899-12-30, day 40,000 was 2009-07-06, today is day 41164. Today's So there can not be any problem with 2-digit years. Format the numbers to your liking so they show 2-digit years, 4-digit years or no year at all. But that will not change the cell value.
If you really need to export your currently active sheet into plain text you can apply any format you want.

Why csv? Which application do you try to exchange data with?

Am 12.09.2012 08:00, JAMES MAJESKI wrote:

My research has convinced me that I do not have neither the time nor the
resources to set up and maintain a database. I might consider it if all of
the data were received in the same layout, but the layout is as varied as
are the sources. Since I am the only one that is using the data, a
spreadsheet serves me best as I am able to freely add, delete, and modify
the layout, format, calculations, etc. Something a for which a database has
never been designed.

Well, then you don't heve the time nor the resources to set up and maintain a database in spreadsheets. The latter scenario is the nightmare that never stops.
A CSV file is an excerpt from a database and you can use any tool where your are free to modify layout, format, calculations etc. But then you save back a modified layout and format with additional calculations.

A database program lets you freely handle all data in any shape while maintaining the underlying data structures automatically for you.

In a spreadsheet a date year is a display choice, the actual date is
stored as a number relative to a 0 day. In a database, however, it
depends on how the field is defined: text or Date/Datetime. AFAIK all
Date/Datetime entries require a 4 digit year. The problem with a
database is it may use a location specific order (Imm/dd/yyyy for US or
dd/mm/yyyy otherwise) for entry and storage.

Hi :slight_smile:
I think all these tools require skill and experience.  It's easiest to keep using the tool you have most skills and experience in but at the same time it is a good idea to try to build-up experience with other tools.

If Andreas was working with your data then a database-program would be better and faster.  If you had his level of skill and experience with Base and your level with Calc/Excel then you would probably find that Base was far, far easier and faster than Calc/Excel for this particular task.  It's beyond the scope of this list to set-up a database for you but i suspect that if Andreas or Alex or someone did then you would find that easier to work with than the spreadsheet system you are using at the moment.

Something i would quite like to see at some point in the future is people on this list getting paid work on a consultancy basis for odd one-off tasks, perhaps not even at such high rates as consultants might normally get.  There are huge philosophical and practical problems around that sort of thing though even though it's allowed within the scope of the GPL and MPL license agreements.

If someone approached me off-list i would feel obliged to point them towards someone with more skill and knowledge such as Jay, Regina, Dan and the plethora of other people we see giving great answers week in week out.  If the request was made on-list then i thik we could figure out who would be best for a particular task, given time constraints and personal work-loads.

Regards from
Tom :slight_smile:

Databases are very useful even for small projects because of the
querying capabilities of a database. Base and Access (MSO) are both
relational databases meaning that the data tables are related and
queries can search for data in any table. Relational databases require
that the tables be predefined and this can be a problem for data entry
when the data is not received in the same format as the data tables.

Another option for a database is a NoSQL database such as MongoDB or
CouchDB (both are FOSS projects) where the database "tables" do not
require a predefined layout and what may take multiple linked tables in
a relational database can be done in one "table". For example a book in
a relational database may have multiple authors, formats (different ISBN
numbers), and subjects. In a relational database you would have a table
for the book (title, year, publisher), another for authors, another for
ISBN numbers, and another for subjects. In a NoSQL database one could
combine this information into one "table". The data can be searched.

Another problem is that often the MSO bundle does not include Access so
a user who needs a database is either forced to find one or use Excel as
a substitute.

What I am doing is customizing a spreadsheet for the data as the data is
presented in so many different ways. The data is then imported to the
spreadsheet using the tips and tricks so that the data will be entered
correctly. I then move things around, reformat, etc until the spreadsheet
data is compatible with the the master spreadsheet. Cut and paste to the
master spreadsheet and sort to integrate the data and it is all done except
for the analysis.

I will try the trick of importing the data from the clipboard. Thank you for
that idea.