Currently Using Spreadsheet for Personal Project - Thinking About Database

Tom_Davies1 · August 11, 2014, 12:11pm

Hi
I am not sure if it has been covered in this thread but unlike spreadsheets
with databases you hardly ever read directly from the main tables.

Normally you have "Queries" that filter and do calculations. They look a
LOT like tables such as spreadsheets but they seldom contain ALL the data
from any of the real "Tables".

So, one of your "Queries" might show only all those books that are marked
as "having been read". Another might show books that have not yet been
read.

If you want to add even more complexity you might have another Query to
show books that you started but got bored of (or side-tracked or lost)
before finishing, perhaps with an estimate of what page you had gotten to.
You might also have a Query showing books to avoid.

Wrt connected series such as the various Star Trek ones it might be good
just change the name of each series to put an identifier at the start of
each series name. In this case adding "StarTrek" at the beginning of the
series name means that when you sort the list of series's into alphabetical
order all the Star Trek ones would be bunched up together. In the future
you could add a new column to the SeriesTable to take into account series
that are based in the same universe as each other but don't seem to have
anything you could use as a cohesive identifying name. CJ Cherryh's Chanur
series links with "Down-Below Station" (sorry for the spoiler for anyone
that hasn't read them already!).

So that is 2 types of changes you could add in later, if it's ever needed.
Such changes only need to be done once but have an effect everywhere. You
wouldn't have to dig out each recurrence to make them all consistent with
each other.

Btw, When i have lacked a computer i have sometimes managed a book a day.
Admittedly after some books i needed some recovery time to mull over issues
raised. I didn't have a wife or a life though. I've known academics and
extremely fast readers who seem to churn through tons of books very
quickly. So, 17 books/year sounds quite low to me. I guess it might be a
lot to a 'normal' person (if there is any such person).
Regards from
Tom

Oogie_McGuire · August 11, 2014, 1:28pm

Oh Gosh, what do they call folks like me? <GG>

I average 64+ books a year, 1 a week plus one a month. Down a bit this year due to textbooks but still… <G>

Eugenie (Oogie) McGuire
Desert Weyr, LLC - Black Welsh Mountain Sheep http://www.desertweyr.com/
LambTracker - Open Source SW for Shepherds http://www.lambtracker.com
Paonia, CO USA

Mark_Phillips · August 11, 2014, 1:44pm

Joel,

A suggestion from left field, but it will help you in the long run. Take a
look at the django project - https://www.djangoproject.com/. It is a
framework for creating database driven web sites. It is very easy to use
and very powerful. The framework automatically creates all the plumbing you
need to build a project (such as database tables), and allows you, the
designer, to focus on the higher level problems of your design (what the
page views should look like). The user community is very helpful. There are
also many inexpensive hosting options for a public facing dajngo project,
or you can just run it locally on your own machine.

Try the tutorial and you will see how easy it is to use -
https://docs.djangoproject.com/en/1.6/intro/tutorial01/.

Good luck!

Mark

Paul16 · August 11, 2014, 2:36pm

I may be overestimating the simplicity of Base, but I suspect Base will
be far, far simpler than creating a Django project.

A Django project means actual coding, and a lot of it. Yes I know how
simple Django is, but it still requires a decent amount of coding. As I
understand it, Joel isn't a programmer, and if he is struggling to have
enough time to learn about databases and Base, he surely doesn't have
time to learn to be a programmer just for this.

Even if he was a programmer already, developing it in Django will
probably take longer than developing it in Base. In fact, I suspect it
would be faster for me to develop this sort of thing in Base than in
Django, and I'm a programmer with almost no experience in Base
(admittedly not much experience in Django per se, but some in other
frameworks like Django). But as I said, I may be overestimating the
simplicity of Base.

Learning to program, and learning something like Django, are both
worthy pursuits in their own right, and will allow one to make
applications such as this one and much more complex ones, but for a
simple home hobby system, I'm not sure that sort of input is warranted.

Perhaps looking at how Django is put together will help with your
understanding of databases and data access, but even that will require
a fair input of time and effort.

Suggestions such as this, though, are great. Even if they don't get
chosen as part of the solution, or are in fact not really directly
applicable, they still add ideas, and help one think outside the box a
little.

Again, YMMV, this is just IMHO.

Paul

toki · August 11, 2014, 5:07pm

Oh Gosh, what do they call folks like me?
I average 64+ books a year, 1 a week plus one a month.

A prime customer of an independent bookstore.

year due to textbooks but still… <G>

Students are deliberately excluded from the statistics.
A back of the envelope swag suggests that including them will double the number of books read in one year.

jonathon

toki · August 11, 2014, 5:19pm

My impression is that the OP _wants_ to learn a little bit about database design and construction.

Both Koha and Calibre are FLOSS solutions that literally require nothing more than installation, and then adding the data for the books. (There are also other reasons why those are more practical solutions than rolling one's own.)

jonathon

Paul16 · August 11, 2014, 10:26pm

Hi Joel,

Some comments on your diagram, in the order I thought of them, not of
importance:

1) I tend to use integers for primary keys. Smallints only go into the
thousands, and most of the stuff I have worked on has needed to be able
to scale well beyond that. Even things that seem small to start with
have a habit of growing unexpectedly, and you don't want to have to
change your database design down the road for something small like
that.

Planning ahead really is the name of the game with databases, as
half the reason we use them to begin with is to handle things that
have grown beyond our simple spreadsheets, or that are expected to
become unwieldy in the future. The whole point of a good design is
that it should handle this sort of thing. And the extra storage space
really shouldn't be an issue. For example, a quick estimate
suggests that using integer instead of smallint for keys in the diagram
you gave will only use an extra 1 mb of storage space if you go all the
way up to the limit of the smallint range, and beyond that it is
academic as you would be *required* to use int instead of smallint.

But for a home project you're unlikely to exceed that number, so this
is fine. It's just worth being aware of this when designing it, so that
you know the choices you have made, and why you have made them.

I would make them unsigned smallints, though. Not really a good idea to
use negative numbers for keys, and so making them signed just wastes
half their range.

2) I wouldn't put a Read flag in the Book table. You have a Status
table with a ReadStatus, so this would be duplication of data.

3) Same with the SeriesComplete flag in the Series table. You can get
that information by doing a query that would check if all the books in
the series are in the read status.

It's usually much better practice to generate this sort of information
as you need it from the relevant pieces of information in the database
(assuming you have all the required pieces in the database), than to
store this information a second time, which is just duplication. Even
if the space wasted isn't important, it just leads to the possibility
of the data being inconsistant.

The only time one stores such things is when the actual calculation to
work this out from the pieces of data takes so long that it is actually
necessary to store calculated results to save time in the application.

4) Same with AuthorID in the Series table. Each book will have an
author, so you don't need a series author. That way if the books in the
series are written by different authors everything still works. This
way, who would the series author be, and how would that be any
different from the authors of the individual books?

Note that you can still show this information in the frontend when
viewing the series. You can make a query to check if the authors of all
the books in the series are the same, and if so show that author, but
if they are different just show "various" or something.

Also, when entering all the books in a series, you can still have the
frontend ask you for the series information first, including author,
and if you fill it in then it will prepopulate the author field when
adding each book.

Of course, maybe you actually do have a use for a series author as
separate to the authors of the individual books, so maybe this does
make sense to you, it just doesn't seem right to me.

5) Some people want author names to be a single field, which is simpler
for some things, and avoids hassles with authors with multiple names,
and other people want them to be a name and surname field, which does
mean you can do things like order by surname or order by first name. And
avoids the problem of you sometimes entering the name surname first and
sometimes first name first.

With the name split into parts it is easier to check if the author
exists by both name and surname when entering new authors, so even if
you mistype one of them the author will still be found if they are
already in the database, but this can still be done if the name is all
in one field, it is just harder. You can also still sort the author
list by surname and by first name, it is just harder if both are in one
field.

The choice is up to you.

6) I would make the ReadStatus slightly larger, say text(30).
If you use varchar there is no wasted space if some of the entries have
less text, so you could even make this varchar(100) or something. Or
give it a short code and a longer description. Something like "could not
finish" is already too long for 15 characters. Sooner or later you're
going to want to add a Status that is longer than the 15...

7) The Status table is incorrect. The way you have it you could have
multiple statuses for a book, and each one would be a hand typed status
description. One time you might type "reading", and another time
you might mistype "raeding". Instead, you should change the BookID to a
StatusID in the Status table, and add a StatusID to the Book table. The
PageNum field would also need to move to the Book table.

This way you would put all the possible statuses into the Status table,
and each book would have a link to that table, showing which of those
possible statuses the book currently is in.

Uh...

I think that's all for now.

Otherwise it looks good. You're almost there.

Paul

Mark_Phillips · August 11, 2014, 10:34pm

Calibre is also a great idea/solution for this project, unless one of the
goals of the project is to learn about databases and how to use a database
to solve a real world problem. I added the second clause, because just
learning about databases an admirable goal, but imo, one also should learn
how to build the bits that allow an end user to create/read/update/delete
data in the database. Django is one vehicle for doing this. Paul, I have to
disagree with your analysis of django...I have worked with many
non-programmers on building useful projects with that framework. There is a
learning curve, but it is not insurrmountable by a non-programmer. ymmv.

Mark

Mark_Bourne · August 12, 2014, 9:37pm

Paul wrote:

So I was interested, and did a little more digging. I figured I would
share the info.

I don't know to be honest, not having a lot of experience in database
design. I've only ever really used MySQL, which does support enums,
but maybe that's just a MySQL feature.

Seems like it is a MySQL specific feature, although PostgreSQL seems to
support it.

Ah, so definitely best avoided if portability is a concern! And probably advisable to avoid even if portability isn't an immediate concern.

To me, the set of possible status values just seems more like part of
the database schema design than data entry - you wouldn't generally
add or remove status options (and the application may assign special
meaning to certain statuses, so it may be critical that a specific
set of values is defined). Then again, in some applications being
able to introduce new statuses at any time could be an advantage.

Yeah, being able to change these values later is one of the main
reasons to use a separate table. Clients almost *always* end up adding
or removing some of these.

The database work I've done has been for either myself or small voluntary organisations I'm involved in, so I've never really had to deal with that. Where the set of values could obviously be expanded (e.g. to indicate colour of something) I'd certainly do that with a table containing the options. My main use for enums has been to indicate things like male/female gender, allowing null if I need to allow for an unknown value (point taken - someone might would want to expand that list of options...)

Also, when you have specific functionality tied to some of these
statuses, it's always a good idea to add a flag field for it, and make
your code check if the flag is on, rather than if the status name
matches a specific value, so that if the client ends up deciding (as
they invariably will) that actually a second status must also do that
thing that they assured you only the one status would ever do, then you
simply turn that flag on for the second status, rather than having to
change your code base and hunt for all instances of checking for the
status by name.

One of the places I have used enums was for that kind of a flag, where there were three options for how the application could handle a record. Essentially an event booking database where each participant was assigned to one of about 10 groups, and each group needed to be handled by the application in one of three different ways (displaying different sets of options), so I had a "group type" enum (with three possible values) on the groups table indicating to the application how participants in that group should be handled.

Looking back, I actually did end up writing code into the application to decode that "group type" into a set of flags indicating which parts of the GUI should be displayed and which options should be offered - with the intention of porting that into a database table. I just never got round to doing that before the event was over.

So it seems I'm violently agreeing with you!

Also, then you can localize your status names if you ever want to make
your program support other languages.

Something else I haven't had to worry about, but if I did I'd have done the localisation in the application - the enum values (as I use them anyway) are not generally displayed as-is to the user anyway, they're just convenient names to use in the database. Using localised status names in the database table surely hinders localising the application - if you wanted to connect both English and French front-end applications to the same back-end database, wouldn't they both get the same status names to display, rather than the names appropriate for the application language? Or would the status table in that case contain strings in one language (or perhaps even generic keywords) for which the application would look up translations in it's own localisation table?

The page here gives some benefits and pitfalls, but I don't see any of
the benefits as not being provided by the separate status table
solution:
https://www.udemy.com/blog/mysql-enum/

This page also gives a few more reasons why you shouldn't use them, and
prefers "reference tables", which I called status tables:
http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/

Some interesting points there, thanks.

While most of those reasons may not apply to this project, the two that
do for me are:

1) The statuses are data, and shouldn't be stored as structure. This is
a conceptual thing, but it's best to start out thinking the right way.

In this particular case with statuses, I could go either way and do see your point. I might even go with having "date started" and "date finished" fields and determining whether a book is unread, in progress or completed from those.

More generally, I'd certainly use a table where the list of options is subject to change and doesn't have any special meaning to the application (e.g. colours or continents). For gender I've previously used an enum, but in light of this discussion will probably do that differently in future; an application shouldn't care about a third option being added.

Now it comes to it, I'm not sure I've got any uses for enums left... Possibly when each new option requires explicit support in the application - from the point of view of modifying things in future, it's more obvious that adding a new possible value to an enum column (essentially changing the database structure) might cause problems for the application, and not so obvious that adding a new row to a reference table might cause problems for the application. But I'm less convinced than I was that this can't be handled in a different way, e.g. with a set of boolean flags.

2) It makes populating the dropdowns way easier, unless you were going
to hard-code them, which is admittedly even easier, but can lead to
problems down the road when you want to make changes or additions.

Can't argue with that. When I have used enums, the list of options in the UI and mapping between those and database enum values has been hard-coded.

Just FYI

Paul

Thanks! It's certainly been useful to see better ways of doing things.

Mark.

Paul16 · August 12, 2014, 11:18pm

Hi Mark,

[snip]

> Also, then you can localize your status names if you ever want to
> make your program support other languages.

Something else I haven't had to worry about, but if I did I'd have
done the localisation in the application - the enum values (as I use
them anyway) are not generally displayed as-is to the user anyway,
they're just convenient names to use in the database. Using localised
status names in the database table surely hinders localising the
application - if you wanted to connect both English and French
front-end applications to the same back-end database, wouldn't they
both get the same status names to display, rather than the names
appropriate for the application language? Or would the status table
in that case contain strings in one language (or perhaps even generic
keywords) for which the application would look up translations in
it's own localisation table?

Normally, localization is done by having a plain text file with all the
application strings in it, and this file is simply copied and each
string translated for each different language. Then at run time the
application knows which locale it is in, and looks in the appropriate
file for all the translated application strings. But they can just as
easily be stored in the database, though.

(Sorry, I know you probably know that, just including it all for
anybody reading that isn't as familiar with this stuff.)

Now, I've never actually dealt with localization myself, so I'm
guessing a little, and not sure what might be considered best
practice out in the wild. But obviously you wouldn't want to add a new
field to each table for each language, although you might if you were
only dealing with a small, restricted set of languages. But generally
you want that sort of stuff stored as data, not as structure, because
it's easier to add rows than it is to add fields. So I'm guessing one
solution would be to add a language code field to the lookup table, and
make a composite primary key of that and the ID. Then you can enter one
row for each status for each language.

But generally I think composite keys are considered bad. Although maybe
not, but I don't like them, at least. So another approach is to link on
something other than the primary key. One thing we often did with
lookup tables was have a short code and a description for each, so for
purposes of the application you would use the code, but you could
display the description to the user. And use whichever one fit better
for printing. So you could use the short code as the foreign key from
whatever table references the status table, instead of the ID. Then the
status table could still have a single primary key field of the ID, and
just a unique constraint on the short code and language fields. Again,
one entry for each short code and language pair, with the translated
description. The disadvantage of this is not having the primary key of
the lookup table as the foreign key of the referencing table.

For me once you have a database, there are other questions, like what
are you storing in the database (user entries as opposed to programmer
entered things like statuses)? Do those need to be translated?
If they do, then you've got a whole different set of problems, but
assuming they don't, then you only have to translate the scaffolding
stuff.

I'd want to keep the translation of database stuff in the database. At
least then it's easier to update if you ever do need to make changes.
Just add a status, and add all the translations for it. Otherwise, if
it's an enum, and then the translations are in the frontend, you need
to not only update the database structure to add an enum, you also need
to update all the frontends to have the new translation (even if that
doesn't actually require a recompile).

[snip]

Werner1 · August 13, 2014, 6:09am

Hi,

Hi Mark,

...

Normally, localization is done by having a plain text file with all the
application strings in it, and this file is simply copied and each
string translated for each different language. Then at run time the
application knows which locale it is in, and looks in the appropriate
file for all the translated application strings. But they can just as
easily be stored in the database, though.

I18N (Internationalizing) for user interface labels is relatively easily done in most languages, one way is to use gettext which is supported by a lot of programming languages.

Doing I18N for database tables is another story, a few years ago I looked around for support of I18N support for database tables and at the time I didn't find any, have you found one? The storing of the I18N data is relatively easy, but how to access that from the application depending on user login in and how to fall back to a base language in case a value is not translated is not that easy (at least in my view:)).

Werner

Peter_Goggin · August 13, 2014, 9:01am

I have a MS access data base running under XP. For various reasons I am converting my ;laptops to run Ubuntu with LibreOffice. I have successfully transfered Thunderbird e-mail from XP to Ubuntu. I am able to transfer and read MS Word and Excel files to run under LibreOffice under Ubuntiu.

I cannot see how to transfer rhe MS Access to run uder LibreBase. I tried doing the conversion under XP by loading LibreOffice and then opening the Access data base using LibreBase. with the following results:

1 All tables were successfully converted and retained their data.
2 All vies finished as being defined as tables.
3 No forms were transferred.

Is there an easy way to convert Access data bases to run under LibreBase?

Can I get the vies transferred as vies ? Do I need to extract the SQL for the vies and then recreate them in Libre Base.

If I cannot transfer the forms directly then I will need to recreate them. I have looked at how forms work in LibreBase butr I cannot find out how to create a menu form which allows the juser to select other menu forms or data entry forms.

Is there a reort system in Libre Base?

Any help will be gratefully accepted

Regards

Peter Goggin

iplaw67 · August 13, 2014, 9:58am

Hi Peter,

I cannot see how to transfer rhe MS Access to run uder LibreBase. I
tried doing the conversion under XP by loading LibreOffice and then
opening the Access data base using LibreBase. with the following results:

All this does unfortunately, is link your ODB file to the MDB data source.

1 All tables were successfully converted and retained their data.
2 All vies finished as being defined as tables.

Yes, generally tables and views are converted to tables.

3 No forms were transferred.

Forms, Queries and Reports have to be rebuilt.

Is there an easy way to convert Access data bases to run under LibreBase?

No, not without redoing much of the work that you did in Access to make
your db work for you in the way you wanted.

Can I get the vies transferred as vies ? Do I need to extract the SQL
for the vies and then recreate them in Libre Base.

Export the View as SQL, and then create a new view in LibreOffice with
that SQL.

If I cannot transfer the forms directly then I will need to recreate
them. I have looked at how forms work in LibreBase butr I cannot find
out how to create a menu form which allows the juser to select other
menu forms or data entry forms.

For that, you should try installing the Switchboard extension
(LibreOffice extensions website), to see if it meets your needs, or else
learn to manipulate UNO objects via one of the scripting languages that
LibreOffice supports.

Is there a reort system in Libre Base?

Yes, it is called ReportBuilder, and is a Java-based subset of the
Pentaho reporting engine. You need a JVM/JRE in order to use it, and by
default on current Ubuntu installations, it is not installed (neither is
Base for that matter) with the default installation of LibreOffice.

Alex

Tom_Davies1 · August 13, 2014, 11:48am

Hi
Ahh, i take it that "View" is the new name for "Queries"? It is quite a
good name for it. "Queries" had to be explained quite a bit although i
thought it made more sense tbh.

Open Source advocates and people who are helping organisations to move away
from Microsoft lock-ins often give the surprising advice of keeping the MS
systems in place!

This helps the migration flow more smoothly because some of the lock-ins
are difficult to escape. By keeping the old system available people find
it easier to over-come some of the simpler hurdles without needing so much
extensive and intensive retraining.

Microsoft generally don't have a migration route for new products or newer
versions of products. So people are forced into using the newer system.

Open Source generally encourages migration instead of using the typical
Microsoft route. They/we usually allow you to keep the older system
alongside the newer one. So normal users can keep going back to using the
older, more familiar tools when they really need to but tend to move to the
newer system because it's better and does more. A few people need a bit of
a push so it's a good idea to set a deadline.

People often resent being forced into using the newer system. For some
reason when the newer system is a Microsoft one it is often the user that
gets the blame or is made to look or feel like a foolish fuddy-duddy. When
the newer system is anything other than Microsoft people are generally
encouraged to blame the software.

So, in this case i would keep the existing Access database and existing
copies of MS Office that are already installed. LibreOffice works well
alongside MS Office.

Sometimes, at an early stage of migration, some newer machines need the
same version of MS Office to be installed but that fizzles out as more&more
works more smoothly on LibreOffice/OpenOffice
Regards from
Tom

Werner1 · August 13, 2014, 12:21pm

I don't think so, at least not in SQL terms.

http://www.w3schools.com/sql/sql_view.asp

http://en.wikipedia.org/wiki/SQL see Queries section

Werner

Paul16 · August 13, 2014, 7:38pm

Hi Werner,

Hi,

> Hi Mark,

...
>
> Normally, localization is done by having a plain text file with all
> the application strings in it, and this file is simply copied and
> each string translated for each different language. Then at run
> time the application knows which locale it is in, and looks in the
> appropriate file for all the translated application strings. But
> they can just as easily be stored in the database, though.
>
I18N (Internationalizing) for user interface labels is relatively
easily done in most languages, one way is to use gettext which is
supported by a lot of programming languages.

Doing I18N for database tables is another story, a few years ago I
looked around for support of I18N support for database tables and at
the time I didn't find any, have you found one? The storing of the
I18N data is relatively easy, but how to access that from the
application depending on user login in and how to fall back to a base
language in case a value is not translated is not that easy (at least
in my view:)).

I don't know of any frameworks for I18N in databases, although I'm not
sure of the level of support in things like Django, CakePHP and other
web frameworks. From what my brief research has shown, most of them do
have some level of support.

Possibly the best support I saw was in Propel, a PHP ORM.
http://propelorm.org/blog/2011/01/11/propel-gets-i18n-behavior-and-why-it-matters.html

Also, these articles discuss the actual database design and present
different standard options:
http://www.codeproject.com/Articles/8084/Creating-multilingual-websites-Part#databasedesign
http://www.apphp.com/tutorials/index.php?page=multilanguage-database-design-in-mysql

Storing the data with a good structure makes the queries only a little
harder, usually one extra join, although that doesn't take into account
fallback language.

The Propel behaviour seems quite sensible to me. If you aren't going to
use an ORM, I would suggest at least making a wrapper in your code to
manipulate queries and add the extra join, so for the rest of the
code you don't have to worry about that detail, and just do queries as
normal.

If you need a default language, I would probably keep the default
language values in the original table, and just store translations in a
translation table. Then you can have the wrapper detect if the current
locale is the default language, and do nothing in those cases. If the
locale is different, then get both the field from the original table
and the translation, and have the wrapper pick up if the translation is
blank, then substitute the default value, and write to whatever
logging system you are using noting the incomplete translation.

Just my thoughts. Googling brought up quite a few other links, mostly
to StackOverflow, and the discussions gave more options and further
reading.

Paul

Tom_Davies1 · August 13, 2014, 2:46pm

Hi
There is a separate mailing list which might be able to offer suggestions.
Our mailing list for international translators and co-ordinating between
the different languages is

L10n@Global.LibreOffice.Org

(lower and upper-case don't matter = it's just to help human readers make
more sense of it)
Regards from
Tom

Tom_Davies1 · August 14, 2014, 8:27am

Hi
Thanks! Ahh, so it's kinda a sub-set of what i understand as
"Queries"?

It's just when you view the Query in table view rather than viewing it as
Sql or the nice gui way of building up filters, formulae and such.

In Base i think Forms and Reports can be built in Writer, or Calc.

So, people using those Forms and Reports are presented with a familiar
environment and can do useful things such as simple re-arrangements and
tweaks or even fork them into different layouts (err, eg using "Save As .."
and give a new name) to do different things.

It protects the users from unfamiliar database design tools that they could
accidentally use to damage the database in all sorts of ways.

Annoyingly i think Writer or Calc Reports kinda don't allow people to enter
new data so proper Forms and Reports might need to be built within Base for
that. Base has wizards for all sorts of things. I'm never sure whether
it;s better to use wizards or avoid them.

Regards from
Tom

toki · August 18, 2014, 10:04pm

Librarians and book vendors have a slightly lower estimate for the
average number of books read per year.

There are huge swathes of the American population where book reading is
completely unknown. Those sub-groups pull the average number of books
read down considerably.

There is a correlation between the number of books in a student's home,
and their academic performance. There is a secondary correlation between
the number of books the parents of the student read, and the student's
academic performance.

jonathon