Currently Using Spreadsheet for Personal Project - Thinking About Database

Joel_Madero · August 10, 2014, 5:53pm

So - I've started with the previous suggestions and am hesitant to change now (put in about 5 hours over the past couple days to learn by doing).

Attached is what I have so far. It's pretty basic but just two tables (as you can see) and a form which I made to my likings.

Suggestions welcome

Not having built in easy to use switchboard functionality is a bummer :-/

Best,
Joel

P.S. Keep in mind that this is just a hobby project not a professional db, I'm not in the computer administration field so I'll never need to use db for professional use. Basically I want to balance functionality with the stress on my time (I literally have about 70-80 hour work weeks along with a family and other duties within LibreOffice). So it's always a balance of time against functionality. That being said, I'm always open and happy to take suggestions

Joel_Madero · August 10, 2014, 6:48pm

Hi All,

So I went back to planning stage. Link to what I think might work - hoping to get the planning stage done today so I can start actually putting together the db - I have 3 days off so now's a good time for me to get the basic structure together Thanks in advance!

https://drive.google.com/file/d/0B2kdRhc960qdbGJIQ1M3NWtrdmc/edit?usp=sharing

Best,
Joel

jomali · August 10, 2014, 8:02pm

Joel,

The NumberOfBooksByAuthor and ReadBooksByAuthor fields are unnecessary.
They can be derived from a simple query for a report.

Since there has to be a one-to-one relation between the ReadStatus table
and the BookInformation table, you might as well merge them. When you do
merge them, a NotRead field is superfluous. If the Read field is binary,
when it is true the book has been read and when it is false it is not read.
Also, you left out a DateRead field, corresponding to the Year, Month, Day
fields of your original spread sheet.

Similarly, since there has to be a one-to-one relation between the
BookInformation table and the Rankings table, you might as well merge them
as well.

The NumberOfBooksInSeries, the ReadInSeries and the RemainingInSeries
fields are unnecessary. Such information is readily obtained through
queries for reports.

By the way, carefully consider how you intend to import the data from your
original spreadsheet to the database. There are various ways to do it, and
how you structure your database will impact how you import your data.

Mark_Bourne · August 10, 2014, 8:30pm

It looks like each book should only have one of the "ReadStatus" flags set, so I'd make that an enum field on the BookInformation table, with possible values of "Not Read", "Reading" and "Read". You can set the default value for the field to "Not Read" so a new record will be set to that status if no value is specified for that field.

"NumberOfBooksByAuthor" and "ReadBooksByAuthor" are not needed on the "AuthorInfo" table - you can get those by querying the database. I may have the syntax slightly wrong here, but along the lines of:
SELECT `ai`.`AuthorID`, `ai`.`AuthorName`, COUNT(`bi`.`BookID`) from `AuthorInfo` `ai` LEFT JOIN `BookInformation` `bi` ON `bi`.`AuthorID` = `ai`.`AuthorID` GROUP BY `ai`.`AuthorID`
should give the number of books by each author. Add:
AND `bi`.`ReadStatus` = "Read"
to the ON condition and you can get the number of read books by each author.

That's the kind of thing a database enables you to do much more easily than with a spreadsheet ;o)

Mark.

Joel Madero wrote:

Paul16 · August 10, 2014, 8:33pm

Hi Joel,

Joel,

The NumberOfBooksByAuthor and ReadBooksByAuthor fields are
unnecessary. They can be derived from a simple query for a report.

Agreed.

Since there has to be a one-to-one relation between the ReadStatus
table and the BookInformation table, you might as well merge them.

The way you have it rather misses the point of a status table. I
wouldn't merge them, I would instead add a ReadStatusID to the
BookInformation table, and change the ReadStatus table to:

ReadStatusID
ReadStatusDescription

You want to store entries in here like the following:
1 - Finished Reading
2 - Not Started
3 - Busy Reading
4 - Could Not Finish
5 - Must Read Soon

that sort of thing.

Also, you left out a DateRead field, corresponding to the Year,
Month, Day fields of your original spread sheet.

Yup.

Similarly, since there has to be a one-to-one relation between the
BookInformation table and the Rankings table, you might as well merge
them as well.

Hrm. It does depend on what you want to do with rankings.

You could merge them into the BookInformation table, have a varchar
field that holds a short description of each ranking category for each
book, or you could make this also a status table. Add DepthRankingID,
LessonsRankingID, CitationsRankingID etc fields to the BookInformation
table, and have the Rankings table like so:

RankingID
RankingDescription

With the following sorts of entries:

1 - Very Poor
2 - Poor
3 - Average
4 - Fair
5 - Good
6 - Excellent
7 - Specially Awesome
8 - Good but incomplete
9 - Thorough, but dry
etc

The NumberOfBooksInSeries, the ReadInSeries and the RemainingInSeries
fields are unnecessary. Such information is readily obtained through
queries for reports.

Yup.

By the way, carefully consider how you intend to import the data from
your original spreadsheet to the database. There are various ways to
do it, and how you structure your database will impact how you import
your data.

Your biggest hassle here will be that the spreadsheet data is
inconsistant. There are considerations here, but trying to keep all of
it in mind might be a bit much right now if this is new to you. I
wouldn't worry too much. It's more important to get the proper database
structure now than to worry about how you will import the data. Once
it's all in place and working, you can worry about getting the data in.
I'm sure plenty of people here can help with that.

Paul

Paul16 · August 10, 2014, 8:35pm

It looks like each book should only have one of the "ReadStatus"
flags set, so I'd make that an enum field on the BookInformation
table, with possible values of "Not Read", "Reading" and "Read". You
can set the default value for the field to "Not Read" so a new record
will be set to that status if no value is specified for that field.

That's essentially the same idea as the status tables I was suggesting.
In my experience we've always used status tables, so I would suggest
those. Partly because I'm not familiar with database support for enums.
How well supported and widely supported is that by the most common
databases?

Joel_Madero · August 10, 2014, 9:35pm

Thanks all - I have enough to do "something" - probably won't be perfect
but will work for my needs. As for importing the data - not horribly
concerned about this. Very worse case I'll manually enter which won't be
too horrible in my case (really only about 250 entries to put in). More
concerned about getting the db done.

Thanks again for the input.

Best,
Joel

Mark_Bourne · August 10, 2014, 9:48pm

Paul wrote:

It looks like each book should only have one of the "ReadStatus"
flags set, so I'd make that an enum field on the BookInformation
table, with possible values of "Not Read", "Reading" and "Read". You
can set the default value for the field to "Not Read" so a new record
will be set to that status if no value is specified for that field.

That's essentially the same idea as the status tables I was suggesting.

Indeed; I think our replies crossed in the post ;o)

In my experience we've always used status tables, so I would suggest
those.

I wouldn't dispute that.

Partly because I'm not familiar with database support for enums.
How well supported and widely supported is that by the most common
databases?

I don't know to be honest, not having a lot of experience in database design. I've only ever really used MySQL, which does support enums, but maybe that's just a MySQL feature.

To me, the set of possible status values just seems more like part of the database schema design than data entry - you wouldn't generally add or remove status options (and the application may assign special meaning to certain statuses, so it may be critical that a specific set of values is defined). Then again, in some applications being able to introduce new statuses at any time could be an advantage.

Mark.

Paul16 · August 10, 2014, 10:26pm

So I was interested, and did a little more digging. I figured I would
share the info.

I don't know to be honest, not having a lot of experience in database
design. I've only ever really used MySQL, which does support enums,
but maybe that's just a MySQL feature.

Seems like it is a MySQL specific feature, although PostgreSQL seems to
support it.

To me, the set of possible status values just seems more like part of
the database schema design than data entry - you wouldn't generally
add or remove status options (and the application may assign special
meaning to certain statuses, so it may be critical that a specific
set of values is defined). Then again, in some applications being
able to introduce new statuses at any time could be an advantage.

Yeah, being able to change these values later is one of the main
reasons to use a separate table. Clients almost *always* end up adding
or removing some of these.

Also, when you have specific functionality tied to some of these
statuses, it's always a good idea to add a flag field for it, and make
your code check if the flag is on, rather than if the status name
matches a specific value, so that if the client ends up deciding (as
they invariably will) that actually a second status must also do that
thing that they assured you only the one status would ever do, then you
simply turn that flag on for the second status, rather than having to
change your code base and hunt for all instances of checking for the
status by name.

Also, then you can localize your status names if you ever want to make
your program support other languages.

The page here gives some benefits and pitfalls, but I don't see any of
the benefits as not being provided by the separate status table
solution:
https://www.udemy.com/blog/mysql-enum/

This page also gives a few more reasons why you shouldn't use them, and
prefers "reference tables", which I called status tables:
http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/

While most of those reasons may not apply to this project, the two that
do for me are:

1) The statuses are data, and shouldn't be stored as structure. This is
a conceptual thing, but it's best to start out thinking the right way.

2) It makes populating the dropdowns way easier, unless you were going
to hard-code them, which is admittedly even easier, but can lead to
problems down the road when you want to make changes or additions.

Just FYI

Paul

Oogie_McGuire · August 10, 2014, 10:45pm

I would, I often change my mind about how to classify stuff and it also prevents the status from being entered in differently.

Eugenie (Oogie) McGuire
Desert Weyr, LLC - Black Welsh Mountain Sheep http://www.desertweyr.com/
LambTracker - Open Source SW for Shepherds http://www.lambtracker.com
Paonia, CO USA

Oogie_McGuire · August 10, 2014, 10:48pm

We agree <G> I think my comments passed yours in the mail.

I hadn't done any relational database work until this current big project, thats why the database design book I recommended was so critical. It got me out of the spreadsheet mindset and into the database one.

Just because it's for one person for one reason doesn't mean you can't design it with proper database functions, after all the structure might be useful to someone else as well.

Eugenie (Oogie) McGuire
Desert Weyr, LLC - Black Welsh Mountain Sheep http://www.desertweyr.com/
LambTracker - Open Source SW for Shepherds http://www.lambtracker.com
Paonia, CO USA

Joel_Madero · August 10, 2014, 10:49pm

Gah - I just realized that I think I have to separate "date read" from the book info. Reason being that if I read a book 2-3 times and want to track the dates then I have to have another table. With this, I'd do a one to many relationship between the book information and the read date tables, yes?

The way I imagine it working is when I enter a book title + author, it checks book information to see if the book is already in the table, if it is, it just pulls up the info related to the book, and I can mark as read, add a new date that I've read (which might or might not be the first time I've read it), it enters the date in the date read table, allows me to rate (or update the rating) etc...

Make sense?

Best,
Joel

P.S. It seems like part of db work is just figuring out personal preferences. Reading the emails a few times it seems like there are differing opinions a little about the use of tables (the # of tables), etc...

So my plan is sticking with the original thought - keep ranking separate from the book information table, but splitting author, series, etc... from book information and putting those in a different table.

Thanks for all the help, really appreciate it.

Oogie_McGuire · August 10, 2014, 10:51pm

One reason to keep all the tables separate is to facilitate adding things to them later. Also localization for other languages and to keep entries exact so searches are easier to do. Solves issues with How did I spell that? problems in queries

To my mind the only thing that belongs in a table are things that are only related to that item.

Since there has to be a one-to-one relation between the ReadStatus
table and the BookInformation table, you might as well merge them.

The way you have it rather misses the point of a status table. I
wouldn't merge them, I would instead add a ReadStatusID to the
BookInformation table, and change the ReadStatus table to:

ReadStatusID
ReadStatusDescription

You want to store entries in here like the following:
1 - Finished Reading
2 - Not Started
3 - Busy Reading
4 - Could Not Finish
5 - Must Read Soon

that sort of thing.

There is a one to many between read status and book info so it would need tobe a separate table in my mind. Exactly as above. So it would be a foreign key in the book table.

Hrm. It does depend on what you want to do with rankings.

You could merge them into the BookInformation table, have a varchar
field that holds a short description of each ranking category for each
book, or you could make this also a status table. Add DepthRankingID,
LessonsRankingID, CitationsRankingID etc fields to the BookInformation
table, and have the Rankings table like so:

RankingID
RankingDescription

With the following sorts of entries:

1 - Very Poor
2 - Poor
3 - Average
4 - Fair
5 - Good
6 - Excellent
7 - Specially Awesome
8 - Good but incomplete
9 - Thorough, but dry
etc

By the way, carefully consider how you intend to import the data from
your original spreadsheet to the database. There are various ways to
do it, and how you structure your database will impact how you import
your data.

I found that it was easiest to strip out and create a separate spreadsheet for each of the tables in my database, copy the data to it from the original behemoth of a spreadsheet, save as a .CSV file and use a .CSV to SQL importer function to get the data into my database But my spreadsheet originally consisted of 15 separate sheets in one file each with approximately 200 rows and about 700 columns. I am using SQLite as my database not BASE.

Eugenie (Oogie) McGuire
Desert Weyr, LLC - Black Welsh Mountain Sheep http://www.desertweyr.com/
LambTracker - Open Source SW for Shepherds http://www.lambtracker.com
Paonia, CO USA

Paul16 · August 10, 2014, 11:05pm

Indeed, although without a background in it or someone who knows his
salt, it can be really difficult to figure out the best way to do
things.

Once the database structure is right, the rest should be pretty
straightforward, especially in Base. Coding is a set of best
practices all on its own, so if you were coding the frontend (like I
would probably do), that's another story completely, but with Base it
should be relatively simple, and there shouldn't be too many options,
so there won't be as much chance of making bad choices. The database
structure really is the foundation everything else builds upon.

Paul

Paul16 · August 10, 2014, 11:16pm

Hi Joel,

Gah - I just realized that I think I have to separate "date read"
from the book info. Reason being that if I read a book 2-3 times and
want to track the dates then I have to have another table. With this,
I'd do a one to many relationship between the book information and
the read date tables, yes?

Yep, sounds right.

The way I imagine it working is when I enter a book title + author,
it checks book information to see if the book is already in the
table, if it is, it just pulls up the info related to the book, and I
can mark as read, add a new date that I've read (which might or might
not be the first time I've read it), it enters the date in the date
read table, allows me to rate (or update the rating) etc...

With this, you could even do away with a read status, and just see if
there are any dates in the DateRead table for the book, but then you
wouldn't get things like "busy reading" and "must read soon", so I'd
still keep both tables. Note that this does mean you could potentially
have a book with a status of read, but no completed dates. This is
something the frontend would need to check for and not allow you to
change the status to read without adding a completed date.

Make sense?

Best,
Joel

P.S. It seems like part of db work is just figuring out personal
preferences. Reading the emails a few times it seems like there are
differing opinions a little about the use of tables (the # of
tables), etc...

There are of course slight differences, but there shouldn't be any
major ones, except where we have different ideas about exactly what you
want to achieve. If there are any large differences in advice, we've
probably misunderstood something

So my plan is sticking with the original thought - keep ranking
separate from the book information table, but splitting author,
series, etc... from book information and putting those in a different
table.

I'm still not entirely sure how you want to rank books, but yes,
keeping it as its own table seems like the right idea to me.

You could just track a single number (say from 1 to 5, or 1 to 10), for
each rank category, or a single text description for each category, and
keep that in the book table, but making it a separate table is a little
more flexible if you're going to use standardised ranks across all
books, and not an individual description for each book.

Paul

Joel_Madero · August 11, 2014, 1:48am

I decided to do the smart thing and diagram it out - the diagram isn't perfect (looks wise) but I hope that I can get a +1 before actually making the databases.

Note: I decided to put rankings with date read table, this way I can have different rankings for the same book (for each instance that I read it, some books seem better - or worse - the second time around)

Thoughts appreciated, a million thanks for all the advice given so far.

Link: https://drive.google.com/file/d/0B2kdRhc960qdZzJZenR1Qno2LWM/edit?usp=sharing

Best,
Joel

toki · August 11, 2014, 2:04am

When you design that table for Series, remember:

The same work can be in two or more series. For example: James Legge's 2
volume series _The Sacred Texts of Daoism_ was also part of Max Muller's
50 volume series _The Sacred Texts of the East_.

A series can have two or more authors. For example, _Star Trek: Deep
Space Nine_ contains books written by half a dozen different authors.

Books set in the same world need not be part of a series, even though
marketing mistakenly claims that they are a series. For example: Anne
McCaffrey's _P.E.R.N._ novels.

Consider how you'll handle spin-of series. For example: _Star Trek: The
Next Generation_ alongside _Star Trek: Deep Space Nine_, alongside _Star
Trek: Corp of Engineers_, etc.

Consider how you'll handle trilogies such as Douglas Adams _The
Hitchhiker's Guide to the Galaxy_, (Officially, it is a trilogy in five
parts, or six parts, if _And Another Thing..._ is included in that trilogy.)

For the Read Status table, you might also want to have tags for
"reread", and "acquired".

toki · August 11, 2014, 2:21am

Potential issues with the author field:

Book One: Author A & Author B;
Book Two: Author B & Author A;
Book Three: Author A & Author C. Author C is really Author B, but has
had a name change;

Book One, Book Two, and Book Three are all in the same series.

I've come across that in books written by academics, for academics.
Unfortunately, I don't remember any examples, offhand.

jonathon

Joel_Madero · August 11, 2014, 2:23am

I'm not horribly worried about name changes

Best,
Joel

Joel_Madero · August 11, 2014, 5:19am

Hey Jonathan,

responded in line but the gist is - this is a hobby and I'm not going to spend hours and hours thinking about corner cases/etc... if a book has multiple authors, I'll pick one, if it belongs to multiple series, I'll pick one, if it's a spin off to a main series, I'll just track the spin off and ignore that it's part of a larger set. This will suit my needs fine.

If I were doing your system I'd do the following:
Book table
  Title
  number of pages
  Foreign key links to an authors table and a series table
  boolean read or not or else a link to a table read status see below
Authors
  Name
Series
  Name of series like Dragons of Pern or Harry Potter
Read status
  started
  finished
  wanted

Linking should be by query.

When you design that table for Series, remember:

The same work can be in two or more series. For example: James Legge's 2
volume series _The Sacred Texts of Daoism_ was also part of Max Muller's
50 volume series _The Sacred Texts of the East_.

Yes I'll just pick one series and stick with it - again this is a hobby project that I'm not going to spend ages thinking about corner cases.

A series can have two or more authors. For example, _Star Trek: Deep
Space Nine_ contains books written by half a dozen different authors.

Books set in the same world need not be part of a series, even though
marketing mistakenly claims that they are a series. For example: Anne
McCaffrey's _P.E.R.N._ novels.

Consider how you'll handle spin-of series. For example: _Star Trek: The
Next Generation_ alongside _Star Trek: Deep Space Nine_, alongside _Star
Trek: Corp of Engineers_, etc.

I won't treat them any differently - even if it's not 100% correct to the author. See above.

Consider how you'll handle trilogies such as Douglas Adams _The
Hitchhiker's Guide to the Galaxy_, (Officially, it is a trilogy in five
parts, or six parts, if _And Another Thing..._ is included in that trilogy.)

For the Read Status table, you might also want to have tags for
"reread", and "acquired".

###

FWIW, the typical "heavy reader" will take 210 years to read a million
pages:
* "Heavy Reader" being defined as reading 17 books per year.
(http://libraries.pewinternet.org/2012/04/04/part-2-the-general-reading-habits-of-americans/)
* A book is defined as 280 pages (64,000 words) long.
(http://www.huffingtonpost.com/2012/03/09/book-length_n_1334636.html)

Yes I know the above - FWIW again this is a hobby so I'm not sticking to "280 pages or more" - that's absurd for a hobby to be so strict. My rule is just that I don't count a book unless I read the entire thing - this is unfortunate as I'm in law school reading thousands and thousands of pages, none of which I'll count But, it's been a 10 year project and I'm not going to quit on it - even if the reality is it'll be very very hard to complete. I have a tracker stats sheet on the spreadsheet file and it shows my current pace and the year I can anticipate finishing . . . and no, it's not within one lifetime

Best,
Joel