Currently Using Spreadsheet for Personal Project - Thinking About Database

Joel_Madero · August 7, 2014, 3:46am

Hi All -

So I have been using spreadsheet for about 10 years to keep track of a goal that I set my first year of college (reading a million pages before I die). That being said, I keep adding things to it and it's becoming more and more complicated. This week I decided I wanted to add yet another thing - I want to track series that I read and for it to generate the "next book in the series" after I complete a book. I think I've figured out how to do this with spreadsheet using match and index but I'm just debating if it's time for me to really sit down and create a database with the info.

The file is located here: https://docs.google.com/file/d/0B2kdRhc960qdQTN1TGQxUXhVX0k

Thoughts much appreciated - I have a bit of experience with Access and am pretty good at "learning by doing" but don't want to change to database just for the sake of changing. Kind of looking for pros and cons. Thanks in advance!

Best,
Joel

Mark_LaPierre · August 8, 2014, 3:39am

Hey Joel,

A database is great for generating reports from your data. They are
also a great way to enforce data consistency which makes reporting so
much easier.

Databases are also a fine way to enforce business rules. Stuff like not
allowing someone to enter a date for completing a process step before
the previous steps have their dates in place.

They are also great configuration management tools. The database can
keep previous versions of a record safe for future reference. Say you
have a widget. The database records all the desired meta-data
concerning the design and creation of the widget. Then it's discovered
that the widget has the wrong size hole in it. The database can save
the original records and create a new record for the new revision. A
database can record what revision of each widget was employed in a
certain application.

Generally there are thousands of widgets all in varying states of
revision and acceptance. Databases are great for tracking change. If
your data set changes over time then a database is a great way to track
that change.

Databases are also better at managing large data sets as you are finding
out with your current spreadsheets.

Have you read the BASE guide yet? There's some really good database
design concepts illustrated there. You really want to structure your
database in a way that you can expand and restructure the database later
when you decide to add features such as the "next in a series" concept
you mentioned above.

I've perused your Google docs spreadsheet. I think it would be quite
easy to import that into a database.

Joel_Madero · August 8, 2014, 4:29pm

Just wanted to bump this to see if anyone has thoughts. Worst case I'm
going to just dive into Database and see what I can learn but I'm hoping to
get a little feedback before spending time which might be totally pointless
as I really don't have the time to just throw away Thanks again in
advance!

Paul16 · August 8, 2014, 4:59pm

Hi Joel,

Well, on the one hand, changing to a database does mean learning
databases, and putting in a certain amount of time to develop it,
although you should be able to pull in the data from the spreadsheet.
There's also no clear gain, if the spreadsheet works for you, what do
you gain by going to a database? And because databases are a little
more structured, it means changes are harder to make. On the other
hand, it does make things a little neater, and more formalised. Keeping
track of things may be easier, at least mentally.

The point at which one should switch from spreadsheet to database is
always hard to find. Personally, I would switch with this, but I
already know databases pretty well, so it would be that much easier for
me to set up, and I'd probably find it easier to make changes and add
functionality.

I find such large spreadsheets unwieldy, because you have to work with
the actual data, whereas databases you work with the data container,
and the database applies that to the actual data. Pulling up the
correct information is simply a matter of writing a simple (for me)
query, whereas in a spreadsheet you have to manually select the correct
cells, type in formula to exclude values, etc., etc.

So basically, if it works, why change it? But if you do change to a
database, you may end up with something that is neater and
cleaner, meaning that it is easier to find the information in all the
data, but maybe slightly harder to just change in an ad-hoc kind of
way, which itself can be both good and bad.

Just my thoughts. If you do go ahead with the database idea, feel free
to ping me with further questions when you get into it. I'd be glad to
help if I can find the time. I know databases pertty well, if not Base
so well.

Paul

Joel_Madero · August 8, 2014, 5:07pm

Hey Paul,

First thanks for your thoughts

Well, on the one hand, changing to a database does mean learning
databases, and putting in a certain amount of time to develop it,
although you should be able to pull in the data from the spreadsheet.
There's also no clear gain, if the spreadsheet works for you, what do
you gain by going to a database? And because databases are a little
more structured, it means changes are harder to make. On the other
hand, it does make things a little neater, and more formalised. Keeping
track of things may be easier, at least mentally.

Neater is exactly what I'm aiming for. Currently it's kind of all over the
place and with me continuously adding things I am always debating where to
put things, how to organize, how to easily view it, etc.. etc... So neater
is a big + for me.

The point at which one should switch from spreadsheet to database is
always hard to find. Personally, I would switch with this, but I
already know databases pretty well, so it would be that much easier for
me to set up, and I'd probably find it easier to make changes and add
functionality.

Sure - I like to learn though so if it's not too steep of a learning curve
(again I've used Access a little) then I think with some dedication I could
get it done.

I find such large spreadsheets unwieldy, because you have to work with
the actual data, whereas databases you work with the data container,
and the database applies that to the actual data. Pulling up the
correct information is simply a matter of writing a simple (for me)
query, whereas in a spreadsheet you have to manually select the correct
cells, type in formula to exclude values, etc., etc.

Unwieldy is where I'm at with this thing, and it's only getting worse

So basically, if it works, why change it? But if you do change to a
database, you may end up with something that is neater and
cleaner, meaning that it is easier to find the information in all the
data, but maybe slightly harder to just change in an ad-hoc kind of
way, which itself can be both good and bad.

It does indeed work and I'm pretty good with spreadsheets generally but -
well at my old job we had someone who had a 1.5 gig spreadsheet (and it was
absurd) and his logic was the same "it works" - I had a hard time not
saying "it works but it doesn't work RIGHT" - I'm starting to feel like my
spreadsheet doesn't work right (from functionality to organization to me
just being happy with it)

Just my thoughts. If you do go ahead with the database idea, feel free
to ping me with further questions when you get into it. I'd be glad to
help if I can find the time. I know databases pertty well, if not Base
so well.

I will most definitely take you up on this. I guess my first step would be
a table that just has:
UniqueID
Author
Book Title
Series Name

That sound right? I'm going to start after work today.

Best,
Joel

P.S. I won't lose functionality that I have in spreadsheet when
using database right? Basically everything you can do in a spreadsheet you
can accomplish in a database?

Paul16 · August 8, 2014, 5:43pm

Hi Joel,

Neater is exactly what I'm aiming for. Currently it's kind of all
over the place and with me continuously adding things I am always
debating where to put things, how to organize, how to easily view it,
etc.. etc... So neater is a big + for me.

Unwieldy is where I'm at with this thing, and it's only getting
worse

It does indeed work and I'm pretty good with spreadsheets generally
but - well at my old job we had someone who had a 1.5 gig spreadsheet
(and it was absurd) and his logic was the same "it works" - I had a
hard time not saying "it works but it doesn't work RIGHT" - I'm
starting to feel like my spreadsheet doesn't work right (from
functionality to organization to me just being happy with it)

Well, sounds like changing to a database is the way to go, then

I will most definitely take you up on this. I guess my first step
would be a table that just has:
UniqueID
Author
Book Title
Series Name

That sound right? I'm going to start after work today.

Well, you have a few choices:

a) you could have one table for books read and one for the books still
to read, and when you've read a book you have the database transfer the
record from the one table to the other.

b) you could have a table for all books, and a flag in the table to
show when it's read

c) you could have a table for all books, and another table with your
rating and such details that gets a record added when you've read the
book

Unfortunately, such choices are the ones we need to make when designing
the database, and it can sometimes be hard to know which is the best
choice.

Your current spreadsheet looks a little like scenario (a) with the two
worksheets, but it's not the best idea for the database, I feel.
Scenario (c) avoids having to have nulls in the table, but the data will
only ever be used joined to the first table, so it essentially comes
down to the same thing. I lean towards scenario (b) myself.

As such, the table would be called Books, and have the following fields:

BookID (or UniqueID, or just ID)
Author
Title
SeriesName

Joel_Madero · August 8, 2014, 5:53pm

Hi Again - and thanks!

Well, you have a few choices:

a) you could have one table for books read and one for the books still
to read, and when you've read a book you have the database transfer the
record from the one table to the other.

b) you could have a table for all books, and a flag in the table to
show when it's read

c) you could have a table for all books, and another table with your
rating and such details that gets a record added when you've read the
book

So my thought was to have three tables -
*Table - BooksRead*
Author
Book
SeriesName (can allow Null because most books are not in a series)
Pages

*Table - Series*
SeriesName (not unique per row)
BookName (This + SeriesName would be unique)
Read (binary true/false auto set dependeing on if the book is in BooksRead
or not)

*Table - Rating*
Each category (ie, story, ending, character development, etc...)
Overall - auto calculated based on combo of other values
(here I'd love to have some kind of graphical representation of my ratings,
like stars, incremented by half star, 1-5 star total)

*So then what would be left is:*
Figuring out how to summarize my data like I do in the spreadsheet (pages
read per year, pages remaining, pages read per month, etc....)

Figuring out how to get some printout of "next book in series" - I suspect
some kind of query would work that checks in "Series" table and finds the
"next" book in a given series that does not have a true value in "Read"
field. In spreadsheet I was going to use match/index to accomplish this.

Other cool things like summarizing my opinions of particular authors, a
list of "want to read" books independent of the series. Possibly add genre
somewhere and a randomized "suggest a book" thing.

At some point I'd love to figure out how to pull series data from some
source so I don't have to manually add books that are in a series. Some
series have 50+ books, entering them all manually might be brutal

Sound feasible?

Best,
Joel

Paul16 · August 8, 2014, 6:33pm

Hi Joel,

So my thought was to have three tables -
*Table - BooksRead*
Author
Book
SeriesName (can allow Null because most books are not in a series)
Pages

*Table - Series*
SeriesName (not unique per row)
BookName (This + SeriesName would be unique)
Read (binary true/false auto set dependeing on if the book is in
BooksRead or not)

This just duplicates information in the BooksRead table, not a good idea

*Table - Rating*
Each category (ie, story, ending, character development, etc...)
Overall - auto calculated based on combo of other values
(here I'd love to have some kind of graphical representation of my
ratings, like stars, incremented by half star, 1-5 star total)

Calculating the overall value may be tricky, depending on how the other
fields are done, and almost impossible if you want to give descriptive
ratings like you have in places in the spreadsheet. Once you've gotten
it down to a number, though, showing stars instead of the number is
doable, I'm just not sure how easy in Base.

Also, with the above setup, you don't have anywhere to record books you
want to read (except in the series table, and so only books in a
series), like you have in the spreadsheet.

*So then what would be left is:*
Figuring out how to summarize my data like I do in the spreadsheet
(pages read per year, pages remaining, pages read per month, etc....)

That would be some form of report, and once the data is in proper
tables, fairly easy to do.

Figuring out how to get some printout of "next book in series" - I
suspect some kind of query would work that checks in "Series" table
and finds the "next" book in a given series that does not have a true
value in "Read" field. In spreadsheet I was going to use match/index
to accomplish this.

Something like select all the books in the series table that don't have
an entry in the read books table (going by name, author and series),
ordering that list by the number of the book in the series, and showing
only the first entry of the resulting list, should do it.

Maybe not a simple query, but pretty easy to whip up if you know a
little SQL.

Other cool things like summarizing my opinions of particular authors,
a list of "want to read" books independent of the series. Possibly
add genre somewhere and a randomized "suggest a book" thing.

All easier to do in a database than a spreadhseet.

At some point I'd love to figure out how to pull series data from some
source so I don't have to manually add books that are in a series.
Some series have 50+ books, entering them all manually might be
brutal

Hrm, that may require a little actual programming...

Joel_Madero · August 8, 2014, 6:38pm

Not really seeing duplicated info here. I was thinking that I'd enter most
of my data in the BooksRead table - when I enter it would have a binary "is
this book in a series" - if I push yes then it would pop up a dialog for me
to enter the series info if it's not already in the series table and
possibly ask "do you want to enter all books in this series now?" so then
it would populate the series table after I enter the single book in
"BooksRead"

Best,
Joel

Paul16 · August 8, 2014, 6:50pm

Hi Joel,

> Hi Joel,
>
>
>
> > So my thought was to have three tables -
> > *Table - BooksRead*
> > Author
> > Book
> > SeriesName (can allow Null because most books are not in a series)
> > Pages
> >
> > *Table - Series*
> > SeriesName (not unique per row)
> > BookName (This + SeriesName would be unique)
> > Read (binary true/false auto set dependeing on if the book is in
> > BooksRead or not)
>
> This just duplicates information in the BooksRead table, not a good
> idea
>

Not really seeing duplicated info here. I was thinking that I'd enter
most of my data in the BooksRead table - when I enter it would have a
binary "is this book in a series" - if I push yes then it would pop
up a dialog for me to enter the series info if it's not already in
the series table and possibly ask "do you want to enter all books in
this series now?" so then it would populate the series table after I
enter the single book in "BooksRead"

Yes, but what would go into the Series table? SeriesName and BookName
are already in the BooksRead table, and the read flag just indicates
that it is in the BooksRead table.

The only time there wouldn't be duplication is if you entered books into
the series table that you hadn't read yet.

Paul

Joel_Madero · August 8, 2014, 7:02pm

Yes, but what would go into the Series table? SeriesName and BookName
are already in the BooksRead table, and the read flag just indicates
that it is in the BooksRead table.

So the Series Table would have all books in a series and this way I could
query the next book to be read in a series. So for an example:

Book Table:
Harry Potter and the Sorcerer's Stone (...) SeriesName="Harry Potter"

Series Table:
Harry Potter | Harry Potter and the Sorcerer's Stone (Read)
Harry Potter | Harry Potter and the Chamber of Secrets (Unread)
Harry Potter | Harry Potter and the Prisoner of Azkaban (Unread)
....

Then if I set up some kind of a report that told me the next book in the
series, when I query "Harry Potter" it would give me "Harry Potter and the
Chamber of Secrets" (and possibly list all other books in the series that
are not yet read, or list all books in the series, highlight the ones that
are unread, or some such thing)

What I don't want is my main table to include both read and "want to read"
books - I like keeping the "what I've read" completely separate as this is
what I'm most interested in for tracking purposes (to meet my goal) and
what not.

Best,
Joel

Paul16 · August 8, 2014, 7:27pm

Hi Joel,

>
> Yes, but what would go into the Series table? SeriesName and
> BookName are already in the BooksRead table, and the read flag just
> indicates that it is in the BooksRead table.
>

So the Series Table would have all books in a series and this way I
could query the next book to be read in a series. So for an example:

Book Table:
Harry Potter and the Sorcerer's Stone (...) SeriesName="Harry Potter"

Series Table:
Harry Potter | Harry Potter and the Sorcerer's Stone (Read)
Harry Potter | Harry Potter and the Chamber of Secrets (Unread)
Harry Potter | Harry Potter and the Prisoner of Azkaban (Unread)
....

In this case, one book would be duplicated, and duplication is
usually a bad idea.

It would be better to put all the books into a single Books table.

Then if I set up some kind of a report that told me the next book in
the series, when I query "Harry Potter" it would give me "Harry
Potter and the Chamber of Secrets" (and possibly list all other books
in the series that are not yet read, or list all books in the series,
highlight the ones that are unread, or some such thing)

This can still be done if they are all in a Books table.

What I don't want is my main table to include both read and "want to
read" books - I like keeping the "what I've read" completely separate
as this is what I'm most interested in for tracking purposes (to meet
my goal) and what not.

Well, this is a big deal in a spreadsheet, where all that data is just
smooshed together and you have trouble visually separating it, but in a
database this isn't actually a problem. By design, tables are meant to
hold lots of data like this, and you query just the data you want at
any given time. It is better to design it as one big table, and pull
out only the data you want to see, than to separate it into multiple
tables that hold duplicate data. With duplicate data you are not only
wasting storage space (perhaps very little in this case, but in
principle a bad thing), but also bringing in the possibility of that
data differing in the different locations in which it is stored,
and then which copy do you trust?

Note that any data can be corrupted, but with multiple copies of the
data in a database, differing programs or a bug in your program can
cause the data to differ.

Your scenario also doesn't account for non-series books that you
haven't read. If that is yet a third table, then things are getting
messy. So the best design is one table for books both read and unread,
including series information, but if you really want to separate them
out, then one table for read books, and one for unread books, both
including series information, is the best way to go, I think, with the
front-end transferring the books between the two tables when you've read
them.

Paul

Joel_Madero · August 8, 2014, 8:04pm

Hey Paul,

In this case, one book would be duplicated, and duplication is
usually a bad idea.

It would be better to put all the books into a single Books table.

I'm here to learn so I'll take your advice and do a single table.

> Then if I set up some kind of a report that told me the next book in
> the series, when I query "Harry Potter" it would give me "Harry
> Potter and the Chamber of Secrets" (and possibly list all other books
> in the series that are not yet read, or list all books in the series,
> highlight the ones that are unread, or some such thing)

This can still be done if they are all in a Books table.

+1

> What I don't want is my main table to include both read and "want to
> read" books - I like keeping the "what I've read" completely separate
> as this is what I'm most interested in for tracking purposes (to meet
> my goal) and what not.

Well, this is a big deal in a spreadsheet, where all that data is just
smooshed together and you have trouble visually separating it, but in a
database this isn't actually a problem. By design, tables are meant to
hold lots of data like this, and you query just the data you want at
any given time. It is better to design it as one big table, and pull
out only the data you want to see, than to separate it into multiple
tables that hold duplicate data. With duplicate data you are not only
wasting storage space (perhaps very little in this case, but in
principle a bad thing), but also bringing in the possibility of that
data differing in the different locations in which it is stored,
and then which copy do you trust?

Sounds reasonable.

So questions (again thank you much for your guidance, I'm definitely
starting the process tonight):

when I make the form to enter new data (ie. a new read book) can it be
designed to where it searches the current books in Books table and then if,
and only if, it doesn't find a match (based on author + book title), it
starts a new form for me to enter new data?

For a series, is there a way for me to do like "enter a series" and then it
would prompt ("how many books in this series") and when I put in 5, it
would then auto fill author + series name, so all I have to do is enter
book titles for the 5 books and all 5 would be added to the Books table?

Reading books twice - I don't do this often, but when I do, I expect the
pages to count twice. Best way to accomplish this is to have another field
called "Count" and then page count would be Page Count * Count (# of times
read)?

I'm getting excited about this as I've come up with another cool idea that
might be easy. I'm going to link books to the library, so for "next book in
series" would be a link that I could click and easily put on hold at the
library I think that this would be both awesome and relatively easy
(assuming the library has a standard method of searching which I suspect it
does).

Best,
Joel

Joel_Madero · August 8, 2014, 8:14pm

Hi Mark -

So crazy, I found your email response on my phone but for the life of me I
can't find it in either gmail or thunderbird (makes me concerned I'm losing
emails )

That being said I figured I'd just reply to your comments. First thanks so
much for taking the time to respond. As you might have seen, Paul has
convinced me that a database is the way to go and I have some ideas as to
how I'll begin the process. All of the benefits you provided seem like
additional proof that my spreadsheet is no longer the best way to keep
track of my life time goal of a million pages.

What did you mean by "generally there are thousands of widgets all in
varying states of revision and acceptance." What are these widgets and how
do I find them?

For the last comment - I have not read the BASE guide. I know that as a
Linux user this is horrible for me to say but I'm really a "learn by doing
and failing and repeating" than a "read a book." This is for quite a few
reasons but the main one being that I don't have the time to read a book
where much of the info will be superfluous. I like pinging the list and
jumping in the IRC channel and asking specific questions and crossing my
fingers someone is nice enough to give me bite sized chunks of info that I
can work with. The most frustrating thing as a non Computer Science person
is jumping into a IRC chat (one in particular comes to mind) asking a
simple question and have someone say "read this 50 page manual" when
usually it's a 1 time thing that I need it for and usually it boils down to
a single command or a few at most that I need to run. Additionally, half
the time I don't understand the books so I have to do a ton of googling to
interpret what I'm reading. I'm a soon to be lawyer and I really just don't
have the time to spend reading a ton of documentation (that being said, I
love our documentation team, and they do in fact do incredible work).

Thanks again!

Best,
Joel

Paul16 · August 8, 2014, 8:23pm

Hi Joel,

In addition to the below, you'll need to think about what database you
want to use. With Base, you can use the built-in HSQLDB, but from all
reports you really don't want to do that. I'd suggest setting up
something like PostgreSQL or MariaDB on your machine, and using that
with Base as the frontend.

I'm not sure how much you know about this stuff or how closely you've
been following the discussions on this list regarding Base database
backends, but it isn't really that hard from what I gather, and it's
best to start off with a good setup, although you can opt to start off
simply with the builtin backend and migrate later, but that will mean a
little more work down the line.

when I make the form to enter new data (ie. a new read book) can it be
designed to where it searches the current books in Books table and
then if, and only if, it doesn't find a match (based on author + book
title), it starts a new form for me to enter new data?

This probably requires a macro in Base, so I'm not sure exactly how
complicated this will get, but yes, it should be able to do this. With
constraints on the database (like a unique key on the author and
title) it should prevent you actually entering duplicate data (barring
a misspelling), but you of course want to know about that situation
before filling in the whole form, so probably a macro, or a button to
check the table for the book, before you continue filling in the form.

It should also be easy enough to make a search screen to find books in
the table, which is probably something you want anyway. So you could do
this before entering books that might already be in there, but that is
then a two step process, instead of the more straightforward process as
described above, although probably slightly simpler to set up.

This is something I would leave out for now, just get it working first,
then add these sorts of helpful things once the basic design is working.

For a series, is there a way for me to do like "enter a series" and
then it would prompt ("how many books in this series") and when I
put in 5, it would then auto fill author + series name, so all I have
to do is enter book titles for the 5 books and all 5 would be added
to the Books table?

This should be doable. I'm not sure exactly how to do it in Base, but it
is probably simplest with something like two entry forms, one that
just enters a single book, and one that takes an author and series name,
but a list of titles. Or maybe just a button on the entry form that says
"Add next in series", that would re-bring up the entry form with those
fields prepopulated.

Again, probably just get a single book entry screen going first, then
add or adapt it to do series.

Reading books twice - I don't do this often, but when I do, I expect
the pages to count twice. Best way to accomplish this is to have
another field called "Count" and then page count would be Page Count
* Count (# of times read)?

Best to call this something like ReadCount. You want your names to be
descriptive, without being horrendously long.

So the table would have a PageCount and a ReadCount, and any report
that you design would have a field called ReadPageCount, that would
show PageCount * ReadCount, but it wouldn't be stored in the database
(no need, and it would again be a case of duplication).

I'm getting excited about this as I've come up with another cool idea
that might be easy. I'm going to link books to the library, so for
"next book in series" would be a link that I could click and easily
put on hold at the library I think that this would be both awesome
and relatively easy (assuming the library has a standard method of
searching which I suspect it does).

You would need to have some way of interfacing with the library system,
but yes, assuming they have such a thing, it can be done. Again, I'm
unsure of the limits with Base, but it is probably doable through a
macro. This might be a bit advanced, I'm not sure. Best to leave it
till all the rest is working

Paul

mariosv · August 8, 2014, 10:30pm

Hi Joel,

please don't think in reading the documentation like wasted time but an
invested time. It can really help to learn the basis, which I'm sure will be
useful for ever, specially what is learned about SQL. And IMO without at
least a quick read not easy even to ask for help.

With the documentation there is a complete sample used in it Database used
in Base Handbook
<https://wiki.documentfoundation.org/images/a/af/Media_without_Macros.odb>

There are also a good help and samples in the English and Spanish ApacheOO
forum
Database examples
<https://forum.openoffice.org/en/forum/viewforum.php?f=100>
Ejemplos bases de datos.
<https://forum.openoffice.org/es/forum/viewforum.php?f=70>

Thanks Paul for their good advices.

Miguel Ángel.

Joel_Madero · August 9, 2014, 2:00am

Sure I truly didn't mean that. I mean I literally don't have the time (as in I put in about 75-80 hour work weeks + I have a wife whom I try to spend time with as much as possible + my additional LibreOffice duties both in QA and otherwise. So for me, I just can't read all the documentation for each component to learn every thing there is to learn so that I can use 1% of it for one hobby project. Just not going to happen. Please don't think I find reading documentation generally to be a waste of time.

Best,
Joel

Mark_LaPierre · August 9, 2014, 5:17am

The BASE file is actually a zip file that contains the individual files
for each table in the database. If you use the built in HSQLDB while
you develop your database you can later open the BASE file in an archive
manager of your choosing and extract those tables into an external HSQLDB.

What I'm getting to here is that you don't have to set up a separate
back end DB while you are developing your database. You can use local
tables for all that until you get the database to behave the way you
want it to. Only then will you have to consider splitting your DB into
a front end that contains all the forms, reports, and Basic code, and a
back end that contains the tables that hold your data.

When you're ready to split your DB, probably months from now, I can find
the instructions for you.

Oogie_McGuire · August 10, 2014, 4:58pm

I havne't used Base, but I have taken a very complex spreadsheet and designed an SQL database from it when it became unwieldy. Key things I'd suggest, more tables is not bad, in fact the more the better if it clearly separates data into one table. Do not duplicate data. If you have a field whose contents are duplicated then that really probably needs to be a separate table.

2 books I found invaluable for helping me design my system are

Beginning Database Design by Clare Churcher and Beginning SQL Queries also by Clare Churcher. They were the most readable and understandable of the lot.

If I were doing your system I'd do the following:
Book table
  Title
  number of pages
  Foreign key links to an authors table and a series table
  boolean read or not or else a link to a table read status see below
Authors
  Name
Series
  Name of series like Dragons of Pern or Harry Potter
Read status
  started
  finished
  wanted

Linking should be by query.

Just wanted to bump this to see if anyone has thoughts. Worst case I'm
going to just dive into Database and see what I can learn but I'm hoping to
get a little feedback before spending time which might be totally pointless
as I really don't have the time to just throw away Thanks again in
advance!

Eugenie (Oogie) McGuire
Desert Weyr, LLC - Black Welsh Mountain Sheep http://www.desertweyr.com/
LambTracker - Open Source SW for Shepherds http://www.lambtracker.com
Paonia, CO USA

Paul16 · August 10, 2014, 5:39pm

Really good suggestions!

Do not duplicate data. If you have a field whose contents are
duplicated then that really probably needs to be a separate table.

Yes, true. Note that if a field is duplicated, is suggests
consolidating into one table, but if the contents of a field are
duplicated, it suggests splitting it off into its own table.

If I were doing your system I'd do the following:
Book table
  Title
  number of pages
  Foreign key links to an authors table and a series table
  boolean read or not or else a link to a table read status see
below
Authors
  Name
Series
  Name of series like Dragons of Pern or Harry Potter
Read status
  started
  finished
  wanted

Having a separate table for Authors and Series is a good idea. I didn't
want to confuse the OP with a whole bunch of tables straight off the
bat, but you're dead right, it is probably best to get to grips with
this now, rather than need it later and not have it designed in.

And a status table allowing more than just read or unread is a good
idea too. It adds flexibility to the system.

This does make the system a little more complicated. You will have
more forms, more dropdowns on the book entry form, and will have to
remember to do things like add the authors before you add their books,
but this added complexity will pay off in terms of extra flexibility
later.

This is also really the big conceptual difference between a spreadsheet
and a database. In a spreadsheet you start just capturing a few books,
and just retype things like author names, and then you end up with lots
of books, and things start getting harder and harder to keep track of.
In a database you start off by separating everything out into different
bits, then combine them in the different ways you need them, and
everything is slightly more complicated (especially to begin with), but
actually easier to work with in the long run.

To Joel (the OP):
You'll need to read up on one-to-many and many-to-many relationships if
you aren't already familiar with them.

Some of the less used columns in you spreadsheet, like "Loc", I'm not
clear on what they're used for and where they should go, and rating
has been used in different ways, so again I'm not sure how you intend
to use that. You do kind of need to decide on one way and stick to it
throughout. An advantage of spreadsheets is that you can just do stuff
in that sort of ad-hoc manner as it seems appropriate to the individual
instance, but in databases, because of the extra structure, you need to
stick to one way of doing things. But that does make it more
consistant, and therefore easier to work with in the long run.

Paul