Other SQL question as it affects Base

elderdanlewis · July 28, 2012, 7:36pm

An example of an inner join:
"table1" INNER JOIN "table2" ON "table1"."foreign key"="table2"."primary key"

Mathematically the following seems to be possible:
"table1" INNER JOIN "table2" ON "table1"."foreign key"=2*"table2"."primary key"

Is the latter also an INNER JOIN? There seems to be other possible multipliers other than 2. Also, what about inequalities? For example, wanting to see all the rows of data for which the foreign key is less than the MAX, MIN, or Ave of the primary key. (These are three suggested "ON expressions" for an INNER JOIN. Using greater than rather than less than is another possibility.)

I'm curious since I have a mathematical background. Any ideas?

--Dan

planas · July 28, 2012, 9:45pm

Dan

If understand the keys they are not required to be integers though they usually are. The major requirement for a key is that it is unique and only exist once in the table. Keys can include multiple columns. So multiplying a key would often not make sense. I believe you can add more requirements in the ON clause.

The reason keys are often integers is that one often defines the key as integer with automatic incrementing when a new row is added.

drewjensen · July 29, 2012, 3:42am

Absolutely
- or even odder things, often with a where

tbl1 as t1 inner join tbl2 as t2
on t1.ID = t2.FK
where mod( t2.ID,2 ) <> 0

back to watching men's floor exercise then..

//drew

Mark_Stanton · July 29, 2012, 12:02pm

Keys are intended for the internal workings of the database, they are
not meant to be used like this.

What you want is the WHERE clause, or occasionally the HAVING clause,
matching or excluding rows based on their data.

Keys are NOT data and should not be used as such.

Regards
Mark Stanton
One small step for mankind...

elderdanlewis · July 30, 2012, 12:08am

Mark Stanton wrote:

Keys are intended for the internal workings of the database, they are not meant to be
used like this.

What you want is the WHERE clause, or occasionally the HAVING clause, matching or
excluding rows based on their data.

Keys are NOT data and should not be used as such.

Regards Mark Stanton One small step for mankind...

I have a table, Reading, that contains three fields: ID, Date, and Meter. Date is the
date the meter is read, and Meter is the meter reading.
I want to know how much electricity (KWH used) I have used each month. The query below will provide that information:

SELECT "Ending"."Date", "Ending"."Meter" -"Beginning"."Meter" AS "KWH Used" FROM "Power". "Reading" AS "Beginning" LEFT JOIN "Power". "Reading" AS" Ending" ON "Ending"."ID" =1 + "Beginning"."ID"

Perhaps you are right about keys being intended for the internal workings of the database. While I don't have an example of one key being the multiple of the other key, it is possible. It just might not have any practical applications.

--Dan

planas · July 30, 2012, 2:54am

Mark Stanton wrote:

Keys are intended for the internal workings of the database, they are not meant to be
used like this.

What you want is the WHERE clause, or occasionally the HAVING clause, matching or
excluding rows based on their data.

Keys are NOT data and should not be used as such.

Regards Mark Stanton One small step for mankind...

I have a table, Reading, that contains three fields: ID, Date, and Meter. Date is the
date the meter is read, and Meter is the meter reading.
I want to know how much electricity (KWH used) I have used each month. The query below will provide that information:

SELECT "Ending"."Date", "Ending"."Meter" -"Beginning"."Meter" AS "KWH Used" FROM "Power". "Reading" AS "Beginning" LEFT JOIN "Power". "Reading" AS" Ending" ON "Ending"."ID" =1 + "Beginning"."ID"

Try the following

SELECT date, meter
FROM Reading
WHERE date = initial date OR date = ending date (date = '4/1/2012' OR date = '4/30/2012' - using correct date format for your system)

Perhaps you are right about keys being intended for the internal workings of the database. While I don't have an example of one key being the multiple of the other key, it is possible. It just might not have any practical applications.

Keys are intended in an relational database to uniquely identify each row of data. While often being integers any unique string or group of columns can be a key. For example, if there is only one date entry for each date you could use the date as a key. The reason integers are typically used is that they allow the possible multiple entry of a date in a table because each entry would assigned a different integer. The key value id = 100 has no required mathematical relation to the key value id = 200, mostly likely all this means is there are 100 entries since 100th entry. Otherwise there may no connection be the two entries.

For example, two companies could have the same name - Fred's Wholesale Produce but be located in different states (Kentucky and Indiana for example). A key of company name and state is possible but often awkward to use when accessing other tables such as orders or purchases. Integer id's allow for accessing other tables by using on field which happens to a unique integer.

An Orders table might have the following initial columns: id, id_vendor, id_PO_items PO, Date etc. The id is a unique key for the Orders table, the id_vendor refers to the id in a separate vendor table which might have the vendor contact information. The Id_PO_items might refer to a table which lists each item order with appropriate vendor information.

elderdanlewis · July 30, 2012, 3:16pm

Jay Lozier wrote:

Mark Stanton wrote:

Keys are intended for the internal workings of the database, they are not meant to
be used like this.

What you want is the WHERE clause, or occasionally the HAVING clause, matching or
excluding rows based on their data.

Keys are NOT data and should not be used as such.

Regards Mark Stanton One small step for mankind...

I have a table, Reading, that contains three fields: ID, Date, and Meter. Date is
the date the meter is read, and Meter is the meter reading. I want to know how much
electricity (KWH used) I have used each month. The query below will provide that
information:

SELECT "Ending"."Date", "Ending"."Meter" -"Beginning"."Meter" AS "KWH Used" FROM
"Power". "Reading" AS "Beginning" LEFT JOIN "Power". "Reading" AS" Ending" ON
"Ending"."ID" =1 + "Beginning"."ID"

Try the following

SELECT date, meter FROM Reading WHERE date = initial date OR date = ending date
(date = '4/1/2012' OR date = '4/30/2012' - using correct date format for your system)

Sorry, but this does not give the same results as my query. Yours will give the
date the meter is read and the reading for each date in the "Reading" table for the two
listed in the WHERE clause. The question the query is suppose to answer is How many KWH am
I using each month? Your query does not answer this question. In my table, I have 41 rows
of data. When I run my query, I get 40 rows as my output. Your query has only 2 rows in
the output.
Here is the first 4 rows of the table.

ID End Date Ending
1 04/16/09 16717
2 05/15/09 17125
3 06/17/09 17418
4 07/20/09 17735

I Added some things to my query:
SELECT "Beginning"."Date", "Ending"."Date", DATEDIFF( "Ending"."Date", "Beginning"."Date" ) AS "Days", "Ending"."Meter" - "Beginning"."Meter" AS "KWH used", ( "Ending"."Meter" - "Beginning"."Meter" ) / DATEDIFF( "Ending"."Date", "Beginning"."Date" ) AS "KWH/Day" FROM "Power"."Reading" AS "Beginning" LEFT JOIN "Power"."Reading" AS "Ending" ON "Ending"."ID" = "Beginning"."ID" + 1

WARNING: This was done using Base as the front end and MySQL 5.1.x as the back end. If you try this using a Base embedded database, you have to change the DATEDIFF() to
DATEDIFF('dd', "Ending"."Date", "Beginning"."Date")

The results of my "new" query are:
Date Date Days KWH used KWH/Day
04/16/09 05/15/09 29 408 14.069
05/15/09 06/17/09 33 293 8.8788
06/17/09 07/20/09 33 317 9.6061

Perhaps you are right about keys being intended for the internal workings of the
database. While I don't have an example of one key being the multiple of the other
key, it is possible. It just might not have any practical applications.

Keys are intended in an relational database to uniquely identify each row of data.
While often being integers any unique string or group of columns can be a key. For
example, if there is only one date entry for each date you could use the date as a key.
The reason integers are typically used is that they allow the possible multiple entry
of a date in a table because each entry would assigned a different integer. The key
value id = 100 has no required mathematical relation to the key value id = 200, mostly
likely all this means is there are 100 entries since 100th entry. Otherwise there may
no connection be the two entries.

For example, two companies could have the same name - Fred's Wholesale Produce but be
located in different states (Kentucky and Indiana for example). A key of company name
and state is possible but often awkward to use when accessing other tables such as
orders or purchases. Integer id's allow for accessing other tables by using on field
which happens to a unique integer.

An Orders table might have the following initial columns: id, id_vendor, id_PO_items
PO, Date etc. The id is a unique key for the Orders table, the id_vendor refers to the
id in a separate vendor table which might have the vendor contact information. The
Id_PO_items might refer to a table which lists each item order with appropriate vendor
information.

--Dan

      Basic theory: relational databases are based upon Algebra including the Algebra of
Sets (unions, intersections, Cartesian products, Subtraction of sets, etc.). In fact some of the early complaints about the structure of relational databases was that it was too mathematical.
      Observations: JOIN ON clause is based upon the Algebra of Sets. The ON expression is algebraic. The WHERE clause is also algebraic.
      My point: If there is an algebraic expression that describes the relationship between a primary key and its foreign key, then that expression can be used as the ON expression. If further restrictions are needed, then these belong in the WHERE clause.
      This is about the theoretical rather than what has been done. I'm looking at the possible rather than the traditional.

Mark_Stanton · July 30, 2012, 4:42pm

As far as I remember it's explicitly stated in the SQL definitions

Id fields should not contain data entered by users

You can't rely on them not least because often they're automatically
generated by the system. They bear no necessary relationship to the
data.

Mark

Mark_Stanton · July 30, 2012, 6:05pm

Ok, this is, then, quite an interesting idea.

The problem is that you've got nothing reliable in your data to
relate one date to the next (by the looks of it). You're only taking
readings at roughly monthly intervals.

A good, purely SQL, solution to this would be to code a stored
procedure that is guaranteed to hand out successive numbers, and then
you can use that to do the arithmetic you've done already. You'd
want another table that stores the current value of your counter, and
your entry data process would get the next number for your next entry
and update the table. You could do it by finding the maximum number
already allocated in your table and add one (or whatever).

If you're prepared to bet the ranch on having one reading every month
without fail, you could find the lowest date and then generate an
integer singly increasing index on month's since first date. That'd
be two more SQL selections.

The problem with what you've done is that there's no guarantee that
it'll continue to work. It probably will, by the looks of it, but
it's only a fortunate side effect of something that is not
guaranteed.

In a business environment you'd get fired for something like that,
and quite rightly! In an amateur environment, you're getting away
with it, which might be good enough for you.

The next alternative is pulling the data in date order (yes, I know
it's almost certainly in date order already, but if you don't make
sure then you can never enter historical data [you're sure to miss
entry for some reason some day]), and then run through the table
doing the sum between last month's data and this month's.

Of course, if you do ever enter data out of order the first strategy
here won't work, or you'd have to manually reallocate your sequence
numbers.

Regards
Mark

mariosv · July 30, 2012, 6:07pm

I have used a sentence like next with Firebird, but I don't know if your database lets to use SELECT like a field.

SELECT R_End.Date,
   (SELECT R_Beg.Date FROM Reading AS R_Beg WHERE R_Beg.Date
            = (SELECT DISTINCT MAX(R_Look.Date) FROM Reading AS R_Look
                WHERE (R_Look.Date < R_End.Date))) AS Beginning_Date
        R_End.Meter, (R_End.Meter -
          (SELECT R_Beg.Meter FROM Reading AS R_Beg WHERE R_Beg.Date
            = (SELECT DISTINCT MAX(R_Look.Date) FROM Reading AS R_Look
                WHERE (R_Look.Date < R_End.Date)))) AS KwH_used
  FROM Reading as R_End

(SELECT DISTINCT MAX(R_Look.Date) FROM Reading AS R_Look
WHERE (R_Look.Date < R_End.Date))
Search the maximum date before the actual row date, which lets to get the previous values to the actual row date.

Miguel Ángel.

  * Inglés - detectado
  * Inglés
  * Español
  * Gallego
  * Italiano

  * Inglés
  * Español
  * Gallego
  * Italiano

<javascript:void(0);>

elderdanlewis · July 30, 2012, 7:25pm

Mark Stanton wrote:

As far as I remember it's explicitly stated in the SQL definitions

Id fields should not contain data entered by users

You can't rely on them not least because often they're automatically generated by the
system. They bear no necessary relationship to the data.

Mark

      Somewhere there is disconnect. We seem to be talking at each other.
The table I put in my earlier email had three fields: the primary key, ID, which was automatically generated by Base, the date the meter was read, and the reading. The field, ID never has contained data entered by me. It is in First Normal, Second Normal, etc. Form.
      When use a left join to join this table to itself, the first three fields of each row are the three fields of the table and the values of the ID field are unique. What about the other three fields of each row of the query? For each row, ID in the fourth column is always one larger than the ID in the first column.
      What relationship does the values of ID have with the values of the dates and the meter readings? None. I'm only using the two ID fields to place the rows of data where I want them to be.
      It so happens that the ID field consists of integers. More importantly, being integers, they can be used to order the rows. This is want I did with them: order the data. I just used arithmetic to say what that order is.
      Concerning the query that I posted earlier: the values I got using it are identical to the values I have in a spreadsheet to track these statistics. I have 41 rows in the table resulting with 40 rows of query output. That is 200 cells containing data which match to the data in the spreadsheet.
      Here is where I think we are not paying attention to each other. The usual ON expression involves setting the primary and foreign key as equal. I'm saying that there are situations that arise between tables in which equality in the primary-foreign key pair is not what is required.
      I have given an example for which a simple arithmetic equation works. In fact, I think there is a class of queries that fit the case: "primary key" = "foreign key" + n where n is a positive integer and the table is joined to itself. When a company compares weekly, monthly quarterly, or annual sales year over year, they are going to use something very similar to this. This can be used to determine quarter over quarter, etc. as well. How else do you create a query that has weekly data for a store for last year on the left and the data for the same store for this year on the right? [I would use "foreign key" = "primary key" + 52 in a LEFT OUTER JOIN.]
      Now as far as using more complex algebraic expression in a JOIN, I agree that this is a "little far out". It really depends upon how the primary and foreign keys are generated.
      For example, the primary key could be the positive integers (counting numbers). The foreign key could be the cubes of the positive integers. Then if we want to join the rows of one table with the rows of another table so that the foreign key is the cube of the primary key, we would have to use "foreign key" = "primary key" cubed in the ON expression. Now I agree we don't usually use the cube of positive integers as the values of a foreign key. So this is definitely "far out". My point is that it does meet the requirements for a query that could produce useful information. Just because the first four values for the foreign key is 1, 8, 27, and 64, that does not mean these are not distinct values that could not be used to order the rows of a table by them.
      For the most part, people will continue to use positive integers with unit increments in their primary keys. This means the foreign keys will have the same file type. For them, what I have written will probably make little sense. And they may not need any of the other possibilities. And I also agree, most of these possibilities serve no real purpose based upon how we do things now. Some of these primary key properties we have mentioned might be very difficult to impossible to define for a table.
      From the beginning, my only purpose is to question as to what is possible given the characteristics of the primary and foreign keys. Again, I realize that what is possible may not be what we would want to do. It may not serve a useful purpose even though it works and it follows all the rules of SQL.

--Dan

elderdanlewis · July 30, 2012, 9:44pm

Mark Stanton wrote:

Ok, this is, then, quite an interesting idea.

The problem is that you've got nothing reliable in your data to relate one date to the
next (by the looks of it). You're only taking readings at roughly monthly intervals.

Once a month, I get a bill from the local power company that contains the following data: the date the meter was read the previous billing cycle, its reading, the date the was read for this billing cycle, its reading, and the amount due. I would think this is very reliable: they want to get paid. And with bill in hand, it is not very difficult to enter the data into the table. If the accounting people do not do this on a regular basis, they are not likely to have a job. For an individual, it is all about diligence. Use it, and the results will be accurate and reliable. Without, your right, the results are useless. [Garbage IN Garbage Out.]

A good, purely SQL, solution to this would be to code a stored procedure that is
guaranteed to hand out successive numbers, and then you can use that to do the
arithmetic you've done already. You'd want another table that stores the current value
of your counter, and your entry data process would get the next number for your next
entry and update the table. You could do it by finding the maximum number already
allocated in your table and add one (or whatever).

This is far more complex than what I have created.

If you're prepared to bet the ranch on having one reading every month without fail, you
could find the lowest date and then generate an integer singly increasing index on
month's since first date. That'd be two more SQL selections.

I agree about databases in general of this type. But it seems to me that not having accurate data in is a problem that should be handled in another way. The query is designed for using available data to provide answer to the person who uses it. It can only give as good information as what is given it. I'm very diligent when it comes to data like this. I'm well aware of the phrase GIGO. I make sure that correct information is entered when it needs to be entered. (I don't put garbage in.) So, I don't get garbage out.

The problem with what you've done is that there's no guarantee that it'll continue to
work. It probably will, by the looks of it, but it's only a fortunate side effect of
something that is not guaranteed.

Strange you should write this. The query matches algebraic equations. I'm just not lucky: I am using the laws of algebra. It will always work. Yes, the problem is in the accuracy of the entered data. Diligence is the key. Addendum: While I have not added it yet, I kept track of the probably amount for this billing cycle using today's date, today's reading, the average KWH usage per day, the number of days, and the cost per KWH for the previous billing cycle. I update this more than once a week.

In a business environment you'd get fired for something like that, and quite rightly!
In an amateur environment, you're getting away with it, which might be good enough for
you.

How wrong you are! I too have some business experience. Mine proves you wrong. I used this process on paper during the years 1981 and 1982. During each year. I used four columns: week ending date last year, sales for the week ending on the date, week ending date this year, and sales for the week ending on the date. In my dry cleaning store, the week ended on Mondays. The first record had data for 1980 (by another manager) and 1981. The second had data for 1981 and 1982. The first proved that I was doing quite well as compared to the previous manager (1981 vs 1980). I knew this by mid summer, but the home office (in another city) did not know until about 3 months later. (There was a 37.5% increase in sales. That was nice.) The second indicated a drop in sales (the Regan recession). I saw that a couple of months before it was noticed in the home office. I knew what 4 other managers in the Baton Rouge area did not know because I had the data.
Why did the owner of the store visit my store to see my data every time he was in Baton Rogue? I made sure the data was always kept up to date. You see, you are right. The question is: How valid is the data that is being used? I made sure my data was valid. This is why rather than being fired for doing this, I was later moved to Lake Charles to take over a much larger store that included a laundry plant. I was even given the job of training new managers, but that required more than I could handle.

The next alternative is pulling the data in date order (yes, I know it's almost
certainly in date order already, but if you don't make sure then you can never enter
historical data [you're sure to miss entry for some reason some day]), and then run
through the table doing the sum between last month's data and this month's.

The bills I get makes entering historical data quite possible since each bill contains the present date read and its reading as well as the previous date read and its reading. This permits comparing two rows of the table for the dates and readings. Besides, if data were not entered for a month or more, the bills can be ordered by the month of the bill with the oldest on top. Then it is just a matter of checking the previous month's data against that row in the table and then entering this month's data.

Of course, if you do ever enter data out of order the first strategy here won't work,
or you'd have to manually reallocate your sequence numbers.

Regards Mark

--Dan

Mark_Stanton · July 31, 2012, 9:36am

I'm listening very closely.

Your strategy is based on the assumption that consecutive readings
have consecutive primary keys. Not even just being in the right
order, they have to be consecutive integers for your strategy to
work. This elevates the primary key from "internal workings" to
"data". Your strategy is based on the specific values of those
numbers.

That's not what table primary keys are intended for in the relational
database model.

As I said in my previous post, it sort of works, and happens to work
reliably enough for you in your situation. That's fine.

You say you're a mathematician. Does coincidental relationship bear
mathematical scrutiny as a general strategy in that context?

Regards
Mark