Each user activity is associated with a timestamp : create a project, add a resource, view a project, register...
Some are in the 'yyyy-mm-dd hh:mm:ss' format, and some are in the 'yyyy-mm-dd hh:mm:ss.xxxx'. There is no problem to make time comparision within the same table (same timestamp precision), but when I need to refer to different tables, it is a trouble ...
Here is an example:
tracking_page_hits (table)
- time_viewed (timestamp, '2009-03-04 13:45:05.08988)
- target_id (text, /addownresource.php)
- uid
resources (table)
- creation_date (timestamp, '2009-03-04 12:45:05)
- uid
- resource_id
There are two buttons in the addownresource.php page: "add", "cancel". Either button brings user back to the same page - /myresources.php
Did the user add a resource? I need to check the resources table to see if a resource was created at that moment. I use these two criteria to check:
- creation_date > time_viewed AND creation_date - time_viewed <= 1 minute (The action should take less than a minute)
- uid is the same
'2009-03-04 12:45:05 < 2009-03-04 13:45:05.08988. Judging from the criteria, this particular resource is not the "outcome" of visiting "addownresource.php". But, but, but, what if the creation_date is actually '2009-03-04 12:45:05.37599'? Then, it fits the criteria.
I suggest to convert all timestamps to the same precision. That's how it works in psql:
date_trunc('second', time_viewed)
Then every timestamp will be in the 'yyyy-mm-dd hh:mm:ss' format.
------------------------------
Be careful when comparing timestamps of different precision / different timezone.
Subscribe to:
Post Comments (Atom)
Hmmm... truncating to the second? Depending, that can be a bad thing depending on how many click per second are made (in our case this is hopefully not a problem).
ReplyDeleteSo, another way to look at this is through the comments on the real outcome of saving a resource (target like 'myresources.php%' and referrer like 'http://ia.usu.edu/addownresource.php%'). You'll see 'successfully added own resource to folder' Perhaps those dates will be closer?
That said, I think you have a good point with timestamps. I mentioned that to the Exploratorium as well as they might be integrating a few data sources.
ReplyDeleteHowever, the nice thing is that if one can capture the session id's and sequentially order the logs, then the session can be dissected in sequence and a good-enough precision on the time of saving a resource is possible.
Another note on the IA resource timestamps... The creation_dates are not stable back in the first year. It was argued with the NSDL resources whether it should be the MD creation date (as received from the NSDL) or the local record creation date... I can't remember which won the argument, but I think it is the local time...