Thursday, April 30, 2009

How to export report from google analytics (PHP)

You need:
GA account (email + password)
GA profile (usually the website you are tracking)
php

Why do we do this?
Access your data without logging into GA
Can download massive number of reports all at once
Twist it into which ever format you want

Template code:
http://manfred.dschini.org/2008/09/26/google-analytics-export/
click "available here " to download

How to use it:
You will see some $_GET[....] .... That is when the user need to write an interface, and set variables through the UI. Actually, you can set those variables inside the php code. I will explain the things you need to set:
$cmp (I don't know. the default is average. So, you can set it as 'average')
$fmt (This correspond to the GA's four export formats. 0-pdf, 1-xml, 2-..... It is highly recommended to use '1' if you want to twist the data further).
$rpt (report typle. export.php page lists all available reports)
$pdr (report duration. for example: $pdr='20080601-20080829')
$email (your GA email)
$password (your GA password)
$id (your GA ID. This is tricky. How do you know your id?
Go to this page:
http://code.google.com/apis/analytics/docs/gdata/gdataLibraries.html
Let's assume this url is not stable. You can get to this page by searching: "Google Analytics data report api Client Libraries & Sample Code". The first result ...
Okay, now you are on that page, click "interactive samples" under JavaScript. These are a few samples. Use the "Retrieve Your Accounts" demo, "Run". Follow the steps. You will get your profile. The profile Id is the value for your $id variable.
By default, the php template will return data on daily basis. If you select a very large time window, you will get too much numbers. There is another variable you can set:
$gdfmt. This allows you to get report by hour/day/week/month
$gdfmt='nth_month' means it is by month. I like by month -:p
This variable is not in the template. But once you know how it works, it is easy to set another variable.
As for the url's format, it is:
https://www.google.com/analytics/reporting/export?
with a number of paramters attached. You can go to your GA account, and click "export", then check out the url of one of those downloadable formats. That gives you a better idea of how to twist your template.

Other API
Bart mentions one:
http://www.cari.net/applescript-and-google-analytics-automatic-exporting.html
It seems it is much easier. But it is for Mac machine. I don't know Mac :(

Google provides its own data report api. The Google Analytics Data Export API is currently in public beta. While in public beta, our API will remain subject to change. I spend a few hours to learn it. It doesn't provide a good tutorial. Plus their Java codes have error! And I don't know where to download the complete js library. I give up...

Friday, April 24, 2009

the precision of timestamp

Each user activity is associated with a timestamp : create a project, add a resource, view a project, register...
Some are in the 'yyyy-mm-dd hh:mm:ss' format, and some are in the 'yyyy-mm-dd hh:mm:ss.xxxx'. There is no problem to make time comparision within the same table (same timestamp precision), but when I need to refer to different tables, it is a trouble ...

Here is an example:
tracking_page_hits (table)
- time_viewed (timestamp, '2009-03-04 13:45:05.08988)
- target_id (text, /addownresource.php)
- uid
resources (table)
- creation_date (timestamp, '2009-03-04 12:45:05)
- uid
- resource_id
There are two buttons in the addownresource.php page: "add", "cancel". Either button brings user back to the same page - /myresources.php
Did the user add a resource? I need to check the resources table to see if a resource was created at that moment. I use these two criteria to check:
- creation_date > time_viewed AND creation_date - time_viewed <= 1 minute (The action should take less than a minute)
- uid is the same

'2009-03-04 12:45:05 < 2009-03-04 13:45:05.08988. Judging from the criteria, this particular resource is not the "outcome" of visiting "addownresource.php". But, but, but, what if the creation_date is actually '2009-03-04 12:45:05.37599'? Then, it fits the criteria.

I suggest to convert all timestamps to the same precision. That's how it works in psql:
date_trunc('second', time_viewed)
Then every timestamp will be in the 'yyyy-mm-dd hh:mm:ss' format.

------------------------------
Be careful when comparing timestamps of different precision / different timezone.

Thursday, April 23, 2009

More than one slash in url

The target_id column in the tracking_page_hits table records the user-requested url (usually a php page). Normally it should start with ONE slash, then the url, for example /index.php. I just realized that some url starts with MORE THAN slashes, for example ///index.php. There must be some error in recording those urls. This affect my data cleaning:
when target_id = '/feedback.php' then blah blah....
when target_id ~ '^/feedback\.php.*$' then blah blah blah ...
In the case where there are >1 slash, the regex cannot detect the patterns. I can think of two solutions:
1, give up the equal sign, and alway use target_id ~ '^/+.....'
2, remove all those unnecessary starting ////// from the table.
I like method 2, the problem is solved once for all, and I don't need to worry about whether I should use = or ~ , '^/ ......, or '^/+ ...'
How to do it then? still thinking ....
-----------------------
URLs are not always in your expected format. If you find some anomaly, regular expression is a good tool.

Wednesday, April 22, 2009

Why am I blogging?

This is a journal on my experience with "educational data mining" a digital library service: http://ia.usu.edu

who might be interested:
me :)
edm researchers
in-service teachers, maybe maybe not. Probably not now, but you will find something interesting 3 months from now on.