Some URL has a slash at the end, for example, /iamteacher.php/
This is not a good news for pattern matching. So I remove all of them:
update tracking_page_hits set target_id = substring(target_id from '(.*[^/]+)/+$') where target_id ~ '.+/$';
Be careful with regex. Does it remove all slashes at the end, or just the last one?
Subscribe to:
Post Comments (Atom)
Hi Beijie,
ReplyDeleteSorry about the lateness of this post.
Why do you have the [^/]+?
if you are only interested in stripping any final slashes you can do it with just (.*)/+$
SELECT substring('http://example.com/dir1/dir2/script.php/' from '(.*)/+$')
This will also handle
SELECT substring('http://example.com/dir1/dir2/script.php?var1=val1&var2=val2/' from '(.*)/+$')
Not sure that helps, but there you have it.
Oh, and BTW, I also noticed that at times the www are used in the domain name and sometimes not, so that is another place to "standardize".
ReplyDeleteThe only question is to add or drop the www's...
For example, the following two URLs should be equivalent:
http://www.example.com/dir1/dir2/script1.php
http://example.com/dir1/dir2/script1.php
Some domains require the www and others refuse it, some handle both just fine... What do you think about that?
OR!YAC (Oh, right! Yet another comment :)
ReplyDeleteYou asked if it removed multiple end slashes...
Because you have /+$ then it will remove all trailing slashes.
But when I reread, did you say you are removing all slashes or just all trailing slashes?
If it is all slashes, then you will want:
SELECT replace('http://example.com/dir1/dir2//dir3/script1.php?var1=val1&var2=val2//', '/', '');
enjoy.