Name: wikipedia-counter
Owner: Mapzen
Description: null
Created: 2015-10-20 22:06:59.0
Updated: 2016-08-27 17:30:47.0
Pushed: 2015-10-26 18:18:17.0
Homepage: null
Size: 160
Language: JavaScript
This repo contains code that takes hourly page view count data from Wikipedia and aggregates it into page counts for longer periods using PostgreSQL.
Note: This requires PostgreSQL 9.5 or later, which introduced the UPSERT feature (`INSERT ... ON CONFLICT`). Without it, the import is just too slow.
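To illustrate why UPSERT matters here, the following is a minimal sketch of how a batch of hourly counts could be folded into running totals with one statement. The table `page_counts`, its columns, and the helper `buildUpsert` are assumptions made for this example and are not taken from the repo. The returned `{ text, values }` object matches the query config shape that the node-postgres client accepts, though the repo may use a different driver.

```javascript
// Hypothetical schema (an assumption, not from the repo):
//   CREATE TABLE page_counts (
//     project text, title text, views bigint,
//     PRIMARY KEY (project, title)
//   );

// Build one parameterized UPSERT for a batch of { project, title, views } rows.
// Returns null for an empty batch.
function buildUpsert(rows) {
  // Merge duplicate keys first: a single INSERT ... ON CONFLICT DO UPDATE
  // is not allowed to affect the same row twice.
  const merged = new Map();
  for (const { project, title, views } of rows) {
    const key = JSON.stringify([project, title]);
    const prev = merged.get(key);
    if (prev) {
      prev.views += views;
    } else {
      merged.set(key, { project, title, views });
    }
  }
  if (merged.size === 0) return null;

  const values = [];
  const tuples = [];
  for (const row of merged.values()) {
    const base = values.length; // placeholders are numbered from $1
    tuples.push(`($${base + 1}, $${base + 2}, $${base + 3})`);
    values.push(row.project, row.title, row.views);
  }

  const text =
    'INSERT INTO page_counts (project, title, views) VALUES ' +
    tuples.join(', ') +
    ' ON CONFLICT (project, title) DO UPDATE' +
    ' SET views = page_counts.views + EXCLUDED.views';
  return { text, values };
}
```

With a statement like this, each hourly file adds to the existing totals in a single round trip per batch, instead of issuing a separate lookup followed by an insert or update for every page.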
The process has three steps. Hourly page view files are downloaded on demand (or you can download them yourself and have the script read them directly from disk). A Node script then parses the files and imports the relevant data into Postgres. Finally, fun queries can be run against the resulting data.
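The parsing step can be sketched roughly as follows. This assumes each line of an hourly file has the space-separated form `project page_title view_count bytes` and that "relevant data" means rows for a single project. Both are assumptions for illustration, and the function names `parseLine` and `countViews` are hypothetical rather than taken from the repo.

```javascript
// Parse one line of an hourly file.
// Assumed format: "<project> <page_title> <view_count> <bytes>".
// Returns null for lines that do not match.
function parseLine(line) {
  const fields = line.trim().split(' ');
  if (fields.length !== 4) return null;
  const [project, title, count] = fields;
  const views = Number.parseInt(count, 10);
  if (!Number.isFinite(views)) return null;
  return { project, title, views };
}

// Sum views per title for one project across the text of a file.
function countViews(text, wantedProject) {
  const totals = new Map();
  for (const line of text.split('\n')) {
    const rec = parseLine(line);
    if (!rec || rec.project !== wantedProject) continue;
    totals.set(rec.title, (totals.get(rec.title) || 0) + rec.views);
  }
  return totals;
}
```

Filtering before the database import keeps the number of rows sent to Postgres small, which matters when processing hundreds of hourly files.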
So far this has been used only to aggregate a single month of Wikipedia logs.