gophergala2016/blogalert

Name: blogalert

Owner: gophergala2016

Description: null

Created: 2016-01-22 19:20:20.0

Updated: 2016-02-14 20:53:05.0

Pushed: 2016-01-25 00:53:30.0

Homepage: null

Size: 57

Language: Go

README

blogalert

http://blogalert.adamtalbot.me/

Blogalert crawls blog sites and alerts you when a subscription has new content.

It is divided into a crawler, an API and a frontend. All of these can be hosted separately as long as they share a database.

TODO

Wishlist

How it works

Spider

The spider uses a worker pool. When a blog is crawled, the initial page is scanned first; all links within the same domain are then queued to be scanned. This repeats up to a depth of 5 or a total of 200 pages, whichever happens first.
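
The crawl limits described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it walks the site level by level and hands each level to a small worker pool, whereas the real spider may use a single shared queue. The names `crawl`, `fetchFunc`, `sameHost` and the pool size of 4 are assumptions, the start page is assumed to be depth 0, and "same domain" is approximated by comparing hosts.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

const (
	maxDepth = 5   // depth limit from the README (start page assumed to be depth 0)
	maxPages = 200 // page limit from the README
	workers  = 4   // pool size: an assumption, not taken from the project
)

// fetchFunc is a hypothetical stand-in for "download the page and extract its
// links". It is assumed to return absolute URLs.
type fetchFunc func(pageURL string) []string

// sameHost reports whether two absolute URLs share a host.
func sameHost(a, b string) bool {
	ua, errA := url.Parse(a)
	ub, errB := url.Parse(b)
	return errA == nil && errB == nil && ua.Host == ub.Host
}

// crawl returns the pages visited, stopping at maxDepth or maxPages,
// whichever is reached first.
func crawl(start string, fetch fetchFunc) []string {
	seen := map[string]bool{start: true}
	visited := []string{}
	frontier := []string{start}

	for depth := 0; depth <= maxDepth && len(frontier) > 0; depth++ {
		// Trim this level so the total never exceeds maxPages.
		if room := maxPages - len(visited); len(frontier) > room {
			frontier = frontier[:room]
		}

		// Each worker writes only to its own index, so no lock is needed.
		links := make([][]string, len(frontier))
		jobs := make(chan int)
		var wg sync.WaitGroup
		for w := 0; w < workers; w++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for i := range jobs {
					links[i] = fetch(frontier[i])
				}
			}()
		}
		for i := range frontier {
			jobs <- i
		}
		close(jobs)
		wg.Wait()

		visited = append(visited, frontier...)

		// Queue unseen links on the same host for the next level.
		var next []string
		for _, found := range links {
			for _, l := range found {
				if !seen[l] && sameHost(start, l) {
					seen[l] = true
					next = append(next, l)
				}
			}
		}
		frontier = next
	}
	return visited
}

func main() {
	// A tiny in-memory "site" used in place of real HTTP requests.
	site := map[string][]string{
		"http://blog.test/":  {"http://blog.test/a", "http://elsewhere.test/"},
		"http://blog.test/a": {"http://blog.test/b"},
	}
	pages := crawl("http://blog.test/", func(u string) []string { return site[u] })
	fmt.Println(pages)
}
```

Passing the fetch step in as a function keeps the sketch free of network access; a real crawler would replace it with an HTTP request plus HTML link extraction.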

The MD5 hash of each page is stored as well; if it has not changed, the page is not processed again. This stops articles from being processed multiple times. It also means that if the initial page has not changed, no other pages are scanned, on the assumption that new blog articles are featured on that page.
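
A minimal sketch of this change check, assuming the digest is stored per page URL. The `hashStore` type is a hypothetical in-memory stand-in for the shared database, and `pageHash` and `changed` are illustrative names rather than the project's own.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// hashStore maps a page URL to the hex MD5 digest recorded on the last crawl.
// It stands in for the shared database (an assumption for this sketch).
type hashStore map[string]string

// pageHash returns the hex-encoded MD5 digest of a page body.
func pageHash(body []byte) string {
	sum := md5.Sum(body)
	return hex.EncodeToString(sum[:])
}

// changed reports whether body differs from the version seen last time,
// and records the new digest when it does.
func (s hashStore) changed(pageURL string, body []byte) bool {
	h := pageHash(body)
	if s[pageURL] == h {
		return false // same content as before: skip processing
	}
	s[pageURL] = h
	return true
}

func main() {
	store := hashStore{}
	home := "http://blog.test/"

	fmt.Println(store.changed(home, []byte("<h1>Post 1</h1>")))              // true: first time seen
	fmt.Println(store.changed(home, []byte("<h1>Post 1</h1>")))              // false: unchanged
	fmt.Println(store.changed(home, []byte("<h1>Post 2</h1><h1>Post 1</h1>"))) // true: new content
}
```

Under this scheme a crawler would call `changed` on the initial page first and stop early when it returns false, which matches the behaviour described above.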

Frontend

This is a very small HTML server that serves an HTML template with some JS and CSS.
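
A server of this kind can be sketched with Go's standard library alone. The template text, the `static` directory and the names `index`, `newMux` and `render` are placeholders invented for this example; the project's real template and assets are not shown here.

```go
package main

import (
	"fmt"
	"html/template"
	"net/http"
	"net/http/httptest"
)

// page is a placeholder template, not the project's real one.
var page = template.Must(template.New("index").Parse(
	`<html><head><title>{{.Title}}</title></head><body><h1>{{.Title}}</h1></body></html>`))

type pageData struct{ Title string }

// index renders the single HTML page.
func index(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	if err := page.Execute(w, pageData{Title: "blogalert"}); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// newMux wires up the page and the static assets.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/", index)
	// JS and CSS are assumed to live in a local ./static directory.
	mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.Dir("static"))))
	return mux
}

// render exercises the mux in memory and returns the status code and body.
func render(path string) (int, string) {
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := render("/")
	fmt.Println(code)
	fmt.Println(body)
}
```

The sketch renders one request in memory so that it terminates; to serve real traffic the same mux would be passed to `http.ListenAndServe` with an address of your choosing.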

API

This handles requests from the frontend. Currently new articles are calculated on request, but in time the hope is to pre-process this so the API can simply read pre-rendered data from the DB. Pre-rendering would also allow notifications to be sent to users.
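
To illustrate what "calculated on request" could look like, here is a hedged sketch in which the handler filters the stored articles each time it is called. How the project actually decides that an article is new is not described here, so the `article` type, the read/subscribed sets, the `unread` function and the `/articles/new` path are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// article is a hypothetical record for a crawled blog post.
type article struct {
	Blog  string `json:"blog"`
	URL   string `json:"url"`
	Title string `json:"title"`
}

// unread returns the articles from subscribed blogs that have not been read.
// It runs on every request, which is the cost pre-processing would remove.
func unread(all []article, subscribed, read map[string]bool) []article {
	out := []article{}
	for _, a := range all {
		if subscribed[a.Blog] && !read[a.URL] {
			out = append(out, a)
		}
	}
	return out
}

// In-memory stand-ins for data that would come from the shared database.
var (
	articles = []article{
		{Blog: "a.test", URL: "http://a.test/1", Title: "First"},
		{Blog: "a.test", URL: "http://a.test/2", Title: "Second"},
		{Blog: "b.test", URL: "http://b.test/1", Title: "Unrelated"},
	}
	subscribed = map[string]bool{"a.test": true}
	read       = map[string]bool{"http://a.test/1": true}
)

// handleNew responds with the unread articles as JSON.
func handleNew(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(unread(articles, subscribed, read)); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func main() {
	// Call the handler in memory instead of starting a server.
	rec := httptest.NewRecorder()
	handleNew(rec, httptest.NewRequest(http.MethodGet, "/articles/new", nil))
	fmt.Print(rec.Body.String())
}
```

With pre-processing, the filtering step would move into the crawler or a background job, and the handler would only read a stored result.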

