go: scraping web sites

Colly: Fast and Elegant Scraping Framework for Gophers.

Colly provides a clean interface to write any kind of crawler/scraper/spider

Features

  • Clean API
  • Fast (>1k request/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Distributed scraping
  • Caching
  • Automatic encoding of non-unicode responses
  • Robots.txt support
  • Google App Engine support

Colly on GitHub

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}