Name: sce
Owner: NASA JPL MEMEX
Owner: NASA JPL MEMEX
Description: Sparkler Crawl Environment - a packaged, dockerized version of http://github.com/USCDataScience/sparkler.git
Created: 2017-06-05 21:54:07.0
Updated: 2018-02-25 22:57:03.0
Pushed: 2017-11-01 16:09:37.0
Homepage: http://irds.usc.edu/sparkler/
Size: 57
Language: Shell
GitHub Committers
User | Most Recent Commit | # Commits |
---|
Other Committers
User | Most Recent Commit | # Commits |
---|
The Sparkler Crawl Environment aims at providing an efficient, scalable, consistent and reliable software architecture consisting of domain discovery tools able to enrich a given domain by expanding the collection of artifacts that define the domain.
This repository, named sce, provides a command-line utility for building Sparkler Crawl Environment as a multi-container Docker application running through the Docker Compose tool on a single node. As a PoC, you can easily install the Sparkler Crawl Environment on a single node using the kickstart.sh
bash script that automatically builds and starts up all the software components:
./kickstart.sh [-l /path/to/log]
The Sparkler Crawl Environment is built on top of Sparkler, a new web crawler that makes use of recent advancements in distributed computing and information retrieval domains by conglomerating various Apache projects like Spark, Kafka, Lucene/Solr, Tika, and Felix. Sparkler is an extensible, highly scalable, and high-performance web crawler that is an evolution of Apache Nutch and runs on Apache Spark Cluster.
The Sparkler Crawl Environment aims at providing an efficient, scalable, consistent and reliable software architecture consisting of domain discovery tools able to enrich a given domain by expanding the collection of artifacts that define the domain.
This repository, named sce, provides a command-line utility for building Sparkler Crawl Environment as a multi-container Docker application running through the Docker Compose tool on a single node. As a PoC, you can easily install the Sparkler Crawl Environment on a single node using the kickstart.sh
bash script that automatically builds and starts up all the software components:
./kickstart.sh [-l /path/to/log]
The Sparkler Crawl Environment is built on top of Sparkler, a new web crawler that makes use of recent advancements in distributed computing and information retrieval domains by conglomerating various Apache projects like Spark, Kafka, Lucene/Solr, Tika, and Felix. Sparkler is an extensible, highly scalable, and high-performance web crawler that is an evolution of Apache Nutch and runs on Apache Spark Cluster.