CD2H gitForager

hasadna/nli-z3950

Name: nli-z3950

Owner: The Public Knowledge Workshop

Description: Script to help getting bibliographical data from The National Library of Israel using Z3950 protocol and MARC format

Created: 2018-03-12 11:25:14.0

Updated: 2018-04-23 14:50:16.0

Pushed: 2018-04-23 14:51:11.0

Homepage:

Size: 39

Language: Python

README

nli-z3950

A script that helps retrieve bibliographic data from The National Library of Israel using the Z39.50 protocol and the MARC format.

By default, the script dumps a JSON serialization of the MARC data. Optionally, it can also dump the MARC data in the original binary MARC21 format.
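To illustrate the relationship between the two output formats, here is a minimal sketch of how a single binary MARC21 record can be decoded into a JSON-friendly structure. It is not the script's own code: the function names `parse_marc21` and `build_sample_record` are hypothetical, the sample record is synthetic, the data is assumed to be UTF-8 encoded, and the JSON layout (loosely modeled on the common MARC-in-JSON shape) is an assumption rather than the layout this project emits.

```python
import json

FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = b"\x1f"    # subfield delimiter


def parse_marc21(record: bytes) -> dict:
    """Decode one binary MARC21 record into a JSON-friendly dict."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])         # offset where the field data starts
    directory = record[24:base - 1]   # directory without its terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # The stored length includes the trailing field terminator.
        data = record[base + start:base + start + length - 1]
        if tag < "010":
            # Control fields (001-009) carry plain data, no subfields.
            fields.append({tag: data.decode("utf-8")})
        else:
            indicators, *subfields = data.split(SUBFIELD)
            fields.append({tag: {
                "ind1": chr(indicators[0]),
                "ind2": chr(indicators[1]),
                "subfields": [
                    {s[:1].decode("ascii"): s[1:].decode("utf-8")}
                    for s in subfields
                ],
            }})
    return {"leader": leader, "fields": fields}


def build_sample_record() -> bytes:
    """Build a tiny synthetic record with one control and one data field."""
    control = b"000123456" + FIELD_END
    title = b"10" + SUBFIELD + b"aExample title" + FIELD_END
    body = control + title
    directory = (
        b"001" + b"%04d" % len(control) + b"%05d" % 0
        + b"245" + b"%04d" % len(title) + b"%05d" % len(control)
        + FIELD_END
    )
    base = 24 + len(directory)
    total = base + len(body) + len(RECORD_END)
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + body + RECORD_END


if __name__ == "__main__":
    print(json.dumps(parse_marc21(build_sample_record()), indent=2))
```

For real data, a maintained MARC library is a safer choice than a hand-written parser; the sketch only shows what the binary format contains.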

Usage

Stateful search using CCL queries

Search queries should be provided in data/ccl_queries/ccl_queries.csv with a single ccl_query column
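As a rough illustration, the queries file could be generated with a few lines of Python. The helper name `write_ccl_queries` is hypothetical; only the file path and the `ccl_query` column name come from this README.

```python
import csv
from pathlib import Path


def write_ccl_queries(queries, path="data/ccl_queries/ccl_queries.csv"):
    """Write one CCL query per row under a single 'ccl_query' header."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ccl_query"])   # the single expected column
        for query in queries:
            writer.writerow([query])     # csv handles quoting of special characters
    return target
```

A call such as `write_ccl_queries(['ti="example title"', "au=cohen"])` would produce a three-line CSV file. The `ti` and `au` qualifiers are placeholders; which qualifiers are accepted depends on the CCL configuration in use.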

The search takes the existing results as input and only updates new entries

docker run -it -v `pwd`/data:/data orihoch/nli-z3950 run ./search

Output data will be available under data/search_results directory
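The stateful behaviour described above (reusing earlier results and adding only new entries) can be sketched as a simple merge keyed by a record identifier. This is an illustration of the idea only, not the pipeline's implementation: the function name `merge_new_results` and the `id` field are assumptions, and the real result files may be structured differently.

```python
def merge_new_results(existing, fetched):
    """Add fetched records whose id is not yet known.

    existing: dict mapping record id -> record (results of earlier runs)
    fetched:  list of record dicts, each with a hypothetical 'id' key
    Returns the list of ids that were added in this run.
    """
    added = []
    for record in fetched:
        record_id = record["id"]
        if record_id not in existing:   # skip entries seen in an earlier run
            existing[record_id] = record
            added.append(record_id)
    return added
```

Running the merge twice with the same fetched records adds nothing the second time, which is the property that makes repeated searches cheap.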

Export search results

docker run -it -v `pwd`/data:/data orihoch/nli-z3950 run ./search_export

Using CCL Queries

See https://software.indexdata.com/yaz/doc/tools.html#CCL for some examples

Development

Using Docker

Build and run locally

docker build -t nli-z3950 . &&\
docker run -it -v `pwd`/data:/data nli-z3950 run --verbose ./search

Locally

See the Dockerfile for installation instructions. You need both Python 2.7 and Python 3.6 and some dependencies.

PYTHON2=python2 MAX_RECORDS=50 dpp run --verbose ./search

Sync with Google Cloud Storage

sudo chown -R $USER data
gsutil -m rsync -r ./data gs://knesset-data-pipelines/hasadna-migdar-data/$USER-`date +%Y-%m-%d_%H-%M`

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.