Ensure you have an OpenRefine server running. Then install the OpenRefine client as follows.
wget -nv https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.7/openrefine-client_0-3-7_linux -O ~/.local/bin/openrefine-client
chmod +x ~/.local/bin/openrefine-client
2019-08-19 13:11:22 URL:https://github-production-release-asset-2e65be.s3.amazonaws.com/80617276/11234c80-c030-11e9-8d8d-6b20776f164f?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190819%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190819T131122Z&X-Amz-Expires=300&X-Amz-Signature=9d24ce810d3d6acb6aff3430e75c5d98eea29e3ad689ae95e28c79a30bca4215&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dopenrefine-client_0-3-7_linux&response-content-type=application%2Foctet-stream [4322528/4322528] -> "/home/jovyan/.local/bin/openrefine-client" [1]
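The commands that follow call `openrefine-client` by name, which only works if `~/.local/bin` is on your `PATH`. A minimal sketch of such a check is given here; the helper name `dir_on_path` is hypothetical and not part of the client.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if directory $1 is an entry of $PATH.
dir_on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if dir_on_path "$HOME/.local/bin"; then
  echo "~/.local/bin is on PATH"
else
  # Print a hint instead of changing the environment silently.
  echo "not on PATH; try: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```

Wrapping the candidate and `PATH` in colons avoids false matches on directories that merely share a prefix, such as `~/.local/bin-old`.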
Download sample data
openrefine-client --download "https://gist.githubusercontent.com/felixlohmeier/065727cffeafb216c24f730c40f3b1f6/raw/4923c19cf8bd78d53d211f046bda1afd11bf7b72/lobid-gnd-reconciliation-data.csv" --output lobid-gnd-reconciliation-data.csv
Download to file lobid-gnd-reconciliation-data.csv complete
Import file into OpenRefine
openrefine-client --create lobid-gnd-reconciliation-data.csv --separator=";" --projectName="lobid-gnd-reconciliation"
id: 1615020900072 rows: 4
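Project names are not necessarily unique, while the help text further down notes that a project can also be addressed by its id. A minimal sketch for picking the id out of the creation message is shown here; it assumes the message has the `id: NUMBER` shape printed above and is written to standard output. The helper name `project_id` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the output of --create on stdin and
# print the value that follows the "id:" label.
project_id() {
  awk '$1 == "id:" { print $2; exit }'
}
```

Assuming the message arrives on standard output, it could be used as `id=$(openrefine-client --create lobid-gnd-reconciliation-data.csv --separator=";" | project_id)`, followed by `openrefine-client --export "$id"`.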
Export project to terminal
openrefine-client --export "lobid-gnd-reconciliation"
name	beruf	ort
J. Weizenbaum	Informatiker	Berlin
Twain, Mark	Schriftsteller	
Kumar, Lalit	Jemand	
Download sample JSON file (its content was previously extracted via the Undo/Redo history in the OpenRefine graphical user interface)
openrefine-client --download "https://gist.githubusercontent.com/felixlohmeier/065727cffeafb216c24f730c40f3b1f6/raw/5e245786cf273a967c9cd0c285f5a2e9f81f8439/lobid-gnd-reconciliation-history.json" --output lobid-gnd-reconciliation-history.json
Download to file lobid-gnd-reconciliation-history.json complete
Apply transformation rules
openrefine-client --apply lobid-gnd-reconciliation-history.json "lobid-gnd-reconciliation"
File lobid-gnd-reconciliation-history.json has been successfully applied to project 1615020900072
Export project to terminal again
openrefine-client --export "lobid-gnd-reconciliation"
name	Beruf oder Beschäftigung	Geburtsort	Sterbeort	Ländercode
Weizenbaum, Joseph	Informatiker	Berlin	Berlin	USA
	Mathematiker			Deutschland
Twain, Mark	Lotse	Florida, Mo.	Redding, Conn.	USA
	Schriftsteller			
	Drucker			
	Journalist			
	Soldat			
Kumar, Lalit	Elektroingenieur	Delhi		Indien
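After the rules are applied, one person can span several rows, and only the first row of each record carries the name. If a downstream tool needs the name on every row, the export can be post-processed. The following is a minimal sketch, assuming tab-separated input with one header line and assuming that a blank first cell always means "same record as the row above"; `fill_down` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: copy the last non-empty value of column 1
# into rows where column 1 is blank (TSV on stdin, TSV on stdout).
fill_down() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1  { print; next }    # header row passes through unchanged
       $1 != "" { last = $1 }      # remember the most recent record key
                { $1 = last; print }'
}
```

It could be used as `openrefine-client --export "lobid-gnd-reconciliation" | fill_down`. The TSV export is used rather than CSV because names such as "Twain, Mark" contain commas, which a plain field split would break.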
Export data in CSV (.csv) format
openrefine-client --export "lobid-gnd-reconciliation" --output lobid-gnd-reconciliation.csv
Export to file lobid-gnd-reconciliation.csv complete
cat lobid-gnd-reconciliation.csv
name,Beruf oder Beschäftigung,Geburtsort,Sterbeort,Ländercode
"Weizenbaum, Joseph",Informatiker,Berlin,Berlin,USA
,Mathematiker,,,Deutschland
"Twain, Mark",Lotse,"Florida, Mo.","Redding, Conn.",USA
,Schriftsteller,,,
,Drucker,,,
,Journalist,,,
,Soldat,,,
"Kumar, Lalit",Elektroingenieur,Delhi,,Indien
Delete project
openrefine-client --delete "lobid-gnd-reconciliation"
Project 1615020900072 has been successfully deleted
Remove downloaded and exported files
rm lobid-gnd-reconciliation-data.csv lobid-gnd-reconciliation-history.json lobid-gnd-reconciliation.csv
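The import, apply, export and delete steps above can be chained into one function for unattended runs. This is a sketch under stated assumptions, not part of the client: the function name `refine_batch` and the override variable `OPENREFINE_CLIENT` are hypothetical, and the project name is simply derived from the input filename. Only the flags shown in this walkthrough are used.

```shell
#!/bin/sh
# Sketch: create -> apply -> export -> delete, stopping at the first failure.
# OPENREFINE_CLIENT is an assumed override variable, not a client feature.
refine_batch() {
  if [ "$#" -lt 3 ]; then
    echo "usage: refine_batch INPUT RULES OUTPUT [CREATE_OPTIONS...]" >&2
    return 2
  fi
  input=$1; rules=$2; output=$3
  shift 3                              # remaining arguments go to --create
  client=${OPENREFINE_CLIENT:-openrefine-client}
  project=$(basename "$input")
  project=${project%.*}                # project name = filename without extension

  "$client" --create "$input" --projectName="$project" "$@" || return 1
  # If the rules cannot be applied, remove the half-finished project.
  "$client" --apply "$rules" "$project" || { "$client" --delete "$project"; return 1; }
  status=0
  "$client" --export "$project" --output "$output" || status=$?
  "$client" --delete "$project"        # always remove the temporary project
  return "$status"
}
```

For the sample data this would be called as `refine_batch lobid-gnd-reconciliation-data.csv lobid-gnd-reconciliation-history.json lobid-gnd-reconciliation.csv --separator=";"`. Deleting the project even when the export fails keeps the server free of leftovers from failed runs.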
Show all options
openrefine-client --help
Usage: openrefine-client [--help | OPTIONS]

Script to provide a command line interface to an OpenRefine server.

Options:
  -h, --help
      show this help message and exit

Connection options:
  -H 127.0.0.1, --host=127.0.0.1
      OpenRefine hostname (default: 127.0.0.1)
  -P 3333, --port=3333
      OpenRefine port (default: 3333)

Commands:
  -c [FILE], --create=[FILE]
      Create project from file. The filename ending (e.g. .csv) defines the input format (csv,tsv,xml,json,txt,xls,xlsx,ods)
  -l, --list
      List projects
  --download=[URL]
      Download file from URL (e.g. example data). Combine with --output to specify a filename.

Commands with argument [PROJECTID/PROJECTNAME]:
  -d, --delete
      Delete project
  -f [FILE], --apply=[FILE]
      Apply JSON rules to OpenRefine project
  -E, --export
      Export project in tsv format to stdout.
  -o [FILE], --output=[FILE]
      Export project to file. The filename ending (e.g. .tsv) defines the output format (csv,tsv,xls,xlsx,html)
  --template=[STRING]
      Export project with templating. Provide (big) text string that you enter in the *row template* textfield in the export/templating menu in the browser app)
  --info
      show project metadata

General options:
  --format=FILE_FORMAT
      Override file detection (import: csv,tsv,xml,json,line-based,fixed-width,xls,xlsx,ods; export: csv,tsv,html,xls,xlsx,ods)

Create options:
  --columnWidths=COLUMNWIDTHS
      (txt/fixed-width), please provide widths in multiple arguments, e.g. --columnWidths=7 --columnWidths=5
  --encoding=ENCODING
      (csv,tsv,txt), please provide short encoding name (e.g. UTF-8)
  --guessCellValueTypes=true/false
      (xml,csv,tsv,txt,json, default: false)
  --headerLines=HEADERLINES
      (csv,tsv,txt/fixed-width,xls,xlsx,ods), default: 1, default txt/fixed-width: 0
  --ignoreLines=IGNORELINES
      (csv,tsv,txt,xls,xlsx,ods), default: -1
  --includeFileSources=true/false
      (all formats), default: false
  --limit=LIMIT
      (all formats), default: -1
  --linesPerRow=LINESPERROW
      (txt/line-based), default: 1
  --processQuotes=true/false
      (csv,tsv), default: true
  --projectName=PROJECT_NAME
      (all formats), default: filename
  --projectTags=PROJECTTAGS
      (all formats), please provide tags in multiple arguments, e.g. --projectTags=beta --projectTags=client1
  --recordPath=RECORDPATH
      (xml,json), please provide path in multiple arguments without slashes, e.g. /collection/record/ should be entered like this: --recordPath=collection --recordPath=record, default xml: record, default json: _ _
  --separator=SEPARATOR
      (csv,tsv), default csv: , default tsv: \t
  --sheets=SHEETS
      (xls,xlsx,ods), please provide sheets in multiple arguments, e.g. --sheets=0 --sheets=1, default: 0 (first sheet)
  --skipDataLines=SKIPDATALINES
      (csv,tsv,txt,xls,xlsx,ods), default: 0, default line-based: -1
  --storeBlankCellsAsNulls=true/false
      (csv,tsv,txt,xls,xlsx,ods), default: true
  --storeBlankRows=true/false
      (csv,tsv,txt,xls,xlsx,ods), default: true
  --storeEmptyStrings=true/false
      (xml,json), default: true
  --trimStrings=true/false
      (xml,json), default: false

Templating options:
  --mode=row-based/record-based
      engine mode (default: row-based)
  --prefix=PREFIX
      text string that you enter in the *prefix* textfield in the browser app
  --rowSeparator=ROWSEPARATOR
      text string that you enter in the *row separator* textfield in the browser app
  --suffix=SUFFIX
      text string that you enter in the *suffix* textfield in the browser app
  --filterQuery=REGEX
      Simple RegEx text filter on filterColumn, e.g. ^12015$
  --filterColumn=COLUMNNAME
      column name for filterQuery (default: name of first column)
  --facets=FACETS
      facets config in json format (may be extracted with browser dev tools in browser app)
  --splitToFiles=true/false
      will split each row/record into a single file; it specifies a presumably unique character series for splitting; --prefix and --suffix will be applied to all files; filename-prefix can be specified with --output (default: %Y%m%d)
  --suffixById=true/false
      enhancement option for --splitToFiles; will generate filename-suffix from values in key column

Example data:
  --download "https://git.io/fj5hF" --output=duplicates.csv
  --download "https://git.io/fj5ju" --output=duplicates-deletion.json

Basic commands:
  --list                                          # list all projects
  --list -H 127.0.0.1 -P 80                       # specify hostname and port
  --create duplicates.csv                         # create new project from file
  --info "duplicates"                             # show project metadata
  --apply duplicates-deletion.json "duplicates"   # apply rules in file to project
  --export "duplicates"                           # export project to terminal in tsv format
  --export --output=deduped.xls "duplicates"      # export project to file in xls format
  --delete "duplicates"                           # delete project

Some more examples:
  --info 1234567890123                            # specify project by id
  --create example.tsv --encoding=UTF-8
  --create example.xml --recordPath=collection --recordPath=record
  --create example.json --recordPath=_ --recordPath=_
  --create example.xlsx --sheets=0
  --create example.ods --sheets=0

Example for Templating Export:
  Cf. https://github.com/opencultureconsulting/openrefine-client#advanced-templating
The openrefine-client is available as a single-file executable for Windows, macOS and Linux. Client and server can run on different machines: the host and port of the OpenRefine server can be specified on the command line (e.g. -H 127.0.0.1 -P 80).
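When the server runs on another machine, repeating the connection options on every call gets tedious. One possible shortcut is a small wrapper that reads them from the environment. This is a sketch only: the function name `refine` and the variables `REFINE_HOST` and `REFINE_PORT` are hypothetical, and the defaults are the ones listed in the help text.

```shell
#!/bin/sh
# Hypothetical wrapper: prepend -H/-P taken from assumed environment
# variables, then pass all remaining arguments through unchanged.
refine() {
  openrefine-client \
    -H "${REFINE_HOST:-127.0.0.1}" \
    -P "${REFINE_PORT:-3333}" \
    "$@"
}
```

With `REFINE_HOST` and `REFINE_PORT` exported once per session, `refine --list` or `refine --export "lobid-gnd-reconciliation"` would then talk to the remote server.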
Please file an issue if you miss a feature in the command line interface or if you have found a bug. Questions are welcome, too!