# Using the openrefine-client in a Linux Bash environment

## Preparations

First we need an OpenRefine server running and the openrefine-client installed.

### Option 1: Binder

This Binder environment comes with OpenRefine, the openrefine-client and a Jupyter server proxy preinstalled. OpenRefine should be listening on the default port 3333, and the GUI should be available at the URL path `/openrefine`.
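To confirm that something is actually accepting connections on port 3333 before running the client commands, here is a minimal bash-only sketch. It relies on bash's `/dev/tcp` pseudo-device, so no extra tools are needed; the helper name `port_open` is made up for this example, and a successful connection only shows that *some* service is listening, not that it is OpenRefine.

```bash
# Hypothetical helper (bash only): succeed if host $1 accepts a TCP
# connection on port $2. The subshell closes the descriptor again.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 3333; then
  echo "Something is listening on 127.0.0.1:3333"
else
  echo "Nothing is listening on 127.0.0.1:3333"
fi
```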

### Option 2: Local environment

Ensure you have an OpenRefine server running. Then install the OpenRefine client as follows.

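```bash
# Make sure the target directory exists, otherwise wget cannot write the file
mkdir -p ~/.local/bin
wget -nv https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.10/openrefine-client_0-3-10_linux -O ~/.local/bin/openrefine-client
chmod +x ~/.local/bin/openrefine-client
```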
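The remaining commands call `openrefine-client` by name, which only works if `~/.local/bin` is on your `PATH`. That is common but not guaranteed. A quick check, sketched with a hypothetical helper `path_contains`:

```bash
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "$HOME/.local/bin"; then
  echo "$HOME/.local/bin is on PATH"
else
  echo "Add it for this session with: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```

Wrapping both ends of the list in colons lets the `case` pattern match the first and last entries as well as the middle ones, without matching a mere prefix such as `/opt` inside `/opt/x`.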


## Create a directory

We will create several files, so it is tidier to work in a new folder.

```bash
workspace=$(date +%Y%m%d_%H%M%S)
mkdir -p ~/"$workspace" && cd ~/"$workspace" && pwd
```

## Create project

Download sample data:

```bash
openrefine-client --download "https://git.io/fj5hF" --output=duplicates.csv
```

Import the file into OpenRefine:

```bash
openrefine-client --create duplicates.csv
```

## List all projects

```bash
openrefine-client --list
```

## Show project metadata

```bash
openrefine-client --info "duplicates"
```

## Export project to terminal

```bash
openrefine-client --export "duplicates"
```

## Apply rules from JSON file

Download a sample JSON file. Its content was previously extracted from the Undo/Redo history in the OpenRefine graphical user interface.

```bash
openrefine-client --download "https://git.io/fj5ju" --output=duplicates-deletion.json
```

Apply the transformation rules:

```bash
openrefine-client --apply duplicates-deletion.json "duplicates"
```

Export the project to the terminal again:

```bash
openrefine-client --export "duplicates"
```

## Export project to file

Export the data in Excel (.xls) format:

```bash
openrefine-client --export "duplicates" --output deduped.xls
```

## Delete project

```bash
openrefine-client --delete "duplicates"
```

## Advanced templating

Create another project from the example file above:

```bash
openrefine-client --create duplicates.csv --projectName=advanced
```

The following example exports the columns "name" and "purchase" in JSON format from the project "advanced", limited to rows matching the regex text filter `^F$` in the column "gender".
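The filter `^F$` is a regular expression anchored at both ends (`^` marks the start, `$` the end), so only cells whose entire value is `F` qualify; an unanchored `F` would also hit values such as `Female`. Assuming the filter behaves like an ordinary regex search over each cell value, the difference can be reproduced locally with `grep -E` on a few made-up values (OpenRefine is not involved here):

```bash
# Made-up cell values standing in for the "gender" column, one per line.
values='F
M
Female
FM
F'

# Anchored: the whole value must be exactly "F".
printf '%s\n' "$values" | grep -cE '^F$'   # prints 2
# Unanchored: any value that contains "F" somewhere.
printf '%s\n' "$values" | grep -cE 'F'     # prints 4
```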

```bash
openrefine-client "advanced" \
--prefix='{ "events" : [
' \
--template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }' \
--rowSeparator=',
' \
--suffix='
] }' \
--filterQuery='^F$' \
--filterColumn='gender'
```

There is also an option to store the results in multiple files. Each file contains the prefix, one processed row and the suffix.

```bash
openrefine-client "advanced" \
--prefix='{ "events" : [
' \
--template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }' \
--rowSeparator=',
' \
--suffix='
] }' \
--filterQuery='^F$' \
--filterColumn='gender' \
--splitToFiles=true
```
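How prefix, template, row separator and suffix combine can be imitated in plain shell. In this sketch two made-up rows stand in for already rendered templates: the single-file result is the prefix, then the rows joined by the row separator, then the suffix, while split output wraps each row in prefix and suffix without a separator. It only mimics the output shape described above; the exact whitespace openrefine-client writes may differ, and the client itself is not called.

```bash
# Plain-shell imitation of the output shape of a templated export.
# The two rendered rows are made up; files go to a temporary directory.
tmp=$(mktemp -d)
prefix='{ "events" : [
'
sep=',
'
suffix='
] }'
row1='    { "name" : "Alice", "purchase" : "Tea" }'
row2='    { "name" : "Beth", "purchase" : "Coffee" }'

# Single output: prefix, rows joined by the row separator, suffix.
printf '%s%s%s%s%s\n' "$prefix" "$row1" "$sep" "$row2" "$suffix" > "$tmp/all.json"

# Split output: every row wrapped in prefix and suffix, no row separator.
n=1
for row in "$row1" "$row2"; do
  printf '%s%s%s\n' "$prefix" "$row" "$suffix" > "$tmp/advanced_$n.json"
  n=$((n + 1))
done

cat "$tmp/all.json" "$tmp/advanced_1.json"
```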


By default, file names are suffixed with the row number (e.g. `advanced_1.json`, `advanced_2.json`, etc.). Alternatively, the value in the first column can be used instead:

```bash
openrefine-client "advanced" \
--prefix='{ "events" : [
' \
--template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }' \
--rowSeparator=',
' \
--suffix='
] }' \
--filterQuery='^F$' \
--filterColumn='gender' \
--splitToFiles=true \
--suffixById=true
```


Check the results in the current directory.

```bash
ls
```


Because the project "advanced" contains duplicates in its first column, "email", this command overwrites files: each file is named after an email address, so rows sharing an address end up in the same file and only the last one written remains. When using this option, the first column should contain unique identifiers.
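Before using `--suffixById=true`, it can help to check whether the first column really is unique. A rough sketch for a simple CSV file such as `duplicates.csv`, assuming a header row and no quoted fields containing commas; `dup_first_column` is a hypothetical helper name, and it inspects the raw file, not the project inside OpenRefine:

```bash
# Hypothetical helper: print every value that appears more than once in the
# first column of a simple CSV file (header row skipped, no quoted commas).
dup_first_column() {
  awk -F',' 'NR > 1 { print $1 }' "$1" | sort | uniq -d
}

# Run it on the sample file if it is present in the current directory.
if [ -f duplicates.csv ]; then
  dup_first_column duplicates.csv
fi
```

Any value printed here would map to the same output file name, so an empty result suggests the column is safe to use as an identifier.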

## Delete project

```bash
openrefine-client --delete "advanced"
```


## Getting help

```bash
openrefine-client --help
```


The openrefine-client is available as a single-file executable for Windows, macOS and Linux. Client and server can run on different machines; the host and port of the OpenRefine server can be specified with `-H` and `-P`, e.g. `-H 127.0.0.1 -P 80`.
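When the server runs elsewhere, repeating `-H` and `-P` on every call gets tedious. One option is a small shell function that adds them for you. This is a sketch: `orc` and the variables `OPENREFINE_HOST`, `OPENREFINE_PORT` and `OPENREFINE_CLIENT` are names made up for this example, not settings the client reads itself.

```bash
# Sketch: send every openrefine-client call to one (possibly remote) server.
# Variable and function names are hypothetical; -H and -P are the client's
# host and port options.
OPENREFINE_HOST=${OPENREFINE_HOST:-127.0.0.1}
OPENREFINE_PORT=${OPENREFINE_PORT:-3333}
OPENREFINE_CLIENT=${OPENREFINE_CLIENT:-openrefine-client}

# Prepend host and port, pass all other arguments through unchanged.
orc() {
  "$OPENREFINE_CLIENT" -H "$OPENREFINE_HOST" -P "$OPENREFINE_PORT" "$@"
}
```

With the function defined, `orc --list` or `orc --info "duplicates"` behave like the commands above but talk to the configured server.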

Please file an issue if a feature you need is missing from the command line interface or if you have found a bug. You are also welcome to ask any questions!