CLI workshop

CARTO Solutions Technical Workshops

Jorge Sanz · May 2019

http://bit.ly/1905-cli-workshop


This workshop is intended to show why using a command line interface can be useful on many of our everyday tasks, and more convenient and effective than other graphical interfaces.


Contents

  • Intro
  • Some basic commands
  • Viewing and finding things
  • Accessing APIs and web services
  • Handling local data
  • Miscellany

CLI versus GUI

_ CLI GUI
Eease
Functionality
Speed
Multitasking
Automation

When to use the command line

  • Repeated tasks
  • Greater control over functionality
  • Dealing with lots of information
  • Focus in the process
  • Accessing remote servers

Piping and redirections

In [40]:
# pipe the result of a command into another
cat /etc/hosts | grep localhost
127.0.0.1	localhost
::1     ip6-localhost ip6-loopback
127.0.1.1 cartodb.localhost
In [41]:
# write the contents of a command results into a file (overwriting)
cat /etc/hosts > /tmp/myhosts

Basic commands

  • Remember you can use the arrow keys to browse your history.
  • The HISTSIZE environment variable defines the size of your history.
  • Ctrl + r allows you to search on your history
  • !! executes the previous command
  • CTL + L clean the screen (same as executing clear command)
  • CTL + A, CTL + E: go to the begin/end of the line
  • CTL + W, CTL + U: remove the previous word or the everything before the cursor
  • fc will open an editor with the last command, useful for editing long ones
  • !$ will replace by the last parameter of your last command
In [42]:
# Show system info
uname -a
Linux aldaia 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
In [43]:
# Show the current user
whoami
jsanz
In [44]:
# Show the current directory
pwd
/home/jsanz/src/carto/talks/1905-cli
In [45]:
# Show the current date
date +%F\ %T
2019-05-09 17:24:50
In [46]:
# Print a variable
echo $HISTSIZE
1000
In [47]:
# Print disk status
df -h | grep -v loop
Filesystem                   Size  Used Avail Use% Mounted on
udev                         7,8G     0  7,8G   0% /dev
tmpfs                        1,6G  1,8M  1,6G   1% /run
/dev/mapper/ubuntu--vg-root  424G  297G  106G  74% /
tmpfs                        7,8G  189M  7,7G   3% /dev/shm
tmpfs                        5,0M  4,0K  5,0M   1% /run/lock
tmpfs                        7,8G     0  7,8G   0% /sys/fs/cgroup
/dev/sda1                    472M  184M  265M  41% /boot
tmpfs                        1,6G   60K  1,6G   1% /run/user/1000
/dev/fuse                    250G  1,0K  250G   1% /run/user/1000/keybase/kbfs
In [48]:
# Print usage
du ~/media/carto --max-depth=1 --human-readable
780K	/home/jsanz/media/carto/koppen-climatic
343M	/home/jsanz/media/carto/CNIG
859M	/home/jsanz/media/carto/zcta
40K	/home/jsanz/media/carto/fme
665M	/home/jsanz/media/carto/landast
49M	/home/jsanz/media/carto/noaa
478M	/home/jsanz/media/carto/antartica
115M	/home/jsanz/media/carto/FARS
244M	/home/jsanz/media/carto/cartociudad
13M	/home/jsanz/media/carto/magrama
42M	/home/jsanz/media/carto/KP
59M	/home/jsanz/media/carto/builder-demos
143M	/home/jsanz/media/carto/natural_earth
16G	/home/jsanz/media/carto/ncdc
3,4M	/home/jsanz/media/carto/ukr
92M	/home/jsanz/media/carto/btn25
131M	/home/jsanz/media/carto/INE
20K	/home/jsanz/media/carto/.mapbox-studio
289M	/home/jsanz/media/carto/carto_establishments
5,9M	/home/jsanz/media/carto/geopois
1,5G	/home/jsanz/media/carto/geonames
2,7G	/home/jsanz/media/carto/nba
1014M	/home/jsanz/media/carto/waze
2,7G	/home/jsanz/media/carto/ordenance_survey
568K	/home/jsanz/media/carto/TV3
608M	/home/jsanz/media/carto/NYC
88M	/home/jsanz/media/carto/UNIGIS
124M	/home/jsanz/media/carto/multicriterio
42M	/home/jsanz/media/carto/nuts2013
4,0K	/home/jsanz/media/carto/CartoDB
843M	/home/jsanz/media/carto/catastro-osm
321M	/home/jsanz/media/carto/yodlee
48K	/home/jsanz/media/carto/paseo.shp
du: cannot read directory '/home/jsanz/media/carto/postgres_data/9.5/main': Permission denied
12K	/home/jsanz/media/carto/postgres_data
6,7G	/home/jsanz/media/carto/osm
1,5G	/home/jsanz/media/carto/gadm
37G	/home/jsanz/media/carto

In [50]:
# Find processes
ps aux | grep unclutter
jsanz     2161  0.0  0.0  10948   916 ?        S    09:14   0:00 /bin/sh -c unclutter
jsanz     2162  0.0  0.0  30944  2708 ?        S    09:14   0:00 unclutter
jsanz    11039  0.0  0.0  17728  1128 pts/0    R+   17:24   0:00 grep --color=auto unclutter
In [52]:
# Send a kill signal to the unclutter processs
kill -9 2162
ps aux | grep unclutter
jsanz    11053  0.0  0.0  17728  1048 pts/0    S+   17:25   0:00 grep --color=auto unclutter

Scripting

You can write pretty complex scripts using bash that include functions, control structures, and so on.

In [53]:
# You can do for loops
for i in {1..5}
do
   echo "Welcome $i"
done
Welcome 1
Welcome 2
Welcome 3
Welcome 4
Welcome 5
In [54]:
# If structures... and much more
if [ $HISTSIZE -gt 500 ]
then
    echo "Big History!"
else
    echo "Short History"
fi
Big History!

Viewing and finding

wc and cat

In [55]:
# Counting lines, words, and characters
wc /etc/hosts
 28  54 631 /etc/hosts
In [56]:
# Usually we only want the lines
wc -l /etc/hosts
28 /etc/hosts
In [57]:
# Print the beginning of a file
head -n5 /etc/hosts
127.0.0.1	localhost
127.0.1.1	aldaia

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
In [58]:
# Same with the end
tail -n5 /etc/hosts
## vagrant-hostmanager
#10.0.2.70 grafana.cartodb.net
#10.0.2.82 kibana.cartodb.net

127.0.1.1 cartodb.localhost
In [59]:
# Combining pipes with wc to count how many files are in a folder
ls /var/log/*.log* \
| wc -l
58

find

In [60]:
# Finding all files inside a folder based in their names and types
# we redirect errors to the black hole at /dev/null
find /var/log -type f -name "*log*" 2> /dev/null \
| wc -l
203
In [61]:
# Finding and deleting
find /tmp/ -name ".DS_Store" -type f -delete 2> /dev/null

In [62]:
# Finding and executing a command (count lines in gitignore files)
find ~/src/carto -name ".gitignore" -type f -exec wc -l {} \;
13 /home/jsanz/src/carto/blog/.gitignore
1 /home/jsanz/src/carto/carto-workshop/.gitignore
3 /home/jsanz/src/carto/carto-workshop/06-sdks/exercises/python_SDK/.gitignore
3 /home/jsanz/src/carto/onpremises/.gitignore
2 /home/jsanz/src/carto/onpremises/omnibus/.gitignore
1 /home/jsanz/src/carto/onpremises/doc/.gitignore
8 /home/jsanz/src/carto/product-updates/.gitignore
2 /home/jsanz/src/carto/customer_success/projects/infogroup/.gitignore
1 /home/jsanz/src/carto/customer_success/projects/cushman_wakefield/201611_office_locations/.gitignore
1 /home/jsanz/src/carto/customer_success/outreach/tech_touch/.gitignore
55 /home/jsanz/src/carto/solutions/.gitignore
4 /home/jsanz/src/carto/solutions/onprem/ssh/.gitignore
2 /home/jsanz/src/carto/solutions/doc/.gitignore
12 /home/jsanz/src/carto/help/.gitignore
1 /home/jsanz/src/carto/help/glossary_download/.gitignore
2 /home/jsanz/src/carto/talks/.gitignore
12 /home/jsanz/src/carto/talks/resources/reveal.js/reveal.js-3.5.0/.gitignore
1 /home/jsanz/src/carto/support/src/followups/.gitignore
89 /home/jsanz/src/carto/support/src/metrics/.gitignore
2 /home/jsanz/src/carto/support/.gitignore
92 /home/jsanz/src/carto/support/guides/import-sql-api-test/.gitignore
9 /home/jsanz/src/carto/support/guides/mobile_test/src/.gitignore
1 /home/jsanz/src/carto/support/guides/mobile_test/src/app/.gitignore
39 /home/jsanz/src/carto/Reveal/.gitignore

sed

Great for working with text files:

  • replacing texts
  • finding patterns
  • working with line numbers
In [63]:
# Print comments in the hosts file
sed -n -e '/\#.*/p' /etc/hosts
# The following lines are desirable for IPv6 capable hosts
#192.168.56.101 cartodb.lan
# 5.32.4.75 cartodb.lan
## vagrant-hostmanager-start id: aea15025-4dcf-452d-a47e-7e270a187de8
## vagrant-hostmanager-end
## vagrant-hostmanager-start id: 0a853cbe-6d72-4784-aea1-137a727a3f0a
## vagrant-hostmanager
#10.0.2.70 grafana.cartodb.net
#10.0.2.82 kibana.cartodb.net
In [64]:
# Remove comments and then remove blank lines
sed -e '/\#.*/d' -e '/^\s*$/d' /etc/hosts
127.0.0.1	localhost
127.0.1.1	aldaia
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.33.10 carto.lan
10.0.3.79	bcg.devnet.local
10.0.3.140	carto.lan
127.0.1.1 cartodb.localhost

Silver Searcher

The Silver Searcher, quick find anything on your code

In [65]:
ag betis ~/src/carto/help
/home/jsanz/src/carto/help/_app/_contents/working-with-data/development/tools.md
78:![CDB Manager]({{ site.baseurl }}/images/working-with-data/tools/betis.png)
80:### Why use betis.carto.io?
82:Normally, to use CDB Manager you would have to clone [its GitHub repo](https://github.com/CartoDB/cdb-manager), then run a Python server and open its interface via localhost. This site is a more convenient way to use the tool: [https://betis.carto.io/](https://betis.carto.io/) is just a static hosted version of CDB Manager.
84:**Note:** We named this site `betis` as a small tribute to CDB Manager's main developer, Dani Carrión. He's a huge fan of [Real Betis Balompié](https://en.wikipedia.org/wiki/Real_Betis), a Spanish football team from Seville.
88:1. Visit [https://betis.carto.io/](https://betis.carto.io/)

Accessing web services and APIs

curl

https://curl.haxx.se/docs/manpage.html

In [66]:
# accessing a simple URL
curl http://numbersapi.com/8/26
August 26th is the day in 1821 that the University of Buenos Aires, Argentina, is officially opened.
In [67]:
# making a POST request with a Content Type header
curl \
--data '{"q":"select user"}' \
--header "content-type:application/json" \
https://jsanz.carto.com/api/v2/sql
{"rows":[{"user":"cartodb_publicuser_029cd392-0d6a-471d-83d1-f13ded9648d7"}],"time":0.008,"fields":{"user":{"type":"name"}},"total_rows":1}
In [68]:
# making a HEAD request
curl \
--head \
https://jsanz.carto.com/api/v2/sql?q=select+user
HTTP/1.1 200 OK
Server: openresty
Date: Thu, 09 May 2019 15:25:34 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Vary: Accept-Encoding
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With, X-Prototype-Version, X-CSRF-Token, Authorization
Carto-Rate-Limit-Limit: 81
Carto-Rate-Limit-Remaining: 80
Carto-Rate-Limit-Reset: 0
vary: Authorization
X-SQLAPI-Log: {"request":{"sql":{"type":"query","sql":"select user"}}}
Content-Disposition: inline; filename=cartodb-query.json; modification-date="Thu, 09 May 2019 15:25:34 GMT";
Cache-Control: no-cache,max-age=31536000,must-revalidate,public
Last-Modified: Thu, 09 May 2019 15:25:34 GMT
X-SQLAPI-Profiler: {"authorization":2,"getConnectionParams":1,"getUserTimeoutLimits":1,"queryExplain":6,"eventedQuery":2,"beforeSink":1,"total":13}
X-Varnish: 52190654
Age: 0
X-Cache: MISS

In [69]:
# formatting the result using python3
curl -s \
--data '{"q":"select user"}' \
--header "content-type:application/json" \
https://jsanz.carto.com/api/v2/sql \
| python3 -m json.tool
{
    "rows": [
        {
            "user": "cartodb_publicuser_029cd392-0d6a-471d-83d1-f13ded9648d7"
        }
    ],
    "time": 0.004,
    "fields": {
        "user": {
            "type": "name"
        }
    },
    "total_rows": 1
}

Handling data

jq

jq is a JSON processor, more advance that just pretty printing a result

In [70]:
# just formatting the output
curl -s \
--data '{"q":"select cartodb_id, scalerank, featurecla, name, worldcity from populated_places limit 5"}' \
--header "content-type:application/json" \
https://jsanz.carto.com/api/v2/sql > /tmp/data.json

cat /tmp/data.json | jq .
{
  "rows": [
    {
      "cartodb_id": 2338,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Kwekwe",
      "worldcity": 0
    },
    {
      "cartodb_id": 2339,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Plumtree",
      "worldcity": 0
    },
    {
      "cartodb_id": 2340,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Beitbridge",
      "worldcity": 0
    },
    {
      "cartodb_id": 2341,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Gwanda",
      "worldcity": 0
    },
    {
      "cartodb_id": 2342,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Chiredzi",
      "worldcity": 0
    }
  ],
  "time": 0.005,
  "fields": {
    "cartodb_id": {
      "type": "number"
    },
    "scalerank": {
      "type": "number"
    },
    "featurecla": {
      "type": "string"
    },
    "name": {
      "type": "string"
    },
    "worldcity": {
      "type": "number"
    }
  },
  "total_rows": 5
}
In [71]:
# processing the results to show only the names
cat /tmp/data.json \
| jq ".rows[].name"
"Kwekwe"
"Plumtree"
"Beitbridge"
"Gwanda"
"Chiredzi"
In [72]:
# generate a new JSON with custom properties
cat /tmp/data.json \
| jq ".rows[] | {id: .cartodb_id, scale: .scalerank, name: .name}"
{
  "id": 2338,
  "scale": 7,
  "name": "Kwekwe"
}
{
  "id": 2339,
  "scale": 7,
  "name": "Plumtree"
}
{
  "id": 2340,
  "scale": 7,
  "name": "Beitbridge"
}
{
  "id": 2341,
  "scale": 7,
  "name": "Gwanda"
}
{
  "id": 2342,
  "scale": 7,
  "name": "Chiredzi"
}
In [73]:
# combining with other tools
curl -sk -G "${CARTO_API_URL}api/v1/map/named/?api_key=${CARTO_API_KEY}" \
| jq ".template_ids[]" \
| tr -d '"' \
| sort \
| head
cartoframes_ver20170406_layers1_time0_baseid0_labels0_zoom0
cartoframes_ver20170406_layers1_time0_baseid1_labels0_zoom0
cartoframes_ver20170406_layers1_time0_baseid1_labels0_zoom1
cartoframes_ver20170406_layers1_time0_baseid2_labels0_zoom0
cartoframes_ver20170406_layers1_time0_baseid2_labels0_zoom1
cartoframes_ver20170406_layers1_time0_baseid2_labels1_zoom0
cartoframes_ver20170406_layers1_time0_baseid2_labels1_zoom1
tpl_01b39da5_5c92_4d43_93b8_7814b8c4037b
tpl_042460ae_a5a4_11e6_88d3_0e233c30368f
tpl_05f7c3cb_7faf_4397_80af_82918208096b
In [74]:
cat /tmp/data.json | jq ".rows[]" \
| json2csv > /tmp/data.csv

cat /tmp/data.csv
"cartodb_id","scalerank","featurecla","name","worldcity"
2338,7,"Populated place","Kwekwe",0
2339,7,"Populated place","Plumtree",0
2340,7,"Populated place","Beitbridge",0
2341,7,"Populated place","Gwanda",0
2342,7,"Populated place","Chiredzi",0

csvkit

https://csvkit.readthedocs.io/en/latest/

  • csvclean: fix common CSV problems
  • csvcut: remove columns
  • csvgrep: filtering data
  • csvjoin: join two CSVs by a given column
  • csvjson: generate json or GeoJSON
  • csvlook: pretty print CSV
  • csvstat: basic statistics
In [75]:
# convert to CSV and pretty print the table
cat /tmp/data.csv \
| csvlook
| cartodb_id | scalerank | featurecla      | name       | worldcity |
| ---------- | --------- | --------------- | ---------- | --------- |
|      2.338 |         7 | Populated place | Kwekwe     |     False |
|      2.339 |         7 | Populated place | Plumtree   |     False |
|      2.340 |         7 | Populated place | Beitbridge |     False |
|      2.341 |         7 | Populated place | Gwanda     |     False |
|      2.342 |         7 | Populated place | Chiredzi   |     False |
In [76]:
cat /tmp/data.csv \
| csvgrep -c 4 -m tree \
| csvlook
| cartodb_id | scalerank | featurecla      | name     | worldcity |
| ---------- | --------- | --------------- | -------- | --------- |
|      2.339 |         7 | Populated place | Plumtree |     False |
In [77]:
cat /tmp/data.csv | csvstat
  1. "cartodb_id"

	Type of data:          Number
	Contains null values:  False
	Unique values:         5
	Smallest value:        2.338
	Largest value:         2.342
	Sum:                   11.700
	Mean:                  2.340
	Median:                2.340
	StDev:                 1,581
	Most common values:    2.338 (1x)
	                       2.339 (1x)
	                       2.340 (1x)
	                       2.341 (1x)
	                       2.342 (1x)

  2. "scalerank"

	Type of data:          Number
	Contains null values:  False
	Unique values:         1
	Smallest value:        7
	Largest value:         7
	Sum:                   35
	Mean:                  7
	Median:                7
	StDev:                 0
	Most common values:    7 (5x)

  3. "featurecla"

	Type of data:          Text
	Contains null values:  False
	Unique values:         1
	Longest value:         15 characters
	Most common values:    Populated place (5x)

  4. "name"

	Type of data:          Text
	Contains null values:  False
	Unique values:         5
	Longest value:         10 characters
	Most common values:    Kwekwe (1x)
	                       Plumtree (1x)
	                       Beitbridge (1x)
	                       Gwanda (1x)
	                       Chiredzi (1x)

  5. "worldcity"

	Type of data:          Boolean
	Contains null values:  False
	Unique values:         1
	Most common values:    False (5x)

Row count: 5

OGR

OGR is the vector brother of the well-known GDAL library to process raster geospatial data.

There are two interesting commands to learn to inspect and process geospatial datasets:

In [78]:
# ogrinfo presents an overview of a layer from a data source: type, extent, SRS, schema, etc
ogrinfo -summary ~/media/carto/flights.gpkg cartodb-query
INFO: Open of `/home/jsanz/media/carto/flights.gpkg'
      using driver `GPKG' successful.

Layer name: cartodb-query
Geometry: Point
Feature Count: 395633
Extent: (-158.517000, 17.962500) - (-65.685200, 62.540400)
Layer SRS WKT:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
FID Column = cartodb_id
Geometry Column = geom
mfr_mdl_code: Integer (0.0)
fid: Integer (0.0)
adshex: String (0.0)
latitude: Real (0.0)
longitude: Real (0.0)
altitude: Integer (0.0)
speed: Integer (0.0)
track: Integer (0.0)
squawk: Integer (0.0)
timestamp: DateTime (0.0)
year_mfr: Integer (0.0)
type_aircraft: Integer (0.0)
agency: String (0.0)
flight_id: String (0.0)
mfr: String (0.0)
model: String (0.0)
n_number: String (0.0)
name: String (0.0)
other_names1: String (0.0)
other_names2: String (0.0)
serial_number: String (0.0)
type: String (0.0)
In [79]:
# ogr2ogr processes and converts a datasource between different formats

# on this example we generate a new Geopackage only with one geometry from the original dataset
# this is useful to upload to carto only a dataset definition (after importing you delete that single row)
ogr2ogr \
-f GPKG  \
-overwrite \
-limit 1 \
-nln flights_first \
/tmp/flights_first.gpkg ~/media/carto/flights.gpkg
In [80]:
ogrinfo -summary /tmp/flights_first.gpkg flights_first \
| head -n7
INFO: Open of `/tmp/flights_first.gpkg'
      using driver `GPKG' successful.

Layer name: flights_first
Geometry: Point
Feature Count: 1
Extent: (-122.137000, 37.709900) - (-122.137000, 37.709900)

Miscellany

autojump

Jump around your file system without traversing folders

https://github.com/wting/autojump

I have aliased the command autojump to the key j because I'm so lazy

pigz

pigz is a parallized version of gzip, it uses all your cores to compress/decompress your files

htop

htopis a convenient way to inspect your running process, filter, kill, etc

duc

duc is a cached du command, that is, a tool that builds an index of how your hard disk is being used and helps you understand where your gigabytes are. It comes with cli and graphical interfaces.

In [81]:
duc ls ~/media/carto | head
 15.4G ncdc
  6.6G osm
  2.6G ordenance_survey
  2.6G nba
  1.4G gadm
  1.4G geonames
1013.2M waze
858.2M zcta
842.6M catastro-osm
664.8M landast

translate

In [82]:
# Translator using several providers
# https://github.com/soimort/translate-shell
trans -brief es:en 'cuando se despertó, el dinosaurio aún estaba allí'
trans -brief es:zh+ja 'cuando se despertó, el dinosaurio aún estaba allí'
When he woke up, the dinosaur was still there
当他醒来时,恐龙还在那里
彼が目を覚ましたとき、恐竜はまだそこにいました

pygmentize

In [83]:
# Pretty print source code
# http://pygments.org/docs/cmdline/
pygmentize -g /home/jsanz/src/sdks/carto-python/carto/fields.py
"""
Module for defining response objects

.. module:: carto.fields
   :platform: Unix, Windows
   :synopsis: Module for defining response objects

.. moduleauthor:: Daniel Carrion <daniel@carto.com>
.. moduleauthor:: Alberto Romeu <alrocar@carto.com>


"""

from pyrestcli.fields import ResourceField


class VisualizationField(ResourceField):
    """
    :class:`carto.visualizations.Visualization`
    """
    value_class = "carto.visualizations.Visualization"


class TableField(ResourceField):
    """
    :class:`carto.tables.Table`
    """
    value_class = "carto.tables.Table"


class UserField(ResourceField):
    """
    :class:`carto.users.User`
    """
    value_class = "carto.users.User"


class EntityField(ResourceField):
    """
    :class:`carto.permissions.Entity`
    """
    value_class = "carto.permissions.Entity"


class PermissionField(ResourceField):
    """
    :class:`carto.permissions.Permission`
    """
    value_class = "carto.permissions.Permission"

one liners

# Get the top ten commands from your history
# (you may have to change the column index in my case was $3)

history \
| awk '{CMD[$3]++;count++;}END { for (a in CMD)print CMD[a] " " CMD[a]/count*100 "% " a;}' \
| grep -v "./" \
| column -c3 -s " " -t \
| sort -nr \
| nl \
| head -n10