CLI workshop¶

CARTO Solutions Technical Workshops

Jorge Sanz · May 2019

http://bit.ly/1905-cli-workshop

This workshop is intended to show why using a command line interface can be useful on many of our everyday tasks, and more convenient and effective than other graphical interfaces.

Contents¶

Intro
Some basic commands
Viewing and finding things
Accessing APIs and web services
Handling local data
Miscellany

CLI versus GUI¶

_	CLI	GUI
Eease	❌	✅
Functionality	✅	❌
Speed	✅	❌
Multitasking	❌	✅
Automation	✅	❌

When to use the command line¶

Repeated tasks
Greater control over functionality
Dealing with lots of information
Focus in the process
Accessing remote servers

Piping and redirections¶

# pipe the result of a command into another
cat /etc/hosts | grep localhost

127.0.0.1	localhost
::1     ip6-localhost ip6-loopback
127.0.1.1 cartodb.localhost

# write the contents of a command results into a file (overwriting)
cat /etc/hosts > /tmp/myhosts

Basic commands¶

Remember you can use the arrow keys to browse your history.
The HISTSIZE environment variable defines the size of your history.
Ctrl + r allows you to search on your history
!! executes the previous command
CTL + L clean the screen (same as executing clear command)
CTL + A, CTL + E: go to the begin/end of the line
CTL + W, CTL + U: remove the previous word or the everything before the cursor
fc will open an editor with the last command, useful for editing long ones
!$ will replace by the last parameter of your last command

# Show system info
uname -a

Linux aldaia 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# Show the current user
whoami

jsanz

# Show the current directory
pwd

/home/jsanz/src/carto/talks/1905-cli

# Show the current date
date +%F\ %T

2019-05-09 17:24:50

# Print a variable
echo $HISTSIZE

1000

# Print disk status
df -h | grep -v loop

Filesystem                   Size  Used Avail Use% Mounted on
udev                         7,8G     0  7,8G   0% /dev
tmpfs                        1,6G  1,8M  1,6G   1% /run
/dev/mapper/ubuntu--vg-root  424G  297G  106G  74% /
tmpfs                        7,8G  189M  7,7G   3% /dev/shm
tmpfs                        5,0M  4,0K  5,0M   1% /run/lock
tmpfs                        7,8G     0  7,8G   0% /sys/fs/cgroup
/dev/sda1                    472M  184M  265M  41% /boot
tmpfs                        1,6G   60K  1,6G   1% /run/user/1000
/dev/fuse                    250G  1,0K  250G   1% /run/user/1000/keybase/kbfs

# Print usage
du ~/media/carto --max-depth=1 --human-readable

780K	/home/jsanz/media/carto/koppen-climatic
343M	/home/jsanz/media/carto/CNIG
859M	/home/jsanz/media/carto/zcta
40K	/home/jsanz/media/carto/fme
665M	/home/jsanz/media/carto/landast
49M	/home/jsanz/media/carto/noaa
478M	/home/jsanz/media/carto/antartica
115M	/home/jsanz/media/carto/FARS
244M	/home/jsanz/media/carto/cartociudad
13M	/home/jsanz/media/carto/magrama
42M	/home/jsanz/media/carto/KP
59M	/home/jsanz/media/carto/builder-demos
143M	/home/jsanz/media/carto/natural_earth
16G	/home/jsanz/media/carto/ncdc
3,4M	/home/jsanz/media/carto/ukr
92M	/home/jsanz/media/carto/btn25
131M	/home/jsanz/media/carto/INE
20K	/home/jsanz/media/carto/.mapbox-studio
289M	/home/jsanz/media/carto/carto_establishments
5,9M	/home/jsanz/media/carto/geopois
1,5G	/home/jsanz/media/carto/geonames
2,7G	/home/jsanz/media/carto/nba
1014M	/home/jsanz/media/carto/waze
2,7G	/home/jsanz/media/carto/ordenance_survey
568K	/home/jsanz/media/carto/TV3
608M	/home/jsanz/media/carto/NYC
88M	/home/jsanz/media/carto/UNIGIS
124M	/home/jsanz/media/carto/multicriterio
42M	/home/jsanz/media/carto/nuts2013
4,0K	/home/jsanz/media/carto/CartoDB
843M	/home/jsanz/media/carto/catastro-osm
321M	/home/jsanz/media/carto/yodlee
48K	/home/jsanz/media/carto/paseo.shp
du: cannot read directory '/home/jsanz/media/carto/postgres_data/9.5/main': Permission denied
12K	/home/jsanz/media/carto/postgres_data
6,7G	/home/jsanz/media/carto/osm
1,5G	/home/jsanz/media/carto/gadm
37G	/home/jsanz/media/carto

# Find processes
ps aux | grep unclutter

jsanz     2161  0.0  0.0  10948   916 ?        S    09:14   0:00 /bin/sh -c unclutter
jsanz     2162  0.0  0.0  30944  2708 ?        S    09:14   0:00 unclutter
jsanz    11039  0.0  0.0  17728  1128 pts/0    R+   17:24   0:00 grep --color=auto unclutter

# Send a kill signal to the unclutter processs
kill -9 2162
ps aux | grep unclutter

jsanz    11053  0.0  0.0  17728  1048 pts/0    S+   17:25   0:00 grep --color=auto unclutter

Scripting¶

You can write pretty complex scripts using bash that include functions, control structures, and so on.

# You can do for loops
for i in {1..5}
do
   echo "Welcome $i"
done

Welcome 1
Welcome 2
Welcome 3
Welcome 4
Welcome 5

# If structures... and much more
if [ $HISTSIZE -gt 500 ]
then
    echo "Big History!"
else
    echo "Short History"
fi

Big History!

Viewing and finding¶

wc and cat¶

# Counting lines, words, and characters
wc /etc/hosts

 28  54 631 /etc/hosts

# Usually we only want the lines
wc -l /etc/hosts

28 /etc/hosts

# Print the beginning of a file
head -n5 /etc/hosts

127.0.0.1	localhost
127.0.1.1	aldaia

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback

# Same with the end
tail -n5 /etc/hosts

## vagrant-hostmanager
#10.0.2.70 grafana.cartodb.net
#10.0.2.82 kibana.cartodb.net

127.0.1.1 cartodb.localhost

# Combining pipes with wc to count how many files are in a folder
ls /var/log/*.log* \
| wc -l

58

find¶

# Finding all files inside a folder based in their names and types
# we redirect errors to the black hole at /dev/null
find /var/log -type f -name "*log*" 2> /dev/null \
| wc -l

203

# Finding and deleting
find /tmp/ -name ".DS_Store" -type f -delete 2> /dev/null

# Finding and executing a command (count lines in gitignore files)
find ~/src/carto -name ".gitignore" -type f -exec wc -l {} \;

13 /home/jsanz/src/carto/blog/.gitignore
1 /home/jsanz/src/carto/carto-workshop/.gitignore
3 /home/jsanz/src/carto/carto-workshop/06-sdks/exercises/python_SDK/.gitignore
3 /home/jsanz/src/carto/onpremises/.gitignore
2 /home/jsanz/src/carto/onpremises/omnibus/.gitignore
1 /home/jsanz/src/carto/onpremises/doc/.gitignore
8 /home/jsanz/src/carto/product-updates/.gitignore
2 /home/jsanz/src/carto/customer_success/projects/infogroup/.gitignore
1 /home/jsanz/src/carto/customer_success/projects/cushman_wakefield/201611_office_locations/.gitignore
1 /home/jsanz/src/carto/customer_success/outreach/tech_touch/.gitignore
55 /home/jsanz/src/carto/solutions/.gitignore
4 /home/jsanz/src/carto/solutions/onprem/ssh/.gitignore
2 /home/jsanz/src/carto/solutions/doc/.gitignore
12 /home/jsanz/src/carto/help/.gitignore
1 /home/jsanz/src/carto/help/glossary_download/.gitignore
2 /home/jsanz/src/carto/talks/.gitignore
12 /home/jsanz/src/carto/talks/resources/reveal.js/reveal.js-3.5.0/.gitignore
1 /home/jsanz/src/carto/support/src/followups/.gitignore
89 /home/jsanz/src/carto/support/src/metrics/.gitignore
2 /home/jsanz/src/carto/support/.gitignore
92 /home/jsanz/src/carto/support/guides/import-sql-api-test/.gitignore
9 /home/jsanz/src/carto/support/guides/mobile_test/src/.gitignore
1 /home/jsanz/src/carto/support/guides/mobile_test/src/app/.gitignore
39 /home/jsanz/src/carto/Reveal/.gitignore

sed¶

Great for working with text files:

replacing texts
finding patterns
working with line numbers

# Print comments in the hosts file
sed -n -e '/\#.*/p' /etc/hosts

# The following lines are desirable for IPv6 capable hosts
#192.168.56.101 cartodb.lan
# 5.32.4.75 cartodb.lan
## vagrant-hostmanager-start id: aea15025-4dcf-452d-a47e-7e270a187de8
## vagrant-hostmanager-end
## vagrant-hostmanager-start id: 0a853cbe-6d72-4784-aea1-137a727a3f0a
## vagrant-hostmanager
#10.0.2.70 grafana.cartodb.net
#10.0.2.82 kibana.cartodb.net

# Remove comments and then remove blank lines
sed -e '/\#.*/d' -e '/^\s*$/d' /etc/hosts

127.0.0.1	localhost
127.0.1.1	aldaia
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.33.10 carto.lan
10.0.3.79	bcg.devnet.local
10.0.3.140	carto.lan
127.0.1.1 cartodb.localhost

Silver Searcher¶

The Silver Searcher, quick find anything on your code

ag betis ~/src/carto/help

/home/jsanz/src/carto/help/_app/_contents/working-with-data/development/tools.md
78:![CDB Manager]({{ site.baseurl }}/images/working-with-data/tools/betis.png)
80:### Why use betis.carto.io?
82:Normally, to use CDB Manager you would have to clone [its GitHub repo](https://github.com/CartoDB/cdb-manager), then run a Python server and open its interface via localhost. This site is a more convenient way to use the tool: [https://betis.carto.io/](https://betis.carto.io/) is just a static hosted version of CDB Manager.
84:**Note:** We named this site `betis` as a small tribute to CDB Manager's main developer, Dani Carrión. He's a huge fan of [Real Betis Balompié](https://en.wikipedia.org/wiki/Real_Betis), a Spanish football team from Seville.
88:1. Visit [https://betis.carto.io/](https://betis.carto.io/)

Accessing web services and APIs¶

curl¶

https://curl.haxx.se/docs/manpage.html

# accessing a simple URL
curl http://numbersapi.com/8/26

August 26th is the day in 1821 that the University of Buenos Aires, Argentina, is officially opened.

# making a POST request with a Content Type header
curl \
--data '{"q":"select user"}' \
--header "content-type:application/json" \
https://jsanz.carto.com/api/v2/sql

{"rows":[{"user":"cartodb_publicuser_029cd392-0d6a-471d-83d1-f13ded9648d7"}],"time":0.008,"fields":{"user":{"type":"name"}},"total_rows":1}

# making a HEAD request
curl \
--head \
https://jsanz.carto.com/api/v2/sql?q=select+user

HTTP/1.1 200 OK
Server: openresty
Date: Thu, 09 May 2019 15:25:34 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Vary: Accept-Encoding
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With, X-Prototype-Version, X-CSRF-Token, Authorization
Carto-Rate-Limit-Limit: 81
Carto-Rate-Limit-Remaining: 80
Carto-Rate-Limit-Reset: 0
vary: Authorization
X-SQLAPI-Log: {"request":{"sql":{"type":"query","sql":"select user"}}}
Content-Disposition: inline; filename=cartodb-query.json; modification-date="Thu, 09 May 2019 15:25:34 GMT";
Cache-Control: no-cache,max-age=31536000,must-revalidate,public
Last-Modified: Thu, 09 May 2019 15:25:34 GMT
X-SQLAPI-Profiler: {"authorization":2,"getConnectionParams":1,"getUserTimeoutLimits":1,"queryExplain":6,"eventedQuery":2,"beforeSink":1,"total":13}
X-Varnish: 52190654
Age: 0
X-Cache: MISS

# formatting the result using python3
curl -s \
--data '{"q":"select user"}' \
--header "content-type:application/json" \
https://jsanz.carto.com/api/v2/sql \
| python3 -m json.tool

{
    "rows": [
        {
            "user": "cartodb_publicuser_029cd392-0d6a-471d-83d1-f13ded9648d7"
        }
    ],
    "time": 0.004,
    "fields": {
        "user": {
            "type": "name"
        }
    },
    "total_rows": 1
}

Handling data¶

jq¶

jq is a JSON processor, more advance that just pretty printing a result

# just formatting the output
curl -s \
--data '{"q":"select cartodb_id, scalerank, featurecla, name, worldcity from populated_places limit 5"}' \
--header "content-type:application/json" \
https://jsanz.carto.com/api/v2/sql > /tmp/data.json

cat /tmp/data.json | jq .

{
  "rows": [
    {
      "cartodb_id": 2338,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Kwekwe",
      "worldcity": 0
    },
    {
      "cartodb_id": 2339,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Plumtree",
      "worldcity": 0
    },
    {
      "cartodb_id": 2340,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Beitbridge",
      "worldcity": 0
    },
    {
      "cartodb_id": 2341,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Gwanda",
      "worldcity": 0
    },
    {
      "cartodb_id": 2342,
      "scalerank": 7,
      "featurecla": "Populated place",
      "name": "Chiredzi",
      "worldcity": 0
    }
  ],
  "time": 0.005,
  "fields": {
    "cartodb_id": {
      "type": "number"
    },
    "scalerank": {
      "type": "number"
    },
    "featurecla": {
      "type": "string"
    },
    "name": {
      "type": "string"
    },
    "worldcity": {
      "type": "number"
    }
  },
  "total_rows": 5
}

# processing the results to show only the names
cat /tmp/data.json \
| jq ".rows[].name"

"Kwekwe"
"Plumtree"
"Beitbridge"
"Gwanda"
"Chiredzi"

# generate a new JSON with custom properties
cat /tmp/data.json \
| jq ".rows[] | {id: .cartodb_id, scale: .scalerank, name: .name}"

{
  "id": 2338,
  "scale": 7,
  "name": "Kwekwe"
}
{
  "id": 2339,
  "scale": 7,
  "name": "Plumtree"
}
{
  "id": 2340,
  "scale": 7,
  "name": "Beitbridge"
}
{
  "id": 2341,
  "scale": 7,
  "name": "Gwanda"
}
{
  "id": 2342,
  "scale": 7,
  "name": "Chiredzi"
}

# combining with other tools
curl -sk -G "${CARTO_API_URL}api/v1/map/named/?api_key=${CARTO_API_KEY}" \
| jq ".template_ids[]" \
| tr -d '"' \
| sort \
| head

cartoframes_ver20170406_layers1_time0_baseid0_labels0_zoom0
cartoframes_ver20170406_layers1_time0_baseid1_labels0_zoom0
cartoframes_ver20170406_layers1_time0_baseid1_labels0_zoom1
cartoframes_ver20170406_layers1_time0_baseid2_labels0_zoom0
cartoframes_ver20170406_layers1_time0_baseid2_labels0_zoom1
cartoframes_ver20170406_layers1_time0_baseid2_labels1_zoom0
cartoframes_ver20170406_layers1_time0_baseid2_labels1_zoom1
tpl_01b39da5_5c92_4d43_93b8_7814b8c4037b
tpl_042460ae_a5a4_11e6_88d3_0e233c30368f
tpl_05f7c3cb_7faf_4397_80af_82918208096b

json2csv¶

https://www.npmjs.com/package/json2csv

cat /tmp/data.json | jq ".rows[]" \
| json2csv > /tmp/data.csv

cat /tmp/data.csv

"cartodb_id","scalerank","featurecla","name","worldcity"
2338,7,"Populated place","Kwekwe",0
2339,7,"Populated place","Plumtree",0
2340,7,"Populated place","Beitbridge",0
2341,7,"Populated place","Gwanda",0
2342,7,"Populated place","Chiredzi",0

csvkit¶

https://csvkit.readthedocs.io/en/latest/

csvclean: fix common CSV problems
csvcut: remove columns
csvgrep: filtering data
csvjoin: join two CSVs by a given column
csvjson: generate json or GeoJSON
csvlook: pretty print CSV
csvstat: basic statistics

# convert to CSV and pretty print the table
cat /tmp/data.csv \
| csvlook

| cartodb_id | scalerank | featurecla      | name       | worldcity |
| ---------- | --------- | --------------- | ---------- | --------- |
|      2.338 |         7 | Populated place | Kwekwe     |     False |
|      2.339 |         7 | Populated place | Plumtree   |     False |
|      2.340 |         7 | Populated place | Beitbridge |     False |
|      2.341 |         7 | Populated place | Gwanda     |     False |
|      2.342 |         7 | Populated place | Chiredzi   |     False |

cat /tmp/data.csv \
| csvgrep -c 4 -m tree \
| csvlook

| cartodb_id | scalerank | featurecla      | name     | worldcity |
| ---------- | --------- | --------------- | -------- | --------- |
|      2.339 |         7 | Populated place | Plumtree |     False |

cat /tmp/data.csv | csvstat

  1. "cartodb_id"

	Type of data:          Number
	Contains null values:  False
	Unique values:         5
	Smallest value:        2.338
	Largest value:         2.342
	Sum:                   11.700
	Mean:                  2.340
	Median:                2.340
	StDev:                 1,581
	Most common values:    2.338 (1x)
	                       2.339 (1x)
	                       2.340 (1x)
	                       2.341 (1x)
	                       2.342 (1x)

  2. "scalerank"

	Type of data:          Number
	Contains null values:  False
	Unique values:         1
	Smallest value:        7
	Largest value:         7
	Sum:                   35
	Mean:                  7
	Median:                7
	StDev:                 0
	Most common values:    7 (5x)

  3. "featurecla"

	Type of data:          Text
	Contains null values:  False
	Unique values:         1
	Longest value:         15 characters
	Most common values:    Populated place (5x)

  4. "name"

	Type of data:          Text
	Contains null values:  False
	Unique values:         5
	Longest value:         10 characters
	Most common values:    Kwekwe (1x)
	                       Plumtree (1x)
	                       Beitbridge (1x)
	                       Gwanda (1x)
	                       Chiredzi (1x)

  5. "worldcity"

	Type of data:          Boolean
	Contains null values:  False
	Unique values:         1
	Most common values:    False (5x)

Row count: 5

OGR¶

OGR is the vector brother of the well-known GDAL library to process raster geospatial data.

There are two interesting commands to learn to inspect and process geospatial datasets:

# ogrinfo presents an overview of a layer from a data source: type, extent, SRS, schema, etc
ogrinfo -summary ~/media/carto/flights.gpkg cartodb-query

INFO: Open of `/home/jsanz/media/carto/flights.gpkg'
      using driver `GPKG' successful.

Layer name: cartodb-query
Geometry: Point
Feature Count: 395633
Extent: (-158.517000, 17.962500) - (-65.685200, 62.540400)
Layer SRS WKT:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
FID Column = cartodb_id
Geometry Column = geom
mfr_mdl_code: Integer (0.0)
fid: Integer (0.0)
adshex: String (0.0)
latitude: Real (0.0)
longitude: Real (0.0)
altitude: Integer (0.0)
speed: Integer (0.0)
track: Integer (0.0)
squawk: Integer (0.0)
timestamp: DateTime (0.0)
year_mfr: Integer (0.0)
type_aircraft: Integer (0.0)
agency: String (0.0)
flight_id: String (0.0)
mfr: String (0.0)
model: String (0.0)
n_number: String (0.0)
name: String (0.0)
other_names1: String (0.0)
other_names2: String (0.0)
serial_number: String (0.0)
type: String (0.0)

# ogr2ogr processes and converts a datasource between different formats

# on this example we generate a new Geopackage only with one geometry from the original dataset
# this is useful to upload to carto only a dataset definition (after importing you delete that single row)
ogr2ogr \
-f GPKG  \
-overwrite \
-limit 1 \
-nln flights_first \
/tmp/flights_first.gpkg ~/media/carto/flights.gpkg

ogrinfo -summary /tmp/flights_first.gpkg flights_first \
| head -n7

INFO: Open of `/tmp/flights_first.gpkg'
      using driver `GPKG' successful.

Layer name: flights_first
Geometry: Point
Feature Count: 1
Extent: (-122.137000, 37.709900) - (-122.137000, 37.709900)

Miscellany¶

autojump¶

Jump around your file system without traversing folders

https://github.com/wting/autojump

I have aliased the command autojump to the key j because I'm so lazy

pigz¶

pigz is a parallized version of gzip, it uses all your cores to compress/decompress your files

htop¶

htopis a convenient way to inspect your running process, filter, kill, etc

duc¶

duc is a cached du command, that is, a tool that builds an index of how your hard disk is being used and helps you understand where your gigabytes are. It comes with cli and graphical interfaces.

duc ls ~/media/carto | head

 15.4G ncdc
  6.6G osm
  2.6G ordenance_survey
  2.6G nba
  1.4G gadm
  1.4G geonames
1013.2M waze
858.2M zcta
842.6M catastro-osm
664.8M landast

translate¶

# Translator using several providers
# https://github.com/soimort/translate-shell
trans -brief es:en 'cuando se despertó, el dinosaurio aún estaba allí'
trans -brief es:zh+ja 'cuando se despertó, el dinosaurio aún estaba allí'

When he woke up, the dinosaur was still there
当他醒来时，恐龙还在那里
彼が目を覚ましたとき、恐竜はまだそこにいました

pygmentize¶

# Pretty print source code
# http://pygments.org/docs/cmdline/
pygmentize -g /home/jsanz/src/sdks/carto-python/carto/fields.py

"""
Module for defining response objects

.. module:: carto.fields
   :platform: Unix, Windows
   :synopsis: Module for defining response objects

.. moduleauthor:: Daniel Carrion <daniel@carto.com>
.. moduleauthor:: Alberto Romeu <alrocar@carto.com>


"""

from pyrestcli.fields import ResourceField


class VisualizationField(ResourceField):
    """
    :class:`carto.visualizations.Visualization`
    """
    value_class = "carto.visualizations.Visualization"


class TableField(ResourceField):
    """
    :class:`carto.tables.Table`
    """
    value_class = "carto.tables.Table"


class UserField(ResourceField):
    """
    :class:`carto.users.User`
    """
    value_class = "carto.users.User"


class EntityField(ResourceField):
    """
    :class:`carto.permissions.Entity`
    """
    value_class = "carto.permissions.Entity"


class PermissionField(ResourceField):
    """
    :class:`carto.permissions.Permission`
    """
    value_class = "carto.permissions.Permission"

one liners¶

# Get the top ten commands from your history
# (you may have to change the column index in my case was $3)

history \
| awk '{CMD[$3]++;count++;}END { for (a in CMD)print CMD[a] " " CMD[a]/count*100 "% " a;}' \
| grep -v "./" \
| column -c3 -s " " -t \
| sort -nr \
| nl \
| head -n10

Resources¶

The complete introduction to The Command Line
Solutions summit 2017 CLI talk
useful commands to get info about your system
for loops
if statements
about redirections
35 find examples
sed examples: 1, 2 and 3