Numerical Analysis & Statistics: MATLAB, R, NumPy, Julia

a side-by-side reference sheet

grammar and invocation | variables and expressions | arithmetic and logic | strings | regexes | dates and time | tuples | arrays | arithmetic sequences | 2d arrays | 3d arrays | dictionaries | functions | execution control | file handles | directories | processes and environment | libraries and namespaces | reflection

tables | import and export | relational algebra | aggregation

vectors | matrices | sparse matrices | optimization | polynomials | descriptive statistics | distributions | linear regression | statistical tests | time series | fast fourier transform | clustering

univariate charts | bivariate charts | multivariate charts

version used

MATLAB 8.3
Octave 3.8
R 3.1
Python 2.7
NumPy 1.7
SciPy 0.13
Pandas 0.12
Matplotlib 1.3
show version

MATLAB: $ matlab -nojvm -nodisplay -r 'exit'
Octave: $ octave --version
R: $ R --version
Python: sys.version
Julia: $ julia --version
implicit prologue

MATLAB, R, Julia: none

Python:
import sys, os, re, math
import numpy as np
import scipy as sp
import scipy.stats as stats
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
grammar and invocation
$ cat >>foo.m
1 + 1

$ matlab -nojvm -nodisplay -r "run('foo.m')"

$ octave foo.m
$ cat >>foo.r
1 + 1

$ Rscript foo.r

$ R -f foo.r
$ cat >>foo.py
print(1 + 1)

$ python foo.py
$ cat >>foo.jl
println(1 + 1)

$ julia foo.jl
$ matlab -nojvm -nodisplay

$ octave
$ R
$ python
$ julia
command line program

MATLAB: $ matlab -nojvm -nodisplay -r 'disp(1 + 1); exit'
Octave: $ octave --silent --eval '1 + 1'
R: $ Rscript -e 'print("hi")'
Python: $ python -c 'print("hi")'
Julia: $ julia -e 'println("hi")'
block delimiters

MATLAB, Julia:
function end
if elseif else end
while end
for end

R: { }
Python: offside rule (indentation)
statement separator

MATLAB:
; or newline
Newlines are not separators after an ellipsis: ...
Output is suppressed when lines end with a semicolon.

R:
; or sometimes newline
Newlines are not separators inside (), [], {}, '', "", or after a binary operator.

Python:
newline or ;
Newlines are not separators inside (), [], {}, triple quote literals, or after backslash: \
end-of-line comment

MATLAB: 1 + 1 % addition
R: 1 + 1 # addition
Python: 1 + 1 # addition
Julia: 1 + 1 # addition
variables and expressions
assignment

MATLAB: i = 3

R:
i = 3
i <- 3
3 -> i
assign("i", 3)

Python: i = 3
Julia: i = 3
parallel assignment
compound assignment

arithmetic, string, logical

MATLAB: none
R: none

Python:
# do not return values:
+= -= *= /= //= %= **=
+= *=
&= |= ^=
null

MATLAB:
% only used in place of numeric values:
NaN

R: NA NULL

Python:
None np.nan
# None cannot be stored in a numpy array;
# np.nan can if dtype is float64.

Julia:
# only used in place of float values:
NaN

null test

MATLAB:
isnan(v)
% true for '', []:
isempty(v)

Python:
v == None
v is None
# np.nan == np.nan is False
conditional expression

MATLAB: none

R:
(if (x > 0) x else -x)
ifelse(x > 0, x, -x)

Python: x if x > 0 else -x
Julia: x > 0 ? x : -x
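As a runnable sketch of the Python column (the Julia ternary behaves the same way), here is the conditional expression wrapped in a small absolute-value helper; the function name `absolute` is purely illustrative:

```python
# Python's conditional expression is an expression, not a statement,
# so it can appear inside a return or an assignment:
def absolute(x):
    return x if x > 0 else -x

print(absolute(-5))  # 5
print(absolute(3))   # 3
```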
arithmetic and logic
true and false

MATLAB: 1 0 true false
R: TRUE FALSE T F
Python: True False
Julia: true false

falsehoods

MATLAB:
false 0 0.0
% matrices evaluate to false unless nonempty and all entries evaluate to true

R:
FALSE F 0 0.0
# matrices evaluate to the value of their first entry; a string in boolean context causes an error

Python: False None 0 0.0 '' [] {}
Julia: false
logical operators

MATLAB:
~true | (true & false)
% short-circuit operators:
&& ||

R:
# short-circuit operators:
&& ||
# & and | can operate on and return vectors, but && and || return scalars

Python: and or not
Julia: && || !
relational operators

MATLAB: == ~= > < >= <=
R: == != > < >= <=
Python: == != > < >= <=
Julia: == != > < >= <=
integer type

Julia:
Int8 Int16 Int32 Int64 Int128
# Int is either 32 or 64 bits, depending on WORD_SIZE

unsigned type

Julia:
UInt8 UInt16 UInt32 UInt64 UInt128
# UInt is either 32 or 64 bits, depending on WORD_SIZE

float type

Julia:
Float16 Float32 Float64
arithmetic operators

add, sub, mult, div, quot, rem

MATLAB: + - * / none mod(n, divisor)
R: + - * / %/% %%
Python: + - * / // %

Julia:
+ - * / div(n, divisor) rem(n, divisor)
# always non-negative:
mod(n, divisor)
integer division

MATLAB: fix(13 / 5)

R:
13 %/% 5
as.integer(13 / 5)

Python: 13 // 5
Julia: div(13, 5)
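One caveat worth a runnable note: Python's `//` floors toward negative infinity, while MATLAB's `fix` (and `math.trunc` in Python) truncates toward zero, so the two disagree on negative operands. A sketch using only the standard library:

```python
# // floors (rounds toward negative infinity);
# math.trunc rounds toward zero like MATLAB's fix():
import math

print(13 // 5)              # 2
print(-13 // 5)             # -3 (floor, not truncation)
print(math.trunc(-13 / 5))  # -2
print(13 % 5)               # 3
print(-13 % 5)              # 2 (remainder takes the divisor's sign)
```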
integer division by zero

MATLAB: Inf, NaN, or -Inf
R: the result of converting Inf or NaN to an integer with as.integer
Python: raises ZeroDivisionError
Julia: raises DivideError
float division

MATLAB:
13 / 5
5 \ 13

R: 13 / 5
Python: float(13) / 5
Julia: 13 / 5
float division by zero

dividend is positive, zero, negative

MATLAB: Inf NaN -Inf (these values are literals)
R: Inf NaN -Inf (these values are literals)
Python: raises ZeroDivisionError
Julia: Inf NaN -Inf (these values are literals)
power

MATLAB: 2 ^ 16

R:
2 ^ 16
2 ** 16

Python: 2 ** 16
Julia: 2 ^ 16
sqrt -1

MATLAB:
% returns 0 + 1i:
sqrt(-1)

R:
# returns NaN:
sqrt(-1)
# returns 0+1i:
sqrt(-1+0i)

Python:
# raises ValueError:
math.sqrt(-1)
# returns 1.41421j:
import cmath
cmath.sqrt(-2)

Julia:
# raises DomainError:
sqrt(-1)
# returns 0.0 + 1.0im:
sqrt(-1 + 0im)
transcendental functions

MATLAB: exp log sin cos tan asin acos atan atan2
R: exp log sin cos tan asin acos atan atan2
Python: math.exp math.log math.sin math.cos math.tan math.asin math.acos math.atan math.atan2
Julia: exp log sin cos tan asin acos atan atan2

transcendental constants

MATLAB: pi e
R: pi exp(1)
Python: math.pi math.e
Julia: pi e
float truncation

round towards zero, to nearest integer, down, up

Julia:
# trunc() and other functions return floats.
# Int() raises InexactError if float argument has
# nonzero fractional portion.
absolute value and signum

MATLAB: abs sign
R: abs sign

Python:
abs(-3.7)
math.copysign(1, -3.7)
integer overflow

MATLAB: becomes float; the largest representable integer is in the variable intmax
R: becomes float; the largest representable integer is in the variable .Machine$integer.max
Python: becomes an arbitrary length integer of type long
float overflow

MATLAB: Inf
R: Inf
Python: raises OverflowError
float limits
rational construction

Julia: 22 // 7

rational decomposition

Julia:
num(22 // 7)
den(22 // 7)
complex construction

MATLAB: 1 + 3i

R:
1 + 3i
complex(1, 3)

Python: 1 + 3j
Julia: 1 + 3im

complex decomposition

MATLAB:
real imag
abs arg

R:
Re Im
abs Arg

Python:
import cmath

Julia:
real(1 + 3im)
imag(1 + 3im)
abs(1 + 3im)
angle(1 + 3im)
conj(1 + 3im)
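A runnable sketch of the Python cell, using `cmath.phase` as the counterpart of MATLAB's `angle` and R's `Arg`; the variable `z` is illustrative:

```python
# Decomposing a complex number with attributes and cmath:
import cmath

z = 1 + 3j
print(z.real, z.imag)   # 1.0 3.0
print(abs(z))           # modulus
print(cmath.phase(z))   # argument in radians
print(z.conjugate())    # (1-3j)
```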
random number

uniform integer, uniform float

MATLAB: floor(100 * rand)
R: floor(100 * runif(1))
Python: np.random.randint(0, 100)
random seed
set, get, and restore
rand('state', 17)
sd = rand('state')
rand('state', sd)
sd = .Random.seed
sd = np.random.get_state()
bit operators

MATLAB:
bitshift(100, 3)
bitshift(100, -3)
bitand(1, 2)
bitor(1, 2)
bitxor(1, 2)
bitcmp(1, 'uint16')
% Octave:
bitcmp(1, 16)

R: none

Python:
100 << 3
100 >> 3
1 & 2
1 | 2
1 ^ 2
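The Python operators in this row can be checked directly; the complement `~1` at the end is an extra illustration not in the table:

```python
# Python bitwise operators applied to small integers:
print(100 << 3)  # 800
print(100 >> 3)  # 12
print(1 & 2)     # 0
print(1 | 2)     # 3
print(1 ^ 2)     # 3
print(~1)        # -2 (two's complement)
```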
binary, octal, and hex literals

0b101010
convert integer to and from string with radix
base(7, 42)
parse(Int, "60", 7)
literal

MATLAB:
'don''t say "no"'
% Octave only:
"don't say \"no\""

R:
"don't say \"no\""
'don\'t say "no"'

Python:
'don\'t say "no"'
"don't say \"no\""
r"don't " r'say "no"'

Julia:
"don't say \"no\""
newline in literal
literal escapes

MATLAB:
% Octave double quote only:
\\ \" \' \0 \a \b \f \n \r \t \v

R:
\\ \" \' \a \b \f \n \r \t \v \ooo

Python:
# single and double quoted:
\newline \\ \' \" \a \b \f \n \r \t \v \ooo \xhh

Julia:
\\ \" \' \a \b \f \n \t \r \v
\ooo \xhh \uhhhh \Uhhhhhhhh
variable interpolation

Julia:
count = 3
item = "ball"
println("$count $(item)s")

expression interpolation

Julia:
"1 + 1 = $(1 + 1)"
concatenate

MATLAB: strcat('one ', 'two ', 'three')
R: paste("one", "two", "three", sep=" ")

Python:
'one ' + 'two ' + 'three'
# literals, but not variables, can be concatenated with juxtaposition:
'one ' "two " 'three'

Julia:
"one " * "two " * "three"
string("one ", "two ", "three")
MATLAB: hbar = repmat('-', 1, 80)
R: hbar = paste(rep('-', 80), collapse='')
Python: hbar = '-' * 80

Julia:
hbar = "-" ^ 80
hbar = repeat("-", 80)
index of substring

MATLAB:
% returns array of one-indexed locations:
strfind('hello', 'el')

R:
# counts from one; returns -1 if not found:
regexpr("el", "hello")

Python:
# counts from zero; raises ValueError if not found:
'hello'.index('el')

Julia:
# returns a UnitRange:
search("hello", "el")
extract substring

MATLAB:
s = 'hello'
s(1:4)
% syntax error: 'hello'(1:4)

R: substr("hello", 1, 4)
Python: 'hello'[0:4]
Julia: "hello"[1:4]
split

MATLAB:
% returns cell array:
strsplit('foo,bar,baz', ',')

R: strsplit('foo,bar,baz', ',')
Python: 'foo,bar,baz'.split(',')
Julia: split("foo,bar,baz", ",")
join

MATLAB:
% takes cell array as arg:
strjoin({'foo', 'bar', 'baz'}, ',')

R:
paste("foo", "bar", "baz", sep=",")
paste(c('foo', 'bar', 'baz'), collapse=",")

Python: ','.join(['foo', 'bar', 'baz'])
Julia: join(["foo", "bar", "baz"], ",")
trim

both sides, left, right

MATLAB:
strtrim(' foo ')
deblank('foo ')

R:
gsub("(^[\n\t ]+|[\n\t ]+$)", "", " foo ")
sub("^[\n\t ]+", "", " foo")
sub("[\n\t ]+$", "", "foo ")

Python:
' foo '.strip()
' foo'.lstrip()
'foo '.rstrip()
pad

on right, on left, centered

MATLAB:
s = repmat(' ', 1, 10)
s(1:5) = 'lorem'
strjust(s, 'left')
strjust(s, 'right')
strjust(s, 'center')

R:
sprintf("%-10s", "lorem")
sprintf("%10s", "lorem")
number to string

MATLAB: strcat('value: ', num2str(8))
R: paste("value: ", toString("8"))
Python: 'value: ' + str(8)
string to number

MATLAB:
7 + str2num('12')
73.9 + str2num('.037')

R:
7 + as.integer("12")
73.9 + as.double(".037")

Python:
7 + int('12')
73.9 + float('.037')

Julia:
7 + parse(Int, "12")
73.9 + parse(Float64, ".037")
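A runnable sketch of the Python conversions; the radix form `int('ff', 16)` is an extra illustration beyond the table:

```python
# int() rejects non-integral strings, so parse floats with float():
print(7 + int('12'))          # 19
print(73.9 + float('.037'))   # 73.937 (up to float rounding)
print(int('ff', 16))          # 255: optional radix argument
```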
translate case

MATLAB: lower('FOO')
format

MATLAB: sprintf('%s: %.3f %d', 'foo', 2.2, 7)
R: sprintf("%s: %.3f %d", "foo", 2.2, 7)
Python: '%s: %.3f %d' % ('foo', 2.2, 7)
Julia: @sprintf("%s: %.2f %d", "foo", 2.2, 7)

# index of first byte of last char:
character access

MATLAB:
s = 'hello'
s(1)
% syntax error: 'hello'(1)

R: substr("hello", 1, 1)
Python: 'hello'[0]

Julia:
"hello"[1]
# index must be byte-index of the first byte of a
# character. Raises BoundsError if no such byte,
# and UnicodeError if byte not first in char.
chr and ord

MATLAB: char(65)
regular expressions
character class abbreviations

MATLAB:
. \d \D \s \S \w \W
% also C-string style backslash escapes:
\a \b \f \n \r \t \v

R:
# escape backslash in strings by doubling:
. \d \D \s \S \w \W

Python: . \d \D \s \S \w \W
Julia: . \d \D \h \H \s \S \v \V \w \W

anchors

MATLAB: ^ $ \< \>

R:
# escape backslash in strings by doubling:
^ $ \< \> \b \B

Python: ^ $ \A \b \B \Z
Julia: ^ $ \A \b \B \z \Z
match test

MATLAB:
regexp('hello', '^[a-z]+$')
regexp('hello', '^\S+$')

R:
regexpr("^[a-z]+$", "hello") > 0
regexpr('^\\S+$', "hello") > 0

Python:
re.search('^[a-z]+$', 'hello')
re.search('^\S+$', 'hello')

Julia:
ismatch(r"^[a-z]+$", "hello")

case insensitive match test

MATLAB: regexpi('Lorem Ipsum', 'lorem')
R: regexpr('(?i)lorem', "Lorem Ipsum") > 0
Python: re.search('lorem', 'Lorem Ipsum', re.I)
Julia: ismatch(r"lorem"i, "Lorem Ipsum")

modifiers

MATLAB: none
R: (?i) (?m) (?s) (?x)
Python: re.I re.M re.S re.X
Julia: i m s x
substitution

first match, all matches

MATLAB:
s = 'do re mi mi mi'
regexprep(s, 'mi', 'ma', 'once')
regexprep(s, 'mi', 'ma')

R:
sub('mi', 'ma', 'do re mi mi mi')
gsub('mi', 'ma', 'do re mi mi mi')

Python:
rx = re.compile(r'mi')
s = rx.sub('ma', 'do re mi mi mi', 1)
s2 = rx.sub('ma', 'do re mi mi mi')

Julia:
replace("do re mi mi mi", r"mi", s"ma", 1)
replace("do re mi mi mi", r"mi", s"ma")
backreference in match and substitution

MATLAB:
regexp('do do', '(\w+) \1')
regexprep('do re', '(\w+) (\w+)', '$2 $1')

R:
regexpr('(\\w+) \\1', 'do do') > 0
sub('(\\w+) (\\w+)', '\\2 \\1', 'do re')

Python:
rx = re.compile(r'(\w+) (\w+)')
rx.sub(r'\2 \1', 'do re')

Julia:
ismatch(r"(\w+) \1", "do do")
group capture

Python:
rx = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
m = rx.search('2010-06-03')
yr, mo, dy = m.groups()

Julia:
rx = r"(\d{4})-(\d{2})-(\d{2})"
m = match(rx, "2010-06-03")
yr, mn, dy = m.captures
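The Python cell runs as shown; the named-group variant at the end is an extra illustration beyond the table:

```python
# Positional group capture:
import re

rx = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
m = rx.search('2010-06-03')
yr, mo, dy = m.groups()
print(yr, mo, dy)  # 2010 06 03

# Named groups avoid depending on group order:
rx2 = re.compile(r'(?P<yr>\d{4})-(?P<mo>\d{2})-(?P<dy>\d{2})')
m2 = rx2.search('2010-06-03')
print(m2.group('mo'))  # 06
```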
dates and time
current date/time

MATLAB: t = now
R: t = as.POSIXlt(Sys.time())

Python:
import datetime
t = datetime.datetime.now()

Julia: t = now()

date/time type

MATLAB: floating point number representing days since year 0 in the Gregorian calendar
R: POSIXlt
Python: datetime.datetime
Julia: DateTime

date/time difference type

MATLAB: floating point number representing days
R: a difftime object, which behaves like a floating point number representing seconds
Python: timedelta, which can be converted to a float value in seconds via its total_seconds() method
Julia: Base.Dates.Millisecond
get date partsdv = datevec(t)
% syntax error: datevec(t)(1)
t$year + 1900
t$mon + 1
get time partsdv = datevec(t)
build date/time from parts

MATLAB: t = datenum([2011 9 20 23 1 2])

R:
t = as.POSIXlt(Sys.time())
t$year = 2011 - 1900
t$mon = 9 - 1
t$mday = 20
t$hour = 23
t$min = 1
t$sec = 2
import datetime

t = datetime.datetime(2011, 9, 20, 23, 1, 2)
t = DateTime(2011, 9, 20, 23, 1, 2)
convert to string
parse datetimes = '2011-09-20 23:01:02'
fmt = 'yyyy-mm-dd HH:MM:SS'
t = datenum(s, fmt)
t = strptime('2011-09-20 23:01:02',
  '%Y-%m-%d %H:%M:%S')
import datetime

s = '2011-05-03 10:00:00'
fmt = '%Y-%m-%d %H:%M:%S'
t = datetime.datetime.strptime(s, fmt)
fmt = "yyyy-mm-dd HH:MM:SS"
t = DateTime("2011-05-03 10:00:00", fmt)

# fmt string can be compiled:
df = Dates.DateFormat(fmt)
t2 = DateTime("2011-05-03 10:00:00", df)
format datetime

MATLAB: datestr(t, 'yyyy-mm-dd HH:MM:SS')
R: format(t, format='%Y-%m-%d %H:%M:%S')
Python: t.strftime('%Y-%m-%d %H:%M:%S')
Julia: Dates.format(t, "yyyy-mm-dd HH:MM:SS")
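Parsing and formatting round-trip cleanly in Python with the same format string, which is an easy way to check a format before using it:

```python
# strptime parses; strftime formats with the same directives:
import datetime

fmt = '%Y-%m-%d %H:%M:%S'
s = '2011-09-20 23:01:02'
t = datetime.datetime.strptime(s, fmt)
print(t.year, t.month, t.day)  # 2011 9 20
print(t.strftime(fmt))         # 2011-09-20 23:01:02
```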
sleep

MATLAB: pause(0.5)
R: Sys.sleep(0.5)

Python:
import time
time.sleep(0.5)
tuples

tuple type

MATLAB: cell
R: list
Python: tuple
Julia: Tuple{T[, ...]}

literal

MATLAB: tup = {1.7, 'hello', [1 2 3]}
R: tup = list(1.7, "hello", c(1, 2, 3))
Python: tup = (1.7, "hello", [1,2,3])
Julia: tup = (1.7, "foo", [1, 2, 3])

lookup element

MATLAB (indices start at one): tup{1}
R (indices start at one): tup[[1]]
Python (indices start at zero): tup[0]
Julia (indices start at one): tup[1]

update element

MATLAB: tup{1} = 2.7
R: tup[[1]] = 2.7
Python: tuples are immutable
Julia: tuples are immutable
element type

MATLAB: always numeric

R:
# "numeric":
class(c(1, 2, 3))
# arrays can also have "boolean" or "string" elements

Python:
# values can have different types:
[type(x) for x in a]

Julia:
# Array{Int64, 1} with element type Int64:
a = [1, 2, 3]
literal

MATLAB:
a = [1, 2, 3, 4]
% commas are optional:
a = [1 2 3 4]

R:
# use c() constructor:
a = c(1, 2, 3, 4)

Python: a = [1, 2, 3, 4]
Julia: a = [1, 2, 3, 4]
empty test

MATLAB:
length(a) == 0
% An array used in a conditional test is
% false unless nonempty and all entries evaluate
% as true.

R: length(a) == 0
Python: not a
Julia: isempty(a)
update

MATLAB (indices start at one): a(1) = -1
R (indices start at one): a[1] = -1
Python (indices start at zero): a[0] = -1
Julia (indices start at one): a[1] = -1
out-of-bounds behavior

MATLAB:
a = []
% error:
a(10)
% increases array size to 10;
% zero-fills slots 1 through 9:
a(10) = 10

R:
a = c()
# evaluates as NA:
a[10]
# increases array size to 10:
a[10] = "lorem"

Python:
a = []
# raises IndexError:
a[10]
# raises IndexError:
a[10] = 'lorem'

Julia:
a = []
# raises BoundsError:
a[10]
# raises BoundsError:
a[10] = "lorem"
index of element

MATLAB:
a = [7 8 9 10 8]
% returns [2 5]:
find(a == 8)
% returns 2:
find(a == 8, 1, 'first')

R:
a = c('x', 'y', 'z', 'w', 'y')
# c(2, 5):
which(a == 'y')

Python:
a = ['x', 'y', 'z', 'w', 'y']
a.index('y')  # 1
# lists have no rindex; for the last match:
len(a) - 1 - a[::-1].index('y')  # 4

Julia:
a = ["x", "y", "z", "w", "y"]
# 2:
findfirst(a, "y")
# 5:
findlast(a, "y")
slice

by endpoints

MATLAB:
a = ['a' 'b' 'c' 'd' 'e']
% ['c' 'd']:
a(3:4)

R:
a = c("a", "b", "c", "d", "e")
# c("c", "d"):
a[seq(3, 4)]

Python:
a = ['a', 'b', 'c', 'd', 'e']
# ['c', 'd']:
a[2:4]

Julia:
a = ["a", "b", "c", "d", "e"]
# ["c", "d"]:
a[3:4]
slice to end

MATLAB:
a = ['a' 'b' 'c' 'd' 'e']
% ['c' 'd' 'e']:
a(3:end)

R:
a = c("a", "b", "c", "d", "e")
# both return c("c", "d", "e"):
a[3:length(a)]
tail(a, n=length(a) - 2)

Python:
a = ['a', 'b', 'c', 'd', 'e']
# ['c', 'd', 'e']:
a[2:]

Julia:
a = ["a", "b", "c", "d", "e"]
# ["c", "d", "e"]:
a[3:end]
integer array as index

MATLAB: [7 8 9]([1 3 3])
R: c(7, 8, 9)[c(1, 3, 3)]
Python: np.array([7, 8, 9])[[0, 2, 2]]

Julia:
# [7, 9, 9]:
[7, 8, 9][[1, 3, 3]]

logical array as index

MATLAB: [7 8 9]([true false true])
R: c(7, 8, 9)[c(T, F, T)]
Python: np.array([7, 8, 9])[[True, False, True]]

Julia:
# [7, 9]:
[7, 8, 9][[true, false, true]]
concatenatea = [1 2 3]
a2 = [a [4 5 6]]
a = [a [4 5 6]]
% or:
a = horzcat(a, a2)
a = c(1, 2, 3)
a2 = append(a, c(4, 5, 6))
a = append(a, c(4, 5, 6))
a = [1, 2, 3]
a2 = a + [4, 5, 6]
a.extend([4, 5, 6])
a = [1, 2, 3]
a2 = vcat(a, [4, 5, 6])
a = vcat(a, [4, 5, 6])
replicate

MATLAB: a = repmat(NA, 1, 10)

R:
a = rep(NA, 10)
# 30 a's, 50 b's, and 90 c's:
rep(c("a", "b", "c"), c(30, 50, 90))

Python:
a = [None] * 10
a = [None for i in range(0, 10)]

Julia:
a = fill(NaN, 10)
address copy, shallow copy, deep copy

MATLAB:
There is no address copy. Because arrays cannot be nested, there is no distinction between shallow copy and deep copy. Assignment and passing an array to a function can be regarded as performing a shallow or deep copy, though MATLAB does not allocate memory for a second array until one of the arrays is modified.

R:
Arrays in R behave like arrays in MATLAB.

Python:
import copy

a = [1, 2, [3, 4]]

a2 = a
a3 = list(a)
a4 = copy.deepcopy(a)
Julia:
a = Any[1, 2, [3, 4]]

a2 = a
a3 = copy(a)
a4 = deepcopy(a)
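The difference between the three Python copies above can be demonstrated by mutating the nested list:

```python
# Address copy vs shallow copy vs deep copy of a nested list:
import copy

a = [1, 2, [3, 4]]
a2 = a                 # address copy: same object
a3 = list(a)           # shallow copy: shares the nested list
a4 = copy.deepcopy(a)  # deep copy: fully independent

a[2].append(5)
print(a2[2])  # [3, 4, 5]: same object, sees the change
print(a3[2])  # [3, 4, 5]: shallow copy shares the nested list
print(a4[2])  # [3, 4]: deep copy is unaffected
```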
a = [9 7 3]
for i = 1:numel(a)
  x = a(i)
for (x in c(9, 7, 3)) {
for i in [9, 7, 3]:
for i = [9, 7, 3]
indexed iteration

R:
for (i in seq_along(a)) {
  cat(sprintf("%s at index %d\n", a[i], i))
}

Python:
a = ['do', 're', 'mi', 'fa']
for i, s in enumerate(a):
  print('%s at index %d' % (s, i))

Julia:
a = ["do", "re", "mi", "fa"]
for (i, s) in enumerate(a)
  println(i, " ", s)
end
reversea = [1 2 3]
a2 = fliplr(a)
a = fliplr(a)
a = c(1, 2, 3)
a2 = rev(a)
a = rev(a)
a = [1, 2, 3]
a2 = a[::-1]
a = [1, 2, 3]
a2 = reverse(a)
sorta = [3 1 4 2]
a = sort(a)
a = c('b', 'A', 'a', 'B')
a2 = sort(a)
a = sort(a)
a = ['b', 'A', 'a', 'B']
a = [3, 1, 4, 2]
a2 = sort(a)
dedupe

MATLAB:
a = [1 2 2 3]
a2 = unique(a)

R:
a = c(1, 2, 2, 3)
a2 = unique(a)

Python:
a = [1, 2, 2, 3]
a2 = list(set(a))

Julia:
a = unique([1, 2, 2, 3])
membership

MATLAB: ismember(7, a)

R:
7 %in% a
is.element(7, a)

Python: 7 in a

Julia:
7 in a
7 ∈ a
a ∋ 7

intersection

MATLAB: intersect([1 2], [2 3 4])
R: intersect(c(1, 2), c(2, 3, 4))
Python: {1, 2} & {2, 3, 4}

Julia:
intersect([1, 2], [2, 3, 4])
∩([1, 2], [2, 3, 4])

union

MATLAB: union([1 2], [2 3 4])
R: union(c(1, 2), c(2, 3, 4))
Python: {1, 2} | {2, 3, 4}

Julia:
union([1, 2], [2, 3, 4])
∪([1, 2], [2, 3, 4])
relative complement, symmetric difference

MATLAB:
setdiff([1 2 3], [2])
a1 = [1 2]
a2 = [2 3 4]
union(setdiff(a1, a2), setdiff(a2, a1))

R:
setdiff(c(1, 2, 3), c(2))
union(setdiff(c(1, 2), c(2, 3, 4)),
  setdiff(c(2, 3, 4), c(1, 2)))

Python:
{1, 2, 3} - {2}
{1, 2} ^ {2, 3, 4}

Julia:
setdiff([1, 2, 3], [2])
symdiff([1, 2], [2, 3, 4])
map

MATLAB: arrayfun(@(x) x * x, [1 2 3])
R: sapply(c(1, 2, 3), function(x) { x * x })

Python:
map(lambda x: x * x, [1, 2, 3])
# or use list comprehension:
[x * x for x in [1, 2, 3]]
filter

MATLAB:
a = [1 2 3]
a(a > 2)

R:
a = c(1, 2, 3)
a[a > 2]
Filter(function(x) { x > 2 }, a)

Python:
filter(lambda x: x > 1, [1, 2, 3])
# or use list comprehension:
[x for x in [1, 2, 3] if x > 1]
reduce

R: Reduce(function(x, y) { x + y }, c(1, 2, 3), 0)
Python: reduce(lambda x, y: x + y, [1, 2, 3], 0)

Julia:
reduce(+, [1, 2, 3])
foldl(-, 0, [1, 2, 3])
foldr(-, 0, [1, 2, 3])
universal and existential tests

MATLAB:
all(mod([1 2 3 4], 2) == 0)
any(mod([1 2 3 4], 2) == 0)

R:
all(c(1, 2, 3, 4) %% 2 == 0)
any(c(1, 2, 3, 4) %% 2 == 0)

Python:
all(i % 2 == 0 for i in [1, 2, 3, 4])
any(i % 2 == 0 for i in [1, 2, 3, 4])

Julia:
all([x % 2 == 0 for x in [1, 2, 3, 4]])
any([x % 2 == 0 for x in [1, 2, 3, 4]])
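A runnable check of the Python row; the vacuous cases `any([])` and `all([])` are an extra illustration beyond the table:

```python
# all() and any() accept generator expressions and short-circuit:
print(all(i % 2 == 0 for i in [2, 4, 6]))     # True
print(any(i % 2 == 0 for i in [1, 2, 3, 4]))  # True
print(all(i % 2 == 0 for i in [1, 2, 3, 4]))  # False
print(any([]))  # False: no element is true
print(all([]))  # True: no element is false
```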
shuffle and sample

R:
a = c(1, 1, 2, 3, 9, 28)
sample(a)
sample(a, 3)

Python:
from random import shuffle, sample
a = [1, 2, 3, 4]
shuffle(a)
sample(a, 2)
zip

MATLAB: none; MATLAB arrays can't be nested

R:
# R arrays can't be nested.
# One approximation of zip is a 2d array:
a = rbind(c(1, 2, 3), c('a', 'b', 'c'))
# To prevent data type coercion, use a data frame:
df = data.frame(numbers=c(1, 2, 3),
  letters=c('a', 'b', 'c'))

Python:
# array of 3 pairs:
a = zip([1, 2, 3], ['a', 'b', 'c'])
arithmetic sequences
unit difference

MATLAB: 1:100

R:
# type integer:
seq(1, 100)
# type double:
seq(1, 100, 1)

Python: range(1, 101)
Julia: 1:100

difference of 10

MATLAB: 0:10:100

R:
# type double:
seq(0, 100, 10)

Python: range(0, 101, 10)
Julia: 0:10:100

difference of 0.1

MATLAB: 0:0.1:10
R: seq(0, 10, 0.1)

Python:
[0.1 * x for x in range(0, 101)]
# 3rd arg is length of sequence, not step size:
sp.linspace(0, 10, 100)
iterate over arithmetic sequence

Python:
# range replaces xrange in Python 3:
n = 0
for i in xrange(1, 1000001):
  n += i

Julia:
n = 0
for i in 1:1000000
  n += i
end

convert arithmetic sequence to array

Python:
a = range(1, 11)
# Python 3:
a = list(range(1, 11))

Julia:
a = Array(1:10)
two dimensional arrays
element type

MATLAB: always numeric

R:
A = array(c(1, 2, 3))
# "array":
class(A)
# elements can be "boolean", "numeric", or "string"

Python:
np.array([1, 2, 3]).dtype
# possible values: np.bool, np.int64, np.float64,
# np.complex128, ...
literal

MATLAB:
[1, 2; 3, 4]
% commas optional; newlines can replace semicolons:
[1 2
 3 4]

construct from sequence

MATLAB: reshape([1 2 3 4], 2, 2)
R: array(c(1, 2, 3, 4), dim=c(2, 2))
Python: np.array([1, 2, 3, 4]).reshape(2, 2)
construct from rows

MATLAB:
row1 = [1 2 3]
row2 = [4 5 6]
A = [row1; row2]

R: rbind(c(1, 2, 3), c(4, 5, 6))

Python:
np.vstack((np.array([1, 2, 3]), np.array([4, 5, 6])))
np.array([[1, 2], [3, 4]])
construct from columns

MATLAB:
col1 = [1; 4]
col2 = [2; 5]
col3 = [3; 6]
% commas are optional:
A = [col1, col2, col3]

R: cbind(c(1, 4), c(2, 5), c(3, 6))

Python:
cols = (
  np.array([1, 4]),
  np.array([2, 5]),
  np.array([3, 6])
)
construct from subarrays

MATLAB:
A = [1 3; 2 4]
A4_by_2 = [A; A]
A2_by_4 = [A A]

R:
A = matrix(c(1, 2, 3, 4), nrow=2)
A4_by_2 = rbind(A, A)
A2_by_4 = cbind(A, A)

Python:
A = np.array([[1, 2], [3, 4]])
A2_by_4 = np.hstack([A, A])
A4_by_2 = np.vstack([A, A])
number of elements, number of dimensions, dimension lengths

MATLAB:
% length of 1st dimension (i.e. # of rows):
size(A, 1)
% length of longest dimension:
length(A)

R:
# number of rows:
nrow(A)
lookup

MATLAB:
% indices start at one:
[1 2; 3 4](1, 1)

R:
# indices start at one:
A = array(c(1, 2, 3, 4), dim=c(2, 2))
A[1, 1]

Python:
# indices start at zero:
A = np.array([[1, 2], [3, 4]])
A[0][0] or
A[0, 0]
1d lookup

MATLAB:
A = [2 4; 6 8]
% returns 8:
A(4)
% convert to column vector of length 4:
A2 = A(:)

R:
A = array(c(2, 4, 6, 8), dim=c(2, 2))
# returns 8:
A[4]

Python:
A = np.array([[2, 4], [6, 8]])
# returns np.array([6, 8]):
A[1]
# returns 8:
A[1, 1]
lookup row or columnA = [1 2 3; 4 5 6; 7 8 9]

% 2nd row:
A(2, :)

% 2nd column:
A(:, 2)
A = t(array(1:9, dim=c(3, 3)))

# 2nd row:
A[2, ]

# 2nd column:
A[, 2]
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# 2nd row:
A[1, :]

# 2nd column:
A[:, 1]
updateA = [2 4; 6 8]
A(1, 1) = 3
A = array(c(2, 4, 6, 8), dim=c(2, 2))
A[1, 1] = 3
A = np.array([[2, 4], [6, 8]])
A[0, 0] = 3
update row or columnA = [1 2; 3 4]

% [2 1; 3 4]:
A(1, :) = [2 1]

% [3 1; 2 4]:
A(:, 1) = [3 2]
A = t(array(1:4, dim=c(2, 2)))

A[1, ] = c(2, 1)
A[, 1] = c(3, 2)
A = np.array([[1, 2], [3, 4]])

A[0, :] = [2, 1]
# or
A[0] = [2, 1]

A[:, 0] = [3, 2]
update subarrayA = ones(3, 3)
A(1:2, 1:2) = 2 * ones(2, 2)
% or just:
A(1:2, 1:2) = 2
A = array(1, dim=c(3, 3))
A[1:2, 1:2] = array(2, dim=c(2, 2))
# or just:
A[1:2, 1:2] = 2
A = np.ones([3, 3])
A[0:2, 0:2] = 2 * np.ones([2, 2])
out-of-bounds behavior

MATLAB:
A = [2 4; 6 8]
% error:
x = A(3, 1)
% becomes 3x2 array with zero at (3, 2):
A(3, 1) = 9

R: lookups and updates both cause a "subscript out of bounds" error
Python: lookups and updates both raise an IndexError exception
slice subarrayA = reshape(1:16, 4, 4)'

% top left 2x2 subarray:
A(1:2, 1:2)

% bottom right 2x2 subarray:
A(end-1:end, end-1:end)

% 2x2 array containing corners:
A([1 4], [1 4])
A([1 end], [1 end])
A = t(array(1:16, dim=c(4, 4)))

# top left 2x2 subarray:
A[1:2, 1:2]

# bottom right 2x2 subarray:
A[-1:-2, -1:-2]

# 2x2 array containing corners:
A[c(1, 4), c(1, 4)]
A = np.array(range(1, 17)).reshape(4, 4)

# top left 2x2 subarray:
A[0:2, 0:2]

# bottom right 2x2 subarray:
A[2:, 2:]
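A runnable sketch of the NumPy cell (assumes NumPy is installed); `np.ix_` handles the corner selection that the MATLAB cell writes as `A([1 4], [1 4])`:

```python
# Subarray slicing on a 4x4 matrix:
import numpy as np

A = np.arange(1, 17).reshape(4, 4)
print(A[0:2, 0:2])  # top left 2x2 subarray
print(A[2:, 2:])    # bottom right 2x2 subarray
# 2x2 array containing corners; np.ix_ builds the index cross product:
print(A[np.ix_([0, 3], [0, 3])])
```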
transpose

MATLAB:
A = [1 2; 3 4]
A'

R:
A = array(c(1, 2, 3, 4), dim=c(2, 2))
t(A)

Python:
A = np.array([[1, 2], [3, 4]])
A.T
flip

MATLAB:
% [2 1; 4 3]:
fliplr([1 2; 3 4])
% [3 4; 1 2]:
flipud([1 2; 3 4])

R:
# install.packages('pracma'):
A = t(array(1:4, dim=c(2, 2)))
fliplr(A)
flipud(A)

Python:
A = np.array([[1, 2], [3, 4]])
np.fliplr(A)
np.flipud(A)

circular shift

along columns, along rows
A = [1 2; 3, 4]

% [3 4; 1 2]:
circshift(A, 1)

% [2 1; 4 3]:
circshift(A, 1, 2)

% The 2nd argument can be any integer; negative values shift
% in the opposite direction.
# install.packages('pracma'):

A = t(array(1:4, dim=c(2, 2)))

circshift(A, c(1, 0))
circshift(A, c(0, 1))
A = np.array([[1, 2], [3, 4]])

np.roll(A, 1, axis=0)
np.roll(A, 1, axis=1)
clockwise, counter-clockwise
A = [1 2; 3 4]

% [3 1; 4 2]:
rot90(A, -1)

% [2 4; 1 3]:

% set 2nd arg to 2 for 180 degree rotation
# install.packages('pracma'):

A = t(array(1:4, dim=c(2, 2)))

rot90(A, -1)
rot90(A, 2)
A = np.array([[1, 2], [3, 4]])

np.rot90(A, -1)
np.rot90(A, 2)
rows, columns

MATLAB:
M = [1 2; 3 4]
% sum each row:
cellfun(@sum, num2cell(M, 2))
% sum each column:
cellfun(@sum, num2cell(M, 1))
% sum(M, 2) and sum(M, 1) also sum rows and columns

R:
M = matrix(c(1, 2, 3, 4), nrow=2)
# sum each row:
apply(M, 1, sum)
# sum each column:
apply(M, 2, sum)

Python:
M = np.array([[1, 2], [3, 4]])
np.add.reduce(M, 1)
np.add.reduce(M, 0)
# np.add is a built-in universal function. All universal
# functions have a reduce method.
# np.sum(M, 1) and np.sum(M, 0) also sum rows and columns
three dimensional arrays
construct from sequence

MATLAB: reshape([1 2 3 4 5 6 7 8], 2, 2, 2)
R: array(c(1, 2, 3, 4, 5, 6, 7, 8), dim=c(2, 2, 2))
Python: np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(2, 2, 2)

construct from nested sequences

MATLAB: none
R: none
Python: np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
construct 3d array from 2d arraysA = [1, 2; 3, 4]
A(:,:,2) = [5, 6; 7, 8]
permute axesA = reshape([1 2 3 4 5 6 7 8], 2, 2, 2)

% swap 2nd and 3rd axes:
permute(A, [1 3 2])
A = array(1:8, dim=c(2, 2, 2))

# swap 2nd and 3rd axes:
aperm(A, perm=c(1, 3, 2))
A = np.array(range(1, 9)).reshape(2, 2, 2)

# swap 2nd and 3rd axes:
A.transpose((0, 2, 1))
flipA = reshape([1 2 3 4 5 6 7 8], 2, 2, 2)

flipdim(A, 3)
circular shiftA = reshape([1 2 3 4 5 6 7 8], 2, 2, 2)

% 3rd arg specifies axis:
circshift(A, 1, 3)
R: none

Python:
A = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(2, 2, 2)
np.roll(A, 1, axis=2)
dictionaries

literal

MATLAB:
% no literal; use constructor:
d = struct('n', 10, 'avg', 3.7, 'sd', 0.4)
% or build from two cell arrays:
d = cell2struct({10 3.7 0.4}, {'n' 'avg' 'sd'}, 2)

R:
# keys are 'n', 'avg', and 'sd':
d = list(n=10, avg=3.7, sd=0.4)
# keys are 1, 2, and 3:
d2 = list('do', 're', 'mi')

Python: d = {'n': 10, 'avg': 3.7, 'sd': 0.4}
Julia: d = Dict("n"=>10.0, "avg"=>3.7, "sd"=>0.4)
lookup

MATLAB: getfield(d, 'n')

Python:
# if 'n' is a key:
d['n']

update

MATLAB: d.var = d.sd^2
R: d$var = d$sd**2
Python: d['var'] = d['sd']**2

missing key behavior

MATLAB: error
R: NULL
Python: raises KeyError

is key present

MATLAB: isfield(d, 'var')
R: is.null(d$var)
Python: 'var' in d
Julia: haskey(d, "var")

delete entry

MATLAB: d = rmfield(d, 'sd')
R: d$sd = NULL
Python: del(d['sd'])
iteratefor i = 1:numel(fieldnames(d))
  k = fieldnames(d){i}
  v = d.(k)
for (k in names(d)) {
  v = d[[k]]
for k, v in d.iteritems():
keys and values as arrays% these return cell arrays:
unlist(d, use.names=F)
merge

MATLAB: none

R:
d1 = list(a=1, b=2)
d2 = list(b=3, c=4)
# values of first dictionary take precedence:
d3 = c(d1, d2)

Python:
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
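A standard-library sketch of merging the two Python dicts above with `update`, where values of the second dictionary take precedence (the opposite of R's `c(d1, d2)`):

```python
# Merge two dicts; keys in d2 overwrite keys in d1:
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}

d3 = dict(d1)   # copy so d1 is left unchanged
d3.update(d2)
print(d3)       # {'a': 1, 'b': 3, 'c': 4}
```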
define function

MATLAB:
function add(x, y)
  x + y

R: add = function(x, y) { x + y }

Julia:
function add(x, y)
  x + y
end
# optional syntax when body is an expression:
add(x, y) = x + y

invoke function

add(3, 7)
nested functionfunction ret1 = add3(x, y, z)
  function ret2 = add2(x, y)
    ret2 = x + y;

  ret1 = add2(x, y) + z;
add3 = function(x, y, z) {
  add2 = function(x, y) { x + y }
  add2(x, y) + z
function add3(x, y, z)
  function add2(x2, y2)
    x2 + y2
  add2(x, y) + z
missing argument behaviorraises error if code with the parameter that is missing an argument is executedraises errorraises MethodError
extra argument behavior
ignoredraises errorraises MethodError
default argumentfunction mylog(x, base=10)
  log(x) / log(base)
mylog = function(x,base=10) {
  log(x) / log(base)
variadic functionfunction s = add(varargin)
  if nargin == 0
    s = 0
    r = add(varargin{2:nargin})
    s = varargin{1} + r
add = function (...) {
  a = list(...)
  if (length(a) == 0)
  s = 0
  for(i in 1:length(a)) {
    s = s + a[[i]]
return value

MATLAB:
function ret = add(x, y)
  ret = x + y;

% If a return variable is declared, the value assigned
% to it is returned. Otherwise the value of the last
% statement will be used if it does not end with a
% semicolon.
R: return argument or last expression evaluated; NULL if return is called without an argument.
Julia: return argument or last expression evaluated; Void if return is called without an argument.
multiple return valuesfunction [x, y] = first_two(a)
  x = a(1);
  y = a(2);

% sets first to 7; second to 8:
[first, second] = first_two([7 8 9])
function first_two(a)
  a[1], a[2]

x, y = first_two([1, 2, 3])
anonymous function literal

MATLAB:
% body must be an expression:
@(x, y) x + y

R: function(x, y) { x + y }

Julia:
add = (x, y) -> x + y
add = function(x, y)
  x + y
end
invoke anonymous function

add(1, 2)
closuremake_counter = function() {
  i = 0
  function() {
    i <<- i + 1
function as value
overload operator
call operator like function

R: `+`(3, 7)
Julia: +(3, 7)
execution control
ifif (x > 0)
elseif (x < 0)
if (x > 0) {
} else if (x < 0) {
} else {
if x > 0:
elif x < 0:
if x > 0
elseif x < 0
whilei = 0
while (i < 10)
  i = i + 1
while (i < 10) {
  i = i + 1
while i < 10:
  i += 1
i = 0
while i < 10
  i += 1
forfor i = 1:10
for (i in 1:10) {
for i in range(1,11):
for i = 1:10
break and continue

MATLAB: break continue
R: break next
Python: break continue
Julia: break continue
raise exception
MATLAB: error('%s', 'failed')
R: stop('failed')
Python: raise Exception('failed')
Julia: throw("failed")
handle exceptiontry
catch err
  error=function(e) print(message(e)))
  raise Exception('failed')
except Exception as e:
file handles
standard file handles

MATLAB:
0 1 2
% Octave has predefined variables
% containing the above descriptors:
stdin stdout stderr

R: stdin() stdout() stderr()
Python: sys.stdin sys.stdout sys.stderr
Julia: STDIN STDOUT STDERR

read line from stdin

MATLAB: line = input('', 's')
R: line = readLines(n=1)
Python: line = sys.stdin.readline()
Julia: line = readline()

write line to stdout

MATLAB: fprintf(1, 'hello\n')
R: cat("hello\n")
Python: print('hello')
Julia: println("hello")
write formatted string to stdout

MATLAB: fprintf(1, '%.2f\n', pi)
R: cat(sprintf("%.2f\n", pi))

Python:
import math
print('%.2f' % math.pi)
open file for reading

MATLAB:
f = fopen('/etc/hosts')
if (f == -1)
  error('failed to open file')
end

R: f = file("/etc/hosts", "r")
Python: f = open('/etc/hosts')
Julia: f = open("/etc/hosts")

open file for writing

MATLAB:
f = fopen('/tmp/test', 'w')
if (f == -1)
  error('failed to open file')
end

R: f = file("/tmp/test", "w")
Python: f = open('/tmp/test', 'w')
Julia: f = open("/tmp/test", "w")

open file for appending

MATLAB:
f = fopen('/tmp/err.log', 'a')
if (f == -1)
  error('failed to open file')
end

R: f = file("/tmp/err.log", "a")
Python: f = open('/tmp/err.log', 'a')
Julia: f = open("/tmp/err.log", "a")

close file

MATLAB: fclose(f)
R: close(f)
Python: f.close()
Julia: close(f)
i/o errors

MATLAB: fopen returns -1; fclose raises an error
Python: raises an IOError exception

read line

MATLAB: line = fgets(f)
R: line = readLines(f, n=1)
Python: line = f.readline()
Julia: line = readline(f)

iterate over file by line

MATLAB:
while (!feof(f))
  line = fgets(f)
end

Python:
for line in f:

read file into array of strings

R: lines = readLines(f)
Python: lines = f.readlines()
Julia: lines = readlines(f)

write string

MATLAB: fputs(f, 'lorem ipsum')
R: cat("lorem ipsum", file=f)
Python: f.write('lorem ipsum')
Julia: write(f, "lorem ipsum")

write line

MATLAB: fputs(f, 'lorem ipsum\n')
R: writeLines("lorem ipsum", con=f)
Python: f.write('lorem ipsum\n')

flush file handle

MATLAB: fflush(f)
R: flush(f)
Python: f.flush()
file handle position
get, set

% 3rd arg can be SEEK_CUR or SEEK_END
fseek(f, 0, SEEK_SET)

# sets seek point to 12 bytes after start;
# origin can also be "current" or "end"

seek(f, where=0, origin="start")
redirect stdout to file

R: sink("foo.txt")
write variables to file

MATLAB:
A = [1 2; 3 4]
B = [4 3; 2 1]

save('data.mdata', 'A', 'B')
A = matrix(c(1, 3, 2, 4), nrow=2)
B = matrix(c(4, 2, 3, 1), nrow=2)

save(A, B, file='data.rdata')
A = np.matrix([[1, 2], [3, 4]])
B = np.matrix([[4, 3], [2, 1]])

# Data must be of type np.array;
# file will have .npz suffix:
np.savez('data', A=A, B=B)
read variables from file

MATLAB:
% puts A and B in scope:
load('data.mdata')
% puts just A in scope:
load('data.mdata', 'A')
# puts A and B in scope:
data = np.load('data.npz')
A = data['A']
B = data['B']
write all variables in scope to file

MATLAB: save('data.txt')
R: save.image('data.txt')
working directory
get, set



build pathname

MATLAB: fullfile('/etc', 'hosts')
R: file.path("/etc", "hosts")
Python: os.path.join('/etc', 'hosts')

dirname and basename

MATLAB: [dir, base] = fileparts('/etc/hosts')
R: dirname("/etc/hosts") basename("/etc/hosts")

absolute pathname

R: normalizePath("..")
Python: os.path.abspath('..')
iterate over directory by file% lists /etc:

% lists working directory:
# list.files() defaults to working directory
for (path in list.files('/etc')) {
for filename in os.listdir('/etc'):
glob paths

MATLAB: glob('/etc/*')
R: Sys.glob('/etc/*')

Python:
import glob
glob.glob('/etc/*')
processes and environment
command line arguments

MATLAB:
% does not include interpreter name:
argv()

R:
# first arg is name of interpreter:
commandArgs()
# arguments after --args only:
commandArgs(trailingOnly=TRUE)
environment variable

get, set

MATLAB:
getenv('PATH')
setenv('PATH', '/bin')

Python:
os.environ['PATH']
os.environ['PATH'] = '/bin'

Julia:
ENV["PATH"]
ENV["PATH"] = "/bin"
exit

MATLAB: exit(0)
R: quit(save="no", status=0)
Python: sys.exit(0)
Julia: exit(0)
external commandif (shell_cmd('ls -l /tmp'))
  error('ls failed')
if (system("ls -l /tmp")) {
  stop("ls failed")
if os.system('ls -l /tmp'):
  raise Exception('ls failed')
command substitution

Julia: s = readall(`ls`)
libraries and namespaces
load library

MATLAB: when a function is invoked, MATLAB searches the library path for a file with the same name and a .m suffix. Other functions defined in the file are not visible outside the file.

R:
# quoting the name of the package is optional:
library(foo)
# or:
require(foo)
# if the package does not exist, require returns FALSE, and library raises an error.

Python: import foo
Julia: include("foo.jl")
list loaded libraries

MATLAB: none
R: search()
Python: dir()

library search path

MATLAB: path()
source file
install package

MATLAB:
% Octave: how to install a package
% downloaded from Octave-Forge:
pkg install foo-1.0.0.tar.gz

R: install.packages("ggplot2")
Python: $ pip install scipy
load package library

Octave:
pkg load foo

Python:
import foo

list installed packages

Octave:
pkg list

Python:
$ pip freeze
data type

MATLAB: class(x)
R: class(x)
# often more informative:
attributes

R: attributes(x)

Python:
[m for m in dir(x)
  if not callable(getattr(x, m))]

methods

R: none; objects are implemented by functions which dispatch based on the type of the first arg

Python:
[m for m in dir(x)
  if callable(getattr(x, m))]
variables in scopewho()

% with size and type:

# with type and description:
undefine variable
undefine all variables

MATLAB: clear -a
R: rm(list=objects())
Python: none

eval

MATLAB: eval('1 + 1')
R: eval(parse(text='1 + 1'))
Python: eval('1 + 1')
Julia: eval(parse("1 + 1"))

function documentation

MATLAB: help tan
R: help(tan)

list library functions

MATLAB: none
R: ls("package:moments")
Python: dir(stats)
Julia: whos(Base)

search documentation

MATLAB: docsearch tan
R: ??tan
Python: $ pydoc -k tan
Julia: apropos("tan")
construct from column arrayssx = {'F' 'F' 'F' 'M' 'M' 'M'}
ht = [69 64 67 68 72 71]
wt = [148 132 142 149 167 165]
cols = {'sx', 'ht', 'wt'}
people = table(sx', ht', wt', 'VariableNames', cols)
# gender, height, weight of some people
# in inches and lbs:

sx = c("F", "F", "F", "M", "M", "M")
ht = c(69, 64, 67, 68, 72, 71)
wt = c(148, 132, 142, 149, 167, 165)
people = data.frame(sx, ht, wt)
sx = ['F', 'F', 'F', 'F', 'M', 'M']
ht = [69, 64, 67, 66, 72, 70]
wt = [150, 132, 142, 139, 167, 165]
people = pd.DataFrame({'sx': sx, 'ht': ht, 'wt': wt})
construct from row dictionariesrows = [
  {'sx': 'F', 'ht': 69, 'wt': 150},
  {'sx': 'F', 'ht': 64, 'wt': 132},
  {'sx': 'F', 'ht': 67, 'wt': 142},
  {'sx': 'F', 'ht': 66, 'wt': 139},
  {'sx': 'M', 'ht': 72, 'wt': 167},
  {'sx': 'M', 'ht': 70, 'wt': 165}]
people = pd.DataFrame(rows)

# number of rows and cols in 2-element vector:
column names as arraypeople.Properties.VariableNamesnames(people)
returns Index object:
access column as
# vectors:
# 1 column data frame:

# if name does not conflict with any DataFrame attributes:
access row as tuplepeople(1,:)# 1 row data frame:
people[1, ]
# list:
as.list(people[1, ])
access datum% height of 1st person:
# height of 1st person:
people.get_value(0, 'ht')
order rows by columnsortrows(people, 'ht')people[order(people$ht), ]people.sort(['ht'])
order rows by multiple columnssortrows(people, {'sx', 'ht'})people[order(people$sx, people$ht), ]people.sort(['sx', 'ht'])
order rows in descending ordersortrows(people, 'ht', 'descend')people[order(-people$ht), ]people.sort('ht', ascending=[False])
limit rows
people(1:3, :)people[seq(1, 3), ]people[0:3]
offset rows
people(4:6, :)people[seq(4, 6), ]people[3:]
reshapepeople$couple = c(1, 2, 3, 1, 2, 3)
reshape(people, idvar="couple", direction="wide",
  timevar="sx", v.names=c("ht", "wt"))
remove rows with null fieldssx = c('F', 'F', 'M', 'M')
wt = c(120, NA, 150, 170)

df = data.frame(sx, wt)
df2 = na.omit(df)
attach columns# put columns ht, wt, and sx
# in variable name search path:


# alternative which doesn't put columns in
# search path:

with(people, sum(ht))
detach columns
spreadsheet editorThe editor can modify the data, in which case the return value of edit must be saved:
people = edit(people)
import and export
import tab delimited# first row defines variable names:
readtable('/tmp/password.txt', 'Delimiter', '\t')

# file suffix must be .txt, .dat, or .csv
# first row defines variable names:
df = read.delim('/path/', stringsAsFactors=F, quote=NULL)
# first row defines column names:
df = pd.read_table('/path/')
import csv
% first row defines variable names:
df = readtable('/path/to.csv')
# first row defines variable names:
df = read.csv('/path/to.csv', stringsAsFactors=F)
# first row defines column names:
df = pd.read_csv('/path/to.csv')
set column separatordf = readtable('/etc/passwd',
  'Delimiter', ':',
  'ReadVariableNames', 0,
  'HeaderLines', 10)
df = read.delim('/etc/passwd',
# $ grep -v '^#' /etc/passwd > /tmp/passwd

df = pd.read_table('/tmp/passwd', sep=':', header=None)
set column separator to whitespacedf = read.delim('/path/to.txt', sep='')df = pd.read_table('/path/to.txt', sep='\s+')
set quote character# default quote character for both read.csv and read.delim
# is double quotes. The quote character is escaped by doubling it.

# use single quote as quote character:
df = read.csv('/path/to/single-quote.csv', quote="'")

# no quote character:
df = read.csv('/path/to/no-quote.csv', quote="")
Both read_table and read_csv use double quotes as the default quote character. A double quote can be escaped by doubling it.
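A small runnable check of the doubled-quote escaping described above; the CSV text is made up for illustration:

```python
import io
import pandas as pd

# a field containing a double quote is written with the quote doubled:
csv_text = 'name,n\n"say ""hi""",1\n'
df = pd.read_csv(io.StringIO(csv_text))

# the doubled quote is unescaped on read:
quoted = df['name'][0]
```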
import file w/o header# column names are V1, V2, …
# $ grep -v '^#' /etc/passwd > /tmp/passwd
# column names are X0, X1, …
df = pd.read_table('/tmp/passwd', sep=':', header=None)
set column namesdf = readtable('/path/to/no-header.csv',
  'ReadVariableNames', 0)

df.Properties.VariableNames = {'ht', 'wt', 'age'}
df = read.csv('/path/to/no-header.csv',
  col.names=c('ht', 'wt', 'age'))
df = pd.read_csv('/path/to/no-header.csv',
  names=['ht', 'wt', 'age'])
set column types# possible values: NA, 'logical', 'integer', 'numeric',
# 'complex', 'character', 'raw', 'factor', 'Date',
# 'POSIXct'
# If type is set to NA, actual type will be inferred to be
# 'logical', 'integer', 'numeric', 'complex', or 'factor'

df = read.csv('/path/to/data.csv',
  colClasses=c('integer', 'numeric', 'character'))
recognize null valuesdf = read.csv('/path/to/data.csv',
  colClasses=c('integer', 'logical', 'character'),
df = read_csv('/path/to/data.csv',
change decimal markdf = read.csv('/path/to.csv', dec=',')
recognize thousands separatornonedf = read_csv('/path/to.csv', thousands='.')
unequal row length behaviorMissing fields will be set to NA unless fill is set to FALSE. If the column is of type character then the fill value is an empty string ''.

If there are extra fields they will be parsed as an extra row unless
flush is set to FALSE
skip comment linesdf = read.delim('/etc/passwd',
skip rowsdf = readtable('/path/to/data.csv',
  'HeaderLines', 4)
df = read.csv('/path/to/data.csv', skip=4)df = read_csv('/path/to/data.csv', skiprows=4)

# rows to skip can be specified individually:
df = read_csv('/path/to/data.csv', skiprows=range(0, 4))
max rows to readdf = read.csv('/path/to/data.csv', nrows=4)df = read_csv('/path/to/data.csv', nrows=4)
index columnnonedf = pd.read_csv('/path/to.csv', index_col='key_col')

# hierarchical index:
df = pd.read_csv('/path/to.csv', index_col=['col1', 'col2'])
export tab delimitedwrite.table(df, '/tmp/', sep='\t')
export csv
# first column contains row names unless row.names
# set to FALSE

write.csv(df, '/path/to.csv', row.names=F)
relational algebra
project columns by namepeople(:, {'sx', 'ht'})people[c('sx', 'ht')]people[['sx', 'ht']]
project columns by positionpeople(:, [1 2])people[c(1, 2)]
project expression# convert to cm and kg:
transform(people, ht=2.54*ht, wt=wt/2.2)
project all columnspeople(people.ht > 66, :)people[people$ht > 66, ]
rename columnscolnames(people) = c('gender', 'height', 'weight')
access sub data frame# data frame of first 3 rows with
# ht and wt columns reversed:

people[1:3, c(1, 3, 2)]
select rowspeople(people.ht > 66, :)subset(people, ht > 66)
people[people$ht > 66, ]
people[people['ht'] > 66]
select distinct rowsunique(people(:,{'sx'}))unique(people[c('sx')])
split rows# class(x) is list:
x = split(people, people$sx == 'F')

# data.frame only containing females:
inner joinpw = read.delim('/etc/passwd',
  col.names=c('name', 'passwd', 'uid', 'gid', 'gecos',
    'home', 'shell'))

grp = read.delim('/etc/group',
  col.names=c('name', 'passwd', 'gid', 'members'))

merge(pw, grp, by.x='gid', by.y='gid')
# $ grep -v '^#' /etc/passwd > /tmp/passwd
# $ grep -v '^#' /etc/group > /tmp/group

pw = pd.read_table('/tmp/passwd', sep=':', header=None, names=['name', 'passwd', 'uid', 'gid', 'gecos', 'home', 'shell'])

grp = pd.read_table('/tmp/group', sep=':', header=None, names=['name', 'passwd', 'gid', 'members'])

pd.merge(pw, grp, left_on='gid', right_on='gid')
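A self-contained sketch of the inner join above, using small made-up frames in place of /etc/passwd and /etc/group:

```python
import pandas as pd

# hypothetical stand-ins for the passwd and group files:
pw = pd.DataFrame({'name': ['root', 'alice', 'bob'],
                   'gid': [0, 100, 200]})
grp = pd.DataFrame({'group': ['wheel', 'staff'],
                    'gid': [0, 100]})

# inner join: only gids present in both frames survive
joined = pd.merge(pw, grp, left_on='gid', right_on='gid')
```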
nulls as join values
left joinmerge(pw, grp, by.x='gid', by.y='gid', all.x=T)pd.merge(pw, grp, left_on='gid', right_on='gid', how='left')
full joinmerge(pw, grp, by.x='gid', by.y='gid', all=T)pd.merge(pw, grp, left_on='gid', right_on='gid', how='outer')
antijoinpw[!(pw$gid %in% grp$gid), ]
cross joinmerge(pw, grp, by=c())
group by columngrouped = people.groupby('sx')
multiple aggregated valuesgrouped = people.groupby('sx')
grouped.aggregate(np.max)[['ht', 'wt']]
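A runnable sketch of groupby aggregation; the people frame is repeated here with made-up values so the block is self-contained:

```python
import pandas as pd

people = pd.DataFrame({'sx': ['F', 'F', 'M', 'M'],
                       'ht': [69, 64, 72, 70],
                       'wt': [150, 132, 167, 165]})

grouped = people.groupby('sx')
# one aggregated value per group:
mean_ht = grouped['ht'].mean()
```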
group by multiple columns
aggregation functions
nulls and aggregation functions
vector literal
same as arraysame as arraysame as array
element-wise arithmetic operators+ - .* ./+ - * /+ - * /
result of vector length mismatchraises errorvalues in the shorter vector are recycled; a warning is issued if the longer length is not a multiple of the shorterraises ValueError
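A quick demonstration of the NumPy behavior in the row above: unlike R, the shorter vector is not recycled.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30, 40])

# adding vectors of incompatible lengths raises ValueError:
try:
    a + b
    raised = False
except ValueError:
    raised = True
```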
scalar multiplication3 * [1, 2, 3]
[1, 2, 3] * 3
3 * c(1, 2, 3)
c(1, 2, 3) * 3
3 * np.array([1, 2, 3])
np.array([1, 2, 3]) * 3
dot productdot([1, 1, 1], [2, 2, 2])c(1, 1, 1) %*% c(2, 2, 2)v1 = np.array([1, 1, 1])
v2 = np.array([2, 2, 2])
np.dot(v1, v2)
cross productcross([1, 0, 0], [0, 1, 0])v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
np.cross(v1, v2)
normsnorm([1, 2, 3], 1)
norm([1, 2, 3], 2)
norm([1, 2, 3], Inf)
vnorm = function(x, t) {
  norm(matrix(x, ncol=1), t)

vnorm(c(1, 2, 3), "1")
vnorm(c(1, 2, 3), "E")
vnorm(c(1, 2, 3), "I")
v = np.array([1, 2, 3])
np.linalg.norm(v, 1)
np.linalg.norm(v, 2)
np.linalg.norm(v, np.inf)
literal or constructor% row-major order:
A = [1, 2; 3, 4]
B = [4 3
     2 1]
# column-major order:
A = matrix(c(1, 3, 2, 4), 2, 2)
B = matrix(c(4, 2, 3, 1), nrow=2)

# row-major order:
A = matrix(c(1, 2, 3, 4), nrow=2, byrow=T)
# row-major order:
A = np.matrix([[1, 2], [3, 4]])
B = np.matrix([[4, 3], [2, 1]])
constant matrices
all zeros, all ones
zeros(3, 3) or zeros(3)
ones(3, 3) or ones(3)
matrix(0, 3, 3)
matrix(1, 3, 3)
np.matrix(np.ones([3, 3]))
np.matrix(np.zeros([3, 3]))
zeros(Float64, (3, 3))
ones(Float64, (3, 3))
diagonal matrices
and identity
diag([1, 2, 3])
% 3x3 identity:
diag(c(1, 2, 3))
# 3x3 identity:
np.diag([1, 2, 3])
nrows, ncols = A.shape
element access
A(1, 1)A[1, 1]A[0, 0]
row access
A(1, 1:2)A[1, ]A[0]
column access
A(1:2, 1)A[, 1]A[:, 0]
submatrix accessC = [1, 2, 3; 4, 5, 6; 7, 8, 9]
C(1:2, 1:2)
C = matrix(seq(1, 9), 3, 3, byrow=T)
C[1:2, 1:2]
A = np.matrix(range(1, 10)).reshape(3, 3)
A[:2, :2]
scalar multiplication3 * A
A * 3
3 .* A
A .* 3
3 * A
A * 3
3 * A
A * 3
element-wise operators.+ .- .* ./+ - * /+ - np.multiply() np.divide()
A * BA %*% BA * B
powerA ^ 3

% power of each entry:
A .^ 3
A ** 3
kronecker productkron(A, B)kronecker(A, B)np.kron(A, B)
comparisonall(all(A == B))
any(any(A != B))
all(A == B)
any(A != B)
np.all(A == B)
np.any(A != B)
normsnorm(A, 1)
norm(A, 2)
norm(A, Inf)
norm(A, 'fro')
norm(A, "1")
norm(A, "I")
norm(A, "F")
conjugate transposeA = [1i, 2i; 3i, 4i]
A = matrix(c(1i, 2i, 3i, 4i), nrow=2, byrow=T)
A = np.matrix([[1j, 2j], [3j, 4j]])
pseudoinverseA = [0 1; 0 0]


A = matrix(c(0, 0, 1, 0), nrow=2)
A = np.matrix([[0, 1], [0, 0]])

eigenvectors[evec, eval] = eig(A)
% each column of evec is an eigenvector
% eval is a diagonal matrix of eigenvalues
singular value decompositionX = randn(10)

[u, d, v] = svd(X)
X = matrix(rnorm(100), nrow=10)
result = svd(X)

# singular values:

# matrix of eigenvectors:

# unitary matrix:
np.linalg.svd(np.random.randn(100).reshape(10, 10))
solve system of equationsA \ [2;3]solve(A, c(2, 3))np.linalg.solve(A, [2, 3])
sparse matrices
sparse matrix construction% 100x100 matrix with 5 at (1, 1) and 4 at (2, 2):
X = sparse([1 2], [1 2], [5 4], 100, 100)
X = spMatrix(100, 100, c(1, 2), c(1, 2), c(5, 4))import scipy.sparse as sparse

row, col, val = [5, 4], [1, 2], [1, 2]
X = sparse.coo_matrix((val, (row, col)), shape=(100, 100))
sparse matrix decomposition[rows, cols, vals] = find(X)

% just the values:
sparse identity matrix% 100x100 identity:

# not square; ones on diagonal:
sparse.eye(100, 200)
dense matrix to sparse matrix
and back
X = sparse([1 0 0; 0 0 0; 0 0 0])
X2 = full(X)
import scipy.sparse as sparse

A = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
X = sparse.coo_matrix(A)
X2 = X.todense()
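The dense/sparse round trip above, written out as a runnable sketch:

```python
import numpy as np
import scipy.sparse as sparse

A = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
X = sparse.coo_matrix(A)   # dense to sparse
X2 = X.todense()           # sparse back to dense
```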
sparse matrix storage% is storage sparse:

% memory allocation in bytes:

% number of nonzero entries:
# memory allocation in bytes:
import scipy.sparse as sparse

linear minimization% minimize c * x subject to the constraint that A * x <= b

linprog(c, A, b)
linear minimization with bounds on decision variables% also impose constraints Lb <= x <= Ub

linprog(c, A, b, [], [], Lb, Ub)
linear minimization with equality constraints% also impose constraints Aeq * x == beq

linprog(c, A, b, Aeq, beq)
quadratic minimizationquadprog()
integer linear minimizationintlinprog()
exact polynomial fitx = [1 2 3 4]
y = [3 9 2 1]
% polynomial coefficient array:
p = polyfit(x, y, 3)

% plot polynomial:
xx = -10:.1:10
yy = polyval(p, xx)
plot(xx, yy)
exact polynomial fit with derivative values
piecewise polynomial fit
cubic spline
f = spline(1:20, normrnd(0, 1, 1, 20))
x = 1:.1:20
plot(x, ppval(f, x))
f = splinefun(rnorm(20))
x = seq(1, 20, .1)
plot(x, f(x), type="l")
underdetermined polynomial fit
overdetermined polynomial fit
multivariate polynomial fit
descriptive statistics
first moment statisticsx = [1 2 3 8 12 19]

x = c(1,2,3,8,12,19)

x = [1,2,3,8,12,19]

x = [1 2 3 8 12 19]

second moment statisticsstd(x, 1)
var(x, 1)
n = length(x)

sd(x) * sqrt((n-1)/n)
var(x) * (n-1)/n
second moment statistics for samplesstd(x)
n = float(len(x))

sp.std(x) * math.sqrt(n/(n-1))
sp.var(x) * n/(n-1)
skewnessOctave uses sample standard deviation to compute skewness:

kurtosisOctave uses sample standard deviation to compute kurtosis:

kurtosis(x) - 3
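SciPy's defaults match the population-moment definitions used here: stats.skew computes m3 / m2^1.5 and stats.kurtosis returns the excess kurtosis m4 / m2^2 - 3. A runnable check against raw central moments:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 8, 12, 19], dtype=float)
d = x - x.mean()
m2 = (d**2).mean()   # second central moment
m3 = (d**3).mean()
m4 = (d**4).mean()

skew = m3 / m2**1.5
excess_kurtosis = m4 / m2**2 - 3
```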
nth moment and nth central momentn = 5

moment(x, n)
moment(x, n, "c")

n = 5
moment(x, n)
moment(x, n, central=T)
n = 5

stats.moment(x, n)
modemode([1 2 2 2 3 3 4])samp = c(1,2,2,2,3,3,4)
quantile statisticsmin(x)
quantile(x, .90)
quantile(x, prob=.90)
stats.scoreatpercentile(x, 90.0)
bivariate statistics
correlation, covariance
x = [1 2 3]
y = [2 4 7]

cor(x, y)
cov(x, y)
x = c(1,2,3)
y = c(2,4,7)

cor(x, y)
cov(x, y)
x = [1,2,3]
y = [2,4,7]

stats.linregress(x, y)[2]
correlation matrixx1 = randn(100, 1)
x2 = 0.5 * x1 + randn(100, 1)
x3 = 0.1 * x1 + 0.1 * x2 + 0.1 * randn(100, 1)

corr([x1 x2 x3])
x1 = rnorm(100)
x2 = x1 + 0.5 * rnorm(100)
x3 = 0.3 * x1 + 0.1 * x2 + 0.1 * rnorm(100)

cor(cbind(x1, x2, x3))
data set to frequency tablex = c(1,2,1,1,2,5,1,2,7)
tab = table(x)
frequency table to data setrep(as.integer(names(tab)),
binx = c(1.1, 3.7, 8.9, 1.2, 1.9, 4.1)
xf = cut(x, breaks=c(0, 3, 6, 9))
# bins are (0, 3], (3, 6], and (6, 9]:
bins = tapply(x, xf, length)
density, cumulative, quantile,
sample of 10
binopdf(x, n, p)
binocdf(x, n, p)
binoinv(y, n, p)
binornd(n, p, 1, 10)
dbinom(x, n, p)
pbinom(x, n, p)
qbinom(y, n, p)
rbinom(10, n, p)
stats.binom.pmf(x, n, p)
stats.binom.cdf(x, n, p)
stats.binom.ppf(y, n, p)
stats.binom.rvs(n, p)
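A consistency check relating the binomial pmf and cdf: the pmf sums to one and the cdf is its running sum (n and p are arbitrary example values):

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3
k = np.arange(n + 1)

pmf = stats.binom.pmf(k, n, p)
cdf = stats.binom.cdf(k, n, p)
```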
density, cumulative, quantile,
sample of 10
poisspdf(x, lambda)
poisscdf(x, lambda)
poissinv(y, lambda)
poissrnd(lambda, 1, 10)
dpois(x, lambda)
ppois(x, lambda)
qpois(y, lambda)
rpois(10, lambda)
stats.poisson.pmf(x, lambda)
stats.poisson.cdf(x, lambda)
stats.poisson.ppf(y, lambda)
stats.poisson.rvs(lambda, size=1)
density, cumulative, quantile,
sample of 10
normpdf(x, mu, sigma)
normcdf(x, mu, sigma)
norminv(y, mu, sigma)
normrnd(mu, sigma, 1, 10)
dnorm(x, mu, sigma)
pnorm(x, mu, sigma)
qnorm(y, mu, sigma)
rnorm(10, mu, sigma)
stats.norm.pdf(x, mu, sigma)
stats.norm.cdf(x, mu, sigma)
stats.norm.ppf(y, mu, sigma)
stats.norm.rvs(mu, sigma)
density, cumulative, quantile,
sample of 10
gampdf(x, k, theta)
gamcdf(x, k, theta)
gaminv(y, k, theta)
gamrnd(k, theta, 1, 10)
dgamma(x, k, scale=theta)
pgamma(x, k, scale=theta)
qgamma(y, k, scale=theta)
rgamma(10, k, scale=theta)
stats.gamma.pdf(x, k, scale=theta)
stats.gamma.cdf(x, k, scale=theta)
stats.gamma.ppf(y, k, scale=theta)
stats.gamma.rvs(k, scale=theta)
density, cumulative, quantile,
sample of 10
exppdf(x, lambda)
expcdf(x, lambda)
expinv(y, lambda)
exprnd(lambda, 1, 10)
dexp(x, lambda)
pexp(x, lambda)
qexp(y, lambda)
rexp(10, lambda)
stats.expon.pdf(x, scale=1.0/lambda)
stats.expon.cdf(x, scale=1.0/lambda)
stats.expon.ppf(y, scale=1.0/lambda)
stats.expon.rvs(scale=1.0/lambda)
density, cumulative, quantile,
sample of 10
chi2pdf(x, nu)
chi2cdf(x, nu)
chi2inv(y, nu)
chi2rnd(nu, 1, 10)
dchisq(x, nu)
pchisq(x, nu)
qchisq(y, nu)
rchisq(10, nu)
stats.chi2.pdf(x, nu)
stats.chi2.cdf(x, nu)
stats.chi2.ppf(y, nu)
stats.chi2.rvs(nu)
density, cumulative, quantile,
sample of 10
betapdf(x, alpha, beta)
betacdf(x, alpha, beta)
betainv(y, alpha, beta)
betarnd(alpha, beta, 1, 10)
dbeta(x, alpha, beta)
pbeta(x, alpha, beta)
qbeta(y, alpha, beta)
rbeta(10, alpha, beta)
stats.beta.pdf(x, alpha, beta)
stats.beta.cdf(x, alpha, beta)
stats.beta.ppf(y, alpha, beta)
stats.beta.rvs(alpha, beta)
density, cumulative, quantile,
sample of 10
unifpdf(x, a, b)
unifcdf(x, a, b)
unifinv(y, a, b)
unifrnd(a, b, 1, 10)
dunif(x, a, b)
punif(x, a, b)
qunif(y, a, b)
runif(10, a, b)
stats.uniform.pdf(x, a, b)
stats.uniform.cdf(x, a, b)
stats.uniform.ppf(y, a, b)
stats.uniform.rvs(a, b)
Student's t
density, cumulative, quantile,
sample of 10
tpdf(x, nu)
tcdf(x, nu)
tinv(y, nu)
trnd(nu, 1, 10)
dt(x, nu)
pt(x, nu)
qt(y, nu)
rt(10, nu)
stats.t.pdf(x, nu)
stats.t.cdf(x, nu)
stats.t.ppf(y, nu)
stats.t.rvs(nu)
Snedecor's F
density, cumulative, quantile,
sample of 10
fpdf(x, d1, d2)
fcdf(x, d1, d2)
finv(y, d1, d2)
frnd(d1, d2, 1, 10)
df(x, d1, d2)
pf(x, d1, d2)
qf(y, d1, d2)
rf(10, d1, d2)
stats.f.pdf(x, d1, d2)
stats.f.cdf(x, d1, d2)
stats.f.ppf(y, d1, d2)
stats.f.rvs(d1, d2)
empirical density function% $ apt-get install octave-econometrics

x = (-3:.05:3)'
y = kernel_density(x, normrnd(0, 1, 100, 1))
dfunc = density(rnorm(100))

empirical cumulative distributionF is a right-continuous step function:
F = ecdf(rnorm(100))
empirical quantile functionF = ecdf(rnorm(100))
Finv = ecdf(F(seq(0, 1, .01)))
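The empirical CDF can also be sketched directly in NumPy as a right-continuous step function (a hand-rolled illustration, not an existing SciPy function):

```python
import numpy as np

def ecdf(data):
    # right-continuous step function: fraction of observations <= t
    xs = np.sort(np.asarray(data, dtype=float))
    n = len(xs)
    def F(t):
        return np.searchsorted(xs, t, side='right') / float(n)
    return F

F = ecdf([1, 2, 2, 5])
```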
linear regression
simple linear regression
coefficient, intercept, and residuals
x = [1 2 3]
y = [2 4 7]

[lsq, res] = polyfit(x, y, 1)
a = lsq(1)
b = lsq(2)
y - (a*x+b)
x = seq(10)
y = 2 * x + 1 + rnorm(10)

fit = lm(y ~ x)

# yhat = ax + b:
a = fit$coefficients[2]
b = fit$coefficients[1]

# y - (ax + b):
fit$residuals
x = np.array([1,2,3])
y = np.array([2,4,7])

lsq = stats.linregress(x, y)
a = lsq[0]
b = lsq[1]
y - (a*x+b)
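A check of the simple regression above: for x = (1, 2, 3), y = (2, 4, 7) the least-squares slope is 2.5 and the intercept is -2/3, and the residuals of an OLS fit with intercept sum to zero:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])

slope, intercept = stats.linregress(x, y)[:2]
residuals = y - (slope * x + intercept)
```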
no interceptx = seq(10)
y = 2 * x + 1 + rnorm(10)

fit = lm(y ~ x + 0)

# y = ax:
a = fit$coefficients[1]
multiple linear regressionx1 = rnorm(100)
x2 = rnorm(100)
y = 2 * x2 + rnorm(100)

fit = lm(y ~ x1 + x2)
interactionx1 = rnorm(100)
x2 = rnorm(100)
y = 2 * x1 + x2 + 3 * x1 * x2 + rnorm(100)

# x1, x2, and x1*x2 as predictors:
fit = lm(y ~ x1 * x2)

# just x1*x2 as predictor:
fit2 = lm(y ~ x1:x2)
logistic regressiony = round(runif(100))
x1 = round(runif(100))
x2 = y + rnorm(100)

fit = glm(y ~ x1 + x2, family="binomial")
statistical tests
wilcoxon signed-rank test
variable is symmetric around zero
x = unifrnd(-0.5, 0.5, 100, 1)

% null hypothesis is true:
wilcoxon_test(x, zeros(100, 1))

% alternative hypothesis is true:
wilcoxon_test(x + 1.0, zeros(100, 1))
# null hypothesis is true:
wilcox.test(runif(100) - 0.5)

# alternative hypothesis is true:
wilcox.test(runif(100) + 0.5)
kruskal-wallis rank sum test
variables have same location parameter
x = unifrnd(0, 1, 200, 1)

% null hypothesis is true:
kruskal_wallis_test(randn(100, 1), randn(200, 1))

% alternative hypothesis is true:
kruskal_wallis_test(randn(100, 1), x)
# null hypothesis is true:
kruskal.test(list(rnorm(100), rnorm(200)))

# alternative hypothesis is true:
kruskal.test(list(rnorm(100), runif(200)))
kolmogorov-smirnov test
variables have same distribution
x = randn(100, 1)
y1 = randn(100, 1)
y2 = unifrnd(-0.5, 0.5, 100, 1)

% null hypothesis is true:
kolmogorov_smirnov_test_2(x, y1)

% alternative hypothesis is true:
kolmogorov_smirnov_test_2(x, y2)
# null hypothesis is true:
ks.test(rnorm(100), rnorm(100))

# alternative hypothesis is true:
ks.test(rnorm(100), runif(100) - 0.5)
one-sample t-test
mean of normal variable with unknown variance is zero
x1 = 3 * randn(100, 1)
x2 = 3 * randn(100, 1) + 3

% null hypothesis is true:
t_test(x1, 0)

% alternative hypothesis is true:
t_test(x2, 0)
# null hypothesis is true:
t.test(rnorm(100, 0, 3))

# alternative hypothesis is true:
t.test(rnorm(100, 3, 3))
independent two-sample t-test
two normal variables have same mean
x = randn(100, 1)
y1 = randn(100, 1)
y2 = randn(100, 1) + 1.5

% null hypothesis is true:
t_test_2(x, y1)

% alternative hypothesis is true:
t_test_2(x, y2)
# null hypothesis is true:
t.test(rnorm(100), rnorm(100))

# alternative hypothesis is true:
t.test(rnorm(100), rnorm(100, 3))
one-sample binomial test
binomial variable parameter is as given
n = 100
x = rbinom(1, n, 0.5)

# null hypothesis that p=0.5 is true:
binom.test(x, n)

# alternative hypothesis is true:
binom.test(x, n, p=0.3)
two-sample binomial test
parameters of two binomial variables are equal
prop_test_2()n = 100
x1 = rbinom(1, n, 0.5)
x2 = rbinom(1, n, 0.5)

# null hypothesis that p=0.5 is true:
prop.test(c(x1, x2), c(n, n))

y = rbinom(1, n, 0.3)
# alternative hypothesis is true:
prop.test(c(x1, y), c(n, n))
chi-squared test
parameters of multinomial variable are all equal
chisquare_test_independence()fair = floor(6 * runif(100)) + 1
loaded = floor(7 * runif(100)) + 1
loaded[which(loaded > 6)] = 6

# null hypothesis is true:
chisq.test(table(fair))

# alternative hypothesis is true:
chisq.test(table(loaded))
poisson test
parameter of poisson variable is as given
# null hypothesis is true:
poisson.test(rpois(1, 100), r=100)

# alternative test is true:
poisson.test(rpois(1, 150), r=100)
F test
ratio of variance of normal variables is as given
var_test()x = rnorm(100)
y = rnorm(100, 0, sd=sqrt(3))

# null hypothesis is true:
var.test(y, x, ratio=3)

# alternative hypothesis is true:
var.test(y, x, ratio=1)
pearson product moment test
normal variables are not correlated
cor_test()x1 = rnorm(100)
x2 = rnorm(100)
y = x2 + rnorm(100)

# null hypothesis is true:
cor.test(y, x1)

# alternative hypothesis is true:
cor.test(y, x2)
shapiro-wilk test
variable has normal distribution
# null hypothesis is true:
shapiro.test(rnorm(100))

# alternative hypothesis is true:
shapiro.test(runif(100))
bartlett's test
two or more normal variables have same variance
bartlett_test()x = rnorm(100)
y1 = rnorm(100)
y2 = 0.1 * rnorm(100)

# null hypothesis is true:
bartlett.test(list(x, y1))

# alternative hypothesis is true:
bartlett.test(list(x, y2))
levene's test
two or more variables have same variance
install.packages('reshape', 'car')

x = rnorm(100)
y1 = rnorm(100)
y2 = 0.1 * rnorm(100)

# null hypothesis is true:
df = melt(data.frame(x, y1))
leveneTest(df$value, df$variable)

# alternative hypothesis is true:
df = melt(data.frame(x, y2))
leveneTest(df$value, df$variable)
one-way anova
two or more normal variables have same mean
x1 = randn(100, 1)
x2 = randn(100, 1)
x3 = randn(100, 1)
x = [x1; x2; x3]
y = [x1; x2; x3 + 0.5]
units = ones(100, 1)
grp = [units; 2 * units; 3 * units]

% null hypothesis is true:
anova(x, grp)

% alternative hypothesis is true:
anova(y, grp)

# null hypothesis that all means are the same
# is true:

x1 = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)

df = melt(data.frame(x1, x2, x3))
fit = lm(df$value ~ df$variable)
anova(fit)
time series
time series# first observation time is 1:
y = ts(rnorm(100))

# first observation time is 0:
y2 = ts(rnorm(100), start=0)

# first observation time is 0:
y = pd.Series(randn(100))

# first observation time is 1:
y2 = pd.Series(randn(100), index=range(1,101))

monthly time series# monthly observations 1993-1997:
y = ts(rnorm(60), frequency=12, start=1993)

# monthly observations from Oct 1993:
y2 = ts(rnorm(60), frequency=12, start=c(1993, 10))

dt = pd.datetime(2013, 1, 1)
idx = pd.date_range(dt, periods=60, freq='M')
y = pd.Series(randn(60), index=idx)

dt2 = pd.datetime(2013, 10, 1)
idx2 = pd.date_range(dt2, periods=60, freq='M')
y2 = pd.Series(randn(60), index=idx2)
lookup by timestart = tsp(y2)[1]
end = tsp(y2)[2]
freq = tsp(y2)[3]

# value for Jan 1994:
y2[(1994 - start) * freq + 1]
y2[pd.datetime(2014, 1, 31)]
lookup by position in seriesfor (i in 1:length(y)) {
for i in range(0, len(y)):
aligned arithmeticy = ts(rnorm(10), start=0)
y2 = ts(rnorm(10), start=5)

# time series with 5 data points:
y3 = y + y2
y = pd.Series(randn(10))
y2 = pd.Series(randn(10), index=range(5, 15))

# time series with 15 data points; 10 of
# which are NaN:

y3 = y + y2
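The index alignment described above, as a runnable sketch with constant series so the result is easy to check:

```python
import pandas as pd

y = pd.Series([1.0] * 10)                         # index 0..9
y2 = pd.Series([1.0] * 10, index=range(5, 15))    # index 5..14

# addition aligns on the index: the union has 15 entries and
# the 10 positions outside the overlap become NaN:
y3 = y + y2
```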
lag operatorx = ts(rnorm(100))
y = x + lag(x, 1)
x = pd.Series(randn(100))
y = x + x.shift(-1)
lagged difference
delta = diff(y, lag=1)delta = y.diff(1)
simple moving averageinstall.packages('TTR')

ma = SMA(y, n=4)

lines(ma, col='red')
y = pd.Series(randn(50))
ma = pd.rolling_mean(y, 4)

plot(y, 'k', ma, 'r')
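pd.rolling_mean was removed in later pandas releases; in current pandas the same moving average is spelled with the rolling method (a sketch on a tiny series):

```python
import pandas as pd

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# equivalent of pd.rolling_mean(y, 2) in current pandas:
ma = y.rolling(window=2).mean()
# the first value is NaN because its window is incomplete
```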
weighted moving averageinstall.packages('TTR')

ma = WMA(y, n=4, wts=c(1, 2, 3, 4))

lines(ma, col='red')
exponential smoothingx = rnorm(100)
fit = HoltWinters(x, alpha=0.5, beta=F, gamma=F)

values = fit$fitted
alpha = 0.5
span = (2 / alpha) - 1
fit = pd.ewma(y, span=span, adjust=False)

exponential smoothing with best least squares fitx = rnorm(100)
fit = HoltWinters(x, beta=F, gamma=F)

alpha = fit$a
decompose into seasonal and trendraw = seq(1,100) + rnorm(100) + rep(seq(1,10), 10)
y = ts(raw, frequency=10)

# additive model: t + s + r:
yd = decompose(y)


# multiplicative model: t * s * r:
yd2 = decompose(y, type="multiplicative")
correlogramx = rnorm(100)
x2 = append(x[4:100], x[1:3])

acf(x, lag.max=20)
acf(x + x2, lag.max=20)
test for stationarity
arima with automatic model selection
fast fourier transform
fftx = 3 * sin(1:100) + sin(3 * (1:100)) + randn(1, 100)

dft = fft(x)
inverse fft
shift constant component to center
two-dimensional fft
n-dimensional fft
distance matrixpts = [1 1; 1 2; 2 1; 2 3; 3 4; 4 4]

% value at (i, j) is distance between i-th
% and j-th observation

dm = squareform(pdist(pts, 'euclidean'))
distance options'euclidean'
hierarchical clusters
silhouette plot
univariate charts
5039793334_f76edece33_m.jpgvertical bar chartbar([7 3 8 5 5])cnts = c(7,3,8,5,5)
names(cnts) = c("a","b","c","d","e")

x = floor(6*runif(100))
cnts = [7,3,8,5,5]
plt.bar(range(0,len(cnts)), cnts)
horizontal bar chart
barh([7 3 8 5 5])cnts = c(7,3,8,5,5)
names(cnts) = c("a","b","c","d","e")
barplot(cnts, horiz=T)
cnts = [7,3,8,5,5]
plt.barh(range(0,len(cnts)), cnts)
pie chart
labels = {'a','b','c','d','e'}
pie([7 3 8 5 5], labels)
cnts = c(7,3,8,5,5)
names(cnts) = c("a","b","c","d","e")
cnts = [7,3,8,5,5]
labs = ['a','b','c','d','e']
plt.pie(cnts, labels=labs)
stem-and-leaf plot
nonegenerates an ascii chart:

hist(randn(1, 100), 10)hist(rnorm(100), breaks=10)plt.hist(sp.randn(100),
box plot
boxplot(randn(1, 100))

boxplot([randn(1, 100)
  exprnd(1, 1, 100)
  unifrnd(0, 1, 1, 100)]')


chart titlebar([7 3 8 5 5])
title('bar chart example')
all chart functions except for stem accept a main parameter:
  main="boxplot example",
  sub="to illustrate options")
plt.title('boxplot example')
grid of subplots15816199467_a0a0f159d0_t.jpg% 3rd arg refers to the subplot;
% subplots are numbered in row-major order.

for i = 1:4
  subplot(2, 2, i), hist(randn(50))
for (i in split.screen(c(2, 2))) {
for i in [1, 2, 3, 4]:
  plt.subplot(2, 2, i)
  plt.hist(sp.randn(100), bins=range(-5,5))
open new plot windowopen new plot
open new plot
close all plot windowsclose
save plot as pngf = figure
print(f, '-dpng', 'histogram.png')
y = randn(50)
bivariate charts
stacked bar chart
d = [7 1; 3 2; 8 1; 5 3; 5 1]
bar(d, 'stacked')
d = matrix(c(7,1,3,2,8,1,5,3,5,1),
labels = c("a","b","c","d","e")
a1 = [7,3,8,5,5]
a2 = [1,2,1,3,1]
plt.bar(range(0,5), a1, color='r')
plt.bar(range(0,5), a2, color='b', bottom=a1)
grouped bar chart
d = [7 1; 3 2; 8 1; 5 3; 5 1]
d = matrix(c(7,1,3,2,8,1,5,3,5,1),
labels = c("a","b","c","d","e")
scatter plot
plot(randn(1,50),randn(1,50),'+')plot(rnorm(50), rnorm(50))plt.scatter(sp.randn(50), sp.randn(50), marker='x')
point types'.': point
'o': circle
'x': x-mark
'+': plus
'*': star
's': square
'd': diamond
'v': triangle (down)
'^': triangle (up)
'<': triangle (left)
'>': triangle (right)
'p': pentagram
'h': hexagram
Integer values for pch parameter:

0: open square
1: open circle
2: open triangle, points up
3: cross
4: x
5: open diamond
6: open triangle, points down
15: solid square
16: solid circle
17: solid triangle, points up
18: solid diamond
marker parameter takes these string values:

'.': point
',': pixel
'o': circle
'v': triangle_down
'^': triangle_up
'<': triangle_left
'>': triangle_right
'1': tri_down
'2': tri_up
'3': tri_left
'4': tri_right
'8': octagon
's': square
'p': pentagon
'*': star
'h': hexagon1
'H': hexagon2
'+': plus
'x': x
'D': diamond
'd': thin_diamond
'|': vline
'_': hline
hexagonal binning

linear regression line
x = 0:20
y = 2 * x + rnorm(21)*10

fit = lm(y ~ x)

lines(x, fit$fitted.values, type='l')
x = range(0,20)
err = sp.randn(20)*10
y = [2*i for i in x] + err

A = np.vstack([x,np.ones(len(x))]).T
m, c = np.linalg.lstsq(A, y)[0]

plt.scatter(x, y)
plt.plot(x, [m*i + c for i in x])
polygonal line plot
plot(1:20,randn(1,20))plot(1:20, rnorm(20), type="l")plt.plot(range(0,20), sp.randn(20), '-')
line typesOptional 3rd argument to plot:

'-': solid
':': dotted
'-.': dashdot
'--': dashed
Integer or string values for lty parameter:

0: 'blank'
1: 'solid' (default)
2: 'dashed'
3: 'dotted'
4: 'dotdash'
5: 'longdash'
6: 'twodash'
Optional 3rd argument to plot:

'-': solid line
'--': dashed line
'-.': dash-dot line
':': dotted line
'.': point
',': pixel
'o': circle
'v': triangle_down
'^': triangle_up
'<': triangle_left
'>': triangle_right
'1': tri_down
'2': tri_up
'3': tri_left
'4': tri_right
's': square
'p': pentagon
'*': star
'h': hexagon1
'H': hexagon2
'+': plus
'x': x
'D': diamond
'd': thin_diamond
'|': vline
'_': hline
function plot
fplot(@sin, [-4 4])x = seq(-4, 4, .01)
plot(sin(x), type="l")
x = [i * .01 for i in range(-400, 400)]
plt.plot(x, sin(x), '-')
quantile-quantile plot
qqplot(runif(50), rnorm(50))
lines(c(-9,9), c(-9,9), col="red")
axis labelsplot(1:20, (1:20) .^ 2)
ylabel('x squared')
plot(1:20, (1:20)^2,
  xlab="x", ylab="x squared")
x = range(0, 20)
plt.plot(x, [i * i for i in x], '-')
plt.ylabel('x squared')
axis limitsplot(1:20, (1:20) .^ 2)
% [xmin, xmax, ymin, ymax]:
axis([1 20 -200 500])
plot(1:20, (1:20)^2,
  xlim=c(0, 20), ylim=c(-200,500))
x = range(0, 20)
plt.plot(x, [i * i for i in x], '-')
plt.xlim([0, 20])
plt.ylim([-200, 500])
logarithmic y-axissemilogy(x, x .^ 2,
  x, x .^ 3,
  x, x .^ 4,
  x, x .^ 5)
x = 0:20
plot(x, x^2, log="y",type="l")
lines(x, x^3, col="blue")
lines(x, x^4, col="green")
lines(x, x^5, col="red")
x = range(0, 20)

y = []
for i in [2,3,4,5]:
  y.append([j**i for j in x])

for i in [0,1,2,3]:
  plt.semilogy(x, y[i])
multivariate charts
additional line set
plot(1:20, randn(1, 20),
  1:20, randn(1, 20))

optional method:
plot(1:20, randn(1, 20))
hold on
plot(1:20, randn(1, 20))
plot(1:20, rnorm(20), type="l")
lines(1:20, rnorm(20), col="red")
colorsUse color letters by themselves for colored lines. Use '.r' for red dots.

'b': blue
'g': green
'r': red
'c': cyan
'm': magenta
'y': yellow
'k': black
'w': white
# Use the col parameter to specify the color of points
# and lines.
# The colors() function returns a list of recognized
# names for colors.

plot(rnorm(10), col='red')
plot(rnorm(10), col='#FF0000')
superimposed plots with different y-axis scalesx <- 1:10
y <- rnorm(10)
z <- rnorm(10) * 1000
par(mar = c(5, 4, 4, 4) + 0.3)
plot(x, y, type='l')
par(new=TRUE)
plot(x, z, col='red', type='l', axes=F,
  xlab='', ylab='')
axis(side=4, col='red', col.axis='red',
mtext('z', side=4, line=3, col='red')
x = (1:20)
y = x + rnorm(20)
y2 = x - 2 + rnorm(20)

plot(x, y, type="l", col="black")
lines(x, y2, type="l", col="red")
legend('topleft', c('first', 'second'),
  lty=c(1,1), lwd=c(2.5, 2.5),
  col=c('black', 'red'))
additional point set
plot(randn(20), randn(20), '.k', randn(20), randn(20), '.r')plot(rnorm(20), rnorm(20))
points(rnorm(20), rnorm(20),
stacked area chart

x = rep(0:4, each=3)
y = round(5 * runif(15))
letter = rep(LETTERS[1:3], 5)
df = data.frame(x, y, letter)

p = ggplot(df, aes(x=x, y=y,
p + geom_area(position='stack')
overlapping area chart

x = rep(0:4, each=3)
y = round(5 * runif(15))
letter = rep(LETTERS[1:3], 5)
df = data.frame(x, y, letter)
alpha = rep(I(2/10), each=15)

p = ggplot(df, aes(x=x, ymin=0, ymax=y,
    group=letter, fill=letter,
p + geom_ribbon()
3d scatter plot

scatterplot3d(rnorm(50), rnorm(50),
  rnorm(50), type="h")
bubble chart

df = data.frame(x=rnorm(20),
  y=rnorm(20), z=rnorm(20))

p = ggplot(df, aes(x=x, y=y, size=z))
p + geom_point()
scatter plot matrix
x = rnorm(20)
y = rnorm(20)
z = x + 3*y
w = y + 0.1*rnorm(20)
df = data.frame(x, y, z, w)

contour plot
m = matrix(0, 100, 100)
for (i in 2:100) {
  for (j in 2:100) {
    m[i,j] = (m[i-1,j] + m[i,j-1])/2 +
      runif(1) - 0.5
  }
}

filled.contour(1:100, 1:100, m)


version used

The version of software used to check the examples in the reference sheet.

show version

How to determine the version of an installation.

implicit prologue

Code which examples in the sheet assume to have already been executed.


The ggplot2 library must be installed and loaded to use the plotting functions qplot and ggplot.

Grammar and Invocation


How to invoke the interpreter on a script.


How to launch a command line read-eval-print loop for the language.


R installations come with a GUI REPL.

The shell zsh has a built-in command r which re-runs the last command. Shell built-ins take precedence over external commands, but one can invoke the R REPL with:

$ command r

command line program

How to pass the code to be executed to the interpreter as a command line argument.

environment variables

How to get and set an environment variable.

block delimiters

Punctuation or keywords which define blocks.


The list of keywords which define blocks is not exhaustive. Blocks are also defined by

  • switch, case, otherwise, endswitch
  • unwind_protect, unwind_protect_cleanup, end_unwind_protect
  • try, catch, end_try_catch

statement separator

How statements are separated.


Semicolons are used at the end of lines to suppress output. Output echoes the assignment performed by a statement; if the statement is not an assignment the value of the statement is assigned to the special variable ans.

In Octave, but not MATLAB, newlines are not separators when preceded by a backslash.

end-of-line comment

Character used to start a comment that goes to the end of the line.


Octave, but not MATLAB, also supports shell-style comments which start with #.

Variables and Expressions



Traditionally <- was used in R for assignment. Using an = for assignment was introduced in version 1.4.0 sometime before 2002. -> can also be used for assignment:

3 -> x

compound assignment

The compound assignment operators.


Octave, but not MATLAB, has compound assignment operators for arithmetic and bit operations:

+= -= *= /=  **=  ^=
&= |=

Octave, but not MATLAB, also has the C-style increment and decrement operators ++ and --, which can be used in prefix and postfix position.

increment and decrement operator

The operator for incrementing the value in a variable; the operator for decrementing the value in a variable.



NaN can be used for missing numerical values. Using a comparison operator on it always returns false, including NaN == NaN. Using a logical operator on NaN raises an error.


Octave, but not MATLAB, provides NA which is a synonym of NaN.


Relational operators return NA when one of the arguments is NA. In particular NA == NA is NA. When acting on values that might be NA, the logical operators observe the rules of ternary logic, treating NA as the unknown value.
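For comparison, Python's float NaN follows the same all-comparisons-false rule described above for MATLAB; a minimal stdlib-only sketch:

```python
import math

nan = float('nan')

# Any comparison with NaN is false, including NaN == NaN:
eq = (nan == nan)          # False
lt = (nan < 1.0)           # False

# The reliable test is math.isnan (np.isnan for NumPy arrays):
is_nan = math.isnan(nan)   # True
print(eq, lt, is_nan)
```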

null test

How to test if a value is null.


Octave, but not MATLAB, has isna and isnull, which are synonyms of isnan and isempty.

conditional expression

A conditional expression.

Arithmetic and Logic

true and false

The boolean literals.


true and false are functions which return matrices of ones and zeros of type logical. If no arguments are specified they return single entry matrices. If one argument is provided, a square matrix is returned. If two arguments are provided, they are the row and column dimensions.


Values which evaluate to false in a conditional test.


When used in a conditional, matrices evaluate to false unless they are nonempty and all their entries evaluate to true. Because strings are matrices of characters, an empty string ('' or "") will evaluate to false. Most other strings will evaluate to true, but it is possible to create a nonempty string which evaluates to false by inserting a null character; e.g. "false\000".


When used in a conditional, a vector evaluates to the boolean value of its first entry. Using a vector with more than one entry in a conditional results in a warning message. Using an empty vector in a conditional, c() or NULL, raises an error.
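NumPy is stricter than either language here: a sketch (assuming NumPy from the implicit prologue) showing that a multi-element array in a conditional raises an error rather than warning or picking an entry:

```python
import numpy as np

a = np.array([True, False])
caught = False
try:
    if a:                      # more than one element: ambiguous
        pass
except ValueError:
    caught = True
print(caught)                  # True

single = bool(np.array([0]))   # one-element arrays are unambiguous
print(single)                  # False
```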

logical operators

The boolean operators.


Octave, but not MATLAB, also uses the exclamation point '!' for negation.

relational operators

The relational operators.


Octave, but not MATLAB, also uses != for an inequality test.

arithmetic operators

The arithmetic operators: addition, subtraction, multiplication, division, quotient, remainder.


mod is a function and not an infix operator. The two functions differ when the arguments have opposite signs: mod returns a value with the same sign as the second argument (the divisor), whereas rem returns a value with the same sign as the first argument (the dividend). For example, mod(-7, 3) is 2 but rem(-7, 3) is -1.
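Python exposes the same pair of behaviors, which can be checked directly:

```python
import math

a = -7 % 3             # 2: % takes the sign of the divisor, like mod
b = math.fmod(-7, 3)   # -1.0: fmod takes the sign of the dividend, like rem
print(a, b)
```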

integer division

How to compute the quotient of two integers.

integer division by zero

What happens when an integer is divided by zero.

float division

How to perform float division, even if the arguments are integers.

float division by zero

What happens when a float is divided by zero.



Octave, but not MATLAB, supports ** as a synonym of ^.


The square root function.


The result of taking the square root of a negative number.

transcendental functions

The standard transcendental functions.

transcendental constants

Constants for pi and e.

float truncation

Ways of converting a float to a nearby integer.

absolute value

The absolute value and signum of a number.

integer overflow

What happens when an expression evaluates to an integer which is too big to be represented.

float overflow

What happens when an expression evaluates to a float which is too big to be represented.

float limits

The machine epsilon; the largest representable float and the smallest (i.e. closest to negative infinity) representable float.

complex construction

Literals for complex numbers.

complex decomposition

How to decompose a complex number into its real and imaginary parts; how to decompose a complex number into its absolute value and argument; how to get the complex conjugate.

random number

How to generate a random integer from a uniform distribution; how to generate a random float from a uniform distribution.

random seed

How to set, get, and restore the seed used by the random number generator.


At startup the random number generator is seeded using operating system entropy.


At startup the random number generator is seeded using the current time.


On Unix the random number generator is seeded at startup from /dev/random.
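The seed save-and-restore pattern described above looks like this in Python's stdlib (random.getstate/setstate play the role of saving and restoring the generator state):

```python
import random

random.seed(42)
a = random.random()
random.seed(42)
b = random.random()            # same seed, same value

state = random.getstate()      # save the generator state mid-stream
c = random.random()
random.setstate(state)         # restore it
d = random.random()            # replays the same value
print(a == b, c == d)
```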

bit operators

The bit operators left shift, right shift, and, or, xor, and negation.


bitshift takes a second argument which is positive for left shift and negative for right shift.

bitcmp takes a second argument which is the size in bits of the integer being operated on. Octave is not compatible with MATLAB in how the integer size is indicated.


There is a library on CRAN called bitops which provides bit operators.
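Python has all six bit operators as infix syntax; a small sketch (note that negation on Python's unbounded integers is two's complement):

```python
x = 0b1010            # 10

ls = x << 2           # 40
rs = x >> 1           # 5
band = x & 0b0110     # 2
bor = x | 0b0110      # 14
bxor = x ^ 0b0110     # 12
bneg = ~x             # -11: two's complement on unbounded ints
print(ls, rs, band, bor, bxor, bneg)
```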



The syntax for a string literal.

newline in literal

Can a newline be included in a string literal? Equivalently, can a string literal span more than one line of source code?


Double quote strings are Octave specific.

A newline can be inserted into a double quote string using the backslash escape \n.

A double quote string can be continued on the next line by ending the line with a backslash. No newline is inserted into the string.

literal escapes

Escape sequences for including special characters in string literals.


C-style backslash escapes are not recognized by string literals, but they are recognized by the IO system; the string 'foo\n' contains 5 characters, but emits 4 characters when written to standard output.


How to concatenate strings.


How to create a string which consists of a character or substring repeated a fixed number of times.

index of substring

How to get the index of first occurrence of a substring.

extract substring

How to get the substring at a given index.


Octave supports indexing string literals directly: 'hello'(1:4).


How to split a string into an array of substrings. In the original string the substrings must be separated by a character, string, or regex pattern which will not appear in the array of substrings.

The split operation can be used to extract the fields from a field delimited record of data.


Cell arrays, which are essentially tuples, are used to store variable-length strings.

A two dimensional array of characters can be used to store strings of the same length, one per row. Regular arrays cannot otherwise be used to store strings.
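As a point of comparison, splitting on a literal delimiter versus a regex pattern in Python (the field-delimited-record use case mentioned above):

```python
import re

fields = "alpha,beta,,gamma".split(',')
print(fields)                              # ['alpha', 'beta', '', 'gamma']

# re.split accepts a regex delimiter, e.g. runs of whitespace:
words = re.split(r'\s+', "do    re  mi")
print(words)                               # ['do', 're', 'mi']
```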


How to join an array of substrings into a single string. The substrings can be separated by a specified character or string.

Joining is the inverse of splitting.


How to remove whitespace from the beginning and the end of a string.

Trimming is often performed on user provided input.


How to pad the edge of a string with spaces so that it is a prescribed length.

number to string

How to convert a number to a string.

string to number

How to convert a string to number.

translate case

How to put a string into all caps. How to put a string into all lower case letters.


How to create a string using a printf style format.


How to get the number of characters in a string.

character access

How to get the character in a string at a given index.


Octave supports indexing string literals directly: 'hello'(1).

chr and ord

How to convert an ASCII code to a character; how to convert a character to its ASCII code.

Regular Expressions

character class abbreviations

The supported character class abbreviations.

A character class is a set of one or more characters. In regular expressions, an arbitrary character class can be specified by listing the characters inside square brackets. If the first character is a circumflex ^, the character class is all characters not in the list. A hyphen - can be used to list a range of characters.


The C-style backslash escapes, which can be regarded as character classes matching a single character, are a feature of the regular expression engine, not of string literals as in other languages.
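The abbreviations and bracketed classes described above work the same way in Python's re module; a short sketch with made-up sample text:

```python
import re

s = "on 2024-06-01 at 14:30"

digits = re.findall(r'\d+', s)         # \d is shorthand for [0-9]
print(digits)                          # ['2024', '06', '01', '14', '30']

words = re.findall(r'[^\d\s:-]+', s)   # [^...] negates a class
print(words)                           # ['on', 'at']
```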


The supported anchors.

The \< and \> anchors match the start and end of a word respectively.

match test

How to test whether a string matches a regular expression.

case insensitive match test

How to perform a case insensitive match test.


How to replace all substrings which match a pattern with a specified string; how to replace the first substring which matches a pattern with a specified string.

backreference in match and substitution

How to use backreferences in a regex; how to use backreferences in the replacement string of substitution.
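Both uses of backreferences can be illustrated with Python's re module:

```python
import re

# \1 in the pattern matches whatever the first group captured:
doubled = re.search(r'\b(\w+) \1\b', "the the cat") is not None
print(doubled)                                           # True

# In the replacement, \1 and \2 (or \g<1>) insert captured groups:
swapped = re.sub(r'(\w+), (\w+)', r'\2 \1', "Doe, John")
print(swapped)                                           # John Doe
```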

Date and Time

current date/time

How to get the current date and time.


Sys.time() returns a value of type POSIXct.

date/time type

The data type used to hold a combined date and time value.


The Gregorian calendar was introduced in 1582. The Proleptic Gregorian Calendar is sometimes used for earlier dates, but in the Proleptic Gregorian Calendar the year 1 CE is preceded by the year 1 BCE. The MATLAB epoch thus starts at the beginning of the year 1 BCE, but uses a zero to refer to this year.

date/time difference type

The data type used to hold the difference between two date/time types.

get date parts

How to get the year, the month as an integer from 1 through 12, and the day of the month from a date/time value.


In Octave, but not MATLAB, one can use index notation on the return value of a function:

t = now

get time parts

How to get the hour as an integer from 0 through 23, the minute, and the second from a date/time value.

build date/time from parts

How to build a date/time value from the year, month, day, hour, minute, and second as integers.

convert to string

How to convert a date value to a string using the default format for the locale.

parse datetime

How to parse a date/time value from a string in the manner of strptime from the C standard library.

format datetime

How to write a date/time value to a string in the manner of strftime from the C standard library.
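Parsing and formatting are inverse operations using the same directive set; in Python's stdlib:

```python
from datetime import datetime

# strptime parses, strftime formats, with shared %-directives:
t = datetime.strptime("2011-05-03 10:00:00", "%Y-%m-%d %H:%M:%S")
print(t.year, t.month, t.day)            # 2011 5 3

s = t.strftime("%Y-%m-%d %H:%M:%S")
print(s)                                 # 2011-05-03 10:00:00
```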



The name of the data type which implements tuples.


How to create a tuple, which we define as a fixed length, inhomogeneous list.

lookup element

How to access an element of a tuple.

update element

How to change one of a tuple's elements.


How to get the number of elements in a tuple.


This section covers one-dimensional arrays which map integers to values.

Multidimensional arrays are a generalization which map tuples of integers to values.

Vectors and matrices are one-dimensional and two-dimensional arrays respectively containing numeric values. They support additional operations including the dot product, matrix multiplication, and norms.

Here are the data types covered in each section:

                          MATLAB              R       Python
arrays                    matrix (ndims = 1)  vector  list
multidimensional arrays   matrix              array   np.array
vectors                   matrix (ndims = 1)  vector  np.array (ndim = 1)
matrices                  matrix (ndims = 2)  matrix  np.matrix

element type

How to get the type of the elements of an array.

permitted element types

Permitted data types for array elements.


Arrays in Octave can only contain numeric elements.

Array literals can have a nested structure, but Octave will flatten them. The following literals create the same array:

[ 1 2 3 [ 4 5 6] ]
[ 1 2 3 4 5 6 ]

Logical values can be put into an array because true and false are synonyms for 1 and 0. Thus the following literals create the same arrays:

[ true false false ]
[ 1 0 0 ]

If a string is encountered in an array literal, the string is treated as an array of ASCII values and it is concatenated with the other ASCII values to produce a string. The following literals all create the same string:

[ 'foo', 98, 97, 114]
[ 'foo', 'bar' ]

If the other numeric values in an array literal that includes a string are not integer values that fit into an ASCII byte, then they are converted to byte-sized values.


Array literals can have a nested structure, but R will flatten them. The following literals produce the same array of 6 elements:

c(1, 2, 3, c(4, 5, 6))
c(1, 2, 3, 4, 5, 6)
If an array literal contains a mixture of booleans and numbers, then the boolean literals will be converted to 1 (for TRUE and T) and 0 (for FALSE and F).

If an array literal contains strings and either booleans or numbers, then the booleans and numbers will be converted to their string representations. For the booleans the string representations are "TRUE" and "FALSE".


The syntax, if any, for an array literal.


The array literal


will create an array with 5 elements of class char.


The array literal


will create an array of 3 elements of class character, which is the R string type.


How to get the number of values in an array.

empty test



out-of-bounds behavior

index of element


slice to end




How to make an address copy, a shallow copy, and a deep copy of an array.

After an address copy is made, modifications to the copy also modify the original array.

After a shallow copy is made, the addition, removal, or replacement of elements in the copy does not modify the original array. However, if elements in the copy are modified, those elements are also modified in the original array.

A deep copy is a recursive copy. The original array is copied and a deep copy is performed on all elements of the array. No change to the contents of the copy will modify the contents of the original array.
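The three kinds of copy can be demonstrated with Python lists and the stdlib copy module:

```python
import copy

a = [[1, 2], [3, 4]]

b = a                  # address copy: b and a are the same object
c = list(a)            # shallow copy: new outer list, shared inner lists
d = copy.deepcopy(a)   # deep copy: fully independent

a[0][0] = 99           # visible through b and c, but not d
a.append([5, 6])       # visible through b only

print(b[0][0], c[0][0], d[0][0])   # 99 99 1
print(len(b), len(c), len(d))      # 3 2 2
```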

Arithmetic Sequences

An arithmetic sequence is a sequence of numeric values in which consecutive terms have a constant difference.

unit difference

An arithmetic sequence with a difference of 1.

difference of 10

An arithmetic sequence with a difference of 10.

difference of 0.1

An arithmetic sequence with a difference of 0.1.


How to iterate over an arithmetic sequence.

convert to array

How to convert an arithmetic sequence to an array.

Multidimensional Arrays

Multidimensional arrays are a generalization of arrays which map tuples of integers to values. All tuples in the domain of a multidimensional array have the same length; this length is the dimension of the array.

The multidimensional arrays described in this sheet are homogeneous, meaning that the values are all of the same type. This restriction allows the implementation to store the values of the multidimensional array in a contiguous region of memory without the use of references or pointers.

Multidimensional arrays should be contrasted with nested arrays. When arrays are nested, the innermost nested arrays contain the values and the other arrays contain references to arrays. The syntax for looking up a value is usually different:

# nested:
a[1][2]

# multidimensional:
a[1, 2]
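The lookup-syntax contrast can be made concrete in Python, where nested lists and NumPy arrays coexist (assuming NumPy from the implicit prologue):

```python
import numpy as np

nested = [[10, 20, 30], [40, 50, 60]]   # list of references to lists
a = np.array(nested)                    # contiguous 2d array

v1 = nested[1][2]     # one index per bracket
v2 = a[1, 2]          # a single tuple of indices
print(v1, v2)         # 60 60
print(a.shape)        # (2, 3)
```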

element type

How to get the type of the values stored in a multidimensional array.




construct from sequence—2d

construct from sequence—3d

construct from nested sequences—2d

construct from nested sequences—3d

construct from rows—2d

construct from columns—2d

construct from subarrays—2d

construct 3d array from 2d arrays




1d lookup on 2d array


out-of-bounds behavior


slice to end

slice subarray


permute axes



circular shift—2d


apply function element-wise

apply function to linear subarrays



The syntax for a dictionary literal.


How to get the number of keys in a dictionary.


How to use a key to lookup a value in a dictionary.


How to add a key-value pair or change the value for an existing key.

missing key behavior

What happens when looking up a key that isn't in the dictionary.


How to delete a key-value pair from a dictionary.


How to iterate over the key-value pairs.

keys and values as arrays

How to get an array containing the keys; how to get an array containing the values.


How to merge two dictionaries.
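A merge must decide which value wins on key collision; in Python, update gives precedence to the second dictionary:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}

merged = dict(d1)      # start from a copy so d1 is untouched
merged.update(d2)      # on key collision the value from d2 wins

print(merged)          # {'a': 1, 'b': 20, 'c': 30}
```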


Tables are a data type which correspond to the tables of relational databases. In R this data type is called a data frame. The Python library Pandas provides a table data type called DataFrame.

A table is an array of tuples, each of the same length and type. If the type of the first element of the first tuple is integer, then all the tuples in the table must have first elements which are integers. The type of the tuples corresponds to the schema of a relational database table.

A table can also be viewed as a set of named columns of equal length, one column per attribute.

Pandas types: Series(), DataFrame(), Index()

construct from column arrays

How to construct a data frame from a set of arrays representing the columns.


Octave does not have the table data type.


How to get the number of columns and number of rows in a table.

construct from row tuples

column names as array

How to show the names of the columns.

access column as array

How to access a column in a data frame.

access row as tuple

How to access a row in a data frame.


people[1, ] returns the 1st row from the data frame people as a new data frame with one row. This can be converted to a list using the function as.list. There is often no need to convert, because lists and one-row data frames have nearly the same behavior.

access datum

How to access a single datum in a data frame; i.e. the value in a column of a single row.

order rows by column

How to sort the rows in a data frame according to the values in a specified column.

order rows by multiple columns

order rows in descending order

How to sort the rows in descending order according to the values in a specified column.

limit rows

How to select the first n rows according to some ordering.

offset rows

How to select rows starting at offset n according to some ordering.

attach columns

How to make column name a variable in the current scope which refers to the column as an array.


Each column of the data frame is copied into a variable which is named after the column and contains the column as a vector. Modifying the data in the variable does not alter the original data frame.

detach columns

How to remove attached column names from the current scope.

spreadsheet editor

How to view and edit the data frame in a spreadsheet.

Import and Export

import tab delimited file

Load a data frame from a tab delimited file.


By default strings are converted to factors. In older versions of R, this could reduce the amount of memory required to load the data frame; this is no longer true in newer versions.

import comma-separated values file

Load a data frame from a CSV file.

set column separator

How to set the column separator when importing a delimited file.

set quote character

How to change the quote character. Quoting is used when strings contain the column separator or the line terminator.

import file w/o header

How to import a file that lacks a header.

set column names

How to set the column names.

set column types

How to indicate the type of the columns.


If the column types are not set or if the type is set to NA or NULL, then the type will be set to logical, integer, numeric, complex, or factor.

recognize null values

Specify the input values which should be converted to null values.

unequal row length behavior

What happens when a row of input has fewer than or more than the expected number of columns.

skip comment lines

How to skip comment lines.

skip rows

maximum rows to read

index column

export tab delimited file

export comma-separated values file

Save a data frame to a CSV file.


If row.names is not set to F, the initial column will be the row number as a string starting from "1".

Relational Algebra

map data frame

How to apply a mapping transformation to the rows of a data set.

filter data set

How to select the rows of a data set that satisfy a predicate.



define function

How to define a function.

invoke function

How to invoke a function.

nested function

missing argument behavior

What happens when a function is invoked with too few arguments.

extra argument behavior

What happens when a function is invoked with too many arguments.

default argument

How to assign a default argument to a parameter.

variadic function

How to define a function which accepts a variable number of arguments.

return value

How the return value of a function is determined.

multiple return values

How to return multiple values from a function.

anonymous function literal

The syntax for an anonymous function.

invoke anonymous function


function as value

How to store a function in a variable.

Execution Control


How to write a branch statement.


How to write a conditional loop.


How to write a C-style for statement.


How to break out of a loop. How to jump to the next iteration of a loop.

raise exception

How to raise an exception.

handle exception

How to handle an exception.

File Handles

standard file handles

Standard input, standard output, and standard error.

read line from stdin

write line to stdout

How to write a line to stdout.


The backslash escape sequence \n is stored as two characters in the string and interpreted as a newline by the IO system.

write formatted string to stdout

open file for reading

open file for writing

open file for appending

close file

i/o errors

read line

iterate over file by line

read file into array of strings

write string

write line

flush file handle

file handle position

redirect stdout to file


working directory

How to get and set the working directory.

Processes and Environment

command line arguments

How to get the command line arguments.

environment variables

How to get and set an environment variable.

Libraries and Namespaces

load library

How to load a library.

list loaded libraries

Show the list of libraries which have been loaded.

library search path

The list of directories the interpreter will search looking for a library to load.

source file

How to source a file.


When sourcing a file, the suffix, if any, must be specified, unlike when loading a library. Also, a library may contain a shared object, but a sourced file must consist of just R source code.

install package

How to install a package.

list installed packages

How to list the packages which have been installed.


data type

How to get the data type of a value.


For vectors, class returns the mode of the vector, which is the type of data contained in it. The possible modes are

  • numeric
  • complex
  • logical
  • character
  • raw

Some of the more common class types for non-vector entities are:

  • matrix
  • array
  • list
  • factor
  • data.frame


How to get the attributes for an object.


Arrays and vectors do not have attributes.


How to get the methods for an object.

variables in scope

How to list the variables in scope.

undefine variable

How to undefine a variable.

undefine all variables

How to undefine all variables.


How to interpret a string as source code and execute it.

function documentation

How to get the documentation for a function.

list library functions

How to list the functions and other definitions in a library.

search documentation

How to search the documentation by keyword.


A vector is a one dimensional array which supports these operations:

  • addition on vectors of the same length
  • scalar multiplication
  • a dot product
  • a norm

The languages in this reference sheet provide the above operations for all one dimensional arrays which contain numeric values.

vector literal

element-wise arithmetic operators

scalar multiplication

dot product

cross product



The norm function returns the p-norm, where the second argument is p. If no second argument is provided, the 2-norm is returned.


literal or constructor

Literal syntax or constructor for creating a matrix.

The elements of a matrix must be specified in a linear order. If the elements of each row of the matrix are adjacent to other elements of the same row in the linear order we say the order is row-major. If the elements of each column are adjacent to other elements of the same column we say the order is column-major.


Square brackets are used for matrix literals. Semicolons are used to separate rows, and commas separate row elements. Optionally, newlines can be used to separate rows and whitespace to separate row elements.


Matrices are created by passing a vector containing all of the elements, as well as the number of rows and columns, to the matrix constructor.

If there are not enough elements in the data vector, the values will be recycled. If there are too many, the extra values will be ignored. However, the number of elements in the data vector must be a factor or a multiple of the number of elements in the final matrix or an error results.

When consuming the elements in the data vector, R will normally fill by column. To change this behavior pass a byrow=T argument to the matrix constructor:

A = matrix(c(1,2,3,4),nrow=2,byrow=T)
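NumPy makes the same row-major versus column-major choice explicit through the order parameter; a sketch of both fill orders:

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6]

by_row = np.array(data).reshape(2, 3)             # C order: fill by row
by_col = np.array(data).reshape(2, 3, order='F')  # Fortran order: by column,
                                                  # matching R's default
print(by_row[0].tolist())   # [1, 2, 3]
print(by_col[0].tolist())   # [1, 3, 5]
```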

constant matrices

How to create matrices with zeros for entries or with ones for entries.

diagonal matrices

How to create diagonal matrices including the identity matrix.

A matrix is diagonal if and only if aij = 0 for all i ≠ j.


How to get the dimensions of a matrix.

element access

How to access an element of a matrix. All languages described here follow the convention from mathematics of specifying the row index before the column index.


Rows and columns are indexed from one.


Rows and columns are indexed from one.

row access

How to access a row.

column access

How to access a column.

submatrix access

How to access a submatrix.

scalar multiplication

How to multiply a matrix by a scalar.

element-wise operators

Operators which act on two identically sized matrices element by element. Note that element-wise multiplication of two matrices is used less frequently in mathematics than matrix multiplication.

from numpy import array
matrix(array(A) * array(B))
matrix(array(A) / array(B))


How to multiply matrices. Matrix multiplication should not be confused with element-wise multiplication of matrices. Matrix multiplication is non-commutative and only requires that the number of columns of the matrix on the left match the number of rows of the matrix on the right. Element-wise multiplication, by contrast, is commutative and requires that the dimensions of the two matrices be equal.

kronecker product

The Kronecker product is a non-commutative operation defined on any two matrices. If A is m x n and B is p x q, then the Kronecker product is a matrix with dimensions mp x nq.
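The dimension rule can be checked with np.kron: each entry a_ij of A is replaced by the block a_ij * B, so the result has dimensions mp x nq.

```python
import numpy as np

A = np.array([[1, 2]])           # 1 x 2
B = np.array([[0, 1],
              [1, 0]])           # 2 x 2

K = np.kron(A, B)                # each a_ij becomes the block a_ij * B
print(K.shape)                   # (2, 4) = (1*2, 2*2)
print(K.tolist())                # [[0, 1, 0, 2], [1, 0, 2, 0]]
```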


How to test two matrices for equality.


== and != perform entry-wise comparison. The result of using either operator on two matrices is a matrix of boolean values.

~= is a synonym for !=.


== and != perform entry-wise comparison. The result of using either operator on two matrices is a matrix of boolean values.


How to compute the 1-norm, the 2-norm, the infinity norm, and the Frobenius norm.


norm(A) is the same as norm(A,2).

Sparse Matrices

sparse matrix construction

How to construct a sparse matrix using coordinate format.

Coordinate format specifies a matrix with three arrays: the row indices, the column indices, and the values.

sparse matrix decomposition

sparse identity matrix

dense matrix to sparse matrix

sparse matrix storage


A linear program is an optimization problem in which one tries to minimize or maximize a linear function called the objective function subject to linear inequality constraints.

The unknowns in the objective function and the constraints are called the decision variables.

linear minimization

How to solve a linear program where the objective function is minimized.

If we have a maximization problem c * x, then we can find the solution by minimizing -c * x; that is, replacing c with -c.


linprog is part of the Optimization toolbox.

The function linprog minimizes c * x subject to the constraint that A * x <= b. A * x and b are vectors, and when we say that A * x <= b, we mean that each component of A * x must be less than or equal to the corresponding component of b.

When formulating constraints in the form A * x <= b, we may find that we have some greater-than constraints of the form: ai1 x1 + … + ain xn >= bi. We convert each such constraint to a less-than constraint by multiplying it by -1. We use the values -ai1, …, -ain in A and -bi in b.
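The same minimize-c-times-x, A-times-x-less-than-b formulation is used by SciPy; a sketch assuming scipy.optimize.linprog is available, with the maximization-to-minimization trick from above:

```python
from scipy.optimize import linprog

# maximize x + y subject to x + y <= 1, x >= 0, y >= 0;
# maximization is expressed as minimizing -(x + y):
c = [-1, -1]
A_ub = [[1, 1]]     # one less-than constraint: 1*x + 1*y <= 1
b_ub = [1]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.fun)      # -1.0, so the maximum of x + y is 1
```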

linear minimization with bounds on decision variables

How to solve a linear program where the objective function is minimized and with extra constraints on the decision variables of the form Li <= xi <= Ui.

linear minimization with equality constraints

quadratic minimization

integer linear minimization


exact polynomial fit

cubic spline

How to connect the dots of a data set with a line which has a continuous 2nd derivative.

Descriptive Statistics

A statistic is a single number which summarizes a population of data. The most familiar example is the mean or average. Statistics defined for discrete populations can often be meaningfully extended to continuous distributions by replacing summations with integration.

An important class of statistics are the nth moments. The nth moment $$\mu'_n$$ of a population of k values xi with mean μ is:

$$\begin{align} \mu'_n = \frac{1}{k} \sum_{i=1}^k x_i^n \end{align}$$

The nth central moment μn of the same population is:

$$\begin{align} \mu_n = \frac{1}{k} \sum_{i=1}^k (x_i - \mu)^n \end{align}$$

first moment statistics

The sum and the mean.

The mean is the first moment. It is one definition of the center of the population. The median and the mode are also used to define the center. In most populations they will be close to but not identical to the mean.

second moment statistics

The variance and the standard deviation. The variance is the second central moment. It is a measure of the spread or width of the population.

The standard deviation is the square root of the variance. It is also a measurement of population spread. The standard deviation has the same units of measurement as the data in the population.

second moment statistics for samples

The sample variance and sample standard deviation.


The skewness of a population.

The skewness measures the asymmetry of the population. The skewness will be negative, positive, or zero when the population is more spread out on the left, more spread out on the right, or similarly spread out on both sides, respectively.

The skewness can be calculated from the third moment and the standard deviation:

$$\begin{align} \gamma_1 = E\Big[\Big(\frac{x - \mu}{\sigma}\Big)^3\Big] = \frac{\mu_3}{\sigma^3} \end{align}$$

When estimating the population skewness from a sample a correction factor is often used, yielding the sample skewness:

$$\begin{align} \frac{(n(n-1))^{\frac{1}{2}}}{n-2} \gamma_1 \end{align}$$

octave and matlab:

Octave uses the sample standard deviation to compute skewness. This behavior is different from Matlab and should possibly be regarded as a bug.

Matlab, but not Octave, will take a flag as a second parameter. When set to zero Matlab returns the sample skewness:

skewness(x, 0)


numpy:

Set the named parameter bias to False to get the sample skewness:

stats.skew(x, bias=False)


kurtosis

The kurtosis of a population.

The formula for kurtosis is:

$$\begin{align} \gamma_2 = \frac{\mu_4}{\sigma^4} - 3 \end{align}$$

When kurtosis is negative the sides of a distribution tend to be more convex than when the kurtosis is positive. A negative kurtosis distribution tends to have a wide, flat peak and narrow tails; such a distribution is called platykurtic. A positive kurtosis distribution tends to have a narrow, sharp peak and long tails; such a distribution is called leptokurtic.

The fourth standardized moment is

$$\begin{align} \beta_2 = \frac{\mu_4}{\sigma^4} \end{align}$$

The fourth standardized moment is sometimes taken as the definition of kurtosis in older literature. The reason the modern definition is preferred is because it assigns the normal distribution a kurtosis of zero.
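A SciPy sketch of the two conventions; the fisher flag selects between the modern (excess) definition and the older fourth standardized moment:

```python
import numpy as np
import scipy.stats as stats

np.random.seed(0)
x = np.random.randn(100000)

excess = stats.kurtosis(x, fisher=True)      # modern definition; near 0 for normal data
classical = stats.kurtosis(x, fisher=False)  # fourth standardized moment; near 3
```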


octave:

Octave uses the sample standard deviation when computing kurtosis. This should probably be regarded as a bug.


r:

R uses the older fourth standardized moment definition of kurtosis.

nth moment and nth central moment

How to compute the nth moment (also called the nth raw moment) and the nth central moment for arbitrary n.
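A NumPy/SciPy sketch; stats.moment computes central moments, while moments about the origin can be computed directly:

```python
import numpy as np
import scipy.stats as stats

x = np.array([1.0, 2.0, 3.0, 4.0])
n = 3

moment3 = np.mean(x**n)               # nth moment about the origin -> 25.0
central3 = stats.moment(x, moment=n)  # nth central moment -> 0.0 for this symmetric sample
```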


mode

The mode is the most common value in the sample.

The mode is a measure of central tendency like the mean and the median. A problem with the mean is that it can produce values not found in the data. For example the mean number of persons in an American household was 2.6 in 2009.

The mode might not be unique. If there are two modes the sample is said to be bimodal, and in general if there is more than one mode the sample is said to be multimodal.

quantile statistics

If the data is sorted from smallest to largest, the minimum is the first value, the median is the middle value, and the maximum is the last value. If there are an even number of data points, the median is the average of the two middle points. The median divides the population into two halves.

When the population is divided into four parts the division markers are called the first, second, and third quartiles. The interquartile range (IQR) is the difference between the 3rd and 1st quartiles.

When the population is divided into ten parts the division markers are called deciles.

When the population is divided into a hundred parts the division markers are called percentiles.

If the population is divided into n parts the markers are called the 1st, 2nd, …, (n-1)th n-quantiles.
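A NumPy sketch of the quantile statistics described above, using the default linear interpolation between data points:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

med = np.median(x)                   # 4.5: average of the two middle points
q1, q3 = np.percentile(x, [25, 75])  # 1st and 3rd quartiles
iqr = q3 - q1                        # interquartile range
```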

bivariate statistics

The correlation and the covariance.

The correlation is a number from -1 to 1. It is a measure of the linearity of the data, with values of -1 and 1 indicating a perfectly linear relationship. When the correlation is positive the quantities tend to increase together; when the correlation is negative one quantity tends to increase as the other decreases.

A variable can be completely dependent on another and yet the two variables can have zero correlation. This happens for Y = X² when X is uniform on the interval [-1, 1]. Anscombe's quartet gives four examples of data sets, each with the same fairly high correlation of 0.816, which nevertheless show significant qualitative differences when plotted.

The covariance is defined by

$$\begin{align} E[(X - \mu_X)(Y - \mu_Y)] \end{align}$$

The correlation is the normalized version of the covariance. It is defined by

$$\begin{align} \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \end{align}$$
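A NumPy sketch; both functions return 2x2 matrices for a pair of variables:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

cov = np.cov(x, y)[0, 1]       # sample covariance, divides by n - 1
corr = np.corrcoef(x, y)[0, 1] # correlation; y = 2x so this is 1.0
```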

correlation matrix

data set to frequency table

How to compute the frequency table for a data set. A frequency table counts how often each value occurs in the data set.
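A Python sketch using pandas, consistent with the pd import in the sheet's prologue:

```python
import pandas as pd

x = ["a", "b", "a", "c", "a", "b"]
freq = pd.Series(x).value_counts()
# a: 3, b: 2, c: 1
```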


The table function returns an object of type table.

frequency table to data set

How to convert a frequency table back into the original data set.

The order of the original data set is not preserved.
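A NumPy sketch: np.repeat expands each value according to its count, reconstructing a data set in sorted rather than original order:

```python
import numpy as np

values = np.array(["a", "b", "c"])
counts = np.array([3, 2, 1])
data = np.repeat(values, counts)
# -> ['a' 'a' 'a' 'b' 'b' 'c']
```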


How to bin a data set. The result is a frequency table where each frequency represents the number of samples from the data set for an interval.


The cut function returns a factor.

A labels parameter can be provided with a vector argument to assign the bins names. Otherwise bin names are constructed from the breaks using "[0.0,1.0)" style notation.

The hist function can be used to bin a data set:

x = c(1.1, 3.7, 8.9, 1.2, 1.9, 4.1)
hist(x, breaks=c(0, 3, 6, 9), plot=FALSE)

hist returns an object of type histogram. The counts are in the $counts attribute.
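With NumPy, np.histogram bins a data set without plotting, analogous to R's hist with plot=FALSE:

```python
import numpy as np

x = np.array([1.1, 3.7, 8.9, 1.2, 1.9, 4.1])
counts, edges = np.histogram(x, bins=[0, 3, 6, 9])
# counts -> [3 2 1]: three values in [0,3), two in [3,6), one in [6,9]
```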


Distributions

A distribution density function f(x) is a non-negative function which, when integrated over its entire domain, equals one. The distributions described in this sheet have as their domain the real numbers. The support of a distribution is the part of the domain on which the density function is non-zero.

A distribution density function can be used to describe the values one is likely to see when drawing an example from a population. Values in areas where the density function is large are more likely than values in areas where the density function is small. Values where the density function is zero do not occur. Thus it can be useful to plot the density function.

To derive probabilities from a density function one must integrate or use the associated cumulative distribution function

$$\begin{align} F(x) = \int_{-\infty}^x f(t) dt \end{align}$$

which gives the probability of seeing a value less than or equal to x. As probabilities are non-negative and no greater than one, F is a function from (-∞, ∞) to [0, 1]. The inverse of F is called the inverse cumulative distribution function or the quantile function for the distribution.

For each distribution statistical software will generally provide four functions: the density, the cumulative distribution, the quantile, and a function which returns random numbers in frequencies that match the distribution. If the software does not provide a random number generating function for the distribution, the quantile function can be composed with the built-in random number generator that most languages have as long as it returns uniformly distributed floats from the interval [0, 1].

probability density
probability mass
cumulative density
cumulative distribution
inverse cumulative density
inverse cumulative distribution
percent point
random variate
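A SciPy sketch of the four functions for the normal distribution, plus the inverse-transform technique described above:

```python
import numpy as np
import scipy.stats as stats

d = stats.norm(0, 1)
density = d.pdf(0.0)   # probability density
prob = d.cdf(0.0)      # cumulative distribution -> 0.5
quantile = d.ppf(0.5)  # inverse cdf / percent point -> 0.0
draws = d.rvs(5)       # 5 random variates

# inverse-transform sampling: compose the quantile function with a
# uniform random number generator when no rvs function is available
u = np.random.random(5)
samples = stats.expon.ppf(u)
```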

Discrete distributions such as the binomial and the Poisson do not have density functions in the normal sense. Instead they have probability mass functions which assign probabilities, summing to one, to the integers. In R, warnings are given if non-integer values are passed to the mass functions dbinom and dpois.

The cumulative distribution function of a discrete distribution can still be defined on the reals. Such a function is constant except at the integers where it may have jump discontinuities.

Most well known distributions are in fact parametrized families of distributions. This table lists some of them with their parameters and properties.

The information entropy of a continuous distribution with density f(x) is defined as:

$$\begin{align} -\int_\mathbb{R} f(x) \; \log(f(x)) \; dx \end{align}$$

In Bayesian analysis the distribution with the greatest entropy, subject to the known facts about the distribution, is called the maximum entropy probability distribution. It is considered the best distribution for modeling the current state of knowledge.
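SciPy distribution objects expose the differential entropy directly; for the standard normal it is (1/2) log(2πe) ≈ 1.419:

```python
import numpy as np
import scipy.stats as stats

# differential entropy of the standard normal distribution
h = stats.norm(0, 1).entropy()  # (1/2) * log(2 * pi * e)
```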


The probability mass, cumulative distribution, quantile, and random number generating functions for the binomial distribution.

The binomial distribution is a discrete distribution. It models the number of successful trials when n is the number of trials and p is the chance of success for each trial. An example is the number of heads when flipping a coin 100 times. If the coin is fair then p is 0.50.
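A SciPy sketch of the four functions for the coin-flipping example:

```python
import scipy.stats as stats

n, p = 100, 0.5
pmf50 = stats.binom.pmf(50, n, p)    # P(exactly 50 heads) ~= 0.0796
cdf50 = stats.binom.cdf(50, n, p)    # P(at most 50 heads)
median = stats.binom.ppf(0.5, n, p)  # quantile -> 50.0
draw = stats.binom.rvs(n, p)         # one random draw
```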


Random numbers in a binomial distribution can also be generated with:

np.random.binomial(n, p)


The probability mass, cumulative distribution, quantile, and random number generating functions for the Poisson distribution.

The Poisson distribution is a discrete distribution. It is described by a parameter lam which is the mean value for the distribution. The Poisson distribution is used to model events which happen at a specified average rate and independently of each other. Under these circumstances the time between successive events will be described by an exponential distribution and the events are said to be described by a Poisson process.


Random numbers in a Poisson distribution can also be generated with:

np.random.poisson(lam, size=1)


The probability density, cumulative distribution, quantile, and random number generating functions for the normal distribution.

The parameters are the mean μ and the standard deviation σ. The standard normal distribution has μ of 0 and σ of 1.

The normal distribution is the maximum entropy distribution for a given mean and variance. According to the central limit theorem, if {X1, …, Xn} are any independent and identically distributed random variables with mean μ and variance σ², then Sn := Σ Xi / n converges to a normal distribution with mean μ and variance σ²/n.
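A numerical check of the central limit theorem: means of samples drawn from the (non-normal) uniform distribution cluster around μ with spread σ/√n:

```python
import numpy as np

# 1000 samples of size 100 from uniform [0, 1]: mu = 0.5, sigma^2 = 1/12
np.random.seed(0)
means = np.random.random((1000, 100)).mean(axis=1)

center = means.mean()  # close to 0.5
spread = means.std()   # close to sqrt(1/12) / 10 ~= 0.0289
```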


Random numbers in a normal distribution can also be generated with:

np.random.normal(mu, sigma)


The probability density, cumulative distribution, quantile, and random number generating functions for the gamma distribution.

The parameter k is called the shape parameter and θ is called the scale parameter. The rate of the distribution is β = 1/θ.

If Xi are n independent random variables with Γ(ki, θ) distribution, then Σ Xi has distribution Γ(Σ ki, θ).

If X has Γ(k, θ) distribution, then αX has Γ(k, αθ) distribution.


The probability density, cumulative distribution, quantile, and random number generating functions for the exponential distribution.


The probability density, cumulative distribution, quantile, and random number generating functions for the chi-squared distribution.


The probability density, cumulative distribution, quantile, and random number generating functions for the beta distribution.


The probability density, cumulative distribution, quantile, and random number generating functions for the uniform distribution.

The uniform distribution is described by the parameters a and b which delimit the interval on which the density function is nonzero.

The uniform distribution is the maximum entropy probability distribution with support [a, b].

Consider the uniform distribution on [0, b]. Suppose that we take k samples from it, and m is the largest of the samples. The minimum variance unbiased estimator for b is

$$\begin{align} \frac{k+1}{k}m \end{align}$$
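A NumPy sketch of the estimator, assuming the true upper bound b = 10 for illustration:

```python
import numpy as np

np.random.seed(0)
b = 10.0
x = np.random.uniform(0.0, b, 50)

k, m = len(x), x.max()
estimate = (k + 1) / float(k) * m  # minimum variance unbiased estimate of b
```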

octave, r, numpy:

a and b are optional parameters and default to 0 and 1 respectively.

Student's t

The probability density, cumulative distribution, quantile, and random number generating functions for Student's t distribution.

Snedecor's F

The probability density, cumulative distribution, quantile, and random number generating functions for Snedecor's F distribution.

empirical density function

How to construct a density function from a sample.

empirical cumulative distribution

empirical quantile function

Linear Regression

simple linear regression

How to get the slope a and intercept b for a line which best approximates the data. How to get the residuals.

If there are more than two data points, then the system is overdetermined and in general there is no exact solution for the slope and the intercept. Linear regression looks for a line that fits the points as well as possible. The least squares solution is the line that minimizes the sum of the squares of the vertical distances of the points from the line.

The residuals are the difference between the actual values of y and the calculated values using ax + b. The norm of the residuals can be used as a measure of the goodness of fit.
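A NumPy sketch using np.polyfit with degree 1; the residuals and their norm follow directly from the fitted coefficients:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.0])

a, b = np.polyfit(x, y, 1)   # slope a, intercept b
residuals = y - (a * x + b)
norm_of_residuals = np.sqrt(np.sum(residuals**2))
```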

no intercept

multiple linear regression


logistic regression

Statistical Tests

A selection of statistical tests. For each test the null hypothesis of the test is stated in the left column.

In a null hypothesis test one considers the p-value, which is the chance of getting data which is as or more extreme than the observed data if the null hypothesis is true. The null hypothesis is usually a supposition that the data is drawn from a distribution with certain parameters.

The extremeness of the data is determined by comparing the expected value of a parameter according to the null hypothesis to the estimated value from the data. Usually the parameter is a mean or variance. In a one-tailed test the p-value is the chance the difference is greater than the observed amount; in a two-tailed test the p-value is the chance the absolute value of the difference is greater than the observed amount.

Octave and MATLAB have different names for the statistical test functions. The sheet shows the Octave functions; the corresponding MATLAB functions are:


wilcoxon signed-rank test


wilcoxon_test() is an Octave function. The MATLAB function is ranksum().

kruskal-wallis rank sum test

kolmogorov-smirnov test

Test whether two samples are drawn from the same distribution.


kolmogorov_smirnov_test_2() and kolmogorov_smirnov_test() are Octave functions. The corresponding MATLAB functions are kstest2() and kstest().

kolmogorov_smirnov_test() is a one sample test; it tests whether a sample is drawn from one of the standard continuous distributions. A one sample KS test gives a repeatable p-value; generating a sample and using a two sample KS test does not.

x = randn(100, 1)

% null hypothesis is true:
kolmogorov_smirnov_test(x, "norm", 0, 1)

% alternative hypothesis is true:
kolmogorov_smirnov_test(x, "unif", -0.5, 0.5)


one-sample t-test

independent two-sample t-test

Test whether two normal variables have same mean.


If the normal variables are known to have the same variance, the variance can be pooled to estimate standard error:

t.test(x, y, var.equal=T)

If the variance cannot be pooled, then Welch's t-test is used. This uses a lower (often non-integral) degrees-of-freedom value, which in turn results in a higher p-value.
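A SciPy sketch of both variants; the equal_var flag selects between the pooled-variance t-test and Welch's t-test:

```python
import numpy as np
import scipy.stats as stats

np.random.seed(0)
x = np.random.randn(30)
y = np.random.randn(30) + 0.5

t, p = stats.ttest_ind(x, y)                     # pooled variance
tw, pw = stats.ttest_ind(x, y, equal_var=False)  # Welch's t-test
```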

one-sample binomial test

two-sample binomial test

chi-squared test

poisson test

F test

pearson product moment test

shapiro-wilk test

bartlett's test

A test whether variables are drawn from normal distributions with the same variance.

levene's test

A test whether variables are drawn from distributions with the same variance.

one-way anova

Time Series

A time series is a sequence of data points collected at a uniform time interval.

A time series can be represented by a dictionary which maps timestamps to data points. A more efficient implementation exploits the fact that the time interval is uniform and stores the data points in an array. To recover the timestamps, the timestamp of the first data point and the length of the time interval are also stored.
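A pandas sketch of this representation: the values live in an array while a DatetimeIndex supplies the timestamps, supporting lookup both by time and by ordinal position:

```python
import pandas as pd

# daily series starting 2013-01-01
idx = pd.date_range("2013-01-01", periods=5, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

by_time = ts["2013-01-03"]  # lookup by time -> 3.0
by_pos = ts.iloc[2]         # lookup by ordinal position -> 3.0
```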

time series

How to create a time series from an array.

monthly time series

How to create a time series with one data point per month.

lookup by time

How to get a data point in a time series by when the data point was collected.

lookup by position in series

How to get a data point in a time series by its ordinal position.

aligned arithmetic

lagged difference

simple moving average

weighted moving average

exponential smoothing

decompose into seasonal and trend



Fast Fourier Transform

fft

inverse fft

shift constant component to center

two-dimensional fft

n-dimensional fft


Univariate Charts

vertical bar chart

A chart in which numerical values are represented by vertical bars. The bars are aligned at the bottom.


How to produce a bar chart using ggplot2:

cnts = c(7,3,8,5,5)
names = c("a","b","c","d","e")
df = data.frame(names, cnts)
qplot(names, data=df, geom="bar", weight=cnts)

horizontal bar chart

A bar chart with horizontal bars which are aligned on the left.

pie chart

A pie chart displays values using the areas of circular sectors or equivalently the lengths of the arcs of those sectors.

A pie chart implies that the values are percentages of a whole.

stem-and-leaf plot

A stem-and-leaf plot is a concise way of displaying a small set of numbers which makes their distribution visually evident.

The original set of numbers can be recovered with some loss of accuracy by appending the number on the left with each of the digits on the right. In the example below the original data set contained -43, -42, -41, -39, -38, -35, …, 35, 44, 46, 50, 58.

> stem(20*rnorm(100))

  The decimal point is 1 digit(s) to the right of the |

  -4 | 321
  -2 | 98544054310
  -0 | 8864333111009998776444332222110
   0 | 0001122333333466667778899122334555666789
   2 | 00023669025
   4 | 4608


histogram

A histogram is a bar chart where each bar represents a range of values that the data points can fall in. The data is tabulated to find how often data points fall in each of the bins, and in the final chart the length of the bars corresponds to the frequency.

A common method for choosing the number of bins from the number n of data points is Sturges' formula:

$$\begin{align} \lceil \log_2{n} + 1 \rceil \end{align}$$
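A minimal Python sketch of the formula:

```python
import math

def sturges_bins(n):
    """Number of histogram bins for n data points (Sturges' formula)."""
    return int(math.ceil(math.log(n, 2) + 1))
```

For example, sturges_bins(100) gives 8 bins.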


How to make a histogram with the ggplot2 library:

x = rnorm(50)
binwidth = (max(x) - min(x)) / 10
qplot(x, geom="histogram", binwidth=binwidth)

box plot

Also called a box-and-whisker plot.

The box shows the locations of the 1st quartile, median, and 3rd quartile. These are the same as the 25th percentile, 50th percentile, and 75th percentile.

The whiskers are sometimes used to show the maximum and minimum values of the data set. Outliers are sometimes shown explicitly with dots, in which case all remaining data points occur inside the whiskers.


How to create a box plot with ggplot2:

qplot(x="rnorm", y=rnorm(50), geom="boxplot")

qplot(x=c("rnorm", "rexp", "runif"), y=c(rnorm(50), rexp(50), runif(50)), geom="boxplot")

chart title

How to set the chart title.


The qplot command supports the main parameter for setting the title:

qplot(x="rnorm", y=rnorm(50), geom="boxplot", main="boxplot example")

Bivariate Charts

stacked bar chart

Two or more data sets with a common set of labels can be charted with a stacked bar chart. This makes the sum of the data sets for each label readily apparent.

grouped bar chart

Alternatively, data sets with a common set of labels can be charted with a grouped bar chart, which clusters the bars for each label. The grouped bar chart makes it easier to perform comparisons between labels for each data set.

scatter plot

A scatter plot can be used to determine if two variables are correlated.


How to make a scatter plot with ggplot:

x = rnorm(50)
y = rnorm(50)
p = ggplot(data.frame(x, y), aes(x, y))
p = p + layer(geom="point")

point types

hexagonal binning

A hexagonal binning is the two-dimensional analog of a histogram. The number of data points in each hexagon is tabulated, and then color or grayscale is used to show the frequency.

A hexagonal binning is superior to a scatter plot when the number of data points is high, because most scatter plot software doesn't indicate when points occur on top of each other.

linear regression line

How to plot a line determined by linear regression on top of a scatter plot.

polygonal line plot

How to connect the dots of a data set with a polygonal line.

line types

function plot

How to plot a function.

quantile-quantile plot

Also called a Q-Q plot.

A quantile-quantile plot is a scatter plot created from two data sets. Each point depicts the quantile of the first data set with its x position and the corresponding quantile of the second data set with its y position.

If the data sets are drawn from the same distribution then most of the points should be close to the line y = x. If the data sets are drawn from distributions which have a linear relation then the Q-Q plot should also be close to linear.

If the two data sets have the same number of elements, one can simply sort them and create the scatterplot.

If the number of elements is different, one computes a set of quantiles (such as percentiles) for each set. The quantile function of MATLAB and R is convenient for this. With Python, one can use scipy.stats.scoreatpercentile.
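A NumPy sketch of the quantile-matching step for two samples of different sizes:

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(200)
y = np.random.randn(300)

# quantiles at the same probabilities (the 1st through 99th percentiles)
probs = np.arange(1, 100)
qx = np.percentile(x, probs)
qy = np.percentile(y, probs)

# plotting qx against qy gives the Q-Q plot; since both samples are
# drawn from the same normal distribution the points should fall near
# the line y = x
```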

axis labels

How to label the x and y axes.


How to label the axes with ggplot2:

x = rnorm(20)
y = x^2

p = ggplot(data.frame(x, y), aes(x, y))
p + layer(geom="point") + xlab('x') + ylab('x squared')

axis limits

How to manually set the range of values displayed by an axis.

logarithmic y-axis

Multivariate Charts

additional line set

How to add another line to a plot.


colors

How to set the color of points and lines.

superimposed plots with different y-axis scales

How to superimpose two plots with different y-axis scales.

To minimize the risk that the reader will read off an incorrect y-value for a data point, the example uses the same color for the y-axis as it does for the corresponding data set.


legend

How to put a legend on a chart.


These strings can be used as the first argument to control the legend position:

  • "bottomright"
  • "bottom"
  • "bottomleft"
  • "left"
  • "topleft"
  • "top"
  • "topright"
  • "right"
  • "center"

The named parameter lwd is the line width. It is roughly the width in pixels, though the exact interpretation is device specific.

The named parameter lty specifies the line type. The value can be either an integer or a string:


additional point set

stacked area chart

overlapping area chart

3d scatter plot

bubble chart

scatter plot matrix

contour plot


Octave Manual
MATLAB Documentation
Differences between Octave and MATLAB
Octave-Forge Packages

The basic data type of MATLAB is a matrix of floats. There is no distinction between a scalar and a 1x1 matrix, and functions that work on scalars typically work on matrices as well by performing the scalar function on each entry in the matrix and returning the results in a matrix with the same dimensions. Operators such as the logical operators ('&' '|' '!'), relational operators ('==', '!=', '<', '>'), and arithmetic operators ('+', '-') all work this way. However the multiplication '*' and division '/' operators perform matrix multiplication and matrix division, respectively. The .* and ./ operators are available if entry-wise multiplication or division is desired.

Floats are by default double precision; single precision can be specified with the single constructor. MATLAB has convenient matrix literal notation: commas or spaces can be used to separate row entries, and semicolons or newlines can be used to separate rows.

Arrays and vectors are implemented as single-row (1×n) matrices. As a result an n-element vector must be transposed before it can be multiplied on the right of an m×n matrix.

Numeric literals that lack a decimal point such as 17 and -34 create floats, in contrast to most other programming languages. To create an integer, an integer constructor which specifies the size such as int8 and uint16 must be used. Matrices of integers are supported, but the entries in a given matrix must all have the same numeric type.

Strings are implemented as single-row (1×n) matrices of characters. Matrices cannot contain strings. If a string is put in a matrix literal, each character in the string becomes an entry in the resulting matrix. This is consistent with how matrices are treated when nested inside another matrix. The following literals all yield the same string, or 1×n matrix of characters:

[ 'f' 'o' 'o' ]
[ 'foo' ]
[ [ 'f' 'o' 'o' ] ]

true and false are functions which return matrices of ones and zeros. These ones and zeros have type logical instead of double, the type created by the literals 1 and 0. Other than having a different class, the 0 and 1 of type logical behave the same as the 0 and 1 of type double.

MATLAB has a tuple type (in MATLAB terminology, a cell array) which can be used to hold multiple strings. It can also hold values with different types.


An Introduction to R
Advanced R Programming
The Comprehensive R Archive Network

The primitive data types of R are vectors of floats, vectors of strings, and vectors of booleans. There is no distinction between a scalar and a vector with one entry in it. Functions and operators which accept a scalar argument will typically accept a vector argument, returning a vector of the same size with the scalar operation performed on each of the entries of the original vector.

The scalars in a vector must all be of the same type, but R also provides a list data type which can be used as a tuple (entries accessed by index), record (entries accessed by name), or even as a dictionary.

In addition R provides a data frame type which is a list (in R terminology) of vectors all of the same length. Data frames are equivalent to the data sets of other statistical analysis packages.


NumPy and SciPy Documentation
matplotlib intro
NumPy for Matlab Users
Pandas Documentation
Pandas Method/Attribute Index

NumPy is a Python library which provides a data type called array. It differs from the Python list data type in the following ways:

  • N-dimensional. Although the list type can be nested to hold higher dimension data, the array can hold higher dimension data in a space efficient manner without using indirection.
  • homogeneous. The elements of an array are restricted to be of a specified type. The NumPy library introduces new primitive types not available in vanilla Python. However, the element type of an array can be object which permits storing anything in the array.

In the reference sheet the array section covers the vanilla Python list and the multidimensional array section covers the NumPy array.

List the NumPy primitive types

SciPy, Matplotlib, and Pandas are libraries which depend on NumPy.


issue tracker | content of this page licensed under creative commons attribution-sharealike 3.0