Python

Publish Date: 2020-03-15

Author: Isaac Zhou

A Primer on Numpy

The core data structure in numpy is the array (ndarray).

Import Numpy

After installing numpy, simply import it. By convention, it is imported under the alias np.

import numpy as np

Check version

print(np.__version__)

Why use numpy? Why not a python list?

Python List

L = [i for i in range(10)]
print(L)

output

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

A python list may contain elements of different data types
- pro: flexibility
- con: inefficiency, because the type of every element has to be checked at run time

Python Array

Python's standard library also has an array type, which can only hold elements of a single data type

import array
arr = array.array("i", (i for i in range(10)))
# "i" is the type code for a signed integer

A TypeError is raised if we assign a value of another data type to the array

arr[5] = "python"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-249200ff8ac8> in <module>
----> 1 arr[5]="python"

TypeError: an integer is required (got type str)

A python array does not treat its data as a vector or matrix, so vector and matrix operations are not implemented for it
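To make the contrast concrete, here is a minimal sketch (the variable names are illustrative, not from the examples above). The `*` operator repeats a standard-library array as a sequence, while the same operator on a numpy array multiplies element by element.

```python
import array
import numpy as np

py_arr = array.array("i", range(5))   # standard-library typed array
np_arr = np.array(range(5))           # numpy array built from the same values

repeated = py_arr * 2   # sequence repetition: the contents appear twice
scaled = np_arr * 2     # element-wise arithmetic: each value is doubled

print(repeated.tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(scaled.tolist())    # [0, 2, 4, 6, 8]
```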

Numpy Array

It’s simple to create a numpy array:

nparr = np.array([i for i in range(10)])

nparr

output

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

A numpy array can only store one type of data

Check data type

nparr.dtype

output

dtype('int32')

If we assign a value of a different data type (e.g. a float into an int array),
numpy keeps the array's dtype and converts the value to it, so 5.8 is truncated to 5

nparr[5] = 5.8
print(nparr)
print(nparr.dtype)

output

[0 1 2 3 4 5 6 7 8 9]
int32
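If the fractional part matters, one option is to convert the array to a float dtype before assigning. The sketch below starts from a fresh integer array; `as_float` is just an illustrative name.

```python
import numpy as np

nparr = np.array([i for i in range(10)])

# Assigning a float into an integer array keeps the integer dtype,
# so the fractional part is dropped.
nparr[5] = 5.8
print(nparr[5])        # 5

# astype returns a new array with the requested dtype.
as_float = nparr.astype(float)
as_float[5] = 5.8
print(as_float[5])     # 5.8
print(as_float.dtype)  # float64
```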

To create an array with float data type

nparr2 = np.array([1,2,3.0])
nparr2.dtype

output

dtype('float64')

Create Numpy Array and Matrix

Alternative ways to create array

numpy provides a number of other functions to create arrays

Create a zero array

npzero = np.zeros(10)
npzero
npzero.dtype

output

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
dtype('float64')

If we want to create a zero array of int data type

np.zeros(10, dtype=int)

output

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

To create a matrix of 5 rows x 6 cols, pass a (rows, cols) tuple as the shape instead of a single size

np.zeros((5,6))

output

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

Create a one array

np.ones(10)

output

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Create a one matrix

np.ones((3,4))

output

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Create an array/matrix with specific value

np.full((3,4), 20) # int
# same as np.full(shape=(3,4), fill_value=20)
np.full((3,4), 20.0) # float

output

array([[20, 20, 20, 20],
       [20, 20, 20, 20],
       [20, 20, 20, 20]])

array([[20., 20., 20., 20.],
       [20., 20., 20., 20.],
       [20., 20., 20., 20.]])
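As a quick check on the creation functions above, the following sketch inspects the `shape` and `dtype` attributes of the results. It also passes `dtype` explicitly to `np.full`, which is an alternative to writing the fill value as `20.0`. The variable names are illustrative.

```python
import numpy as np

ints = np.full((3, 4), 20)                 # integer fill value -> integer dtype
floats = np.full((3, 4), 20, dtype=float)  # explicit dtype overrides the inferred one
zeros = np.zeros((5, 6), dtype=int)

print(ints.shape, floats.shape, zeros.shape)  # (3, 4) (3, 4) (5, 6)
print(floats.dtype)                           # float64
print(np.issubdtype(ints.dtype, np.integer))  # True
```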

arange method

Python has a built-in function called range

[i for i in range(1,20,3)]
# range(start, stop (not inclusive), step)

output

[1, 4, 7, 10, 13, 16, 19]

Numpy has a function called arange, with similar usage

[i for i in np.arange(1,20,3)]

output is the same

[1, 4, 7, 10, 13, 16, 19]
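Because `np.arange` already returns a numpy array, the list comprehension is only needed when a plain python list is wanted. A small sketch (`steps` is an illustrative name):

```python
import numpy as np

steps = np.arange(1, 20, 3)   # same start/stop/step convention as range
print(steps)                  # [ 1  4  7 10 13 16 19]

# tolist() converts the array to a regular python list
print(steps.tolist() == list(range(1, 20, 3)))  # True
```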

However, the big difference is that in python's range the step must be an int,
whereas np.arange also accepts a float step

[i for i in range(0,20,0.4)]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-ccd0e40a6b64> in <module>
----> 1 [i for i in range(0,20,0.4)]

TypeError: 'float' object cannot be interpreted as an integer

For np.arange

[i for i in np.arange(15,20,0.5)]

output

[15.0, 15.5, 16.0, 16.5, 17.0, 17.5, 18.0, 18.5, 19.0, 19.5]

Linspace method

The linspace function creates evenly spaced values over a range.

To create 10 evenly spaced data points over the range from 0 to 20

Note: for linspace the end point 20 is inclusive, which is different from arange

np.linspace(0, 20, 10)

array([ 0.        ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
       11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.        ])
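Two optional `np.linspace` parameters are worth knowing here: `retstep` also returns the spacing, and `endpoint=False` excludes the stop value. The sketch below is only an illustration; the variable names are made up for this example.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring points.
points, step = np.linspace(0, 20, 10, retstep=True)
print(len(points), points[-1])   # 10 20.0
print(np.isclose(step, 20 / 9))  # True: 10 points means 9 intervals

# endpoint=False leaves out the stop value, similar to arange.
half_open = np.linspace(15, 20, 10, endpoint=False)
print(np.allclose(half_open, np.arange(15, 20, 0.5)))  # True
```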

Random

Random numbers are generated with the np.random module

np.random

Int random

To get a random int between 0 (inclusive) and 10 (non-inclusive)

np.random.randint(0,10)

To get an 8-element array of random ints between 0 (inclusive) and 100 (non-inclusive)

np.random.randint(0,100,8)

output

array([47,  9, 23, 48, 79, 42, 32, 80])

To get a 4x5 random int matrix

np.random.randint(0,100,(4,5))

output

array([[69,  3, 25, 83, 48],
       [19, 52, 52, 13, 62],
       [ 6, 22, 79, 63, 48],
       [ 8,  8, 53, 21, 20]])
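The half-open range can be checked empirically. This is a small sketch, not a proof: it draws 1000 values and confirms that none falls outside 0 to 9.

```python
import numpy as np

# low=0 is inclusive, high=10 is exclusive
samples = np.random.randint(0, 10, 1000)

print(samples.min() >= 0)   # True
print(samples.max() <= 9)   # True: 10 itself is never drawn
print(samples.shape)        # (1000,)
```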

Set seed

All random numbers generated by computers are pseudo-random.
Use a random seed to make sure that the same random numbers are generated on every run

To set a seed

np.random.seed(42)

# run random matrix
np.random.randint(0,100,(4,5))

array([[51, 92, 14, 71, 60],
       [20, 82, 86, 74, 74],
       [87, 99, 23,  2, 21],
       [52,  1, 87, 29, 37]])

Setting the same seed again and repeating the call gives the same matrix

np.random.seed(42)
np.random.randint(0,100,(4,5))

array([[51, 92, 14, 71, 60],
       [20, 82, 86, 74, 74],
       [87, 99, 23,  2, 21],
       [52,  1, 87, 29, 37]])
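Newer numpy versions (1.17 and later) also offer a Generator object as an alternative to the global seed. This is not the approach used in this post, just a sketch of the same idea: two generators created with the same seed produce the same numbers. Note that `integers` is the Generator counterpart of `randint`, and its output for seed 42 will generally differ from the matrix above.

```python
import numpy as np

rng_a = np.random.default_rng(42)   # independent generator seeded with 42
rng_b = np.random.default_rng(42)   # second generator with the same seed

first = rng_a.integers(0, 100, size=(4, 5))
second = rng_b.integers(0, 100, size=(4, 5))

print(np.array_equal(first, second))  # True
```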

Float Random

Single random float

np.random.random()

0.9922115592912175

A random array

np.random.random(10)

array([0.61748151, 0.61165316, 0.00706631, 0.02306243, 0.52477466,
       0.39986097, 0.04666566, 0.97375552, 0.23277134, 0.09060643])

A random 3x4 matrix

np.random.random((3,4))

array([[0.61838601, 0.38246199, 0.98323089, 0.46676289],
       [0.85994041, 0.68030754, 0.45049925, 0.01326496],
       [0.94220176, 0.56328822, 0.3854165 , 0.01596625]])

Normal Distribution

In statistics, it is common to draw random numbers from a normal distribution.
By default np.random.normal uses mean=0 and standard deviation=1.

A random number from the standard normal distribution

np.random.normal()

0.230893825622149

A random number from a normal distribution with mean=10 and standard deviation=4 (the second argument is the standard deviation, not the variance)

np.random.normal(10,4)

13.116770538060019
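One way to see what the two arguments mean is to draw a large sample and look at its statistics. The sketch below uses a seeded Generator (numpy 1.17+) and an arbitrary sample size of 100,000; both are choices made for this illustration. The sample standard deviation should come out near 4, which shows that the second argument is the standard deviation rather than the variance.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator, numpy 1.17+
samples = rng.normal(10, 4, 100_000)    # loc (mean)=10, scale (std dev)=4

print(samples.mean())   # close to 10
print(samples.std())    # close to 4
print(samples.var())    # close to 16, i.e. 4 squared
```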

To get an array of normally distributed random numbers

np.random.normal(0,1,10)

array([-1.10109776,  1.13022819,  0.37311891, -0.38647295, -1.15877024,
        0.56611283, -0.70445345, -1.3779393 , -0.35311665, -0.46146572])

To get a matrix of normally distributed random numbers

np.random.normal(0,1,(3,5))

array([[ 0.39637394, -0.6256163 , -0.52246148,  0.0130339 ,  0.37776959],
       [ 0.0628246 ,  0.50159637, -0.14694539,  0.18062297,  0.96481058],
       [-1.06483115,  0.1087118 ,  0.10576365,  0.92066672, -0.22672246]])
