A Primer on Numpy
The core data structure in numpy is the array (ndarray).
Import Numpy
- After installing numpy, simply import it. By convention, it is imported under the alias np.
import numpy as np
- Check version
print(np.__version__)
Why use numpy? Why not a Python list?
Python List
L = [i for i in range(10)]
print(L)
output
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
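- A Python list may contain elements of different data types
- pro: flexibility
- con: inefficiency, since every element is a separate Python object whose type has to be checked on each operation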
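As a small illustration of that flexibility, the sketch below replaces one element of an integer list with a string. The names mixed and element_types are only illustrative and do not appear elsewhere in this primer.

```python
# A list places no restriction on the types of its elements.
mixed = [i for i in range(10)]
mixed[5] = "python"  # allowed: an int is replaced by a str

# Collect the names of the element types now present in the list.
element_types = {type(x).__name__ for x in mixed}

print(mixed)
print(element_types)
```

The assignment succeeds silently, which is convenient but also means nothing guarantees that the list is purely numeric.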
Python Array
Python also has an array type (in the standard-library array module), which can hold only a single data type
import array
arr = array.array("i", (i for i in range(10)))
# "i" is the type code for a signed integer
An error is raised if we assign a value of another data type to the array
arr[5] = "python"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-58-249200ff8ac8> in <module>
----> 1 arr[5]="python"
TypeError: an integer is required (got type str)
- Python's array does not treat its data as a vector or matrix, so vector and matrix operations are not implemented
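One way to see the difference is to multiply by 2. This is a minimal sketch that uses the numpy array covered in the next section; the variable names repeated and doubled are illustrative.

```python
import array

import numpy as np

arr = array.array("i", range(5))
nparr = np.array(range(5))

# For array.array, * means sequence repetition, as it does for a list.
repeated = arr * 2

# For a numpy array, * is applied to every element.
doubled = nparr * 2

print(repeated)
print(doubled)
```

The Python array is repeated end to end, whereas the numpy array is treated as a vector and each element is doubled.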
Numpy Array
It’s simple to create a numpy array:
nparr = np.array([i for i in range(10)])
nparr
output
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
A numpy array can store only one type of data
- Check data type
nparr.dtype
output
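dtype('int32')
(The default integer dtype depends on the platform and numpy version; on many systems it is int64.)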
If we assign a value of a different data type to an element (e.g. a float to an int array),
numpy converts the value to the array's dtype. Here 5.8 is truncated to 5, and the array stays int
nparr[5] = 5.8
print(nparr)
print(nparr.dtype)
output
[0 1 2 3 4 5 6 7 8 9]
int32
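If the fractional part matters, one option is to convert the array to a float dtype first. The sketch below uses astype, which returns a new array; the name nparr_float is illustrative.

```python
import numpy as np

nparr = np.array([i for i in range(10)])

# Assigning a float to an int array drops the fractional part.
nparr[5] = 5.8
print(nparr[5])

# astype returns a new array with the requested dtype;
# the original int array is left unchanged.
nparr_float = nparr.astype(float)
nparr_float[5] = 5.8
print(nparr_float[5], nparr_float.dtype)
```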
To create an array with a float data type, include at least one float in the input
nparr2 = np.array([1,2,3.0])
nparr2.dtype
output
dtype('float64')
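The data type can also be requested explicitly through the dtype argument of np.array, instead of relying on a float appearing in the input. A minimal sketch, with illustrative variable names:

```python
import numpy as np

# dtype is inferred as float because one input value is a float.
inferred = np.array([1, 2, 3.0])

# dtype is requested explicitly, even though all inputs are ints.
explicit = np.array([1, 2, 3], dtype=float)

print(inferred.dtype, explicit.dtype)
```

Both arrays hold the same values with the same float64 dtype.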
Create Numpy Array and Matrix
Alternative ways to create arrays
numpy provides a number of other built-in functions for creating arrays
- Create an array of zeros
npzero = np.zeros(10)
npzero
npzero.dtype
output
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
dtype('float64')
To create an array of zeros with an int data type:
np.zeros(10, dtype=int)
output
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
To create a matrix with 5 rows and 6 columns, pass a tuple (rows, cols) as the shape instead of a single size
np.zeros((5,6))
output
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
- Create an array of ones
np.ones(10)
output
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Create a matrix of ones
np.ones((3,4))
output
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
- Create an array/matrix filled with a specific value
np.full((3,4), 20) # int
# same as np.full(shape=(3,4), fill_value=20)
np.full((3,4), 20.0) # float
output
array([[20, 20, 20, 20],
[20, 20, 20, 20],
[20, 20, 20, 20]])
array([[20., 20., 20., 20.],
[20., 20., 20., 20.],
[20., 20., 20., 20.]])
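To check what these creation functions return, the shape, ndim and dtype attributes can be inspected. The sketch below is illustrative; the variable names are not part of numpy.

```python
import numpy as np

zeros = np.zeros((5, 6))            # float64 by default
ones = np.ones((3, 4), dtype=int)   # dtype can be overridden
filled = np.full((3, 4), 20.0)      # dtype follows the fill value

for name, a in [("zeros", zeros), ("ones", ones), ("filled", filled)]:
    # shape is (rows, cols); ndim is the number of dimensions
    print(name, a.shape, a.ndim, a.dtype)
```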
arange method
Python has a built-in called range
[i for i in range(1,20,3)]
# range(start, stop (not inclusive), step)
output
[1, 4, 7, 10, 13, 16, 19]
numpy has a function called arange, with similar usage
[i for i in np.arange(1,20,3)]
output is the same
[1, 4, 7, 10, 13, 16, 19]
The big difference, however, is that range accepts only an int step,
whereas np.arange also accepts a float step
[i for i in range(0,20,0.4)]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-89-ccd0e40a6b64> in <module>
----> 1 [i for i in range(0,20,0.4)]
TypeError: 'float' object cannot be interpreted as an integer
With np.arange, a float step works:
[i for i in np.arange(15,20,0.5)]
output
[15.0, 15.5, 16.0, 16.5, 17.0, 17.5, 18.0, 18.5, 19.0, 19.5]
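Note that np.arange already returns a numpy array, so the list comprehension above is only needed for display as a list. A minimal sketch (the name steps is illustrative):

```python
import numpy as np

# arange returns an ndarray directly; no comprehension is needed.
steps = np.arange(15, 20, 0.5)

print(type(steps).__name__, steps.dtype)
print(len(steps))      # the stop value 20 is not included
print(steps.tolist())  # convert to a plain Python list if required
```

A step such as 0.5 is exactly representable in binary floating point. For steps that are not (0.1, for example), the numpy documentation suggests that linspace is often the better choice, because rounding can make the length of the arange result hard to predict.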
linspace method
The linspace function creates an evenly spaced sequence over an interval.
To create 10 evenly spaced data points between 0 and 20:
Note: for linspace, the end point 20 is inclusive, which is different from arange
np.linspace(0, 20, 10)
array([ 0. , 2.22222222, 4.44444444, 6.66666667, 8.88888889,
11.11111111, 13.33333333, 15.55555556, 17.77777778, 20. ])
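The spacing used by linspace can be returned with retstep=True, and the end point can be excluded with endpoint=False. Both are documented parameters of np.linspace; the variable names in this sketch are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring points.
# With 10 points and an inclusive end, there are 9 intervals.
points, step = np.linspace(0, 20, 10, retstep=True)
print(step)

# endpoint=False excludes 20, giving 10 intervals of width 2.
open_points = np.linspace(0, 20, 10, endpoint=False)
print(open_points)
```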
Random
Random numbers are generated with the functions in the np.random module, such as np.random.random
Int random
- To get a random int between 0 (inclusive) and 10 (non-inclusive)
np.random.randint(0,10)
- To get an 8-element array of ints between 0 and 100 (non-inclusive)
np.random.randint(0,100,8)
output
array([47, 9, 23, 48, 79, 42, 32, 80])
- To get a 4x5 random int matrix
np.random.randint(0,100,(4,5))
output
array([[69, 3, 25, 83, 48],
[19, 52, 52, 13, 62],
[ 6, 22, 79, 63, 48],
[ 8, 8, 53, 21, 20]])
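The bounds of randint can be checked empirically by drawing many samples. This is only an illustrative sketch; the sample size of 1000 is an arbitrary choice.

```python
import numpy as np

# Draw many ints from [0, 10): the lower bound is inclusive,
# the upper bound is exclusive.
samples = np.random.randint(0, 10, 1000)

print(samples.min(), samples.max())
print(samples.shape)
```

However many samples are drawn, the value 10 itself never appears.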
Set seed
- All random numbers generated by computers are pseudo-random.
- Use a random seed to make sure that the same random numbers are generated on every run
To set a seed
np.random.seed(42)
# run random matrix
np.random.randint(0,100,(4,5))
array([[51, 92, 14, 71, 60],
[20, 82, 86, 74, 74],
[87, 99, 23, 2, 21],
[52, 1, 87, 29, 37]])
Repeating the above with the same seed gives the same matrix
np.random.seed(42)
np.random.randint(0,100,(4,5))
array([[51, 92, 14, 71, 60],
[20, 82, 86, 74, 74],
[87, 99, 23, 2, 21],
[52, 1, 87, 29, 37]])
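The reproducibility can be verified in code rather than by eye. The sketch below also shows np.random.default_rng, the Generator interface available in newer numpy versions (1.17 and later), as an alternative to the global seed. Using it here is an addition to this primer, and a Generator generally produces different numbers than np.random.seed with the same seed value.

```python
import numpy as np

# Global seed: resetting it replays the same sequence.
np.random.seed(42)
first = np.random.randint(0, 100, (4, 5))
np.random.seed(42)
second = np.random.randint(0, 100, (4, 5))
print(np.array_equal(first, second))

# Generator objects carry their own state, so two generators
# created with the same seed produce the same values.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
gen_first = rng_a.integers(0, 100, (4, 5))
gen_second = rng_b.integers(0, 100, (4, 5))
print(np.array_equal(gen_first, gen_second))
```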
Float Random
- A single random float in the range [0, 1)
np.random.random()
0.9922115592912175
- A random array
np.random.random(10)
array([0.61748151, 0.61165316, 0.00706631, 0.02306243, 0.52477466,
0.39986097, 0.04666566, 0.97375552, 0.23277134, 0.09060643])
- A random 3x4 matrix
np.random.random((3,4))
array([[0.61838601, 0.38246199, 0.98323089, 0.46676289],
[0.85994041, 0.68030754, 0.45049925, 0.01326496],
[0.94220176, 0.56328822, 0.3854165 , 0.01596625]])
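Since np.random.random draws from [0, 1), values in another range can be obtained by scaling and shifting. A minimal sketch, where the bounds low and high are arbitrary illustrative values:

```python
import numpy as np

low, high = 5.0, 8.0

# Scale the unit interval to width (high - low), then shift by low.
scaled = low + (high - low) * np.random.random((3, 4))

print(scaled)
print(scaled.min(), scaled.max())
```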
Normal Distribution
In statistics, it’s common to draw random samples from a normal distribution.
To draw samples from a normal distribution:
- A single sample from the standard normal distribution (mean=0, standard deviation=1)
np.random.normal()
0.230893825622149
- A random normal distribution number with mean=10, standard deviation=4 (the second argument is the standard deviation, not the variance)
np.random.normal(10,4)
13.116770538060019
- To get an array of normally distributed random numbers
np.random.normal(0,1,10)
array([-1.10109776, 1.13022819, 0.37311891, -0.38647295, -1.15877024,
0.56611283, -0.70445345, -1.3779393 , -0.35311665, -0.46146572])
- To get a matrix of normally distributed random numbers
np.random.normal(0,1,(3,5))
array([[ 0.39637394, -0.6256163 , -0.52246148, 0.0130339 , 0.37776959],
[ 0.0628246 , 0.50159637, -0.14694539, 0.18062297, 0.96481058],
[-1.06483115, 0.1087118 , 0.10576365, 0.92066672, -0.22672246]])
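To confirm how the arguments of np.random.normal are interpreted, one can draw a large sample and compare its statistics with the requested values. This is a sketch; the seed 0 and the sample size of 100,000 are arbitrary choices.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

# Arguments: mean (loc), standard deviation (scale), size.
samples = np.random.normal(10, 4, 100_000)

sample_mean = samples.mean()
sample_std = samples.std()
sample_var = samples.var()

# The mean should be close to 10, the standard deviation close
# to 4, and the variance close to 4**2 = 16.
print(sample_mean, sample_std, sample_var)
```

The sample standard deviation lands near 4 and the variance near 16, which shows that the second argument is the standard deviation.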