NumPy, short for Numerical Python, is the foundational package for scientific computing in Python.
For numerical data, NumPy arrays are a much more efficient way of storing and manipulating data than the other built-in Python data structures.
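As a rough illustration of the point above, the sketch below squares the same five numbers with a plain Python list and with a NumPy array. The variable names are made up for this example; the array version applies the operation to every element at once, with no explicit loop.

```python
import numpy as np

# Plain Python: element-wise work needs an explicit loop or comprehension.
values_list = list(range(1, 6))
squares_list = [v * v for v in values_list]

# NumPy: the same operation is applied to the whole array at once.
values_arr = np.arange(1, 6)
squares_arr = values_arr * values_arr

print(squares_list)  # [1, 4, 9, 16, 25]
print(squares_arr)   # [ 1  4  9 16 25]

# itemsize is the number of bytes per element; nbytes is the size of the
# whole data buffer (itemsize * number of elements).
print(values_arr.itemsize, values_arr.nbytes)
```

The exact `itemsize` depends on the platform's default integer type, so no value is shown for it here.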
Install it by running one of the following commands at the command prompt or the Anaconda prompt:
pip install numpy
or
conda install numpy
Once NumPy is installed, import it in your applications with the import keyword.
The version string is stored in the __version__ attribute.
import numpy as np
print(np.__version__)
1.21.5
NumPy is used to work with arrays. The array object in NumPy is called ndarray.
We can create a NumPy ndarray object by using the array() function.
my_list= [1,2,3,4,5]
oned_array=np.array(my_list)
oned_array
array([1, 2, 3, 4, 5])
type(oned_array) ## type() is a built-in function
numpy.ndarray
## to get the shape (the length along each dimension) of the array
oned_array.shape ## shape is an attribute, not a function
(5,)
## length of an array/vector
oned_array.size
5
## Multi-nested (2-D) array
L1= [1,2,3,4,5]
L2= [2,3,4,5,6]
L3= [3,4,5,6,7]
twodarray= np.array([L1,L2,L3])
twodarray ## gives a 3x5 matrix
array([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]])
## shape of the matrix
twodarray.shape
(3, 5)
## Total number of elements in the matrix (i.e., its size)
twodarray.size
15
threedarray= np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
threedarray
array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
threedarray.shape
(2, 2, 3)
threedarray.size
12
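The three attributes used above are related: ndim is the length of shape, and size is the product of the entries of shape. A minimal check, reusing the values of the 3-D example under the made-up name `a`:

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)   # 3  (number of dimensions)
print(a.shape)  # (2, 2, 3)
print(a.size)   # 12

# ndim counts the entries of shape; size multiplies them together.
print(a.ndim == len(a.shape))           # True
print(a.size == int(np.prod(a.shape)))  # True
```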
To change the shape of an array, use NumPy's reshape() function.
twodarray
print(twodarray.shape)
(3, 5)
twodarray.reshape(5,3)
array([[1, 2, 3], [4, 5, 2], [3, 4, 5], [6, 3, 4], [5, 6, 7]])
Example: If we want to put the numbers 1 through 9 in a 3×3 grid, we can do the following:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
[[1 2 3] [4 5 6] [7 8 9]]
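Two details of reshape() are easy to miss: the new shape must hold exactly the same number of elements, and one dimension may be given as -1 so that NumPy infers it. The sketch below shows both; the array and variable names are illustrative only.

```python
import numpy as np

values = np.arange(1, 13)  # 12 elements: 1..12

# -1 asks NumPy to infer that dimension from the total size.
three_rows = values.reshape(3, -1)
print(three_rows.shape)  # (3, 4)

# The element count must match: 12 elements cannot fill a 5x3 grid.
try:
    values.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)

# reshape(-1) flattens back to one dimension.
flat = three_rows.reshape(-1)
print(flat.shape)  # (12,)
```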
## Get the first element from the following array:
oned_array[0] # 1st element of oned_array
1
oned_array[3] # 4th element of oned_array
4
oned_array[:3] # up to, but excluding, the 4th element
array([1, 2, 3])
oned_array[2:] # from the 3rd element to the end, including the 3rd element
array([3, 4, 5])
## Negative indexing
print(oned_array[-1]) # Last element
5
print(oned_array[-2]) # 2nd element from the last
4
print(oned_array[-2:]) # Last two elements
[4 5]
print(oned_array[:-2]) # all elements except the last two
[1 2 3]
twodarray[0,] # 1st row of the matrix
array([1, 2, 3, 4, 5])
twodarray[2,] # 3rd row of the matrix
array([3, 4, 5, 6, 7])
twodarray[0,3] # (1,4)th element of the matrix
4
twodarray[1,:3] # 1st three elements of the 2nd row from the matrix
array([2, 3, 4])
twodarray[:2,] # 1st two rows
array([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])
twodarray[1:,3:]
array([[5, 6], [6, 7]])
twodarray[:-1,:-2]
array([[1, 2, 3], [2, 3, 4]])
threedarray[0,1,] # 1st matrix 2nd row
array([4, 5, 6])
threedarray[1,1,2] #2nd matrix (2,3) element
6
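Slices also accept a third value, the step, and a bare colon selects a whole axis. The short sketch below rebuilds the 1-D and 2-D arrays from this section under the made-up names `a` and `m`, so it can run on its own.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a[::2])   # [1 3 5]      every second element
print(a[::-1])  # [5 4 3 2 1]  reversed

m = np.array([[1, 2, 3, 4, 5],
              [2, 3, 4, 5, 6],
              [3, 4, 5, 6, 7]])
second_col = m[:, 1]   # ':' keeps all rows, 1 picks the 2nd column
corners = m[::2, ::2]  # rows 0 and 2, columns 0, 2 and 4
print(second_col)      # [2 3 4]
print(corners)
```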
NumPy has some extra data types, and refers to data types with one-character codes:
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type (void)
arr_1 = np.array([1, 2, 3, 4], dtype='S')
print(arr_1)
print(arr_1.dtype) # the dtype attribute gives the data type of the array
[b'1' b'2' b'3' b'4'] |S1
arr_2= np.array(["a","b","c"])
print(arr_2)
print(arr_2.dtype)
['a' 'b' 'c'] <U1
Example: converting the strings of arr_2 from lower to upper case
arr_2_upper = arr_2.copy() # copy, so that arr_2 itself is not modified
for i in range(len(arr_2)):
    arr_2_upper[i] = arr_2[i].upper()
print(arr_2_upper)
['A' 'B' 'C']
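The loop above can usually be avoided. As one possible alternative, the sketch below uses np.char.upper, which applies str.upper to every element and returns a new array; the name `letters` is made up for this example.

```python
import numpy as np

letters = np.array(["a", "b", "c"])

# np.char.upper works element-wise and leaves the input untouched.
upper = np.char.upper(letters)

print(upper)    # ['A' 'B' 'C']
print(letters)  # ['a' 'b' 'c']
```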
The best way to change the data type of an existing array is to make a copy of the array with the astype() method.
The astype() method creates a copy of the array and lets us specify the data type as a parameter.
## Change data type from float to integer by using int as parameter value:
arr_3 = np.array([1.1, 2.1, 3.1])
newarr_3 = arr_3.astype(int)
print(newarr_3)
print(newarr_3.dtype)
[1 2 3] int32
## Change data type from integer to boolean:
arr = np.array([1, 0, 3])
newarr = arr.astype(bool)
print(newarr)
print(newarr.dtype)
[ True False True] bool
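One thing worth knowing about the float-to-integer conversion above: astype(int) drops the fractional part (it truncates toward zero) rather than rounding. The sketch below contrasts it with rounding first via np.rint; the sample values are chosen only to make the difference visible.

```python
import numpy as np

floats = np.array([-1.9, -0.5, 0.5, 1.9])

# astype(int) truncates toward zero: -1.9 becomes -1, 1.9 becomes 1.
as_int = floats.astype(int)

# Rounding first gives the nearest integer instead (ties go to the even one).
rounded = np.rint(floats).astype(int)

print(as_int.tolist())   # [-1, 0, 0, 1]
print(rounded.tolist())  # [-2, 0, 0, 2]

# astype returns a new array; the original keeps its dtype.
print(floats.dtype)      # float64
```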
arr= np.zeros(5)
arr
array([0., 0., 0., 0., 0.])
Similarly, see:
zeros_like: Return an array of zeros with shape and type of input.
empty: Return a new uninitialized array.
ones: Return a new array setting values to one.
full: Return a new array of given shape filled with value.
The copy SHOULD NOT be affected by the changes made to the original array.
The view SHOULD be affected by the changes made to the original array.
## Make a copy, change the original array, and display both arrays:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)
[42 2 3 4 5] [1 2 3 4 5]
## Make a view, change the original array, and display both arrays:
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)
[42 2 3 4 5] [42 2 3 4 5]
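To check whether a given array is a copy or a view, one option is the base attribute: it is None for an array that owns its data, and refers to the original array for a view. The sketch below also shows that a basic slice is a view, which is a common source of surprises. Variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
c = arr.copy()
v = arr.view()
s = arr[1:4]  # basic slicing returns a view, not a copy

print(c.base is None)  # True: the copy owns its own data
print(v.base is arr)   # True: the view borrows arr's data
print(s.base is arr)   # True: so does the slice

# Writing through the slice changes the original array.
s[0] = 99
print(arr.tolist())    # [1, 99, 3, 4, 5]
print(c.tolist())      # [1, 2, 3, 4, 5]
```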
# Join two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
[1 2 3 4 5 6]
# Join two 2-D arrays along the second axis (axis=1), so that each row is extended:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
print(arr)
[[1 2 5 6] [3 4 7 8]]
stack(): It is the same as concatenation; the only difference is that stacking is done along a new axis.
Splitting is the reverse operation of joining. Joining merges multiple arrays into one, and splitting breaks one array into multiple.
We use array_split() for splitting arrays; we pass it the array we want to split and the number of splits.
# Split the array in 3 parts:
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
[array([1, 2]), array([3, 4]), array([5, 6])]
# Split the 2-D array into three 2-D arrays.
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
print(newarr)
[array([[1, 2], [3, 4]]), array([[5, 6], [7, 8]]), array([[ 9, 10], [11, 12]])]
hsplit() is the opposite of hstack().
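The joining and splitting helpers mentioned above can be compared side by side. This is a small sketch with made-up arrays: concatenate() and hstack() keep the number of dimensions, stack() adds a new axis, hsplit() undoes hstack(), and array_split() also copes with sizes that do not divide evenly.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate((a, b))          # stays 1-D
stacked = np.stack((a, b))               # new leading axis
stacked_cols = np.stack((a, b), axis=1)  # new trailing axis
print(joined.shape, stacked.shape, stacked_cols.shape)  # (6,) (2, 3) (3, 2)

h = np.hstack((a, b))          # same result as concatenate for 1-D input
left, right = np.hsplit(h, 2)  # splits back into two equal halves
print(left, right)             # [1 2 3] [4 5 6]

# 7 elements into 3 parts: array_split allows unequal parts.
uneven = np.array_split(np.arange(7), 3)
print([len(part) for part in uneven])  # [3, 2, 2]
```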
# Finding the indexes where the value is 4:
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x= np.where(arr == 4)
print(x)
(array([3, 5, 6], dtype=int64),)
# Finding the indexes where the values are even:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
print(x)
(array([1, 3, 5, 7], dtype=int64),)
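searchsorted() performs a binary search on a sorted array and returns the index where a value should be inserted to keep the array sorted. To search for more than one value, pass it an array of the specified values.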
# Finding the indexes where the values 2, 4, and 6 should be inserted:
arr = np.array([1, 3, 5, 7])
x = np.searchsorted(arr, [2, 4, 6])
print(x)
[1 2 3]
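When the searched value is already present in the array, the side argument decides whether the returned index is before or after the existing entry. A minimal sketch, reusing the sorted array from the example above:

```python
import numpy as np

arr = np.array([1, 3, 5, 7])  # must already be sorted

# side='left' (the default) returns the index of the first matching entry;
# side='right' returns the index just past the last matching entry.
left = np.searchsorted(arr, [3, 5])
right = np.searchsorted(arr, [3, 5], side='right')

print(left)   # [1 2]
print(right)  # [2 3]

# A value is present exactly when the two indexes differ.
print((right - left).tolist())  # [1, 1]
```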
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
[0 1 2 3]
arr = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr))
['apple' 'banana' 'cherry']
arr = np.array([[3, 2, 4], [5, 0, 1]])
print("Original")
print(arr)
print("After Sorting")
print(np.sort(arr))
Original [[3 2 4] [5 0 1]] After Sorting [[2 3 4] [0 1 5]]
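In the 2-D example above, np.sort sorts each row separately because it works along the last axis by default. The sketch below shows sorting along the first axis instead, plus np.argsort, which returns the positions that would sort an array. Variable names are illustrative.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)          # default axis=-1: each row is sorted
by_col = np.sort(arr, axis=0)  # each column is sorted
print(by_row.tolist())  # [[2, 3, 4], [0, 1, 5]]
print(by_col.tolist())  # [[3, 0, 1], [5, 2, 4]]

flat = np.array([3, 2, 0, 1])
order = np.argsort(flat)       # indexes that put flat in ascending order
print(order.tolist())          # [2, 3, 1, 0]
print(flat[order].tolist())    # [0, 1, 2, 3]

# np.sort has no 'descending' flag; reversing the result is one simple option.
descending = np.sort(flat)[::-1]
print(descending.tolist())     # [3, 2, 1, 0]
```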
Example 1: Create a filter array that will return only values higher than 42:
arr = np.array([41, 42, 43, 44])
# Create an empty list
filter_arr = []
# go through each element in arr
for element in arr:
    # if the element is higher than 42, set the value to True, otherwise False:
    if element > 42:
        filter_arr.append(True)
    else:
        filter_arr.append(False)
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
[False, False, True, True] [43 44]
Example 2: Create a filter array that will return only even elements from the original array:
arr = np.array([1, 2, 3, 4, 5, 6, 7])
# Create an empty list
filter_arr = []
# go through each element in arr
for element in arr:
    # if the element is completely divisible by 2, set the value to True, otherwise False
    if element % 2 == 0:
        filter_arr.append(True)
    else:
        filter_arr.append(False)
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
[False, True, False, True, False, True, False] [2 4 6]
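Both filter examples can be written without a loop, because a comparison applied to an array already produces a Boolean array. The sketch below repeats the two examples in that style; the mask names are made up here.

```python
import numpy as np

arr = np.array([41, 42, 43, 44])
mask = arr > 42            # element-wise comparison gives a Boolean array
print(mask.tolist())       # [False, False, True, True]
print(arr[mask].tolist())  # [43, 44]

arr2 = np.array([1, 2, 3, 4, 5, 6, 7])
even_mask = arr2 % 2 == 0
print(arr2[even_mask].tolist())  # [2, 4, 6]
```

Besides being shorter, this form keeps the mask as a NumPy array, so it can be combined with other masks later.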
np.linspace(start=0,stop=5,num=10)
array([0. , 0.55555556, 1.11111111, 1.66666667, 2.22222222, 2.77777778, 3.33333333, 3.88888889, 4.44444444, 5. ])
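In the linspace() call above the stop value 5 is included, so the spacing is 5/9 rather than 0.5. The sketch below shows how to read the spacing back with retstep=True and how endpoint=False gives a half-open grid that matches np.arange. Variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points.
points, step = np.linspace(start=0, stop=5, num=10, retstep=True)
print(step)  # 5/9, about 0.5556

# endpoint=False leaves out the stop value, giving a spacing of exactly 0.5.
half_open = np.linspace(0, 5, num=10, endpoint=False)
same_grid = np.arange(0, 5, 0.5)
print(np.allclose(half_open, same_grid))  # True
```

As a rule of thumb, linspace() is convenient when the number of points is known, and arange() when the step is known.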
To generate random numbers, NumPy offers the random module.
from numpy import random
# Generate a random integer from 0 to 99 (the upper bound 100 is excluded):
x = random.randint(100)
print(x)
16
# Generate a random float from 0 to 1:
x = random.rand()
print(x)
0.49101416191343095
# Generate Random 1D-Array:
## fixing random seed
random.seed(10)
## integer array of length 20 with values from [10, 50), i.e. 50 is excluded
x = random.randint(low=10,high=50,size=20)
print(x)
[19 46 25 10 38 35 39 39 18 19 10 46 26 46 21 34 43 18 46 24]
# Generate Random 2D-Array:
## fixed random seed
random.seed(10)
## 3x5 matrix (15 values) of floats from [0, 1)
x = random.rand(3,5)
print(x)
[[0.77132064 0.02075195 0.63364823 0.74880388 0.49850701] [0.22479665 0.19806286 0.76053071 0.16911084 0.08833981] [0.68535982 0.95339335 0.00394827 0.51219226 0.81262096]]
The choice() method allows you to generate a random value based on an array of values.
Arguments: choice(a, size=None, replace=True, p=None)
#random.seed(100)
x= random.randint(low=0,high=10,size=10) ## Population
print("Population:")
print(x)
s= random.choice(x,size=5,replace=False) ## Simple random sample without replacement (SRSWOR)
print("Sample:")
print(s)
Population: [5 4 7 8 8 2 6 2 8 8] Sample: [4 8 2 8 5]
# Generate a random sample of size 100, with replacement, from the array [a,b,c,d,e] with probabilities (0.09,0.2,0.4,0.3,0.01).
## population
p= ["a","b","c","d","e"]
print("Population")
print(p)
## samples
s= random.choice(p,size=100,replace=True,p=(0.09,0.2,0.4,0.3,0.01))
print("\nSample")
print(s)
## sample in a matrix form
s2= random.choice(p,size=(4,25),replace=True,p=(0.09,0.2,0.4,0.3,0.01))
print("\nSample in a 4x25 matrix form")
print(s2)
Population ['a', 'b', 'c', 'd', 'e'] Sample ['c' 'c' 'c' 'a' 'b' 'c' 'b' 'd' 'c' 'a' 'a' 'd' 'b' 'b' 'b' 'd' 'b' 'd' 'd' 'b' 'c' 'd' 'c' 'c' 'b' 'a' 'd' 'd' 'd' 'd' 'd' 'b' 'b' 'a' 'e' 'd' 'd' 'd' 'd' 'b' 'c' 'd' 'c' 'd' 'd' 'c' 'd' 'c' 'c' 'c' 'd' 'd' 'b' 'd' 'c' 'd' 'd' 'd' 'a' 'b' 'c' 'c' 'c' 'a' 'c' 'c' 'd' 'b' 'c' 'd' 'c' 'b' 'c' 'b' 'c' 'd' 'c' 'c' 'c' 'd' 'd' 'a' 'b' 'b' 'b' 'c' 'c' 'c' 'c' 'c' 'c' 'c' 'd' 'c' 'c' 'c' 'b' 'a' 'd' 'c'] Sample in a 4x25 matrix form [['c' 'b' 'a' 'd' 'c' 'd' 'e' 'd' 'd' 'b' 'b' 'b' 'c' 'd' 'c' 'c' 'b' 'd' 'b' 'd' 'c' 'a' 'b' 'c' 'c'] ['c' 'a' 'd' 'b' 'c' 'a' 'd' 'c' 'c' 'd' 'c' 'b' 'c' 'b' 'c' 'd' 'd' 'd' 'd' 'd' 'a' 'a' 'c' 'b' 'c'] ['d' 'c' 'd' 'c' 'c' 'c' 'c' 'd' 'a' 'e' 'b' 'a' 'c' 'd' 'c' 'c' 'b' 'd' 'c' 'd' 'd' 'c' 'd' 'c' 'b'] ['c' 'd' 'b' 'd' 'c' 'c' 'b' 'b' 'd' 'b' 'd' 'c' 'b' 'd' 'd' 'b' 'c' 'c' 'b' 'd' 'd' 'a' 'c' 'c' 'b']]
random.shuffle() is used to shuffle the elements of an array in place:
arr= np.array([1,2,3,4,5,6])
random.shuffle(arr)
print(arr)
[3 5 4 6 1 2]
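The functions above belong to NumPy's legacy random interface, which uses one global seed. Since NumPy 1.17 there is also a Generator interface, created with np.random.default_rng, in which each generator carries its own seed. The sketch below repeats the earlier steps in that style; the seed value 10 and all variable names are arbitrary choices for this example, and the numbers it draws will differ from the legacy outputs shown above.

```python
import numpy as np

rng = np.random.default_rng(seed=10)

ints = rng.integers(low=10, high=50, size=20)  # integers from [10, 50)
floats = rng.random((3, 5))                    # floats from [0, 1)

population = np.arange(10)
sample = rng.choice(population, size=5, replace=False)  # no repeats

deck = np.arange(1, 7)
rng.shuffle(deck)  # shuffles in place

# A second generator with the same seed reproduces the same stream.
rng_again = np.random.default_rng(seed=10)
ints_again = rng_again.integers(low=10, high=50, size=20)
print(np.array_equal(ints, ints_again))  # True
```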
Distribution | function |
---|---|
Normal | normal(loc=0.0, scale=1.0, size=None) |
Binomial | binomial(n, p, size=None) |
Poisson | poisson(lam=1.0, size=None) |
Uniform | uniform(low=0.0, high=1.0, size=None) |
Lognormal | lognormal(mean=0.0, sigma=1.0, size=None) |
Gamma | gamma(shape, scale=1.0, size=None) |
Beta | beta(a, b, size=None) |
Pareto | pareto(a, size=None) |
Multinomial | multinomial(n, pvals, size=None) |
Multivariate Normal | multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-8) |
Chi Square | chisquare(df, size=None) |
Logistic | logistic(loc=0.0, scale=1.0, size=None) |
## 1000 random samples from N(5, 81), i.e. mean 5 and standard deviation 9
x= random.normal(5,9,1000)
#print(x)
## plotting the distribution of the random sample drawn from Normal(5, 81)
import matplotlib.pyplot as plt ## matplotlib is used to draw plots
import seaborn as sns ## the seaborn package is used to visualize distributions
sns.histplot(x,kde=True)
plt.show()
## 1000 random samples from Chi square distribution with df=1
x= random.chisquare(df=1,size=1000)
## plotting
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(x,kde=True)
plt.show()
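Besides plotting, a quick numerical check is to compare sample moments with the theoretical ones. Note that the second argument of normal() is the standard deviation (scale), not the variance. The sketch below is only an illustration: it uses np.random.default_rng with an arbitrary seed and a larger sample size than above so that the estimates are stable.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for this sketch

# scale is the standard deviation: scale=9 means variance 81.
x = rng.normal(loc=5, scale=9, size=100_000)
print(x.mean(), x.std(), x.var())  # close to 5, 9 and 81

# A chi-square distribution with df degrees of freedom has
# mean df and variance 2 * df.
chi = rng.chisquare(df=1, size=100_000)
print(chi.mean(), chi.var())       # close to 1 and 2
```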
ufuncs stands for "universal functions"; they are NumPy functions that operate element by element on ndarray objects.
Arithmetic operations
Operator | Equivalent ufunc | Description |
---|---|---|
+ | np.add | Addition (e.g., 1 + 1 = 2) |
(none) | np.sum | Summation |
- | np.subtract | Subtraction (e.g., 3 - 2 = 1) |
- | np.negative | Unary negation (e.g., -2) |
* | np.multiply | Multiplication (e.g., 2 * 3 = 6) |
/ | np.divide | Division (e.g., 3 / 2 = 1.5) |
// | np.floor_divide | Floor division (e.g., 3 // 2 = 1) |
** | np.power | Exponentiation (e.g., 2 ** 3 = 8) |
% | np.mod | Modulus/remainder (e.g., 9 % 4 = 1) |
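The operators in the table are shorthand for the ufuncs next to them, so both spellings give the same element-wise result. A minimal check with two made-up arrays:

```python
import numpy as np

x = np.array([9, 7, 4])
y = np.array([4, 2, 3])

# Each operator and its ufunc produce identical arrays.
print(np.array_equal(x + y, np.add(x, y)))            # True
print(np.array_equal(x - y, np.subtract(x, y)))       # True
print(np.array_equal(-x, np.negative(x)))             # True
print(np.array_equal(x // y, np.floor_divide(x, y)))  # True

print((x // y).tolist())  # [2, 3, 1]
print((x % y).tolist())   # [1, 1, 1]
print((x ** y).tolist())  # [6561, 49, 64]
print((x / y).dtype)      # float64: true division always returns floats
```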
Absolute value
np.absolute --> Absolute value (e.g., np.absolute(-2) = 2)
Trigonometric functions
np.sin() --> $\sin(\theta)$
np.cos() --> $\cos(\theta)$
np.tan() --> $\tan(\theta)$
Exponents and logarithms
np.exp() --> $e^x$
np.exp2() --> $2^x$
np.power(3,x) --> $3^x$
np.log() --> $\ln(x)$
np.log2() --> $\log_2(x)$
np.log10() --> $\log_{10}(x)$
Aggregation functions
Function Name | NaN-safe Version | Description |
---|---|---|
np.sum | np.nansum | Compute sum of elements |
np.prod | np.nanprod | Compute product of elements |
np.mean | np.nanmean | Compute mean of elements |
np.std | np.nanstd | Compute standard deviation |
np.var | np.nanvar | Compute variance |
np.min | np.nanmin | Find minimum value |
np.max | np.nanmax | Find maximum value |
np.argmin | np.nanargmin | Find index of minimum value |
np.argmax | np.nanargmax | Find index of maximum value |
np.median | np.nanmedian | Compute median of elements |
np.percentile | np.nanpercentile | Compute rank-based statistics of elements |
np.any | N/A | Evaluate whether any elements are true |
np.all | N/A | Evaluate whether all elements are true |
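The difference between the two function columns shows up as soon as the data contains a missing value (NaN): the plain functions return NaN, while the NaN-safe versions skip the missing entries. The sketch below also uses the axis argument to aggregate per column or per row; the data is invented for the example.

```python
import numpy as np

data = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])

plain_mean = np.mean(data)    # nan, because one entry is NaN
safe_mean = np.nanmean(data)  # mean of the five valid entries
print(plain_mean, safe_mean)  # nan 3.6

col_sums = np.nansum(data, axis=0)  # one sum per column
row_max = np.nanmax(data, axis=1)   # one maximum per row
print(col_sums.tolist())  # [5.0, 7.0, 6.0]
print(row_max.tolist())   # [2.0, 6.0]
```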
Comparison operators
Operator | Equivalent ufunc |
---|---|
== | np.equal |
!= | np.not_equal |
< | np.less |
<= | np.less_equal |
> | np.greater |
>= | np.greater_equal |
Boolean operators
Operator | Equivalent ufunc |
---|---|
& | np.bitwise_and |
\| | np.bitwise_or |
^ | np.bitwise_xor |
~ | np.bitwise_not |
Counting entries
np.count_nonzero --> To count the number of True entries in a Boolean array (e.g., how many values are less than 6?: np.count_nonzero(x < 6))
np.any() --> To quickly check whether any values are true (e.g., are there any values greater than 8?: np.any(x > 8))
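The comparison, Boolean and counting tools above are typically used together. The sketch below defines a small array `x` (invented for this example) and runs the two questions from the text, then combines two conditions with &. The parentheses around each comparison are needed because & binds more tightly than < and >.

```python
import numpy as np

x = np.array([1, 5, 6, 8, 9, 3])

n_small = np.count_nonzero(x < 6)  # how many values are less than 6?
any_big = np.any(x > 8)            # is any value greater than 8?
all_pos = np.all(x > 0)            # are all values positive?
print(n_small, any_big, all_pos)   # 3 True True

# Combine conditions element-wise; each comparison needs its own parentheses.
mask = (x > 2) & (x < 9)
print(x[mask].tolist())  # [5, 6, 8, 3]

# ~ negates a mask: everything that is NOT between 2 and 9.
print(x[~mask].tolist())  # [1, 9]
```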