Numpy Slicing and Dicing: A Beginner’s Guide

Chrisx10 Last Updated : 15 Jul, 2021


This article was published as a part of the Data Science Blogathon

Introduction :

NumPy is a package for scientific computing in Python. Its core data structure is the ndarray (N-dimensional array), and it provides support for mathematical operations such as basic linear algebra and basic statistics. Packages such as scikit-learn (sklearn) and pandas are built on top of NumPy, and their transformation and manipulation operations work on the underlying NumPy ndarrays. NumPy is the foundation on which a sizable chunk of the Python data science stack is built. Below are a few use cases and examples to support this claim.

Table of Contents :

Demonstration of numpy with an example.
Numpy is Faster than a python list.
Numpy hands on.
Numpy Slicing Visualization.
Conclusion.

Demonstration of numpy with an example :

Let’s check out the fit method of scikit-learn’s LinearRegression:

fit(X, y, sample_weight=None)

Fit linear model

X : {array-like, sparse matrix} of shape (n_samples, n_features) Training data

y : array-like of shape (n_samples,) or (n_samples, n_targets) Target values.

Both the training data X and the target values y are expected to be arrays, and X must be 2D.

Below is a basic error that beginners face with linear regression: when 1D training data is passed to LinearRegression.fit(), it throws an error because X is expected to be a 2D array of shape (n_samples, n_features). With a little NumPy knowledge this error is easy to fix, which shows how useful NumPy is.

Code:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# sample data set: plain Python lists, i.e. 1D data
x = [7.0, 8.4, 10.1, 6.5, 6.9, 7.9, 5.8, 7.4, 9.3, 10.3, 7.3, 8.1]
y = [7.0, 8.4, 10.1, 6.5, 6.9, 7.9, 5.8, 7.4, 9.3, 10.3, 7.3, 8.1]
# Splitting the dataset into training set and test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Linear regression
regressor = LinearRegression()
regressor.fit(x_train, y_train)  # raises the ValueError below: x_train is 1D
y_pred = regressor.predict(x_test)

Error:

ValueError: Expected 2D array, got 1D array instead:
array=[ 7.   8.4 10.1  6.5  6.9  7.9  5.8  7.4  9.3 10.3  7.3  8.1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

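Fix: convert x into a 2D column (one feature per sample) before calling train_test_split. y can stay 1D, although reshaping it into a column as well also works.

import numpy as np
x = np.array(x).reshape(-1, 1)  # shape (12,) -> (12, 1)
y = np.array(y).reshape(-1, 1)  # optional for y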
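To see what the two reshape calls suggested in the error message actually produce, here is a minimal sketch on a short array (the four values are simply the first four from x, used only for illustration). reshape(-1, 1) gives one column, i.e. many samples with a single feature, which is what fit() needs here; reshape(1, -1) would instead give a single sample with many features.

```python
import numpy as np

# first four values of x, used only for illustration
x = np.array([7.0, 8.4, 10.1, 6.5])

column = x.reshape(-1, 1)  # -1 tells NumPy to infer the number of rows
row = x.reshape(1, -1)     # -1 tells NumPy to infer the number of columns

print(x.shape)       # (4,)   1D: what fit() rejected
print(column.shape)  # (4, 1) 4 samples, 1 feature
print(row.shape)     # (1, 4) 1 sample, 4 features
```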

Equipped with this basic idea about the resourcefulness of numpy, let’s get started.

In this article, we will go through a few of the important built-in NumPy methods and understand their use cases. No code is flawless, so to help readers debug the smaller errors they are likely to run into, error logs have been included as well.

Numpy is Faster than a python list :

Let’s find the lowest value using Python’s built-in min() on a list and NumPy’s min() method on an array.

import numpy as np
from random import random
print("Min in python using list")
c = [random() for z in range(100000)]
%timeit min(c)
print("\nMin in Numpy")
numpy_array = np.array(c)
%timeit numpy_array.min()  # min() is an ndarray method

Output :
Min in python using list
4.7 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Min in Numpy
81.2 µs ± 5.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This is a simple example, but it clearly shows how much faster NumPy can be than a list: the list takes 4.7 milliseconds, whereas NumPy takes 81.2 microseconds, roughly 58 times faster. Exact timings vary from machine to machine.
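%timeit is an IPython/Jupyter magic command, so the snippet above only runs in a notebook. Outside a notebook, a rough equivalent uses the standard-library timeit module; the sketch below averages 100 calls of each approach. The absolute numbers depend on the machine, so treat them as indicative only.

```python
import random
import timeit

import numpy as np

values = [random.random() for _ in range(100_000)]
numpy_array = np.array(values)

runs = 100
# average seconds per call, printed in microseconds
list_time = timeit.timeit(lambda: min(values), number=runs) / runs
numpy_time = timeit.timeit(lambda: numpy_array.min(), number=runs) / runs

print(f"Python min():  {list_time * 1e6:.1f} µs per call")
print(f"ndarray.min(): {numpy_time * 1e6:.1f} µs per call")
print(f"speed-up: {list_time / numpy_time:.0f}x")

# both approaches find the same minimum
assert min(values) == numpy_array.min()
```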

Numpy Hands-On :

As always don’t forget to import numpy.

import numpy as np

1. Create an array of zeroes:

array_ = np.zeros(10)  
Output:  array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

2. Reshape the (10,) array to (5,2) :

NumPy rearranges the 10 elements into 5 rows and 2 columns. reshape is also commonly applied to the values of a pandas Series, for example series.to_numpy().reshape(-1, 1).

array_.reshape(5,2)
Output: array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])
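Note that reshape returns a new array object (usually a view that shares the same data) rather than changing array_ in place, and that the new shape must hold exactly as many elements as the old one. A short sketch:

```python
import numpy as np

array_ = np.zeros(10)
reshaped = array_.reshape(5, 2)

print(array_.shape)    # (10,)  the original is unchanged
print(reshaped.shape)  # (5, 2)

# -1 lets NumPy work out one dimension: 10 elements / 2 columns = 5 rows
print(array_.reshape(-1, 2).shape)  # (5, 2)

# 10 elements cannot fill a 3 x 3 array, so this raises ValueError
try:
    array_.reshape(3, 3)
except ValueError as err:
    print("ValueError:", err)
```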

3. Create an array of ones:

np.ones(10)
Output :  array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

4. Initialize a numpy array:

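Unlike np.zeros or np.ones, np.empty does not initialize the entries: it allocates the array and leaves whatever values already happen to be in that memory. In this run the result shows ones, most likely because memory freed after the previous np.ones call was reused; the contents should never be relied on.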

np.empty(10)
Output : array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
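Because np.empty skips initialization, it is only safe when every element is written before being read. A minimal sketch of that pattern, next to np.full, which allocates and fills in a single call:

```python
import numpy as np

buffer = np.empty(5)   # contents are arbitrary at this point
buffer.fill(7.0)       # overwrite every element before using it
print(buffer)          # [7. 7. 7. 7. 7.]

filled = np.full(5, 7.0)  # allocate and fill in one call
print(np.array_equal(buffer, filled))  # True
```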

5. Create a numpy array with lower and upper bounds and length:

np.linspace takes 3 arguments (start, stop, and the number of values) and creates an evenly spaced array that includes both the start and the stop values. In this case, 1 and 3 are included in the final array.

np.linspace(1,3,9)  ## start ## stop ## total length
Output : array([1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 , 2.75, 3.  ])
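The spacing follows from the arguments: with both endpoints included, the step is (stop - start) / (num - 1) = (3 - 1) / (9 - 1) = 0.25. np.arange, by contrast, takes a step size and excludes the stop value; linspace can mimic that with endpoint=False. A sketch comparing the two:

```python
import numpy as np

points = np.linspace(1, 3, 9)
step = (3 - 1) / (9 - 1)  # 0.25
print(np.allclose(np.diff(points), step))  # True: evenly spaced by 0.25

# arange takes start, stop, step; the stop value 3 is NOT included
print(np.arange(1, 3, 0.25))  # 8 values, 1.0 up to 2.75

# linspace with endpoint=False produces the same 8 values
print(np.allclose(np.arange(1, 3, 0.25),
                  np.linspace(1, 3, 8, endpoint=False)))  # True
```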

6. Create numpy array from a python list:

np.array([10,20])
output : array([10, 20])

7. Slicing arrays :

Return the first element of an array:

my_list = [1, 2, 3, 4, 5, 6, 6, 76, 7, 7, 88]
my_array = np.array(my_list)
my_array[0]
Output : 1

(my_array[0:1] instead returns a one-element array, array([1]).)

Return the last element of an array:

my_array[-1]
Output : 88

Slicing arrays based on indices: indices start from 0, so index 0 corresponds to 23, 1 to 343, 2 to 5, and 3 to 6. The slice [2:4] starts at index 2 and stops before index 4, so the output is [5, 6].

test_list = [23,343,5,6,45,22,2232,3]
test_array = np.array(test_list)
test_array[2:4]
Output: array([5, 6])
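A few more 1D slicing patterns, sketched on the same test_array. Slices follow start:stop:step with the stop index excluded, and negative indices count from the end. One detail worth knowing: a basic slice is a view into the original array, so writing to it changes the original; call .copy() when an independent array is needed.

```python
import numpy as np

test_array = np.array([23, 343, 5, 6, 45, 22, 2232, 3])

print(test_array[:3])    # first three elements: 23, 343, 5
print(test_array[-3:])   # last three elements: 22, 2232, 3
print(test_array[::2])   # every second element: 23, 5, 45, 2232
print(test_array[::-1])  # the whole array reversed

window = test_array[2:4]  # a view of the elements 5 and 6
window[0] = 99            # writes through to test_array
print(test_array[2])      # 99

independent = test_array[2:4].copy()  # a separate copy
independent[0] = 0
print(test_array[2])      # still 99
```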

Flatten an N-dimensional array using the ravel() method

gene0 = [100,200,0,400]
gene1 = [50,0,0,100]
gene2 = [350,100,50,200]
expression_gene = [gene0, gene1, gene2]
a = np.array(expression_gene)
display(a)
Output : 
array([[100, 200,   0, 400],
       [ 50,   0,   0, 100],
       [350, 100,  50, 200]])

a.ravel()
Output :
array([100, 200, 0, 400, 50, 0, 0, 100, 350, 100, 50, 200])
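ravel() has a close relative, flatten(). According to the NumPy documentation, ravel() returns a view of the original data whenever it can, while flatten() always returns a copy. Both accept an order argument; order="F" reads the elements column by column. A sketch using the same gene-expression array:

```python
import numpy as np

a = np.array([[100, 200, 0, 400],
              [50, 0, 0, 100],
              [350, 100, 50, 200]])

r = a.ravel()    # a view here, because a is stored contiguously
f = a.flatten()  # always a new copy

print(np.shares_memory(a, r))  # True
print(np.shares_memory(a, f))  # False

# column-major order: reads down each column in turn
print(a.ravel(order="F"))
```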

Slicing N-dimensional matrices

a[1::3]  ## start=1, no stop, step=3: rows 1, 4, 7, ... (only row 1 exists)
Output:
array([[ 50,   0,   0, 100]])

a[::2, ::2]  ## a[rows, columns]: every second row (0 and 2), every second column (0 and 2)
Output:
array([[100,   0],
       [350,  50]])
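Some other common 2D patterns on the same array a, as a sketch: a single index picks a row, a colon keeps a whole axis, and separate start:stop pairs cut out a sub-matrix.

```python
import numpy as np

a = np.array([[100, 200, 0, 400],
              [50, 0, 0, 100],
              [350, 100, 50, 200]])

print(a[0])         # first row: 100, 200, 0, 400
print(a[:, 1])      # second column: 200, 0, 100
print(a[1, 3])      # row 1, column 3: 100
print(a[0:2, 1:3])  # rows 0-1, columns 1-2: 200, 0 / 0, 0
print(a[::-1])      # all rows, in reverse order
print(a[:, -1])     # last column: 400, 100, 200
```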

8. Array operations:

Let’s consider two arrays a and b:

a = np.array([1,2,3,4])
b = np.array([5,6,7])

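a. Adding the two arrays element-wise with a+b results in an error. Element-wise operations need arrays of the same shape (or shapes that can be broadcast together, which is discussed later in the article); here a has 4 elements and b has 3.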

ValueError                                Traceback (most recent call last)
 in 
----> 1 a+b
ValueError: operands could not be broadcast together with shapes (4,) (3,)

Fixing the previous error

b = np.array([5,6,7,8])
a+b
output : array([ 6,  8, 10, 12])

b. Element-wise multiplication:

a*b
Output : array([ 5, 12, 21, 32])

c. Dot product:

a@b  or a.dot(b) or np.matmul(a,b)
Output : 70
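The difference between * and @ becomes clearer with 2D arrays: * multiplies matching elements, while @ (np.matmul) performs matrix multiplication, summing the products of rows with columns. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
print(a * b)          # element-wise: 5, 12, 21, 32
print(a @ b)          # dot product: 1*5 + 2*6 + 3*7 + 4*8 = 70
print((a * b).sum())  # the same 70, computed by hand

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)  # element-wise: 5, 12 / 21, 32
print(A @ B)  # matrix product: 19, 22 / 43, 50
```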

d. Adding an integer to a matrix:

This is called broadcasting. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation; broadcasting is discussed in more detail later in the article.

a+10
Output : array([11, 12, 13, 14])

e. Multiply an integer to a matrix (Broadcasting):

a*10
Output : array([10, 20, 30, 40])

f. Sum of an array:

test_array = np.array([23,343,5,6,45,22,2232,3])
test_array.sum()
Output : 2679
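On a 2D array, sum() adds every element by default; passing axis=0 sums down each column and axis=1 sums across each row. A sketch with the gene-expression array used earlier:

```python
import numpy as np

a = np.array([[100, 200, 0, 400],
              [50, 0, 0, 100],
              [350, 100, 50, 200]])

print(a.sum())        # 1550, every element
print(a.sum(axis=0))  # per column: 500, 300, 50, 700
print(a.sum(axis=1))  # per row: 700, 150, 700
```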

9. Sorting arrays using np.sort:

to_sort = np.array([1,4,6,8,342,45,6,None,9,9,967,])
print("Array to be sorted")
display(to_sort)
Output :
Array to be sorted
array([1, 4, 6, 8, 342, 45, 6, None, 9, 9, 967], dtype=object)

np.sort(to_sort)
Output :
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
----> 1 np.sort(to_sort)
 in sort(*args, **kwargs)
~\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sort(a, axis, kind, order)
    989     else:
    990         a = asanyarray(a).copy(order="K")
--> 991     a.sort(axis=axis, kind=kind, order=order)
    992     return a
    993 
TypeError: '<' not supported between instances of 'NoneType' and 'int'

We receive a TypeError because None cannot be compared with an integer or a float, so the sort has no way to order it. It is therefore important to check whether None values exist before comparing values or using operators such as <, > and ==.

Fixing the error by replacing None with 0 before sorting:

to_sort = np.array([1,4,6,8,342,45,6,None,9,9,967,])
print("Array to be sorted")
display(to_sort)
print("\nSorted array")
np.sort(np.where(to_sort == None, 0, to_sort))

Output:

Array to be sorted

array([1, 4, 6, 8, 342, 45, 6, None, 9, 9, 967], dtype=object)

Sorted array

array([0, 1, 4, 6, 6, 8, 9, 9, 45, 342, 967], dtype=object)
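Replacing None with 0 changes the data: the missing value now sorts as if it were a real zero. An alternative, sketched below, is to build a float array, where NumPy converts None to NaN ("not a number"); np.sort places NaN values at the end, and they can be dropped with np.isnan when needed.

```python
import numpy as np

raw = [1, 4, 6, 8, 342, 45, 6, None, 9, 9, 967]

# with dtype=float, None becomes NaN
as_float = np.array(raw, dtype=float)

sorted_vals = np.sort(as_float)  # NaN is sorted to the end
print(sorted_vals)

# or drop the missing value entirely instead of replacing it
print(np.sort(as_float[~np.isnan(as_float)]))
```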

10. Numpy Broadcasting

In NumPy, broadcasting describes how arithmetic is applied to arrays of different shapes. Above, in the addition operation, arrays of lengths 4 and 3 raised a ValueError, yet adding the scalar 10 to an array raised no error. The rules that decide these cases are called broadcasting: subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

Let’s consider adding 4 to the array A = [0, 1, 2]: NumPy treats A as the larger array, stretches the scalar 4 across it and adds 4 to every element.

np.arange(3)
Output : array([0, 1, 2])
print("For a shape (3,) array, adding 4 to all 3 elements:", np.arange(3) + 4)
Output : For a shape (3,) array, adding 4 to all 3 elements: [4 5 6]

Let’s consider another example, adding a (3, 3) array and a (3,) array. Here the (3, 3) array is the larger one, and the smaller array is broadcast across it as a row, i.e. added to each of the three rows.

np.arange(3)
Output : array([0, 1, 2])

np.ones((3,3))
Output : 
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
print("Broadcasting through (3, 3) and (3,)")
np.ones((3,3)) + np.arange(3)
Output :
Broadcasting through (3, 3) and (3,)
array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])
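The rule NumPy applies, per its broadcasting documentation, is to compare shapes starting from the trailing (rightmost) dimension: two dimensions are compatible when they are equal or when one of them is 1. That is why a shape (3,) array acts as a row in the example above. Reshaping it into a (3, 1) column broadcasts it the other way; a sketch:

```python
import numpy as np

row = np.arange(3)                # shape (3,), treated as (1, 3)
col = np.arange(3).reshape(3, 1)  # shape (3, 1)

grid = col + row                  # (3, 1) and (1, 3) -> (3, 3)
print(grid)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# trailing dimensions 3 and 4 differ and neither is 1, so this fails
try:
    np.ones((3, 3)) + np.arange(4)
except ValueError as err:
    print("ValueError:", err)
```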

11. Masking

Real data is messy, and there are instances when column or row values need to be ignored. Perhaps a sensor error led to wrong data, or a wrong entry was made during point-of-sale (POS) data entry. Masks are boolean arrays: True/False flags that mark which elements of an array to keep.

In the code below, we filter out the first two rows and keep only the third row.

mask = [False,False,True]
x = np.array([[1,2],[2,3],[3,4]])
x[np.array(mask)]
Output:
array([[3, 4]])

Run the code above and validate its output. Comment below if there was an error or if the mask executed as intended.

## PRACTICE
print("Now let's check out slicing and indexing")
display(np.arange(9).reshape(3,3))
test_array = np.arange(9).reshape(3,3)
print("\nIndex list")
rows = [1, 2, 0]  # integer indices, not a boolean mask: a new row order
display(test_array[rows, :2])
print("Columns 0 and 1 are selected.\nThe rows are reordered by the index list: row 1 first, then row 2, then row 0.")
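Masks are rarely typed out by hand; more often they come from a comparison, which returns a boolean array of the same shape. A sketch of the sensor scenario described above, using made-up readings in which -999.0 is assumed to mark a faulty measurement:

```python
import numpy as np

# hypothetical sensor readings; -999.0 marks a faulty measurement
readings = np.array([21.5, 22.0, -999.0, 23.1, -999.0, 22.4])

valid = readings != -999.0  # boolean mask: True where the reading is usable
print(valid)
print(readings[valid])         # only the valid readings
print(readings[valid].mean())  # average of the usable values

# np.where keeps the shape and substitutes NaN where the mask is False
print(np.where(valid, readings, np.nan))
```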

12. Basic Numpy Attributes:

Use the code below to inspect the basic attributes of a NumPy array, then share what you find in the comment section.

def print_info(a):
    '''
    Prints basic information about an array.
    '''
    print(repr(a))
    print("\n\n")
    print("# of elements {}".format(a.size))
    print("# of dimensions {}".format(a.ndim))
    print("Shape of the array {}".format(a.shape))
    print("Data type of the array {}".format(a.dtype))
    print("Strides {}".format(a.strides))  ## bytes to step along each axis
    print("Flags {}".format(a.flags)) ## gives how data is stored in memory
    print("Itemsize {}".format(a.itemsize))  ## bytes per element
    print("memory location {}".format(a.data))

print_info(a)  ## a is the gene-expression array from the ravel() example

Output:
array([[100, 200,   0, 400],
       [ 50,   0,   0, 100],
       [350, 100,  50, 200]])
# of elements 12
# of dimensions 2
Shape of the array (3, 4)
Data type of the array int32
Strides (16, 4)
Flags   C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
Itemsize 4
memory location

The int32 data type (and with it the 4-byte itemsize and the strides (16, 4)) reflects the default integer type on the author’s setup; on many other platforms the default is int64, which gives an itemsize of 8 and strides (32, 8). The memory location on the last line differs from run to run.
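Strides record how many bytes to step in memory to move one position along each axis: element (i, j) sits at byte offset i * strides[0] + j * strides[1] from the start of the buffer. The sketch below pins the dtype to int32 so the numbers match the output above, and shows that a transpose only swaps the strides rather than copying the data.

```python
import numpy as np

a = np.array([[100, 200, 0, 400],
              [50, 0, 0, 100],
              [350, 100, 50, 200]], dtype=np.int32)

print(a.itemsize)  # 4 bytes per int32 element
print(a.strides)   # (16, 4): the next row is 4 elements * 4 bytes away

# byte offset of element (2, 1), computed from the strides
i, j = 2, 1
offset = i * a.strides[0] + j * a.strides[1]        # 2*16 + 1*4 = 36
print(offset // a.itemsize)                         # 9, its position in a.ravel()
print(a.ravel()[offset // a.itemsize] == a[i, j])   # True

t = a.T                         # transpose: a view, no copy
print(t.strides)                # (4, 16): strides swapped
print(t.flags["F_CONTIGUOUS"])  # True
print(np.shares_memory(a, t))   # True
```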

13. Numpy Argsort()

np.argsort returns the indices that would sort an array along a given axis. The code below uses the closely related np.argmin, which returns the index of the smallest value. With axis=1 the operation runs across each row (one result per row); with axis=0 it runs down each column (one result per column); without an axis, the array is flattened first.

## let's create another random array
new_array = np.random.random((10, 3))
display(new_array)
print("\nFirst row of the new array {}".format(new_array[0]))
print("\nFlat index of the lowest value {}".format(np.argmin(new_array)))
print("\nIndex of the lowest value for each row {}".format(np.argmin(new_array, axis = 1)))
print("\nIndex of the lowest value of each column {}".format(np.argmin(new_array, axis = 0)))

Output:
array([[0.95234456, 0.92563045, 0.81733628],
       [0.22210529, 0.93374309, 0.11194205],
       [0.05755499, 0.53818092, 0.21981649],
       [0.51079701, 0.80416857, 0.48691974],
       [0.58506116, 0.9411828 , 0.80336708],
       [0.69882165, 0.84273752, 0.40003603],
       [0.33068863, 0.51168931, 0.31263486],
       [0.81036761, 0.09136795, 0.6150059 ],
       [0.10078944, 0.39371561, 0.12124675],
       [0.29131749, 0.68948136, 0.73810813]])
First row of the new array [0.95234456 0.92563045 0.81733628]
Flat index of the lowest value 6
Index of the lowest value for each row [2 2 0 2 0 2 2 1 0 0]
Index of the lowest value of each column [2 7 1]

Flat index 6 corresponds to row 2, column 0 (6 // 3 = 2, 6 % 3 = 0), which holds 0.05755499, the smallest value in the array.
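A sketch of np.argsort itself on small fixed arrays: indexing with the returned indices yields the values in sorted order, and np.unravel_index turns a flat position, such as the 6 above, back into a (row, column) pair.

```python
import numpy as np

values = np.array([0.95, 0.22, 0.06, 0.51])
order = np.argsort(values)
print(order)          # [2 1 3 0]: index of the smallest value first
print(values[order])  # the values in ascending order

m = np.array([[3, 1, 2],
              [9, 7, 8]])
print(np.argsort(m, axis=1))  # sort order within each row
print(np.argsort(m, axis=0))  # sort order within each column

flat = np.argmin(m)                     # 1, a position in m.ravel()
print(np.unravel_index(flat, m.shape))  # row 0, column 1
```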

Numpy Slicing Visualization:

The idea here is to show the power of NumPy visually. Because images are loaded as arrays, NumPy can be used for image transformations. Let’s try NumPy on this beautiful image of Venice.

from skimage import io
photo = io.imread("venice.jpg")
print("type of image {} . Shape of image {}".format(type(photo), photo.shape) )
import matplotlib.pyplot as plt
plt.imshow(photo)
plt.show()

Reverse the image:

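plt.imshow(photo[::-1])  ## the row order is reversed, so the image is upside down
## the columns are untouched; photo[:, ::-1] would mirror it left to right instead

Slice the image using indexes:

plt.imshow(photo[0:100, 100:200])  ## rows 0-99 and columns 100-199: a 100 x 100 pixel crop

Masked image:

photo_masked = np.where(photo>100, 250, 0)  ## each channel value above 100 becomes 250, the rest 0
plt.imshow(photo_masked)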
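Since these operations are plain array indexing, they can be checked without an image file. A sketch on a tiny synthetic image (a made-up 4 x 5 RGB array standing in for photo) confirms which axis each operation touches:

```python
import numpy as np

# hypothetical 4 x 5 RGB image: shape (rows, columns, channels)
photo = np.arange(4 * 5 * 3).reshape(4, 5, 3)

flipped = photo[::-1]      # rows reversed: upside down
mirrored = photo[:, ::-1]  # columns reversed: left-right mirror
crop = photo[0:2, 1:4]     # rows 0-1, columns 1-3, all channels

print(flipped.shape, mirrored.shape, crop.shape)     # (4, 5, 3) (4, 5, 3) (2, 3, 3)
print(np.array_equal(flipped[0], photo[-1]))         # True: first row is the old last row
print(np.array_equal(mirrored[:, 0], photo[:, -1]))  # True: first column is the old last column

# the threshold is applied to every channel value independently
masked = np.where(photo > 30, 250, 0)
print(np.unique(masked))  # only 0 and 250 remain
```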

Conclusion:

I hope that this tutorial was helpful to rekindle the love for NumPy and will lead to more numpy based functions being implemented. Feel free to comment on NumPy tricks and tips below.

Here is my LinkedIn profile in case you want to connect with me. I’ll be happy to connect with you. Too lazy to type/copy the code? Clone the repo (here).

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Chrisx10

Data scientist. Extensively using data mining, data processing algorithms, visualization, statistics, and predictive modeling to solve challenging business problems and generate insights. My responsibilities as a Data Scientist include but are not limited to developing analytical models, data cleaning, explorations, feature engineering, feature selection, modeling, building prototype, documentation of an algorithm, and insights for projects such as pricing analytics for a craft retailer, promotion analytics for a fortune 500 wholesale club, inventory management/demand forecasting for a jewelry retailer and collaborating with on-site teams to deliver highly accurate results on time.
