Scientific Python 2: Elegant Arrays
14 May 2019Unlike programming in C++, where loops are extremely efficient, the loops in Python are notoriously slow. Usually we need a lot of workarounds to eliminate loops in Python to achieve high performance. In my opinion, numpy can make loop unnecessary in at least 80 percent of scenarios. In this post, I demonstrate some common scenarios where proper numpy techniques can reduce the amount of code and improve performance. I would recommend that every time someone finds it necessary to write loops in Python for scientific computation, he/she should refer to the documentation of numpy/scipy or stackoverflow to see if there is built-in functions to help eliminate loops. Surprisingly, such functions usually exist (because we usually underestimate the number of APIs in numpy/scipy).
Make use of numpy APIs
numpy provides a number of useful APIs for general scientific computation. Here I list some of the functions that I use most often. This is just a glimpse of the enormous amount of all APIs.
np.mean
,np.var
np.searchsorted
: binary searchnp.clip
,np.pad
np.intersect1d
,np.union1d
,np.setdiff1d
Assigning values for multi-dimensional arrays
Suppose we have a 3-dimensional array arr3
(essentially an image), which is defined by:
arr3 = np.zeros((100, 100, 3), np.uint8)
If we want to assign all elements with a particular value, we can do the following:
arr3.fill(255)
If we want to assign each pixel with a particular RGB tuple, we can do the following:
arr3[:, :, :] = [125, 200, 50]
The test case for following scenarios
Assume the test case is given as below:
import numpy as np
# makes it possible to reproduce results
np.random.seed(0x1a2b3c4d)
# random array of size 100
arr = np.random.randint(-2, 10, 100)
Suppose there is a predicate Pred
for each element, which is given by:
def Pred(x):
return x < 0
Looking for elements that satisfy the predicate
We would like to apply this predicate to all elements and output those elements that satisfy the predicate. The most naive way of doing it is:
indices = []
# note that the len(arr) and arr.size can be different
# for multi-dimensional arrays
for index in range(arr.size):
if(Pred(arr[index])):
indices.append(index)
We can eliminate this loop by writing:
indices = np.where(Pred(arr))[0]
It is a simple one-liner that achieves the same result at much higher speed. The np.where
function returns the indices of True
elements in an array. When it is given a \(n\) dimensional array, its output would be \(n\) arrays, each containing the index of a True
element of that dimension. Therefore, the columns of the output would be the coordinate of a True
element. That is why the subscript [0]
is needed here. In fact, if the predicate takes such a simple form, we can rewrite the code into:
indices = np.where(arr < 0)[0]
Replacing elements that satisfy the predicate
Now we would like to replace all elements that satisfy the predicate with a certain value (for example, 100). I will not provide the naive code here. The elegant way of doing it is:
arr[Pred(arr)] = 100
Replacing elements according to a “dictionary”
Assume that there is an array of nonnegative integers, where each element has a corresponding item in a “dictionary”. An example is shown as below:
arr = np.random.randint(0, 10, 20)
dictionary = ['zero', 'one', 'two', 'three',
'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
We need to replace the elements of arr
by their corresponding elements in dictionary
. It can be done in the following way:
dictionary = np.asarray(dictionary, dtype = object)
result = dictionary[arr]
A possible content of arr
and its corresponding result
is shown as below.
>>> arr
[4 7 2 7 3 6 6 8 7 6 4 6 3 3 9 6 4 0 2 2]
>> result
['four' 'seven' 'two' 'seven' 'three' 'six' 'six' 'eight' 'seven' 'six'
'four' 'six' 'three' 'three' 'nine' 'six' 'four' 'zero' 'two' 'two']