How To Create Ndarray Python
The 10 Best Ways to Create NumPy Arrays
NumPy is a library that every data scientist who uses Python should be familiar with. It is the backbone on which the modern Python data science stack is built.
The library is often picked up in pieces along your learning journey. Eventually, it makes sense to learn the key parts of the library systematically. As a first step, you need to know how to quickly create NumPy arrays to meet your needs. In this article I'll show you the functions and methods to make NumPy arrays in a snap.
Note: I originally published this article for Deepnote here. You can fork, run, and extend it there.
NumPy array objects are technically of the class numpy.ndarray. I'll refer to them as arrays below.
Without further ado, here are the essential ways to make a NumPy array:
Convert a list
- Convert a list with array.
Create and fill a NumPy array with…
- equally spaced data with arange, linspace, or logspace.
- the same value with zeros, ones, or full.
- the same value that matches the shape and dtype of a pre-existing array with zeros_like, ones_like, or full_like.
- random floats drawn from the standard normal distribution with random.randn.
- random floats drawn from a uniform distribution with random.rand.
Convert from a…
- pandas DataFrame with df.to_numpy.
- TensorFlow TensorProto with tf.make_ndarray(existing_proto_tensor).
- PyTorch Tensor with existing_tensor.numpy.
- SciPy sparse matrix with existing_sparse_matrix.toarray.
Let's see these in action!
First, import the libraries we'll need under their usual aliases. The code in this article is meant to be run in a Jupyter notebook. The expected output follows ---.
import numpy as np
import scipy
import pandas as pd
import tensorflow as tf
import torch
import sklearn
from sklearn.preprocessing import OneHotEncoder
If you don't have the libraries you need installed, run the code below and then run the imports again.
!pip install -U numpy scipy pandas tensorflow torch scikit-learn
Let's check our package versions.
print(f'NumPy: {np.__version__}')
print(f'SciPy {scipy.__version__}')
print(f'pandas: {pd.__version__}')
print(f'TensorFlow {tf.__version__}')
print(f'PyTorch: {torch.__version__}')
print(f'scikit-learn: {sklearn.__version__}') --- NumPy: 1.19.2
SciPy 1.5.3
pandas: 1.1.3
TensorFlow 2.3.1
PyTorch: 1.6.0
scikit-learn: 0.23.2
Let's make some arrays!
Convert a list with array
You can convert a list into a NumPy array with the array function. First, let's make a list of tree heights that we can use in our example.
tree_heights = [55, 60, 62, 44]
Now let's convert our list into a NumPy array so we can use all of NumPy's ndarray method goodness with it.
np.array(tree_heights) --- array([55, 60, 62, 44])
Boom!
Passing array a list of lists (or, as here, a list of tuples) will make a two-dimensional NumPy array.
np.array(list(enumerate(tree_heights))) --- array([[ 0, 55],
[ 1, 60],
[ 2, 62],
[ 3, 44]])
Cool!
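Here's a small sketch that checks the dimensions of the two arrays above and shows the optional dtype argument. The variable names are mine, chosen just for illustration.

```python
import numpy as np

tree_heights = [55, 60, 62, 44]

# One flat list gives a 1-dimensional array.
heights = np.array(tree_heights)
print(heights.ndim, heights.shape)

# A list of (index, height) pairs gives a 2-dimensional array: 4 rows, 2 columns.
indexed = np.array(list(enumerate(tree_heights)))
print(indexed.ndim, indexed.shape)

# dtype is optional; here it forces floats instead of the inferred integers.
heights_float = np.array(tree_heights, dtype=float)
print(heights_float.dtype)
```

If you leave dtype out, NumPy infers it from the values in the list.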
Next let's see ways to create and pre-fill arrays with a range of values.
Equally spaced data
NumPy has several helpful functions for creating arrays filled with values spaced at intervals.
arange
arange is the equivalent of vanilla Python's range, but for NumPy arrays.
np.arange(7) --- array([0, 1, 2, 3, 4, 5, 6])
Pass one integer and you get a 1-dimensional array of integers starting at 0 and running up to, but not including, the integer passed.
Pass two integer arguments and you get the starting value through the final value, where the final value is exclusive. Here's an example with the keywords specified:
np.arange(start=1, stop=7) --- array([1, 2, 3, 4, 5, 6])
Pass step, the third positional argument, to skip values. The default step is 1.
np.arange(start=1, stop=7, step=2) --- array([1, 3, 5])
Careful with the spelling! Array has two rs, but think of arange as a range. ⚠️
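To make the comparison with range concrete, here's a minimal sketch. The float-step part is an extra I'm adding for illustration: arange accepts non-integer steps, which range does not.

```python
import numpy as np

# With integer arguments, arange yields the same values as range.
evens = np.arange(start=0, stop=10, step=2)
assert evens.tolist() == list(range(0, 10, 2))

# Unlike range, arange also accepts float steps. Rounding can make the
# length hard to predict, so linspace is usually safer for decimals.
halves = np.arange(0, 2, 0.5)
print(halves)
```

The stop value is still excluded, so halves ends at 1.5.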
linspace
When you want to create an array of evenly spaced decimal values, use linspace.
linspace splits a pie into evenly-sized pieces.
You need to pass linspace the start and stop values.
np.linspace(10, 50) --- array([10. , 10.81632653, 11.63265306, 12.44897959, 13.26530612,
14.08163265, 14.89795918, 15.71428571, 16.53061224, 17.34693878,
18.16326531, 18.97959184, 19.79591837, 20.6122449 , 21.42857143,
22.24489796, 23.06122449, 23.87755102, 24.69387755, 25.51020408,
26.32653061, 27.14285714, 27.95918367, 28.7755102 , 29.59183673,
30.40816327, 31.2244898 , 32.04081633, 32.85714286, 33.67346939,
34.48979592, 35.30612245, 36.12244898, 36.93877551, 37.75510204,
38.57142857, 39.3877551 , 40.20408163, 41.02040816, 41.83673469,
42.65306122, 43.46938776, 44.28571429, 45.10204082, 45.91836735,
46.73469388, 47.55102041, 48.36734694, 49.18367347, 50. ])
By default you get 50 values, with both the start and the stop included. You can change the count by passing a third argument.
np.linspace(10, 50, 5) --- array([10., 20., 30., 40., 50.])
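Two optional keyword arguments are worth knowing about, endpoint and retstep. The sketch below uses the same start and stop values as above; the variable names are just for illustration.

```python
import numpy as np

# Five points from 10 to 50, stop included (the default).
closed = np.linspace(10, 50, num=5)

# endpoint=False leaves the stop value out, so the step shrinks to 8.
half_open = np.linspace(10, 50, num=5, endpoint=False)

# retstep=True also returns the spacing between neighbouring values.
values, step = np.linspace(10, 50, num=5, retstep=True)
print(closed, half_open, step)
```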
logspace
logspace returns evenly spaced numbers on a logarithmic scale with a base of 10 by default.
np.logspace(1, 2) --- array([ 10. , 10.48113134, 10.98541142, 11.51395399,
12.06792641, 12.64855217, 13.25711366, 13.89495494,
14.56348478, 15.26417967, 15.9985872 , 16.76832937,
17.57510625, 18.42069969, 19.30697729, 20.23589648,
21.20950888, 22.22996483, 23.29951811, 24.42053095,
25.59547923, 26.82695795, 28.11768698, 29.47051703,
30.88843596, 32.37457543, 33.93221772, 35.56480306,
37.2759372 , 39.06939937, 40.94915062, 42.9193426 ,
44.98432669, 47.14866363, 49.41713361, 51.79474679,
54.28675439, 56.89866029, 59.63623317, 62.50551925,
65.51285569, 68.6648845 , 71.9685673 , 75.43120063,
79.06043211, 82.86427729, 86.85113738, 91.0298178 ,
95.40954763, 100. ])
Passing logspace 1 and 2 for the start and stop values returns an array of 50 values ranging from 10 to 100, because start and stop are exponents of the base: 10^1 = 10 and 10^2 = 100.
Here's how to create an array of 10 values running from 10^0 = 1 to 10^3 = 1000, equally spaced on the base 10 log scale.
np.logspace(start=0, stop=3, num=10) --- array([ 1. , 2.15443469, 4.64158883, 10. ,
21.5443469 , 46.41588834, 100. , 215.443469 ,
464.15888336, 1000. ])
logspace can be handy when hyperparameter tuning in scikit-learn.
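One way to think about logspace is as 10 raised to the values linspace would give you. Here's a rough sketch of that relationship, plus a hypothetical grid of regularization strengths of the kind you might search over. The name candidate_strengths is mine, not a scikit-learn parameter.

```python
import numpy as np

# logspace(start, stop) is 10 raised to the matching linspace exponents.
exponents = np.linspace(0, 3, num=4)
grid = np.logspace(0, 3, num=4)
assert np.allclose(grid, 10.0 ** exponents)

# A hypothetical grid of regularization strengths spanning six orders of magnitude.
candidate_strengths = np.logspace(-3, 3, num=7)
print(candidate_strengths)
```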
Fill with a constant value
You'll often want an array filled with zeros, ones, or some other value.
zeros
Create and fill an array with zeros by using the zeros function.
np.zeros(5) --- array([0., 0., 0., 0., 0.])
No e after the o!
Note that you get back floats for the dtype.
Pass a tuple with the number of rows followed by the number of columns to make a two dimensional array.
six_zeros = np.zeros(shape=(2, 3))
six_zeros --- array([[0., 0., 0.],
[0., 0., 0.]])
This time we used the keyword argument for demonstration purposes. We saved the array as a variable because we'll use it in a minute.
ones
ones behaves similarly to zeros.
np.ones((3, 5)) --- array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
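If floats aren't what you want, both zeros and ones accept a dtype argument. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

# The default dtype is float64; pass dtype to get something else.
int_zeros = np.zeros(shape=(2, 3), dtype=int)
bool_ones = np.ones(4, dtype=bool)

print(int_zeros.dtype, bool_ones.dtype)
print(bool_ones)
```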
full
Fill an array with any value by using full.
np.full(shape=6, fill_value='Winner, winner, chicken dinner!') --- array(['Winner, winner, chicken dinner!',
'Winner, winner, chicken dinner!',
'Winner, winner, chicken dinner!',
'Winner, winner, chicken dinner!',
'Winner, winner, chicken dinner!',
'Winner, winner, chicken dinner!'], dtype='<U31')
Like zeros and ones, pass a tuple as the first argument if you want a multi-dimensional array.
np.full((3, 2), "I'm in a 2d array!") --- array([["I'm in a 2d array!", "I'm in a 2d array!"],
["I'm in a 2d array!", "I'm in a 2d array!"],
["I'm in a 2d array!", "I'm in a 2d array!"]], dtype='<U18')
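The dtype='<U31' and dtype='<U18' in the outputs above come from full inferring the dtype from the fill value when you don't pass one. Here's a small sketch of that behavior with values I picked for illustration:

```python
import numpy as np

# With no dtype argument, full infers the dtype from fill_value.
int_sevens = np.full((2, 2), 7)       # an integer dtype
float_sevens = np.full((2, 2), 7.0)   # float64
text = np.full(3, "hi")               # fixed-width unicode, 2 characters wide

print(int_sevens.dtype, float_sevens.dtype, text.dtype)
```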
Now let's see how to fill an array with a value AND match the shape and dtype of another array.
zeros_like
When you want to create an array that matches the shape of another array, append _like to zeros, ones, or full. Then you're in business!
Let's see how to use zeros_like with the six_zeros array we saved earlier. It was a 2x3 array.
np.zeros_like(six_zeros) --- array([[0., 0., 0.],
[0., 0., 0.]])
ones_like
ones_like behaves similarly.
np.ones_like(six_zeros) --- array([[1., 1., 1.],
[1., 1., 1.]])
full_like
full_like acts how you'd expect.
np.full_like(six_zeros, 22) --- array([[22., 22., 22.],
[22., 22., 22.]])
Note that you get back an array of the same shape AND dtype as the array or list you pass in.
np.full_like([1, 2, 65, 3], fill_value=22) --- array([22, 22, 22, 22])
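Matching the dtype has a consequence that's easy to miss: a fill value that doesn't fit the template's dtype gets cast to it. A minimal sketch, using a fill value of 2.7 that I chose to make the effect visible:

```python
import numpy as np

int_template = [1, 2, 65, 3]

# The template holds integers, so the fill value 2.7 is cast to an integer.
truncated = np.full_like(int_template, fill_value=2.7)

# Passing dtype overrides the template's dtype and keeps the fraction.
kept = np.full_like(int_template, fill_value=2.7, dtype=float)

print(truncated, kept)
```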
Random data
It's often useful to fill a NumPy array with randomly distributed data. Let's see how to do that.
random.randn
Create an array filled with random floats drawn from the standard normal distribution with random.randn.
np.random.randn(10) --- array([ 0.48141843, 0.33463071, 0.37107953, -1.16044437, -1.15956598,
1.24637982, -0.21480563, 1.61006107, -0.88036176, -0.52745888])
Strangely, unlike the functions above, randn doesn't take a tuple to indicate the shape. You pass each dimension as a separate argument: the rows first and the columns second. ⚠️
np.random.randn(5, 2) --- array([[ 1.13164593, -0.35241179],
[-1.81246707, 0.76773381],
[ 1.13485416, -0.50449109],
[-0.32666705, -0.90184535],
[ 1.38867755, -1.08018813]])
If you want to pass a tuple to indicate the shape, you have to use random.standard_normal. It does the same thing as random.randn. This is a bit of a historical artifact that doesn't really keep with the Zen of Python's one obvious way to do it, but it is what it is. I generally just use randn and then get error messages.
np.random.standard_normal((4, 5)) --- array([[ 0.22812825, 0.60446763, -0.42118075, 1.79680568, -0.33793378],
[ 0.07964594, -0.39447251, 0.60948288, -0.03175253, -0.30030963],
[-0.37746859, -0.33789088, -2.30195465, 0.19532716, -1.74321666],
[-0.40882198, 0.08589203, -1.29910817, 0.64159252, -2.13985143]])
If you want reproducible results, set the random seed like this:
np.random.seed(123)
Note: NumPy now has a newer, more complicated way to make random numbers, the Generator API created with np.random.default_rng, that it officially recommends in the docs. I expect most folks will stick with the random functions I use here because they are well known, require less code, and work fine for most situations.
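For reference, here's a minimal sketch of the newer approach with np.random.default_rng. The seed 123 just mirrors the one above, and the variable names are mine.

```python
import numpy as np

# A seeded Generator; 123 is just the seed used earlier in this article.
rng = np.random.default_rng(123)

# Generator methods take the shape as a tuple, like zeros and ones do.
normal_draws = rng.standard_normal((4, 5))   # standard normal
uniform_draws = rng.random((2, 3))           # uniform on [0, 1)

# Re-creating the Generator with the same seed reproduces the draws.
repeat = np.random.default_rng(123).standard_normal((4, 5))
assert np.array_equal(normal_draws, repeat)
```

The numbers won't match what np.random.seed(123) followed by randn gives you, because the two approaches use different underlying generators.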
random.rand
Create an array filled with random floats drawn from a uniform distribution between 0 and 1 with random.rand.
np.random.rand(10) --- array([0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897,
0.42310646, 0.9807642 , 0.68482974, 0.4809319 , 0.39211752])
random.rand is like random.randn in the sense that it takes the dimensions directly as arguments, and not as a tuple.
np.random.rand(2, 3) --- array([[0.18249173, 0.17545176, 0.53155137],
[0.53182759, 0.63440096, 0.84943179]])
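If you need uniform values over a range other than 0 to 1, you can stretch and shift the output of rand, or use random.uniform. A rough sketch, with bounds I made up for illustration:

```python
import numpy as np

np.random.seed(123)  # same seeding approach as above

low, high = 5.0, 10.0  # hypothetical bounds for illustration

# Stretch and shift draws from [0, 1) to cover [low, high).
scaled = low + (high - low) * np.random.rand(2, 3)

# random.uniform does the same in one call and takes the shape as a tuple.
direct = np.random.uniform(low, high, size=(2, 3))

print(scaled)
print(direct)
```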
Alright, you've seen how to use NumPy to make arrays. These aren't the only functions for making arrays, but they should cover over 95% of use cases.
Convert from another library
Often you'll be using another library and have a data structure that you want to convert into a NumPy array for processing. Let's see how to do that with pandas.
pandas
pandas is a very popular library for data manipulation. It is built on top of NumPy. Let's make a pandas DataFrame of scores and turn it into a NumPy array.
df_scores = pd.DataFrame(dict(age=[22, 44, 67], score=[5, 6, 8]))
df_scores
The to_numpy method is the officially recommended way to convert a pandas DataFrame or Series into a NumPy array. It was introduced in version 0.24.0.
df_scores.to_numpy() --- array([[22, 5],
[44, 6],
[67, 8]])
The values attribute works, too, but is no longer recommended.
Now let's check out TensorFlow.
TensorFlow
TensorFlow is a very popular deep learning framework. Converting a TensorFlow TensorProto to a NumPy array takes a few steps. (An eager tf.Tensor can also be converted directly with its numpy method.)
First let's make a TensorFlow tensor object.
tf_tensor = tf.constant([[3,15,2],[55,5,6]])
tf_tensor --- <tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[ 3, 15, 2],
[55, 5, 6]], dtype=int32)>
Then let's make it into a proto_tensor.
proto_tensor = tf.make_tensor_proto(tf_tensor)
proto_tensor --- dtype: DT_INT32
tensor_shape {
dim {
size: 2
}
dim {
size: 3
}
}
tensor_content: "\003\000\000\000\017\000\000\000\002\000\000\0007\000\000\000\005\000\000\000\006\000\000\000"
Now we can convert it into a NumPy array.
tf.make_ndarray(proto_tensor) --- array([[ 3, 15, 2],
[55, 5, 6]], dtype=int32)
This kind of fun is why PyTorch is becoming more and more popular.
See my article on PyTorch vs. TensorFlow popularity here.
PyTorch
PyTorch is the other large deep learning framework. It's a bit more Pythonic than TensorFlow.
Let's make a PyTorch tensor.
my_pytorch_tensor = torch.ones(5)
my_pytorch_tensor --- tensor([1., 1., 1., 1., 1.])
And let's convert it into a NumPy array.
my_pytorch_tensor.numpy() --- array([1., 1., 1., 1., 1.], dtype=float32)
That was refreshingly straightforward.
PyTorch is closely interoperable with NumPy. Note that, according to the docs, after converting between Torch tensors and NumPy arrays they "will share their underlying memory locations (if the Torch Tensor is on CPU), and changing one will change the other." ⚠️
SciPy
SciPy sparse matrices are very efficient for storing data that is filled with mostly 0s (or some other single value). For example, after one-hot encoding an array in scikit-learn, the resulting data structure is a SciPy sparse matrix.
Sometimes you'll want to convert a SciPy sparse matrix into a NumPy array so you can inspect it or do a certain operation on it.
We imported OneHotEncoder from scikit-learn earlier, so let's use that now to create a sparse matrix.
ohe = OneHotEncoder()
dummified_scores = ohe.fit_transform(df_scores)
dummified_scores --- <3x6 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
The low memory usage is great, but let's say I want to see the resulting data. I could pass the argument sparse=False at instantiation to get a NumPy array back (scikit-learn 1.2 and later call this argument sparse_output). Or I can just use toarray.
dummified_scores.toarray() --- array([[1., 0., 0., 1., 0., 0.],
[0., 1., 0., 0., 1., 0.],
[0., 0., 1., 0., 0., 1.]])
Note that pandas uses to_numpy while SciPy uses toarray. This frequently trips me up. ⚠️
Summary
You've seen how to create NumPy arrays filled with the data you want. You've also seen how to convert other Python data structures into NumPy arrays. Now you're ready to manipulate arrays in NumPy! Awesome!
I hope you found this tour of creating NumPy arrays useful. If you did, please share it on your favorite social media so other folks can find it, too.
NumPy underlies much of the open source scientific computing revolution. Special thanks to Travis Oliphant and the other developers of NumPy who helped make this valuable tool what it is today.
I originally published this article for Deepnote here. You can run the article as a notebook there.
I write about data science, Python, SQL, and other tech topics. If any of that's of interest to you, sign up for my mailing list of awesome data science resources and read more to help you grow your skills here.
Happy array creating!
Source: https://towardsdatascience.com/the-ten-best-ways-to-create-numpy-arrays-8b1029a972a7
Posted by: williamsalannow.blogspot.com