How To Create Ndarray Python

The 10 Best Ways to Create NumPy Arrays

NumPy is a library that every data scientist who uses Python should be familiar with. It is the backbone on which the modern Python data science stack is built.

The library is often picked up in pieces along your learning journey. Eventually, it makes sense to learn the key parts of the library systematically. As a first step, you need to know how to quickly create NumPy arrays to meet your needs. In this article I'll show you the functions and methods to make NumPy arrays in a snap. 😀

Note: I originally published this article for Deepnote here. You can fork, run, and extend it there. 🚀

NumPy array objects are technically of the class numpy.ndarray. I'll refer to them as arrays below.

Without further ado, here are the essential ways to make a NumPy array:

Convert a list

Convert a list with array.

Create and fill a NumPy array with…

equally spaced data with arange, linspace, or logspace.
the same value with zeros, ones, or full.
the same value that matches the shape and dtype of a pre-existing array with zeros_like, ones_like, or full_like.
random floats drawn from the standard normal distribution with random.randn.
random floats drawn from a uniform distribution with random.rand.

Convert from a…

pandas DataFrame with df.to_numpy.
TensorFlow TensorProto with tf.make_ndarray(existing_proto_tensor).
PyTorch Tensor with existing_tensor.numpy.
SciPy sparse matrix with existing_sparse_matrix.toarray.

Let's see these in action! 🚀

First, import the libraries we'll need under their usual aliases. The code in this article is meant to be run in a Jupyter notebook. The expected output follows ---.

          import numpy as np
import scipy            
import pandas as pd
import tensorflow as tf
import torch
import sklearn
from sklearn.preprocessing import OneHotEncoder

If you don't have the libraries you need installed, run the code below and then run the imports again.

          !pip install -U numpy scipy pandas tensorflow torch scikit-learn

Let's check our package versions.

          print(f'NumPy: {np.__version__}')
print(f'SciPy {scipy.__version__}')
print(f'pandas: {pd.__version__}')
print(f'TensorFlow {tf.__version__}')
print(f'PyTorch: {torch.__version__}')
print(f'scikit-learn: {sklearn.__version__}')          ---          NumPy: 1.19.2
SciPy 1.5.3
pandas: 1.1.3
TensorFlow 2.3.1
PyTorch: 1.6.0
scikit-learn: 0.23.2

Let's make some arrays!

Convert a list with `array`

You can convert a list into a NumPy array with the array constructor. First, let's make a list of tree heights that we can use in our example. 🌲

          tree_heights = [55, 60, 62, 44]

Now let's convert our list into a NumPy array so we can use all of NumPy's ndarray method goodness with it.

          np.array(tree_heights)          ---          array([55, 60, 62, 44])

Boom! 🧨

Passing array a list of lists (or, as here, a list of tuples) will make a two-dimensional NumPy array.

          np.array(list(enumerate(tree_heights)))          ---          array([[ 0, 55],
            [ 1, 60],
            [ 2, 62],
            [ 3, 44]])
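
Here's a minimal sketch of what array gives you back, using the same tree heights. The variable names heights, pairs, and heights_float are mine, added for illustration.

```python
import numpy as np

tree_heights = [55, 60, 62, 44]

# A flat list becomes a 1-dimensional ndarray.
heights = np.array(tree_heights)
print(type(heights).__name__, heights.ndim, heights.shape)  # ndarray 1 (4,)

# A list of equal-length lists becomes a 2-dimensional array (rows x columns).
pairs = np.array([[0, 55], [1, 60], [2, 62], [3, 44]])
print(pairs.ndim, pairs.shape)  # 2 (4, 2)

# The optional dtype argument overrides the type NumPy would infer.
heights_float = np.array(tree_heights, dtype=float)
print(heights_float.dtype)  # float64
```

The ndim and shape attributes are a quick way to confirm you got the dimensions you expected.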

Cool! 😎

Next let's see ways to create and pre-fill arrays with a range of values.

Equally spaced data

NumPy has several helpful functions for creating arrays filled with values spaced at intervals.

arange

teton mountain range — A range. Source: pixabay.com

arange is the equivalent of vanilla Python's range, but for NumPy arrays.

          np.arange(7)          ---          array([0, 1, 2, 3, 4, 5, 6])

Pass one integer and you get a 1-dimensional array of integers starting at 0 and running up to, but not including, the integer passed.

Pass two integer arguments and you get the starting value through the final value, where the final value is exclusive. Here's an example with the keywords specified:

          np.arange(start=1, stop=7)          ---          array([1, 2, 3, 4, 5, 6])

Pass step — the third positional argument — to skip values. The default step is 1.

          np.arange(start=1, stop=7, step=2)          ---          array([1, 3, 5])
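
To make the comparison with range concrete, here's a short sketch. The names countdown and halves are just for illustration.

```python
import numpy as np

# arange follows the same start/stop/step rules as range, but returns an array.
assert np.arange(1, 7, 2).tolist() == list(range(1, 7, 2))

# A negative step counts down; the stop value is still excluded.
countdown = np.arange(10, 0, -2)
print(countdown)  # [10  8  6  4  2]

# Unlike range, arange also accepts a non-integer step.
halves = np.arange(0, 2, 0.5)
print(halves)  # [0.  0.5 1.  1.5]
```

With non-integer steps, floating point rounding can occasionally change the number of elements you get back, so it's worth checking the length when that matters.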

Careful with the spelling! Array has two rs, but think of arange as a range. ⚠️

linspace

When you want to create an array of evenly spaced decimal values, use linspace.

linspace splits a pie into evenly-sized pieces. 🥧

You need to pass linspace the start and stop values.

          np.linspace(10, 50)          ---          array([10.        , 10.81632653, 11.63265306, 12.44897959, 13.26530612,
            14.08163265, 14.89795918, 15.71428571, 16.53061224, 17.34693878,
            18.16326531, 18.97959184, 19.79591837, 20.6122449 , 21.42857143,
            22.24489796, 23.06122449, 23.87755102, 24.69387755, 25.51020408,
            26.32653061, 27.14285714, 27.95918367, 28.7755102 , 29.59183673,
            30.40816327, 31.2244898 , 32.04081633, 32.85714286, 33.67346939,
            34.48979592, 35.30612245, 36.12244898, 36.93877551, 37.75510204,
            38.57142857, 39.3877551 , 40.20408163, 41.02040816, 41.83673469,
            42.65306122, 43.46938776, 44.28571429, 45.10204082, 45.91836735,
            46.73469388, 47.55102041, 48.36734694, 49.18367347, 50.        ])

By default you get 50 slices of pie. But you can change that by passing a third argument.

          np.linspace(10, 50, 5)          ---          array([10., 20., 30., 40., 50.])
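
Two optional linspace arguments are worth knowing about: endpoint and retstep. Here's a minimal sketch; the variable names are mine.

```python
import numpy as np

# endpoint=False leaves out the stop value, much like arange does.
open_ended = np.linspace(10, 50, 5, endpoint=False)
print(open_ended)  # [10. 18. 26. 34. 42.]

# retstep=True returns the array plus the spacing between neighboring values.
values, step = np.linspace(10, 50, 5, retstep=True)
print(step)  # 10.0
```

Notice how leaving out the endpoint changes the spacing from 10 to 8, because the same interval is now divided into five steps instead of four.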

logspace

logspace returns evenly spaced numbers on a logarithmic scale with a base of 10 by default.

          np.logspace(1, 2)          ---          array([ 10.        ,  10.48113134,  10.98541142,  11.51395399,
            12.06792641,  12.64855217,  13.25711366,  13.89495494,
            14.56348478,  15.26417967,  15.9985872 ,  16.76832937,
            17.57510625,  18.42069969,  19.30697729,  20.23589648,
            21.20950888,  22.22996483,  23.29951811,  24.42053095,
            25.59547923,  26.82695795,  28.11768698,  29.47051703,
            30.88843596,  32.37457543,  33.93221772,  35.56480306,
            37.2759372 ,  39.06939937,  40.94915062,  42.9193426 ,
            44.98432669,  47.14866363,  49.41713361,  51.79474679,
            54.28675439,  56.89866029,  59.63623317,  62.50551925,
            65.51285569,  68.6648845 ,  71.9685673 ,  75.43120063,
            79.06043211,  82.86427729,  86.85113738,  91.0298178 ,
            95.40954763, 100.        ])

The start and stop values are exponents. Passing logspace 1 and 2 returns an array of 50 values ranging from 10**1 = 10 to 10**2 = 100, evenly spaced on the base 10 log scale.

Here's how to create an array of 10 values running from 10**0 = 1 to 10**3 = 1000, equally spaced on the base 10 log scale.

          np.logspace(start=0, stop=3, num=10)          ---          array([   1.        ,    2.15443469,    4.64158883,   10.        ,
            21.5443469 ,   46.41588834,  100.        ,  215.443469  ,
            464.15888336, 1000.        ])

logspace can be handy when hyperparameter tuning in scikit-learn. 👍
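
One way to think about logspace is as linspace applied to the exponents. The sketch below checks that and shows the base argument. The candidate_alphas grid is a hypothetical example of values you might try for a regularization parameter, not something tied to a particular scikit-learn model.

```python
import numpy as np

# logspace(0, 3, 10) is the same as raising 10 to ten evenly spaced exponents.
grid = np.logspace(start=0, stop=3, num=10)
assert np.allclose(grid, 10 ** np.linspace(0, 3, 10))

# The base argument changes the base of the scale.
powers_of_two = np.logspace(0, 4, num=5, base=2)
print(powers_of_two)  # [ 1.  2.  4.  8. 16.]

# Hypothetical hyperparameter grid: 0.001, 0.01, 0.1, 1, 10, 100.
candidate_alphas = np.logspace(-3, 2, num=6)
```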

Fill with a constant value

You'll often want an array filled with zeros, ones, or some other value.

zeros

Create and fill an array with zeros by using the zeros function.

          np.zeros(5)          ---          array([0., 0., 0., 0., 0.])

No e after the o! 😀

Note that you get back floats for the dtype.

Pass a tuple with the number of rows followed by the number of columns to make a two-dimensional array.

          six_zeros = np.zeros(shape=(2, 3))
six_zeros          ---          array([[0., 0., 0.],
            [0., 0., 0.]])

This time we used the keyword argument for demonstration purposes. We saved the array as a variable because we'll use it in a minute.
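
If you don't want the default float dtype, zeros accepts a dtype argument (ones does, too). A minimal sketch, with illustrative variable names:

```python
import numpy as np

# dtype overrides the float64 default.
int_zeros = np.zeros(5, dtype=int)
print(int_zeros)  # [0 0 0 0 0]

# The same argument works together with a shape tuple.
bool_grid = np.zeros((2, 3), dtype=bool)
print(bool_grid.shape, bool_grid.dtype)  # (2, 3) bool
```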

ones

ones behaves similarly to zeros.

          np.ones((3, 5))          ---          array([[1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1.]])

full

Fill an array with any value by using full.

          np.full(shape=6, fill_value='Winner, winner, chicken dinner!')          ---          array(['Winner, winner, chicken dinner!',
            'Winner, winner, chicken dinner!',
            'Winner, winner, chicken dinner!',
            'Winner, winner, chicken dinner!',
            'Winner, winner, chicken dinner!',
            'Winner, winner, chicken dinner!'], dtype='<U31')

Like zeros and ones, pass a tuple as the first argument if you want a multi-dimensional array.

          np.full((3, 2), "I'm in a 2d array!")          ---          array([["I'm in a 2d array!", "I'm in a 2d array!"],
            ["I'm in a 2d array!", "I'm in a 2d array!"],
            ["I'm in a 2d array!", "I'm in a 2d array!"]], dtype='<U18')

Now let's see how to fill an array with a value AND match the shape and dtype of another array.

zeros_like

When you want to create an array that matches the shape of another array, append _like to zeros, ones, or full. Then you're in business! 💸

Let's see how to use zeros_like with the six_zeros array we saved earlier. It was a 2x3 array.

          np.zeros_like(six_zeros)          ---          array([[0., 0., 0.],
            [0., 0., 0.]])

ones_like

ones_like behaves similarly.

          np.ones_like(six_zeros)          ---          array([[1., 1., 1.],
            [1., 1., 1.]])

full_like

full_like acts how you'd expect. 😉

          np.full_like(six_zeros, 22)          ---          array([[22., 22., 22.],
            [22., 22., 22.]])

Note that you get back an array of the same shape AND dtype as the array or list you pass in.

          np.full_like([1, 2, 65, 3], fill_value=22)          ---          array([22, 22, 22, 22])
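
Matching the dtype has a consequence that's easy to miss: the fill value is cast to the template's dtype. Here's a small sketch with an integer template; the names int_template, truncated, and kept are mine.

```python
import numpy as np

int_template = np.array([1, 2, 65, 3])

# The fill value is cast to the template's integer dtype, so the .7 is dropped.
truncated = np.full_like(int_template, 22.7)
print(truncated)  # [22 22 22 22]

# Passing dtype keeps the shape but overrides the element type.
kept = np.full_like(int_template, 22.7, dtype=float)
print(kept)  # [22.7 22.7 22.7 22.7]
```

The same dtype argument is available on zeros_like and ones_like.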

Random data

dice falling — Random dice roll. Source: pixabay.com

It's often useful to fill a NumPy array with randomly distributed data. Let's see how to do that.

random.randn

Create an array filled with random floats drawn from the standard normal distribution with random.randn.

          np.random.randn(10)          ---          array([ 0.48141843,  0.33463071,  0.37107953, -1.16044437, -1.15956598,
            1.24637982, -0.21480563,  1.61006107, -0.88036176, -0.52745888])

Strangely, unlike the functions above, you don't pass a tuple to indicate the shape here. Instead, pass the number of rows as the first argument and the number of columns as the second. ⚠️

          np.random.randn(5, 2)          ---          array([[ 1.13164593, -0.35241179],
            [-1.81246707,  0.76773381],
            [ 1.13485416, -0.50449109],
            [-0.32666705, -0.90184535],
            [ 1.38867755, -1.08018813]])

If you want to pass a tuple to indicate the shape, you have to use random.standard_normal. It does the same thing as random.randn. This is a bit of a historical artifact that doesn't really keep with the Zen of Python's one obvious way to do it, but it is what it is. I generally just use randn and then get error messages. 🙃

          np.random.standard_normal((4, 5))          ---          array([[ 0.22812825,  0.60446763, -0.42118075,  1.79680568, -0.33793378],
            [ 0.07964594, -0.39447251,  0.60948288, -0.03175253, -0.30030963],
            [-0.37746859, -0.33789088, -2.30195465,  0.19532716, -1.74321666],
            [-0.40882198,  0.08589203, -1.29910817,  0.64159252, -2.13985143]])

If you want reproducible results, set the random seed like this:

          np.random.seed(123)
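
Here's a quick sketch showing the seed in action. Resetting the seed replays the same stream of numbers, which also lets us check that randn and standard_normal draw the same values. The variable names are for illustration only.

```python
import numpy as np

np.random.seed(123)
from_randn = np.random.randn(2, 3)  # dimensions as separate arguments

np.random.seed(123)  # reset the seed to replay the same stream
from_standard_normal = np.random.standard_normal((2, 3))  # shape as a tuple

# Same seed, same distribution, same shape: the arrays match exactly.
assert np.array_equal(from_randn, from_standard_normal)
```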

Note, NumPy now has a new, more complicated way to make random numbers that it officially recommends in the docs. I expect most folks will stick with the random functions I use here because they are well known, require less code, and work fine for most situations.
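
For reference, here's a minimal sketch of that newer approach. It assumes NumPy 1.17 or later, where np.random.default_rng is available. The variable names are mine.

```python
import numpy as np

# Create a Generator with its own seed instead of setting global state.
rng = np.random.default_rng(seed=123)

# Generator methods take the shape as a tuple.
normal_draws = rng.standard_normal((4, 5))  # standard normal floats
uniform_draws = rng.random((2, 3))          # uniform floats in [0, 1)
dice = rng.integers(1, 7, size=10)          # integers from 1 to 6 inclusive
```

The seed lives on the rng object, so two generators created with the same seed produce the same numbers without touching each other.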

rand

Create an array filled with random floats drawn from a uniform distribution between 0 and 1 with random.rand.

          np.random.rand(10)          ---          array([0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897,
            0.42310646, 0.9807642 , 0.68482974, 0.4809319 , 0.39211752])

random.rand is like random.randn in the sense that it takes the dimensions directly as arguments, and not as a tuple.

          np.random.rand(2, 3)          ---          array([[0.18249173, 0.17545176, 0.53155137],
            [0.53182759, 0.63440096, 0.84943179]])
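
If you need uniform values over a range other than 0 to 1, you can rescale the output of rand or use random.uniform, which takes the bounds and a size tuple. A small sketch, with low and high chosen arbitrarily:

```python
import numpy as np

np.random.seed(123)
low, high = 5, 10  # arbitrary bounds for illustration

# Rescale draws from [0, 1) to [low, high).
scaled = low + (high - low) * np.random.rand(2, 3)

# random.uniform does the same job and takes the shape as a tuple.
direct = np.random.uniform(low, high, size=(2, 3))
```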

Alright, you've seen how to use NumPy to make arrays. These aren't the only functions for making arrays, but they should cover over 95% of use cases. 🚀

Convert from another library

Often you'll be using another library and have a data structure that you want to convert into a NumPy array for processing. Let's see how to do that with pandas.

pandas

pandas is a very popular library for data manipulation. It is built on top of NumPy. Let's make a pandas DataFrame of scores and turn it into a NumPy array. 🐼

          df_scores = pd.DataFrame(dict(age=[22, 44, 67], score=[5, 6, 8]))
df_scores

The to_numpy method is the officially recommended way to convert a pandas DataFrame or Series into a NumPy array. It was introduced in version 0.24.0.

          df_scores.to_numpy()          ---          array([[22,  5],
            [44,  6],
            [67,  8]])

The values attribute works, too, but is no longer recommended.
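
Here's a short sketch of a few to_numpy details, assuming pandas 0.24 or later. The mixed DataFrame is a made-up example to show what happens when column types differ.

```python
import pandas as pd

df_scores = pd.DataFrame(dict(age=[22, 44, 67], score=[5, 6, 8]))

# The optional dtype argument converts while exporting.
as_float = df_scores.to_numpy(dtype=float)
print(as_float.dtype)  # float64

# A single column (a Series) converts to a 1-dimensional array.
ages = df_scores["age"].to_numpy()
print(ages.shape)  # (3,)

# Columns of different types fall back to a common dtype, here object.
mixed = pd.DataFrame(dict(name=["a", "b"], score=[5, 6]))
print(mixed.to_numpy().dtype)  # object
```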

Now let's check out TensorFlow.

TensorFlow

waterfall — Water flow. Source: pixabay.com

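TensorFlow is a very popular deep learning framework. Converting a TensorFlow tensor to a NumPy array by way of a TensorProto takes a few steps. (In TensorFlow 2.x with eager execution, calling tf_tensor.numpy() on the tensor also works directly.)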

First let's make a TensorFlow tensor object.

          tf_tensor = tf.constant([[3,15,2],[55,5,6]])
tf_tensor          ---          <tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[ 3, 15,  2],
            [55,  5,  6]], dtype=int32)>

Then let's make it into a proto_tensor.

          proto_tensor = tf.make_tensor_proto(tf_tensor)
proto_tensor          ---          dtype: DT_INT32
tensor_shape {
            dim {
            size: 2
            }
            dim {
            size: 3
            }
}
tensor_content: "\003\000\000\000\017\000\000\000\002\000\000\0007\000\000\000\005\000\000\000\006\000\000\000"

Now we can convert it into a NumPy array.

          tf.make_ndarray(proto_tensor)          ---          array([[ 3, 15,  2],
            [55,  5,  6]], dtype=int32)

This kind of fun is why PyTorch is becoming more and more popular. 😉

See my article on PyTorch vs. TensorFlow popularity here.

PyTorch

blowtorch — Blowtorch. Source: pixabay.com

PyTorch is the other large deep learning framework. It's a bit more Pythonic than TensorFlow. 🐍

Let's make a PyTorch tensor.

          my_pytorch_tensor = torch.ones(5)
my_pytorch_tensor          ---          tensor([1., 1., 1., 1., 1.])

And let's convert it into a NumPy array.

          my_pytorch_tensor.numpy()          ---          array([1., 1., 1., 1., 1.], dtype=float32)

That was refreshingly straightforward. 😀

PyTorch is closely interoperable with NumPy. Note that after converting between Torch tensors and NumPy arrays they "will share their underlying memory locations (if the Torch Tensor is on CPU), and changing one will change the other." — the docs. ⚠️

SciPy

SciPy sparse matrices are very efficient for storing data that is filled with mostly 0s (or some other single value). For example, after one-hot encoding an array in scikit-learn, the resulting data structure is a SciPy sparse matrix.

Sometimes you'll want to convert a SciPy sparse matrix into a NumPy array so you can inspect it or do a certain operation on it.

We imported OneHotEncoder from scikit-learn earlier, so let's use that now to create a sparse matrix.

          ohe = OneHotEncoder()
dummified_scores = ohe.fit_transform(df_scores)
dummified_scores          ---          <3x6 sparse matrix of type '<class 'numpy.float64'>'
            with 6 stored elements in Compressed Sparse Row format>

The low memory usage is great, but let's say I want to see the resulting data. I could pass the argument sparse=False (renamed sparse_output=False in scikit-learn 1.2 and later) at instantiation to get a NumPy array back. Or I can just use toarray.

          dummified_scores.toarray()          ---          array([[1., 0., 0., 1., 0., 0.],
            [0., 1., 0., 0., 1., 0.],
            [0., 0., 1., 0., 0., 1.]])

Note that pandas uses to_numpy while SciPy uses toarray. This frequently trips me up. ⚠️
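
If you want to try toarray without scikit-learn, here's a minimal sketch that builds a small sparse matrix directly with scipy.sparse. The dense array is made up for illustration.

```python
import numpy as np
from scipy import sparse

# A mostly-zero array, made up for this example.
dense = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0]])

# Store it in Compressed Sparse Row format; only nonzero entries are kept.
sparse_scores = sparse.csr_matrix(dense)
print(sparse_scores.nnz)  # 2

# toarray returns a regular NumPy ndarray with the zeros filled back in.
round_trip = sparse_scores.toarray()
```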

Summary

You've seen how to create NumPy arrays filled with the data you want. You've also seen how to convert other Python data structures into NumPy arrays. Now you're ready to manipulate arrays in NumPy! Awesome! 🎉

I hope you found this tour of creating NumPy arrays useful. If you did, please share it on your favorite social media so other folks can find it, too. 👍

NumPy underlies much of the open source scientific computing revolution. Special thanks to Travis Oliphant and the other developers of NumPy who helped make this valuable tool what it is today. 🚀

I write about data science, Python, SQL, and other tech topics. If any of that's of interest to you, sign up for my mailing list of awesome data science resources and read more to help you grow your skills here. 😀

Happy array creating! 😀