
# Why we love Numpy

NumPy (<http://numpy.org>) is a module for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. The name is an acronym for "Numeric Python" or "Numerical Python". NumPy is an extension module that is mostly written in C. Because its mathematical and numerical functions are precompiled, they execute much faster than equivalent code written in pure Python.

NumPy enriches Python with powerful data structures that implement multi-dimensional arrays and matrices. These data structures make calculations with matrices and arrays efficient, even for the very large arrays found in "big data" work. Besides that, the module supplies a large library of high-level mathematical functions to operate on these matrices and arrays.

```python
import numpy as np
```
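To make the array idea concrete, here is a minimal sketch with made-up values: it creates a small 2-D array, inspects its shape and data type, and applies arithmetic and reductions to the whole array without writing a loop.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns; the values are arbitrary
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(a.shape)   # (2, 3)
print(a.ndim)    # 2
print(a.dtype)   # float64

# Arithmetic operators work element-wise on the whole array, without a Python loop
b = a * 2 + 1
print(b)         # [[ 3.  5.  7.]
                 #  [ 9. 11. 13.]]

# Reductions such as sum and mean run over the whole array or along one axis
print(a.sum())         # 21.0
print(a.mean(axis=0))  # [2.5 3.5 4.5]
```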

## Study case: the linear regression model error

The example below demonstrates NumPy and its matrix calculations. It computes the cost $$J(\theta)$$ (the error) of a linear regression model with the equation:

$$J(\theta) = \frac{1}{2m} \sum\_{i=1}^{m} ( h\_\theta(x^{(i)}) - y^{(i)} ) ^2$$

where $$h\_\theta(x) = \theta\_0 + \theta\_1x\_1 + \theta\_2x\_2 + \theta\_3x\_3 + ..\theta\_nx\_n$$
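For a single observation the hypothesis is a weighted sum. The sketch below uses hypothetical weights and feature values to show that the term-by-term sum equals a dot product once a leading 1 is placed in front of the features for $$\theta\_0$$; this is the same trick as the column of ones used later on this page.

```python
import numpy as np

# Hypothetical weights theta_0..theta_3 and one observation with features x_1..x_3
theta = np.array([0.5, 2.0, -1.0, 3.0])
x = np.array([4.0, 1.0, 2.0])

# Written out term by term: theta_0 + theta_1*x_1 + theta_2*x_2 + theta_3*x_3
h_terms = theta[0] + theta[1] * x[0] + theta[2] * x[1] + theta[3] * x[2]

# Same value as a dot product, after prepending x_0 = 1 for the intercept theta_0
x_with_one = np.concatenate(([1.0], x))
h_dot = np.dot(theta, x_with_one)

print(h_terms, h_dot)  # 13.5 13.5
```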

Here $$J(\theta)$$ is the total cost for the current weight values $$\theta$$, $$h\_\theta(x)$$ is the hypothesised value (the prediction), and $$y$$ is the actual value. The prediction $$h\_\theta(x^{(i)})$$ is calculated for each observation and compared with the actual value $$y^{(i)}$$. Squaring the difference (prediction minus actual) for every observation, summing these squares and dividing by $$2m$$ gives a single number that measures how well the model fits the data with the current weights $$\theta$$: the lower $$J(\theta)$$, the better the fit.
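A hand-sized example may help before moving to the large dataset. This sketch uses three made-up observations with one feature and hypothetical weights $$\theta\_0 = 0$$, $$\theta\_1 = 1$$, and follows the formula literally in plain Python:

```python
# Tiny made-up dataset: m = 3 observations with one feature x_1
x1 = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0]
theta0, theta1 = 0.0, 1.0  # hypothetical weights

m = len(y)
squared_errors = []
for i in range(m):
    h = theta0 + theta1 * x1[i]             # prediction h_theta(x^(i))
    squared_errors.append((h - y[i]) ** 2)  # squared difference with the actual value

J = sum(squared_errors) / (2 * m)           # sum of squares divided by 2m
print(squared_errors)  # [0.0, 0.0, 1.0]
print(J)               # 0.16666666666666666
```

Only the third observation is mispredicted (3 instead of 4), so the cost is $$1/(2 \cdot 3) \approx 0.167$$.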

We can compute this with a naive loop or with the matrix functions included in NumPy, the so-called vectorized implementation. To demonstrate the difference in performance, both solutions are provided below, and we measure the execution time of each with the `time` module.

```python
import time
```
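The code on this page times each implementation with `time.time()`. The standard library also offers `time.perf_counter()`, a clock intended for measuring elapsed intervals with higher resolution. A sketch of the same loop-versus-NumPy comparison on a simple sum, with an arbitrary array size (the printed timings will differ per machine):

```python
import time
import numpy as np

values = np.random.rand(200_000)  # arbitrary size for the comparison

# time.perf_counter() is a high-resolution clock meant for measuring durations
start = time.perf_counter()
total_loop = 0.0
for v in values:                  # plain Python loop over every element
    total_loop += v
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
total_vec = values.sum()          # the same sum, computed inside NumPy's compiled code
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds * 1000:.1f} ms, np.sum: {vec_seconds * 1000:.1f} ms")
print(np.isclose(total_loop, total_vec))  # True
```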

## Use NumPy to generate an $$m \times n$$ matrix, an $$m \times 1$$ vector and a $$1 \times n$$ vector

First a dataset is generated with the NumPy function `np.random.rand(m, n)`, which fills an array of the given shape with random numbers between 0 and 1. The dataset has $$m$$ observations (the rows). All columns except the last one (so $$n-1$$ columns) contain features; the last column contains the target variable. Next a vector with the weights (the $$\theta$$ vector) is generated. The last column is sliced into a vector $$y$$ and the feature columns are put into a matrix $$X$$. For the computation, a column of 1's is added as the first column of $$X$$, so that $$\theta\_0$$ is always multiplied by 1 and acts as the intercept.

$$X = \begin{bmatrix} 1 & x\_1^{(1)} & x\_2^{(1)} & \dots & x\_n^{(1)} \\ 1 & x\_1^{(2)} & x\_2^{(2)} & \dots & x\_n^{(2)} \\ 1 & x\_1^{(3)} & x\_2^{(3)} & \dots & x\_n^{(3)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x\_1^{(m)} & x\_2^{(m)} & \dots & x\_n^{(m)} \end{bmatrix}$$

$$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

$$\theta = \begin{bmatrix} \theta\_0 & \theta\_1 & \dots & \theta\_n \end{bmatrix}$$
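Adding the column of ones can be done in more than one way. The page uses `np.c_`; the sketch below, on a hypothetical 4 × 2 feature matrix, checks that `np.column_stack` and `np.hstack` give the same result:

```python
import numpy as np

m = 4
features = np.arange(8, dtype=float).reshape(m, 2)  # hypothetical 4 x 2 feature matrix

# Three equivalent ways to prepend a column of ones (for theta_0)
X_c = np.c_[np.ones(m), features]
X_column_stack = np.column_stack((np.ones(m), features))
X_hstack = np.hstack((np.ones((m, 1)), features))  # hstack needs the ones as a 2-D column

print(X_c.shape)  # (4, 3)
print(np.array_equal(X_c, X_column_stack), np.array_equal(X_c, X_hstack))  # True True
print(X_c[0])     # [1. 0. 1.]
```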

```python
num_features = 150
num_observaties = 50000
data = np.random.rand(num_observaties, num_features)  # generate dataset
theta = np.random.rand(1, num_features)               # generate vector containing the weights

m, n = data.shape
X = data[:, :n-1]         # all columns except the last one contain the features
y = data[:, [n-1]]        # last column is the target variable (list index keeps y 2-D)
X = np.c_[np.ones(m), X]  # add a first column of ones for the theta_0 computation

print("y", y.shape, "vector")
print("X", X.shape, "matrix")
print("𝜃", theta.shape, "vector")
print(f"There are {n - 1} features and {num_observaties} observations")
```

```
y (50000, 1) vector
X (50000, 150) matrix
𝜃 (1, 150) vector
There are 149 features and 50000 observations
```
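Note that `y` is sliced with a list, `data[:, [n-1]]`, rather than `data[:, n-1]`. The list index keeps `y` two-dimensional with shape $$(m, 1)$$, matching the predictions. With a 1-D `y` of shape $$(m,)$$, NumPy broadcasting would turn `h - y` into an $$m \times m$$ matrix without raising an error. A small sketch with random stand-in arrays:

```python
import numpy as np

data = np.random.rand(5, 3)     # small stand-in for the dataset above
n = data.shape[1]

y_2d = data[:, [n - 1]]         # list index keeps the column dimension
y_1d = data[:, n - 1]           # scalar index drops it
print(y_2d.shape, y_1d.shape)   # (5, 1) (5,)

h = np.random.rand(5, 1)        # predictions shaped like np.dot(X, theta.T)
print((h - y_2d).shape)         # (5, 1): one error per observation
print((h - y_1d).shape)         # (5, 5): broadcasting silently builds a matrix
```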

## Naive loop implementation

The naive loop implementation computes the error $$J$$ row by row. For each row $$i$$ it calculates the prediction $$h$$ with an inner for loop that multiplies each weight by its feature value, according to the equation $$h = \theta\_0 \times 1 + \theta\_1 \times x\_1 + \theta\_2 \times x\_2 + ... + \theta\_n \times x\_n$$. It then subtracts the actual value, $$h - y^{(i)}$$, to get the difference between the model value and the actual value. These differences are squared and summed, and the sum is divided by $$2m$$ to get the average error of the model.

```python
# naive implementation
print("Naive implementation")
start_time = int(round(time.time() * 1000))
J_val1 = 0
theta_nav = theta[0]  # drop the outer dimension: [[...]] -> [...]
for i in range(m):
    xi = X[i]
    prediction = 0
    for j in range(len(theta_nav)):
        prediction += theta_nav[j] * xi[j]  # weight theta_j times feature value x_j
    delta = (prediction - y[i]) ** 2        # squared difference of predicted and actual value
    J_val1 += delta                         # sum of squares (y[i] has shape (1,), so this is a 1-element array)

J_val_nav = J_val1 / (2 * m)                # sum of squares divided by 2m
end_time = int(round(time.time() * 1000))
print(f"Error: {J_val_nav}")
print(f"Execution time {end_time - start_time} millis")
```

```
Naive implementation
Error: [636.06025802]
Execution time 5116 millis
```
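Between the two extremes there is an intermediate form: keep the loop over the rows, but replace the inner loop over the features with a single `np.dot` call. The sketch below uses smaller random stand-in data (the sizes and seed are arbitrary) and checks that it gives the same cost as the fully vectorized formula. Indexing with `y[i, 0]` also makes the result a plain scalar instead of a 1-element array, which is why the naive version above prints its error in brackets.

```python
import numpy as np

# Small random stand-ins for X (with the column of ones), y and theta
rng = np.random.default_rng(0)
m, n = 1000, 20
X = np.c_[np.ones(m), rng.random((m, n - 1))]
y = rng.random((m, 1))
theta = rng.random((1, n))

# Only the outer loop remains; the inner loop over features becomes one np.dot call
total = 0.0
for i in range(m):
    prediction = np.dot(theta[0], X[i])   # theta_0*1 + theta_1*x_1 + ... for row i
    total += (prediction - y[i, 0]) ** 2  # y[i, 0] is a scalar, so total stays a scalar
J_half_vec = total / (2 * m)

# Fully vectorized reference, as in the next section
J_full_vec = np.mean((np.dot(X, theta.T) - y) ** 2) / 2
print(np.isclose(J_half_vec, J_full_vec))  # True
```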

## Vectorized implementation

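For the hypothesis we can use a vectorized implementation. With the observations as the rows of $$X$$ and $$\theta$$ as a row vector, all predictions are computed at once as $$h\_\theta(X) = X\theta^T$$, which is what `np.dot(X, theta.T)` does in the code below.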
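Written out for all $$m$$ observations, the product $$X\theta^T$$ stacks the predictions into an $$m \times 1$$ column, and the sum of squares in $$J(\theta)$$ becomes the inner product of the error vector with itself:

$$X\theta^T = \begin{bmatrix} h\_\theta(x^{(1)}) & h\_\theta(x^{(2)}) & \dots & h\_\theta(x^{(m)}) \end{bmatrix}^T$$

$$J(\theta) = \frac{1}{2m} (X\theta^T - y)^T (X\theta^T - y) = \frac{1}{2m} \sum\_{i=1}^{m} ( h\_\theta(x^{(i)}) - y^{(i)} ) ^2$$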

```python
# vectorized implementation
print("Vectorial implementation")

start_time = int(round(time.time() * 1000))
h = np.dot(X, theta.T)           # matrix product: features times weights gives the m x 1 prediction vector
errors = (h - y) ** 2            # element-wise: predictions minus actual values, squared
J_val_vec = np.mean(errors) / 2  # half the mean of the squared errors

end_time = int(round(time.time() * 1000))
print(f"Error: {J_val_vec}")
print(f"Execution time {end_time - start_time} millis")
```

```
Vectorial implementation
Error: 636.060258018252
Execution time 4 millis
```
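The cost can also be written with NumPy's `@` matrix-multiplication operator, which for 2-D arrays computes the same product as `np.dot`. A sketch on smaller random stand-in data (sizes and seed arbitrary) checks that the mean-based form used above equals the inner-product form $$\frac{1}{2m}(X\theta^T - y)^T(X\theta^T - y)$$:

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 500, 10
X = np.c_[np.ones(m), rng.random((m, n - 1))]  # m x n matrix with a column of ones
y = rng.random((m, 1))                         # m x 1 target vector
theta = rng.random((1, n))                     # 1 x n weight vector

# X @ theta.T is the same matrix product as np.dot(X, theta.T) for 2-D arrays
h = X @ theta.T
e = h - y                                      # m x 1 error vector

J_mean = np.mean(e ** 2) / 2                   # form used in the section above
J_inner = (e.T @ e).item() / (2 * m)           # (X theta^T - y)^T (X theta^T - y) / 2m

print(np.isclose(J_mean, J_inner))  # True
```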

## Conclusion

* With NumPy we can easily generate and manipulate vectors and matrices.
* We can transpose vectors and matrices with `.T`.
* We can apply vectorized computations: powers, division, subtraction, matrix multiplication with `np.dot` and averaging with `np.mean`.
* In this example the vectorized implementation was more than 1000 times faster than the naive loop (4 ms versus about 5 s).
* You should use NumPy arrays, or a library that builds upon NumPy such as pandas, for data processing.
* **Avoid plain Python for loops for data processing; use vectorized operations instead.**
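One detail behind the `.T` bullet: transposing only changes the shape of arrays with at least two dimensions. That is one reason `theta` is created as a 2-D $$1 \times n$$ array with `np.random.rand(1, num_features)`. A short sketch with arbitrary values:

```python
import numpy as np

row = np.array([[1.0, 2.0, 3.0]])  # 2-D row vector, shape (1, 3)
print(row.T.shape)                 # (3, 1): a column vector

flat = np.array([1.0, 2.0, 3.0])   # 1-D array, shape (3,)
print(flat.T.shape)                # (3,): .T does nothing on 1-D arrays
print(flat.reshape(-1, 1).shape)   # (3, 1): reshape is needed to get a column
```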

## Next

Learn more about Numpy: <https://nbviewer.jupyter.org/github/ageron/handson-ml/blob/master/tools_numpy.ipynb>

