
# Data inspection

Before moving on to the analysis, we need to inspect the structure and format of the data and check its quality and quantity. For each column in a DataFrame, we should understand its content, the meaning of the column label, the datatype used, and the units used. We cannot interpret the results if we do not fully understand the context and the quality of the data. A few standard questions you should always consider are:

* What is the structure of the data (datatypes)?
* How many features and how many observations are available?
* Are there any missing data, and is imputation needed?
* Is my data balanced?
* Are there outliers?
* Is normalization needed?
* Is data transformation needed?
* Is any of the data redundant?
* Are the features independent of each other, or are there relations between them?
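Several of these questions can be checked directly in pandas. The following is a minimal sketch on a hypothetical toy DataFrame: the column names and values are invented for illustration, and the 1.5 × IQR threshold is a common rule of thumb for flagging outliers, not a method prescribed by this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "age": [34, 51, 47, np.nan, 51, 29],
    "weight_kg": [70.0, 82.5, 64.0, 77.0, 82.5, 250.0],
    "smoker": ["no", "yes", "no", "no", "yes", "no"],
})

# Missing data: number of missing values per column
missing_per_column = df.isnull().sum()

# Redundancy: number of rows that exactly repeat an earlier row
n_duplicates = df.duplicated().sum()

# Outliers: flag values more than 1.5 * IQR outside the quartiles
# (a rule of thumb; the right threshold depends on the data)
q1 = df["weight_kg"].quantile(0.25)
q3 = df["weight_kg"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["weight_kg"] < q1 - 1.5 * iqr) | (df["weight_kg"] > q3 + 1.5 * iqr)

print(missing_per_column)
print("duplicate rows:", n_duplicates)
print(df[is_outlier])
```

Whether a flagged row is a measurement error or a genuine extreme value cannot be decided from the numbers alone; that judgment needs the context of the data.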

Furthermore, it may be necessary to reshape the data, subset it, or combine it with other sources to bring it into its final format. Some of the above questions can be answered by analyzing the structure and the statistics of the data. Other questions need to be answered by plotting the data. A list of the most commonly used methods and attributes:

```python
# df is assumed to be an existing pandas DataFrame
df.head()                    # show the first rows (5 by default)
list(df.columns)             # list the column names
len(df)                      # number of rows
df.shape                     # tuple: (number of rows, number of columns)
df.dtypes                    # datatype of each column
df.info()                    # print a summary of the structure
df.describe().T              # descriptive statistics (transposed)
df.isnull().sum()            # number of missing values per column
df['column'].unique()        # unique values in a column
df['column'].value_counts()  # unique values and their counts
df.corr(numeric_only=True)   # correlation matrix of the numeric columns, input for a heatmap
```
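To make the structural checks concrete, here is a small sketch on a hypothetical DataFrame (the column names and values are invented). It assumes a recent pandas version (1.5 or later), in which `corr` accepts `numeric_only=True` to leave out non-numeric columns.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for illustration.
df = pd.DataFrame({
    "age": [63, 45, 70, 58],
    "height_cm": [170.0, 182.5, 176.0, 165.0],
    "group": ["a", "b", "b", "a"],
})

# Quantity: number of observations (rows) and features (columns)
n_rows, n_cols = df.shape
print(n_rows, "rows,", n_cols, "columns")

# Structure: datatype per column, then a combined summary
print(df.dtypes)
df.info()

# Relations: pairwise correlations of the numeric columns only
corr = df.corr(numeric_only=True)
print(corr)
```

The text column `group` is left out of the correlation matrix, so `corr` only covers `age` and `height_cm`.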

The distribution of the data can be derived with the `.describe()` method for numerical data and with `.value_counts()` for categorical or boolean data. A graphical inspection, however, is highly recommended. Please read the Data visualization tutorial for more information. An example of data inspection is given in the Heart Failure Case Study.
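As a brief illustration of the two methods, the sketch below summarizes one numerical and one categorical column of a hypothetical DataFrame (names and values are invented). Passing `normalize=True` to `value_counts()` returns proportions instead of counts, which can help judge whether the classes are balanced.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for illustration.
df = pd.DataFrame({
    "score": [2.0, 4.0, 4.0, 6.0],
    "label": ["pos", "neg", "neg", "neg"],
})

# Numerical column: count, mean, std, min, quartiles, max
numeric_summary = df["score"].describe()
print(numeric_summary)

# Categorical column: absolute counts per category
class_counts = df["label"].value_counts()
print(class_counts)

# Categorical column: share of each category (fractions sum to 1)
class_share = df["label"].value_counts(normalize=True)
print(class_share)
```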

