Data Loading, Storage and File Formats
Accessing data is the first step in translating data into meaningful information. Pandas features a number of functions for reading tabular data into a DataFrame.
Data files can have different sources and different formats. Previously you learned to work with flat files such as files that contain sequence information. In this part of the programming course, we will work mainly with tabular data or data structures that can be easily transformed into a tabular format. Later on, you will learn to work with images and text.

Pandas has a number of functions for reading tabular data as a DataFrame object:
read_csv
read_fwf
read_clipboard
read_excel
read_hdf
read_html
read_json
read_msgpack (removed in pandas 1.0)
read_pickle
read_sas
read_sql
read_stata
read_feather
These functions read the data directly into a Pandas DataFrame. Most of them have options to mark additional values as NaN, to read only part of a file by defining a number of rows or a chunk size, or to skip lines at the start or end of the file:
na_values
skiprows
sep
nrows
chunksize
skipfooter
encoding
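As a minimal sketch of how these options combine, the example below calls read_csv on a small in-memory table. The table contents, the semicolon separator and the "missing" placeholder are made up for illustration, and io.BytesIO / io.StringIO stand in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical table; in practice this would be a file path.
rows = [
    "sample;gene;expression",
    "s1;BRCA1;2.5",
    "s2;TP53;missing",
    "s3;EGFR;4.1",
    "s4;MYC;3.3",
]
table = "\n".join(rows) + "\n"

# The same table wrapped in a comment line and a summary line,
# as some export tools produce (an assumption for this example).
with_extras = "# exported by a made-up instrument\n" + table + "total rows: 4\n"

# Skip the first line and the last line, split on ';',
# and treat the string "missing" as NaN.
df = pd.read_csv(
    io.BytesIO(with_extras.encode("utf-8")),
    sep=";",
    skiprows=1,
    skipfooter=1,
    na_values=["missing"],
    encoding="utf-8",
    engine="python",  # skipfooter is only supported by the Python engine
)

# Read only the first two data rows.
head = pd.read_csv(io.StringIO(table), sep=";", nrows=2)

# Read the table in pieces of at most three rows each;
# every chunk is itself a DataFrame.
chunk_sizes = [
    len(chunk)
    for chunk in pd.read_csv(io.StringIO(table), sep=";", chunksize=3)
]
```

Reading in chunks is mainly useful for files that do not fit in memory, since each chunk can be processed and discarded before the next one is loaded.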
Read the documentation to learn about the specifics. In this ebook, we look at the file formats most commonly used in data science: csv, json, xml, html and pdf.