Data Loading, Storage and File Formats

Accessing data is the first step in turning data into meaningful information. Pandas provides a number of functions for reading tabular data into a DataFrame.

Data files can have different sources and different formats. Previously you learned to work with flat files such as files that contain sequence information. In this part of the programming course, we will work mainly with tabular data or data structures that can be easily transformed into a tabular format. Later on, you will learn to work with images and text.

source: Wynand Alkema 2020

Pandas has a number of functions for reading tabular data into a DataFrame object:

    read_csv
    read_fwf
    read_clipboard
    read_excel
    read_hdf
    read_html
    read_json
    read_msgpack (removed in pandas 1.0)
    read_pickle
    read_sas
    read_sql
    read_stata
    read_feather
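As a minimal sketch of how these reader functions are called, the example below reads the same small table once as csv and once as json. The column names and values are made up for illustration, and the data is held in memory with io.StringIO so that the example runs without any files on disk. In practice you would usually pass a file path or URL instead.

```python
import io

import pandas as pd

# Made-up example data (not from a real data set).
csv_text = "sample,value\ns1,1.5\ns2,2.5\n"
json_text = '[{"sample": "s1", "value": 1.5}, {"sample": "s2", "value": 2.5}]'

# Each reader function returns a DataFrame.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(type(df_csv).__name__)
print(df_csv.shape)
print(list(df_json.columns))
```

Both calls produce a DataFrame with the same two columns, which is the point of the read_* family: different file formats, one common data structure.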

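These functions read the data directly into a Pandas DataFrame. Most of them accept options to mark additional strings as missing values (NaN), skip rows at the start of the file, set the column separator, read only a given number of rows, read the file in chunks of a given size, skip the footer, or specify the file encoding.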

    na_values
    skiprows
    sep
    nrows
    chunksize
    skipfooter
    encoding
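The options listed above can be sketched with read_csv as follows. The input text, its comment line, the "missing" marker and the "total" footer row are all hypothetical and only serve to give each option something to do. Note that skipfooter is used with engine="python", and that it is kept in a separate call from nrows and chunksize.

```python
import io

import pandas as pd

# Hypothetical export: one comment line, a header, four data rows, one footer row.
raw = (
    "# exported by a hypothetical instrument\n"
    "sample;value\n"
    "s1;1.5\n"
    "s2;missing\n"
    "s3;3.0\n"
    "s4;4.5\n"
    "total;9.0"
)

# Skip the comment line, split on ";", treat "missing" as NaN, drop the footer row.
df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    skiprows=1,
    na_values=["missing"],
    skipfooter=1,
    engine="python",
)

# Read only the first two data rows from UTF-8 encoded bytes.
first_rows = pd.read_csv(
    io.BytesIO(raw.encode("utf-8")),
    sep=";",
    skiprows=1,
    nrows=2,
    encoding="utf-8",
)

# Read the file in chunks of two rows; each chunk is itself a DataFrame.
reader = pd.read_csv(io.StringIO(raw), sep=";", skiprows=1, chunksize=2)
chunk_sizes = [len(chunk) for chunk in reader]

print(df)
print(first_rows.shape)
print(chunk_sizes)
```

Reading in chunks is mainly useful for files that do not fit in memory at once: each chunk can be processed and discarded before the next one is read.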

Read the documentation to learn about the specifics. In this ebook, we look at the file formats most commonly used in data science: csv, json, xml, html and pdf.
