Python Data Cleaning Cookbook: Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI

Bibliographic Details
Main Author: Walker, Michael (Author)
Format: Electronic E-book
Language: English
Published: Birmingham: Packt Publishing Ltd., May 2024
Edition: Second edition
Series: Expert Insight
Online Access: DE-1102
Description
Summary: Cover -- Copyright -- Contributors -- Table of Contents -- Preface -- Chapter 1: Anticipating Data Cleaning Issues When Importing Tabular Data with pandas -- Technical requirements -- Importing CSV files -- Importing Excel files -- Importing data from SQL databases -- Importing SPSS, Stata, and SAS data -- Importing R data -- Persisting tabular data -- Summary -- Chapter 2: Anticipating Data Cleaning Issues When Working with HTML, JSON, and Spark Data -- Technical requirements -- Importing simple JSON data -- Importing more complicated JSON data from an API -- Importing data from web pages -- Working with Spark data -- Persisting JSON data -- Versioning data -- Summary -- Chapter 3: Taking the Measure of Your Data -- Technical requirements -- Getting a first look at your data -- Selecting and organizing columns -- Selecting rows -- Generating frequencies for categorical variables -- Generating summary statistics for continuous variables -- Using generative AI to display descriptive statistics -- Summary -- Chapter 4: Identifying Outliers in Subsets of Data -- Technical requirements -- Identifying outliers with one variable -- Identifying outliers and unexpected values in bivariate relationships -- Using subsetting to examine logical inconsistencies in variable relationships -- Using linear regression to identify data points with significant influence -- Using k-nearest neighbors to find outliers -- Using Isolation Forest to find anomalies -- Using PandasAI to identify outliers -- Summary -- Chapter 5: Using Visualizations for the Identification of Unexpected Values -- Technical requirements -- Using histograms to examine the distribution of continuous variables -- Using boxplots to identify outliers for continuous variables -- Using grouped boxplots to uncover unexpected values in a particular group.
Physical Description: 1 online resource (xvii, 453 pages), illustrations
ISBN: 9781803246291