Python Data Cleaning Cookbook: Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI

Bibliographic details
First author:	Walker, Michael (author)
Format:	Electronic e-book
Language:	English
Published:	Birmingham: Packt Publishing Ltd., May 2024
Edition:	Second edition
Series:	Expert Insight
Online access:	DE-1102

Description
Summary:	Cover -- Copyright -- Contributors -- Table of Contents -- Preface --
Chapter 1: Anticipating Data Cleaning Issues When Importing Tabular Data with pandas -- Technical requirements -- Importing CSV files -- Importing Excel files -- Importing data from SQL databases -- Importing SPSS, Stata, and SAS data -- Importing R data -- Persisting tabular data -- Summary --
Chapter 2: Anticipating Data Cleaning Issues When Working with HTML, JSON, and Spark Data -- Technical requirements -- Importing simple JSON data -- Importing more complicated JSON data from an API -- Importing data from web pages -- Working with Spark data -- Persisting JSON data -- Versioning data -- Summary --
Chapter 3: Taking the Measure of Your Data -- Technical requirements -- Getting a first look at your data -- Selecting and organizing columns -- Selecting rows -- Generating frequencies for categorical variables -- Generating summary statistics for continuous variables -- Using generative AI to display descriptive statistics -- Summary --
Chapter 4: Identifying Outliers in Subsets of Data -- Technical requirements -- Identifying outliers with one variable -- Identifying outliers and unexpected values in bivariate relationships -- Using subsetting to examine logical inconsistencies in variable relationships -- Using linear regression to identify data points with significant influence -- Using k-nearest neighbors to find outliers -- Using Isolation Forest to find anomalies -- Using PandasAI to identify outliers -- Summary --
Chapter 5: Using Visualizations for the Identification of Unexpected Values -- Technical requirements -- Using histograms to examine the distribution of continuous variables -- Using boxplots to identify outliers for continuous variables -- Using grouped boxplots to uncover unexpected values in a particular group.
Physical description:	1 online resource (xvii, 453 pages), illustrations
ISBN:	9781803246291