Python Data Cleaning Cookbook: Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI
Main author: | |
---|---|
Format: | Electronic E-Book |
Language: | English |
Published: | Birmingham: Packt Publishing Ltd., May 2024 |
Edition: | Second edition |
Series: | Expert Insight |
Online access: | DE-1102 |
Summary: | Cover -- Copyright -- Contributors -- Table of Contents -- Preface -- Chapter 1: Anticipating Data Cleaning Issues When Importing Tabular Data with pandas -- Technical requirements -- Importing CSV files -- Importing Excel files -- Importing data from SQL databases -- Importing SPSS, Stata, and SAS data -- Importing R data -- Persisting tabular data -- Summary -- Chapter 2: Anticipating Data Cleaning Issues When Working with HTML, JSON, and Spark Data -- Technical requirements -- Importing simple JSON data -- Importing more complicated JSON data from an API -- Importing data from web pages -- Working with Spark data -- Persisting JSON data -- Versioning data -- Summary -- Chapter 3: Taking the Measure of Your Data -- Technical requirements -- Getting a first look at your data -- Selecting and organizing columns -- Selecting rows -- Generating frequencies for categorical variables -- Generating summary statistics for continuous variables -- Using generative AI to display descriptive statistics -- Summary -- Chapter 4: Identifying Outliers in Subsets of Data -- Technical requirements -- Identifying outliers with one variable -- Identifying outliers and unexpected values in bivariate relationships -- Using subsetting to examine logical inconsistencies in variable relationships -- Using linear regression to identify data points with significant influence -- Using k-nearest neighbors to find outliers -- Using Isolation Forest to find anomalies -- Using PandasAI to identify outliers -- Summary -- Chapter 5: Using Visualizations for the Identification of Unexpected Values -- Technical requirements -- Using histograms to examine the distribution of continuous variables -- Using boxplots to identify outliers for continuous variables -- Using grouped boxplots to uncover unexpected values in a particular group. |
---|---|
Description: | 1 online resource (xvii, 453 pages), illustrations |
ISBN: | 9781803246291 |