Big data integration

Bibliographic details
Main authors: Dong, Xin Luna (author), Srivastava, Divesh (author)
Format: Book
Language: English
Published: Berlin: Springer, 2022
Edition: First edition, reprint of the original edition Morgan & Claypool 2015
Series: Synthesis lectures on data management 40
Subjects: Datenintegration, Big Data
Online access: Table of contents

MARC

LEADER 00000nam a2200000 cb4500
001 BV049665751
003 DE-604
005 20240605
007 t
008 240425s2022 a||| |||| 00||| eng d
020 |a 9783031007255  |9 978-3-031-00725-5 
035 |a (OCoLC)1437841747 
035 |a (DE-599)BVBBV049665751 
040 |a DE-604  |b ger  |e rda 
041 0 |a eng 
049 |a DE-739 
084 |a ST 530  |0 (DE-625)143679:  |2 rvk 
100 1 |a Dong, Xin Luna  |e Verfasser  |0 (DE-588)1071341634  |4 aut 
245 1 0 |a Big data integration  |c Xin Luna Dong ; Divesh Srivastava 
250 |a First edition, reprint of the original edition Morgan & Claypool 2015 
264 1 |a Berlin  |b Springer  |c 2022 
300 |a XX, 178 Seiten  |b Illustrationen, Diagramme 
336 |b txt  |2 rdacontent 
337 |b n  |2 rdamedia 
338 |b nc  |2 rdacarrier 
490 1 |a Synthesis lectures on data management  |v 40 
650 0 7 |a Datenintegration  |0 (DE-588)4197730-0  |2 gnd  |9 rswk-swf 
650 0 7 |a Big Data  |0 (DE-588)4802620-7  |2 gnd  |9 rswk-swf 
689 0 0 |a Big Data  |0 (DE-588)4802620-7  |D s 
689 0 1 |a Datenintegration  |0 (DE-588)4197730-0  |D s 
689 0 |5 DE-604 
700 1 |a Srivastava, Divesh  |e Verfasser  |0 (DE-588)1071341707  |4 aut 
776 0 8 |i Erscheint auch als  |n Online-Ausgabe  |z 978-1-62705-224-5 
830 0 |a Synthesis lectures on data management  |v 40  |w (DE-604)BV036766043  |9 40 
856 4 2 |m Digitalisierung UB Passau - ADAM Catalogue Enrichment  |q application/pdf  |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035008867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA  |3 Inhaltsverzeichnis 

Record in the search index

_version_ 1805082223983984640
adam_text Contents
List of Figures
List of Tables
Preface
Acknowledgments
1. Motivation: Challenges and Opportunities for BDI
1.1 Traditional Data Integration
1.1.1 The Flights Example: Data Sources
1.1.2 The Flights Example: Data Integration
1.1.3 Data Integration: Architecture & Three Major Steps
1.2 BDI: Challenges
1.2.1 The “V” Dimensions
1.2.2 Case Study: Quantity of Deep Web Data
1.2.3 Case Study: Extracted Domain-Specific Data
1.2.4 Case Study: Quality of Deep Web Data
1.2.5 Case Study: Surface Web Structured Data
1.2.6 Case Study: Extracted Knowledge Triples
1.3 BDI: Opportunities
1.3.1 Data Redundancy
1.3.2 Long Data
1.3.3 Big Data Platforms
1.4 Outline of Book
2. Schema Alignment
2.1 Traditional Schema Alignment: A Quick Tour
2.1.1 Mediated Schema
2.1.2 Attribute Matching
2.1.3 Schema Mapping
2.1.4 Query Answering
2.2 Addressing the Variety and Velocity Challenges
2.2.1 Probabilistic Schema Alignment
2.2.2 Pay-As-You-Go User Feedback
2.3 Addressing the Variety and Volume Challenges
2.3.1 Integrating Deep Web Data
2.3.2 Integrating Web Tables
3. Record Linkage
3.1 Traditional Record Linkage: A Quick Tour
3.1.1 Pairwise Matching
3.1.2 Clustering
3.1.3 Blocking
3.2 Addressing the Volume Challenge
3.2.1 Using MapReduce to Parallelize Blocking
3.2.2 Meta-blocking: Pruning Pairwise Matchings
3.3 Addressing the Velocity Challenge
3.3.1 Incremental Record Linkage
3.4 Addressing the Variety Challenge
3.4.1 Linking Text Snippets to Structured Data
3.5 Addressing the Veracity Challenge
3.5.1 Temporal Record Linkage
3.5.2 Record Linkage with Uniqueness Constraints
4. BDI: Data Fusion
4.1 Traditional Data Fusion: A Quick Tour
4.2 Addressing the Veracity Challenge
4.2.1 Accuracy of a Source
4.2.2 Probability of a Value Being True
4.2.3 Copying Between Sources
4.2.4 The End-to-End Solution
4.2.5 Extensions and Alternatives
4.3 Addressing the Volume Challenge
4.3.1 A MapReduce-Based Framework for Offline Fusion
4.3.2 Online Data Fusion
4.4 Addressing the Velocity Challenge
4.5 Addressing the Variety Challenge
5. BDI: Emerging Topics
5.1 Role of Crowdsourcing
5.1.1 Leveraging Transitive Relations
5.1.2 Crowdsourcing the End-to-End Workflow
5.1.3 Future Work
5.2 Source Selection
5.2.1 Static Sources
5.2.2 Dynamic Sources
5.2.3 Future Work
5.3 Source Profiling
5.3.1 The Bellman System
5.3.2 Summarizing Sources
5.3.3 Future Work
6. Conclusions
Bibliography
Authors’ Biographies
Index
any_adam_object 1
author Dong, Xin Luna
Srivastava, Divesh
author_GND (DE-588)1071341634
(DE-588)1071341707
author_facet Dong, Xin Luna
Srivastava, Divesh
author_role aut
aut
author_sort Dong, Xin Luna
author_variant x l d xl xld
d s ds
building Verbundindex
bvnumber BV049665751
classification_rvk ST 530
ctrlnum (OCoLC)1437841747
(DE-599)BVBBV049665751
discipline Informatik
edition First edition, reprint of the original edition Morgan & Claypool 2015
format Book
fullrecord <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 cb4500</leader><controlfield tag="001">BV049665751</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20240605</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">240425s2022 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783031007255</subfield><subfield code="9">978-3-031-00725-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1437841747</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV049665751</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Dong, Xin Luna</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1071341634</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Big data integration</subfield><subfield code="c">Xin Luna Dong ; Divesh Srivastava</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition, reprint of the original edition Morgan &amp; Claypool 2015</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Berlin</subfield><subfield code="b">Springer</subfield><subfield code="c">2022</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XX, 178 
Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on data management</subfield><subfield code="v">40</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenintegration</subfield><subfield code="0">(DE-588)4197730-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Datenintegration</subfield><subfield code="0">(DE-588)4197730-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Srivastava, Divesh</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1071341707</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-62705-224-5</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on data management</subfield><subfield 
code="v">40</subfield><subfield code="w">(DE-604)BV036766043</subfield><subfield code="9">40</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=035008867&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield></record></collection>
id DE-604.BV049665751
illustrated Illustrated
indexdate 2024-07-20T07:29:15Z
institution BVB
isbn 9783031007255
language English
oai_aleph_id oai:aleph.bib-bvb.de:BVB01-035008867
oclc_num 1437841747
open_access_boolean
owner DE-739
owner_facet DE-739
physical XX, 178 Seiten Illustrationen, Diagramme
publishDate 2022
publishDateSearch 2022
publishDateSort 2022
publisher Springer
record_format marc
series Synthesis lectures on data management
series2 Synthesis lectures on data management
spelling Dong, Xin Luna Verfasser (DE-588)1071341634 aut
Big data integration Xin Luna Dong ; Divesh Srivastava
First edition, reprint of the original edition Morgan & Claypool 2015
Berlin Springer 2022
XX, 178 Seiten Illustrationen, Diagramme
txt rdacontent
n rdamedia
nc rdacarrier
Synthesis lectures on data management 40
Datenintegration (DE-588)4197730-0 gnd rswk-swf
Big Data (DE-588)4802620-7 gnd rswk-swf
Big Data (DE-588)4802620-7 s
Datenintegration (DE-588)4197730-0 s
DE-604
Srivastava, Divesh Verfasser (DE-588)1071341707 aut
Erscheint auch als Online-Ausgabe 978-1-62705-224-5
Synthesis lectures on data management 40 (DE-604)BV036766043 40
Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035008867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle Dong, Xin Luna
Srivastava, Divesh
Big data integration
Synthesis lectures on data management
Datenintegration (DE-588)4197730-0 gnd
Big Data (DE-588)4802620-7 gnd
subject_GND (DE-588)4197730-0
(DE-588)4802620-7
title Big data integration
title_auth Big data integration
title_exact_search Big data integration
title_full Big data integration Xin Luna Dong ; Divesh Srivastava
title_fullStr Big data integration Xin Luna Dong ; Divesh Srivastava
title_full_unstemmed Big data integration Xin Luna Dong ; Divesh Srivastava
title_short Big data integration
title_sort big data integration
topic Datenintegration (DE-588)4197730-0 gnd
Big Data (DE-588)4802620-7 gnd
topic_facet Datenintegration
Big Data
url http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035008867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
volume_link (DE-604)BV036766043
work_keys_str_mv AT dongxinluna bigdataintegration
AT srivastavadivesh bigdataintegration