Big data integration
Main authors: | Dong, Xin Luna; Srivastava, Divesh |
---|---|
Format: | Book |
Language: | English |
Published: | Berlin Springer 2022 |
Edition: | First edition, reprint of the original edition Morgan & Claypool 2015 |
Series: | Synthesis lectures on data management 40 |
Subjects: | Datenintegration; Big Data |
Online access: | Table of contents |
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV049665751 | ||
003 | DE-604 | ||
005 | 20240605 | ||
007 | t | ||
008 | 240425s2022 a||| |||| 00||| eng d | ||
020 | |a 9783031007255 |9 978-3-031-00725-5 | ||
035 | |a (OCoLC)1437841747 | ||
035 | |a (DE-599)BVBBV049665751 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-739 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Dong, Xin Luna |e Verfasser |0 (DE-588)1071341634 |4 aut | |
245 | 1 | 0 | |a Big data integration |c Xin Luna Dong ; Divesh Srivastava |
250 | |a First edition, reprint of the original edition Morgan & Claypool 2015 | ||
264 | 1 | |a Berlin |b Springer |c 2022 | |
300 | |a XX, 178 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Synthesis lectures on data management |v 40 | |
650 | 0 | 7 | |a Datenintegration |0 (DE-588)4197730-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Big Data |0 (DE-588)4802620-7 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Big Data |0 (DE-588)4802620-7 |D s |
689 | 0 | 1 | |a Datenintegration |0 (DE-588)4197730-0 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Srivastava, Divesh |e Verfasser |0 (DE-588)1071341707 |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-62705-224-5 |
830 | 0 | |a Synthesis lectures on data management |v 40 |w (DE-604)BV036766043 |9 40 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035008867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
Record in search index
_version_ | 1805082223983984640 |
---|---|
adam_text |
Contents
List of Figures
List of Tables
Preface
Acknowledgments
1. Motivation: Challenges and Opportunities for BDI
1.1 Traditional Data Integration
1.1.1 The Flights Example: Data Sources
1.1.2 The Flights Example: Data Integration
1.1.3 Data Integration: Architecture Three Major Steps
1.2 BDI: Challenges
1.2.1 The “V” Dimensions
1.2.2 Case Study: Quantity of Deep Web Data
1.2.3 Case Study: Extracted Domain-Specific Data
1.2.4 Case Study: Quality of Deep Web Data
1.2.5 Case Study: Surface Web Structured Data
1.2.6 Case Study: Extracted Knowledge Triples
1.3 BDI: Opportunities
1.3.1 Data Redundancy
1.3.2 Long Data
1.3.3 Big Data Platforms
1.4 Outline of Book
2. Schema Alignment
2.1 Traditional Schema Alignment: A Quick Tour
2.1.1 Mediated Schema
2.1.2 Attribute Matching
2.1.3 Schema Mapping
2.1.4 Query Answering
2.2 Addressing the Variety and Velocity Challenges
2.2.1 Probabilistic Schema Alignment
2.2.2 Pay-As-You-Go User Feedback
2.3 Addressing the Variety and Volume Challenges
2.3.1 Integrating Deep Web Data
2.3.2 Integrating Web Tables
3. Record Linkage
3.1 Traditional Record Linkage: A Quick Tour
3.1.1 Pairwise Matching
3.1.2 Clustering
3.1.3 Blocking
3.2 Addressing the Volume Challenge
3.2.1 Using MapReduce to Parallelize Blocking
3.2.2 Meta-blocking: Pruning Pairwise Matchings
3.3 Addressing the Velocity Challenge
3.3.1 Incremental Record Linkage
3.4 Addressing the Variety Challenge
3.4.1 Linking Text Snippets to Structured Data
3.5 Addressing the Veracity Challenge
3.5.1 Temporal Record Linkage
3.5.2 Record Linkage with Uniqueness Constraints
4. BDI: Data Fusion
4.1 Traditional Data Fusion: A Quick Tour
4.2 Addressing the Veracity Challenge
4.2.1 Accuracy of a Source
4.2.2 Probability of a Value Being True
4.2.3 Copying Between Sources
4.2.4 The End-to-End Solution
4.2.5 Extensions and Alternatives
4.3 Addressing the Volume Challenge
4.3.1 A MapReduce-Based Framework for Offline Fusion
4.3.2 Online Data Fusion
4.4 Addressing the Velocity Challenge
4.5 Addressing the Variety Challenge
5. BDI: Emerging Topics
5.1 Role of Crowdsourcing
5.1.1 Leveraging Transitive Relations
5.1.2 Crowdsourcing the End-to-End Workflow
5.1.3 Future Work
5.2 Source Selection
5.2.1 Static Sources
5.2.2 Dynamic Sources
5.2.3 Future Work
5.3 Source Profiling
5.3.1 The Bellman System
5.3.2 Summarizing Sources
5.3.3 Future Work
6. Conclusions
Bibliography
Authors’ Biographies
Index |
any_adam_object | 1 |
author | Dong, Xin Luna Srivastava, Divesh |
author_GND | (DE-588)1071341634 (DE-588)1071341707 |
author_facet | Dong, Xin Luna Srivastava, Divesh |
author_role | aut aut |
author_sort | Dong, Xin Luna |
author_variant | x l d xl xld d s ds |
building | Verbundindex |
bvnumber | BV049665751 |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)1437841747 (DE-599)BVBBV049665751 |
discipline | Informatik |
edition | First edition, reprint of the original edition Morgan & Claypool 2015 |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 cb4500</leader><controlfield tag="001">BV049665751</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20240605</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">240425s2022 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783031007255</subfield><subfield code="9">978-3-031-00725-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1437841747</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV049665751</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Dong, Xin Luna</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1071341634</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Big data integration</subfield><subfield code="c">Xin Luna Dong ; Divesh Srivastava</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition, reprint of the original edition Morgan &amp; Claypool 2015</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Berlin</subfield><subfield code="b">Springer</subfield><subfield code="c">2022</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XX, 178 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on data management</subfield><subfield code="v">40</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenintegration</subfield><subfield code="0">(DE-588)4197730-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Datenintegration</subfield><subfield code="0">(DE-588)4197730-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Srivastava, Divesh</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1071341707</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-62705-224-5</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on data management</subfield><subfield code="v">40</subfield><subfield code="w">(DE-604)BV036766043</subfield><subfield code="9">40</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=035008867&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield></record></collection> |
id | DE-604.BV049665751 |
illustrated | Illustrated |
indexdate | 2024-07-20T07:29:15Z |
institution | BVB |
isbn | 9783031007255 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-035008867 |
oclc_num | 1437841747 |
open_access_boolean | |
owner | DE-739 |
owner_facet | DE-739 |
physical | XX, 178 Seiten Illustrationen, Diagramme |
publishDate | 2022 |
publishDateSearch | 2022 |
publishDateSort | 2022 |
publisher | Springer |
record_format | marc |
series | Synthesis lectures on data management |
series2 | Synthesis lectures on data management |
spelling | Dong, Xin Luna Verfasser (DE-588)1071341634 aut Big data integration Xin Luna Dong ; Divesh Srivastava First edition, reprint of the original edition Morgan & Claypool 2015 Berlin Springer 2022 XX, 178 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Synthesis lectures on data management 40 Datenintegration (DE-588)4197730-0 gnd rswk-swf Big Data (DE-588)4802620-7 gnd rswk-swf Big Data (DE-588)4802620-7 s Datenintegration (DE-588)4197730-0 s DE-604 Srivastava, Divesh Verfasser (DE-588)1071341707 aut Erscheint auch als Online-Ausgabe 978-1-62705-224-5 Synthesis lectures on data management 40 (DE-604)BV036766043 40 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035008867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Dong, Xin Luna Srivastava, Divesh Big data integration Synthesis lectures on data management Datenintegration (DE-588)4197730-0 gnd Big Data (DE-588)4802620-7 gnd |
subject_GND | (DE-588)4197730-0 (DE-588)4802620-7 |
title | Big data integration |
title_auth | Big data integration |
title_exact_search | Big data integration |
title_full | Big data integration Xin Luna Dong ; Divesh Srivastava |
title_fullStr | Big data integration Xin Luna Dong ; Divesh Srivastava |
title_full_unstemmed | Big data integration Xin Luna Dong ; Divesh Srivastava |
title_short | Big data integration |
title_sort | big data integration |
topic | Datenintegration (DE-588)4197730-0 gnd Big Data (DE-588)4802620-7 gnd |
topic_facet | Datenintegration Big Data |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035008867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV036766043 |
work_keys_str_mv | AT dongxinluna bigdataintegration AT srivastavadivesh bigdataintegration |