Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes

The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, and Stable Diffusion, has revolutionized content creation, enabling non-professionals to produce high-quality content across various domains. This transformative technology has led to a surge of sy...

Bibliographic Details
Main authors:	Hacohen, Uri; Haviv, Adi; Sarfaty, Shahar; Friedman, Bruria; Elkin-Koren, Niva; Livni, Roi; Bermano, Amit H
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Hacohen, Uri; Haviv, Adi; Sarfaty, Shahar; Friedman, Bruria; Elkin-Koren, Niva; Livni, Roi; Bermano, Amit H
description	The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, and Stable Diffusion, has revolutionized content creation, enabling non-professionals to produce high-quality content across various domains. This transformative technology has led to a surge of synthetic content and sparked legal disputes over copyright infringement. To address these challenges, this paper introduces a novel approach that leverages the learning capacity of GenAI models for copyright legal analysis, demonstrated with GPT2 and Stable Diffusion models. Copyright law distinguishes between original expressions and generic ones (Scènes à faire), protecting the former and permitting reproduction of the latter. However, this distinction has historically been challenging to make consistently, leading to over-protection of copyrighted works. GenAI offers an unprecedented opportunity to enhance this legal analysis by revealing shared patterns in preexisting works. We propose a data-driven approach to identify the genericity of works created by GenAI, employing "data-driven bias" to assess the genericity of expressive compositions. This approach aids in copyright scope determination by utilizing the capabilities of GenAI to identify and prioritize expressive elements and rank them according to their frequency in the model's dataset. The potential implications of measuring expressive genericity for copyright law are profound. Such scoring could assist courts in determining copyright scope during litigation, inform the registration practices of Copyright Offices, allowing registration of only highly original synthetic works, and help copyright owners signal the value of their works and facilitate fairer licensing deals. More generally, this approach offers valuable insights to policymakers grappling with adapting copyright law to the challenges posed by the era of GenAI.
doi_str_mv	10.48550/arxiv.2403.17691
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2403_17691</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2403_17691</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-c695f19d1d1fe4cf797646e8c271029498d7370307c08121f7155e9c22bbcf243</originalsourceid><addsrcrecordid>eNotz71OwzAYheEsDKhwAUx8N5BgO04cs4WklEgRDHSPXOdzail_OG5E7x5omc70HukJggdKIp4lCXlS7tuuEeMkjqhIJb0Nju-Th7zv4dMOtlfOeosL5A6hcKg8trD9Oqn-GWpc0anOjh2UyquwdHbFEV6sWn4DP0E1mskNsMMxr6CY5rOz3dFDaZf55HG5C26M6he8_99NsH_d7ou3sP7YVUVehyoVNNSpTAyVLW2pQa6NkCLlKWaaCUqY5DJrRSxITIQmGWXUCJokKDVjh4M2jMeb4PF6e6E2s7ODcufmj9xcyPEPq0pQVQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes</title><source>arXiv.org</source><creator>Hacohen, Uri ; Haviv, Adi ; Sarfaty, Shahar ; Friedman, Bruria ; Elkin-Koren, Niva ; Livni, Roi ; Bermano, Amit H</creator><creatorcontrib>Hacohen, Uri ; Haviv, Adi ; Sarfaty, Shahar ; Friedman, Bruria ; Elkin-Koren, Niva ; Livni, Roi ; Bermano, Amit H</creatorcontrib><description>The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, and Stable Diffusion, has revolutionized content creation, enabling non-professionals to produce high-quality content across various domains. This transformative technology has led to a surge of synthetic content and sparked legal disputes over copyright infringement. To address these challenges, this paper introduces a novel approach that leverages the learning capacity of GenAI models for copyright legal analysis, demonstrated with GPT2 and Stable Diffusion models. Copyright law distinguishes between original expressions and generic ones (Sc\`enes \`a faire), protecting the former and permitting reproduction of the latter. 
However, this distinction has historically been challenging to make consistently, leading to over-protection of copyrighted works. GenAI offers an unprecedented opportunity to enhance this legal analysis by revealing shared patterns in preexisting works. We propose a data-driven approach to identify the genericity of works created by GenAI, employing "data-driven bias" to assess the genericity of expressive compositions. This approach aids in copyright scope determination by utilizing the capabilities of GenAI to identify and prioritize expressive elements and rank them according to their frequency in the model's dataset. The potential implications of measuring expressive genericity for copyright law are profound. Such scoring could assist courts in determining copyright scope during litigation, inform the registration practices of Copyright Offices, allowing registration of only highly original synthetic works, and help copyright owners signal the value of their works and facilitate fairer licensing deals. 
More generally, this approach offers valuable insights to policymakers grappling with adapting copyright law to the challenges posed by the era of GenAI.</description><identifier>DOI: 10.48550/arxiv.2403.17691</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2024-03</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2403.17691$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2403.17691$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Hacohen, Uri</creatorcontrib><creatorcontrib>Haviv, Adi</creatorcontrib><creatorcontrib>Sarfaty, Shahar</creatorcontrib><creatorcontrib>Friedman, Bruria</creatorcontrib><creatorcontrib>Elkin-Koren, Niva</creatorcontrib><creatorcontrib>Livni, Roi</creatorcontrib><creatorcontrib>Bermano, Amit H</creatorcontrib><title>Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes</title><description>The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, and Stable Diffusion, has revolutionized content creation, enabling non-professionals to produce high-quality content across various domains. This transformative technology has led to a surge of synthetic content and sparked legal disputes over copyright infringement. 
To address these challenges, this paper introduces a novel approach that leverages the learning capacity of GenAI models for copyright legal analysis, demonstrated with GPT2 and Stable Diffusion models. Copyright law distinguishes between original expressions and generic ones (Sc\`enes \`a faire), protecting the former and permitting reproduction of the latter. However, this distinction has historically been challenging to make consistently, leading to over-protection of copyrighted works. GenAI offers an unprecedented opportunity to enhance this legal analysis by revealing shared patterns in preexisting works. We propose a data-driven approach to identify the genericity of works created by GenAI, employing "data-driven bias" to assess the genericity of expressive compositions. This approach aids in copyright scope determination by utilizing the capabilities of GenAI to identify and prioritize expressive elements and rank them according to their frequency in the model's dataset. The potential implications of measuring expressive genericity for copyright law are profound. Such scoring could assist courts in determining copyright scope during litigation, inform the registration practices of Copyright Offices, allowing registration of only highly original synthetic works, and help copyright owners signal the value of their works and facilitate fairer licensing deals. 
More generally, this approach offers valuable insights to policymakers grappling with adapting copyright law to the challenges posed by the era of GenAI.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz71OwzAYheEsDKhwAUx8N5BgO04cs4WklEgRDHSPXOdzail_OG5E7x5omc70HukJggdKIp4lCXlS7tuuEeMkjqhIJb0Nju-Th7zv4dMOtlfOeosL5A6hcKg8trD9Oqn-GWpc0anOjh2UyquwdHbFEV6sWn4DP0E1mskNsMMxr6CY5rOz3dFDaZf55HG5C26M6he8_99NsH_d7ou3sP7YVUVehyoVNNSpTAyVLW2pQa6NkCLlKWaaCUqY5DJrRSxITIQmGWXUCJokKDVjh4M2jMeb4PF6e6E2s7ODcufmj9xcyPEPq0pQVQ</recordid><startdate>20240326</startdate><enddate>20240326</enddate><creator>Hacohen, Uri</creator><creator>Haviv, Adi</creator><creator>Sarfaty, Shahar</creator><creator>Friedman, Bruria</creator><creator>Elkin-Koren, Niva</creator><creator>Livni, Roi</creator><creator>Bermano, Amit H</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240326</creationdate><title>Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes</title><author>Hacohen, Uri ; Haviv, Adi ; Sarfaty, Shahar ; Friedman, Bruria ; Elkin-Koren, Niva ; Livni, Roi ; Bermano, Amit H</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-c695f19d1d1fe4cf797646e8c271029498d7370307c08121f7155e9c22bbcf243</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Hacohen, Uri</creatorcontrib><creatorcontrib>Haviv, Adi</creatorcontrib><creatorcontrib>Sarfaty, 
Shahar</creatorcontrib><creatorcontrib>Friedman, Bruria</creatorcontrib><creatorcontrib>Elkin-Koren, Niva</creatorcontrib><creatorcontrib>Livni, Roi</creatorcontrib><creatorcontrib>Bermano, Amit H</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hacohen, Uri</au><au>Haviv, Adi</au><au>Sarfaty, Shahar</au><au>Friedman, Bruria</au><au>Elkin-Koren, Niva</au><au>Livni, Roi</au><au>Bermano, Amit H</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes</atitle><date>2024-03-26</date><risdate>2024</risdate><abstract>The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, and Stable Diffusion, has revolutionized content creation, enabling non-professionals to produce high-quality content across various domains. This transformative technology has led to a surge of synthetic content and sparked legal disputes over copyright infringement. To address these challenges, this paper introduces a novel approach that leverages the learning capacity of GenAI models for copyright legal analysis, demonstrated with GPT2 and Stable Diffusion models. Copyright law distinguishes between original expressions and generic ones (Sc\`enes \`a faire), protecting the former and permitting reproduction of the latter. However, this distinction has historically been challenging to make consistently, leading to over-protection of copyrighted works. GenAI offers an unprecedented opportunity to enhance this legal analysis by revealing shared patterns in preexisting works. We propose a data-driven approach to identify the genericity of works created by GenAI, employing "data-driven bias" to assess the genericity of expressive compositions. 
This approach aids in copyright scope determination by utilizing the capabilities of GenAI to identify and prioritize expressive elements and rank them according to their frequency in the model's dataset. The potential implications of measuring expressive genericity for copyright law are profound. Such scoring could assist courts in determining copyright scope during litigation, inform the registration practices of Copyright Offices, allowing registration of only highly original synthetic works, and help copyright owners signal the value of their works and facilitate fairer licensing deals. More generally, this approach offers valuable insights to policymakers grappling with adapting copyright law to the challenges posed by the era of GenAI.</abstract><doi>10.48550/arxiv.2403.17691</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2403.17691
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2403_17691
source	arXiv.org
subjects	Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition
title	Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T17%3A21%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Not%20All%20Similarities%20Are%20Created%20Equal:%20Leveraging%20Data-Driven%20Biases%20to%20Inform%20GenAI%20Copyright%20Disputes&rft.au=Hacohen,%20Uri&rft.date=2024-03-26&rft_id=info:doi/10.48550/arxiv.2403.17691&rft_dat=%3Carxiv_GOX%3E2403_17691%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true