A Reconfigurable Two‐WSe2‐Transistor Synaptic Cell for Reinforcement Learning

Bibliographic details
Published in: Advanced materials (Weinheim) 2022-12, Vol.34 (48), p.e2107754-n/a
Main authors: Zhou, Yue, Wang, Yasai, Zhuge, Fuwei, Guo, Jianmiao, Ma, Sijie, Wang, Jingli, Tang, Zijian, Li, Yi, Miao, Xiangshui, He, Yuhui, Chai, Yang
Format: Article
Language: English
Subject headings:
Online access: Full text
container_end_page n/a
container_issue 48
container_start_page e2107754
container_title Advanced materials (Weinheim)
container_volume 34
creator Zhou, Yue
Wang, Yasai
Zhuge, Fuwei
Guo, Jianmiao
Ma, Sijie
Wang, Jingli
Tang, Zijian
Li, Yi
Miao, Xiangshui
He, Yuhui
Chai, Yang
description Reward‐modulated spike‐timing‐dependent plasticity (R‐STDP) is a brain‐inspired reinforcement learning (RL) rule with potential for decision‐making tasks and artificial general intelligence. However, hardware implementation of the reward‐modulation process in R‐STDP usually requires complicated Si complementary metal–oxide–semiconductor (CMOS) circuit designs, which cause high power consumption and a large footprint. Here, a design with two synaptic transistors (2T) connected in parallel is experimentally demonstrated. The 2T unit, based on WSe2 ferroelectric transistors, exhibits reconfigurable polarity: owing to nonvolatile ferroelectric polarization, one channel can be tuned to n‐type and the other to p‐type. In this way, opposite synaptic weight update behaviors are realized, with multilevel (>6 bit) conductance states, ultralow nonlinearity (0.56/−1.23), and a large Gmax/Gmin ratio of 30. By applying a positive/negative reward to the (anti‐)STDP component of the 2T cell, R‐STDP learning rules are realized for training a spiking neural network and are demonstrated to solve the classical cart–pole problem, pointing toward low‐power (32 pJ per forward process) and highly area‐efficient (100 µm²) hardware chips for reinforcement learning. Hardware implementation of R‐STDP is demonstrated with a unique 2T synaptic cell structure, which realizes both STDP and anti‐STDP functions in a simple hardware structure. The total synaptic weight is increased (decreased) by applying a feedback signal to gate1 (gate2) when a positive (negative) reward signal arrives.
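The reward routing described above can be illustrated with a small behavioral model. The following Python sketch is hypothetical and is not the authors' device model: it assumes linear, quantized conductance steps (64 levels, the 6-bit lower bound of the reported resolution), an exponential STDP window with an assumed 20 ms time constant, an assumed maximum of four programming pulses per update, and conductances that simply add because the two channels are in parallel. Only the Gmax/Gmin ratio of 30 is taken from the abstract; the reported nonlinearity (0.56/−1.23) is not modeled, and all names (`TwoTransistorCell`, `stdp_kernel`, `MAX_PULSES`, `TAU_MS`) are illustrative.

```python
import math
from dataclasses import dataclass

# Illustrative parameters. Only Gmax/Gmin = 30 and the ">6 bit" resolution
# come from the abstract; every other value here is an assumption.
G_MIN = 1.0                    # normalized minimum conductance (arbitrary units)
G_MAX = 30.0 * G_MIN           # Gmax/Gmin = 30
N_LEVELS = 64                  # 6 bit, the lower bound of the reported ">6 bit"
STEP = (G_MAX - G_MIN) / (N_LEVELS - 1)
TAU_MS = 20.0                  # hypothetical STDP time constant
MAX_PULSES = 4                 # hypothetical pulse count for maximal correlation


def stdp_kernel(dt_ms: float) -> float:
    """Exponential STDP window; dt_ms = t_post - t_pre."""
    if dt_ms > 0:
        return math.exp(-dt_ms / TAU_MS)    # pre before post: potentiate
    if dt_ms < 0:
        return -math.exp(dt_ms / TAU_MS)    # post before pre: depress
    return 0.0


@dataclass
class TwoTransistorCell:
    """Hypothetical 2T cell: two parallel devices, each at a discrete level."""
    level1: int = N_LEVELS // 2  # device driven by gate1 (STDP polarity)
    level2: int = N_LEVELS // 2  # device driven by gate2 (anti-STDP polarity)

    @staticmethod
    def _g(level: int) -> float:
        # Map a discrete level index to a conductance in [G_MIN, G_MAX].
        return G_MIN + level * STEP

    @staticmethod
    def _clip(level: int) -> int:
        # Saturate at the lowest and highest available levels.
        return max(0, min(N_LEVELS - 1, level))

    @property
    def weight(self) -> float:
        # Parallel channels: read currents sum, so conductances add.
        return self._g(self.level1) + self._g(self.level2)

    def update(self, dt_ms: float, reward: float) -> None:
        """Route timing-derived pulses to gate1 or gate2 by reward sign."""
        k = stdp_kernel(dt_ms)
        pulses = round(MAX_PULSES * abs(k))
        direction = 1 if k > 0 else -1 if k < 0 else 0
        if reward > 0:
            # gate1 device: conductance moves with the STDP sign
            self.level1 = self._clip(self.level1 + direction * pulses)
        elif reward < 0:
            # gate2 device has opposite polarity: same pairing gives anti-STDP
            self.level2 = self._clip(self.level2 - direction * pulses)
        # reward == 0: no feedback pulse is applied, weight unchanged


if __name__ == "__main__":
    cell = TwoTransistorCell()
    w0 = cell.weight
    cell.update(dt_ms=5.0, reward=+1.0)   # causal pairing, positive reward
    w1 = cell.weight
    cell.update(dt_ms=5.0, reward=-1.0)   # same pairing, negative reward
    w2 = cell.weight
    print(f"{w0:.3f} -> {w1:.3f} -> {w2:.3f}")
```

With a causal pairing (dt = 5 ms), a positive reward raises the gate1 device by three levels and a subsequent negative reward lowers the gate2 device by three levels, so the total weight returns to its starting value. In this sketch the sign of the reward only selects which gate receives the feedback pulses; no separate reward multiplier is modeled.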
doi_str_mv 10.1002/adma.202107754
format Article
fullrecord <record><control><sourceid>proquest_wiley</sourceid><recordid>TN_cdi_proquest_miscellaneous_2624950871</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2742938858</sourcerecordid><originalsourceid>FETCH-LOGICAL-p2664-5b6154f4169a26a4f6767c331b30356588d5d16deab623606abb02c36b58b4a73</originalsourceid><addsrcrecordid>eNpdkMtKw0AYhQdRsFa3rgNu3KTOPTPLUK9QEduKy2EmmZQpySRmGkp3PoLP6JM4peLC1eH_-Th8HAAuEZwgCPGNLhs9wRAjmGWMHoERYhilFEp2DEZQEpZKTsUpOAthDSGUHPIReM2TuS1aX7nV0GtT22S5bb8_v94XFsdY9toHFzZtnyx2XncbVyRTW9dJFT9z63zMwjbWb5KZ1b13fnUOTipdB3vxm2Pwdn-3nD6ms5eHp2k-SzvMOU2Z4YjRiiIuNeaaVjzjWUEIMgQSxpkQJSsRL602HJPoqo2BuCDcMGGozsgYXB96u779GGzYqMaFIrppb9shKMwxlQyKDEX06h-6bofeRzuFM4olEYKJSMkDtXW13amud43udwpBtZ9X7edVf_Oq_PY5_7vID-XycPo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2742938858</pqid></control><display><type>article</type><title>A Reconfigurable Two‐WSe2‐Transistor Synaptic Cell for Reinforcement Learning</title><source>Wiley Online Library All Journals</source><creator>Zhou, Yue ; Wang, Yasai ; Zhuge, Fuwei ; Guo, Jianmiao ; Ma, Sijie ; Wang, Jingli ; Tang, Zijian ; Li, Yi ; Miao, Xiangshui ; He, Yuhui ; Chai, Yang</creator><creatorcontrib>Zhou, Yue ; Wang, Yasai ; Zhuge, Fuwei ; Guo, Jianmiao ; Ma, Sijie ; Wang, Jingli ; Tang, Zijian ; Li, Yi ; Miao, Xiangshui ; He, Yuhui ; Chai, Yang</creatorcontrib><description>Reward‐modulated spike‐timing‐dependent plasticity (R‐STDP) is a brain‐inspired reinforcement learning (RL) rule, exhibiting potential for decision‐making tasks and artificial general intelligence. However, the hardware implementation of the reward‐modulation process in R‐STDP usually requires complicated Si complementary metal–oxide–semiconductor (CMOS) circuit design that causes high power consumption and large footprint. 
Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit based on WSe2 ferroelectric transistors exhibits reconfigurable polarity behavior, where one channel can be tuned as n‐type and the other as p‐type due to nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors with multilevel (&gt;6 bit) conductance states, ultralow nonlinearity (0.56/−1.23), and large Gmax/Gmin ratio of 30 are realized. By applying positive/negative reward to (anti‐)STDP component of 2T cell, R‐STDP learning rules are realized for training the spiking neural network and demonstrated to solve the classical cart–pole problem, exhibiting a way for realizing low‐power (32 pJ per forward process) and highly area‐efficient (100 µm2) hardware chip for reinforcement learning. Hardware implementation for reward‐modulated spike‐timing‐dependent plasticity (R‐STDP) is demonstrated with a unique 2T synaptic cell structure, which realizes the functions of both STDP and anti‐STDP using a simple hardware structure. 
The total synaptic weight is increased (decreased) by applying feedback signal to gate1 or gate2 when, respectively, a positive or negative reward signal comes.</description><identifier>ISSN: 0935-9648</identifier><identifier>EISSN: 1521-4095</identifier><identifier>DOI: 10.1002/adma.202107754</identifier><language>eng</language><publisher>Weinheim: Wiley Subscription Services, Inc</publisher><subject>2D semiconductors ; Circuit design ; CMOS ; Ferroelectric materials ; Ferroelectricity ; Hardware ; Learning ; Materials science ; Neural networks ; Power consumption ; Reconfiguration ; reinforcement learning ; reward‐modulated spike‐timing‐dependent plasticity ; Semiconductor devices ; synaptic device ; Transistors</subject><ispartof>Advanced materials (Weinheim), 2022-12, Vol.34 (48), p.e2107754-n/a</ispartof><rights>2022 Wiley‐VCH GmbH</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-8943-0861</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fadma.202107754$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fadma.202107754$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1416,27922,27923,45572,45573</link.rule.ids></links><search><creatorcontrib>Zhou, Yue</creatorcontrib><creatorcontrib>Wang, Yasai</creatorcontrib><creatorcontrib>Zhuge, Fuwei</creatorcontrib><creatorcontrib>Guo, Jianmiao</creatorcontrib><creatorcontrib>Ma, Sijie</creatorcontrib><creatorcontrib>Wang, Jingli</creatorcontrib><creatorcontrib>Tang, Zijian</creatorcontrib><creatorcontrib>Li, Yi</creatorcontrib><creatorcontrib>Miao, Xiangshui</creatorcontrib><creatorcontrib>He, Yuhui</creatorcontrib><creatorcontrib>Chai, Yang</creatorcontrib><title>A Reconfigurable Two‐WSe2‐Transistor Synaptic Cell 
for Reinforcement Learning</title><title>Advanced materials (Weinheim)</title><description>Reward‐modulated spike‐timing‐dependent plasticity (R‐STDP) is a brain‐inspired reinforcement learning (RL) rule, exhibiting potential for decision‐making tasks and artificial general intelligence. However, the hardware implementation of the reward‐modulation process in R‐STDP usually requires complicated Si complementary metal–oxide–semiconductor (CMOS) circuit design that causes high power consumption and large footprint. Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit based on WSe2 ferroelectric transistors exhibits reconfigurable polarity behavior, where one channel can be tuned as n‐type and the other as p‐type due to nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors with multilevel (&gt;6 bit) conductance states, ultralow nonlinearity (0.56/−1.23), and large Gmax/Gmin ratio of 30 are realized. By applying positive/negative reward to (anti‐)STDP component of 2T cell, R‐STDP learning rules are realized for training the spiking neural network and demonstrated to solve the classical cart–pole problem, exhibiting a way for realizing low‐power (32 pJ per forward process) and highly area‐efficient (100 µm2) hardware chip for reinforcement learning. Hardware implementation for reward‐modulated spike‐timing‐dependent plasticity (R‐STDP) is demonstrated with a unique 2T synaptic cell structure, which realizes the functions of both STDP and anti‐STDP using a simple hardware structure. 
The total synaptic weight is increased (decreased) by applying feedback signal to gate1 or gate2 when, respectively, a positive or negative reward signal comes.</description><subject>2D semiconductors</subject><subject>Circuit design</subject><subject>CMOS</subject><subject>Ferroelectric materials</subject><subject>Ferroelectricity</subject><subject>Hardware</subject><subject>Learning</subject><subject>Materials science</subject><subject>Neural networks</subject><subject>Power consumption</subject><subject>Reconfiguration</subject><subject>reinforcement learning</subject><subject>reward‐modulated spike‐timing‐dependent plasticity</subject><subject>Semiconductor devices</subject><subject>synaptic device</subject><subject>Transistors</subject><issn>0935-9648</issn><issn>1521-4095</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNpdkMtKw0AYhQdRsFa3rgNu3KTOPTPLUK9QEduKy2EmmZQpySRmGkp3PoLP6JM4peLC1eH_-Th8HAAuEZwgCPGNLhs9wRAjmGWMHoERYhilFEp2DEZQEpZKTsUpOAthDSGUHPIReM2TuS1aX7nV0GtT22S5bb8_v94XFsdY9toHFzZtnyx2XncbVyRTW9dJFT9z63zMwjbWb5KZ1b13fnUOTipdB3vxm2Pwdn-3nD6ms5eHp2k-SzvMOU2Z4YjRiiIuNeaaVjzjWUEIMgQSxpkQJSsRL602HJPoqo2BuCDcMGGozsgYXB96u779GGzYqMaFIrppb9shKMwxlQyKDEX06h-6bofeRzuFM4olEYKJSMkDtXW13amud43udwpBtZ9X7edVf_Oq_PY5_7vID-XycPo</recordid><startdate>20221201</startdate><enddate>20221201</enddate><creator>Zhou, Yue</creator><creator>Wang, Yasai</creator><creator>Zhuge, Fuwei</creator><creator>Guo, Jianmiao</creator><creator>Ma, Sijie</creator><creator>Wang, Jingli</creator><creator>Tang, Zijian</creator><creator>Li, Yi</creator><creator>Miao, Xiangshui</creator><creator>He, Yuhui</creator><creator>Chai, Yang</creator><general>Wiley Subscription Services, Inc</general><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-8943-0861</orcidid></search><sort><creationdate>20221201</creationdate><title>A 
Reconfigurable Two‐WSe2‐Transistor Synaptic Cell for Reinforcement Learning</title><author>Zhou, Yue ; Wang, Yasai ; Zhuge, Fuwei ; Guo, Jianmiao ; Ma, Sijie ; Wang, Jingli ; Tang, Zijian ; Li, Yi ; Miao, Xiangshui ; He, Yuhui ; Chai, Yang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p2664-5b6154f4169a26a4f6767c331b30356588d5d16deab623606abb02c36b58b4a73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>2D semiconductors</topic><topic>Circuit design</topic><topic>CMOS</topic><topic>Ferroelectric materials</topic><topic>Ferroelectricity</topic><topic>Hardware</topic><topic>Learning</topic><topic>Materials science</topic><topic>Neural networks</topic><topic>Power consumption</topic><topic>Reconfiguration</topic><topic>reinforcement learning</topic><topic>reward‐modulated spike‐timing‐dependent plasticity</topic><topic>Semiconductor devices</topic><topic>synaptic device</topic><topic>Transistors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Yue</creatorcontrib><creatorcontrib>Wang, Yasai</creatorcontrib><creatorcontrib>Zhuge, Fuwei</creatorcontrib><creatorcontrib>Guo, Jianmiao</creatorcontrib><creatorcontrib>Ma, Sijie</creatorcontrib><creatorcontrib>Wang, Jingli</creatorcontrib><creatorcontrib>Tang, Zijian</creatorcontrib><creatorcontrib>Li, Yi</creatorcontrib><creatorcontrib>Miao, Xiangshui</creatorcontrib><creatorcontrib>He, Yuhui</creatorcontrib><creatorcontrib>Chai, Yang</creatorcontrib><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>MEDLINE - Academic</collection><jtitle>Advanced materials (Weinheim)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhou, 
Yue</au><au>Wang, Yasai</au><au>Zhuge, Fuwei</au><au>Guo, Jianmiao</au><au>Ma, Sijie</au><au>Wang, Jingli</au><au>Tang, Zijian</au><au>Li, Yi</au><au>Miao, Xiangshui</au><au>He, Yuhui</au><au>Chai, Yang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Reconfigurable Two‐WSe2‐Transistor Synaptic Cell for Reinforcement Learning</atitle><jtitle>Advanced materials (Weinheim)</jtitle><date>2022-12-01</date><risdate>2022</risdate><volume>34</volume><issue>48</issue><spage>e2107754</spage><epage>n/a</epage><pages>e2107754-n/a</pages><issn>0935-9648</issn><eissn>1521-4095</eissn><abstract>Reward‐modulated spike‐timing‐dependent plasticity (R‐STDP) is a brain‐inspired reinforcement learning (RL) rule, exhibiting potential for decision‐making tasks and artificial general intelligence. However, the hardware implementation of the reward‐modulation process in R‐STDP usually requires complicated Si complementary metal–oxide–semiconductor (CMOS) circuit design that causes high power consumption and large footprint. Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit based on WSe2 ferroelectric transistors exhibits reconfigurable polarity behavior, where one channel can be tuned as n‐type and the other as p‐type due to nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors with multilevel (&gt;6 bit) conductance states, ultralow nonlinearity (0.56/−1.23), and large Gmax/Gmin ratio of 30 are realized. By applying positive/negative reward to (anti‐)STDP component of 2T cell, R‐STDP learning rules are realized for training the spiking neural network and demonstrated to solve the classical cart–pole problem, exhibiting a way for realizing low‐power (32 pJ per forward process) and highly area‐efficient (100 µm2) hardware chip for reinforcement learning. 
Hardware implementation for reward‐modulated spike‐timing‐dependent plasticity (R‐STDP) is demonstrated with a unique 2T synaptic cell structure, which realizes the functions of both STDP and anti‐STDP using a simple hardware structure. The total synaptic weight is increased (decreased) by applying feedback signal to gate1 or gate2 when, respectively, a positive or negative reward signal comes.</abstract><cop>Weinheim</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/adma.202107754</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-8943-0861</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0935-9648
ispartof Advanced materials (Weinheim), 2022-12, Vol.34 (48), p.e2107754-n/a
issn 0935-9648
1521-4095
language eng
recordid cdi_proquest_miscellaneous_2624950871
source Wiley Online Library All Journals
subjects 2D semiconductors
Circuit design
CMOS
Ferroelectric materials
Ferroelectricity
Hardware
Learning
Materials science
Neural networks
Power consumption
Reconfiguration
reinforcement learning
reward‐modulated spike‐timing‐dependent plasticity
Semiconductor devices
synaptic device
Transistors
title A Reconfigurable Two‐WSe2‐Transistor Synaptic Cell for Reinforcement Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T14%3A50%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_wiley&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Reconfigurable%20Two%E2%80%90WSe2%E2%80%90Transistor%20Synaptic%20Cell%20for%20Reinforcement%20Learning&rft.jtitle=Advanced%20materials%20(Weinheim)&rft.au=Zhou,%20Yue&rft.date=2022-12-01&rft.volume=34&rft.issue=48&rft.spage=e2107754&rft.epage=n/a&rft.pages=e2107754-n/a&rft.issn=0935-9648&rft.eissn=1521-4095&rft_id=info:doi/10.1002/adma.202107754&rft_dat=%3Cproquest_wiley%3E2742938858%3C/proquest_wiley%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2742938858&rft_id=info:pmid/&rfr_iscdi=true