Semiconductor final test scheduling with Sarsa(λ, k) algorithm
Published in: European Journal of Operational Research, 2011-12, Vol. 215 (2), p. 446-458
Main authors: , , ,
Format: Article
Language: English
Online access: Full text
Abstract:
► We propose a multi-step reinforcement learning algorithm called Sarsa(λ, k). ► We construct forward view Sarsa(λ, k) and backward view Sarsa(λ, k) and prove their equivalence in off-line updating. ► We provide an upper bound on the error of the action-value function in tabular Sarsa(λ, k) when solving deterministic problems. ► Sarsa(λ, k) outperforms the Industrial Method and any individual action.
The semiconductor test scheduling problem is a variant of the reentrant unrelated parallel machine problem with multiple resource constraints, intricate {product, tester, kit, enabler assembly} eligibility constraints, sequence-dependent setup times, etc. A multi-step reinforcement learning (RL) algorithm called Sarsa(λ, k) is proposed and applied to this scheduling problem with a throughput-related objective. By allowing enabler reconfiguration, the production capacity of the test facility is expanded and scheduling optimization is performed at the bottom level. Two forms of Sarsa(λ, k), the forward view and the backward view, are constructed and proved equivalent in off-line updating. An upper bound on the error of the action-value function in tabular Sarsa(λ, k) is provided for deterministic problems. To apply Sarsa(λ, k), the scheduling problem is transformed into an RL problem by representing the states and constructing the actions, the reward function, and the function approximator. Sarsa(λ, k) achieves a mean scheduling objective value 68.59% smaller than that of the Industrial Method (IM) on real industrial problems and 76.89% smaller on randomly generated test problems. Computational experiments show that Sarsa(λ, k) outperforms IM and any individual action constructed from heuristics derived from existing heuristics or scheduling rules.
ISSN: 0377-2217, 1872-6860
DOI: 10.1016/j.ejor.2011.05.052
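For context, the backward view of Sarsa(λ, k) described in the abstract builds on the standard backward-view Sarsa(λ) update with eligibility traces. Below is a minimal tabular sketch of that baseline, assuming a toy deterministic chain environment and illustrative parameter values; the paper's k-step return mixing and its scheduling-specific states, actions, reward function, and function approximator are not reproduced here.

```python
import random

# Minimal sketch: tabular backward-view Sarsa(lambda) with accumulating
# eligibility traces. The toy chain environment and all parameter values
# are illustrative assumptions, not the paper's scheduling formulation.

N_STATES = 6          # states 0..5; state 5 is terminal
ACTIONS = [0, 1]      # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic chain: 'right' moves toward the goal, 'left' moves back."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def epsilon_greedy(q, state, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def sarsa_lambda(episodes=200, alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        e = {key: 0.0 for key in q}            # eligibility traces, reset per episode
        state = 0
        action = epsilon_greedy(q, state, epsilon)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon)
            # One-step TD error; the (lambda, k) variant would instead mix
            # multi-step returns, which is not shown in this sketch.
            target = reward + (0.0 if done else gamma * q[(next_state, next_action)])
            delta = target - q[(state, action)]
            e[(state, action)] += 1.0          # accumulating trace for the visited pair
            for key in q:                      # backward-view credit assignment
                q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            state, action = next_state, next_action
    return q

if __name__ == "__main__":
    q = sarsa_lambda()
    greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
    print("greedy action per state:", greedy)   # expect mostly 1 (move right)
```

The eligibility traces are what make this the backward view: every state-action pair visited earlier in the episode receives a share of each new temporal-difference error, decayed by γλ per step, which is equivalent to the forward-view λ-return under off-line updating.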