In this thesis, an algorithm for learning subsequential transducers is presented from two different perspectives: as an extension of Angluin's algorithm for learning deterministic finite automata, and as an instantiation of a more generic categorical algorithm valid for a larger class of automata. The categorical approach adopted here regards automata as functors from a category representing words to a suitable output category. Sufficient conditions for the existence of minimal automata are presented, together with additional hypotheses concerning termination that ensure the correctness of the generic algorithm. Remarkably, the conditions required in Angluin's original algorithm and in its extended version for subsequential transducers arise naturally when the generic categorical algorithm is instantiated with the appropriate output categories.
It is not uncommon to understand facts, processes and results better by looking at them from above: learning is no exception. Learning is a crucial area of computer science, especially in artificial intelligence: knowing how to deal with communication, mistakes and experience plays an essential role in making progress in learning. But even more important is learning what makes communication possible: a language. A language can initially be thought of as a subset of the set of words over an alphabet, which is always assumed to be finite. The relation between languages and automata has become increasingly clear in recent decades, ever since Noam Chomsky gave a mathematical model of a grammar in the second half of the last century.
A deterministic finite automaton accepts a language, called regular, and for every regular language there exists a minimal deterministic finite automaton, i.e. one with the smallest number of states, that accepts it. An analogous result holds for subsequential transducers, which are automata more complex than deterministic finite automata: in this case, certain partial functions, called subsequential, from one set of words to another, over possibly different alphabets, are accepted; a subsequential transducer accepts a particular subsequential function, and for every subsequential function there exists a minimal subsequential transducer accepting it. This is why learning regular languages and subsequential functions may be pursued by learning the minimal automata accepting them.
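The subsequential machines just described can be made concrete with a small, self-contained sketch (illustrative only, not taken from the thesis): a transducer whose transitions emit output words while reading input, and whose states carry a final output word appended when the input ends. The example below copies its input over the alphabet {a, b} and appends a marker recording the parity of occurrences of 'a'; the names TRANS, FINAL and run are hypothetical.

```python
# Illustrative subsequential transducer: like a DFA, but each transition
# emits an output word, and each state has a final output word appended
# once the whole input has been read.

TRANS = {  # (state, letter) -> (next state, output word)
    (0, "a"): (1, "a"), (0, "b"): (0, "b"),
    (1, "a"): (0, "a"), (1, "b"): (1, "b"),
}
FINAL = {0: "#even", 1: "#odd"}  # state output; a partial transducer would omit keys
INIT_OUT = ""                    # output word emitted before reading any letter

def run(word, state=0):
    """Compute the subsequential function realized by the transducer."""
    out = INIT_OUT
    for c in word:
        state, piece = TRANS[(state, c)]
        out += piece
    return out + FINAL[state]
```

For instance, `run("aba")` copies the input and, since it contains an even number of 'a's, appends the even marker, yielding `"aba#even"`. Partiality, a key feature of subsequential functions, would correspond to missing entries in `FINAL` or `TRANS`.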
Table of Contents
1. Angluin’s learning algorithm
1.1 Deterministic finite automata
1.2 Minimal deterministic finite automata
1.3 Hypothesis deterministic finite automata
1.4 Angluin’s algorithm for DFA
1.5 Efficiency of Angluin’s algorithm for DFA
1.6 A running DFA example
2. Learning subsequential transducers
2.1 Subsequential transducers
2.2 Mathematical background for subsequential transducers
2.3 Minimal subsequential transducers
2.4 Hypothesis subsequential transducers
2.5 The learning algorithm for SST
2.6 Correctness and termination of the learning algorithm for SST
2.7 Efficiency of the learning algorithm for SST
2.8 A running SST example
3. A categorical approach for minimizing automata
3.1 Factorization systems
3.2 Languages and automata as functors
3.3 Minimization of automata
4. A categorical approach for learning automata
4.1 Hypothesis automata
4.2 The basic categorical learning algorithm
4.3 The optimized categorical learning algorithm
5. Learning subsequential transducers categorically
5.1 A category for subsequential transducers
5.2 Initial and final subsequential transducers
5.3 Minimality and noetherianity for SST
5.4 The learning algorithm for SST from the categorical algorithm
Research Objectives and Topics
This thesis presents an algorithm for learning subsequential transducers from two distinct perspectives: first as an extension of Angluin's algorithm for deterministic finite automata, and second as an instantiation of the generic categorical FunL*-algorithm. The primary research goal is to demonstrate how category theory formalizes automaton learning and provides a unified framework for minimizing and learning various classes of automata by varying the output category.
- Extension of Angluin’s L*-algorithm for subsequential transduction.
- Categorical formulation of automata as functors.
- Unification of learning processes through factorization systems.
- Establishment of sufficient conditions for minimality and termination.
- Optimization of the categorical learning algorithm.
Excerpt from the Book
A categorical approach for learning automata
In this chapter, the generic FunL*-algorithm for learning word automata, originally presented in the research article by the author and the supervisors [9], is finally provided.
Just as in Angluin’s algorithm, there are a teacher and a learner. Throughout this section, the alphabet A, the output category C and its factorization system (E, M) are fixed and known to both teacher and learner. The teacher knows a language L: O_{A*} → C, hereafter called the target language. The learner wants to find this language, the output of the algorithm being the minimal automaton Min(L) accepting L. The learner can ask the two following kinds of queries, which can be thought of as high-level generalizations of Angluin’s original ones in the special case of deterministic automata (see [1]).
Evaluation queries: given a word w, what is L(w)?
Equivalence queries: does a given automaton accept the target language? If it does not, what is a counterexample witnessing that it does not?
In order to formulate the generic algorithm, the notions of table and hypothesis automaton from Angluin’s original algorithm must be generalized. This is done in Section 4.1. The generic algorithm is provided together with its correctness and termination in Section 4.2. An optimized version of the learning algorithm is finally provided in Section 4.3.
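The query protocol above can be illustrated with a minimal sketch of Angluin-style learning in the special case of deterministic finite automata, where evaluation queries become membership queries. This is not the thesis's generic algorithm: the teacher is simulated by a known target function, equivalence queries are approximated by testing all short words, counterexamples are handled by adding their suffixes to the test set (a standard variant of Angluin's original procedure), and all names are illustrative.

```python
# Sketch of observation-table learning for DFAs via the two query types:
# evaluation (membership) queries fill table cells, and an approximated
# equivalence query either confirms the hypothesis or yields a counterexample.

from itertools import product

ALPHABET = "ab"

def target(w):
    """Membership oracle (evaluation query): words with an even number of 'a's."""
    return w.count("a") % 2 == 0

def row(prefix, suffixes):
    """One row of the observation table, filled by evaluation queries."""
    return tuple(target(prefix + s) for s in suffixes)

def learn(max_rounds=10):
    prefixes, suffixes = [""], [""]
    for _ in range(max_rounds):
        # Closure: every one-letter extension of a prefix must match some row.
        while True:
            rows = {row(p, suffixes) for p in prefixes}
            new = next((p + a for p in prefixes for a in ALPHABET
                        if row(p + a, suffixes) not in rows), None)
            if new is None:
                break
            prefixes.append(new)
        # Hypothesis automaton: states are the distinct rows, each with
        # a representative prefix; transitions follow one-letter extensions.
        rep = {row(p, suffixes): p for p in prefixes}

        def accepts(w, rep=rep, suffixes=tuple(suffixes)):
            state = ""
            for a in w:
                state = rep[row(state + a, list(suffixes))]
            return target(state)  # the row entry at the empty suffix

        # Equivalence query, approximated on all words of length < 6.
        cex = next(("".join(t) for n in range(6)
                    for t in product(ALPHABET, repeat=n)
                    if accepts("".join(t)) != target("".join(t))), None)
        if cex is None:
            return accepts, len(rep)
        # Refine: add all suffixes of the counterexample to the test set.
        suffixes += [cex[i:] for i in range(len(cex)) if cex[i:] not in suffixes]
    raise RuntimeError("no convergence within max_rounds")
```

On this target the table closes after one step and the hypothesis is the two-state parity automaton, so the (approximate) equivalence query succeeds immediately. The generic FunL*-algorithm abstracts exactly these ingredients: the table, closure, and the hypothesis construction, stated in an arbitrary output category.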
Summary of Chapters
1. Angluin’s learning algorithm: This chapter reviews the original L*-algorithm for inferring minimal deterministic finite automata via membership and equivalence queries.
2. Learning subsequential transducers: This chapter extends the L*-algorithm to learn subsequential transductions, introducing necessary mathematical background and the f*-algorithm.
3. A categorical approach for minimizing automata: This chapter models automata as functors between categories, formalizing minimization using factorization systems.
4. A categorical approach for learning automata: This chapter introduces the generic FunL*-algorithm, which generalizes the learning process into a categorical framework.
5. Learning subsequential transducers categorically: This chapter instantiates the FunL*-algorithm specifically to learn subsequential transducers, demonstrating the power of the categorical approach.
Keywords
Automata Learning, Subsequential Transducers, Angluin’s Algorithm, Categorical Automata Theory, FunL*-algorithm, Deterministic Finite Automata, Factorization Systems, Minimal Automata, Formal Languages, Transduction, Query Learning, Functors, Noetherianity, Output Categories, Hypothesis Automaton.
Frequently Asked Questions
What is the core focus of this thesis?
The work focuses on the categorical interpretation of automaton learning, specifically proposing a generic algorithm (FunL*) that encompasses and extends traditional learning algorithms like Angluin's L*.
What are the primary themes discussed?
The thesis covers the extension of learning algorithms from simple deterministic finite automata to more complex subsequential transducers, using category theory as the unifying mathematical language.
What is the primary scientific goal?
The goal is to provide a minimalistic, category-theoretic framework for machine learning that generalizes earlier approaches and highlights the deep relationship between minimizing automata and learning them.
Which scientific method is primarily employed?
The author uses mathematical category theory, specifically involving Kleisli categories, Monads, and Factorization Systems, to construct and prove the convergence of new learning algorithms.
What is addressed in the main body (Part II)?
Part II develops the categorical framework, defining automata as functors, establishing minimal automata through factorization, and defining the generic FunL*-algorithm.
Which keywords best characterize this research?
Key terms include Automata Learning, Subsequential Transducers, FunL*-algorithm, Categorical Perspective, and Factorization Systems.
Why is the "onward" property important for transducers?
Onwardness ensures the uniqueness of minimal subsequential transducers, which is critical for providing a canonical target that the learning algorithm can correctly identify.
What is the role of counterexamples in the learning process?
In both the classic L* and the categorical FunL* algorithms, counterexamples provided by the teacher serve to force the learner to refine their hypothesis and increase the set of states (Q) or constraints (T) until the correct target language is captured.
- Cite this work
- Riccardo Stabile (Author), 2024, Learning Subsequential Transducers. A Categorical Approach, München, GRIN Verlag, https://www.grin.com/document/1467389