函子

先决条件

模块

简介

在本教程中，我们将学习如何应用函子以及如何编写函子。我们还将展示一些涉及函子的用例。

顾名思义，函子几乎就像一个函数。但是，函数是在值之间，而函子是在模块之间。函子以模块作为参数，并返回一个模块作为结果。OCaml 中的函子是参数化的模块，不要与数学中的函子混淆。

注意：说明本教程的文件可作为Git 仓库获取。

项目设置

本教程使用Dune构建工具。请确保已安装 3.7 或更高版本。我们首先创建一个新的项目。我们需要一个名为 funkt 的目录，其中包含文件 dune-project、dune 和 funkt.ml。

$ mkdir funkt; cd funkt

将以下内容放在文件 dune-project 中

(lang dune 3.7)
(package (name funkt))

文件 dune 的内容应为

(executable
  (name funkt)
  (public_name funkt)
  (libraries str))

创建一个空文件 funkt.ml。

使用 opam exec -- dune exec funkt 命令检查这是否有效。它不应该做任何事情（空文件是有效的 OCaml 语法），但也不应该失败。语句 libraries str 使 Str 模块（我们稍后将使用它）可用。

使用现有函子：`Set.Make`

标准库包含一个Set模块，用于处理集合。此模块使您能够对集合执行并集、交集和差集等操作。您可以查看Set教程以了解有关此模块的更多信息，但这不是遵循本教程的必要条件。

要为给定的元素类型创建集合模块（允许您使用提供的类型及其关联的函数），需要使用 Set 模块提供的函子 Set.Make。以下是 Set 接口的简化版本

module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

module Make : functor (Ord : OrderedType) -> Set.S

以下是其含义（从底部开始，然后向上）

像函数（由箭头 -> 表示）一样，函子 Set.Make
- 接受一个具有 OrderedType 签名的模块，并且
- 返回一个具有Set.S签名的模块
模块类型 OrderedType 需要一个类型 t 和一个函数 compare，它们用于执行集合元素之间的比较。

注意：大多数集合操作都需要比较元素以检查它们是否相同。为了允许使用用户定义的比较算法，Set.Make 函子接受一个模块，该模块指定元素类型 t 和 compare 函数。例如，像在 Array.sort 中那样将比较函数作为高阶参数传递会增加很多样板代码。将集合操作作为函子提供允许仅指定一次比较函数。

以下是如何使用 Set.Make 的示例

funkt.ml

module StringCompare = struct
  type t = string
  let compare = String.compare
end

module StringSet = Set.Make(StringCompare)

这定义了一个模块 Funkt.StringSet。Set.Make 需要的是

类型 t，这里为 string
允许比较两个类型为 t 的值的函数，这里为 String.compare

注意：类型t必须在参数模块中定义，这里为StringCompare。大多数情况下，如本例所示，t是另一个类型的别名，这里为string。如果另一个类型也称为t，编译器将触发错误“类型缩写t是循环的”。可以通过向类型定义添加nonrec关键字来解决此问题，如下所示

type nonrec t = t

这可以使用匿名模块表达式简化

module StringSet = Set.Make(struct
  type t = string
  let compare = String.compare
end)

模块表达式struct ... end在Set.Make调用中内联。

但是，由于模块String已经定义了

类型名称t，它是string的别名
函数compare的类型为t -> t -> bool，用于比较两个字符串

这可以进一步简化为

module StringSet = Set.Make(String)

在所有版本中，由函子应用Set.Make产生的模块都绑定到名称StringSet，并且它具有签名Set.S。模块StringSet提供字符串集的操作。函数String.compare由StringSet内部使用。

当您运行opam exec -- dune exec funkt时，它不会执行任何操作，但也不应该失败。

让我们向funkt.ml添加一些代码，使其执行某些操作。

funkt.ml

module StringSet = Set.Make(String)

let _ =
  In_channel.input_lines stdin
  |> List.concat_map Str.(split (regexp "[ \t.,;:()]+"))
  |> StringSet.of_list
  |> StringSet.iter print_endline

以下是这段代码的工作原理

In_channel.input_lines：从标准输入读取文本行
List.concat_map：将行拆分为单词并生成单词列表
StringSet.of_list : string list -> StringSet.t：将单词列表转换为集合
StringSet.iter : StringSet.t -> unit：显示集合的元素

函数StringSet.of_list和StringSet.iter在函子的应用结果中可用。

$ opam exec -- dune exec funkt < dune
executable
libraries
name
public_name
str
funkt

Set中没有重复项。因此，字符串"funkt"仅显示一次，尽管它在dune文件中出现了两次。

使用标准库函子扩展模块

使用include语句，以下是如何公开Set.Make(String)创建的模块的另一种方法

funkt.ml

module String = struct
  include String
  module Set = Set.Make(String)
end

let _ =
  stdin
  |> In_channel.input_lines
  |> List.concat_map Str.(split (regexp "[ \t.,;:()]+"))
  |> String.Set.of_list
  |> String.Set.iter print_endline

这允许用户看似用子模块Set扩展模块String。使用opam exec -- dune exec funkt < dune检查行为。

使用函子参数化模块

标准库中的函子

函子几乎是一个模块，只是它需要应用于一个模块。这会将其转换为模块。从这个意义上说，函子允许模块参数化。

标准库提供的集合、映射和哈希表就是这种情况。它就像函子和开发者之间的一份契约。

如果您提供了一个模块，实现了参数接口中描述的预期内容
函子将返回一个模块，实现了结果接口中描述的承诺内容

以下是函子Set.Make和Map.Make期望的模块签名

module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

以下是函子Hashtbl.Make期望的模块签名

module type HashedType = sig
  type t
  val equal : t -> t -> bool
  val hash : t -> int
end

函子Set.Make、Map.Make和Hashtbl.Make返回满足接口Set.S、Map.S和Hashtbl.S（分别）的模块，它们都包含一个抽象类型t和关联函数。有关它们提供的详细信息，请参阅文档

编写您自己的函子

编写函子的一个原因是提供一个参数化的数据结构。这与其他数据结构上的Set和Map相同。在本节中，我们以堆为例。

有许多种堆数据结构。例如二叉堆、左式堆、二项堆或斐波那契堆。

本文档中未讨论用于实现堆的数据结构和算法。

实现任何堆的共同前提是能够比较它们包含的元素。这与Set.Make和Map.Make的参数具有相同的签名

module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

使用这样的参数，堆实现必须至少提供以下接口

module type HeapType = sig
  type elt
  type t
  val empty : t
  val is_empty : t -> bool
  val insert : t -> elt -> t
  val merge : t -> t -> t
  val find : t -> elt
  val delete : t -> t
end

堆实现可以表示为从OrderedType到HeapType的函子。每种堆将是一个不同的函子。

这是一个可能的实现的框架

heap.ml

module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

module type S = sig
  type elt
  type t
  val empty : t
  val is_empty : t -> bool
  val insert : t -> elt -> t
  val merge : t -> t -> t
  val find : t -> elt
  val delete : t -> t
end

module Binary(Elt: OrderedType) : S = struct
  type elt (* Add your own type definition *)
  type t (* Add your own type definition *)
  (* Add private functions here *)
  let empty = failwith "Not yet implemented"
  let is_empty h = failwith "Not yet implemented"
  let insert h e = failwith "Not yet implemented"
  let merge h1 h2 = failwith "Not yet implemented"
  let find h = failwith "Not yet implemented"
  let delete h = failwith "Not yet implemented"
end

这里，二叉堆是唯一建议的实现。这可以通过为每个实现添加一个函子来扩展到其他实现，例如Heap.Leftist、Heap.Binomial、Heap.Fibonacci等。

使用函子注入依赖项

模块之间的依赖关系

这是funkt程序的新版本

funkt.ml

module StringSet = Set.Make(String)

module IterPrint : sig
  val f : string list -> unit
end = struct
  let f = List.iter (fun s -> Out_channel.output_string stdout (s ^ "\n"))
end

let _ =
  stdin
  |> In_channel.input_lines
  |> List.concat_map Str.(split (regexp "[ \t.,;:()]+"))
  |> StringSet.of_list
  |> StringSet.elements
  |> IterPrint.f

它嵌入了一个额外的IterPrint模块，该模块公开了一个类型为string list -> unit的单一函数f，并且有两个依赖项

通过List.iter和f的类型使用模块List
通过Out_channel.output_string使用模块Out_channel

使用opam exec -- dune exec funkt < dune检查程序的行为。

依赖注入

依赖注入是一种参数化依赖项的方法。

以下是使用此技术的IterPrint模块的重构

iterPrint.ml

module type Iterable = sig
  type 'a t
  val iter : ('a -> unit) -> 'a t -> unit
end

module type S = sig
  type 'a t
  val f : string t -> unit
end

module Make(Dep: Iterable) : S with type 'a t := 'a Dep.t = struct
  let f = Dep.iter (fun s -> Out_channel.output_string stdout (s ^ "\n"))
end

模块IterPrint被重构为一个函子，它将提供函数iter的模块作为参数。约束with type 'a t := 'a Dep.t表示来自参数Dep的类型t替换结果模块中的类型t。这允许f的类型使用Dep的t类型。通过此重构，IterPrint只有一个依赖项。在编译时，尚未提供函数iter的任何实现。

注意：OCaml 接口文件（.mli）必须是模块，而不是函子。函子必须嵌入到模块中。因此，习惯上将它们称为Make。

funkt.ml

module StringSet = Set.Make(String)
module IterPrint = IterPrint.Make(List)

let _ =
  stdin
  |> In_channel.input_lines
  |> List.concat_map Str.(split (regexp "[ \t.,;:()]+"))
  |> StringSet.of_list
  |> StringSet.elements
  |> IterPrint.f

依赖项List在编译模块Funkt时被注入。观察使用IterPrint的代码没有变化。使用opam exec -- dune exec funkt < dune检查程序的行为。

替换依赖项

现在，替换IterListPrint内部iter的实现不再是重构；它是另一个带有另一个依赖项的函子应用。这里，Array替换了List

funkt.ml

module StringSet = Set.Make(String)
module IterPrint = IterPrint.Make(Array)

let _ =
  stdin
  |> In_channel.input_lines
  |> List.concat_map Str.(split (regexp "[ \t.,;:()]+"))
  |> StringSet.of_list
  |> StringSet.elements
  |> Array.of_list
  |> IterPrint.f

使用opam exec -- dune exec funkt < dune检查程序的行为。

注意：IterPrint.Make接收和返回的模块都具有类型t。with type ... := ...约束表明这两个类型t是相同的。这使得来自注入的依赖项和结果模块的函数使用完全相同的类型。当参数的包含类型未被结果模块公开（即，当它是一个实现细节时），with type约束不是必需的。

命名和作用域

with type约束统一了函子的参数和结果模块中的类型。我们在上一节中使用了它。本节介绍此约束的命名和作用域机制。

简单来说，我们可能已将Iter.Make定义如下

module Make(Dep: Iterable) : S = struct
  type 'a t = 'a Dep.t
  let f = Dep.iter (fun s -> Out_channel.output_string stdout (s ^ "\n"))
end

如果函数f未使用，则项目将编译成功。

但是，由于Make被调用以在funkt.ml中创建模块IterPrint，因此项目无法编译，并显示以下错误消息

5 | ..stdin
6 |   |> In_channel.input_lines
7 |   |> List.concat_map Str.(split (regexp "[ \t.,;:()]+"))
8 |   |> StringSet.of_list
9 |   |> StringSet.elements
Error: This expression has type string list
       but an expression was expected of type string IterPrint.t

在函子外部，不知道type 'a t设置为Dep.t。在funkt.ml中，IterPrint.t显示为由Make的结果公开的抽象类型。这就是为什么需要with type约束的原因。它传播了IterPrint.t与Dep.t（在本例中为List.t）相同的知识。

使用with type约束的类型不会被函子主体内的定义遮蔽。在示例中，Make函子可以重新定义如下

module Make(Dep: Iterable) : S with type 'a t := 'a Dep.t = struct
  type 'a t = LocalType
  let g LocalType = "LocalType"
  let f = Dep.iter(fun s ->
    Out_channel.output_string stdout (g LocalType ^ "\n");
    Out_channel.output_string stdout (s ^ "\n"))
end

在上面的示例中，来自with type的t优先于局部t，后者只有局部作用域。不要过于频繁地遮蔽名称，因为这会使代码更难以理解。

编写一个函子来扩展模块

在本节中，我们定义了一个函子以相同的方式扩展多个模块。这与使用标准库函子扩展模块中的想法相同，只是我们自己编写了函子。

在此示例中，我们使用函数scan_left扩展了List和Array模块。它与fold_left几乎相同，只是它返回所有中间值，而不是像fold_left那样返回最后一个值。

使用以下文件创建一个新的目录

dune-project

(lang dune 3.7)

dune

(library (name scanLeft))

scanLeft.ml

module type LeftFoldable = sig
  type 'a t
  val fold_left : ('b -> 'a -> 'b) -> 'b -> 'a t -> 'b
  val of_list : 'a list -> 'a t
end

module type S = sig
  type 'a t
  val scan_left : ('b -> 'a -> 'b) -> 'b -> 'a t -> 'b t
end

module Make(F: LeftFoldable) : S with type 'a t := 'a F.t = struct
  let scan_left f b u =
    let f (b, u) a =
      let b' = f b a in
      (b', b' :: u) in
    u |> F.fold_left f (b, []) |> snd |> List.rev |> F.of_list
end

运行dune utop命令。进入顶级后，输入以下命令。

# module Array = struct
    include Stdlib.Array
    include ScanLeft.Make(Stdlib.Array)
  end;;

# module List = struct
    include List
    include ScanLeft.Make(struct
      include List
      let of_list = Fun.id
    end)
  end;;

# Array.init 10 Fun.id |> Array.scan_left ( + ) 0;;
- : int array = [|0; 1; 3; 6; 10; 15; 21; 28; 36; 45|]

# List.init 10 Fun.id |> List.scan_left ( + ) 0;;
- : int list = [0; 1; 3; 6; 10; 15; 21; 28; 36; 45]

模块Array和List显示为添加了Array.scan_left和List.scan_left。为简洁起见，此处未显示前两个顶级命令的输出。

有状态模块的初始化

模块可以保存状态。函子可以提供一种初始化有状态模块的方法。例如，以下是如何将随机数生成种子作为状态处理的一种可能方法

random.ml

module type SeedType : sig
  val v : int array
end

module type S : sig
  val reset_state : unit -> unit

  val bits : unit -> int
  val bits32 : unit -> int32
  val bits64 : unit -> int64
  val nativebits : unit -> nativeint
  val int : int -> int
  val int32 : int32 -> int32
  val int64 : int64 -> int64
  val nativeint : nativeint -> nativeint
  val full_int : int -> int
  val float : float -> float
  val bool : unit -> bool
end

module Make(Seed: SeedType) : S = struct
  let state = Seed.v |> Random.State.make |> ref
  let reset_state () = state := Random.State.make Seed.v

  let bits () = Random.State.bits !state
  let bits32 () = Random.State.bits32 !state
  let bits64 () = Random.State.bits64 !state
  let nativebits () = Random.State.nativebits !state
  let int = Random.State.int !state
  let int32 = Random.State.int32 !state
  let int64 = Random.State.int64 !state
  let nativeint = Random.State.nativeint !state
  let full_int = Random.State.full_int !state
  let float = Random.State.float !state
  let bool () = Random.State.bool !state
end

创建此文件并启动utop。

# #mod_use "random.ml";;

# module R1 = Random.Make(struct let v = [|0; 1; 2; 3|] end);;

# module R2 = Random.Make(struct let v = [|0; 1; 2; 3|] end);;

# R1.bits ();;
- : int = 75783189

# R2.bits ();;
- : int = 75783189

# R1.bits ();;
- : int = 774473149

# R1.reset_state ();;
- : unit = ()

# R2.bits ();;
- : int = 774473149

# R1.bits ();;
- : int = 75783189

模块R1和R2使用相同的状态创建；因此，对R1.bits和R2.bits的第一次调用返回相同的值。

对R1.bits的第二次调用将R1的状态移动一步并返回相应的位。对R1.reset_state的调用将R1的状态设置为其初始值。

再次调用R2.bits表明模块之间没有共享状态。否则，将返回第一次调用bits时的值。

第三次调用R1.bits返回的结果与第一次调用相同，这证明状态确实已重置。

结论

函子应用的工作方式与函数应用基本相同：传递参数并获取结果。不同之处在于我们传递的是模块而不是值。除了方便之外，它还支持一种设计方法，这种方法不仅可以将关注点分离到孤岛中（这由模块启用），还可以将关注点分层堆叠。

仍然需要帮助？

咨询OCaml社区

提交问题

帮助改进我们的文档

所有OCaml文档都是开源的。发现错误或不清楚的地方？提交拉取请求。

贡献

函子

先决条件

简介

项目设置

使用现有函子：Set.Make

使用标准库函子扩展模块

使用函子参数化模块

标准库中的函子

编写您自己的函子

使用函子注入依赖项

命名和作用域

编写一个函子来扩展模块

有状态模块的初始化

结论

仍然需要帮助？

帮助改进我们的文档

使用现有函子：`Set.Make`