Rust实现函数式列表

本文有参考(Learn Rust With Entirely Too Many Linked Lists)[https://rust-unofficial.github.io/too-many-lists/index.html],但原文中跳过了很多难处理的情况,比如用Option代替代数数据类型,链表和节点分开定义,显然这显然很不函数式。

Rust和Haskell非常相似,比如有ADT、模式匹配,trait/impl和class/instance几乎是一样的。
列表是函数式语言中最常用的数据结构,下面尝试在Rust中实现他。

首先

在Haskell中List大概是这样:

1
data List a = Cons a (List a) | Nil

翻译成Rust:

1
2
3
4
enum List<T>{
Cons(T,List<T>),
Nil
}
1
2
3
4
5
6
7
8
error[E0072]: recursive type `List` has infinite size
--> src/lib.rs:6:1
|
6 | enum List<T>{
| ^^^^^^^^^^^^ recursive type has infinite size
7 | Cons(T,List<T>),
| ------- recursive without indirection
|

Rust需要在编译期知道知道结构体/枚举的大小,这里是递归的数据结构,而引用或是指针大小是固定的。涉及到引用就必须加上lifetime,然后它就变成了这样。

1
2
3
4
enum List<'a,T>{
Cons(T,&'a List<'a,T>),
Nil
}

栈上引用可能会出现悬挂指针的问题。

cannot return value referencing temporary value
returns a value referencing data owned by the current function

Box

A pointer type for heap allocation.
Box<T>, casually referred to as a ‘box’, provides the simplest form of heap allocation in Rust. Boxes provide ownership for this allocation, and drop their contents when they go out of scope.

Box是在堆上分配的。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
enum List<T> {
Cons(T, Box<List<T>>),
Nil,
}

impl List<u32> {
fn range(l: i32, h: i32) -> List<i32> {
if l < h {
List::Cons(l, Box::new(List::range(l + 1, h)))
} else {
List::Nil
}
}
}

迭代器

Iterator需要实现的方法签名:

1
fn next(&mut self) -> Option<Self::Item>;

对应不同类型的元素(Self::Item),但结构体本身还是必要,因为需要保存中间遍历的位置。

  • T -> IntoIter
  • &T -> Iter
  • &mut T -> IterMut

先从IntoIter开始,初步实现:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
struct IntoIter<T>(List<T>);

impl <T> Iterator for IntoIter<T>{
type Item = T;

fn next(&mut self) -> Option<Self::Item> {
match self.0 {
List::Cons(item,rest) => {
self.0 = *rest;
Some(item)
}
List::Nil => None
}
}
}
1
2
3
4
5
6
7
8
9
10
11
error[E0507]: cannot move out of `self.0.1` which is behind a mutable reference
--> src/main.rs:21:15
|
21 | match self.0 {
| ^^^^^^ help: consider borrowing here: `&self.0`
22 | List::Cons(item,rest) => {
| ----
| |
| data moved here
| move occurs because `rest` has type `std::boxed::Box<List<T>>`, which does not implement the `Copy` trait
^^^^ ^^^^

List,Box均没有实现Copy,所以出现了move,但是引用没有所有权。

std::mem::replace

https://doc.rust-lang.org/std/mem/fn.replace.html

1
2
pub fn replace<T>(dest: &mut T, src: T) -> T
// Moves src into the referenced dest, returning the previous dest value.Neither value is dropped.

先塞一个List::Nil进去把旧值换出来,这样就获得了旧值的所有权。Option的take方法就是replace的包装。

1
2
3
4
5
6
7
8
9
10
11
12
13
impl <T> Iterator for IntoIter<T>{
type Item = T;

fn next(&mut self) -> Option<Self::Item> {
match std::mem::replace(&mut self.0,List::Nil) {
List::Cons(item,rest) => {
self.0 = *rest;
Some(item)
}
List::Nil => None
}
}
}

Iter(也就是Item类型为&T)比较顺畅,因为有引用加上了生命周期,返回值和列表生命周期一致。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
struct Iter<'a,T>(&'a List<T>);

impl <'a,T> Iterator for Iter<'a,T>{
type Item = &'a T;

fn next(&mut self) -> Option<Self::Item> {
match self.0 {
List::Cons(item,rest) => {
self.0 = rest;
Some(item)
}
List::Nil => None
}
}
}

但是对于元素类型为&mut T的迭代器又有新的问题。(像素级拷贝上述代码,就不贴了)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
error[E0495]: cannot infer an appropriate lifetime for pattern due to conflicting requirements
--> src/main.rs:54:24
|
54 | List::Cons(item,rest) =>{
| ^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 52:5...
--> src/main.rs:52:5
|
52 | fn next(&mut self) -> Option<Self::Item> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...so that reference does not outlive borrowed content
--> src/main.rs:54:24
|
54 | List::Cons(item,rest) =>{
| ^^^^
note: but, the lifetime must be valid for the lifetime `'a` as defined on the impl at 49:6...
--> src/main.rs:49:6
|
49 | impl<'a,T> Iterator for IterMut<'a,T>{
| ^^
note: ...so that the types are compatible
--> src/main.rs:52:46
|
52 | fn next(&mut self) -> Option<Self::Item> {
| ______________________________________________^
53 | | match self.0 {
54 | | List::Cons(item,rest) =>{
55 | | self.0 = rest.as_mut();
... |
63 | | }
64 | | }
| |_____^
= note: expected `Iterator`
found `Iterator`

error[E0495]: cannot infer an appropriate lifetime for pattern due to conflicting requirements
--> src/main.rs:54:29
|
54 | List::Cons(item,rest) =>{
| ^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 52:5...
--> src/main.rs:52:5
|
52 | fn next(&mut self) -> Option<Self::Item> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...so that reference does not outlive borrowed content
--> src/main.rs:54:29
|
54 | List::Cons(item,rest) =>{
| ^^^^
note: but, the lifetime must be valid for the lifetime `'a` as defined on the impl at 49:6...
--> src/main.rs:49:6
|
49 | impl<'a,T> Iterator for IterMut<'a,T>{
| ^^
note: ...so that reference does not outlive borrowed content
--> src/main.rs:55:26
|
55 | self.0 = rest.as_mut();
| ^^^^^^^^^^^^^

这里self.0分裂成了两个不交叉的可变引用(这是合法的),但是后续self.0还依然持有self的可变引用,因此多个可变引用冲突。但报的是生命周期的错😅。

这一次需要替换的是引用,因此需要创建一个空指针,引用和指针但区别在于,引用是合法的,但指针可能是非法的,所以这里把引用替换出来,之后要注意把合法引用还回去。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
impl<'a,T> Iterator for IterMut<'a,T>{
type Item = &'a mut T;

fn next(&mut self) -> Option<Self::Item> {
let uninit = MaybeUninit::<&'a mut List<T>>::uninit();
match std::mem::replace(&mut self.0,unsafe{uninit.assume_init()}) {
List::Cons(item,rest) =>{
self.0 = rest.as_mut();
Some(item)
},
nil @ List::Nil => {
self.0 = nil;
None
}
}
}
}

不过既然用了unsafe,直接用std::ptr::read更简洁,而且这里读出来是引用,不用担心重复drop。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
impl<'a,T> Iterator for IterMut<'a,T>{
type Item = &'a mut T;

fn next(&mut self) -> Option<Self::Item> {
match unsafe{std::ptr::read(&mut self.0)} {
List::Cons(item,rest) =>{
self.0 = rest.as_mut();
Some(item)
},
nil @ List::Nil => {
self.0 = nil;
None
}
}
}
}

unsafe

unsafe警察出警!👮‍♀️
因为Rust的强语法限制,Rust的unsafe是非常值得了解的一块内容,而另一块是宏。unsafe也没有想象中那样简单,unsafe一套万事大吉。而前面两个例子比较简单但原因是涉及到的都是引用,不用担心忘记释放和重复释放的问题。

让我们重新回顾一下引用和所有权规则。

  1. 一个值同时可以有多个不可变引用
  2. 一个值同时可以有一个可变引用,可变引用存在的同时不能有不可变引用
  3. 引用的生命周期受生命周期限制

而生命周期就是持有值的形式参数或局部变量的作用域,因为被包含在作用域内,由所有者释放就可以了,与此同时也不能超出生命周期。
而可变引用不同的地方在于他在赋予新值的时候,会释放旧值。

现在给他加一些小的函数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
impl List<u32> {
fn range(l: u32, h: u32) -> List<u32> {
let mut r = List::Nil;
let mut t = h;
while l <= t {
r = Cons(t,Box::new(r));
t -= 1;
}
r
}
}

impl<A> List<A> {
fn into_iter(self) -> IntoIter<A> {
IntoIter(self)
}

fn iter(&self) -> Iter<A> {
Iter(self)
}

fn iter_mut(&mut self) -> IterMut<A> {
IterMut(self)
}
}

1
2
3
4
fn main() {
let x = List::range(1, 100_000_000).into_iter().fold(0, |_,a| a);
print!("{}", x)
}
1
2
3
    Finished dev [unoptimized + debuginfo] target(s) in 0.46s
Running `target/debug/temp`
100000000%

没有问题

1
2
3
4
fn main() {
let x = List::range(1, 100_000_000).iter().fold(0, |_,a| *a);
print!("{}", x)
}

💥Boom

1
2
3
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1] 15047 abort (core dumped) cargo run

调试一下
debug
递归drop爆栈,而第一段不爆是因为手动展开了

After drop is run, Rust will recursively try to drop all of the fields of self.
This is a convenience feature so that you don’t have to write “destructor boilerplate” to drop children. If a struct has no special logic for being dropped other than dropping its children, then it means Drop doesn’t need to be implemented at all!
There is no stable way to prevent this behavior in Rust 1.0.

如果手动实现Drop,但这里链表和节点是一体的。而对List实现了Drop之后无法通过模式匹配取值,只能取引用,这直接导致Drop实现异常复杂。(这是原博跳过的地方,而且原博既然提到函数式,没有理由用一个结构体来保存链表的头)

由于实现Drop之后无法move,这里用std::ptr::read。read是复制,因此是合法的。这里要十分注意,不然可能会出现重复释放。要么调用forget,不做清理,要么用std::ptr::write写回去,而write是直接覆盖,不会释放旧值。

另外,栈上的变量本身就不需要显式释放,函数调用完毕,栈帧收缩自动就释放了,所以这里核心是堆上的结构。

1
2
3
4
5
6
7
8
9
10
11
12
impl <T> Drop for List<T> {
fn drop(&mut self) {
unsafe{
let mut t = std::ptr::read(self);
while let Cons(_,xs) = &t {
let next = std::ptr::read(xs);
std::ptr::write(&mut t, *next);
}
std::ptr::write(self, t);
}
}
}

实现Drop之后IntoIter也需要修改:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
impl<T> Iterator for IntoIter<T> {
type Item = T;
fn next(&mut self) -> Option<Self::Item> {
match &self.0 {
Cons(item, rest) => {
unsafe{
let x = std::ptr::read(item);
let xs = std::ptr::read(rest);
std::ptr::write(&mut self.0, *xs);
Some(x)
}
}
Nil => None,
}
}
}