find_each_with_order

上一篇blog,分析了一下find_each的源码,后来又查了一下解决的办法,这里记录一下找到的几种可以代替find_each的方法。

第一种 先排好序,获取到已经排好序的ids数组,然后对数组分组执行

1
2
3
4
5
6
7
batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace .order(:created_at) with your own scope
ids.each_slice(batch_size) do |chunk|
Thing.find(chunk, :order => "field(id, #{chunk.join(',')})").each do |thing|
# Do things with thing
end
end

第二种 通过指定好数组区间,然后分组获取,原理和方法一相同,代码如下

1
2
3
4
5
total_records = 50000
batch = 1000
(0..(total_records - batch)).step(batch) do |i|
puts Thing.active.order("created_at DESC").offset(i).limit(batch).to_sql
end

第三种 根据find_in_batches,写了一个新的方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
module ActiveRecord
module FindInBatchesWithOrder
def find_in_batches_with_order(options = {})
relation = self
# we have to be explicit about the options to ensure proper ordering and retrieval
direction = options.delete(:direction) || (arel.orders.first.try(:ascending?) ? :asc : nil) || (arel.orders.first.try(:descending?) ? :desc : nil) || :desc
start = options.delete(:start)
batch_size = options.delete(:batch_size) || 1000
# try to deduct the property_key, but safer to specificy directly
property_key = options.delete(:property_key) || arel.orders.first.try(:value).try(:name)
sanitized_key = ActiveRecord::Base.connection.quote_column_name(property_key)
relation = relation.limit(batch_size)
# in strictmode, we return records with same values as the last record of the last batch
strict_mode = options.delete(:strict_mode) || true
records = start ? (direction == :desc ? relation.where("#{sanitized_key} <= ?", start).to_a : relation.where("#{sanitized_key} >= ?", start).to_a) : relation.to_a
while records.any?
records_size = records.size
yield records
break if records_size < batch_size
start = records.last.try(property_key)
records = strict_mode ? (direction == :desc ? relation.where("#{sanitized_key} <= ?", start).to_a : relation.where("#{sanitized_key} >= ?", start).to_a) : (direction == :desc ? relation.where("#{sanitized_key} < ?", start).to_a : relation.where("#{sanitized_key} > ?", start).to_a)
end
end
# note that in strict mode we might itereate perpetually if the overlap in values is too high in relation to the batch size
def find_each_with_order(options = {})
last_record = nil
find_in_batches_with_order(options) do |records|
records.each do |record|
# we need to find the last record of the previous batch
next if last_record and (record != last_record)
if last_record
last_record = nil
next
end
yield record
end
last_record = records.last
end
end
end
class Relation
include FindInBatchesWithOrder
end
end

可以通过find_each_with_order来调用,源代码在这里,源代码不能通过order(‘id desc’)来排序,我在原有的基础上修改了一下代码,在这里

参考链接
https://github.com/nambrot/ar-find-in-batches-with-order
https://github.com/nambrot/ar-find-in-batches-with-order/blob/master/lib/ar-find-in-batches-with-order.rb