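Scan query process
Step 1. HTable.getScanner()
- Close the scanner previously opened on the server, so that the server does not hold on to excess resources
  - client side: ScannerCallable.call() -> close(scannerId)
  - server side: HRegionServer.close(scannerId)
- Open a scanner on the target region, located by localStartKey
  - client side: ScannerCallable.call() -> openScanner(regionName, scan)
  - server side:
    - create a RegionScanner
    - add the scanner to the server's scanner map
    - create a Lease for the newly created scanner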
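
The server-side bookkeeping in Step 1 (register the scanner in a map, attach a lease, remove it on close) can be modeled roughly as follows. This is a minimal sketch, not HBase code: the class ScannerRegistrySketch, its methods, and the explicit clock parameter are hypothetical names chosen for illustration, and it assumes the lease exists so that abandoned scanners can be dropped after a timeout.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of server-side scanner bookkeeping (hypothetical, not HBase code). */
class ScannerRegistrySketch {
    // scannerId -> lease expiry time in ms; stands in for the scanner map plus its Lease
    private final Map<Long, Long> leaseExpiry = new HashMap<>();
    private final long leasePeriodMs;
    private long nextId = 1;

    ScannerRegistrySketch(long leasePeriodMs) {
        this.leasePeriodMs = leasePeriodMs;
    }

    /** Models openScanner(): register a new scanner and start its lease. */
    long openScanner(long nowMs) {
        long id = nextId++;
        leaseExpiry.put(id, nowMs + leasePeriodMs);
        return id;
    }

    /** Models close(scannerId): remove the scanner; false if it was not registered. */
    boolean close(long scannerId) {
        return leaseExpiry.remove(scannerId) != null;
    }

    /** Drops every scanner whose lease has run out and returns how many were dropped. */
    int expire(long nowMs) {
        int before = leaseExpiry.size();
        leaseExpiry.values().removeIf(expiry -> expiry <= nowMs);
        return before - leaseExpiry.size();
    }

    int openCount() {
        return leaseExpiry.size();
    }
}
```

Closing the previous scanner before opening a new one, as the client does in Step 1, keeps openCount() small without waiting for leases to expire.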
Step 2. ResultScanner.next()
- Fetch KVs from the client-side cache, or from the server when the cache is empty
  - client side: cache.poll() or next(scannerId, caching)
  - server side: HRegionServer.next(scannerId, nbRows)
    - RegionScannerImpl.nextRaw(List outResults, int limit, String metric)
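
The cache-or-fetch behavior of Step 2 can be illustrated with a small self-contained model. This is a sketch under assumptions, not the HBase client: CachingScannerSketch, fetchFromServer, and rpcCount are hypothetical names, and an in-memory list stands in for the region. It only shows how the caching value trades memory for fewer round trips.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified model of a client scanner with a row cache (hypothetical, not HBase code). */
class CachingScannerSketch {
    private final List<String> serverRows;            // stands in for the rows held by the region
    private final int caching;                        // rows requested per round trip
    private final Deque<String> cache = new ArrayDeque<>();
    private int serverPos = 0;
    int rpcCount = 0;                                 // number of simulated server calls

    CachingScannerSketch(List<String> serverRows, int caching) {
        this.serverRows = serverRows;
        this.caching = caching;
    }

    /** Models the server-side next(scannerId, nbRows): return up to nbRows further rows. */
    private List<String> fetchFromServer(int nbRows) {
        rpcCount++;
        int end = Math.min(serverPos + nbRows, serverRows.size());
        List<String> batch = new ArrayList<>(serverRows.subList(serverPos, end));
        serverPos = end;
        return batch;
    }

    /** Models ResultScanner.next(): serve from the cache, refilling it only when empty. */
    String next() {
        if (cache.isEmpty()) {
            cache.addAll(fetchFromServer(caching));
        }
        return cache.poll();                          // null once the scan is exhausted
    }
}
```

With five rows and caching set to 2, reading all five rows costs three simulated round trips instead of five.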
Types of scanners
- Server side: InternalScanner & KeyValueScanner
- Client side: ResultScanner
- Others (HFileScanner, MetaScanner)
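1. InternalScanner
- A higher-level scanner abstraction used internally on the server. Implementing classes:
  - RegionScannerImpl
  - StoreScanner
  - KeyValueHeap
- Interface methods include:
  - next(): returns a List of KeyValues
  - close(): closes the scanner and releases server-side resources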
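2. KeyValueScanner
- The low-level scanner used to fetch KeyValues. Implementing classes:
  - StoreScanner
  - StoreFileScanner
  - KeyValueHeap
  - NonLazyKeyValueScanner: always performs a real seek, i.e. doRealSeek: forward ? reseek(kv) : seek(kv)
    - MemStoreScanner
    - StoreScanner
    - KeyValueHeap
- Commonly used methods:
  - peek()
  - next()
  - seek(): position the scanner at the specified KeyValue
  - reseek(): position the scanner at the specified KeyValue, searching forward from its current position
  - requestSeek()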
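
The difference between seek() and reseek() is easiest to see on a toy scanner over a sorted list. The sketch below is an illustration only: ListScannerSketch is a hypothetical class, plain strings stand in for KeyValues, and "at the specified KeyValue" is assumed to mean "at the first key greater than or equal to the target".

```java
import java.util.List;

/** Toy scanner over a sorted key list (hypothetical, not the HBase KeyValueScanner). */
class ListScannerSketch {
    private final List<String> keys;   // must be sorted ascending
    private int pos = 0;

    ListScannerSketch(List<String> sortedKeys) {
        this.keys = sortedKeys;
    }

    /** peek(): the current key without advancing, or null when exhausted. */
    String peek() {
        return pos < keys.size() ? keys.get(pos) : null;
    }

    /** next(): return the current key and advance past it. */
    String next() {
        String k = peek();
        if (k != null) {
            pos++;
        }
        return k;
    }

    /** seek(): position at the first key >= target, searching from the beginning. */
    boolean seek(String target) {
        pos = 0;
        return reseek(target);
    }

    /** reseek(): like seek(), but only moves forward from the current position. */
    boolean reseek(String target) {
        while (pos < keys.size() && keys.get(pos).compareTo(target) < 0) {
            pos++;
        }
        return pos < keys.size();
    }
}
```

Because reseek() never moves backward, it is cheaper when the target is known to lie ahead, but calling it with a key behind the current position leaves the scanner where it is.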
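KeyValueHeap
- At the Region level it combines access to multiple stores; at the Store level it combines access to the memstore and the storefiles
- Scanners are kept in a PriorityQueue ordered by KVScannerComparator, which first compares the KVs returned by peek() and then the SequenceID:
  - MemStoreScanner = Long.MAX_VALUE
  - StoreFileScanner = SequenceID
  - StoreScanner = 0
- pollRealKV() looks in the PriorityQueue for a scanner on which a real seek can be done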
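
The ordering rule above (compare the peeked KV first, then the SequenceID) amounts to a k-way merge over sorted sources. The following is a simplified sketch, not the real KeyValueHeap: KeyValueHeapSketch and SourceScanner are hypothetical names, strings stand in for KeyValues, ties are assumed to go to the higher SequenceID so that newer data comes first, and lazy seeking (pollRealKV) is left out entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Simplified k-way merge in the spirit of KeyValueHeap (hypothetical, not HBase code). */
class KeyValueHeapSketch {
    /** One sorted source (memstore or store file) tagged with a sequence id. */
    static class SourceScanner {
        final long sequenceId;
        private final Iterator<String> it;
        private String current;

        SourceScanner(long sequenceId, List<String> sortedKeys) {
            this.sequenceId = sequenceId;
            this.it = sortedKeys.iterator();
            this.current = it.hasNext() ? it.next() : null;
        }

        String peek() {
            return current;
        }

        String next() {
            String k = current;
            current = it.hasNext() ? it.next() : null;
            return k;
        }
    }

    // Order by peeked key; on equal keys the higher sequence id (newer data) wins.
    static final Comparator<SourceScanner> ORDER = (a, b) -> {
        int c = a.peek().compareTo(b.peek());
        return c != 0 ? c : Long.compare(b.sequenceId, a.sequenceId);
    };

    /** Merges all sources into one sorted stream of "key@sequenceId" entries. */
    static List<String> merge(List<SourceScanner> scanners) {
        PriorityQueue<SourceScanner> heap = new PriorityQueue<>(ORDER);
        for (SourceScanner s : scanners) {
            if (s.peek() != null) {
                heap.add(s);               // exhausted scanners never enter the heap
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            SourceScanner top = heap.poll();
            out.add(top.next() + "@" + top.sequenceId);
            if (top.peek() != null) {
                heap.add(top);             // re-insert so it is re-ordered by its new peek()
            }
        }
        return out;
    }
}
```

Giving the memstore source Long.MAX_VALUE as its sequence id, as in the list above, makes it win every tie against store files.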
ScanQueryMatcher
- While KVs are being scanned, decides whether the current KV is included and what to do next
- StoreScanner.getScanners(matcher) -> StoreFileScanner
- The ten MatchCode states
  - INCLUDE
  - INCLUDE_AND_SEEK_NEXT_ROW : moreRowsMayExistAfter(), getKeyForNextRow()
  - INCLUDE_AND_SEEK_NEXT_COL : getKeyForNextColumn()
  - DONE
  - DONE_SCAN
  - SEEK_NEXT_ROW : moreRowsMayExistAfter()
  - SEEK_NEXT_COL : getKeyForNextColumn()
  - SKIP : heap.next()
  - SEEK_NEXT_USING_HINT : getNextKeyHint()
  - NEXT (unused): Do not include, jump to next StoreFile or memstore (in time order)
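- public MatchCode match(KeyValue kv)
  - check whether the KV belongs to the same row
  - check whether the version has expired
  - check whether it has been deleted
  - check whether it falls within the time range
  - apply Filters
  - check against the ColumnTracker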
- ColumnTracker
  - ScanWildcardColumnTracker
  - ExplicitColumnTracker
- DeleteTracker
  - ScanDeleteTracker
- Query strategies for deletes
  - retainDeletesInOutput
  - keepDeletedCells=true: no further delete checks are performed
  - seePastDeleteMarkers
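
The order of checks in match() can be sketched as a chain of early returns. This is a heavily simplified model, not the real ScanQueryMatcher: MatcherSketch and Cell are hypothetical classes, only five of the ten MatchCodes are used, the code returned by each step is an illustrative assumption, a delete marker is assumed to mask every older version of its column, and a plain set of wanted columns stands in for an ExplicitColumnTracker.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Simplified model of the match() decision chain (hypothetical, not HBase code). */
class MatcherSketch {
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL, SEEK_NEXT_ROW, DONE }

    /** Minimal stand-in for a KeyValue. */
    static class Cell {
        final String row;
        final String column;
        final long ts;
        final boolean delete;

        Cell(String row, String column, long ts, boolean delete) {
            this.row = row;
            this.column = column;
            this.ts = ts;
            this.delete = delete;
        }
    }

    private final String currentRow;
    private final long oldestUnexpiredTs;       // versions older than this have expired
    private final long minTs;                   // time range is [minTs, maxTs)
    private final long maxTs;
    private final Set<String> wantedColumns;    // stands in for an explicit column tracker
    private final Predicate<Cell> filter;       // stands in for a Filter
    private final Set<String> deletedColumns = new HashSet<>();  // stands in for a delete tracker

    MatcherSketch(String currentRow, long oldestUnexpiredTs, long minTs, long maxTs,
                  Set<String> wantedColumns, Predicate<Cell> filter) {
        this.currentRow = currentRow;
        this.oldestUnexpiredTs = oldestUnexpiredTs;
        this.minTs = minTs;
        this.maxTs = maxTs;
        this.wantedColumns = wantedColumns;
        this.filter = filter;
    }

    MatchCode match(Cell c) {
        // 1. Same row? A cell from another row ends matching for the current row.
        if (!c.row.equals(currentRow)) return MatchCode.DONE;
        // 2. Expired version? Older versions of this column are expired too.
        if (c.ts < oldestUnexpiredTs) return MatchCode.SEEK_NEXT_COL;
        // 3. Deleted? Record delete markers and hide the cells they mask.
        if (c.delete) {
            deletedColumns.add(c.column);
            return MatchCode.SKIP;
        }
        if (deletedColumns.contains(c.column)) return MatchCode.SKIP;
        // 4. Within the time range? Too new: skip; too old: move to the next column.
        if (c.ts >= maxTs) return MatchCode.SKIP;
        if (c.ts < minTs) return MatchCode.SEEK_NEXT_COL;
        // 5. Filter.
        if (!filter.test(c)) return MatchCode.SKIP;
        // 6. Column check.
        return wantedColumns.contains(c.column) ? MatchCode.INCLUDE : MatchCode.SEEK_NEXT_COL;
    }
}
```

The SEEK_* codes matter because they let the caller jump ahead with reseek() instead of stepping through every remaining version with next().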