Pitfalls you may hit with reverse scans in HBase

hbase


2020/11/08

Preface

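Depending on business needs, an HBase rowkey can be designed in many different formats. Because requirements are complex and change often, beyond the basic rowkey design principles the fields are usually ordered by how often and how selectively they are queried, then concatenated into the rowkey. In recent development I designed an operation-log table whose rowkey is: operator ID + timestamp + module. This makes it fast to look up one person's operation logs, but during testing we hit a problem where query results were missing data.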

Main text

First, I went to HBase and ran a scan directly in the shell:

scan 'user_log';

The result shows that HBase itself has not lost any data:

[Figure: scan result]

The scan output also shows that rows come back sorted by timestamp in ascending order. Tracing the code, the scan's start key and stop key were built like this:

// The rowkey sorts in ascending time order, but the business needs newest-first results,
// so use a reversed scan: endTime builds the startKey and startTime builds the stopKey.
scan.setReversed(true);
startKey = generateUserLogScanRowKey(projectId, condition.getEndTime());
stopKey = generateUserLogScanRowKey(projectId, condition.getStartTime());



// generateUserLogScanRowKey
public static byte[] generateUserLogScanRowKey(String projectId, long timestamp) throws Exception {
    if (StringUtils.isEmpty(projectId) || timestamp == 0L) {
        throw new Exception("projectId and timestamp must not be empty");
    }

    // Salt byte (intended as 3 partitions in total; note that % 9 actually yields values 0..8)
    int salt = Math.abs(projectId.hashCode() % 9);

    byte[] projectBytes = BizByteUtil.string2Bytes(projectId);
    int projectLength = projectBytes.length;
    byte[] timeBytes = HbaseResourceUtils.serializeResourceTimestamp(timestamp);
    int timeLength = timeBytes.length;

    // Layout: [salt][projectId][separator][timestamp][separator], with no module part
    byte[] rowKey = new byte[projectLength + timeLength + 3];

    rowKey[0] = (byte) salt;
    System.arraycopy(projectBytes, 0, rowKey, 1, projectLength);
    rowKey[projectLength + 1] = ROW_KEY_INVISIBLE_SEPARATOR_CHARACTER;
    System.arraycopy(timeBytes, 0, rowKey, projectLength + 2, timeLength);
    rowKey[projectLength + timeLength + 2] = ROW_KEY_INVISIBLE_SEPARATOR_CHARACTER;

    return rowKey;
}
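To see why this layout matters, the sketch below rebuilds it outside HBase. It is a minimal model under stated assumptions: the separator value (0x1F), the UTF-8 project bytes and the 8-byte big-endian timestamp are stand-ins, because ROW_KEY_INVISIBLE_SEPARATOR_CHARACTER, BizByteUtil.string2Bytes and HbaseResourceUtils.serializeResourceTimestamp are not shown in the post; the class and method names are hypothetical. compareUnsigned mimics HBase's row order: bytes compared as unsigned values, left to right, with a key that is a prefix of a longer key sorting first. Under these assumptions the encoding also explains why the shell output is in time order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the user_log rowkey layout; not the post's real utilities.
class UserLogKeys {
    // ASSUMPTION: stand-in value for ROW_KEY_INVISIBLE_SEPARATOR_CHARACTER.
    static final byte SEP = 0x1F;

    // Scan key, shaped like generateUserLogScanRowKey's output:
    // [salt][projectId][SEP][timestamp][SEP]
    static byte[] scanKey(String projectId, long timestamp) {
        byte[] project = projectId.getBytes(StandardCharsets.UTF_8);
        byte salt = (byte) Math.abs(projectId.hashCode() % 9);
        return ByteBuffer.allocate(1 + project.length + 1 + Long.BYTES + 1)
                .put(salt).put(project).put(SEP)
                .putLong(timestamp) // ASSUMPTION: 8-byte big-endian timestamp
                .put(SEP)
                .array();
    }

    // Stored rowkey: the scan key followed by the module part.
    static byte[] rowKey(String projectId, long timestamp, String module) {
        byte[] prefix = scanKey(projectId, timestamp);
        byte[] mod = module.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + mod.length).put(prefix).put(mod).array();
    }

    // Unsigned, left-to-right comparison (the order HBase uses for rowkeys).
    // If one key is a prefix of the other, the shorter key sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] scanKey = scanKey("p1", 1000L);
        // A row written at exactly the same timestamp sorts AFTER the scan key...
        System.out.println(compareUnsigned(rowKey("p1", 1000L, "login"), scanKey) > 0); // true
        // ...while a row written earlier sorts before it.
        System.out.println(compareUnsigned(rowKey("p1", 999L, "login"), scanKey) < 0); // true
    }
}
```

The first comparison is the crux: the scan key is a strict prefix of every row written at that timestamp, so those rows sort after it.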

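Analysis revealed the cause: the key built in code contains only the operator ID and the timestamp, while the real rowkey has three parts, so the scan key is one part short. HBase compares rowkeys byte by byte, and a key that is a prefix of a longer key sorts before it. A reversed scan starts at startKey and moves toward smaller keys, so the rows whose serialized timestamp equals that of endTime (their keys extend startKey with the module part and therefore sort after it) are never reached, and data is guaranteed to be missed. A simple diagram illustrates this: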

[Figure: scan result]
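The situation in the diagram can also be reproduced in a few lines. This is an illustration, not HBase code: it uses readable ASCII keys of the form operatorId/timestamp/module, matching the post's 1/103 example, and a TreeSet stands in for the table. For ASCII strings, String.compareTo orders keys the same way HBase's unsigned byte comparison does. reversedScan mimics a reversed scan with an inclusive start row and an exclusive stop row; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model: a TreeSet of ASCII rowkeys stands in for the user_log table.
class ReversedScanDemo {

    // Reversed scan: begin at startKey (inclusive) and walk toward smaller keys,
    // stopping before stopKey (exclusive). Requires startKey >= stopKey.
    static List<String> reversedScan(NavigableSet<String> table, String startKey, String stopKey) {
        return new ArrayList<>(table.subSet(stopKey, false, startKey, true).descendingSet());
    }

    static NavigableSet<String> sampleTable() {
        NavigableSet<String> table = new TreeSet<>();
        table.add("1/101/login");
        table.add("1/102/edit");
        table.add("1/103/logout"); // written at endTime = 103
        return table;
    }

    public static void main(String[] args) {
        // Keys built without the module part, like generateUserLogScanRowKey:
        // startKey from endTime (103), stopKey from startTime (101).
        List<String> rows = reversedScan(sampleTable(), "1/103", "1/101");
        System.out.println(rows); // prints [1/102/edit, 1/101/login]; "1/103/logout" is missing
    }
}
```

Note that the row at startTime (1/101/login) is still returned: it sorts after the shorter stop key, so only the endTime side of the range loses data.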

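How can this be fixed? One approach: enlarge the last byte of startKey, e.g. turn 1/103 into 1/104, then check each scanned row and discard any that fall outside the query range.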
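The enlargement step on its own, as a minimal sketch: bump the last byte of the key so the new start key sorts after every rowkey that begins with the original one. KeyEnlarger.enlarge is a hypothetical helper that returns a copy instead of modifying its argument. Because HBase compares bytes as unsigned values, it skips bytes equal to 0xFF (which would wrap around to 0x00) and bumps the byte before them.

```java
import java.nio.charset.StandardCharsets;

class KeyEnlarger {

    // Returns a copy of key with its last non-0xFF byte incremented.
    // Any rowkey that starts with the original key sorts before the result.
    static byte[] enlarge(byte[] key) {
        byte[] result = key.clone();
        for (int i = result.length - 1; i >= 0; i--) {
            // Bytes compare as unsigned, so 0xFF is the only value that cannot be incremented.
            if (result[i] != (byte) 0xFF) {
                result[i]++;
                return result;
            }
        }
        throw new IllegalArgumentException("no byte can be incremented");
    }

    public static void main(String[] args) {
        byte[] start = "1/103".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(enlarge(start), StandardCharsets.US_ASCII)); // prints 1/104
    }
}
```

Incrementing 0x7F is safe here: in Java it becomes -128, which is 0x80 and therefore larger when compared unsigned.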

The implementation is as follows:

// startKey has been enlarged, so every rowkey HBase returns must be checked against the query condition
biggerScanRowKey(startKey);

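// biggerScanRowKey: increment the last byte that can still be incremented
public static void biggerScanRowKey(byte[] rowKey) {
    for (int i = rowKey.length - 1; i >= 0; --i) {
        // HBase compares bytes as unsigned, so 0xFF (not Byte.MAX_VALUE) is the value that would wrap to 0x00
        if (rowKey[i] != (byte) 0xFF) {
            rowKey[i]++;
            // One byte enlarged; stop here
            break;
        }
    }
}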
    
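// Process the results
Result result = resultScanner.next();
while (null != result) {
    // startKey was enlarged: compare the real rowkey with the original search rowkey;
    // if it is larger, it is outside the range, so skip it and keep scanning
    if (shouldValidateBigger && !ResourceManagerImpl.validateRowKey(result.getRow(), scanner.getBiggerBytesOriginal())) {
        result = resultScanner.next();
        continue;
    }

    // Otherwise: the start of the valid data has been found; do the business processing here.
    // ...

    // Advance the scanner, otherwise the loop never terminates
    result = resultScanner.next();
}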
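Both steps together, in the same kind of toy model (ASCII keys, a TreeSet as the table, hypothetical names): scan in reverse from the enlarged start key, then drop any row that is only in range because of the enlargement. The post does not show ResourceManagerImpl.validateRowKey, so withinUpperBound is an assumption about what it has to do: compare only the leading part of the row, as long as the original startKey. A plain "row <= startKey" check would discard exactly the endTime rows that the enlargement was meant to recover, since those rows extend startKey and sort after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class EnlargedReversedScanDemo {

    // ASCII-only stand-in for biggerScanRowKey: bump the last character.
    // ASSUMPTION: the last character is not the maximum char value.
    static String enlarge(String key) {
        char last = key.charAt(key.length() - 1);
        return key.substring(0, key.length() - 1) + (char) (last + 1);
    }

    // Hypothetical validation: compare only the leading part of the row (as long as
    // the original startKey), so rows that merely extend startKey are kept.
    static boolean withinUpperBound(String row, String originalStartKey) {
        String head = row.length() > originalStartKey.length()
                ? row.substring(0, originalStartKey.length())
                : row;
        return head.compareTo(originalStartKey) <= 0;
    }

    // Reversed scan from the enlarged start key (inclusive) down to stopKey (exclusive),
    // skipping rows that are only in range because of the enlargement.
    static List<String> scanNewestFirst(NavigableSet<String> table, String startKey, String stopKey) {
        List<String> rows = new ArrayList<>();
        for (String row : table.subSet(stopKey, false, enlarge(startKey), true).descendingSet()) {
            if (!withinUpperBound(row, startKey)) {
                continue; // larger than the real query range: keep scanning
            }
            rows.add(row); // business processing would happen here
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableSet<String> table = new TreeSet<>();
        table.add("1/101/login");
        table.add("1/102/edit");
        table.add("1/103/logout");
        table.add("1/104/view"); // newer than endTime, must not be returned
        System.out.println(scanNewestFirst(table, "1/103", "1/101"));
        // prints [1/103/logout, 1/102/edit, 1/101/login]
    }
}
```

In this model the only extra rows the enlarged range can pull in are those whose key equals the enlarged key itself, so the check is mostly a safeguard; it still matters because the scan start row is inclusive.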