我使用 Clickhouse
进行 JSON
数据清洗,代码如下:
SELECT *
FROM
(
WITH JSONExtract(
field,
'Tuple(transactions Nested(hash String, blockNumber String, value String, logs Nested(address String, data String, topics Array(String))))'
) AS parsed_json
SELECT
untuple(arrayJoin(tupleElement(parsed_json, 'transactions'))) as tx
FROM
jsonTemp jt
LIMIT 5
)
LEFT ARRAY JOIN `tx.4`;
查询结果如下:
我使用以下代码创建了表:
CREATE TABLE logsTemp
(
txHash String,
txblockNumber String,
value String,
txlogs Tuple
(
address String,
txlogsData String,
topics Array(String)
)
)
ENGINE = MergeTree
ORDER BY txblockNumber
使用以下代码执行插入操作:
INSERT INTO logsTemp
SELECT *
FROM
(
SELECT *
FROM
(
WITH JSONExtract(
field,
'Tuple(transactions Nested(hash String, blockNumber String, value String, logs Nested(address String, data String, topics Array(String))))'
) AS parsed_json
SELECT
untuple(arrayJoin(tupleElement(parsed_json, 'transactions'))) as tx
FROM
jsonTemp jt
)
LEFT ARRAY JOIN `tx.4`
)
结果发现插入后 txlogsData
缺失,如下:
此处使用 Clickhouse
处于单机部署状态,版本为 22.13.1.1942
1
wangshouh OP 想办法自己解决了,怀疑是 `tuple` 的问题,所以在插入时进行了 `untuple` 操作,修正后的代码如下:
```sql INSERT INTO logsTemp SELECT `tx.1`, `tx.2`, `tx.3`, untuple(`tx.4`) FROM ( SELECT * FROM ( WITH JSONExtract( field, 'Tuple(transactions Nested(hash String, blockNumber String, value String, logs Nested(address String, data String, topics Array(String))))' ) AS parsed_json SELECT untuple(arrayJoin(tupleElement(parsed_json, 'transactions'))) as tx FROM jsonTemp jt ) LEFT ARRAY JOIN `tx.4` ) ``` 但仍不明白为什么 tuple 会导致数据列丢失 ![]( ) |