mongodb数据导出CSV文件

alt

要不是因为我,你能有今天?
要不是我伤害你,你能成长?

需求

产品需要分析达人文章的标签,需要把2019年1月1号到现在的标签,从mongo导出来,导出格式为csv,查询条件如下:

1
db.content_medium_hismatch.find({"content_type_id":"28","update_time":{"$gte":1546272000000},"is_delete":0}, {"content_id":1,"content_tags":1});

操作

使用MongoDB中的mongoexport命令

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
mongoexport --help
Usage:
mongoexport

Export data from MongoDB in CSV or JSON format.

See http://docs.mongodb.org/manual/reference/program/mongoexport/ for more information.

general options:
--help print usage
--version print the tool version and exit

verbosity options:
-v, --verbose= more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)
--quiet hide all log output

connection options:
-h, --host= mongodb host to connect to (setname/host1,host2 for replica sets)
--port= server port (can also use --host hostname:port)

ssl options:
--ssl connect to a mongod or mongos that has ssl enabled
--sslCAFile= the .pem file containing the root certificate chain from the certificate authority
--sslPEMKeyFile= the .pem file containing the certificate and key
--sslPEMKeyPassword= the password to decrypt the sslPEMKeyFile, if necessary
--sslCRLFile= the .pem file containing the certificate revocation list
--sslAllowInvalidCertificates bypass the validation for server certificates
--sslAllowInvalidHostnames bypass the validation for server name
--sslFIPSMode use FIPS mode of the installed openssl library

authentication options:
-u, --username= username for authentication
-p, --password= password for authentication
--authenticationDatabase= database that holds the user's credentials
--authenticationMechanism= authentication mechanism to use

namespace options:
-d, --db= database to use
-c, --collection= collection to use

output options:
-f, --fields=[,]* comma separated list of field names (required for exporting CSV) e.g. -f "name,age"
--fieldFile= file with field names - 1 per line
--type= the output format, either json or csv (defaults to 'json') (default: json)
-o, --out= output file; if not specified, stdout is used
--jsonArray output to a JSON array rather than one object per line
--pretty output JSON formatted to be human-readable
--noHeaderLine export CSV data without a list of field names at the first line

querying options:
-q, --query= query filter, as a JSON string, e.g., '{x:{$gt:1}}'
--queryFile= path to a file containing a query filter (JSON)
-k, --slaveOk allow secondary reads if available (default true) (default: false)
--readPreference=| specify either a preference name or a preference json object
--forceTableScan force a table scan (do not use $snapshot)
--skip= number of documents to skip
--limit= limit the number of documents to export
--sort= sort order, as a JSON string, e.g. '{x:1}'
--assertExists if specified, export fails if the collection does not exist (default: false)

操作命令如下:

1
/usr/local/mongodb/bin/mongoexport --port 29001 --host=localhost -user=*** --password=***** --authenticationDatabase=**** --db=db --collection=collection --query='{"content_type_id":"28","update_time":{"$gte":1546272000000},"is_delete":0}, {"content_id":1,"content_tags":1}' --type=csv --out=***.csv

1
2
2019-05-17T13:18:52.325+0800    error validating settings: query '[123 34 99 111 110 116 101 110 116 95 116 121 112 101 95 105 100 34 58 34 50 56 34 44 34 117 112 100 97 116 101 95 116 105 109 101 34 58 123 34 36 103 116 101 34 58 49 53 52 54 50 55 50 48 48 48 48 48 48 125 44 34 105 115 95 100 101 108 101 116 101 34 58 48 125 44 32 123 34 99 111 110 116 101 110 116 95 105 100 34 58 49 44 34 99 111 110 116 101 110 116 95 116 97 103 115 34 58 49 125]' is not valid JSON: invalid character ',' after top-level value
2019-05-17T13:18:52.325+0800 try 'mongoexport --help' for more information

初步判断是导出CSV文件中需要的逗号(,)分割。字段发生错误。语句添加–fields _id,content_id,content_tags

1
/usr/local/mongodb/bin/mongoexport --port 29001 --host=localhost -user=*** --password=***** --authenticationDatabase=**** --db=db --collection=collection --query='{"content_type_id":"28","update_time":{"$gte":1546272000000},"is_delete":0}, {"content_id":1,"content_tags":1}' --type=csv --fields _id,content_id,content_tags --out=***.csv

1
2
`2019-05-17T13:18:54.325+0800    error validating settings: query '[123 34 99 111 110 116 101 110 116 95 116 121 112 101 95 105 100 34 58 34 50 56 34 44 34 117 112 100 97 116 101 95 116 105 109 101 34 58 123 34 36 103 116 101 34 58 49 53 52 54 50 55 50 48 48 48 48 48 48 125 44 34 105 115 95 100 101 108 101 116 101 34 58 48 125 44 32 123 34 99 111 110 116 101 110 116 95 105 100 34 58 49 44 34 99 111 110 116 101 110 116 95 116 97 103 115 34 58 49 125]' is not valid JSON: invalid character ',' after top-level value
2019-05-17T13:18:54.325+0800 try 'mongoexport --help' for more information``bash

开始排查为什么还继续报错。使用 mongo shell 连接到mongodb中查询发现是能获取到数据的。去掉–query 条件之后再执行一次发现是没问题

1
/usr/local/mongodb/bin/mongoexport --port 29001 --host=localhost -user=*** --password=***** --authenticationDatabase=**** --db=db --collection=collection --type=csv --fields _id,content_id,content_tags --out=***.csv

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
2019-05-17T13:24:18.176+0800    connected to: localhost:29001
2019-05-17T13:24:19.167+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
2019-05-17T13:24:20.167+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
2019-05-17T13:24:21.168+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
2019-05-17T13:24:22.167+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
2019-05-17T13:24:23.168+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
2019-05-17T13:24:24.168+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
2019-05-17T13:24:25.167+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
2019-05-17T13:24:26.168+0800 [........................] feeds.content_medium_hismatch 0/138863565 (0.0%)
^C2019-05-17T13:24:26.900+0800 signal 'interrupt' received; forcefully terminating
public-ops-mongodb2 seclogin # ls
content_medium_hismatch.csv
public-ops-mongodb2 seclogin # cat content_medium_hismatch.csv |more
_id,content_id,content_tags
ObjectId(5b0f598eaeeb42d8582db599),1229,"[""奶瓶"",""吃奶"",""吸吮"",""乳头混淆"",""母乳""]"
ObjectId(5b0f598eaeeb42d8582db5c8),1290,"[""衣服"",""二手衣服"",""清洗"",""皮肤"",""细菌""]"
ObjectId(5b0f598eaeeb42d8582db5cd),1297,"[""身材"",""产后妈妈"",""新妈妈"",""裙子"",""衣服""]"
ObjectId(5b0f598eaeeb42d8582db5e1),1325,"[""护肤品"",""哺乳期"",""皮肤"",""健康"",""护肤""]"
ObjectId(5b0f598eaeeb42d8582db5e8),1343,"[""收纳"",""家具"",""奶粉"",""分享"",""三角形""]"
ObjectId(5b0f598eaeeb42d8582db618),1393,"[""背带"",""婴儿"",""带宝宝"",""出门"",""肌肉发育""]"
ObjectId(5b0f598eaeeb42d8582db65e),1465,"[""维生素"",""食物"",""骨骼"",""眼睛"",""帮助""]"
ObjectId(5b0f598eaeeb42d8582db660),1467,"[""晒太阳"",""紫外线"",""维生素"",""黄疸"",""眼睛""]"
ObjectId(5b0f598eaeeb42d8582db666),1472,"[""伞车"",""推车"",""婴儿伞车"",""婴儿推车"",""优点""]"
ObjectId(5b0f598eaeeb42d8582db669),1474,"[""食物"",""味觉发育"",""口味"",""食盐"",""饮食""]"
ObjectId(5b0f598eaeeb42d8582db674),1490,"[""患病"",""父母"",""指甲"",""健康"",""打呼噜""]"
ObjectId(5b0f598eaeeb42d8582db67a),1500,"[""睡眠"",""疾病"",""父母"",""症状"",""医院""]"
ObjectId(5b0f598eaeeb42d8582db68c),1513,"[""声音"",""奶睡"",""美国"",""摇晃"",""宝宝睡觉""]"
ObjectId(5b0f598eaeeb42d8582db6a0),1535,"[""安全"",""清洗"",""坐便器"",""马桶"",""优点""]"
ObjectId(5b0f598eaeeb42d8582db746),2732,"[""荔枝"",""进食"",""葡萄糖"",""上火"",""症状""]"
ObjectId(5b0f598eaeeb42d8582db749),2733,"[""宝宝喂养"",""食物"",""辅食"",""母乳"",""饮食""]"
ObjectId(5b0f598eaeeb42d8582db752),2742,"[""乳头"",""喂奶"",""乳头皲裂"",""哺乳"",""乳房""]"
ObjectId(5b0f598eaeeb42d8582db758),2745,"[""食物"",""挑食"",""偏食"",""爸爸"",""菠菜""]"
ObjectId(5b0f598eaeeb42d8582db75b),2746,"[""吃饭"",""爸爸"",""咀嚼"",""食物"",""时间""]"
ObjectId(5b0f598eaeeb42d8582db75f),2747,"[""辅食"",""小宝"",""核桃油"",""健康"",""亚麻酸""]"
ObjectId(5b0f598eaeeb42d8582db761),2748,"[""油炸"",""健康"",""食品"",""食物"",""油脂""]"
ObjectId(5b0f598eaeeb42d8582db768),2751,"[""辅食"",""膳食纤维"",""消化不良"",""食物"",""肠道""]"
ObjectId(5b0f598eaeeb42d8582db76d),2753,"[""吃饭"",""饮食习惯"",""帮助"",""习惯"",""爸爸""]"
ObjectId(5b0f598eaeeb42d8582db770),2754,"[""微波炉"",""辅食"",""营养"",""食物"",""维生素""]"
ObjectId(5b0f598eaeeb42d8582db773),2755,"[""葡萄糖"",""营养"",""习惯"",""家长"",""宝宝生病""]"
ObjectId(5b0f598eaeeb42d8582db775),2756,"[""祛湿"",""食物"",""帮助"",""食谱"",""红豆""]"
ObjectId(5b0f598eaeeb42d8582db778),2757,"[""水果"",""蔬菜"",""果汁"",""配方奶"",""母乳""]"
ObjectId(5b0f598eaeeb42d8582db77e),2759,"[""挑食"",""维生素"",""营养"",""宝宝挑食"",""缺锌""]"
ObjectId(5b0f598eaeeb42d8582db781),2760,"[""吃肉"",""健康"",""水果"",""饱和脂肪酸"",""便秘""]"
ObjectId(5b0f598eaeeb42d8582db784),2761,"[""水果"",""营养"",""食物"",""胡萝卜素"",""维生素""]"
ObjectId(5b0f598eaeeb42d8582db787),2762,"[""体重"",""宝宝瘦"",""营养不良"",""原因"",""父母""]"
ObjectId(5b0f598eaeeb42d8582db789),2763,"[""食物"",""营养"",""宝宝喂养"",""进食"",""时间""]"
ObjectId(5b0f598eaeeb42d8582db790),2765,"[""辅食"",""宝宝断奶"",""断奶"",""保存"",""冰箱""]"
ObjectId(5b0f598eaeeb42d8582db792),2766,"[""断奶"",""母乳"",""环境"",""夏季"",""宝宝生病""]"
ObjectId(5b0f598eaeeb42d8582db795),2767,"[""益生菌"",""药物"",""食品"",""平衡"",""抗生素""]"
ObjectId(5b0f598eaeeb42d8582db798),2768,"[""食物"",""咳嗽"",""宝宝咳嗽"",""饮食"",""辛辣""]"
ObjectId(5b0f598eaeeb42d8582db79b),2769,"[""辅食"",""饮食"",""断奶"",""宝宝断奶"",""保存""]"
ObjectId(5b0f598eaeeb42d8582db7a1),2773,"[""补钙"",""维生素"",""宝宝缺钙"",""母乳"",""配方奶""]"
ObjectId(5b0f598faeeb42d8582db7a6),2775,"[""果泥"",""水果"",""维生素"",""哈密瓜"",""香蕉""]"
ObjectId(5b0f598faeeb42d8582db7a9),2776,"[""饮食"",""食物"",""辅食"",""进食"",""水果""]"
ObjectId(5b0f598faeeb42d8582db7ac),2777,"[""有机蔬菜"",""蔬菜"",""农药"",""有机"",""农药残留""]"
ObjectId(5b0f598faeeb42d8582db7b4),2780,"[""断奶"",""乳头"",""乳房"",""乳腺炎"",""回奶""]"
ObjectId(5b0f598faeeb42d8582db7b8),2783,"[""叶黄素"",""太阳镜"",""胡萝卜"",""南瓜"",""芒果""]"

就能确定这么执行是没有问题。重点排查--query条件。根据之前报错

1
2
2019-05-17T13:25:56.341+0800    error validating settings: query '[123 99 111 110 116 101 110 116 95 116 121 112 101 95 105 100 58 50 56 44 117 112 100 97 116 101 95 116 105 109 101 58 123 36 103 116 101 58 49 53 52 54 50 55 50 48 48 48 48 48 48 125 44 105 115 95 100 101 108 101 116 101 58 48 125 44 123 99 111 110 116 101 110 116 95 105 100 58 49 44 99 111 110 116 101 110 116 95 116 97 103 115 58 49 125]' is not valid JSON: invalid character ',' after top-level value
2019-05-17T13:25:56.341+0800 try 'mongoexport --help' for more information

发现一个点invalid character ',' after top-level value中提示有逗号,的问题,排查一下--query条件发现有'{"content_type_id":"28","update_time":{"$gte":1546272000000},"is_delete":0}, {"content_id":1,"content_tags":1}'有一个逗号,

去掉逗号,和后面的逗号之后的条件发现执行成功

1
/usr/local/mongodb/bin/mongoexport --port 29001 --host=localhost -user=*** --password=***** --authenticationDatabase=**** --db=db --collection=collection --query='{"content_type_id":"28","update_time":{"$gte":1546272000000},"is_delete":0}' --type=csv --fields _id,content_id,content_tags --out=***.csv

1
2
3
4
5
6
7
8
9
2019-05-17T13:41:33.068+0800    connected to: localhost:29001
2019-05-17T13:41:34.067+0800 feeds.content_medium_hismatch 0
2019-05-17T13:41:35.067+0800 feeds.content_medium_hismatch 0
2019-05-17T13:41:36.067+0800 feeds.content_medium_hismatch 0
2019-05-17T13:41:36.656+0800 feeds.content_medium_hismatch 18870
2019-05-17T13:41:36.656+0800 exported 18870 records
public-ops-mongodb2 seclogin # cat content_medium_hismatch.csv |wc -l
18871
public-ops-mongodb2 seclogin #

和开发确认一下这个条件能否却掉,开发确认可以去掉。

总结

mongoexport在执行有条件的导出文件,--query 条件要写在一个花括号{}里面,如果要有两个花括号{}的条件,中间用逗号,分割,这样是不行的。

-------------本文结束感谢您的阅读-------------

本文标题:mongodb数据导出CSV文件

文章作者:Wang Jiemin

发布时间:2019年05月17日 - 13:05

最后更新:2019年05月17日 - 14:05

原始链接:https://jiemin.wang/2019/05/17/mongodb-to-csv/

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。

0%