# 4.数据存储

利用爬虫爬取数据之后，进行存储。可用以下几种方式进行存储。

## 文本存储

### TXT 文件存储

#### **实例**

利用open()方法打开一个文本文件，并获取一个文件操作的对象

```
import requests
from pyquery import PyQuery as pq

url = "https://www.zhihu.com/explore"
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
response = requests.get(url,headers=headers)
content = response.text
# pq解析
doc = pq(content)
items = doc('.explore-tab .feed-item').items()
for item in items:
    question = item('.question_link').text()
    author = item('.avatar-link').text()
    answer = pq(item.find('.content').html()).text()
    with open('explore.txt','a',encoding='utf-8') as f:
        f.write('\n'.join([question,author,answer]))
        f.write('\n'+'='*50+'\n')
```

#### **打开方式**

```
r:以只读方式打开文件
rb:以二进制格式打开文件用于只读
r+:以读写方式打开
rb+:以二进制方式打开并读写文件
w:以只写方式打开文件，如果文件已存在则会被覆盖，不存在则会新建一个文件
w+:以读写方式打开文件，如果文件已存在则会被覆盖，不存在则会新建一个文件
wb+:以二进制方式打开并读写文件，如果文件已存在则会被覆盖，不存在则会新建一个文件
a:打开文件向文本追加新内容，如果文件已存在则会被覆盖，不存在则会新建一个文件
a+:以二进制方式打开文件向文本追加新内容，如果文件已存在则会被覆盖，不存在则会新建一个文件
```

#### **写法**

有两种打开文件的方法

```
    with open('explore.txt','a',encoding='utf-8') as f:
        f.write('\n'.join([question,author,answer]))
        f.write('\n'+'='*50+'\n')
```

文件会自动关闭，不需要再去调用close()方法

另一种需要调用close()方法，关闭打开的文件

```
    f = open('explore.txt','a',encoding='utf-8')
    f.write('\n'.join([question,author,answer]))
    f.write('\n'+'='*50+'\n')
    f.close()
```

### JSON文件存储

#### **说明**

Json，全称为JavaScript Object Notation，即JavaScript对象标记。

json格式:

```
[{
    "name": "Bob",
    "gender": "male",
    "birthday": "1992-10-18"
}, {
     "name": "Selina",
    "gender": "female",
    "birthday": "1995-10-18"
}]
```

#### **取Json**

通过json库实现对json文件的读写操作，调用json库的loads()方法将Json文本字符串转为Json对象，可以通过dumps()方法将Json对象转为文本字符串

```
import json

content = """
            [{
                "name": "Bob",
                "gender": "male",
                "birthday": "1992-10-18"
            }, {
                "name": "Selina",
                "gender": "female",
                "birthday": "1995-10-18"
            }]
        """
print(type(content))
content = json.loads(content)
print(type(content))
print(content)
```

运行结果:

```
<class 'str'>
<class 'dict'>
[{'name': 'Bob', 'gender': 'male', 'birthday': '1992-10-18'}, {'name': 'Selina', 'gender': 'female', 'birthday': '1995-10-18'}]
```

注意: Json 的数据需要用双引号来包围，不能使用单引号，不然会抛出如下错误

```
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 3 column 5 (char 8)
```

从json文本中读取内容

```
import json

with open('test.json','r') as f:
    content = f.read()
    content = json.loads(content)
    print(content)
```

运行结果:

```
[{'name': 'Bob', 'gender': 'male', 'birthday': '1992-10-18'}, {'name': 'Selina', 'gender': 'female', 'birthday': '1995-10-18'}]
```

#### **输出json**

利用dumps()方法将json对象转化为字符串，然后使用write()方法写入文本中

```
import json

content = [{
        'name': 'Bob',
        'gender': 'male',
        'birthday': '1992-10-18'
}]
with open('test2.json','w') as f:
    f.write(json.dumps(content))
```

如果json中包含中文字符，为了保证输出正文，需要指定一个参数ensure\_ascii=False，另外还需要指定文件输出的编码

```
import json

content = [{
        'name': '天使',
        'gender': '女',
        'birthday': '1992-10-18'
}]
with open('test2.json','w',encoding='utf-8') as f:
    f.write(json.dumps(content,ensure_ascii=False,indent=2))
```

### CSV文件存储

#### **说明**

CSV，全称叫做Comma-Separated Values，即逗号分隔符或字符分隔符

#### **写入**

通过open()方法写入并创建一个csv文件，调用csv库的writer()方法初始化一个写入对象，传入该语柄，然后调用writerow()方法传入每行的数据即可完成写入

```
import csv

with open('data.csv', 'w',newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'name'])
    writer.writerow(['10001', '米库'])
```

注意:如果不带newline=‘’，会发现也能写入结果，但是每行内容之间总是会多出一个空行

如果想修改列与列之间的分隔符可以传入delimiter参数

```
import csv

with open('data.csv', 'w',newline='') as csvfile:
    writer = csv.writer(csvfile,delimiter='0')
    writer.writerow(['id', 'name'])
    writer.writerow(['10001', '米库'])
```

默认的分隔符逗号(,)会被换成delimeter参数设置的分隔符

可以同时传入多行，需要使用witerrows()方法

注意:是writerrows而不是writerow

```
import csv

with open('data.csv', 'w',newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'name'])
    writer.writerows([['10001', '米库'],['10001', '米库'],['10001', '米库']])
```

以字典的形式写入csv文件

```
import csv

with open('data.csv', 'w',newline='') as csvfile:
    # 预先定义字段
    fieldnames = ['id','name']
    # 初始化一个字典写入对象
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # 调用writeheader()方法先写入头信息
    writer.writeheader()
    # 调用writerow传入相应字典
    writer.writerow({'id':1,"name":"angle"})
```

追加csv文本内容

注意这里就不同调用writeheader()方法，不然会将头部信息一并写入

```
import csv

with open('data.csv', 'a',newline='') as csvfile:
    # 预先定义字段
    fieldnames = ['id','name']
    # 初始化一个字典写入对象
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # 调用writerow传入相应字典
    writer.writerow({'id':1,"name":"angle"})
```

若将中文写入csv文件中，需要指定编码格式

```
import csv

with open('data.csv', 'a',newline=''，encoding='utf-8') as csvfile:
    # 预先定义字段
    fieldnames = ['id','name']
    # 初始化一个字典写入对象
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # 调用writerow传入相应字典
    writer.writerow({'id':1,"name":"angle"})
```

利用pandas库写入csv文件

注意，字典的键值对中值的类型是列表形式

```
import pandas as pd

data = {
    'name':["angle"],
    "age":[18],
}
print(data)
df = pd.DataFrame(data)
# print(df)
df.to_csv('csv1.csv')
```

#### **读取**

```
import csv

with open('data.csv','r',encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for content in reader:
        print(content)
```

运行结果:

```
['id', 'name']
['1', 'angle']
['id', 'name']
['1', 'angle']
['id', 'name']
```

利用pandas中read\_csv()方法将数据从csv中读取出来

```
import csv
with open('data.csv', 'w',newline='',encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'name'])
    writer.writerows([['10001', '米库'],['10001', '米库'],['10001', '米库']])


# 用pandas读取csv文本内容，需要写入csv文本后时指定编码格式utf-8

import pandas as pd

print(pd.read_csv('data.csv'))
```

运行结果:

```
      id name
0  10001   米库
1  10001   米库
2  10001   米库
```

## 数据库

### 关系数据库

[Mysql](https://github.com/TheFoxFairy/StudyNotes/blob/main/编程笔记/Python笔记/3.数据库/Mysql.md)

### 非关系数据库

[MongoDB](https://github.com/TheFoxFairy/StudyNotes/blob/main/编程笔记/Python笔记/3.数据库/MongoDB.md)

[Redis](https://github.com/TheFoxFairy/StudyNotes/blob/main/编程笔记/Python笔记/3.数据库/redis.md)

### ORM模型

[SQLAlchemy](https://github.com/TheFoxFairy/StudyNotes/blob/main/编程笔记/Python笔记/3.数据库/sqlalchemy.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://luoran.gitbook.io/luo-ran-de-bi-ji-ben/python-bian-cheng/04python-pa-chong/4.-shu-ju-cun-chu.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
