How to Batch-Convert the Encoding of Text Files with Python

Published: 2023-03-26

  

Batch-converting text file encodings with Python

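  The goal is to convert the encoding of many text files at once, for example between ASCII, GB2312, and UTF-8. Ranked by the size of their character sets, UTF-8 > GB2312 > ASCII, so the safe direction is from GB2312 to UTF-8. Converting toward a smaller character set easily produces garbled text, because characters that the target set lacks cannot be represented.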
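A quick way to see why the direction matters is to check whether a string survives a round trip through each encoding. The helper name `round_trip_ok` is hypothetical, introduced only for this illustration:

```python
def round_trip_ok(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` and decoded back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no byte sequence for at least one character.
        return False


print(round_trip_ok("中文", "gb2312"))    # True
print(round_trip_ok("中文😀", "gb2312"))  # False: GB2312 has no code for the emoji
print(round_trip_ok("中文😀", "utf-8"))   # True

# Forcing the conversion with errors="replace" silently turns the emoji into "?".
print("中文😀".encode("gb2312", errors="replace").decode("gb2312"))  # 中文?
```

Going from GB2312 to UTF-8 never hits this problem, since every GB2312 character also exists in Unicode.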

  Main differences between GB2312 and UTF-8:

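  Character repertoire: UTF-8 > GB2312. UTF-8 can encode every Unicode character, while GB2312 covers 6,763 simplified Chinese characters plus 682 other symbols (Latin, Greek and Cyrillic letters, Japanese kana, punctuation).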

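  Storage size: UTF-8 > GB2312 for Chinese text. A common Chinese character takes 3 bytes in UTF-8 but 2 bytes in GB2312, while ASCII characters take 1 byte in both, so a file of mostly Chinese text is up to about 1.5 times larger in UTF-8 and slightly slower to load; GB2312 is more compact.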

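  Scope: GB2312 is a localized character set used mainly in China, whereas UTF-8 is an international encoding that covers the characters needed by virtually every country and is therefore far more widely usable. Text encoded in UTF-8 displays correctly in any browser that supports UTF-8, regardless of region.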
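The byte counts behind the size comparison can be checked directly; `encoded_sizes` is a hypothetical helper for illustration:

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in UTF-8 and in GB2312."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "gb2312")}


print(encoded_sizes("中文"))  # {'utf-8': 6, 'gb2312': 4}
print(encoded_sizes("abc"))   # {'utf-8': 3, 'gb2312': 3}
```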


import sys

import chardet  # third-party package: pip install chardet


def get_encoding_type(fileName):
    """Return chardet's guess for the encoding of a text file."""
    with open(fileName, "rb") as f:
        data = f.read()
    # e.g. {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}
    return chardet.detect(data)


def convert_encoding_type(filename_in, filename_out, encode_in="gb2312", encode_out="utf-8"):
    """Re-encode a text file from encode_in to encode_out.

    filename_in and filename_out may be the same file: the input is read
    completely and closed before the output is opened for writing.
    newline="" keeps the original line endings unchanged.
    """
    with open(filename_in, mode="r", encoding=encode_in, newline="") as fi:
        data = fi.read()
    with open(filename_out, mode="w", encoding=encode_out, newline="") as fo:
        fo.write(data)


if __name__ == "__main__":
    # sys.argv[1] names a text file containing one full file path per line
    # (the list file itself is assumed to be saved as UTF-8)
    filename_of_files = sys.argv[1]
    with open(filename_of_files, "r", encoding="utf-8") as f:
        lines = f.readlines()
    for line in lines:
        fileName = line.rstrip("\r\n")  # drop the line ending, including a Windows '\r'
        if not fileName:
            continue  # skip blank lines
        encoding_type = get_encoding_type(fileName)
        if encoding_type["encoding"] == "GB2312":
            print(encoding_type)
            # chardet often labels GBK text as GB2312; if decoding fails here,
            # encode_in="gb18030" (a superset of GB2312 and GBK) is more forgiving
            convert_encoding_type(fileName, fileName)  # convert in place
            print(fileName)
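The script above expects a list file with one full path per line. One way to produce that file is sketched below, assuming you want every file under a directory whose extension is in a given set; the function name `write_file_list` and the default suffixes are assumptions, not part of the original script:

```python
from pathlib import Path


def write_file_list(root, list_path, suffixes=(".c", ".h", ".txt")):
    """Write the absolute path of every file under `root` whose (lower-cased)
    suffix is in `suffixes` to `list_path`, one path per line; return the paths."""
    list_file = Path(list_path).resolve()
    paths = sorted(
        str(p.resolve())
        for p in Path(root).rglob("*")
        # skip directories, unwanted extensions, and the list file itself
        if p.is_file() and p.suffix.lower() in suffixes and p.resolve() != list_file
    )
    # newline="\n" writes plain '\n' line endings on every platform
    with open(list_file, "w", encoding="utf-8", newline="\n") as f:
        for path in paths:
            f.write(path + "\n")
    return paths
```

For example, `write_file_list("src", "files.txt")` collects the matching files under `src`, and `files.txt` can then be passed as the conversion script's first argument.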

  

  

  

Supplement: batch-converting files to UTF-8 with Python



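import os
from chardet import detect

xml_path = "./"  # assumption: convert every *.xml file directly inside this directory
for name in os.listdir(xml_path):
    path = os.path.join(xml_path, name)
    if name.endswith(".xml") and os.path.isfile(path):
        with open(path, "rb+") as fp:
            content = fp.read()
            codeType = detect(content)["encoding"] or "utf-8"  # detect() gives None when unsure
            content = content.decode(codeType, "ignore").encode("utf8")  # "ignore" drops undecodable bytes
            fp.seek(0)
            fp.write(content)
            fp.truncate()  # remove leftover bytes if the new content is shorter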
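Because the supplementary snippet targets XML files, note that an XML file usually declares its encoding in its first line, for example `<?xml version="1.0" encoding="GB2312"?>`. After the bytes are re-encoded to UTF-8, that declaration is stale and an XML parser may decode the file wrongly. A minimal sketch that rewrites the declared encoding with a regular expression follows; `fix_xml_declaration` is a hypothetical helper, and the pattern only handles a declaration at the very start of the file:

```python
import re

# Matches the encoding="..." (or encoding='...') pseudo-attribute of an XML
# declaration at the very start of the document.
_DECL_ENCODING = re.compile(rb"""^(<\?xml[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2""")


def fix_xml_declaration(content: bytes, new_encoding: bytes = b"UTF-8") -> bytes:
    """Replace the encoding named in a leading XML declaration with `new_encoding`.

    Content without such a declaration is returned unchanged.
    """
    return _DECL_ENCODING.sub(
        lambda m: m.group(1) + m.group(2) + new_encoding + m.group(2),
        content,
        count=1,
    )
```

In the loop it would be applied right after re-encoding, e.g. `content = fix_xml_declaration(content)` before `fp.seek(0)`.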

  

  
