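I have run into annoying CRLF / LF conflicts in a file tracked by git, probably because it was committed from a Windows machine. Is there a cross-platform way (preferably in Python) to detect which newline type is dominant in a file?
I have the following code (based on the idea from https://stackoverflow.com/a/10562258/239247):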
import sys

if not sys.argv[1:]:
    sys.exit('usage: %s <filename>' % sys.argv[0])

# Read the raw bytes so Python does not translate newlines.
with open(sys.argv[1], "rb") as f:
    d = f.read()

# The file is read as bytes, so the patterns must be bytes literals too.
crlf, lfcr = d.count(b'\r\n'), d.count(b'\n\r')
cr, lf = d.count(b'\r'), d.count(b'\n')
print('crlf: %s' % crlf)
print('lfcr: %s' % lfcr)
print('cr: %s' % cr)
print('lf: %s' % lf)
print('\ncr-crlf-lfcr: %s' % (cr - crlf - lfcr))
print('lf-crlf-lfcr: %s' % (lf - crlf - lfcr))
print('\ntotal (lf+cr-2*crlf-2*lfcr): %s\n' % (lf + cr - 2*crlf - 2*lfcr))
But this gives wrong statistics (for this file):
crlf: 1123
lfcr: 58
cr: 1123
lf: 1123
cr-crlf-lfcr: -58
lf-crlf-lfcr: -58
total (lf+cr-2*crlf-2*lfcr): -116
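The negative numbers above are consistent with overlapping matches: whenever two CRLF endings sit next to each other (an empty line), the bytes b'\r\n\r\n' also contain b'\n\r', so the same CR and LF get subtracted twice. A minimal sketch on made-up data (the `data` value is an illustration, not taken from the file in question):

```python
# Toy data: three CRLF-terminated lines, the second one empty.
data = b"first\r\n\r\nthird\r\n"

crlf = data.count(b"\r\n")  # every line ending
lfcr = data.count(b"\n\r")  # found inside the b"\r\n\r\n" run
cr = data.count(b"\r")
lf = data.count(b"\n")

print(crlf, lfcr, cr, lf)   # 3 1 3 3
print(cr - crlf - lfcr)     # -1
```

Under that reading, the lfcr count of 58 would simply be the number of places where one CRLF directly follows another.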
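import sys


def calculate_line_endings(path):
    # order matters! Two-byte endings are tested first so that a CRLF
    # line is not counted as a bare LF.
    endings = [
        b'\r\n',
        b'\n\r',
        b'\n',
        b'\r',
    ]
    counts = dict.fromkeys(endings, 0)

    # Binary mode: no newline translation; iteration splits on b'\n' only.
    with open(path, 'rb') as fp:
        for line in fp:
            for x in endings:
                if line.endswith(x):
                    counts[x] += 1
                    break
    print(counts)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        calculate_line_endings(sys.argv[1])
    else:
        # Show the usage message only when the argument is missing.
        sys.exit('usage: %s <filepath>' % sys.argv[0])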
For your file this gives:
crlf: 1123
lfcr: 0
cr: 0
lf: 0
Is that enough?
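If the goal is a single "dominant" ending rather than a table of counts, one possible approach is to tokenize the bytes with a regular expression and take the most common match. This is only a sketch: the name `dominant_line_ending` is hypothetical, and it assumes the rare LFCR ordering can be ignored, so b'\n\r' is counted as an LF followed by a CR.

```python
import re
from collections import Counter

# b"\r\n" is listed first so a CRLF is never split into b"\r" + b"\n".
_EOL = re.compile(rb"\r\n|\r|\n")


def dominant_line_ending(data):
    """Return (most common ending or None, Counter of all endings)."""
    counts = Counter(_EOL.findall(data))
    if not counts:
        return None, counts
    return counts.most_common(1)[0][0], counts


# Made-up sample: three CRLF endings (one empty line) and one bare LF.
sample = b"one\r\n\r\ntwo\r\nthree\n"
ending, counts = dominant_line_ending(sample)
print(ending, dict(counts))
```

To use it on a real file, read the file in binary mode (open(path, 'rb')) and pass the resulting bytes to the function. Because the regex scans left to right without overlapping, adjacent CRLF endings are counted once each.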