Python

conda

  • https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf

Exit the base environment

# run this in an interactive shell
# enable environment called "base", the default env from conda
conda activate base
# deactivate an environment
conda deactivate

Prevent the base environment from activating automatically

conda config --set auto_activate_base false

By default, conda always activates the base environment when a shell starts, which is annoying.

Installing packages

When installing, use a convenient China-based PyPI mirror (here the Tsinghua University one):

pip install scrapy -i https://pypi.tuna.tsinghua.edu.cn/simple

Python syntax

Array: common usage

Ref:

  • https://numpy.org/doc/stable/reference/generated/numpy.zeros.html#numpy.zeros

Array initialization:

import numpy as np

A = np.zeros(5) # create 1D array with 5 elements, all zeros
B = np.zeros((2,3), dtype=int) # 2D array. Other common types: np.int8, np.float64 (np.float was removed in NumPy 1.24)

# Here C's shape is the same as A's, which is (5,). NOTE that this is NOT equal to (5,1) or (1,5), which are 2D matrices instead of 1D arrays.
C = np.array([1,2,3,4,5])
D = np.zeros((5,1)) # 2D matrix with shape (5,1), not (5,)

# Note that a list is different from a numpy array.
E = list(range(0, 52, 1)) # a list with 52 elements, from 0 to 51
E_arr = np.asarray(E) # convert list to numpy 1D array
print(len(E)) # will print 52. NOTE that a list doesn't have a '.shape' attribute
print(E_arr.shape) # will print (52,), since E_arr is a 1D array

Similarly, np.ones() works the same way, and np.full() is analogous but also takes the fill value, e.g. np.full((2,3), 7).
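A quick sketch of these related constructors; the shapes and the fill value 7 are arbitrary examples, and np.zeros_like is an extra helper not mentioned above.

```python
import numpy as np

ones = np.ones((2, 3))          # 2x3 array of 1.0 (default dtype float64)
sevens = np.full((2, 3), 7)     # np.full needs the fill value as its second argument
like = np.zeros_like(sevens)    # same shape and dtype as `sevens`, filled with zeros

print(ones)
print(sevens)
print(like.shape)               # (2, 3)
```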

List indexing

This link summarizes the most common list indexing (slicing) usage: https://stackoverflow.com/questions/509211/understanding-slice-notation

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

A Python list can be thought of as a 1D array, but note that it is NOT a numpy matrix with a single row or column. The two are entirely different types.

a[start:stop]  # items start through stop-1
a[start:]      # items start through the rest of the array
a[:stop]       # items from the beginning through stop-1
a[:]           # a copy of the whole array
a[start:stop:step] # start through not past stop, by step

a[-1]    # last item in the array
a[-2:]   # last two items in the array
a[:-2]   # everything except the last two items
a[::-1]    # all items in the array, reversed
a[1::-1]   # the first two items, reversed
a[:-3:-1]  # the last two items, reversed
a[-3::-1]  # everything except the last two items, reversed
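To check the slicing rules above against the ASCII diagram, here is a small sketch that uses the letters of 'Python' as the list:

```python
a = list('Python')  # ['P', 'y', 't', 'h', 'o', 'n'], same indices as the diagram

print(a[1:4])     # ['y', 't', 'h']
print(a[-2:])     # ['o', 'n']
print(a[::-1])    # ['n', 'o', 'h', 't', 'y', 'P']
print(a[1::-1])   # ['y', 'P']
print(a[:-3:-1])  # ['n', 'o']
print(a[-3::-1])  # ['h', 't', 'y', 'P']

b = a[:]          # shallow copy: changing b does not change a
b[0] = 'J'
print(a[0], b[0]) # P J
```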

Array indexing

Here is a comparison of list, numpy array, and numpy matrix:

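  • A list is built into Python and feels somewhat like a C++ vector. Its size can only be obtained with len(arr); unlike numpy objects, it has no .shape attribute;
  • A numpy array and a numpy matrix have the same type, numpy.ndarray, and both have a .shape attribute for checking their size. (Here "matrix" means a 2D ndarray, not the separate numpy.matrix class.) However, a 1D numpy array A has A.shape == (N,), i.e. the second dimension is empty, which marks it as a 1D array. An Nx1 or 1xN numpy matrix B has B.shape equal to (N, 1) or (1, N), which marks it as a 2D matrix even though one of its dimensions is 1.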

The example below also shows how to convert between them.

import numpy as np
a = np.eye(4, dtype=float) # 4x4 identity matrix 
b = np.zeros([3,3], dtype=float) # 3x3 matrix with all 0s
c = np.ones([2,4], dtype=float) # 2x4 matrix with all 1s

a[0:3, 0:3] = b   # block assignment (NOTE that the shapes on both sides must match)

# reshape matrix to another shape 4x2 (using row-major on original matrix by default)
d = c.reshape(4,2) 
# reshape matrix using column-major
e = c.reshape(4,2, order='F')

# Use -1 to reshape the matrix into a 1D array. NOTE that the result is a numpy array, NOT a python list. And although a 1D numpy array has the same type as a numpy matrix (numpy.ndarray), they differ in shape: a 1D array's shape is (N,), while an Nx1 or 1xN matrix's shape is (N,1) or (1,N).
f = c.reshape(-1)

# Reshape also supports a different order. Here 'tolist()' converts the numpy array into a python list.
g_list = c.reshape(-1, order='F').tolist()

# Convert list to numpy array
g_arr = np.array(g_list)
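Mixing up (N,) and (N,1) shapes is an easy mistake, so here is a small sketch of converting between a 1D array, a column, and a row, plus the broadcasting surprise that a mix-up can cause:

```python
import numpy as np

v = np.arange(3)          # 1D array, shape (3,)
col = v.reshape(-1, 1)    # 2D column, shape (3, 1)
row = v[np.newaxis, :]    # 2D row, shape (1, 3)
back = col.ravel()        # back to 1D, shape (3,)

print(v.shape, col.shape, row.shape, back.shape)  # (3,) (3, 1) (1, 3) (3,)

# Adding a (3, 1) column to a (3,) array broadcasts to a (3, 3) result,
# which is usually not what was intended.
print((col + v).shape)    # (3, 3)
```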

Getting basename and dirname

https://stackoverflow.com/questions/22272003/what-is-the-difference-between-os-path-basename-and-os-path-dirname

import os

path = '/foo/bar/item.jpg'
dir_str = os.path.dirname(path) # dir_str = '/foo/bar'
base_str = os.path.basename(path) # base_str = 'item.jpg'
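A short sketch of two related points: os.path.split returns both parts at once, and a trailing slash makes the basename empty (a common gotcha when directory paths end with '/'):

```python
import os

path = '/foo/bar/item.jpg'
head, tail = os.path.split(path)   # equivalent to (dirname, basename)
print(head, tail)                  # /foo/bar item.jpg

# With a trailing slash, the "last component" is empty
print(repr(os.path.basename('/foo/bar/')))  # ''
print(os.path.dirname('/foo/bar/'))         # /foo/bar
```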

Split text: getting the prefix and suffix of a path string

Note the difference from basename and dirname above.

import os

filepath = '/dev/abc.jpg'
# base is the path without its extension and suf is the extension; here base = '/dev/abc', suf = '.jpg'
base, suf = os.path.splitext(filepath)
# or use base = os.path.splitext(filepath)[0] to get only the prefix (or [1] to get only the suffix)
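Combining the two sections, a sketch for getting just the file name without its extension. The example paths are made up; note that splitext only splits off the last extension and does not treat a leading dot as one:

```python
import os

filepath = '/dev/abc.tar.gz'
name = os.path.basename(filepath)       # 'abc.tar.gz'
stem, ext = os.path.splitext(name)      # only the LAST extension is split off
print(stem, ext)                        # abc.tar .gz

# A leading dot (hidden file) is not considered an extension
print(os.path.splitext('/dev/.bashrc')) # ('/dev/.bashrc', '')
```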

Running other programs from a python script

For example, running ffmpeg:

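import os
# input_path, width, height and args.out_image_path are defined elsewhere in the script.
# NOTE: paths containing spaces would need quoting with this string-based approach.
command = 'ffmpeg -hide_banner -loglevel panic -i ' + input_path + ' -vf scale=' + \
            str(width) + ':' + str(height) + ' ' + args.out_image_path
os.system(command)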
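An alternative sketch using subprocess.run with an argument list, which sidesteps shell quoting (e.g. paths with spaces) and can raise on failure. The helper build_scale_cmd and the file names are hypothetical; ffmpeg is not actually invoked here, so the runnable part calls the current Python interpreter instead:

```python
import subprocess
import sys

def build_scale_cmd(input_path, width, height, out_path):
    # Same ffmpeg flags as above, but as a list: no shell, no quoting needed
    return ['ffmpeg', '-hide_banner', '-loglevel', 'panic',
            '-i', input_path, '-vf', f'scale={width}:{height}', out_path]

cmd = build_scale_cmd('in put.mp4', 640, 480, 'out.jpg')
print(cmd)
# subprocess.run(cmd, check=True)  # run it if ffmpeg is installed

# Runnable demo: run a child Python process and capture its output
result = subprocess.run([sys.executable, '-c', 'print("hello")'],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())  # hello
```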

Removing trailing characters from a string

url = 'abcdc.com'
url1 = url[:-1] # remove the last character
if url.endswith('.com'):
    url = url[:-4] # remove the last four characters
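A sketch of two related options. rstrip looks tempting but removes any trailing characters from a set, not a suffix; str.removesuffix (Python 3.9+) removes an exact suffix only when present:

```python
url = 'abcdc.com'

# rstrip treats '.com' as the character SET {'.', 'c', 'o', 'm'},
# so it also eats the 'c' before the dot
print(url.rstrip('.com'))        # abcd

# Python 3.9+: remove an exact suffix, only if it is there
print(url.removesuffix('.com'))  # abcdc
print(url.removesuffix('.org'))  # abcdc.com
```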

Adding leading 0s to a number

For example, this produces a 6-digit string padded with leading 0s: str(i).zfill(6)
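A few examples, including an equivalent format spec and how zfill treats a minus sign (the value 42 is arbitrary):

```python
i = 42
print(str(i).zfill(6))    # 000042
print(f'{i:06d}')         # 000042, same result with a format spec
print(str(-42).zfill(6))  # -00042, the sign stays in front of the padding
```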

Removing leading and/or trailing 0s from a string

  • https://stackoverflow.com/questions/13142347/how-to-remove-leading-and-trailing-zeros-in-a-string-python#:~:text=to%20remove%20both%20trailing%20and,for%20only%20the%20leading%20ones).
res = your_string.strip("0") # remove both leading and trailing 0s
res1 = your_string.rstrip("0") # remove trailing 0s
res2 = your_string.lstrip("0") # remove leading 0s
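A concrete run of the three calls on a made-up string, plus the edge case of a string that is all zeros:

```python
s = '000120300'
print(s.strip('0'))             # 1203
print(s.rstrip('0'))            # 0001203
print(s.lstrip('0'))            # 120300
print(repr('000'.lstrip('0')))  # '' : an all-zero string becomes empty
```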

Input argument values that start with a dash '-'

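If one of your input argument values must start with a dash, e.g. -option_str -abc, passing it directly causes an error, because the second -abc is treated as an option name rather than as the option's value. A simple workaround is to pass the value with an equals sign, like this: -option_str="-abc"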
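A minimal sketch with argparse, assuming the script parses its arguments with argparse; the option name --option_str is illustrative (a double-dash version of the option above):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--option_str', type=str)

# ['--option_str', '-abc'] would fail with "expected one argument",
# because argparse sees '-abc' as another option.
args = parser.parse_args(['--option_str=-abc'])
print(args.option_str)  # -abc
```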

Data IO

Writing to a file:

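# use "a" instead of "w" to append text to an existing file; 'with' closes the file automatically
with open("error_list.txt", "w") as text_file:
    text_file.write('some strings\n')
    # write variables of several types (one_group, video_length, json_count are defined elsewhere)
    text_file.write("%s %d %d\n" % (one_group, video_length, json_count))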
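A self-contained sketch that writes a few lines and reads them back. The rows are hypothetical stand-ins for one_group, video_length and json_count, and the file goes to the system temp directory:

```python
import os
import tempfile

# Hypothetical (group, video_length, json_count) values
rows = [('group_a', 120, 3), ('group_b', 95, 2)]

path = os.path.join(tempfile.gettempdir(), 'error_list.txt')
with open(path, 'w') as f:              # closed automatically at the end of the block
    for name, length, count in rows:
        f.write(f'{name} {length} {count}\n')

with open(path) as f:
    lines = f.read().splitlines()       # lines without the trailing '\n'
print(lines)  # ['group_a 120 3', 'group_b 95 2']
```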

json

Ref:

  • https://pynative.com/python-json-dumps-and-dump-for-json-encoding/#:~:text=The%20json.,object%20into%20JSON%20formatted%20String.

  • https://stackoverflow.com/questions/9170288/pretty-print-json-data-to-a-file-using-python

  • https://www.programiz.com/python-programming/json

import json

## -- You can use a dict object directly and output it into json file later
images_json = {}
images_json['src_images'] = []
images_json['result_jsons'] = []
images_json['src_images'] = ['abcde.jpg']
images_json['result_jsons'].append(1.2)

## -- Create a json (a python dict) by parsing a JSON string
person = '{"name": "Bob", "languages": ["English", "French"]}'
person_json = json.loads(person)

## -- Create a json by loading from a local json file (json_path is the path of an existing file)
with open(json_path) as f:
  my_json = json.load(f)

## -- Write json or a dict into a local json file
with open(output_json_path, 'w') as outfile:
  json.dump(images_json, outfile)

## NOTE: You can also prettify the output json like this if you want.
## 'indent' and 'sort_keys' work for both dump() and dumps(), but only dump() writes to a file.
with open(output_json_path, 'w') as outfile:
  json.dump(person_json, outfile, indent=4, sort_keys=True)

NOTE for the two functions dump() and dumps():

  • The json.dump() method (without “s” in “dump”) is used to write a Python object as JSON formatted data into a file.
  • The json.dumps() method encodes a (JSON-serializable) Python object into a JSON formatted string.
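A small sketch of the difference, using io.StringIO as a stand-in for a real file (any writable text file object works the same way):

```python
import io
import json

person = {"name": "Bob", "languages": ["English", "French"]}

s = json.dumps(person, sort_keys=True)  # returns a str
print(s)  # {"languages": ["English", "French"], "name": "Bob"}

buf = io.StringIO()
json.dump(person, buf, indent=4)        # writes into the file object, returns None
print(buf.getvalue() == json.dumps(person, indent=4))  # True

print(json.loads(s) == person)          # True: loads() parses the str back into a dict
```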

Matrix, vision, and graphics

Rotation

  • https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.transform.Rotation.html

Transpose

https://note.nkmk.me/en/python-numpy-transpose/
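A brief sketch of transposing, including the gotcha that .T does nothing to a 1D (N,) array (see the (N,) versus (N,1) discussion earlier):

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
print(m.T.shape)                # (3, 2), same as np.transpose(m)

v = np.arange(3)                # 1D, shape (3,)
print(v.T.shape)                # (3,): transposing a 1D array changes nothing
print(v.reshape(-1, 1).shape)   # (3, 1): reshape instead to get a column

t = np.zeros((2, 3, 4))
print(np.transpose(t, (2, 0, 1)).shape)  # (4, 2, 3): axes reordered explicitly
```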

Multiplication and division

Ref: https://stackoverflow.com/questions/21562986/numpy-matrix-vector-multiplication

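The * operator does not perform matrix multiplication; it multiplies corresponding elements (an element-wise operation). The same holds for +, -, *, /. This is because a numpy array is not treated as a matrix but as a plain array, even when it is multidimensional.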

For example:

a = np.array([[ 5, 1 ,3], [ 1, 1 ,1], [ 1, 2 ,1]])
b = np.array([1, 2, 3])
print(a*b)
# Output:
# [[5 2 9]
#  [1 2 3]
#  [1 4 3]]

That is, a * b broadcasts b (a 1D array of shape (3,)) across every 1x3 row of a and multiplies corresponding elements, i.e. element-wise.

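For actual matrix multiplication, use a @ b, np.matmul(a,b), or a.dot(b). Alternatively, use the numpy.matrix type, although NumPy no longer recommends that class, even for linear algebra.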

Also note: two asterisks ** mean exponentiation; e.g. for a numpy array arr, arr**2 squares every element (element-wise).
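Using the same a and b as in the example above, a sketch contrasting element-wise *, the matrix product @, and element-wise **:

```python
import numpy as np

a = np.array([[5, 1, 3], [1, 1, 1], [1, 2, 1]])
b = np.array([1, 2, 3])

print(a * b)                      # element-wise: b broadcast across each row
print(a @ b)                      # matrix-vector product: [16  6  8]
print(np.matmul(a, b), a.dot(b))  # same result as a @ b
print(b ** 2)                     # element-wise power: [1 4 9]
```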

Parallel computation for loops

Ref:

Parallel execution of a single-argument function

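import multiprocessing

def f(x):
    return x*x

# The guard is required on Windows/macOS, where worker processes re-import this module
if __name__ == '__main__':
    # Get the number of CPU cores
    cores = multiprocessing.cpu_count()
    # Start a pool of worker processes; the with-block shuts it down afterwards
    with multiprocessing.Pool(processes=cores) as pool:
        tasks = [1,2,3,4,5]
        # Run f on every task in parallel; results keep the input order
        print(pool.map(f,tasks))  # [1, 4, 9, 16, 25]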

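Note that the built-in map() runs sequentially. The parallelism comes from pool.map(), which the multiprocessing module provides with the same interface as map() but which distributes the calls across the worker processes.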

Parallel execution of a multi-argument function

  • In Python 3, use the starmap method:
import multiprocessing

def add(x, y):
    return x+y

if __name__ == '__main__':
    # Get the number of CPU cores
    cores = multiprocessing.cpu_count()
    x1 = list(range(5))
    y1 = list(range(5))
    # One (x, y) tuple per call, 25 pairs in total
    tasks = [(x,y) for x in x1 for y in y1]
    # Start the worker processes; starmap unpacks each tuple into add(x, y)
    with multiprocessing.Pool(processes=cores) as pool:
        print(pool.starmap(add,tasks))
  • In Python 2, wrap the multi-argument function in a single-argument helper:
import multiprocessing

def add(x, y):
    return x+y

def merge_add(args):
    # Unpack one (x, y) tuple so that pool.map can pass a single argument
    return add(*args)

if __name__ == '__main__':
    # Get the number of CPU cores
    cores = multiprocessing.cpu_count()
    # Start the worker processes
    pool = multiprocessing.Pool(processes=cores)
    x1 = list(range(5))
    y1 = list(range(5))
    tasks = [(x,y) for x in x1 for y in y1]
    print(pool.map(merge_add,tasks))
    pool.close()
    pool.join()

Displaying a video (in a Jupyter notebook)

from IPython.display import HTML
from base64 import b64encode
with open('video.mp4', 'rb') as f:  # 'video.mp4' is the path to your video file
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls autoplay loop>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)
