class: middle, inverse, center #后端那些事 ### yihaibo 2015.12.29 --- # 后端研究什么 ##总结:4个一 - 语言(python) - 操作系统(linux) - 存储(mysql) - 领域(web, 数据分析,web安全,密码技术) --- # 语言 一.静态语言,动态语言,弱类型语言,强类型语言 - 区别 - 举例 Python是动态强类型语言 Golang是静态强类型语言 C是静态弱类型语言 - 反射(自省) --- 二.并发与并行 - 区别 - 举例 Erlang, Golang是天生支持并行的 Python由于GIL限制单进程只能并发 --- 三.Python - 数据类型, 数据结构 - 推导式与函数式 - 算法 - 反射 - 属性拦截 - 装饰器 - 描述符和属性 - 生成器 - 元类 - 垃圾回收 - 多线程与多进程 - 其它 --- ###数据类型 所有数据类型都是对象 ``` # 64位操作系统 In [40]: sys.getsizeof(1) Out[40]: 24 In [41]: sys.getsizeof(1L) Out[41]: 28 In [42]: sys.getsizeof("") Out[42]: 37 In [43]: sys.getsizeof("hello") Out[43]: 42 In [44]: sys.getsizeof([]) Out[44]: 72 In [45]: sys.getsizeof([1]) Out[45]: 80 ``` --- class: middle ####什么时候2+2==5? ``` In [50]: 2+2 Out[50]: 4 In [51]: ctypes.memmove(id(4), id(5), 24) Out[51]: 11637008 In [52]: 2+2 Out[52]: 5 In [53]: 4 Out[53]: 5 ``` --- ###数据结构 - list, tuple, dict, set - collection.namedtuple, collection.deque, collection.OrderedDict, collections.defaultdict - Queue.Queue ``` Person=collections.namedtuple('Person','name age gender') p=Person("kehan", 2, "female") p.name Out[12]: 'kehan' ``` ####思考 底层原理??? --- - 可变与不可变 ``` In [1]: def test(gameids=[1,2]): ...: gameids.append(3) ...: print(gameids) In [2]: test() [1, 2, 3] In [3]: test() [1, 2, 3, 3] In [4]: test.func_defaults Out[4]: ([1, 2, 3, 3],) ``` ``` In [18]: a={} In [19]: b=[] In [20]: a[b]=1 --------------------------------------------------------------------------- TypeError Traceback (most recent call last)
in
() ----> 1 a[b]=1 TypeError: unhashable type: 'list' ``` --- class: middle ``` persons = {} class Person(object): def __init__(self, name): self.name = name def __hash__(self): return hash(self.name) persons[Person("kehan")] = 1 print(Person("kehan") in persons) #False ``` --- class: middle ``` persons = {} class Person(object): def __init__(self, name): self.name = name def __hash__(self): return hash(self.name) def __eq__(self, r): return True if r.name == self.name else False persons[Person("kehan")] = 1 print(Person("kehan") in persons) #True ``` --- ### 推导式与函数式 comprehension: list, dict, set 函数式: map, filter, reduce等 ```python t_columns = filter(lambda x: x.endswith("_time"), model.FIELDS) t_columns = [f for f in model.FIELDS if f.endswith("_time")] # 小于0置为0 ages = [-1, 10, 20] sum([age if age >=0 else 0 for age in ages ]) sum(age if age >=0 else 0 for age in ages) # 生成器 sum(map(lambda age: age if age >=0 else 0, ages)) ``` --- ###算法 ####排序 ``` peoples = [ {"age": 1, "name": "kehan"}, {"age": 24, "name": "lwy"}, {"age": 25, "name": "skycrab"}, ] # age从小到大, sorted(peoples, key=lambda p:p["age"]) # 从大到小 sorted(peoples, key=operator.itemgetter("age"), reverse=True) operator.attrgetter # 属性 operator.itemgetter #元素 operator.methodcaller #方法 ``` --- class: middle ``` #最大age max(peoples, key=operator.itemgetter("age")) {'age': 25, 'name': 'skycrab'} min(peoples, key=operator.itemgetter("age")) {'age': 25, 'name': 'skycrab'} # operator模块是一个宝藏 a = [1, 2, 3] reduce(operator.imul, a) # 6 reduce(lambda x, y: x+y, a) # 6 ``` --- ####二分查找 前提: 已排序数列 ``` a=[1, 5, 10, 15,20] # bisect.bisect_right(a, x) 返回a中插入x的序号 bisect.bisect_right(a, 11) Out[11]: 3 bisect.insort_right(a,11) In [13]: a Out[13]: [1, 5, 10, 11, 15, 20] ``` --- ###堆排序 解决TOP N问题 ``` # heapq是最小堆 a=[1] heapq.heappush(a, 10) # a=[1, 10] heapq.heappop(a) #弹出1 heapq.nlargest(2, peoples, key=operator.itemgetter("age")) [{'age': 25, 'name': 'skycrab'}, {'age': 24, 'name': 'lwy'}] ``` --- ###反射(自省) - getattr, setattr, hasattr, dir - \_\_dict\_\_, \_\_slots\_\_ - callable, isinstance - traceback - inspect 好处是什么??? - orm --- class:inverse ``` In [13]: class Person(object): ....: desc = "person object" ....: def __init__(self): ....: self.age = 100 In [14]: Person.__dict__ Out[14]:
, '__doc__': None, '__init__':
, '__module__': '__main__', '__weakref__':
, 'desc': 'person object'}> In [15]: p=Person() In [16]: p.__dict__ Out[16]: {'age': 100} In [22]: hasattr(p, "age") Out[22]: True In [23]: getattr(p, "desc") Out[23]: 'person object' ``` --- class:inverse ``` class Point(object): def __init__(self, x, y): self.x = x self.y = y In [24]: P=Point(1,1) In [28]: P.__dict__ Out[28]: {'x': 1, 'y': 1} In [38]: P.age=100 class EPoint(object): __slots__ = ("x","y") def __init__(self, x, y): self.x = x self.y = y In [33]: Ep=EPoint(1,1) In [35]: Ep.__dict__ AttributeError: 'EPoint' object has no attribute '__dict__' In [39]: Ep.age=100 AttributeError: 'EPoint' object has no attribute 'age' ``` --- class:inverse ``` In [26]: isinstance(p, Person) Out[26]: True In [29]: callable(os.path.join) Out[29]: True In [33]: class FunCall(object): def __call__(self, *args, **kwargs): print(args, kwargs) In [34]: f=FunCall() In [35]: callable(f) Out[35]: True In [36]: f("hello", name="funcall") (('hello',), {'name': 'funcall'}) In [10]: try: ....: 1/0 ....: except Exception: ....: print(traceback.format_exc()) ....: .........ZeroDivisionError: integer division or modulo by zero ``` --- ###属性拦截 ``` # dataeye/statistics/bi/sdkincome.py class Merge(object): ""获取django多个结果"" def __init__(self, query_set): self.query_set = query_set def __getattr__(self, name): return sum(getattr(q, name) for q in self.query_set) class ObjectDict(dict): """字典当做对象使用""" def __getattr__(self, name): try: return self[name] except KeyError: raise AttributeError(name) def __setattr__(self, name, value): self[name] = value ``` --- ``` # dataeye/statistics/context_processors.py class PermWrapper(object): def __init__(self, user): self.user = user self.superuser = user.is_superuser self.perms = set(["101", "102"]) def __getitem__(self, module_name): if hasattr(self, module_name): return getattr(self, module_name) return module_name in self.perms perm = PermWrapper() perm["superuser"] perm["101"] ``` --- ###装饰器 - 闭包 - 好处, 解耦 - 面向切面编程(aop) - 用处(日志记录,权限控制,事物处理等) - 分类(不带参数,带参数,类装饰器) ``` cf = ConfigParser.ConfigParser() cf.read("config.ini") get = functools.partial(cf.get, "analyse") cf.get("analyse", "host") print(get("host"), get("db"), get("user")) ``` --- ``` def XXX(func): @wraps(func) def wrapper(*args, **kwargs): # before hook return func(*args, **kwargs) # after hook return wrapper def super_required(view_func): """超级用户控制""" @wraps(view_func) def decorator(request, *args, **kwargs): if request.user.is_authenticated() and request.user.is_superuser: return view_func(request,*args,**kwargs) else: return HttpResponseRedirect("/login/ ?next= {0}".format(request.path)) return decorator ``` --- 需求:使用torndb,记录执行的sql语句和执行时间 属性拦截与装饰器结合 ``` # analyse/script/fabfile.py 简化 def log(level): """记录日志""" assert level in ("debug", "info", "warn", "error") def decorator(func): @wraps(func) def wrapper(*args, **kwargs): start = time.time() result = func(*args, **kwargs) write = getattr(logger, level) write("sql: %s\nrun time: %s seconds", args, time.time()-start) return result return wrapper return decorator ``` --- ``` # analyse/script/fabfile.py class Connection(object): def __init__(self): self.conn = torndb.Connection(HOST, DB, USER, PASSWORED) def __getattr__(self, name): func = getattr(self.conn, name) return log("info")(func) conn = Connection() conn.get("select * from manage_ana_app where id=7") conn.execute("delete * from manage_ana_app") sql: select * from manage_ana_app where id=7 run time: 0.5111111 seconds ``` --- ### 描述符和属性 - 作用: 覆盖默认属性查找方式 a.x -> a.\_\_dict\_\_['x'] ->type(a).\_\_dict\_\_['x'] -> baseclass(type(a)).\_\_dict\_\_['x'] - \_\_get\_\_, \_\_set\_\_, \_\_delete\_\_ 数据描述符: \_\_get\_\_, \_\_set\_\_ (描述符优先) 非数据描述符: \_\_get\_\_ (\_\_dict\_\_优先) - 只支持new style objects 描述符是由\_\_getattribute\_\_调用,只在新式类中才有 - properties, methods, statis methods, class methods, super - classmethod实现原理 详细参考: https://docs.python.org/2/howto/descriptor.html?highlight=descriptor --- class: middle ``` #非数据描述符 class descriptor(object): def __init__(self, value): self.value = value def __get__(self, obj, type=None): print obj, type return self.value class Test(object): def __init__(self): self.age = 100 age = descriptor(200) t= Test() print(t.age) # 100 ``` --- ``` #数据描述符 class descriptor(object): def __init__(self, value): self.value = value def __get__(self, obj, type=None): return self.value def __set__(self, obj, value): pass class Test(object): def __init__(self): self.age = 100 age = descriptor(200) t= Test() print(t.age) # 200 ``` --- class: middle ClassMethod实现原理 ``` class ClassMethod(object): "Emulate PyClassMethod_Type() in Objects/funcobject.c" def __init__(self, f): self.f = f def __get__(self, obj, klass=None): if klass is None: klass = type(obj) def newfunc(*args): return self.f(klass, *args) return newfunc ``` --- ``` # analyse/bi/util.py class class_property(object): """ A property can decorator class or instance class Foo(object): @class_property def foo(cls): return 42 print(Foo.foo) print(Foo().foo) """ def __init__(self, func, name=None, doc=None): self.__name__ = name or func.__name__ self.__module__ = func.__module__ self.__doc__ == doc or func.__doc__ self.func = func def __get__(self, obj, type=None): value = self.func(type) return value ``` --- class:middle, inverse ``` class cached_property(object): """ A property that is only computed once per instance and then replaces itself with an ordinary attribute. Deleting the attribute resets the property. class Foo(object): @cached_property def foo(self): return 42*100*100 print(Foo().foo) """ def __init__(self, func): self.__doc__ = getattr(func, '__doc__') self.func = func def __get__(self, obj, cls): if obj is None: return self value = obj.__dict__[self.func.__name__] = self.func(obj) return value ``` --- class:middle ``` class class_cached_property(object): def __init__(self, func): self.__doc__ = getattr(func, '__doc__') self.func = func def __get__(self, obj, cls): if cls is None: return self value = self.func(cls) setattr(cls, self.func.__name__, value) return value ``` --- ###生成器 ``` In [2]: g=(2*x for x in range(2)) In [3]: g Out[3]:
at 0x00000000043E6630> In [4]: g.next() Out[4]: 0 In [5]: g.next() Out[5]: 2 In [6]: g.next() --------------------------------------------------------------------------- StopIteration Traceback (most recent call last)
in
() ----> 1 g.next() StopIteration: ``` --- ``` In [12]: def task(): ....: for x in range(2): ....: yield x ....: In [13]: t=task() In [14]: t Out[14]:
In [15]: print(t.next()) 0 In [16]: print(t.next()) 1 In [17]: print(t.next()) --------------------------------------------------------------------------- StopIteration Traceback (most recent call last)
in
() ----> 1 print(t.next()) StopIteration: ``` --- class: middle ``` def task(): begin = yield print("begin",begin) yield for x in range(begin): yield x t = task() t.send(None) t.send(2) print([x for x in t]) result: ('begin', 2) [0, 1] ``` --- ``` def thread1(): for x in range(4): yield x def thread2(): for x in range(4,8): yield x threads=[] threads.append(thread1()) threads.append(thread2()) def run(threads): #写这个函数,模拟线程并发 pass run(threads) ``` --- class:middle ``` def run(threads): #写这个函数,模拟线程并发 for t in threads: try: print t.next() except StopIteration: pass else: threads.append(t) 结果: 0 4 1 5 2 6 3 7 ``` --- class:inverse ``` class Task(object): def __init__(self): self._queue = collections.deque() self.work = 0 def add(self, gen): self._queue.append(gen) self.work += 1 def finish(self): return self.work <= 0 def run(self): while not self.finish(): try: gen = self._queue.popleft() gen.send(None) except StopIteration: self.work -= 1 else: self._queue.append(gen) t=Task() t.add(thread1()) t.add(thread2()) t.run() ``` --- ###上下文管理器 ####with 协议 ``` class cd(object): def __init__(self, path): self.src = os.getcwd() self.dest = path def __enter__(self): os.chdir(self.dest) def __exit__(self, exc_type, exc_val, exc_tb): """返回True忽略异常""" os.chdir(self.src) ``` --- class: middle ``` from contextlib import contextmanager @contextmanager def cd(path): cwd = os.getcwd() try: os.chdir(path) yield finally: os.chdir(cwd) ``` --- ###元类 - type ``` In [25]: class A(object): ....: pass ....: In [26]: type(A) Out[26]: type In [27]: B = type("B", (object,), {"name":"BBB"}) In [28]: B Out[28]: __main__.B In [30]: B.name Out[30]: 'BBB' ``` --- - \_\_new\_\_(cls, classname, bases, dict_attr) - \_\_init\_\_(cls, classname, bases, dict_attr) 注意: - \_\_new\_\_的cls参数是元类自己 - \_\_init\_\_的cls参数是元类创建的那个类 ``` _extract = {} _site = re.compile(r".*\.(?P
.+?)\.com") def parse(url): """解析视频真实地址""" site = _site.search(url).group('site') parser = _extract[site] return parser.parse(url) parse("http://www.letv.com/ptv/vplay/1111.html") ``` --- ``` class VideoMeta(type): def __init__(cls, classname, bases, dict_attr): """注意这里是cls, 如Video, LeTv""" assert hasattr(cls, "parse") assert hasattr(cls, "NAME") _extract[dict_attr["NAME"]] = cls() return type.__init__(cls, classname, bases, dict_attr) class Video(object): __metaclass__ = VideoMeta NAME = "BASE" def parse(self, url): raise NotImplementedError class LeTv(Video): """乐视TV""" NAME = "letv" def parse(self, url): pass ``` --- class: middle ``` class UpperMeta(type): """将属性修改为大写, 只能是__new__""" def __new__(cls, classname, bases, dict_attr): attr = {k.upper() if not k.startswith("__") else k : v for k,v in dict_attr.items()} return type.__new__(cls, classname, bases, attr) class Person(object): __metaclass__ = UpperMeta name = "lwy" age = 24 print( "name" in Person.__dict__) #False print( "NAME" in Person.__dict__) #True ``` --- ###垃圾回收 引用计数为主,标记-清除和分代收集为辅 ``` In [33]: class A(object): ....: pass ....: In [34]: a=A() In [36]: sys.getrefcount(a) Out[36]: 2 In [37]: sys.getrefcount? Return the reference count of object. The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount(). In [38]: b=a In [39]: sys.getrefcount(a) Out[39]: 3 ``` --- 引用计数 - 优点 简单, 实时 - 缺点 吞吐量不高, 循环引用 标记-清除: ``` gc.set_debug(gc.DEBUG_STATS) a = [] b = [] a.append(b) b.append(a) del a del b print("unreachable:", gc.collect()) gc: collecting generation 2... gc: objects in each generation: 504 3398 0 gc: done, 2 unreachable, 0 uncollectable, 0.0010s elapsed. ('unreachable:', 2) ``` --- class:middle ``` # 没有不可达对象,靠引用计数就可以 gc.set_debug(gc.DEBUG_STATS) a = [] b = [] a.append(b) b.append(a) del a print("unreachable:", gc.collect()) gc: collecting generation 2... gc: objects in each generation: 504 3398 0 gc: done, 0.0010s elapsed. ('unreachable:', 0) ``` --- - gc.enable() - gc.disable() - gc.get_threshold() (700, 10, 10) - gc.collect([generation]) 返回不可达对象数量 - gc.is_tracked() ``` In [62]: gc.is_tracked(1) Out[62]: False In [63]: gc.is_tracked("") Out[63]: False In [64]: gc.is_tracked([]) Out[64]: True In [67]: gc.is_tracked((1,)) Out[67]: True In [68]: gc.is_tracked({}) Out[68]: False In [69]: gc.is_tracked({'a':1}) Out[69]: False In [70]: gc.is_tracked({'a':[]}) Out[70]: True ``` --- ### 当循环引用遇到\_\_del\_\_ ``` # gc确定不了回收a,b的顺序,先回收b,a中__del__可能引用b gc.set_debug(gc.DEBUG_STATS) class A(object): def __del__(self): pass class B(object): def __del__(self): pass a = A() b= B() a.b = b b.a = a del a del b gc: collecting generation 2... gc: objects in each generation: 522 3398 0 gc: done, 4 unreachable, 4 uncollectable, 0.0010s elapsed. ``` --- ###\_\_del\_\_操作最好是无害的 ``` # socket.py class _fileobject(object): """Faux file object attached to a socket object.""" def close(self): try: if self._sock: self.flush() finally: if self._close: self._sock.close() self._sock = None def __del__(self): try: self.close() except: # close() may fail if __init__ didn't complete pass ``` --- ###多线程与多进程 - 并发模型: 多线程 异步IO ->以回调为主(tornado, nodejs) 协程 ->gevent, golang - GIL 每执行100(sys.getcheckinterval())条指令,将切换线程 - 策略 ctypes调用c库 使用多进程 --- #操作系统 - top ![](img/top.png) - 查看哪个进程占了6666端口 ![](img/netstat.png) - grep, sort, uniq grep -l, grep -h, sort -r, uniq -u, uniq -d --- class: middle dau grep -h 2015-12-28 * | grep BI_login | awk -F'|' '{print $3}' | sort | uniq | wc -l 付费 grep -h payment * | awk -F'|' '{print $10}' | awk '{r=index($0,"}"); sum+=substr($0,8,r-8)} END {print sum}' --- - 网络 tcp与udp的区别 time wait与close wait ``` tcp:面向连接,粘包问题 udp:无连接, 没有粘包问题 ``` 没有server bind 9999端口的进程,send一样返回 ``` In [2]: from socket import * In [3]: s=socket(AF_INET,SOCK_DGRAM) In [4]: s.connect(("localhost",9999)) In [5]: s.send("1111") Out[5]: 4 ``` --- ### epoll与select, poll比较 - 区别 - 水平触发与边缘触发 边缘触发防止饥饿问题 - 举例 水平触发: tornado, skynet 边缘触发: nginx, golang --- #数据库 - mysql字符集和排序规则 utf8 utfu_general_ci - varchar与char的区别? char最大255个字符,varchar最大65535字节(略小) - myisam与innodb区别? 事务支持,锁级别, innodb两阶段锁定协议 - 索引 原理,前缀索引, unique key, primary key unique key与primary key区别 --- class:middle ###B+数 ![](img/b.jpg) --- ###Innodb 主键 - 为什么使用innodb主键大多数是自增的整形??? Innodb 是聚簇索引 ``` CREATE TABLE all_role ( `id` int(11) NOT NULL AUTO_INCREMENT, `openid` varchar(100) NOT NULL, `clientid` int(11) NOT NULL, `roleid` varchar(64) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `openid_clientid` (`openid`, `clientid`), ) ENGINE=InnoDB DEFAULT CHARSET=utf8 select * from all_role where openid="xxxx" and clientid="xxxx"; 第一步: 通过索引openid_clientid获取主键id 第二部: 通过主键id获取其它字段 ``` --- class:middle - 为什么django_session不是这样? ``` CREATE TABLE `django_session` ( `session_key` varchar(40) NOT NULL, `session_data` longtext NOT NULL, `expire_date` datetime NOT NULL, PRIMARY KEY (`session_key`), KEY `django_session_c25c2c28` (`expire_date`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; ``` --- #领域 --- ##高性能网络库Gevent ![](img/net.png) http://nichol.as/benchmark-of-python-web-servers https://github.com/gevent/gevent/wiki/Projects(goagent,openstack) --- ##起源 - 多进程/多线程 以linux fork为代表,如早期的apache - select/poll 多路复用 - epoll/kqueue/iocp 以异步回调为主, 以nodejs为代表 - 协程 以lua coroutine/golang goroutine为代表 --- ##简单Tcp Server ``` # On Unix: Access with ``$ nc 127.0.0.1 5000`` # On Window: Access with ``$ telnet 127.0.0.1 5000`` from gevent.server import StreamServer def handle(socket, address): socket.send("Hello from a telnet!\n") for i in range(5): socket.send(str(i) + '\n') socket.close() server = StreamServer(('127.0.0.1', 5000), handle) server.serve_forever() ``` --- ##原理 coroutine ![](img/coroutine.jpeg) --- ##背后的力量 - libev C写的高性能网络库(对比memcache使用libevent) - greenlet Python协程实现(对比lua coroutine/golang goroutine) - c-ares 异步DNS请求 - cython ##现实世界应用 - 借助gevent让flask(django)拥有实时推送能力(对比tornado) (http://xlambda.com/gevent-tutorial/#server_2) - 游戏后端框架(https://github.com/9miao/Firefly) - 爬虫 。。。 --- 定时任务 ``` def f(): print time.time() print 'eeeee' from gevent.core import loop l = loop() timer = l.timer(2,3) #2秒后启动,3秒后再次启动 print time.time() timer.start(f) l.run() ``` --- ##借助greenlet自己写一个gevent ``` def accept(self): while True: try: print("accept") client_sock, address = self.sock.accept() break except socket.error as e: print(e) if e.args[0] != errno.EAGAIN: raise # switch是回调函数,当loop回调时会再次回到这里 switch = rawgreenlet.getcurrent().switch # 注册读事件,当sock可读时调用回调函数 ioloop.wait(self.sock, switch, ioloop.READ) ``` 参考: https://github.com/Skycrab/code/blob/master/Python/mygevent.py --- class: center, middle, inverse # 谢谢大家