Simple spatial scaling rules behind complex cities

Ruiqi Li,Lei Dong 

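Since their emergence, cities have been a major driver of human development. More than 50% of the world's population now lives in cities, and over 80% of wealth and 90% of innovation are generated there. Urban growth has also brought social problems such as pollution, traffic congestion, and crime. A city is a typical complex system composed of many elements, yet past studies have usually focused on a single aspect of the city, and quantitative research remains insufficient to predict the spatial distributions of its main elements (for example, population, roads, and socioeconomic interactions). Based on spatial attraction and matching growth mechanisms, this paper proposes a simple model that reveals, for the first time, the spatial scaling laws of the main urban elements. These elements can be explained within a unified framework, which makes it possible to infer the other distributions from any single one. The model explains spatial distributions at the mesoscopic scale, offers a general explanation for the origin of the macroscopic superlinear and sublinear scaling laws across cities, and accurately predicts socioeconomic activity at kilometre-level resolution. The theoretical approach also moves beyond the global mean-field assumptions of earlier urban research and models cities directly from the perspective of growth and evolution. The paper introduces new concepts as well, such as the active population, which helps resolve part of the earlier debate over whether population distributions are exponential or power-law and can be applied to assessing the safety of city blocks. In short, the work offers a new perspective and method for uncovering the interactions and evolution of urban elements, with a wide range of potential applications.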

The Hidden Flow Structure and Metric Space of Network Embedding Algorithms Based on Random Walks

Gu, Weiwei,Li Gong 

Network embedding, which encodes all vertices in a network as a set of numerical vectors in accordance with its local and global structures, has drawn widespread attention. Network embedding not only learns significant features of a network, such as clustering and link prediction, but also learns latent vector representations of the nodes, which provide theoretical support for a variety of applications such as visualization, link prediction, node classification, and recommendation. As the latest progress in this research, several algorithms based on random walks have been devised.
Although those algorithms have drawn much attention for their high scores in learning efficiency and accuracy, they still lack a theoretical explanation, and their transparency has been questioned. Here, we propose an approach based on the open-flow network model to reveal the underlying flow structure, and its hidden metric space, of different random walk strategies on networks. We show that the essence of embedding based on random walks is the latent metric structure defined on the open-flow network. This not only deepens our understanding of random-walk-based embedding algorithms but also helps in finding new potential applications of network embedding.
The Atlas of Chinese World Wide Web Ecosystem Shaped by the Collective Attention Flows

Xiaodan Lou,You Li 

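Using survey data from CNNIC, we construct an attention flow network formed by the browsing behaviour of a large number of Chinese internet users. Using the flow distance metric on the flow network, we embed the entire attention flow network into a high-dimensional space and study how the Chinese internet ecosystem is distributed there. We find that websites spontaneously form four blocks. The three large BAT sites form the centre that absorbs attention, accounting for more than 70% of it. In addition, knowledge Q&A, e-commerce, entertainment, and general-purpose websites cluster at different locations on the map.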
Population-weighted efficiency in transportation networks

Lei Dong,Ruiqi Li 

Transportation efficiency is critical for the operation of cities and is attracting great attention worldwide. Improving transportation efficiency can not only decrease energy consumption and reduce carbon emissions but also accelerate people’s interactions, which will become more and more important for sustainable urban living. Traffic conditions in less-developed countries are generally poor because of underdeveloped economies and road networks, yet this issue has rarely been studied, because traditional survey data in these areas are scarce. With the spread of ubiquitous mobile phone data, we can now explore transportation efficiency in a new way. In this paper, based on users’ call detail records (CDRs), we propose an indicator named population-weighted efficiency (PWE) to quantitatively measure the efficiency of transportation networks. PWE can provide insights into transportation infrastructure development; using it, we identify dozens of inefficient routes at both the intra-city and inter-city levels, which are verified by several ongoing construction projects in Senegal. In addition, we compare PWE with excess commuting indices and find that PWE fits better than the excess commuting index, which further supports the validity of our method.
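This paper uses human mobility data to define a road transport efficiency indicator weighted by population flow, and applies it to the urban road networks of Senegal to identify a number of inefficient road segments. It also computes a transport capacity index for each city, allowing a systematic comparison across Senegalese cities. An advantage of these indicators is that they do not depend on officially provided road information: they can be computed from mobile phone communication data and the map query service provided by Google alone. The method is therefore especially suited to economically less developed regions and to regions where survey data are missing.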
A Geometric Representation of Collective Attention Flows

Peiteng Shi,Xiaohan Huang 

With the fast development of the Internet and the WWW, “information overload” has become an overwhelming problem, and the collective attention of users plays an increasingly important role. Knowing how collective attention is distributed and flows among different websites is therefore the first step toward understanding the underlying dynamics of attention on the WWW. In this paper, we propose a method to embed a large number of websites into a high-dimensional Euclidean space according to the novel concept of flow distance, which considers both the connection topology between sites and the collective click behaviors of users. With this geometric representation, we visualize the attention flow in the Indiana University clickstream data set over one day. It turns out that all the websites can be embedded into a 20-dimensional ball in which nearby sites are always visited by users sequentially. The distributions of websites, attention flows, and dissipations can be divided into three spherical crowns (core, interim, and periphery). The 20% most popular sites (Google.com, Myspace.com, Facebook.com, etc.), which attract 75% of attention flows with only 55% of dissipations (users logging off), are located in the central layer, within radius 4.1. Another 60% of sites, attracting only about 22% of traffic with almost 38% of dissipations, are located in the middle area, with radius between 4.1 and 6.3. The remaining 20% of sites are far from the central area. All the cumulative distributions of these variables are well fitted by S-shaped curves, and the patterns are stable across different periods. Thus, the overall distribution and dynamics of collective attention on websites are well exhibited by this geometric representation.
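This paper builds a flow network from clickstream data and computes the flow distance between any two websites, which reflects how closely the two sites are connected. Based on these flow distances, we embed all the websites in a 20-dimensional space, in which different websites occupy different ecological niches. The distribution of attention flows and websites forms a three-layer onion structure: a few websites in the innermost layer capture the vast majority of traffic, the middle layer holds most websites but little traffic, and the outermost layer holds a few websites with a small amount of traffic.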
Scaling behaviours in the growth of networked systems and their geometric origins

Jiang Zhang,Xintong Li 

In many networked systems (cities, online communities), links (or interactions) grow faster than nodes, while the diversity of nodes grows more slowly than the number of nodes. We build a simple random network model based on a geometric matching mechanism to reproduce both phenomena. The extended model is further applied to model the distribution of natural cities.
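In many networked systems (online communities, cities), links (interactions) always grow faster than nodes, while the category diversity of nodes grows more slowly than the nodes themselves. This paper proposes a simple random geometric network growth model that yields both the superlinear growth of links and the sublinear growth of diversity. With modifications, the model can also simulate the growth and distribution of urban systems.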
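The geometric matching mechanism is not spelled out in this summary. As a rough illustration only, the sketch below assumes one simple version: candidate nodes are dropped uniformly at random in a unit square, a candidate survives only if it lands within a fixed radius of an existing node, and it then links to every existing node within that radius. The function name `grow_matching_network`, the radius, and the choice of a single seed node at the centre are hypothetical and not taken from the paper.

```python
import math
import random


def grow_matching_network(n_nodes, radius=0.05, seed=0):
    """Grow a network in the unit square by a simple matching rule (assumed)."""
    rng = random.Random(seed)
    nodes = [(0.5, 0.5)]          # assumed seed node at the centre
    edges = []
    while len(nodes) < n_nodes:
        candidate = (rng.random(), rng.random())
        # Existing nodes that the candidate "matches", i.e. lies close enough to.
        matched = [k for k, q in enumerate(nodes)
                   if math.dist(candidate, q) <= radius]
        if matched:               # candidates with no match are discarded
            new_index = len(nodes)
            nodes.append(candidate)
            edges.extend((k, new_index) for k in matched)
    return nodes, edges


nodes, edges = grow_matching_network(200)
print(len(nodes), "nodes,", len(edges), "links")
```

Running the function for several values of `n_nodes` and plotting the number of links against the number of nodes on log-log axes is one way to check whether this toy rule produces faster-than-linear link growth.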
Open Flow Distances on Open Flow Networks

Liangzhu Guo,Xiaodan Lou 

An open flow network is a special weighted directed graph in which the weighted links are flows and the flows are balanced. We define a new set of distance metrics that measure the average path length of particles flowing from i to j. Based on these distances, we discuss the calculation of trophic levels of species in energetic food webs, the centrality of nodes, the industrial clustering problem on input-output networks, and so on. We also compare the new distances with existing graph distances.
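An open flow network is a special weighted directed network in which weighted links represent flows and nodes satisfy flow balance. On such networks we define a set of flow distances, namely the average path length that a flow particle travels along links from i to j. On this basis, we discuss the calculation of trophic levels in food webs, node centrality on input-output networks, industrial clustering, and related problems, and we compare the new distances with other network distances.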
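One possible reading of "the average path length of particles flowing from i to j" is the mean number of steps a particle takes to first arrive at j, counted only over particles that do arrive. The sketch below computes that quantity under stated assumptions: the flows have already been normalized into a transition matrix `M`, where `M[k][m]` is the probability of moving from node k to node m and any missing row mass is dissipation to the environment. The function name and the fixed-point solver are illustrative choices; the paper defines a set of distances and may compute them differently.

```python
def first_passage_flow_distance(M, i, j, tol=1e-12, max_iter=100000):
    """Mean steps to first reach j from i, among particles that reach j."""
    if i == j:
        return 0.0
    others = [k for k in range(len(M)) if k != j]
    # reach[k]: probability that a particle at k ever arrives at j.
    # steps[k]: sum over t of t * P(first arrival at j from k takes t steps).
    reach = {k: 0.0 for k in others}
    steps = {k: 0.0 for k in others}
    for _ in range(max_iter):
        new_reach = {k: M[k][j] + sum(M[k][m] * reach[m] for m in others)
                     for k in others}
        new_steps = {k: new_reach[k] + sum(M[k][m] * steps[m] for m in others)
                     for k in others}
        change = max(max(abs(new_reach[k] - reach[k]) for k in others),
                     max(abs(new_steps[k] - steps[k]) for k in others))
        reach, steps = new_reach, new_steps
        if change < tol:
            break
    if reach[i] == 0.0:
        return float("inf")       # j cannot be reached from i
    return steps[i] / reach[i]


# Chain 0 -> 1 -> 2, where half of the flow leaving node 1 is dissipated.
chain = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 0.5],
         [0.0, 0.0, 0.0]]
print(first_passage_flow_distance(chain, 0, 2))
```

The iteration converges when every node eventually leaks flow to the environment or to j, which is the situation an open flow network with dissipation describes. Note that the resulting distance is generally asymmetric: the value from i to j need not equal the value from j to i.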
Maximum Entropy for the International Division of Labor

Hongmei Lei,Ying Chen 

As a result of the international division of labor, the distribution of trade value across products, as realized in international trade flows, can be regarded as a country’s competitive strategy. Each country wants to diversify its investments across products while making as much profit as it can. We build a model based on the maximum entropy principle to reproduce the distribution curves of countries. The results show that almost all countries' export share distributions can be explained by the maximum entropy model if the constraints are properly selected.
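The export shares of each country across different products can be viewed as a division-of-labor strategy oriented toward the international market. Each country tries to diversify its exports as much as possible while pursuing maximum economic benefit. This paper proposes a maximum entropy model to explain the distribution curves of countries' export shares across products. With suitably chosen constraints, a single-parameter maximum entropy model fits the export distribution curves of more than 100 countries well.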
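The summary does not state which constraints the model uses. As a generic illustration of how a single-parameter maximum entropy distribution arises, the sketch below assumes one linear constraint: the share-weighted mean of some per-product quantity (called `costs` here, a hypothetical stand-in) is fixed. Maximizing entropy under such a constraint gives shares proportional to exp(-lam * cost), and the single parameter `lam` is found by bisection. The function name and the constraint are assumptions, not the paper's specification.

```python
import math


def maxent_shares(costs, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Max-entropy shares p_i ~ exp(-lam * costs[i]) with a fixed mean cost."""
    def shares(lam):
        z = [-lam * c for c in costs]
        top = max(z)                       # subtract the max for stability
        w = [math.exp(v - top) for v in z]
        total = sum(w)
        return [v / total for v in w]

    def mean_cost(lam):
        return sum(p * c for p, c in zip(shares(lam), costs))

    # The mean cost decreases as lam increases, so bisection locates lam.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_cost(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    return shares(0.5 * (lo + hi))


print(maxent_shares([1.0, 2.0, 3.0], 1.5))
```

When the target mean equals the unweighted average of the costs, the solution is the uniform distribution, which is the unconstrained entropy maximum. Moving the target away from that value concentrates the shares on cheaper or costlier items.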
Allometric scaling, size distribution and pattern formation of natural cities

Xintong Li,Xinran Wang 

In this paper, we treat connected clusters of nighttime light as natural cities. We then study the allometric scaling laws, Zipf's law, and fractal properties of these natural cities. A concise model based on a geometric matching mechanism is built to reproduce all the observed patterns.
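In this paper, we treat the connected regions formed by nighttime lights as natural cities. We systematically explore the allometric scaling laws and the size distribution of these natural cities. Finally, we construct a random geometric graph model based on “geometric matching” that reproduces all of the observed empirical phenomena.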
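Extracting natural cities as connected clusters of lit pixels amounts to connected-component labelling on a grid. The sketch below is a minimal version under assumptions the summary does not specify: the nighttime light image is a list of rows of brightness values, a pixel counts as lit when its value exceeds `threshold`, and pixels are connected through their four edge-sharing neighbours. The function name is hypothetical.

```python
from collections import deque


def natural_cities(light, threshold=0):
    """Return lit clusters as lists of (row, col) cells, using 4-connectivity."""
    rows, cols = len(light), len(light[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or light[r][c] <= threshold:
                continue
            # Breadth-first search collects every lit cell connected to (r, c).
            seen[r][c] = True
            queue = deque([(r, c)])
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and light[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cells)
    return clusters


grid = [[5, 5, 0, 0],
        [0, 5, 0, 9],
        [0, 0, 0, 9],
        [1, 0, 0, 0]]
print(sorted(len(city) for city in natural_cities(grid)))
```

Cluster sizes (cell counts) and summed brightness per cluster are the kind of per-city quantities on which size distributions and scaling relations can then be measured.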
The Metabolism and Growth of Web Forums

Lingfei Wu,Jiang Zhang 

We view web forums as virtual living organisms feeding on users' clicks and investigate how they grow at the expense of clickstreams. We find that the number of page views in a given time period (PV) and the number of unique visitors in that period (UV) of the studied forums satisfy the law of allometric growth, i.e., a power-law relation between the two. We construct clickstream networks and explain the observed temporal dynamics of the networks by the interactions between nodes. We describe the transportation of clickstreams using a dissipation function that relates the total amount of clickstreams passing through a node to the amount of clickstreams dissipated from that node to the environment. It turns out that the dissipation exponent, an indicator of the efficiency of network dissipation, not only negatively correlates with the allometric exponent but also sets bounds for it. In particular, the bounds are stated separately for two regimes. Our findings have practical consequences. For example, the allometric exponent can be used as a measure of the “stickiness” of forums, which quantifies the stable ability of forums to keep users “locked in” on the forum. Meanwhile, the correlation between the two exponents provides a method to predict the long-term “stickiness” of forums from clickstream data collected over a short time period. Finally, we discuss a random walk model that replicates both the allometric growth and the dissipation function.
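We view web forums as virtual living organisms that metabolize and grow by consuming users' attention. These organisms obey a generalized Kleiber's law: there is a power-law relationship between a forum's number of unique visitors (UV) and its total page views (PV), and the power-law exponent can be used to characterize the forum's stickiness. Further study shows that online communities obey a dissipation law, and that the dissipation exponent affects stickiness.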
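A power-law relation between two quantities is a straight line on log-log axes, so its exponent can be estimated by ordinary least squares on the logarithms. The sketch below assumes PV is modelled as a constant times a power of UV; that direction, the function name, and the synthetic data are illustrative assumptions rather than the authors' fitting procedure.

```python
import math


def fit_power_law_exponent(x, y):
    """Least-squares fit of y = c * x**theta on log-log axes; returns (theta, c)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    theta = sxy / sxx                      # slope of the log-log line
    c = math.exp(mean_y - theta * mean_x)  # prefactor from the intercept
    return theta, c


# Synthetic example: page views generated as 3 * UV**1.2 (made-up numbers).
uv = [10, 100, 1000, 10000]
pv = [3 * u ** 1.2 for u in uv]
theta, c = fit_power_law_exponent(uv, pv)
print(round(theta, 3), round(c, 3))
```

The same log-log regression applies to any of the scaling relations in this list, such as a dissipation law relating dissipated flow to throughflow, provided all values are strictly positive.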
Hierarchicality of Trade Flow Networks Reveals Complexity of Products

Peiteng Shi,Jiang Zhang 

With globalization, countries are more connected than before by trade flows, which amount to at least 36 trillion dollars today. Interestingly, around 30-60 percent of global exports consist of intermediate products. Therefore, the trade flow network of a particular product with high added value can be regarded as a value chain. The question is whether we can discriminate between these products by their unique flow network structures. This paper applies the flow analysis method developed in ecology to 638 trade flow networks of different products. We claim that the allometric scaling exponent $\eta$ can be used to characterize the degree of hierarchicality of a flow network, i.e., whether the traded products flow along long hierarchical chains. We then show that the flow networks of products with higher added value and complexity, such as machinery and transport equipment, have larger exponents, meaning that their trade flow networks are more hierarchical. As a result, without extra data such as global input-output tables, we can identify the product categories with higher complexity, and the relative importance of a country in the global value chain, from the trade network alone.
Common patterns of energy flow and biomass distribution on weighted food webs

Jiang Zhang,Yuanjing Feng 

Some new common patterns such as dissipation law, gravity law, and allometric scaling are found on the collected energetic food webs.
Energy-flow food webs exhibit a universal dissipation law and gravity law, together with allometric scaling.
Capabilities’ substitutability and the “S” curve of export diversity

Hongmei Lei,Jiang Zhang 

Product diversity, which is highly important in economic systems, has been highlighted by recent studies on international trade. We found an empirical pattern, designated the “S-shaped curve”, that describes the relationship between economic size (logarithmic GDP) and export diversity (the number of varieties of export products) in detailed international trade data. As the economic size of a country begins to increase, its export diversity initially increases exponentially, but over time this growth slows and eventually reaches an upper limit. The interdependence between size and diversity takes the shape of an S-shaped curve that can be fitted by a logistic equation. To explain this phenomenon, we introduce a parameter called “substitutability” into the list of capabilities or factors of products in the tripartite network model (i.e., the country-capability-product model) of Hidalgo et al. When the substitutability is zero, the model reduces to Hidalgo’s original model but fails to reproduce the S-shaped curve. As the substitutability increases, however, the plotted data increasingly resemble an S-shaped curve. Therefore, the diversity ceiling effect can be explained by the substitutability of different capabilities.
Allometry and Dissipation of Ecological Flow Networks

Jiang Zhang,Lingfei Wu 

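This paper studies the universal allometric scaling law and dissipation law in 19 ecological flow networks, and shows that the allometric scaling exponent and the dissipation exponent are negatively correlated.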
The decentralized flow structure of clickstreams on the web

Lingfei Wu,Jiang Zhang 

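This paper brings the method for computing the allometric growth law of energy flow networks in food webs to clickstream networks, and finds that the allometric growth exponent is generally less than 1, which reveals a decentralized structure of clickstream networks.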
Centralized Flow Structure of International Trade Networks for Different Products

Peiteng Shi,Jingfei Luo 

All countries in the world are connected with one another, and as globalization develops, one country's exports may deeply affect other countries through global value chains. Owing to regional differences in culture and resources, some countries may dominate the production and export of a product in global trade. This paper treats the international trade webs of different products as flow networks and reveals a power-law relationship between the goods flow through a country i ($A_i$) and the impact of i on other countries ($C_i$) in the trade network of a specific product, $C_i \sim A_i^{\eta}$. The exponent $\eta$ reflects how strongly impact is concentrated on the high-throughflow nodes of the network, that is, whether the network is centralized or decentralized. We find that most trade networks are centralized (i.e., $\eta > 1$), and that manufactured products with high added value, such as machinery equipment, iron, and chemical products, have larger exponents: their trade networks, and hence their flow structures, are more centralized than those of primary products such as agricultural goods and raw materials.
The Common Extremalities in Biology and Physics

Adam Moroz 

Maximum Energy Dissipation Principle in Chemistry, Biology, Physics and Evolution 
Maximum Entropy and Ecology

John Harte 

This book builds the foundation for, and constructs upon it, a theory of ecology
designed to explain a lot from a little. By “little” I mean only inferential logic
derived from information theory. And by “a lot” I mean the ecological phenomena
that are the focus of macroecology: patterns in the partitioning of space and energy
by individual organisms and by species in ecosystems. 
BOOK  Mining the Social Web

Matthew A. Russell 

