Tuesday, June 10, 2014

Google’s Dremel Makes Big Data Look Small

Old news: http://www.wired.com/2012/08/googles-dremel-makes-big-data-look-small/


Mike Olson runs a company that specializes in the world’s hottest software. He’s the CEO of Cloudera, a Silicon Valley startup that deals in Hadoop, an open source software platform based on tech that turned Google into the most dominant force on the web.
Hadoop is expected to fuel an $813 million software market by the year 2016. But even Olson says it’s already old news.
Hadoop sprang from two research papers Google published in late 2003 and 2004. One described the Google File System, a way of storing massive amounts of data across thousands of dirt-cheap computer servers, and the other detailed MapReduce, which pooled the processing power inside all those servers and crunched all that data into something useful. Eight years later, Hadoop is widely used across the web, for data analysis and all sorts of other number-crunching tasks. But Google has moved on.
In 2009, the web giant started replacing GFS and MapReduce with new technologies, and Mike Olson will tell you that these technologies are where the world is going. “If you want to know what the large-scale, high-performance data processing infrastructure of the future looks like, my advice would be to read the Google research papers that are coming out right now,” Olson said during a recent panel discussion alongside Wired.
Since the rise of Hadoop, Google has published three particularly interesting papers on the infrastructure that underpins its massive web operation. One details Caffeine, the software platform that builds the index for Google’s web search engine. Another shows off Pregel, a “graph database” designed to map the relationships between vast amounts of online information. But the most intriguing paper is the one that describes a tool called Dremel.
“If you had told me beforehand what Dremel claims to do, I wouldn’t have believed you could build it,” says Armando Fox, a professor of computer science at the University of California, Berkeley who specializes in these sorts of data-center-sized software platforms.
Dremel is a way of analyzing information. Running across thousands of servers, it lets you “query” large amounts of data, such as a collection of web documents or a library of digital books or even the data describing millions of spam messages. This is akin to analyzing a traditional database using SQL, the Structured Query Language that has been widely used across the software world for decades. If you have a collection of digital books, for instance, you could run an ad hoc query that gives you a list of all the authors — or a list of all the authors who cover a particular subject.
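To make the idea of an ad hoc query concrete, here is a minimal sketch using Python’s built-in `sqlite3` module on a tiny local table. The table name `books`, its columns, and the sample rows are hypothetical; this only illustrates the kind of SQL query the article describes (all authors, then authors on one subject), not Dremel’s engine, its query dialect, or its distributed execution across thousands of servers.

```python
import sqlite3

# Hypothetical toy table standing in for "a collection of digital books".
# This is a single local SQLite database, not Dremel or BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Book A", "Alice", "databases"),
        ("Book B", "Bob", "physics"),
        ("Book C", "Alice", "physics"),
        ("Book D", "Carol", "databases"),
    ],
)

# Ad hoc query 1: a list of all the authors (duplicates removed).
all_authors = [
    row[0]
    for row in conn.execute("SELECT DISTINCT author FROM books ORDER BY author")
]

# Ad hoc query 2: all the authors who cover a particular subject.
subject_authors = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT author FROM books WHERE subject = ? ORDER BY author",
        ("databases",),
    )
]

print(all_authors)      # ['Alice', 'Bob', 'Carol']
print(subject_authors)  # ['Alice', 'Carol']
```

The point of the comparison is that no program logic changes between the two questions; only the query text does. Dremel’s claim is to answer queries of this shape interactively over petabytes rather than over four rows.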
“You have a SQL-like language that makes it very easy to formulate ad hoc queries or recurring queries — and you don’t have to do any programming. You just type the query into a command line,” says Urs Hölzle, the man who oversees the Google infrastructure.
The difference is that Dremel can handle web-sized amounts of data at blazing fast speed. According to Google’s paper, you can run queries on multiple petabytes — millions of gigabytes — in a matter of seconds.
Hadoop already provides tools for running SQL-like queries on large datasets. Sister projects such as Pig and Hive were built for this very reason. But with Hadoop, there’s lag time. It’s a “batch processing” platform. You give it a task. It takes a few minutes to run the task — or a few hours. And then you get the result. But Dremel was specifically designed for instant queries.
“Dremel can execute many queries over such data that would ordinarily require a sequence of MapReduce jobs, but at a fraction of the execution time,” reads Google’s Dremel paper. Hölzle says it can run a query on a petabyte of data in about three seconds.
According to Armando Fox, this is unprecedented. Hadoop is the centerpiece of the “Big Data” movement, a widespread effort to build tools that can analyze extremely large amounts of information. But with today’s Big Data tools, there’s often a drawback. You can’t quite analyze the data with the speed and precision you expect from traditional data analysis or “business intelligence” tools. But with Dremel, Fox says, you can.
“They managed to combine large-scale analytics with the ability to really drill down into the data, and they’ve done it in a way that I wouldn’t have thought was possible,” he says. “The size of the data and the speed with which you can comfortably explore the data is really impressive. People have done Big Data systems before, but before Dremel, no one had really done a system that was that big and that fast.
“Usually, you have to do one or the other. The more you do one, the more you have to give up on the other. But with Dremel, they did both.”
According to Google’s paper, the platform has been used inside Google since 2006, with “thousands” of Googlers using it to analyze everything from the software crash reports for various Google services to the behavior of disks inside the company’s data centers. Sometimes the tool is used with tens of servers, sometimes with thousands.
Despite Hadoop’s undoubted success, Cloudera’s Mike Olson says that the companies and developers who built the platform were rather slow off the blocks. And we’re seeing the same thing with Dremel. Google published the Dremel paper in 2010, but we’re still a long way from seeing the platform mimicked by developers outside the company. A team of Israeli engineers is building a clone they call OpenDremel, though one of these developers, David Gruzman, tells us that coding is only just beginning again after a long hiatus.
Mike Miller — an affiliate professor of particle physics at the University of Washington and the chief scientist of Cloudant, a company that’s tackling many of the same data problems Google has faced over the years — is amazed we haven’t seen some big-name venture capitalist fund a startup dedicated to reverse-engineering Dremel.
That said, you can use Dremel today — even if you’re not a Google engineer. Google now offers a Dremel web service it calls BigQuery. You can use the platform via an online API, or application programming interface. Basically, you upload your data to Google, and it lets you run queries on its internal infrastructure.
This is part of a growing number of cloud services offered by the company. First, it let you build, run, and host entire applications atop its infrastructure using a service called Google App Engine, and now it offers various other utilities that run atop this same infrastructure, including BigQuery and the Google Compute Engine, which serves up instant access to virtual servers.
The rest of the world may lag behind Google. But Google is bringing itself to the rest of the world.

Saturday, June 7, 2014

Does Knowledge Change Destiny? A Post-90s “Phoenix Man” Says Poor Families Rarely Produce Elites

http://www.wenxuecity.com/news/2014/06/07/3336884.html

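After testing his way out of a village in southwest China and into a prestigious university in Shanghai, a once brash young man discovered that behind the appearance of “knowledge changes destiny” lay many realities and futures beyond his control. Compared with classmates from middle- and upper-class families, he lacked their sense of life planning and their ability to carry plans out. Their fates, too, keep diverging.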

The Brashness of Youth

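I am 24 this year. My life so far has involved two major geographic moves: once to Shanghai for university, and once to Beijing for work. Before that, my world was confined to a village in the southwest that was not exactly poor: my family’s home in the countryside, the junior high school in town, and the senior high school in the district.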

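When I was 12 and starting junior high, my parents took me to school. Probably because my father was the village doctor and my mother was meticulous and open-minded, the parents of the town’s students politely praised me. My mother replied modestly: “Kids who grow up in the countryside still lag behind the kids from town.” At the time, I dismissed this completely.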

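Over the next three years of junior high, whether in grades, sports and arts activities, or manners, I “shattered” my mother’s “fallacy” through my own performance. Even after entering the district high school, I proved myself by coming from behind to overtake others. If anything, it was my classmates from the small town who left a poor impression on me.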

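But even in a lower-middle-class, or even bottom-tier, village like ours, every child with good grades was told that “school and society are different.” As for children with poor grades, many parents stopped pushing them and instead actively drew on relatives, friends, and acquaintances to arrange other paths for them, which is what people call 拼爹, “competing on whose father is better connected.”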

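Take the college entrance exam (gaokao). In the year I took it, how high was Peking University’s admission cutoff in our region? Just one point below that year’s top scorer’s raw score! To get into Peking University, you needed bonus points. There are several kinds: incentive points for special talents, preferential points for ethnic minorities, and so on. In principle this is a good policy for broadening selection criteria and promoting fair admissions, but in practice, because assessment standards and oversight were inadequate, many of these bonus points became spoils for whoever “worked the connections” well enough. Even judging purely on talent, how could a child from a rural family like me ever cultivate a specialty such as the violin or the guzheng?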

As a result, kids like us were completely shut out of the talent bonus policy.

Their Foresight

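Through hard work and luck, my gaokao score that year was passable, and I got into Fudan University. Everyone was delighted. At the celebration banquet, every relative congratulated my parents; they all believed that knowledge had changed our family’s destiny.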

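But after I entered university, and especially after I faced the realities of working life, my mother’s words came back to me again and again. Only now, the difference between “rural kids” and “town kids” had become the distance between a self-described diaosi (“loser”) and the “tall, rich, and handsome” or the “fair, rich, and beautiful.” And the meaning of “gap” had expanded from pure individual effort to how much social capital one could mobilize behind the scenes.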

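Four years of university were enough for me to get to know this environment, and the many things that dazzled me there wore my old arrogance down into humility. I worked hard to improve myself, but I did not realize that my greatest disadvantage was not my spoken English, my entry-level computer skills, or my meager singing and dancing talents. The real gap was that, compared with classmates from middle- and upper-class families, I lacked a sense of life planning and the ability to carry plans out.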

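For the 2010 World Expo, the university organized a large number of volunteers, and most students in my school signed up. I did not want to, because it was harvest season at home, and without me, a key pair of hands, my parents would have to work twice as hard. After several discussions with my family, my parents insisted that I go to the Expo, reasoning that it was a rare chance to see such a grand event. Meanwhile, my classmates who had long since planned on graduate school or studying abroad were carefully scheduling this rare volunteer experience; they knew it was an important bargaining chip for clearing the next hurdle.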

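For them, clear life planning covered everything. My phone still holds a text message, the reply to the only time in college I confessed my feelings to a girl: “It’s rare for two people to click like we do, and you’re a great guy. But I’m a local and my parents’ only daughter. I can’t move to your hometown to build a life, and I don’t want to put myself through bridging the gap between our two families. I know exactly what kind of life I want. All the best!”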

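At the time, I could not understand such coldness. Only after I started working did I gradually see how rational and sensible it was, and I could not help sighing: for some people, every step in life builds strength for conquering the next; others really do cross each bridge only when they come to it, wandering a little and looking around at random. For young people in their early twenties, awakening and sustaining this planning mindset depends on family influence and foresight. Here, the logic of “knowledge changes destiny” gets muddled.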

The Final Divergence

After graduation, everyone’s choices quickly diverged.

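My dorm room was fairly typical. There were four of us. The two Shanghai locals, one from an official’s family and one from a business family, went on to graduate study in the United States and Britain respectively. Another roommate came from an ordinary working-class family; after graduation his parents used their connections to get him a job at a large state-owned enterprise in his hometown, a near-first-tier city. As for me, after several setbacks job hunting back home, I had no choice but to become a Beijing drifter just before graduation.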

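I have now been working for two years. Because I work at a public institution, everyone knows something, openly or tacitly, about the leaders’ backgrounds, and the intergenerational handing down of class, social circles, connections, and resources has become commonplace. Recently, a leader in my office has been busy arranging for his child, a senior at Beijing No. 4 High School, to go to college in the United States. This made me think about where my classmates from three different stages of my life (primary and junior high school, senior high school, and university) are now: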

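Clearly, knowledge does change the destiny of some people. Most rural kids like me who make it to university will never farm again, and we are far better off than peers who became migrant workers. But for people like us, city life will never be easy either, especially since we must rely on ourselves for major life opportunities such as housing and career development. Settling down and building a career in a big city, and accumulating rich social resources for the next generation, is an especially long and difficult process.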

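When I went home for Spring Festival this year, I was still on a bus hurrying home on the morning of New Year’s Eve when I ran into a primary school classmate. He works in Wenzhou with his whole family in tow, and he told me that train tickets were nearly impossible to buy; he had transferred at several stations and tried everything to make it back. Tired as he was, his face glowed with the joy of the coming reunion.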

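We no longer had much to talk about. He flattered me that studying was the only way to get ahead, and I complimented his adorable child. When my stop finally came first, I thought it over and gave his child 100 yuan as New Year money. He refused several times but finally accepted, then rummaged through his bags and luggage to give me some local specialties he had brought back from Wenzhou. As I watched him scramble, the phone in my pocket began buzzing nonstop. It was my university classmates’ WeChat group, where they were handing out WeChat red envelopes, posting vacation photos from the Maldives, and eagerly discussing their year-end bonuses...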

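Firecrackers crackled on and on, and families who had eaten lunch early were already heading out to visit their ancestors’ graves. Carrying a bag of Wenzhou specialties, I hurried home. My family was waiting there eagerly with a hot, lavish meal, yet I knew it was a hometown I could never truly return to. And Beijing, a thousand miles away, where I live year-round: for the foreseeable future, at heart I will remain only a passerby in that city.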

Monday, February 24, 2014

How to Get a Job at Google

http://www.nytimes.com/2014/02/23/opinion/sunday/friedman-how-to-get-a-job-at-google.html?_r=0




MOUNTAIN VIEW, Calif. — LAST June, in an interview with Adam Bryant of The Times, Laszlo Bock, the senior vice president of people operations for Google — i.e., the guy in charge of hiring for one of the world’s most successful companies — noted that Google had determined that “G.P.A.’s are worthless as a criteria for hiring, and test scores are worthless. ... We found that they don’t predict anything.” He also noted that the “proportion of people without any college education at Google has increased over time” — now as high as 14 percent on some teams. At a time when many people are asking, “How’s my kid gonna get a job?” I thought it would be useful to visit Google and hear how Bock would answer.
Don’t get him wrong, Bock begins, “Good grades certainly don’t hurt.” Many jobs at Google require math, computing and coding skills, so if your good grades truly reflect skills in those areas that you can apply, it would be an advantage. But Google has its eyes on much more.
“There are five hiring attributes we have across the company,” explained Bock. “If it’s a technical role, we assess your coding ability, and half the roles in the company are technical roles. For every job, though, the No. 1 thing we look for is general cognitive ability, and it’s not I.Q. It’s learning ability. It’s the ability to process on the fly. It’s the ability to pull together disparate bits of information. We assess that using structured behavioral interviews that we validate to make sure they’re predictive.”
The second, he added, “is leadership — in particular emergent leadership as opposed to traditional leadership. Traditional leadership is, were you president of the chess club? Were you vice president of sales? How quickly did you get there? We don’t care. What we care about is, when faced with a problem and you’re a member of a team, do you, at the appropriate time, step in and lead. And just as critically, do you step back and stop leading, do you let someone else? Because what’s critical to be an effective leader in this environment is you have to be willing to relinquish power.”
What else? Humility and ownership. “It’s feeling the sense of responsibility, the sense of ownership, to step in,” he said, to try to solve any problem — and the humility to step back and embrace the better ideas of others. “Your end goal,” explained Bock, “is what can we do together to problem-solve. I’ve contributed my piece, and then I step back.”
And it is not just humility in creating space for others to contribute, says Bock, it’s “intellectual humility. Without humility, you are unable to learn.” It is why research shows that many graduates from hotshot business schools plateau. “Successful bright people rarely experience failure, and so they don’t learn how to learn from that failure,” said Bock.
“They, instead, commit the fundamental attribution error, which is if something good happens, it’s because I’m a genius. If something bad happens, it’s because someone’s an idiot or I didn’t get the resources or the market moved. ... What we’ve seen is that the people who are the most successful here, who we want to hire, will have a fierce position. They’ll argue like hell. They’ll be zealots about their point of view. But then you say, ‘here’s a new fact,’ and they’ll go, ‘Oh, well, that changes things; you’re right.’ ” You need a big ego and small ego in the same person at the same time.

The least important attribute they look for is “expertise.” Said Bock: “If you take somebody who has high cognitive ability, is innately curious, willing to learn and has emergent leadership skills, and you hire them as an H.R. person or finance person, and they have no content knowledge, and you compare them with someone who’s been doing just one thing and is a world expert, the expert will go: ‘I’ve seen this 100 times before; here’s what you do.’ ” Most of the time the nonexpert will come up with the same answer, added Bock, “because most of the time it’s not that hard.” Sure, once in a while they will mess it up, he said, but once in a while they’ll also come up with an answer that is totally new. And there is huge value in that.
To sum up Bock’s approach to hiring: Talent can come in so many different forms and be built in so many nontraditional ways today, hiring officers have to be alive to every one — besides brand-name colleges. Because “when you look at people who don’t go to school and make their way in the world, those are exceptional human beings. And we should do everything we can to find those people.” Too many colleges, he added, “don’t deliver on what they promise. You generate a ton of debt, you don’t learn the most useful things for your life. It’s [just] an extended adolescence.”
Google attracts so much talent it can afford to look beyond traditional metrics, like G.P.A. For most young people, though, going to college and doing well is still the best way to master the tools needed for many careers. But Bock is saying something important to them, too: Beware. Your degree is not a proxy for your ability to do any job. The world only cares about — and pays off on — what you can do with what you know (and it doesn’t care how you learned it). And in an age when innovation is increasingly a group endeavor, it also cares about a lot of soft skills — leadership, humility, collaboration, adaptability and loving to learn and re-learn. This will be true no matter where you go to work.

Tuesday, February 4, 2014

Sold My Tokina 12-24 DX

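Today I sold my Tokina 12-24 DX at a fairly low price. I’m a little reluctant to let it go; you grow attached to every lens you have used.
But the buyer is also a photography enthusiast, so it has found a good home. Posting this to commemorate it.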

Tuesday, December 31, 2013

My last photo in 2013


Chen Xiaoer: I Feel a Little Sorry for Zhang Yimou

December 31, 2013, 08:14, Youth Times (青年时报)

Chen Xiaoer
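The uproar over Zhang Yimou having more children than the birth quota allowed is probably bigger than he himself ever expected. In fact, the public and the media pressed Zhang Yimou to apologize because the family planning commission cannot treat everyone even-handedly. Why should ordinary people who have one extra child be fined by the family planning authorities until they have to sell everything they own, and why should some women seven or eight months pregnant be forced to undergo induced abortions, while Zhang Yimou has extra children and gets off scot-free without being held accountable at all?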
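Yet after watching Zhang Yimou’s apology video, I suspect many people could not feel glad either. We did not get the “victory of the common people” we might have expected; instead, I find myself feeling a little sorry for him. Why did Zhang Yimou apologize? Just because he had a few extra kids? The right to have children is the most basic of human rights; isn’t it absurd for someone to apologize for becoming a father?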
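Moreover, for all these years Zhang Yimou’s family has lived like the “Excess-Birth Guerrillas,” hiding here and there. While his three children were in school, their teachers never once saw their father; the father’s real name had to be kept secret; when going out with the children, he had to stay at least 200 meters away from them... How much must this affect the children? And how many families like this are there across the country?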
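When the family planning policy was introduced, the family planning authorities promised: “Practice family planning well, and the state will provide for your old age.” But today, do one-child families actually receive the old-age support they need from the government? How much of the social maintenance fees collected each year goes toward elderly care for couples who are both only children? And how many families who have lost their only child can actually obtain government compensation?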
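One of my classmates, a graduate of a top university, was also an only child. He died in an accident the year he was admitted to graduate school, and his parents still live in immense grief. Every time I see the look in their eyes, it pains me to the core. When have the family planning authorities’ maintenance fees ever looked after families like this, who have lost their only child?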
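On one hand, the family planning authorities collect huge fines for excess births; on the other, they refuse to spend a penny on the families who made sacrifices, while sitting on enormous social maintenance fees. Against this backdrop, one can imagine how hard it will be to break down the barriers of the family planning system. What is the true state of our country’s population? Should childbearing be fully liberalized? These questions need to be discussed as soon as possible, and we must no longer take the family planning authorities’ one-sided word for it.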
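I watched Zhang Yimou’s apology video carefully, and his situation deserves sympathy. Decades from now, looking back, the spectacle of an entire nation forcing a world-renowned director to apologize in the national mainstream media for having extra children may well seem absurd. I hope the Zhang Yimou affair will lead us to treat questions of childbearing more seriously and more humanely.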