10/19/2015 – The Black Box Society chapters 1-3

This week we read the first three chapters of Frank Pasquale’s The Black Box Society: The Secret Algorithms That Control Money and Information.

Students should come to class with a preliminary idea of the book they would like to present during the last two sessions. Think also about back-up ideas in case of a conflict over a book title. Suggested books for student presentations are listed by topic on the course website.

Once more:

UPDATE: The word limit for comments has been increased to 300 words.



Filed under Class sessions

15 responses to “10/19/2015 – The Black Box Society chapters 1-3”

  1. acamperi

    As I was reading the second chapter of Pasquale’s book, about data mining and how companies tailor ads to users based on their searches, I was reminded of a game a friend and I would play in high school. We were aware of this data collection and analysis, so we would come up with the most ridiculous Google searches and see who could get the strangest ads based on them. We always had good fun with this, and would just click random links when we were bored in class to see who could find the weirdest webpage, but we never worried about potential consequences, except for the professor finding us out and prohibiting laptop use in class (which in hindsight would not have been the worst thing).

    However, as I read about the people who were put on watch lists (21) or targeted for certain medications (30) solely on the basis of their internet history, I started wondering whether these forays into the stranger corners of the internet were perhaps not the wisest idea. Pasquale mentions ways of protecting yourself from the companies that snoop on searches, such as anonymous proxies, but I was definitely not well versed enough in technology at the time (and arguably still am not) to take such precautions, so my searches were definitely associated with my name and IP address.
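
    For the curious, the proxy precaution Pasquale mentions is not technically hard to apply today. Below is a minimal Python sketch under stated assumptions: the proxy address is a documentation-only placeholder (a real setup would point at an actual anonymizing proxy or Tor), and the search URL is just an example. The point is simply that the destination site sees the proxy’s IP address instead of yours.

        import requests  # third-party HTTP library: pip install requests

        # Placeholder proxy address; 203.0.113.0/24 is reserved for
        # documentation and will not route real traffic.
        proxies = {
            "http": "http://203.0.113.7:8080",
            "https": "http://203.0.113.7:8080",
        }

        # The destination server logs the proxy's IP, not ours.
        response = requests.get(
            "https://duckduckgo.com/html/",
            params={"q": "a silly search best kept off my record"},
            proxies=proxies,
            timeout=10,
        )
        print(response.status_code)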

    I just hope that I did not jeopardize my future with a few dumb searches I made when I was 15 years old.

  2. pd92

    Central to Frank Pasquale’s narrative is that “Google and Facebook were once in the right place at the right time” [p87], and it is because of their privileged position that “outside innovation is dead in the water” [p82]. While I agree (though with caveats) with Pasquale’s point that a Google-like search engine “brewing in somebody’s garage” [p82] is unlikely, Pasquale overlooks numerous points, of which I shall describe two.

    First, shifts in business model. Time and again, history has shown that being in the right place at the right time is not enough. Thanks to technological innovation (such as the shift to mobile) and societal changes (less research, more surfing), Search as a product is becoming less and less relevant to the everyday web consumer. One could argue that Google’s other web properties would make up for the slump in revenue, but most of Google’s other products are far easier to compete with. (They also collectively bring in an order of magnitude less cash).

    Second, innovation. Despite Pasquale’s pronouncements to the contrary, there is a tremendous amount of innovation threatening to ‘disrupt’ the original disrupters. As any graduating Stanford CS student will tell you, the market for talent is red hot, and that talent perceives startups as ‘sexier’ than incumbent giants. Globally, even search organizations (Yandex in Russia, Baidu in China) have proven that it is very much possible to build excellent, profitable, market-leading search engines that are not Google. That “Silicon Valley is no longer a wide-open realm of opportunity” [p81] is plainly false.

  3. The question that immediately arises from this text is, in essence: if the technology and finance industries have become such metaphorical black boxes, concealing and manipulating so much information, how do we even begin to tackle an issue so widespread and so far out of our control? This question is difficult and convoluted to answer precisely because of the ‘black box’ nature of these companies’ operations: the fact that the companies and industries are black boxes in and of themselves “defeats any definitive resolution of the issue” (40). Pasquale complicates the question further by demonstrating that Google, for example, holds such a monopoly that its power perpetuates the black box problem in “what is for them a wonderfully virtuous cycle” (87). They are at the point where “their dominance is so complete, and their technology so complex, that they have escaped pressures for transparency and accountability that kept traditional media answerable to the public” (61).

    The answer cannot be to dislodge these companies, because the power of Google and the other large IT companies is too deeply ingrained in Internet society, and since “alternatives are demonstrably worse, and likely to remain so as long as the dominant firms’ self-reinforcing data advantage grows” (83), the solution must take a different form. If society’s awareness of these issues can be significantly increased, we will be on a path toward answering the overarching question: “If enough readers are shaken from their complacency, they start to make the changes that can prevent the prophecy” (17). This is a more feasible solution, but it will still be a challenge given the side effects of things like search personalization, which leads to “increased insularity and reinforced prejudice” (79) on the users’ side.

  4. In the initial pages of the second chapter of The Black Box Society, Pasquale makes a fascinating argument about how credit scores were the first black box, “making critical judgements about people, but hiding their methods of data collection and analysis” (22). Pasquale paints a creepy picture of the interaction (or lack thereof) between bureaus and consumers, highlighting blatant obfuscation, mysterious scoring algorithms hidden by patent law, and a sheer lack of objectivity and reliability that unknowing consumers have, inevitably, come to accept. The argument he makes in this section was – at least in my view – incredibly well delivered and substantiated with plentiful evidence. However, I do believe he missed an opportunity to make his claims even more poignant.

    About a year ago I read an article suggesting a growing trend in the credit industry toward incorporating a consumer’s social media usage when determining creditworthiness. Pasquale’s argument about the credit industry’s black box mostly touches on what credit bureaus have done; as such, he fails to bring in this new dimension, which I believe would make his argument even more persuasive. The credit bureaus’ black box is getting even bigger with the incorporation of consumers’ social media activity. Now, in addition to not knowing how your financial behavior affects your score, you will have to wonder how your social graph on Facebook, or whom you follow (or don’t) on Twitter, will affect it. Incorporating this new feature only makes the black box larger and, following Pasquale’s logic, even scarier. He might touch on this later in the book, but for now I think it was a missed opportunity to tie together two industries that have historically been black boxes and discuss this new development of our times.

    http://money.cnn.com/2013/08/26/technology/social/facebook-credit-score/
    http://www.wsj.com/articles/SB10001424052702304773104579266423512930050

  5. Jamison Elizabeth Searles

    I will call into question Pasquale’s insinuation in Chapter 2 that Google is racist.

    He references a study by Latanya Sweeney that compared the ads generated by searching names associated with African Americans and names associated with whites. She found that ads relating to arrest were more likely to appear for searches of typically African American names.

    Pasquale writes that “it would be easier to give tech companies the benefit of the doubt if Silicon Valley’s own diversity record weren’t so dismal” (39). He cites that 2 percent of its U.S. employees were African American, compared to 12 percent of the U.S. workforce. However, it should be noted that only about 4.5 percent of computer science or computer engineering degree recipients from prestigious universities are African American, although this gap is still unsettling (1).

    With regard to Latanya Sweeney’s findings, “several explanations,” as Pasquale notes, could account for Google searches’ apparent discrimination. Moreover, The Huffington Post was unable to replicate Sweeney’s results when using Chrome’s incognito mode, though she maintains her tests were performed with continuous clearing of the browser’s cache and cookies (2). This might suggest that user-specific signals such as previous internet activity and location are responsible for the discrepancies in ad targeting.

    Perhaps, rather than acting as a source of racism, Google’s hiring record and search results reflect injustices that already exist in our society, such as unequal access to education and disparities in incarceration statistics between different ethnicities (3).

    1) http://www.usatoday.com/story/tech/2014/10/12/silicon-valley-diversity-tech-hiring-computer-science-graduates-african-american-hispanic/14684211/
    2) http://www.huffingtonpost.com/2013/02/05/online-racial-profiling_n_2622556.html
    3) http://www.naacp.org/pages/criminal-justice-fact-sheet

  6. ccibils

    Pasquale’s message is clear: it is not the algorithms that are bad, it is their secret nature. It resounds as a message about ethics and, at its core, about the need for trust. In chapters two and three, he paints a picture of runaway data and of how both the private and public sectors have been getting their hands on it and computing shady things. It is undoubtedly scary that our digital footsteps can be traced back to us, sometimes accurately and fairly, and sometimes quite the opposite. Of course, institutions such as the government and Google stand by their behavior, claiming that the benefits far outweigh the risks and costs.

    What Pasquale manages to do extremely well is to show, quite clearly, that even though all of these institutions may be acting in good faith, there are very strong incentives for them not to. His one-way-mirror metaphor is apt, and his emphasis on morally questioning the “black box” nature of the algorithms that are ever more present in our lives is central to the book.

    So the issue becomes: how can we trust an institution that has incentives to cheat and, moreover, controls its own system? What mechanisms can we create to overcome this? This is not to deny the value added by Google or Facebook or Twitter or finance or government; it is simply a call to question the scaffolding that algorithmic society is relying on. Information asymmetry is always a disadvantage for the consumer.

    However, there are initiatives out there that seem promising. Bitcoin, and other decentralized protocols, offer a mathematical, game-theoretic solution. Everyone knows the system; there is no black box. What results is an equilibrium of behaviors that puts users on the same informational page as producers and, moreover, fades the formerly distinct line between them. Just as there are technological monoliths with the power to shape society, there are also open source communities that fight for users’ rights.
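
    To make the “no black box” point concrete, here is a toy sketch of Bitcoin’s proof-of-work check (simplified, with made-up data; real consensus code parses an actual 80-byte header and derives the target from its “bits” field). The rule is public, and anyone can run the check without consulting a trusted party:

        import hashlib

        def block_hash(header: bytes) -> int:
            # Bitcoin hashes the 80-byte block header twice with SHA-256
            # and reads the digest as a little-endian integer.
            digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
            return int.from_bytes(digest, "little")

        def is_valid_proof_of_work(header: bytes, target: int) -> bool:
            # A block embodies valid work if its hash falls below the
            # agreed-upon target; no trusted referee is involved.
            return block_hash(header) < target

        # Toy demonstration: a fake header and an artificially easy target.
        print(is_valid_proof_of_work(b"\x00" * 80, 2 ** 240))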

    From a personal standpoint, if a protocol or algorithm does not survive scrutiny of a bilateral nature, it should not be fit to serve as a social governor.

  7. Within Chapter 2, Frank Pasquale laments the rise in usage of automated personality tests as barometers for employment, primarily in the context of retail jobs in giant corporations such as Best Buy and Target (p. 36-37). He footnotes a 2009 Wall Street Journal article entitled, “Test for Dwindling Retail Jobs Spawns a Culture of Cheating” by Vanessa O’Connell (http://www.wsj.com/articles/SB123129220146959621).

    The article delves deeper into the effects of this personality test, highlighting the many candidates who are “red-lighted” as unfit for interviews solely on the basis of their responses to statements such as “Other people’s feelings are their own business” and “You have no big regrets about the past.” The test was created by a company called Unicru, and complaints about it are well documented (including numerous Facebook groups in which workers express frustration with the company, a sort of quasi-unionization against this personality-test secrecy in the employment process).

    There is another unfortunate effect of these personality tests: because they sort applicants through a standardized system, they create an environment that can be “gamed.” Answer keys highlighting which responses employers want to see can be found all over the internet (especially on blogs such as this one: http://melbel.hubpages.com/hub/Unicru). People who do not use these answer keys are left at the mercy of the tests, while applicants who might otherwise be very unqualified for these jobs can “cheat” and get green-lighted for interviews by the arbitrary, qualitative analysis the test provides.

    Companies have not admitted to any anxiety about this type of cheating, but several large corporations have discontinued their use of Unicru’s technology in recent months, according to the article.

  8. Betsy Alegria

    Can big data and algorithms be discriminatory? Who is responsible and how does it affect society?

    In Chapter 2, we read about Latanya Sweeney’s study on how names affected the ads served. At the start of the chapter, Pasquale makes an important claim when he says that “algorithms are not immune from the fundamental problems of discrimination… they are programmed by human beings whose values are embedded into their software” (38). Pasquale also brings up how Google’s workforce, the people behind the algorithms, is not ethnically/racially diverse. I don’t think Pasquale is calling such companies racist, but rather bringing to light how a lack of diversity can in fact affect algorithms. What would some algorithms look like, and how would data be organized, if most of the people behind the computers were not heterosexual cisgender white males? It is important to distinguish the two ways algorithms can be discriminatory: (1) an algorithm is designed with the biases of the person behind it; (2) an algorithm is designed to learn from us, the users, who teach it the racial/discriminatory tendencies it produces (a toy illustration of this second mechanism follows below).
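
    The second mechanism needs no biased programmer at all. Here is a toy sketch (all ad names and numbers are hypothetical) of a click-trained ad ranker: if users click an “arrest record” ad slightly more often than a neutral one, a greedy learner tends to lock onto it and serve it almost exclusively, so a small behavioral bias hardens into near-total targeting.

        import random

        # Hypothetical true click probabilities: users click the
        # "arrest record" ad slightly more often than the neutral one.
        true_click_prob = {"arrest_record": 0.12, "neutral": 0.10}

        estimated_ctr = {ad: 0.5 for ad in true_click_prob}  # optimistic guesses
        shown = {ad: 1 for ad in true_click_prob}
        clicked = {ad: 0 for ad in true_click_prob}

        for _ in range(100_000):
            # Greedy policy: always serve the ad with the highest
            # estimated click-through rate.
            ad = max(estimated_ctr, key=estimated_ctr.get)
            shown[ad] += 1
            if random.random() < true_click_prob[ad]:
                clicked[ad] += 1
            estimated_ctr[ad] = clicked[ad] / shown[ad]

        # Typically one ad (usually the slightly-more-clicked one) ends up
        # shown almost exclusively; the bias was learned, never written down.
        print(shown)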

    If the algorithms are built on, or learn, racial biases and other discriminatory tendencies, what is all this data doing other than perpetuating the already discriminatory society we live in? In Chapter 3, Pasquale states that the all-powerful search capability has the power “to give each of us a perfect little world of our own, a world tailored so exquisitely to our individual interests and preferences that it is different from the world as seen by anyone else” (59). That same capability also has the power to give us a little world of our own in which we are racially profiled on the internet.

  9. In Chapter 3, Pasquale references a blog post by Danny Sullivan titled “Google’s Broken Promises & Who’s Running The Search Engine?”, which mentions the “fuzzy management” structure of Google (70-71). Pasquale doesn’t talk much about this management structure but I think it’s rather important to his main arguments about the opacity of algorithms and methodologies used by Google, Apple, Microsoft, etc. This is because he tends to refer to these corporations as intentional, semi-secretive monoliths (this is how I’ve interpreted the book so far), even though they may be much less centralized than one might imagine.

    This isn’t to say that these corporations are not centralized. But the opacity covering their algorithms also surrounds their corporate structures, chains of responsibility, and individual scopes of corporate knowledge, and that should affect how we choose to act upon the possible moral slip-ups that Pasquale discusses. For example, if Apple’s corporate structure meant that the apps mentioned earlier in the chapter (Eucalyptus, Drones+, and In a Permanent Save State) were never reviewed by higher-level management, does this necessarily mean that a more centralized structure is needed to prevent this sort of slip-up in the future? I think that would be a legitimate argument. I also think one could argue that decentralized structures have greater potential to be moral, though that’s an off-topic debate.

    Having worked as a software developer, I find that the possibility of fuzzy management also makes me think about these corporations’ developers’ relationship to their products. I wouldn’t be surprised if one Google search is too complex for a single developer (or manager) to fully understand. Thus one developer may not even be able to comprehend the moral implications of their work if what they build affects only a small part of the process.

  10. Jack Cook

    In all, I found this a very clear account of the problem with enormous data-driven monopolies and oligopolies: it’s not that we know they are behaving in suspect and anti-competitive ways, it’s that we don’t know enough about their operations to tell whether we the users are being harmed. In the most recent round of EU antitrust litigation against Google, the complaints made against the company are almost identical to the concerns that Pasquale raises (http://www.nytimes.com/2015/08/28/technology/google-eu-competition.html).

    Even reading the NYTimes brief I linked is not enough to circumvent the central problem Pasquale raises in his book: without access to the “black box” algorithms that Google uses to run its business, we can only speculate about what wrongdoing may actually be occurring.

    I get the feeling that Pasquale will get deeper into what he thinks is the solution for tech oligopolists like Google, but I think the answer comes down to how we view the Internet and the structure that Google has provided. If, as the trend among modern economists seems to be, the Internet is and should be a municipal good that everybody is entitled to, then perhaps Google and similar companies should operate as government-owned monopolies. While breaking Google up has long been touted by various pro-privacy luddites (e.g. http://www.economist.com/news/leaders/21635000-european-moves-against-google-are-about-protecting-companies-not-consumers-should-digital/), it seems we are at a point where the infrastructure Google provides is so valuable to the average Internet user, and so vital to the Silicon Valley ecosystem as a whole, that municipalizing it is the only option that makes sense. I look forward to the in-class debate on it.

  11. I think it’s interesting to consider the ways in which the “black boxes” that Pasquale describes could cause instances of Greenfield’s Mind Change, particularly when it comes to intense personalization (60). A notable feature of black boxes is that they hide their values within them (8), meaning they likely promote and demote certain types of content based either on what they believe we like or on what the people who designed them like. Pasquale focuses on the ethical consequences of this, but in light of our last reading I wonder if there might be long-term personal consequences as well.

    As an example, if Google believes someone is more interested in critical articles, it may begin to show more of them in that person’s search results. This may gradually condition the reader to suspect the world is a more negative place than it really is, and could affect his or her behavior as he or she attempts to fit in. More pertinently, I wonder whether this reader could then associate the dopamine reactions that Greenfield suggests accompany internet browsing with the strongly worded content of these articles, and could become a more critical and attacking person entirely on the basis of the search results he or she has been served. He or she would not realize that this change was happening but would have to live with its effects. This possibility could add strength to Pasquale’s calls for attention, as it suggests these software black boxes may actually affect the circuits in our brains in a negative manner.

  12. sgussman

    Thus far in The Black Box Society, Frank Pasquale has painted a dystopian reality built on vast amounts of data and secret black-box algorithms. In this world, we find our destinies decided by secret flags and convoluted algorithms. Pasquale’s discussion of medical data was especially compelling. As he points out, privacy legislation (like HIPAA) directly binds only our healthcare providers and insurers. Third parties we have direct contact with, like pharmacies, are still able to collect and sell our medical information (like what prescriptions we take). Other third parties, like search engines and retailers, can also sell pertinent information to data brokers who compile detailed descriptions and predictions about us and our medical history. These data brokers present their predictions as opinions to circumvent libel litigation, but the bottom line is that their reports are used as if they were fact. This can have damning consequences, from raising your health insurance rate to making you a target for predatory advertisers.

    Pasquale’s picture is dark, and he seems to be calling for even greater restrictions on the collection of medical data. What Pasquale fails to address, however, is the massive impediment privacy legislation has created for the effective statistical study of medicine. Larry Page, whatever Pasquale’s views of Google, estimated in 2014 that 100,000 lives could be saved annually through large-scale mining of medical data. Page’s bold claim points to the value of sweeping macro-level medical studies, which are effectively impossible to conduct because of HIPAA. Such studies could have broad impacts, from better cancer treatments to fewer medical side effects. HIPAA prevents mining the very medical data that would be most beneficial to both individuals and society, while enabling third parties to sell flawed and incomplete data that is regularly used with questionable intentions. HIPAA was intended to help consumers with pre-existing medical conditions, but that rationale disappeared when the ACA was passed. It now serves mainly as a symbol of privacy and information control while binding the hands of researchers who would use our data for unobjectionable good.

    Pasquale highlights that legislation does little to keep our medical information private in today’s data-rich world, but he fails to recognize that it is simultaneously costing lives and hamstringing researchers by preventing access to our most valuable data.

  13. Pasquale does a fine job of anticipating counter-arguments and convincing us why we should be worried about how little we know of the information companies gather. Pasquale later derides companies like Google for stifling innovation and accountability by not revealing how their search algorithms work, not anonymizing and opening up data, and not taking sufficient action when bad things occur (65, 83).

    He presents these companies in a sinister light, intimating the possibility of darker consequences, such as perpetuating people’s own beliefs and stereotypes (79), but nothing as extreme as examples from earlier chapters.

    Yet because “don’t be evil” clearly ruled at Google while I worked there, Pasquale’s examples have not yet convinced me. Companies like Google are not evil, but self-interested: hackers could de-anonymize opened data, creating huge legal liabilities, and Google has taken action in the past, especially on discriminatory acts (73) and revenge porn. Alphabet-formerly-Google also innovates in many different sectors (self-driving cars, contact lenses for diabetics, free internet for the world) and can afford to do so because of Google’s advertising profits.

    I agree that there is a problem. But what concerns me is that even after reading these three chapters, I’m not particularly worried about companies like Google. I’m more worried about my credit score.

    Perhaps this is also because I’m part of a generation that cares less and less about privacy. Pasquale may address this later, but given all this, how do we get the average person to be worried? What can the average person do in the short term, if adblock/TrackMeNot fails? Given a limited driving public force, how can we even impose this regulation, especially with government and business so intertwined (57)? And given the ultimate opacity of the overwhelming amount of data collected by all these companies, how do we deal with it?

  14. Before railing against the tech industry for pursuing profit and secrecy over the public good and transparency, Pasquale concedes that “[data]-intensive advertising helps generate over $150 billion a year in economic activity” (19). The figure comes from a 2013 AdWeek article, and it deserves closer scrutiny. The report AdWeek cites was prepared by the Direct Marketing Association, a trade group of data-driven marketers lobbying against regulatory measures on digital advertising. In the article, we learn that “the data-driven marketing economy added $156 billion in revenue to the U.S. economy and fueled more than 675,000 jobs.” What they mean is that advertising revenues increased by $156 billion, and that this revenue is simply appended to the rest of the economy. As lovely as this sounds, it makes no sense.

    The principal aim of advertisers is to divert consumers’ wallets toward the products they’re advertising. They do this by convincing consumers that their product will make them happier than some other product would. Advertisers’ impact on the economy is to influence consumer behavior; if they make $156 billion, that means they have had at least a $156 billion influence on consumer behavior, not that they have “generated” $156 billion for the economy. The economic infrastructure of the internet is based on our consuming free services while agreeing to let companies influence our behavior as consumers. This adds an important dimension to Pasquale’s work, because the source of the tech world’s secrecy and disregard of the public good lies in the model of advertising on the internet. Pasquale writes, “surely [internet companies’] interests must conflict with ours sometimes–and then what?” (61) This is a big question, and we ought to think from the ground up about how to make the internet work for everyone.

  15. aselvan2012

    After reading chapter 3 of The Black Box Society, titled “The Hidden Logics of Search,” I have come to the position that perhaps the Apple App Store and Google Search are markets that need to be regulated. After all, it is impossible to launch on the App Store and make any money if Apple decides it doesn’t want you there, as was the case with Drones+. Even more concerning is how Google has become the fundamental tool for accessing the vast information of the internet: if Google doesn’t want you to be found, as in the case of Foundem, it is likely you will never be found.

    Surely it would make sense for regulators to recognize that the likes of Google Search and the App Store are now marketplaces that need to be strongly regulated, in a similar manner to our financial markets, and to take control of them away from Google and Apple respectively. Whilst this does seem like theft of property, one cannot deny that in the information age there must be an open and unbiased way of accessing this information. I would be interested in discussing in class what people think of an idea like this. Of course, there would be pitfalls, e.g. who gets to control the new Google Search? Would it be right for the US government to control it even though it is used around the world?

    After all, wouldn’t you find it sketchy if Goldman Sachs also owned the NYSE?
