
Artificial Intelligence Cloud History Software Engineering

Artificial General Intelligence (AGI)

Return to Timeline of the History of Computers


Artificial General Intelligence (AGI)

“The definition and metric that determines whether computers have achieved human intelligence is controversial among the AI community. Gone is the reliance on the Turing test — programs can pass the test today, and they are clearly not intelligent.

So how can we determine the presence of true intelligence? Some measure it against the ability to perform complex intellectual tasks, such as carrying out surgery or writing a best-selling novel. These tasks require an extraordinary command of natural language and, in some cases, manual dexterity. But none of these tasks require that computers be sentient or have sapience—the capacity to experience wisdom. Put another way, would human intelligence be met only if a computer could perform a task such as carrying out a conversation with a distraught individual and communicating warmth, empathy, and loving behavior—and then in turn receive feedback from the individual that stimulates those feelings within the computer as well? Is it necessary to experience emotions, rather than simulate the experience of emotions? There is no correct answer to this, nor is there a fixed definition of what constitutes “intelligence.”

The year chosen for this entry is based upon broad consensus among experts that, by 2050, many complex human tasks that do not require cognition and self-awareness in the traditional biochemical sense will have been achieved by AI. Artificial general intelligence (AGI) comes next. AGI is the term often ascribed to the state in which computers can reason and solve problems like humans do, adapting and reflecting upon decisions and potential decisions in navigating the world—kind of like how humans rely on common sense and intuition. “Narrow AI,” or “weak AI,” which we have today, is understood as computers meeting or exceeding human performance in speed, scale, and optimization in specific tasks, such as high-volume investing, traffic coordination, diagnosing disease, and playing chess, but without the cognition and emotional intelligence.

The year 2050 is based upon the expected realization of certain advances in hardware and software capacity necessary to perform computationally intense tasks as the measure of AGI. Limitations in progress thus far are also a result of limited knowledge about how the human brain functions, where thought comes from, and the role that the physical body and chemical feedback loops play in the output of what the human brain can do.”

SEE ALSO: The “Mechanical Turk” (1770), The Turing Test (1951)

Artificial general intelligence refers to the ability of computers to reason and solve problems like humans do, in a way that’s similar to how humans rely on common sense and intuition.

Fair Use Sources: B07C2NQSPV

Cloud Data Science - Big Data DevSecOps-Security-Privacy History Software Engineering

Data Breaches – 2014 AD

Return to Timeline of the History of Computers


Data Breaches

“In 2014, data breaches touched individuals on a scale not seen before, in terms of both the amount and the sensitivity of the data that was stolen. These hacks served as a wake-up call to the world about the reality of living a digitally dependent way of life—both for individuals and for corporate data masters.”

“Most news coverage of data breaches focused on losses suffered by corporations and government agencies in North America—not because these systems were especially vulnerable, but because laws required public disclosure. High-profile attacks affected millions of accounts at companies including Target (in late 2013), JPMorgan Chase, and eBay. Midway through the year,” it was revealed that the United States Office of Personnel Management (OPM) had been breached through outsourced contractors, in an attack widely attributed to actors linked to the Chinese government, and “that highly personal (and sensitive) information belonging to 18 million former, current, and prospective federal and military employees had been stolen. Meanwhile, information associated with at least half a billion user accounts at Yahoo! was being hacked, although this information wouldn’t come out until 2016.”

Organizations outside the US were not immune either. The European Central Bank, HSBC Turkey, and others were hit. These hacks represented millions of victims across a spectrum of industries, including banking, government, entertainment, retail, and health. While some of the industry and government datasets ended up online, available to the highest bidder in the criminal underground, many other datasets did not, fueling speculation and public discourse about why, and about what could be done with such data.

The 2014 breaches also expanded the public’s understanding about the value of certain types of hacked data beyond the traditional categories of credit card numbers, names, and addresses. The November 24, 2014, hack of Sony Pictures, for example, didn’t just temporarily shut down the film studio: the hackers also exposed personal email exchanges, harmed creative intellectual property, and rekindled threats against the studio’s freedom of expression, allegedly in retaliation for the studio’s decision to participate in the release of a Hollywood movie critical of a foreign government.

Perhaps most importantly, the 2014 breaches exposed the generally poor state of software security, best practices, and experts’ digital acumen across the world. The seams between the old world and that of a world with modern, networked technology were not as neatly stitched as many had assumed.”

SEE ALSO Morris Worm (1988), Cyber Weapons (2010)

Since 2014, high-profile data breaches have affected billions of people worldwide.

Fair Use Sources: B07C2NQSPV

Artificial Intelligence Data Science - Big Data History

Algorithm Influences Prison Sentence – 2013 AD

Return to Timeline of the History of Computers


Algorithm Influences Prison Sentence

“Eric Loomis was sentenced to six years in prison and five years’ extended supervision for charges associated with a drive-by shooting in La Crosse, Wisconsin. The judge rejected Loomis’s plea deal, citing, among other factors, the high score that Loomis had received from the computerized COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) risk-assessment system.

Loomis’s lawyers appealed his sentence on the grounds that his due process rights were violated, as he had no insight into how the algorithm derived his score. As it turns out, neither did the judge. And the creators of COMPAS — Northpointe Inc. — refused to provide that information, claiming that it was proprietary. The Wisconsin Supreme Court upheld the lower court’s ruling against Loomis, reasoning that the COMPAS score was just one of many factors the judge used to determine the sentence. In June 2017, the US Supreme Court declined to hear the case, after previously inviting the acting solicitor general of the United States to file an amicus brief.

Data-driven decision-making focused on predicting the likelihood of some future behavior is not new — just ask parents who pay for their teenager’s auto insurance or a person with poor credit who applies for a loan. What is relatively new, however, is the increasingly opaque reasoning that these models perform as a consequence of the increasing use of sophisticated statistical machine learning. Research has shown that hidden bias can be inadvertently (or intentionally) coded into an algorithm. Illegal bias can also result from the selection of data fed to the data model. An additional question in the Loomis case is whether gender was considered in the algorithm’s score, a factor that is unconstitutional at sentencing. A final complicating fact is that profit-driven companies are neither required nor motivated to reveal any of this information.

State v. Loomis helped raise public awareness about the use of “black box” algorithms in the criminal justice system. This, in turn, has helped to stimulate new research into development of “white box” algorithms that increase the transparency and understandability of criminal prediction models by a nontechnical person.”

SEE ALSO: DENDRAL (1965), The Shockwave Rider (1975)

Computer algorithms such as the COMPAS risk-assessment system can influence the sentencing of convicted defendants in criminal cases.

Fair Use Sources: B07C2NQSPV

Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine Bias.” ProPublica, May 23, 2016.

Eric L. Loomis v. State of Wisconsin, 2015AP157-CR (Supreme Court of Wisconsin, October 12, 2016).

Harvard Law Review. “State v. Loomis: Wisconsin Supreme Court Requires Warning Before Use of Algorithmic Risk Assessments in Sentencing.” Vol. 130 (March 10, 2017): 1530–37.

Liptak, Adam. “Sent to Prison by a Software Program’s Secret Algorithms.” New York Times online, May 1, 2017.

Pasquale, Frank. “Secret Algorithms Threaten the Rule of Law.” MIT Technology Review, June 1, 2017.

State of Wisconsin v. Eric L. Loomis 2015AP157-CR (Wisconsin Court of Appeals District IV, September 17, 2015).

Data Science - Big Data DevSecOps-Security-Privacy History

Differential Privacy – 2006 AD

Return to Timeline of the History of Computers


Differential Privacy

Cynthia Dwork (b. 1958), Frank McSherry (b. 1976), Kobbi Nissim (b. 1965), Adam Smith (b. 1977)

“Differential privacy was conceived in 2006 by Cynthia Dwork and Frank McSherry, both at Microsoft Research; Kobbi Nissim at Ben-Gurion University in Israel; and Adam Smith at Israel’s Weizmann Institute of Science to solve a common problem in the information age: how to use and publish statistics based on information about individuals, without infringing on those individuals’ privacy.

Differential privacy provides a mathematical framework for understanding the privacy loss that results from data publications. Starting with a mathematical definition of privacy—the first ever—it provides information custodians with a formula for determining the amount of privacy loss that might result to an individual as a consequence of a proposed data release. Building on that definition, the inventors created mechanisms that allow statistics about a dataset to be published while retaining some amount of privacy for those in the dataset. How much privacy is retained depends on the accuracy of the intended data release: differential privacy gives data holders a mathematical knob they can use to decide the balance between accuracy and privacy.

For example, using differential privacy, a hypothetical town could publish “privatized” statistics that were mathematically guaranteed to protect individual privacy, while still producing aggregate statistics that could be used for traffic planning.

In the years following the discovery, there were a number of high-profile incidents in which data and statistics were published that were supposedly aggregated or deidentified, but for which the data contributed by specific individuals could be disaggregated and reidentified. These cases, combined with undeniable mathematical proofs about the ease of recovering individual data from aggregate releases, sparked interest in differential privacy among businesses and governments. In 2017, the US Census Bureau announced that it would use differential privacy to publish the statistical results of the 2020 census of population and households.”
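The accuracy-versus-privacy knob described above can be illustrated with the Laplace mechanism, a standard building block of differential privacy. The sketch below is a simplified single-query example; the function name and dataset are invented for illustration:

```python
import math
import random

def dp_count(values, predicate, epsilon):
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so noise drawn from
    Laplace(scale = 1/epsilon) suffices. Smaller epsilon means more
    noise: more privacy, less accuracy.
    """
    true_count = sum(1 for v in values if predicate(v))
    # Inverse-CDF sampling of the Laplace distribution.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# The hypothetical town's "knob": epsilon = 0.1 yields a much noisier
# (more private) answer than epsilon = 10.
commute_times = [22, 35, 41, 18, 55, 30, 47]
private_estimate = dp_count(commute_times, lambda t: t > 30, epsilon=1.0)
```

A traffic planner would publish `private_estimate` rather than the exact count, with epsilon chosen to balance the planner's accuracy needs against residents' privacy loss.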

SEE ALSO Public Key Cryptography (1976), Zero-Knowledge Proofs (1985)

Differential privacy addresses how to maintain the privacy of individuals while using and publishing statistics based on their data.

Fair Use Sources: B07C2NQSPV

2006, Differential Privacy

Dwork, Cynthia, and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Breda, Netherlands: Now Publishers, 2014.

Data Science - Big Data History

Apache Hadoop Distributed File System (HDFS) with MapReduce Makes Big Data Possible – 2006 AD

Return to Timeline of the History of Computers


Hadoop Makes Big Data Possible

Doug Cutting (dates unavailable)

“Parallelism is the key to computing with massive data: break a problem into many small pieces and attack them all at the same time, each with a different computer. But until the early 2000s, most large-scale parallel systems were based on the scientific computing model: they were one-of-a-kind, high-performance clusters built with expensive, high-reliability components. Hard to program, these systems mostly ran custom software to solve problems such as simulating nuclear-weapon explosions.

Hadoop takes a different approach. Instead of specialty hardware, Hadoop lets corporations, schools, and even individual users build parallel processing systems from ordinary computers. Multiple copies of the data are distributed across multiple hard drives in different computers; if one drive or system fails, Hadoop replicates one of the other copies. Instead of moving large amounts of data over a network to super-fast CPUs, Hadoop moves a copy of the program to the data.

Hadoop got its start at the Internet Archive, where Doug Cutting was developing an internet search engine. A few years into the project, Cutting came across a pair of academic papers from Google, one describing the distributed file system that Google had created for storing data in its massive clusters, and the other describing Google’s MapReduce system for sending distributed programs to the data. Realizing that Google’s approach was better than his, he rewrote his code to match Google’s design.

In 2006, Cutting recognized that his implementation of these distributed storage and processing systems could be used for more than running a search engine, so he took 11,000 lines of code out of his system and made them a standalone system. He named it “Hadoop” after one of his son’s toys, a stuffed elephant.

Because the Hadoop code was open source, other companies and individuals could work on it as well. And with the “big data” boom, many needed what Hadoop offered. The code improved, and the systems’ capabilities expanded. By 2015, the open source Hadoop market was valued at $6 billion and estimated to grow to $20 billion by 2020.”
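The MapReduce model that Hadoop implements can be sketched in miniature with the classic word-count example. This single-process toy only illustrates the map, shuffle, and reduce phases; real Hadoop distributes them across many machines and moves the code to the data:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    # Each mapper emits (key, value) pairs from its share of the input.
    return [pair for record in records for pair in mapper(record)]

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework does between phases.
    pairs.sort(key=itemgetter(0))
    return {k: [v for _, v in group] for k, group in groupby(pairs, key=itemgetter(0))}

def reduce_phase(grouped, reducer):
    return {k: reducer(k, vs) for k, vs in grouped.items()}

# Classic word count:
def mapper(line):
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    return sum(counts)

lines = ["big data", "big clusters"]
result = reduce_phase(shuffle(map_phase(lines, mapper)), reducer)
# result == {"big": 2, "clusters": 1, "data": 1}
```

The same three-phase shape scales from this toy to petabyte datasets, which is what made the model so attractive during the big-data boom.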

SEE ALSO Connection Machine (1985), GNU Manifesto (1985)

Although the big-data program Hadoop is typically run on high-performance clusters, hobbyists have also run it, as a hack, on tiny underpowered machines like these Cubieboards.

Fair Use Sources: B07C2NQSPV

Dean, Jeffrey, and Sanjay Ghemawat. “MapReduce: Simplified Data Processing on Large Clusters.” In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI ’04): December 6–8, 2004, San Francisco, CA. Berkeley, CA: USENIX Association, 2004.

Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. “The Google File System.” In SOSP ‘03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, 29–43. Vol. 37, no. 5 of Operating Systems Review. New York: Association for Computing Machinery, October, 2003.

DevSecOps-Security-Privacy History

Facebook – 2004 AD

Return to Timeline of the History of Computers



Facebook

Mark Zuckerberg (b. 1984)

“Facebook, the 800-pound gorilla of the social networking world, is one of the most significant communications platforms of the modern era. While the site was not the first online service that enabled people to exchange information about themselves or publicly promote their interests, it was the service that took the phenomenon global. Facebook raised public awareness about what constituted “social networking” and brought into focus how a simple piece of software could enable the everyday person to have a voice disproportionate to his or her economic position, geographic location, or access to sources of community organization and influence—a voice bounded only by the strength of what he or she had to say.

Facebook also served as a widespread wake-up call to traditional media outlets that their business models were ripe for disruption, as the mass media’s audience flipped and went from being consumers of media to creators. Now suddenly people were their own storytellers, editors, publishers, neighborhood leaders, or global trailblazers with a platform for instantaneous projection of information around the world.

Facebook was founded in 2004 by Mark Zuckerberg and fellow Harvard students who developed a centralized website to connect students across the university. The early roots of the Facebook site are generally believed to have originated with Zuckerberg’s short-lived “FaceMash” site, which gamified the choice of who was more attractive among pairs of people. The launch of “TheFacebook,” as it was first called, demonstrated that there was an unfulfilled desire to connect with and learn about other people—at least among the Harvard student body. Along the way there were various legal challenges and allegations of idea theft, including a lawsuit by two brothers named Cameron and Tyler Winklevoss, who ended up with a settlement worth $65 million—chump change, compared to Facebook’s 2017 market value of more than $500 billion.

Facebook quickly expanded beyond Harvard and opened to other universities, eventually turning into a business that was inclusive of anyone who wanted to join. In March 2017, the site had 1.94 billion monthly active user accounts.”

SEE ALSO Blog Is Coined (1999), Social Media Enables the Arab Spring (2011)

Facebook CEO Mark Zuckerberg appears before a joint hearing of the Commerce and Judiciary Committees on Capitol Hill, on April 10, 2018, about the use of Facebook data in the 2016 US presidential election.

Fair Use Sources: B07C2NQSPV

GCP History

Google – 1998 AD

Return to Timeline of the History of Computers



Google

Larry Page (b. 1973), Sergey Brin (b. 1973)

“The seed for what would become Google started with Stanford graduate student Larry Page’s curiosity about the organization of pages on the World Wide Web. Web links famously point forward. Page wanted to be able to go in the other direction.

To go backward, Page built a web crawler to scan the internet and organize all the links, named BackRub for the backlinks it sought to map out. He also recognized that being able to qualify the importance of the links would be of great use as well. Sergey Brin, a fellow graduate student, joined Page on the project, and they soon developed an algorithm that would not only identify and count the links to a page but also rank their importance based on quality of the pages from where the links originated. Soon thereafter, they gave their tool a search interface and a ranking algorithm, which they called PageRank. The effort eventually evolved into a full-blown business in 1998, with revenue coming primarily from advertisers who bid to show advertisements on search result pages.

In the following years, Google acquired a multitude of companies, including a video-streaming service called YouTube, an online advertising giant called DoubleClick, and cell phone maker Motorola, growing into an entire ecosystem of offerings providing email, navigation, social networking, video chat, photo organization, and a hardware division with its own smartphone. Recent research has focused on deep learning and AI (DeepMind), gearing up for the tech industry’s next battle—not over speed, but intelligence.

Merriam-Webster’s Collegiate Dictionary and the Oxford English Dictionary both added the word Google as a verb in 2006, meaning to search for something online using the Google search engine. At Google’s request, the definitions refer explicitly to the use of the Google engine, rather than the generic use of the word to describe any internet search.

On October 2, 2015, Google created a parent company to function as an umbrella over all its various subsidiaries. Called Alphabet Inc., the American multinational conglomerate is headquartered in Mountain View, California, and has more than 70,000 employees worldwide.”
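The entry's core idea, ranking a page by the quality of the pages that link to it, can be sketched as a simplified power-iteration PageRank. This is an illustrative toy with an invented three-page web, not Google's production algorithm:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of outbound links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outbound in links.items():
            if not outbound:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Each page passes its rank, split evenly, to the pages it links to.
                for target in outbound:
                    new_rank[target] += damping * rank[page] / len(outbound)
        rank = new_rank
    return rank

web = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(web)
# "a" ends up ranked highest: it is linked from both "b" and "c".
```

Note how a link from a highly ranked page is worth more than one from an obscure page, which is exactly the "quality of the pages from where the links originated" signal described above.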

SEE ALSO: First Banner Ad (1994)

Google’s self-described mission is to “organize the world’s information and make it universally accessible and useful.”

Fair Use Sources: B07C2NQSPV

Battelle, John. “The Birth of Google.” Wired, August 1, 2005.

Brin, Sergey, and Lawrence Page. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” In Proceedings of the Seventh International Conference on World Wide Web 7. Brisbane, Australia: Elsevier, 1998, 107–17.

Data Science - Big Data History Software Engineering

SQL Relational Database Programming Language Developed at IBM – 1974 AD

Return to Timeline of the History of Computers

SQL is a relational database programming language developed at IBM in 1974 by Donald Chamberlin and Raymond Boyce, building on Edgar Codd's relational model; it remains one of the most important languages in the programming world.
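As a brief illustration of relational querying in SQL (shown here through Python's built-in sqlite3 module rather than the original IBM implementation; the table and data are invented for the example):

```python
import sqlite3

# In-memory database; a minimal sketch of declarative relational querying.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Ada", "Engineering"), ("Grace", "Engineering"), ("Edgar", "Research")],
)
# SQL states *what* result is wanted; the engine decides *how* to compute it.
rows = conn.execute(
    "SELECT dept, COUNT(*) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()
# rows == [("Engineering", 2), ("Research", 1)]
```

That declarative style, querying relations by their contents rather than navigating pointers, is the key idea inherited from Codd's relational model.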

Fair Use Sources:

See also: Relational Databases (1970 AD), Microsoft SQL Server (1989)

History Software Engineering

First Internet Spam Message – 1978 AD

Return to Timeline of the History of Computers


First Internet Spam Message

Gary Thuerk (dates unavailable), Laurence Canter (b. 1953), Martha Siegel (1948–2000)

On May 3, 1978, at 12:33 EDT, the world’s first mass-mailed electronic marketing message—what we now call spam—blasted out to more than a hundred email accounts on the US Department of Defense’s ARPANET. Sent by DEC employee Gary Thuerk, the email advertised an open house for DEC’s new line of computers, the DECSYSTEM-2020.

Unaware that the early mail program had the ability to read email addresses from an address file, Thuerk entered all the addresses into the email’s To: header; 120 fit in the header, and the remaining 273 overflowed into the message body, making the message both inappropriate and unattractive.

The reaction was overwhelmingly negative. The chief of the ARPANET Management Branch at the Defense Communications Agency responded with an email stating that the message “was a flagrant violation of the use of the ARPANET” and that the network was “to be used for official US Government business only.”

But Richard Stallman at the MIT Artificial Intelligence Laboratory, a free-speech advocate, wrote in response to the complaints: “I get tons of uninteresting mail, and system announcements about babies being born, etc. At least a demo MIGHT have been interesting.” Stallman did object, however, to the way the message was sent, writing, “Nobody should be allowed to send a message with a header that long, no matter what it is about.”

The reference to unsolicited email as spam did not come into vogue until the 1980s. It originates from a 1970 Monty Python sketch about a group of Vikings in a cafeteria who drown out the other conversations in the room by repetitively singing “SPAM, SPAM, SPAM,” referring to the canned-meat product manufactured by Hormel® Foods. The explosion in unsolicited bulk information has since spread to every digital medium. Although some of the messages remain “spam”—the same message being sent everywhere, to all recipients—increasingly messages are precisely customized to the individual recipients by algorithms that reference vast warehouses of personal data.

SEE ALSO First Electromagnetic Spam Message (1864), @Mail (1971)

The world’s first unsolicited email, or “spam,” was sent to ARPANET accounts on May 3, 1978. Today many email services are able to detect and isolate such messages into separate folders.

Fair Use Source: B07C2NQSPV

History Software Engineering

The Shockwave Rider SciFi Book – A Prelude of the 21st Century Big Tech Police State – 1975 AD

Return to Timeline of the History of Computers


The Shockwave Rider

John Brunner (1934–1995)

“For all of humanity’s technical achievements, equally important are the voices that reflect upon the social changes and new norms that emerging technologies might bring. These tales can be particularly insightful when they envision an entire society that has yet to exist. One of the most famous of these is British author John Brunner’s 1975 novel The Shockwave Rider. Influenced heavily by Alvin Toffler’s 1970 nonfiction bestseller Future Shock, which concerns the negative impact of accelerated change and information overload on people, The Shockwave Rider describes in salient details a world in which data privacy and information management are abused by those in power and computer technology dominates individuals’ everyday lives.

The story revolves around Nick Haflinger, a gifted computer hacker who uses his phone-hacking skills to escape from a secret government program that trains highly intelligent people in a dystopian 21st-century America. The government and elitist organizations maintain control of society through a hyperconnected data and information net that keeps the general population ignorant of the world around them. Prominent themes in the book include using technology to change identities, moral decisions associated with data privacy and surveillance, and the mobility of self when the value of personal space and individuality is deemphasized.

The Shockwave Rider is also notable for coining the term worm for a computer program that replicates itself and propagates through computer systems. In the book, Haflinger employs different types of “tapeworms” and “counterworms” to alter, corrupt, and liberate data in the net to his advantage.

The Shockwave Rider is generally credited with being an early influence on the emergence of the 1980s sci-fi cyberpunk genre, in which plots focus on unanticipated near-future dystopias, societal conflict, and warped applications of technology. Well ahead of its time, it shows how computer technology is not just a tool to extend human cognition and improve productivity, but also an instrument that can enable the worst extremes of human nature.”

SEE ALSO “As We May Think” (1945), Star Trek Premieres (1966), Mother of All Demos (1968)

Cover of the 1976 Ballantine Books edition of The Shockwave Rider, by John Brunner.

Fair Use Source: B07C2NQSPV

Artificial Intelligence Cloud Data Science - Big Data DevOps

AIOps (Artificial Intelligence for IT Operations)

AIOps (artificial intelligence for IT operations) – “AIOps is an umbrella term for the use of big data analytics, machine learning and other AI technologies to automate the identification and resolution of common IT issues.”
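As a toy illustration of the idea, the sketch below flags metric samples that deviate sharply from the mean; production AIOps tooling is, of course, far more sophisticated (streaming pipelines, learned baselines, automated remediation):

```python
import statistics

def flag_anomalies(metric_samples, threshold=2.0):
    """Flag samples more than `threshold` standard deviations from the mean.

    A crude stand-in for the statistical/ML anomaly detection that AIOps
    platforms apply to operational telemetry.
    """
    mean = statistics.mean(metric_samples)
    stdev = statistics.pstdev(metric_samples)
    if stdev == 0:
        return []
    return [x for x in metric_samples if abs(x - mean) / stdev > threshold]

latencies_ms = [12, 11, 13, 12, 11, 12, 250]  # one obvious spike
anomalies = flag_anomalies(latencies_ms)
# anomalies == [250]
```

In an AIOps pipeline, such a detector would feed an automated workflow that opens an incident or triggers remediation, rather than just returning a list.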

Fair Use Source: 809137