Automating Open Source Intelligence
eBook - ePub

Algorithms for OSINT

  1. 222 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
About this book

Algorithms for Automating Open Source Intelligence (OSINT) covers the gathering of information and the extraction of actionable intelligence from openly available sources, including news broadcasts, public repositories, and, more recently, social media. Because OSINT has applications in crime fighting, state-based intelligence, and social research, the book presents recent advances in text mining, web crawling, and other algorithms that make it possible to largely automate this process. The book is beneficial to both practitioners and academic researchers, with discussions of the latest advances in applications, a coherent set of methods and processes for automating OSINT, and interdisciplinary perspectives on the key problems identified within each discipline. Drawing upon years of practical experience and using numerous examples, editors Robert Layton, Paul Watters, and a distinguished list of contributors discuss Evidence Accumulation Strategies for OSINT, Named Entity Resolution in Social Media, Analyzing Social Media Campaigns for Group Size Estimation, Surveys and qualitative techniques in OSINT, and Geospatial reasoning of open data.
  • Presents a coherent set of methods and processes for automating OSINT
  • Focuses on algorithms and applications allowing the practitioner to get up and running quickly
  • Includes fully developed case studies on the digital underground and predicting crime through OSINT
  • Discusses the ethical considerations when using publicly available online data

Chapter 1

The Automating of Open Source Intelligence

Agate M. Ponder-Sutton Information Technology & Centre for Information Technology, School of Engineering and Advanced Technology, Massey University, New Zealand

Abstract

Open source intelligence (OSINT) is intelligence that is synthesized using publicly available data. We will discuss the current state of OSINT and data science. The changes in the analysts and users will be explored. We will cover data analysis, automated data gathering, APIs, and tools, as well as algorithms including supervised and unsupervised learning, geolocational methods, and de-anonymization. How do all these things interact within OSINT, including ethics and context? Now that open intelligence has become more open and playing fields are leveling, the need to ensure and encourage positive use is even stronger.

Keywords

privacy
ethics
automation
surveillance
machine learning
statistics
Open source intelligence (OSINT) is intelligence that is synthesized using publicly available data (Hobbs, Moran, & Salisbury, 2014). It differs significantly from the open source software movement. This kind of surveillance began with the newspaper clipping of the First and Second World Wars. Now it is ubiquitous within large businesses and governments and is a dedicated field of study. There have been impassioned, but simplified, arguments for and against the current levels of open source intelligence gathering. In the world after the Snowden leaks, one of the questions is how to walk the line between personal privacy and nation-state safety. What are the advances? How do we keep up, keep relevant, and keep it fair or at least ethical? Most importantly, how do we continue to “make sense or add value,” as Robert David Steele would say (http://tinyurl.com/EIN-UN-SDG)?
I will discuss the current state of OSINT and data science. The changes in the analysts and users will be explored. I will cover data analysis, automated data gathering, APIs, and tools, as well as algorithms including supervised and unsupervised learning, geo-locational methods, and de-anonymization. How do these elements interact within OSINT once ethics and context are included? How does OSINT answer the challenge laid down by Schneier (2015b) in his recent article elaborating all the ways in which big data have eaten away at the privacy and stability of private life? He writes: “Your cell phone provider tracks your location and knows who is with you. Your online and in-store purchasing patterns are recorded, and reveal if you are unemployed, sick, or pregnant. Your emails and texts expose your intimate and casual friends. Google knows what you are thinking because it saves your private searches. Facebook can determine your sexual orientation without you ever mentioning it.”
These effects can be seen in the worries about the recording and tracking that large companies use to follow their customers, which Schneier (2015a, 2015b) and others describe as crossing the uncanny valley from useful into disturbing. Examples include the recordings made by a Samsung TV of consumers in their homes (http://www.theguardian.com/media-network/2015/feb/13/samsungs-listening-tv-tech-rights); the cloud storage of recordings made by the interactive Wi-Fi-capable Barbie, which heightened privacy fears (http://www.theguardian.com/technology/2015/mar/13/smart-barbie-that-can-listen-to-your-kids-privacy-fears-mattel); the privacy-breaking app for Jay-Z’s album Magna Carta Holy Grail (http://www.theguardian.com/music/2013/jul/17/jay-z-magna-carta-app-under-investigation); and the Angry Birds location recording, which was targeted by the NSA and GCHQ and likely shared with other Five Eyes countries (http://www.theguardian.com/world/2014/jan/27/nsa-gchq-smartphone-app-angry-birds-personal-data). The Internet can be viewed as a tracking, listening moneymaker for the recorders and new owners of your data. Last but not least, there is the Target case, in which predictions of pregnancy were based on buying history.
The Target story was broken by the New York Times (Duhigg, C. “How Companies Learn Your Secrets.” February 16, 2012. http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?_r=0).
The rise of OSINT and of data science, whether for business or commercial use, has come with the revolution in the variety, volume, and availability of public data (Hobbs et al., 2014; Appel, 2014). There has been a profound change in how data are collected, stored, and disseminated, driven by the Internet and the advances linked to it. With the establishment of the Open Source Center and of an assistant deputy director for open source intelligence in the United States, the shift toward legitimacy of OSINT in the all-source intelligence process was made clear (http://resources.infosecinstitute.com/osint-open-source-intelligence/). The increased importance of OSINT has moved it into the core of intelligence work and allowed a larger number of players to take part, diversifying its uses beyond the original “intelligence community” (Hobbs et al., 2014). Interconnectivity has increased, and much of the resulting data can be used through open source intelligence methodologies to create actionable insights. OSINT can produce new and useful data and insights; however, it brings technical, political, and ethical challenges and obstacles that must be approached carefully.
Wading through the sheer bulk of the data to find the unbiased reality can be difficult. Automation spreads OSINT out of the government office to businesses and casual users, who may reach helpful or wrong conclusions, as in the Reddit media gaffe over the Boston bomber (http://www.bbc.com/news/technology-22263020). These problems can also be seen in the human flesh search engine instances in China and in the doxing by Anonymous and others, which has been viewed in both positive and negative lights. Each added level of abstraction increases the difficulty, as tools are built to look at the tools that look at the output of the data. The sheer volume of data also makes analysts more susceptible to cognitive bias. These issues can be seen in the errors made by the US government in securing its computer networks (“EPIC” fail – how OPM hackers tapped the mother lode of espionage data. Two separate “penetrations” exposed 14 million people’s personal information. Ars Technica. June 22, 2015. 2:30pm NZST. http://arstechnica.com/security/2015/06/epic-fail-how-opm-hackers-tapped-the-mother-lode-of-espionage-data/). With the corporate doxing of Ashley Madison and of Sony, it can be seen as a problem for private corporations as well.
Groups of users and uses include governments; business intelligence and commercial intelligence; academia; and Hacker Space and Open Data initiatives. Newer users include nongovernmental organizations (NGOs), universities, the public, and commercial interests. User-generated content, especially social media, has changed the information landscape significantly. These groups can interact and have integrated interests. Collaboration is common among some of them: the US government contracts IBM and Booz-Allen as well as less inflammatory contracted employees, and academia writes tools for business intelligence or government contracts. These collaborations tend to be mutually beneficial. In other cases the collaboration is nonvoluntary, such as the articles detailing how to break the anonymity of the Netflix Prize dataset (Narayanan & Shmatikov, 2008) or the multiple blog posts detailing similar anonymity-breaking methods, such as “FOILing NYC’s Taxi Trip Data” (http://chriswhong.com/open-data/foil_nyc_taxi/) and, for London bicycle data, “I know where you were last summer” (http://vartree.blogspot.co.nz/2014_04_01_archive.html). These have furthered security and OSINT analysis, sometimes to the ire of the data collectors.
The extent to which information can be collected is large and the field is broad. The speed, the volume, and variety are enough that OSINT can be considered a “Big Data” problem. Tools to deal with the tools that interface with the data such as Mal...

Table of contents

  1. Cover
  2. Title page
  3. Table of Contents
  4. Copyright
  5. List of Contributors
  6. Chapter 1: The Automating of Open Source Intelligence
  7. Chapter 2: Named Entity Resolution in Social Media
  8. Chapter 3: Relative Cyberattack Attribution
  9. Chapter 4: Enhancing Privacy to Defeat Open Source Intelligence
  10. Chapter 5: Preventing Data Exfiltration: Corporate Patterns and Practices
  11. Chapter 6: Gathering Intelligence on High-Risk Advertising and Film Piracy: A Study of the Digital Underground
  12. Chapter 7: Graph Creation and Analysis for Linking Actors: Application to Social Data
  13. Chapter 8: Ethical Considerations When Using Online Datasets for Research Purposes
  14. Chapter 9: The Limitations of Automating OSINT: Understanding the Question, Not the Answer
  15. Chapter 10: Geospatial Reasoning With Open Data
  16. Subject Index