Efficient Way to Convert Multiple PDFs to Plaintext Format

Researchers often have to analyse data, and sometimes those data are contained in PDF files. While most commercial analysis tools, e.g., NVivo, can work with this file format, it is often better to convert the files to plaintext, especially if you need to do some preprocessing (e.g., stemming words, removing digits, or collapsing whitespace). While this is straightforward to do from most PDF editors, it becomes cumbersome and time-consuming when you are dealing with multiple files. On Linux/Mac, a quick way of solving this is to use the pdftotext command-line tool (part of the Poppler utilities and of Xpdf) as follows:

for file in *.pdf; do pdftotext -nopgbrk -eol unix "$file"; done

Here, we convert every PDF found in the current directory to text format. Each output file is written next to its source, with the .pdf extension replaced by .txt.
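If the goal is preprocessing, the converted text can be cleaned in the same shell session. The sketch below is only an illustration: clean_text is a hypothetical helper name, the .clean extension is an arbitrary choice, and only two of the steps mentioned earlier are covered (removing digits and collapsing whitespace). Stemming needs a dedicated tool.

```shell
#!/bin/sh
# clean_text: hypothetical helper. Reads text on stdin and writes it to stdout
# with all digits removed and every run of whitespace collapsed to one space.
clean_text() {
    tr -d '0-9' | tr -s '[:space:]' ' '
}

# Clean every text file in the current directory.
for txt in *.txt; do
    [ -e "$txt" ] || continue                  # the glob matched nothing
    clean_text < "$txt" > "${txt%.txt}.clean"  # report.txt -> report.clean
done
```

Note that newlines count as whitespace here, so each cleaned file ends up as a single long line.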

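For other conversions, you can use the pandoc package. This is a very powerful tool that can convert files between many formats, e.g., Markdown, LaTeX, EPUB, and many more. Note that pandoc can write PDF but cannot read it, so the example below goes in the opposite direction and converts all text files found in the current directory to PDF format (PDF output requires a PDF engine, by default a LaTeX one such as pdflatex, to be installed):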

for file in *.txt; do pandoc "$file" -o "${file%.txt}.pdf"; done
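Because pandoc infers the output format from the extension of the output file, the same loop can be generalised to other targets. The following is a minimal sketch under that assumption; convert_all is a hypothetical helper name, not something pandoc provides.

```shell
#!/bin/sh
# convert_all EXT: hypothetical helper. Converts every .txt file in the
# current directory to the format implied by EXT (e.g. html, epub, docx).
convert_all() {
    ext=$1
    for file in *.txt; do
        [ -e "$file" ] || continue              # the glob matched nothing
        # notes.txt -> notes.html, notes.epub, ...
        pandoc "$file" -o "${file%.txt}.$ext"
    done
}
```

For instance, running convert_all html in a directory containing notes.txt should produce notes.html alongside it.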

Hope you will find this useful!

Creating a Hierarchical Taxonomy Through LaTeX

One of the things researchers occasionally have to develop is a taxonomy. Essentially, a taxonomy is a scheme that classifies concepts in a logical, usually hierarchical, manner.

There are many different tools and methods for drawing a taxonomy. If you are working with LaTeX, you can easily do so with the “forest” package. Here is a simple example of how you can draw one to represent household appliances and kitchen aids in a smart home:

[Screenshot: LaTeX source for the taxonomy, using the forest package]

The result of running the above code is the graphic presented hereunder:

[Screenshot: the rendered taxonomy tree]
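For readers who want something to copy and paste, a minimal forest tree of this kind might look like the sketch below. It is not the code from the screenshot: the node labels, the styling options, and the file name taxonomy.tex are all illustrative assumptions. The shell snippet simply writes the LaTeX source to a file.

```shell
#!/bin/sh
# Write an illustrative forest example to taxonomy.tex.
# All node labels below are placeholders, not the ones used in the post.
cat > taxonomy.tex <<'EOF'
\documentclass[border=5pt]{standalone}
\usepackage{forest}
\begin{document}
\begin{forest}
  % Style applied to every node: boxed, centred text, arrows on edges.
  for tree={draw, rounded corners, align=center, edge={->}, l sep=10mm, s sep=5mm}
  [Smart home devices
    [Household appliances
      [Washing machine]
      [Vacuum cleaner]
    ]
    [Kitchen aids
      [Coffee maker]
      [Toaster]
    ]
  ]
\end{forest}
\end{document}
EOF
```

The file can then be compiled with, for example, pdflatex taxonomy.tex, assuming a LaTeX distribution with the forest and standalone packages is installed. Each pair of square brackets is one node, and nesting the brackets expresses the parent-child relation.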

Hope you will find this useful!

Password Reuse in Different Smart Home Products

Researchers from Ben-Gurion University of the Negev have found that smart home devices can be easily hacked and then used to spy on their users. Omer Shwartz et al. in their research paper analysed the practical security level of 16 popular IoT devices ranging from high-end to low-end manufacturers.

Amongst other things, they discovered that similar products under different brands share the same default passwords. In some instances, the authors reported that such passwords were found within minutes, sometimes simply through a web search for the brand. Devices in their study included baby monitors, home security and web cameras, doorbells, and thermostats. Using such devices in their lab, they were able to, for example, play loud music through a baby monitor, turn off a thermostat, and turn on a camera remotely.

As I discussed today in my PerCom’18 presentation in Greece, manufacturers should avoid using easy, hard-coded passwords, and should be held more accountable for their products and services. At the same time, end-users should, as a countermeasure, change default passwords or disable privileged accounts on the device. Ultimately, security should never be an afterthought; it should be built in from the beginning of the development lifecycle.

In our work, we have identified hundreds of insecure smart connected cameras deployed on the Internet around the world. Similarly, we observed that most vendors left default passwords on the devices, or exposed banner information containing sensitive data, e.g., firmware versions, port numbers, and manufacturer names, that can be used to compromise the security and privacy of householders, business owners, and others.
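Checking whether one of your own devices still uses a well-known default password can be sketched in a few lines of shell. Everything here is an assumption made for illustration: the short password list is made up rather than taken from the study, and is_default_password is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative list of common default passwords (not taken from the study).
defaults_file=$(mktemp)
printf '%s\n' admin password 1234 12345 root > "$defaults_file"

# is_default_password PASSWORD: hypothetical helper. Succeeds (exit status 0)
# only if PASSWORD matches a whole line of the list exactly.
is_default_password() {
    printf '%s\n' "$1" | grep -Fxq -f "$defaults_file"
}
```

For example, is_default_password admin succeeds, which signals that the password should be changed, whereas a long passphrase of your own makes the function fail.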

Risks to Consider Before Buying a Smart Home Device

People are increasingly buying voice-activated speakers (also called digital voice assistants or intelligent personal assistants) and other smart devices for added convenience, enhanced security, and entertainment. But doing so blindly, without assessing the risks involved with such technologies, can give intruders an accessible window into our homes and personal lives. Here are some risks that you may want to consider before purchasing a smart device for your house:

Listening In: Many new devices are being manufactured with built-in microphones. New-generation devices in this category include, for instance, smart speaker systems such as Amazon Echo and Google Home, as well as smart TVs, TV streaming devices, and Internet-connected toys. Many of these devices are constantly listening for your commands, and when they receive them they connect to corporate servers (which can be located anywhere in the world) to satisfy your request. What if you are having private conversations at home? Are these getting sent to the Internet without your awareness? Indeed, some devices do just that (yes, you may have unknowingly already accepted the vendor’s privacy policy or terms of use, if one exists!). What can you do then? Well, devices typically have a mute function that disables the device microphone(s). But the question remains: can we actually verify what the manufacturer promises? Further to that, if data is sent over the Internet, can it really be removed? I highly doubt that.

Watching You: Cloud security cameras let you check in on your pets, children, and home while you are away, typically through your smartphone, tablet, or other handheld computing device. Some devices routinely send video footage to online storage automatically, while others do so only when triggered, for example by a motion sensor (typically signalling that an intruder or an unauthorized visitor is nearby). Reputable brands are likely to take security seriously, but no system is bulletproof. If you want to stay extra vigilant, you might want to turn the camera to face the wall or just unplug it altogether when you do not intend to use it. However, this is not a viable solution for many. Thus, my suggestion is that you carefully inspect the device’s technical specification and assess whether the company is taking security and privacy seriously!

Digital Trails: Smart locks let you unlock doors from anywhere with an application installed on your digital devices. With this, you can let in guests even when you are away or when you have your hands full (yes, you can also connect your smart lock to a digital voice assistant). Similarly, landlords can automatically disable your digital key when you move out, and parents can keep an attentive eye on when their beloved teens come back home. At the same time, intruders might try to break in not only forcibly with hardware tools but also with software hacking tools. Smart locks also pose a privacy risk, as using such keys leaves a digital trail. This trail can also be used in forensic investigations. This is an added attack surface that these digital devices bring into our lives and our homes.

In this article, we only scratched the surface of the risks brought by smart devices. If you want to learn more about the risks of purchasing smart home devices, as well as about the different types of intruders spying on your home, take a look at my paper.

Information Assets: An Essential Ingredient of Threat Modelling

Threat models are a way of looking at risks in order to identify the most likely threats to your organisation’s security. The first step in the threat modelling process is concerned with gaining an understanding of the application and how it interacts with external entities. This involves creating use-cases to understand how the application is used, identifying entry points to see where a potential attacker could interact with the application, identifying assets, and more. In this post, we focus on identifying information assets.

Assets are essentially threat targets, i.e., they are the reason threats exist. Assets can be both physical and abstract. For example, an asset of an application might be a list of clients and their personal information; this is a physical asset. An abstract asset might be the reputation of an organisation. Below, we identify some key information assets that your organisation or information system might hold or process:

  • Credit card data: yours, or (if you sell stuff) a customer’s.
  • Banking data: account numbers, routing numbers, e-banking usernames and passwords.
  • Personally identifying information: Social Security number, date of birth, income data, W-2s, passport numbers, driver’s license or national ID numbers.
  • Intellectual property, such as source code or software documentation.
  • Sensitive personal or business information and communications: e-mails and texts that could be used to embarrass, blackmail, or imprison you.
  • Politically sensitive information or activities that could get you in trouble with your employer, the government, law enforcement, or other interested parties.
  • Travel plans that could be used to target you or others for fraud or other forms of attack.
  • Other business or personal data that are financially or emotionally essential (family digital photos, for example).
  • Your identity itself, if you are trying to stay anonymous online for your protection.

When it comes to protecting your assets, pieces of information that could be used to expose them are just as essential. Personal biographical and background data might be used for social engineering against you, your friends, or a service provider. Keys, passwords, and PIN codes should also be considered as valuable as the things they provide access to.

Other operational information about your activities that could be exploited should also be considered, including the name of your bank or other financial services provider. For instance, a spear-phishing attack on the Pentagon used a fake e-mail from USAA, a bank and insurance company that serves many members of the military and their families.