Unveiling the Lack of Transparency in AI Research

A recent systematic review by Burak Kocak MD et al. has revealed a lack of transparency in AI research. The data, presented in Academic Radiology, showed that only 18% of the 194 selected radiology and nuclear medicine studies included in the analysis had raw data available, with access to private data in only one paper. Additionally, just one-tenth of the selected papers shared the pre-modeling, modeling, or post-modeling files.

The authors of the study attributed this lack of availability mainly to the regulatory hurdles that need to be overcome in order to address privacy concerns. The authors suggested that manuscript authors, peer-reviewers, and journal editors could help make AI studies more reproducible in the future by being conscious of transparency and data/code availability when publishing research results.

The findings of the study highlight the importance of transparency in AI research. Without access to data and code, it is difficult to validate and replicate results, which in turn undermines trust in them. This is especially important for medical AI research, as the safety and efficacy of treatments and diagnostics depend on accurate and reliable results. What further steps can be taken to increase transparency while still protecting privacy?

Interactive Event on Digital Ethics

On Friday, 23rd April, I attended an interactive event on digital ethics, organised by RISE in collaboration with industry. Together, we explored and discussed data privacy, integrity, trust, and transparency in AI. Many interesting discussions followed in Zoom breakout rooms, especially after the presentation from the “Sjyst data!” project.

We talked about the generic development and implementation of AI for emerging systems, and the related ethical implications. An interesting point was raised about the passive collection of MAC addresses and whether these are considered personal data under the GDPR. On that note, someone in the Zoom chat also mentioned foot traffic data and its processing, especially during the COVID-19 pandemic. Even data that appears to mean nothing particular or worrying on its own can, when aggregated and linked with other data sources, paint a detailed profile of us.

Here is a screenshot showing the event hosts: Nina Bozic (senior researcher) and Katarina Pietrzak (educational strategist) along with RISE experts and guests.

Interactive event on Digital Ethics

I am looking forward to the next one!

Open-Source Smart Home Simulators

Following a blog post I wrote in 2019 on real smart home testbeds, many readers have reached out asking whether I am aware of tools that can be used to simulate smart home data. I understand this request, because data collection in smart homes can be a tedious, time-consuming, and expensive process. Below, I list three recent open-source tools that could be useful for simulating activity and human interactions within a smart home:

  • OpenSHS (Open Smart Home Simulator) [1]: This is a hybrid, open-source, cross-platform 3D smart home simulator, developed using Blender and Python, allowing for sophisticated dataset generation.
  • Francillette et al. simulator [2]: The authors developed a smart environment simulator, using Java, SketchUp, and Unity engine, capable of generating data from simulated sensors such as RFID, ultrasound, pressure sensors, and contact sensors, amongst others.
  • Smart Environment Simulation (SESim) [3]: This is a simulation tool developed in Unity that supports smart home simulation and the generation of synthetic sensor datasets.
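To give a feel for the kind of output such tools produce, here is a minimal Python sketch of a synthetic sensor event generator. It does not use the API of any of the three simulators above; the sensor names, the per-minute flip probability, and the `simulate_events` function are all my own assumptions for illustration.

```python
import random
from datetime import datetime, timedelta

# Hypothetical binary sensors; these names are illustrative and do not
# come from OpenSHS, SESim, or the Francillette et al. simulator.
SENSORS = ["kitchen_motion", "front_door_contact", "bed_pressure"]

# Assumed probability that a sensor changes state in any given minute.
FLIP_PROBABILITY = 0.05


def simulate_events(start, minutes, seed=0):
    """Return a list of (timestamp, sensor, new_state) state-change events."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    state = {name: 0 for name in SENSORS}  # every sensor starts "off"
    events = []
    for minute in range(minutes):
        timestamp = start + timedelta(minutes=minute)
        for name in SENSORS:
            if rng.random() < FLIP_PROBABILITY:
                state[name] = 1 - state[name]  # toggle between 0 and 1
                events.append((timestamp.isoformat(), name, state[name]))
    return events


if __name__ == "__main__":
    # Simulate two hours of a morning and show the first few events.
    for event in simulate_events(datetime(2019, 1, 1, 7, 0), minutes=120, seed=42)[:5]:
        print(event)
```

A real simulator models the resident, the floor plan, and the physics of each sensor; this sketch only shows the general shape of a timestamped event log.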

Also, in case you are a researcher and you would like a copy of the data I collected about the technical specifications of smart home products, feel free to get in touch.

[1] Alshammari, N.; Alshammari, T.; Sedky, M.; Champion, J.; Bauer, C. OpenSHS: Open Smart Home Simulator. Sensors 2017, 17, 1003. https://doi.org/10.3390/s17051003

[2] Francillette, Y.; Boucher, E.; Bouzouane, A.; Gaboury, S. The Virtual Environment for Rapid Prototyping of the Intelligent Environment. Sensors 2017, 17, 2562. https://doi.org/10.3390/s17112562

[3] Ho, B.; Vogts, D.; Wesson, J. A Smart Home Simulation Tool to Support the Recognition of Activities of Daily Living. In Proceedings of the South African Institute of Computer Scientists and Information Technologists 2019; ACM, 2019; Article 23, pp. 1–10. https://doi.org/10.1145/3351108.3351132

Interesting Book Showed Up In My Mailbox

Today, I am happy to have received a hard copy of the book “Privacy and Identity Management. Data for Better Living: AI and Privacy”. It includes a chapter that I co-authored with my academic advisor, titled “On the Design of a Privacy-Centered Data Lifecycle for Smart Living Spaces.” In that chapter, I identify how the software development process can be enhanced to manage privacy threats, amongst other things.

Hardcopy of the book “Privacy and Identity Management. Data for Better Living: AI and Privacy”

All the articles in the book are certainly worth a read; they cover various aspects of privacy from technical, compliance, and legal perspectives.

Data Collected by Smart Home Devices

What type of data do smart home devices collect? This is exactly what I talked about last week in Seattle (USA) at the Services Conference Federation (SCF 2018). Understanding the data that smart home systems collect is useful for assessing what is at stake if a device is compromised, and as a precursor to conducting a privacy analysis.

By analysing the privacy policies of different smart home and IoT device manufacturers, we observed that all investigated devices collect instances of personal data. In the worst case, this can include biometric data. Such data is used, for instance, in smart TVs for authentication purposes and sometimes to support advanced interaction features.

However, there are many other instances of non-personal data which, when aggregated, can paint a detailed, if coarse-grained, model of an individual’s lifestyle preferences, habits, and history.
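As a rough illustration of how aggregation does this, the Python sketch below takes a made-up log of motion and door events, none of which is personal on its own, and derives a simple daily routine from it. The event log, the sensor names, and the `daily_profile` function are assumptions for illustration only; they are not data or methods from our study.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up event log of (ISO timestamp, sensor name); not data from the study.
EVENTS = [
    ("2018-06-04T06:58:00", "kitchen_motion"),
    ("2018-06-04T08:05:00", "front_door"),
    ("2018-06-04T17:40:00", "front_door"),
    ("2018-06-05T07:03:00", "kitchen_motion"),
    ("2018-06-05T08:10:00", "front_door"),
    ("2018-06-05T17:35:00", "front_door"),
    ("2018-06-06T07:01:00", "kitchen_motion"),
    ("2018-06-06T08:02:00", "front_door"),
    ("2018-06-06T18:15:00", "front_door"),
]


def daily_profile(events):
    """Aggregate events per day and report the most common routine hours.

    Assumes a non-empty log in which each day has at least two door events.
    """
    by_day = defaultdict(list)
    for timestamp, sensor in events:
        moment = datetime.fromisoformat(timestamp)
        by_day[moment.date()].append((moment, sensor))

    first_activity, leaves, returns = Counter(), Counter(), Counter()
    for day_events in by_day.values():
        day_events.sort()  # chronological order within the day
        first_activity[day_events[0][0].hour] += 1
        door_times = [t for t, s in day_events if s == "front_door"]
        if len(door_times) >= 2:
            leaves[door_times[0].hour] += 1    # first door event of the day
            returns[door_times[-1].hour] += 1  # last door event of the day

    return {
        "usual_first_activity_hour": first_activity.most_common(1)[0][0],
        "usual_leave_hour": leaves.most_common(1)[0][0],
        "usual_return_hour": returns.most_common(1)[0][0],
    }


if __name__ == "__main__":
    print(daily_profile(EVENTS))
```

With only three days of anonymous-looking events, the sketch already suggests when the household wakes up and when the home is likely to be empty.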

Read more: https://www.springerprofessional.de/an-empirical-analysis-of-smart-connected-home-data/15852434