
April 24 2018

08:36

What you get is what you C

We have a new paper on compiler security appearing this morning at EuroS&P.

Up till now, writers of crypto and security software have had to fight not only the bad guys but also the compiler writers, who every so often dream up some new optimisation routine that spots the padding instructions we put in to make our crypto algorithms run in constant time, or the tricks we use to ensure that sensitive data will be zeroised when a function returns. All of a sudden some critical code is optimised away, your code is insecure, and you scramble to figure out how to outwit the compiler once more.

So while you’re fighting the enemy in front, the compiler writer is a subversive fifth column in your rear.

It’s time that our toolsmiths were our allies rather than our enemies. We have therefore worked out what’s needed for a software writer to tell a compiler that a loop really must be executed in constant time, or that a variable really must be set to zero when a function returns. Languages like C have no way of expressing programmer intent, so we do this by means of code annotations.

Doing it properly turns out to be surprisingly tricky, but we now have a working proof of concept in the form of plugins for LLVM. For more details, and links to the code, see the web page of Laurent Simon, the lead author; the talk slides are here. This is the first technical contribution in our research programme on sustainable security.

April 20 2018

10:12

Ethics of mathematics

I’m at the world’s first conference on ethics in mathematics and will be speaking in half an hour. Here are my slides. I will be describing the course I teach to second-year computer scientists on Economics, Law and Ethics. Courses on ethics are mandatory for computer scientists while economics is mandatory for engineers; my innovation has been to combine them. My experience is that teaching them together adds real value. We can explain coherently why society needs rules via discussions of game theory, and then of network effects, asymmetric information and other market failures typical of the IT industry; we can then discuss the limitations of law and regulation; and this sets the stage for both principled and practical discussions of ethics.

April 13 2018

15:39

Don’t blame Cambridge for Facebook’s privacy crisis

Mark Zuckerberg tried to blame Cambridge University in his recent testimony before the US Senate, saying “We do need to understand whether there was something bad going on in Cambridge University overall, that will require a stronger action from us.”

The New Scientist invited me to write a rebuttal piece, and here it is.

Dr Kogan tried to get approval to use the data his company had collected from Facebook users in academic research. The psychology ethics committee refused permission, and when he appealed to the University Ethics Committee (declaration: I’m a member) this refusal was upheld. Although he’d got consent from the people who ran his app, the same could not be said of their Facebook “friends” from whom most of the data were collected.

The deceptive behaviour here has been by Facebook, which creates the illusion of privacy in order to get its users to share more data. There has been a lot of work on the economics and psychology of privacy over the past decade and we now understand the dynamics of advertising markets better than we used to.

One big question is the “privacy paradox”. Why do people say they care about privacy, yet behave otherwise? Part of the answer is about context; and part of it is about learning. Over time, more and more people are starting to pay attention to online privacy settings, despite attempts by Facebook and other online advertising firms to keep changing privacy settings to confuse people.

With luck, the Facebook scandal will be a “flashbulb moment” that will drive lots more people to start caring about their privacy online. It will certainly provide interesting new data to privacy researchers.

April 02 2018

14:50

PhD studentship in side-channel security

I can offer a 3.5-year PhD studentship on radio-frequency side-channel security, starting in October 2018, to applicants interested in hardware security, radio communication, and digital signal processing. Due to the funding source, this studentship is restricted to UK nationals, or applicants who have been resident in the UK for the past 10 years. Contact me for details of the project proposal.

March 31 2018

16:51

A bank statement for app activity (and thus personal data)

During my long sabbatical in 2015-2016 I had plenty of time to think about random things and come up with strange ideas. Most of these ideas are more funny than practical - their primary use is boring people that are reckless enough to have drinks with me.

This blog post describes one of these ideas. With the recent renewed interest in privacy and overreach of smart phone apps, it seems like a topic that is - at least temporarily - less boring than usual.

ML, software behavior, and the boundary between 'malicious' and 'non-malicious'

I have seen a lot of human brain power (and a vast amount of computational power) thrown at the problem of automatically deciding whether a given piece of software is good or bad. 
This is usually done as follows:
  1. Collect a lot of information about the behavior of software (normally by running the software in some simulated environment)
  2. Extract features from this information
  3. Apply some more-or-less sophisticated machine learning model to decide between "good" or "bad"
The underlying idea behind this is that there is "bad" behavior, and "good" behavior, and if we could somehow build a machine learning model that is sufficiently powerful, we could automatically decide whether a given piece of software is good or bad.
In practice, this rarely works without significant false-positive problems, or significant false-negative problems, or all sorts of complicated corner-cases where the system fails.
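To make those three steps concrete, here is a minimal sketch of such a pipeline in Python (hypothetical feature names and made-up behaviour reports; scikit-learn assumed). It illustrates the generic approach described above, not any particular product:

# Toy version of the behaviour-based classification pipeline sketched above.
# All feature names, reports and labels are invented for illustration.
from sklearn.ensemble import RandomForestClassifier

# Steps 1+2: behaviour reports from a sandbox run, reduced to numeric features.
def extract_features(report):
    return [
        report.get("files_written", 0),
        report.get("registry_keys_set", 0),
        report.get("outbound_connections", 0),
        int(report.get("disables_security_tool", False)),
    ]

reports = [
    {"files_written": 3, "outbound_connections": 1},          # benign installer
    {"files_written": 4000, "outbound_connections": 2,
     "disables_security_tool": True},                          # ransomware-like
    {"files_written": 1, "registry_keys_set": 2},              # benign utility
    {"files_written": 2500, "outbound_connections": 40},       # spam-bot-like
]
labels = [0, 1, 0, 1]  # 0 = "good", 1 = "bad"

# Step 3: fit a model and let it judge a new sample.
X = [extract_features(r) for r in reports]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

new_report = {"files_written": 3500, "outbound_connections": 5}
print(clf.predict([extract_features(new_report)]))  # the model's "good"/"bad" guess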
In 2015, I had to deal with the fallout of the badly-phrased Wassenaar wording: Export-control legislation which tried to define "bad behavior" for software. During this, it became clear to me that the idea that behavior alone determines good/bad is flawed.
The behavior of a piece of software does not determine whether it is malicious or not. The true defining line between malicious and non-malicious software is whether the software does what the user expects it to do.
Users run software because they have an expectation for what this software does. They grant permissions for software because they have an expectation for the software to do something for them ("I want to make phone calls, so clearly the app should use the microphone"). This permission is given conditionally, with context -- the user does not want to give the app permission to switch on the microphone when the user does not intend to make a phone call.
The question of malicious / non-malicious software is a question of alignment between user expectations and software behavior.
This means, in practice, that efforts in applying machine learning to separate malicious from non-malicious software are doomed to fail, because they fail to measure the one dimension through which the boundary between good and bad runs.
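As a small numerical sketch of that argument (synthetic data; numpy and scikit-learn assumed): below, the label depends only on a third dimension standing in for user intent, so a classifier that sees only the two "behaviour" dimensions does little better than chance, while one that also sees the intent dimension separates the classes almost perfectly.

# Synthetic illustration: the good/bad label depends only on the hidden
# "intent" dimension z, which the behaviour-only view projects away.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
n = 2000
xy = rng.normal(size=(n, 2))        # observable "behaviour" features
z = rng.normal(size=(n, 1))         # hidden user-intent dimension
labels = (z[:, 0] > 0).astype(int)  # the boundary runs entirely along z

without_z = LogisticRegression().fit(xy, labels)
with_z = LogisticRegression().fit(np.hstack([xy, z]), labels)

print("accuracy without the intent dimension:", without_z.score(xy, labels))
print("accuracy with the intent dimension:   ", with_z.score(np.hstack([xy, z]), labels))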
Intuitively, this can be illustrated with the two pictures below. They show the same set of red and green points in 3d-space from two different perspectives -- once with their z-axis projected away, and once in a 3-d plot where the z-axis is still visible:
[Figure: the cloud of points seen from the side, with the "important" dimension projected away. It is near-impossible to draw a sane boundary between red and green points, and whatever boundary you draw won't generalize well.]
[Figure: the same cloud of points, with the "important" dimension going from left to right. It is much clearer how to separate green from red points now.]
The question that arises naturally, then, is:

How can one measure the missing dimension (user intent)?

User intent is a difficult thing to measure. The software industry has the practice of forcing the user to agree to some ridiculously wide-reaching terms of service or EULA that few users read, even fewer understand, and which are often near-equivalent to giving the person you hire to clean your flat a power of attorney over all your documents, and allowing them to throw parties in your flat while you are not looking.
It is commonly argued that - because the user clicked "agree" to an extremely broad agreement - the user consented to everything the software can possibly do.
But consent to software actions is context-dependent and conditioned on particular, specific actions. It is fine for my messenger to request access to my camera, microphone and files - I may need to send a picture, I may need to make a phone call, and I may need to send an attachment. It is not OK for my messenger to use my microphone to see if a particular ultrasonic tracker sound is received, it is not OK for my messenger to randomly search through files etc.
Users do not get to tell the software vendor their intent and the context for which they are providing consent.
Now, given that user intent is difficult to measure up-front - how about we simply ask the user whether something that an app / software did was what they expected it to do?

Information and attention is a currency - but one with bad accounting

The modern ad economy runs on attention and private data. The big advertising platforms make their money by selling the combination of user attention and the ability to micro-target advertisements given contextual data about a user. The user "pays" for goods and services by providing attention and private data.
People often fear that big platforms will "sell their data". This is, at least for the smarter / more profitable platforms, an unnecessary fear: these platforms make their money by having data that others do not have, which allows better micro-targeting. They do not make their money by "selling data"; they make money by "monetizing the data they have".

The way to think about the relationship between the user and the platform is more like the clichéd "musician-agent" relationship: the musician produces something, but does not know how to monetize it. His agent knows how to monetize it, and strikes a deal with the musician: you give me exclusive use of your product, and I will monetize it for you - and take a cut from the proceeds.
The profits accumulated by the big platforms are the difference between what the combination of attention & private data obtained from users is worth and the cost of obtaining this attention and data.
For payments in "normal" currency, users usually have pretty good accounting: They know what is in their wallet, and (to the extent that they use electronic means for payments) they get pretty detailed transaction statements. It is not difficult for a normal household to reconstruct from their bank statements relatively precisely how much they paid for what goods in a given month.
This transparency creates trust: We do not hesitate much to give our credit card numbers to online service providers, because we know that we can intervene if they charge our credit cards without reason and in excess of what we agreed to.
Private information, on the other hand, is not accounted for. Users have no way to see how much private data they provide, and whether they are actually OK with that.

A bank statement for app/software activity

How could one empower users to account for their private data, while at the same time helping platform providers identify malicious software better?

By providing users with the equivalent of a bank statement for app/software activity. The way I imagine it would be roughly as follows:
A separate component of my mobile phone (or computer) OS keeps detailed track of app activity: What peripherals are accessed at what times, what files are accessed, etc.

Users are given the option of checking the activity on their device through a UI that makes these details understandable and accessible:
  • App XYZ accessed your microphone in the last week at the following times, showing you the following screen:
    • Timestamp 1, screenshot 1
    • Timestamp 2, screenshot 2
  • Does this match your expectations of what the app should do? YES / NO
  • App ABC accessed the following files during the last week at the following times, showing you the following screen:
    • Timestamp 3, screenshot 3
      • Filename
      • Filename
      • Filename
  • Does this match your expectations of what the app should do? YES / NO
At least on modern mobile platforms, most of the above data is already available - modern permissions systems can keep relatively detailed logs of "when what was accessed". Adding the ability to save screenshots alongside is easy.
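As a rough sketch of what such a statement could be generated from (all field names, apps and timestamps here are hypothetical, not any platform's actual API), the OS-side component only needs to append simple access records, which can then be grouped per app and per resource for the YES / NO questions above:

# Toy sketch of an access log and the "bank statement" view derived from it.
# Record fields and app names are invented; a real version would live in the OS.
from collections import defaultdict
from datetime import datetime

access_log = [
    {"app": "ExampleMessenger", "resource": "microphone",
     "time": datetime(2018, 3, 24, 19, 2), "screenshot": "shot_0013.png"},
    {"app": "ExampleMessenger", "resource": "microphone",
     "time": datetime(2018, 3, 27, 3, 11), "screenshot": "shot_0021.png"},
    {"app": "ExampleFlashlight", "resource": "contacts",
     "time": datetime(2018, 3, 28, 9, 40), "screenshot": "shot_0034.png"},
]

def weekly_statement(log):
    """Group accesses per (app, resource) so the user can answer YES / NO per group."""
    grouped = defaultdict(list)
    for entry in log:
        grouped[(entry["app"], entry["resource"])].append(entry)
    for (app, resource), entries in sorted(grouped.items()):
        print("%s accessed your %s %d time(s) last week:" % (app, resource, len(entries)))
        for e in entries:
            print("  %s (screen shown: %s)" % (e["time"], e["screenshot"]))
        print("  Does this match your expectations of what the app should do? YES / NO")

weekly_statement(access_log)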

Yes, a lot of work has to go into a thoughtful UI, but it seems worth the trouble: Even if most users will randomly click on YES / NO, the few thousand users that actually care will provide platform providers valuable information about whether an app is overreaching or not. At the same time, more paranoid users (like me) would feel less fearful about installing useful apps: If I see the app doing something in excess of what I would like it to do, I could remove it.
Right now, users have extremely limited transparency into what apps are actually doing. While the situation is improving slowly (most platforms allow me to check which app last used my GPS), it is still way too opaque for comfort, and overreach / abuse is likely pervasive.
Changing this does not seem hard, if any of the big platform providers could muster the will.

It seems like a win / win situation, so I can hope. I can also promise that I will buy the first phone to offer this in a credible way :-).
PS: There are many more side-benefits to the above model - for example making it more difficult to hack a trusted app developer to then silently exfiltrate data from users that trust said developer - but I won't bore you with those details now.

March 30 2018

14:24

IDA on non-OS X/Retina Hi-DPI displays

The problem

Some users running IDA on Windows & Linux X11 platforms with Hi-DPI displays have reported that IDA looks rather odd: the navigator bar is too narrow, the text under it gets truncated, and there is an overall feeling of packing & clumsiness:

  • Windows: [screenshot]
  • Linux X11: [screenshot]

Looking closely, one can notice several issues (probably not an exhaustive list): the navigator band is too narrow, text gets truncated, and paddings, margins and scrollbars are uncomfortably small.

Note: this should not happen, and does not apply, to OS X users running IDA on Retina displays. Nor should it happen (but we didn't get a chance to test this) on non-X11 Linux display managers, such as Wayland.

Fix / mitigation:

On Linux X11 & Windows, if you are using Hi-DPI monitors and IDA looks somewhat like it does in the above screenshots, please try setting the environment variable QT_AUTO_SCREEN_SCALE_FACTOR to 1:

E.g., on Linux/X11:

~# export QT_AUTO_SCREEN_SCALE_FACTOR=1
~# path/to/ida my.idb

IDA should now look more pleasant:

  • Windows: [screenshot]
  • Linux X11: [screenshot]

Some things are still not perfect (e.g., checkboxes might remain small), but IDA definitely looks better.

Please give it a try!

Gory details

When one applies scaling/zooming, on either Windows or Linux X11, the OS will return scaled values for font metrics when they are queried using point sizes (which is almost always the case).

For example, when the font metrics for a font of size 12pt are requested by a Qt application, instead of returning 14 pixels as it would on a non-scaled system, the operating system will return 28 pixels on a 200% scaled one (in other words, this is essentially a font database-related feature).

That will, in turn, have the net effect of causing Qt to compute layout of the surrounding widgets according to those scaled font metrics, which explains why the text is (for the most part) not truncated.

However, what applying scaling does not do, is tell Qt that it should scale all other pixel measurements by that scale factor.

Consequently, paddings, margins, scrollbars, etc… all have uncomfortably small dimensions, especially when compared to text.

The QT_AUTO_SCREEN_SCALE_FACTOR environment variable is an opt-in that program users can define, in order to control how the program should look. It will in essence instruct Qt to perform automatic scaling of (non-font-related) graphical operations according to the pixel density of the screen(s).

More information can be found on Qt’s website.

Why is this not needed under OS X + Retina?

This is not needed under OS X + Retina, because Qt does not need to perform any kind of scaling there: the scaling is performed by the drawing primitives of the OS itself, and is entirely transparent to the application.

(In fact, an OS X application doesn’t even work with the real screen geometry, but rather with an “abstract” coordinate system, normalizing pixel sizes across screen densities.)

March 26 2018

08:45

Tracing stolen bitcoin

A new Computerphile video explains how we’ve worked out a much better way to track stolen bitcoin. Previous attempts to do this had got entangled in the problem of dealing with transactions that split bitcoin into change, or that consolidate smaller sums into larger ones, and with mining fees. The answer comes from an unexpected direction: a legal precedent in 1816. We discussed the technical details last week at the Security Protocols Workshop; a preprint of our paper is here.

Previous attempts to track tainted coins had used either the “poison” or the “haircut” method. Suppose I open a new address and pay into it three stolen bitcoin followed by seven freshly-mined ones. Then under poison, the output is ten stolen bitcoin, while under haircut it’s ten bitcoin that are marked 30% stolen. After thousands of blocks, poison tainting will blacklist millions of addresses, while with haircut the taint gets diffused, so neither is very effective at tracking stolen property. Bitcoin due-diligence services supplant haircut taint tracking with AI/ML, but the results are still not satisfactory.

We discovered that, back in 1816, the High Court had to tackle this problem in Clayton’s case, which involved the assets and liabilities of a bank that had gone bust. The court ruled that money must be tracked through accounts on the basis of first-in, first-out (FIFO); the first penny into an account goes to satisfy the first withdrawal, and so on.

Ilia Shumailov has written software that applies FIFO tainting to the blockchain and the results are impressive, with a massive improvement in precision. What’s more, FIFO taint tracking is lossless, unlike haircut; so in addition to tracking a stolen coin forward to find where it’s gone, you can start with any UTXO and trace it backwards to see its entire ancestry. It’s not just good law; it’s good computer science too.
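To make the FIFO rule concrete, here is a toy sketch (not the authors’ software) applied to the example above: an address receives three stolen bitcoin and then seven freshly-mined ones, and later payments out of it carry taint strictly in first-in, first-out order.

# Toy FIFO taint tracking for a single address, per Clayton's case:
# the first satoshi paid in is the first satoshi paid out.
from collections import deque

class Address(object):
    def __init__(self):
        self.slices = deque()            # [amount, label] slices, oldest first

    def receive(self, amount, label):
        self.slices.append([amount, label])

    def pay(self, amount):
        """Spend `amount`, returning the taint labels it carries under FIFO."""
        out = []
        while amount > 0:
            slice_amount, label = self.slices[0]
            take = min(amount, slice_amount)
            out.append((take, label))
            self.slices[0][0] -= take
            if self.slices[0][0] == 0:
                self.slices.popleft()
            amount -= take
        return out

addr = Address()
addr.receive(3.0, "stolen")  # three stolen bitcoin arrive first
addr.receive(7.0, "clean")   # then seven freshly-mined ones

print(addr.pay(2.0))  # [(2.0, 'stolen')]                 -- still entirely stolen
print(addr.pay(4.0))  # [(1.0, 'stolen'), (3.0, 'clean')] -- the taint ends exactly here
print(addr.pay(4.0))  # [(4.0, 'clean')]                  -- the rest is clean

Under haircut tainting, by contrast, every one of those outgoing payments would simply be marked 30% stolen.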

We plan to make this software public, so that everybody can use it and everybody can see where the bad bitcoins are going.

I’m giving a further talk on Tuesday at a financial-risk conference in Paris.

March 16 2018

13:33

We will make you like our research

This is the title of a paper that appeared today in PLOS One. It describes a tool we developed initially to assess the gullibility of cybercrime victims, and which we now present as a general-purpose psychometric of individual susceptibility to persuasion. An early version was described three years ago here and here. Since then we have developed it significantly and used it in experiments on cybercrime victims, Facebook users and IT security officers.

We investigated the effects on persuasion of a subject’s need for cognition, need for consistency, sensation seeking, self-control, consideration of future consequences, need for uniqueness, risk preferences and social influence. The strongest factor was consideration of future consequences, or “premeditation” for short.

We offer a full psychometric test in STP-II with 54 items spanning 10 subscales, and a shorter STP-II-B with 30 items which measures the first-order factors but omits the second-order constructs for brevity. The scale is here with the B items marked, and here is a live instance of the survey for you to play with. Once you complete it, there’s an on-the-fly interpretation at the end. You don’t have to give your name and we don’t record your IP address.

We invite everyone to use our STP-II scale – not just in security contexts, but also in consumer and marketing psychology and anywhere else it might possibly be helpful. Do let us know what you find!

March 05 2018

11:36

IDA 7.1: Qt 5.6.3 configure options & patch

A handful of our users have already requested information regarding the Qt 5.6.3 build that is shipped with IDA 7.1.

Configure options

Here are the options that were used to build the libraries on:

  • Windows: ...\5.6.3\configure.bat "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "win32-msvc2015" "-opengl" "desktop" "-prefix" "C:/Qt/5.6.3-x64"
    • Note that you will have to build with Visual Studio 2015, to obtain compatible libs
  • Linux: .../5.6.3/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "linux-g++-64" "-developer-build" "-fontconfig" "-qt-freetype" "-qt-libpng" "-glib" "-qt-xcb" "-dbus" "-qt-sql-sqlite" "-gtkstyle" "-prefix" "/usr/local/Qt/5.6.3-x64"
  • Mac OSX: .../5.6.3/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "macx-g++" "-debug-and-release" "-fontconfig" "-qt-freetype" "-qt-libpng" "-qt-sql-sqlite" "-prefix" "/Users/Shared/Qt/5.6.3-x64"

patch

In addition to the specific configure options, the Qt build that ships with IDA includes the following patch. You should therefore apply it to your own Qt 5.6.3 sources before compiling, in order to obtain similar binaries (patch -p 1 < path/to/qt-5_6_3_full-IDA71.patch)

Note that this patch should work without any modification, against the 5.6.3 release as found there. You may have to fiddle with it, if your Qt 5.6.3 sources come from somewhere else.

February 21 2018

21:35

Two small notes on the "malicious use of AI" report

After a long hiatus on this blog, a new post! Well, not really - but a whitepaper was published today titled "The Malicious Use of Artificial Intelligence", and I decided I should cut/paste/publish two notes that apply to the paper from an email I wrote a while ago.
Perhaps they are useful to someone:
1) On the ill-definedness of AI: AI is a diffuse and ill-defined term. Pretty much *anything* where a parameter is inferred from data is called "AI" today. Yes, clothing sizes are determined by "AI", because mean measurements are inferred from real data.
To test whether one has fallen into the trap of viewing AI as something structurally different from other mathematics or computer science (it is not!), one should try to battle-test documents about AI policy, and check them for proportionality, by doing the following:
Take the existing text and search/replace every occurrence of the word "AI" or "artificial intelligence" with "Mathematics", and every occurrence of the word "machine learning" with "statistics". Re-read the text and see whether you would still agree.
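A trivial sketch of that substitution test (the input file name is hypothetical):

# Replace "AI"/"artificial intelligence" with "mathematics" and
# "machine learning" with "statistics", then re-read the result.
import re

def proportionality_test(text):
    text = re.sub(r"\bartificial intelligence\b", "mathematics", text, flags=re.IGNORECASE)
    text = re.sub(r"\bAI\b", "mathematics", text)
    text = re.sub(r"\bmachine learning\b", "statistics", text, flags=re.IGNORECASE)
    return text

with open("policy_draft.txt") as f:
    print(proportionality_test(f.read()))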
2) "All science is always dual-use":
Everybody who works at the intersection of science & policy should read Hardy's "A Mathematician's Apology": https://www.math.ualberta.ca/mss/misc/A%20Mathematician%27s%20Apology.pdf
I am not sure how many of the contributors have done so, but it is a fascinating read - he contemplates among other things the effect that mathematics had on warfare, and to what extent science can be conducted if one has to assume it will be used for nefarious purposes.
My favorite section is the following:
We have still one more question to consider. We have concluded that the trivial mathematics is, on the whole, useful, and that the real mathematics, on the whole, is not; that the trivial mathematics does, and the real mathematics does not, ‘do good’ in a certain sense; but we have still to ask whether either sort of mathematics does harm. It would be paradoxical to suggest that mathematics of any sort does much harm in time of peace, so that we are driven to the consideration of the effects of mathematics on war. It is very difficult to argue such questions at all dispassionately now, and I should have preferred to avoid them; but some sort of discussion seems inevitable. Fortunately, it need not be a long one.
There is one comforting conclusion which is easy for a real mathematician. Real mathematics has no effects on war.
No one has yet discovered any warlike purpose to be served by the theory of numbers or relativity, and it seems very unlikely that anyone will do so for many years. It is true that there are branches of applied mathematics, such as ballistics and aerodynamics, which have been developed deliberately for war and demand a quite elaborate technique: it is perhaps hard to call them ‘trivial’, but none of them has any claim to rank as ‘real’. They are indeed repulsively ugly and intolerably dull; even Littlewood could not make ballistics respectable, and if he could not who can? So a real mathematician has his conscience clear; there is nothing to be set against any value his work may have; mathematics is, as I said at Oxford, a ‘harmless and innocent’ occupation. The trivial mathematics, on the other hand, has many applications in war.
The gunnery experts and aeroplane designers, for example, could not do their work without it. And the general effect of these applications is plain: mathematics facilitates (if not so obviously as physics or chemistry) modern, scientific, ‘total’ war.

The most fascinating bit about the above is how fantastically presciently wrong Hardy was when speaking about the lack of war-like applications for number theory or relativity - RSA and nuclear weapons respectively. In a similar vein - I was in a relationship in the past with a woman who was a social anthropologist, and who often mocked my field of expertise for being close to the military funding agencies (this was in the early 2000s). The first thing that SecDef Gates did when he took his position was hire a bunch of social anthropologists to help DoD unravel the tribal structure in Iraq.
The point of this digression is: it is impossible for any scientist to imagine future uses and abuses of his scientific work. You cannot choose to work on "safe" or "unsafe" science - the only choice you have is between relevant and irrelevant, and the militaries of this world *will* use whatever is relevant and use it to maximize their warfare capabilities.

January 18 2018

16:42

LDAP based UDP reflection attacks increase throughout 2017

There have been reports that UDP reflection DDoS attacks based on LDAP (aka CLDAP) have been increasing in recent months. Our network of UDP honeypots (described previously) confirms that this is the case. We estimate there are around 6000 attacks per day using this method. Our estimated number of attacks has risen fairly linearly from almost none at the beginning of 2017 to 5000-7000 per day at the beginning of 2018.
[Figure: the number of attacks rises roughly linearly from almost none at the beginning of 2017 to 5000-7000 per day at the beginning of 2018.]

Over the period where Netlab observed 304,146 attacks (365 days up to 2017-11-01) we observed 596,534 attacks. This may be due to detecting smaller attacks or overcounting due to attacks on IP prefixes.

The data behind this analysis is part of the Cambridge Cybercrime Centre’s catalogue of data available to academic researchers.

January 03 2018

14:02

What Goes Around Comes Around

What Goes Around Comes Around is a chapter I wrote for a book by EPIC. What are America’s long-term national policy interests (and ours for that matter) in surveillance and privacy? The election of a president with a very short-term view makes this ever more important.

While Britain was top dog in the 19th century, we gave the world both technology (steamships, railways, telegraphs) and values (the abolition of slavery and child labour, not to mention universal education). America has given us the motor car, the Internet, and a rules-based international trading system – and may have perhaps one generation left in which to make a difference.

Lessig taught us that code is law. Similarly, architecture is policy. The architecture of the Internet, and the moral norms embedded in it, will be a huge part of America’s legacy, and the network effects that dominate the information industries could give that architecture great longevity.

So if America re-engineers the Internet so that US firms can microtarget foreign customers cheaply, so that US telcos can extract rents from foreign firms via service quality, and so that the NSA can more easily spy on people in places like Pakistan and Yemen, then in 50 years’ time the Chinese will use it to manipulate, tax and snoop on Americans. In 100 years’ time it might be India in pole position, and in 200 years the United States of Africa.

My book chapter explores this topic. What do the architecture of the Internet, and the network effects of the information industries, mean for politics in the longer term, and for human rights? Although the chapter appeared in 2015, I forgot to put it online at the time. So here it is now.

December 28 2017

14:33

IDAPython: namespacing for plugins, loaders & processor modules.

Intended audience

IDAPython plugins, loaders or processor modules developers.

The problem

Until now, IDAPython would load all loaders, processor modules & plugins in the '__main__' module.

This causes namespace pollution, which can sometimes lead to very obscure errors.

The solution

Starting with version 7.1, IDA will import plugins, loaders & processor modules in their own, separate Python modules.

The names of those Python modules are derived from the plugin, loader or processor module’s file name.

E.g.,

  • a plugin named myplg.py will now be imported into its own '__plugins__myplg' Python module.
  • a loader named myldr.py will now be imported into its own '__loaders__myldr' Python module.
  • a processor module named myprc.py will now be imported into its own '__procs__myprc' Python module.

My plugin/loader/processor module complains about non-existent variables/functions!

It very likely means that the code in question was, in effect, relying on some other code being loaded before it, and that was “polluting” the '__main__' module in a way that was very fortunate from its point-of-view (a happy coincidence, if you will.)

The solution is of course to fix the code in question so that it imports everything it needs, before making use of it.
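For example, a bare-bones plugin that explicitly imports what it uses, instead of hoping the names are already present in its module, might look roughly like this (an illustrative sketch, not an official template):

# myplg.py -- under IDA >= 7.1 this is loaded into its own '__plugins__myplg'
# module, so it must import everything it needs rather than rely on '__main__'.
import idaapi

class MyPlugin(idaapi.plugin_t):
    flags = idaapi.PLUGIN_UNL
    comment = "Example plugin with explicit imports"
    help = ""
    wanted_name = "My plugin"
    wanted_hotkey = ""

    def init(self):
        return idaapi.PLUGIN_OK

    def run(self, arg):
        idaapi.msg("Hello from my own module, not from __main__\n")

    def term(self):
        pass

def PLUGIN_ENTRY():
    return MyPlugin()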

However…

I’m in a hurry

As a temporary workaround, you can set NAMESPACE_AWARE to NO in cfg/python.cfg.

This will cause IDA to revert to the old behavior, where the '__main__' module is shared across all plugins, loaders, processor modules & user scripts.

However, please note that support for NAMESPACE_AWARE will be removed at some point in the future.

December 04 2017

11:59

End of privacy rights in the UK public sector?

There has already been serious controversy about the “Henry VIII” powers in the Brexit Bill, which will enable ministers to rewrite laws at their discretion as we leave the EU. Now Theresa May’s government has sneaked a new “Framework for data processing in government” into the Lords committee stage of the new Data Protection Bill (see pages 99-101, which are pp 111–3 of the pdf). It will enable ministers to promulgate a Henry VIII privacy regulation with quite extraordinary properties.

It will cover all data held by any public body including the NHS (175(1)), be outside of the ICO’s jurisdiction (178(5)) and that of any tribunal (178(2)) including Judicial Review (175(4), 176(7)), wider human-rights law (178(2,3,4)), and international jurisdictions – although ministers are supposed to change it if they notice that it breaks any international treaty obligation (177(4)).

In fact it will be changeable on a whim by Ministers (175(4)), have no effective Parliamentary oversight (175(6)), and apply retroactively (178(3)). It will also provide an automatic statutory defence for any data processing in any Government decision taken to any tribunal/court (178(4)).

Ministers have had frequent fights in the past over personal data in the public sector, most frequently over medical records, which they have sold, again and again, to drug companies and others in defiance not just of UK law, EU law and human-rights law, but of the express wishes of patients, articulated by opting out of data “sharing”. In fact, we have to thank MedConfidential for being the first to notice the latest data grab. Their briefing gives more details and sets out the amendments we need to press for in Parliament. This is not the only awful thing about the bill by any means; its section 164 will be terrible news for journalists. This is one of those times when you need to write to your MP. Please do it now!

November 07 2017

19:04

Ethical issues in research using datasets of illicit origin

On Friday at IMC I presented our paper “Ethical issues in research using datasets of illicit origin” by Daniel R. Thomas, Sergio Pastrana, Alice Hutchings, Richard Clayton, and Alastair R. Beresford. We conducted this research after thinking about some of these issues in the context of our previous work on UDP reflection DDoS attacks.

Data of illicit origin is data obtained by illicit means, such as exploiting a vulnerability or an unauthorized disclosure; in our previous work this was leaked databases from booter services. We analysed existing guidance on ethics and papers that used data of illicit origin to see what issues researchers are encouraged to discuss and what issues they did discuss. We find wide variation in current practice. We encourage researchers using data of illicit origin to include an ethics section in their paper: to explain why the work was ethical, so that the research community can learn from the work. At present, in many cases, the positive benefits as well as the potential harms of research remain entirely unidentified. Few papers record explicit Research Ethics Board (REB) (aka IRB/Ethics Committee) approval for the activity that is described, and the justifications given for exemption from REB approval suggest deficiencies in the REB process. It is also important to focus on the “human participants” of research rather than the narrower “human subjects” definition, as not all the humans that might be harmed by research are its direct subjects.

The paper and the slides are available.

November 01 2017

09:48

Internet Measurement Conference

I’m at IMC 2017 at Queen Mary University of London, and will try to liveblog a number of the sessions that are relevant to security in followups to this post.

October 30 2017

09:01

IDA and common Python issues

With IDA 7.0 switching fully to the native x64 architecture, we also switched to x64 Python, which brought some new issues but also exposed some we’ve seen before. This post tries to summarize the most common issues we’ve seen our users encounter, as well as suggestions on how to fix them, or at least diagnose where it went wrong, before you have to contact support.

Common errors

  1. The specified module could not be found

    You may see such messages in IDA’s Output window:

    LoadLibrary(C:\Program Files\IDA\plugins\python.dll) error: The specified module could not be found.
    C:\Program Files\IDA\plugins\python.dll: can't load file

    …while the file is obviously there.

    This message (it comes from the OS) is a little misleading: it actually means that one of the modules that python.dll (IDAPython plugin) links to could not be found.
    The most common one is python27.dll (python runtime) which is usually installed in %windir%\system32 but may be in a different place for whatever reason (e.g. you’re using an alternative Python distribution or user-specific Python installation). You may need to check your PATH environment variable or possibly reinstall Python (x64 version).

  2. %1 is not a valid Win32 application

    Typical output:
    LoadLibrary(C:\Program Files\IDA\plugins\python.dll) error: %1 is not a valid Win32 application.
    C:\Program Files\IDA\plugins\python.dll: can't load file

    or:

    ImportError: DLL load failed: %1 is not a valid Win32 application

    This error can happen if IDA tries to load a DLL with wrong architecture (e.g. x86 DLL into x64 IDA or vice versa). Common causes:

    • wrong variant of python27.dll present in PATH
    • wrong versions of native modules (.pyd) are being used (e.g. you have PYTHONPATH or PYTHONHOME environment variable set pointing to the x86 install)
    • installing x86 and x64 Python into the same directory (this will never work).
  3. IDAPython: importing “site” failed

    This issue is usually caused by the presence of a non-standard python27.dll in the PATH, which uses its own set of modules (you should edit PATH in this case). However, it may also happen if your Python installation is broken in some way. Reinstalling Python manually may fix it.

Diagnosis checklist

  1. Check that you have one and only one python27.dll in PATH. You can check it by executing “where python27.dll” in a command prompt (see also the snippet after this checklist). Expected output:

    c:\>where python27.dll
    C:\Windows\System32\python27.dll

  2. Check where Python is looking for modules. Check this registry key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.7\InstallPath

    expected value is C:\Python27-x64\

  3. Check your environment (type “set” in a command prompt) for any variables starting with PYTHON, especially PYTHONHOME or PYTHONPATH, and fix or remove them. If you need these variables for other software, we suggest making a .bat file to clear these variables and then start IDA.
  4. If IDAPython loads but you get import errors when running scripts, dump sys.path and check for any unexpected/wrong entries.
    Example good output:

    Python>import sys
    Python>sys.path
    ['C:\\Windows\\system32\\python27.zip', 'C:\\Python27-x64\\Lib', 'C:\\Python27-x64\\DLLs', 'C:\\Python27-x64\\Lib\\lib-tk', 'C:\\Program Files\\IDA 7.0\\python', 'C:\\Python27-x64', 'C:\\Python27-x64\\lib\\site-packages', 'C:\\Program Files\\IDA 7.0\\python\\lib\\python2.7\\lib-dynload\\ida_32', 'C:\\Program Files\\IDA 7.0\\python']

  5. Trace from which paths IDAPython is loading modules. You can do it by setting the environment variable PYTHONVERBOSE=1 before running IDA. Paths will be printed into the Output window (you can also save them to a file by adding -L<logfile> to IDA’s command line).
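If you prefer to check things from a Python prompt (including IDA’s), a small snippet along these lines (Windows-only, illustrative rather than official) prints both the module search path and the full path of the python27.dll that the process has actually loaded:

# Print the module search path and the location of the loaded python27.dll
# (Windows only; uses plain ctypes/Win32 calls).
import sys
import ctypes

print("\n".join(sys.path))

kernel32 = ctypes.windll.kernel32
kernel32.GetModuleHandleW.restype = ctypes.c_void_p
kernel32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
kernel32.GetModuleFileNameW.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_uint]

handle = kernel32.GetModuleHandleW(u"python27.dll")
if handle:
    buf = ctypes.create_unicode_buffer(260)
    kernel32.GetModuleFileNameW(handle, buf, 260)
    print("python27.dll loaded from: %s" % buf.value)
else:
    print("python27.dll is not loaded in this process")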

Reinstalling Python/IDA

In case you decide to reinstall Python and/or IDA, do not just remove their directories, as this may leave remains of the installation in the registry or elsewhere and mess up future installs. Use the corresponding uninstallers. Note that IDA’s uninstaller does not uninstall Python, so that needs to be done separately if required.
Normally the IDA 7.0 installer installs x64 Python if it’s not already installed, but you can also download a 2.7 installer from python.org (pick the “Windows x86-64 MSI installer”). You can install it into any location of your choice as long as IDA can find it (python27.dll should be in PATH); however, we recommend using C:\Python27-x64 (the “All Users” option) so it does not conflict with the 32-bit install.
Before uninstalling IDA, check that you still have the original installer. If necessary, you can request a new download (only possible with active support).

October 13 2017

14:23

Security economics MOOC running once more

Colleagues and I created a massive open online course in the economics of information security, which ran in 2015 and again in 2016.

I’m pleased to announce that it’s now running again until December 30th as a self-paced course. Registration is open here.

September 19 2017

16:54

IDA 7.0: Qt 5.6.0 configure options & patch

A handful of our users have already requested information regarding the Qt 5.6.0 build that is shipped with IDA 7.0.

Configure options

Here are the options that were used to build the libraries on:

  • Windows: ...\5.6.0\configure.bat "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "win32-msvc2015" "-opengl" "desktop" "-prefix" "C:/Qt/5.6.0-x64"
    • Note that you will have to build with Visual Studio 2015, to obtain compatible libs
  • Linux: .../5.6.0/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "linux-g++-64" "-developer-build" "-fontconfig" "-qt-freetype" "-qt-libpng" "-glib" "-qt-xcb" "-dbus" "-qt-sql-sqlite" "-gtkstyle" "-prefix" "/usr/local/Qt/5.6.0-x64"
  • Mac OSX: .../5.6.0/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "macx-g++" "-debug-and-release" "-fontconfig" "-qt-freetype" "-qt-libpng" "-qt-sql-sqlite" "-prefix" "/Users/Shared/Qt/5.6.0-x64"

patch

In addition to the specific configure options, the Qt build that ships with IDA includes the following patch. You should therefore apply it to your own Qt 5.6.0 sources before compiling, in order to obtain similar binaries.

Note that this patch should work without any modification, against the 5.6.0 release as found there. You may have to fiddle with it, if your Qt 5.6.0 sources come from somewhere else.

September 10 2017

18:09

Is this research ethical?

The Economist features face recognition on its front page, reporting that deep neural networks can now tell whether you’re straight or gay better than humans can just by looking at your face. The research they cite is a preprint, available here.

Its authors Kosinski and Wang downloaded thousands of photos from a dating site, ran them through a standard feature-extraction program, then classified gay vs straight using a standard statistical classifier, which they found could tell the men seeking men from the men seeking women. My students pretty well instantly called this out as selection bias; if gay men consider boyish faces to be cuter, then they will upload their most boyish photo. The paper authors suggest their finding may support a theory that sexuality is influenced by fetal testosterone levels, but when you don’t control for such biases your results may say more about social norms than about phenotypes.

Quite apart from the scientific value of the research, which is perhaps best assessed by specialists, I’m concerned with the ethics and privacy aspects. I am surprised that the paper doesn’t report having been through ethical review; the authors consider that photos on a dating website are public information and appear to assume that privacy issues simply do not arise.

Yet UK courts decided, in Campbell v Mirror, that privacy could be violated even by photos taken on the public street, and European courts have come to similar conclusions in I v Finland and elsewhere. For example, a Catholic woman is entitled to object to the use of her medical record in research on abortifacients and contraceptives even if the proposed use is fully anonymised and presents no privacy risk whatsoever. The dating site users would be similarly entitled to object to their photos being used in research to which they might have an ethical objection, even if they could not be identified from their photos. There are surely going to be people who object to research in any nature vs nurture debate, especially on a charged topic such as sexuality. And the whole point of the Economist’s coverage is that face-recognition technology is now good enough to work at population scale.

What do LBT readers think?
