My GSoC Adventure

Noseyparker - 22 June

I discovered a tutorial on how to add a Python linter to the Atom editor. The linter was great and detected redundant statements (unused imports and variables), incorrect formatting and poor indenting. It has already flagged some issues in my code, and should make my coding quicker and more readable. The tutorial was: http://www.marinamele.com/install-and-configure-atom-editor-for-python

I tried the timing decorator (mentioned in Friday's post) and had more success timing the critical HTTP PII detection methods. Good news - the timer showed each method executed in less than 0.0002s. A typical mobile app will trigger 4 detection methods (HTTP query string, headers, request message body and response message body) - so the combined time should be under 0.0008s, i.e. less than 1 millisecond. This is very good performance! ... so good in fact I'm skeptical ... considering it's an interpreted language running on modest hardware ... I'll keep investigating. Anyways here is the timing decorator code: https://gist.github.com/mkenne11/f6ab4e24463b0e3bed46
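
For readers who don't want to open the gist, here is a minimal sketch of the general timing-decorator idea. It is not the code from the gist: the names timed, last_elapsed and detect_pii_in_query_string are made up for illustration, and the detection function is only a stand-in.

```python
import functools
import time


def timed(func):
    """Wrap func so the duration of its most recent call is stored on the wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Elapsed wall-clock time in seconds, recorded even if func raises.
            wrapper.last_elapsed = time.perf_counter() - start
    wrapper.last_elapsed = None
    return wrapper


@timed
def detect_pii_in_query_string(query, pii_values):
    # Hypothetical stand-in for a detection method: list the PII values found in the query.
    return [value for value in pii_values if value in query]


found = detect_pii_in_query_string("lat=51.5&email=a%40b.com", ["51.5", "555-0100"])
print(found, detect_pii_in_query_string.last_elapsed)
```

For calls this short a single reading can be noisy, so averaging over many calls (for example with the standard timeit module) may give a steadier number.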

My repo on GitHub is now public yay! It's called nogotofail-pii and the link is (pointing to dev branch): https://github.com/mkenne11/nogotofail-pii/tree/dev

I kept a private copy of the repo which contains the full commit history.

Tomorrow - I will improve my code in the Java client. I'll also explore methods I can use in the next stage of the project ... reading and analysing nogotofail logs.

Noseyparker - 19 June

I tried 2 Python profiling libraries - cProfile and profilehooks - to find performance hotspots and the time taken to run key functions in my code.

Both libraries produced timing results for functions, but they didn't present timings for the key functions. I looked online for clues but unfortunately couldn't find any. The parent classes for these methods are 4 levels deep in inheritance, and I suspect the profiler can't trace code this deep.
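
One way to test that suspicion in isolation is a toy experiment like the sketch below. The class names Level1 to Level4 and the detect method are made up and only stand in for the real handler classes. It profiles a call to a method inherited through 4 levels and then filters the cProfile report by function name.

```python
import cProfile
import io
import pstats


class Level1:
    def detect(self, body):
        # Hypothetical stand-in for a detection method defined on a base class.
        return "email=" in body


class Level2(Level1):
    pass


class Level3(Level2):
    pass


class Level4(Level3):
    pass


profiler = cProfile.Profile()
profiler.enable()
result = Level4().detect("user=1&email=a@b.com")
profiler.disable()

# Print only the entries whose name matches "detect", sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats("detect")
report = stream.getvalue()
print(report)
```

If the inherited method shows up in a toy case like this, inheritance depth alone may not explain the missing timings. It may then be worth checking other possibilities, for example whether the handlers run in a different thread from the one being profiled.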

I'll keep trying settings, as well as some basic timing code. The decorator timing code technique (option 1) looks like a neat solution I'll try. See link: http://www.marinamele.com/7-tips-to-time-python-scripts-and-control-memory-and-cpu-usage

I'd like to make my GitHub repository public soon. Unfortunately in early commits I hard-coded personal information, which I want to remove first. I'll work out which commits I need to squash together to remove this - I do plan to keep a private repository copy with all of these commits.

Noseyparker - 18 June

Today I finished changes to the Android test harness app. I spent some time restructuring the code, moving the new "base" code into a separate PII test class - rather than use the generic test class. I created a new class because the PII tests are quite different to the generic tests.

I also tidied up the server HTTP PII detection handlers. I reviewed these classes again and realized they could be better structured, to improve readability and efficiency.

My task for tomorrow is to investigate profiling of the server HTTP PII detection handler (Python) code. From my research so far cProfile and pycallgraph look useful.

Noseyparker - 17 June

Today I finished the bulk of the work on the test harness app. Writing code to post information in an HTTP request message body (using the POST method) was tricky. I first tried using the java.net.URLConnection library to post PII data in the message body, however it wasn't showing up in the body. I then tried the org.apache.http library to send the data in JSON format - and miraculously the data appeared in the body! (Stack Overflow where would I be without you!!).

There is one scenario I couldn't code in the test harness - testing if PII appears in an HTTP response message body. This will be hard to implement, but luckily I found an app which exhibits this behavior. I will add this to the GitHub project issues list and come back to it.

I plan to tidy-up the test harness client code tomorrow and then move back to working on the server application.

Noseyparker - 16 June

Managed to solve the problem with obtaining the Application Context in the Android test harness. The problem - I was trying to fetch the Context from thin air. I realized after reading through Stack Overflow that this approach is problematic.

Instead the problem was fixed (like most things in life) by working with the tool (Android) the way it was designed, and accessing the App Context from within an Activity (basis for Android UIs).

Things went smoother after that, and I created methods to test PII detection in HTTP query strings and headers. At first, I was reading the test PII from the Android "strings.xml" file. This is a hassle, as it requires the Android test harness to be recompiled each time PII elements are updated. I am in the process of updating the code to read device PII using the Android API. This leaves just some PII identifiers and details that need to be in the strings.xml file.

The good news is the server PII detection methods I tested (using the test harness) worked successfully!
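
To give a flavour of what a query string check can look like on the server side, here is a rough sketch. It is not the nogotofail-pii implementation: the function find_pii_in_url, the PII labels and the example values are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def find_pii_in_url(url, pii_items):
    """Return the labels of PII items whose value appears in a query string value.

    pii_items maps a label (e.g. "email") to the PII value to look for.
    """
    # parse_qsl also undoes percent-encoding, so "%40" is compared as "@".
    query_values = [value for _, value in parse_qsl(urlsplit(url).query)]
    return sorted(
        label
        for label, pii in pii_items.items()
        if any(pii in value for value in query_values)
    )


# Made-up test data of the kind a test harness might send.
test_pii = {"email": "jane@example.com", "imei": "356938035643809"}
hits = find_pii_in_url("http://example.com/track?e=jane%40example.com&id=42", test_pii)
print(hits)
```

The same matching could be applied to header values by passing them in place of the parsed query values.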

Tomorrow, I'll finish this and work on test harness functions to validate PII detection in HTTP message bodies.

Noseyparker - 14 June

I spent some time over the weekend tidying up my code and refactoring. The result is everything looks neater and efficiency should have improved slightly.

Today I started work on the existing nogotofail test harness, adding test cases to validate new functionality. The test harness (like the rest of the application) is well written and it didn't take too long to understand its structure.

I've added the PII parameters to be tested to the Android test app config file (strings.xml). However, so far I have been unable to get the Application Context needed to read this file. That's tomorrow's task!

Noseyparker - 12 June

Today was productive - I finished coding the event handler to detect PII in unencrypted HTTP message bodies, for both requests and responses. I tested the code and found a surprising number of apps sending PII this way.

I re-tested some of the event handlers I had previously written and added some additional error handling to manage exceptions.

I also started tidying up code and refactoring. I expect this will take at least a day to complete.

Noseyparker - 11 June

I continued to work on the network traffic event handler to detect PII in HTTP message bodies. I implemented the code to decompress (inflate) HTTP message bodies compressed using deflate.

I tested parsing retrieved HTTP message body content for PII and it appears to work - however more testing is needed. Performance is still good: mobile app requests aren't timing out and the apps seem responsive. I was surprised to see that peak CPU usage is still below 11%.

Truncating large HTML message bodies did create an issue. The Python compression library (zlib) method I was using didn't support decompressing partial files, however after consulting the Oracle (Stack Overflow) I found another method that did!
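
The post doesn't say which method did the trick. One standard-library route that does accept an incomplete stream is zlib.decompressobj, sketched below. The sample body and the choice to cut the stream in half are made up for illustration.

```python
import zlib

# Hypothetical, not-too-repetitive body so the compressed stream is reasonably long.
original = b"".join(b"user%d=value%d;" % (i, i * 7919) for i in range(500))
compressed = zlib.compress(original)

# Simulate a truncated capture by keeping only the first half of the stream.
truncated = compressed[: len(compressed) // 2]

# The one-shot function rejects an incomplete stream.
one_shot_failed = False
try:
    zlib.decompress(truncated)
except zlib.error:
    one_shot_failed = True

# A decompression object returns whatever it could decode from the bytes it was given.
decompressor = zlib.decompressobj()
partial = decompressor.decompress(truncated)

print(one_shot_failed, len(original), len(partial), decompressor.eof)
```

The partial output is a prefix of the original body, which is enough for scanning the start of a large response for PII.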

Noseyparker - 10 June

I worked more on the network traffic event handler to detect PII in HTTP message bodies. I coded the handler to uncompress gzip'ed HTTP body content using Python (zlib library). I also found some HTTP responses using deflate compression; this appears to be straightforward to decompress using the standard library.

I examined app HTTP responses containing html/text content, and it appears some mobile apps act as wrappers for HTML generated content (hybrid apps) - and HTML responses can be large. This could have a big impact on performance, so I will look at truncating them to a smaller size.
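
As a rough sketch of how both encodings can be handled with zlib alone (the helper name inflate and the sample body are made up, and this is not the handler code itself): the wbits argument tells zlib which wrapper to expect. For "deflate" the sketch tries the zlib-wrapped form first and falls back to a raw deflate stream, since some servers are known to send the raw form.

```python
import gzip
import zlib

body = b'{"email": "jane@example.com"}'

# Three encodings of the same made-up body, as a server might send them.
gzip_body = gzip.compress(body)
zlib_body = zlib.compress(body)
raw_compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_body = raw_compressor.compress(body) + raw_compressor.flush()


def inflate(data, encoding):
    """Decompress an HTTP body according to its content-encoding value."""
    if encoding == "gzip":
        # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        return zlib.decompress(data, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        try:
            # Default wbits expects the zlib wrapper around the deflate stream.
            return zlib.decompress(data)
        except zlib.error:
            # Negative wbits means a raw deflate stream with no wrapper.
            return zlib.decompress(data, -zlib.MAX_WBITS)
    # Unknown or absent encoding: return the body unchanged.
    return data


print(inflate(gzip_body, "gzip") == body)
```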

Noseyparker - 9 June

I'm feeling more confident in my understanding of the existing code, and I feel (after lots of reading code) I know better how the network traffic event handlers work.

I added a new event method to the (Non-HTTPS) request handler class to catch HTTP responses. The new event method appears to fire when expected. I created a new class (inheriting from the modified one) and caught HTTP responses - I was also able to detect if they used GZIP compression ("content-encoding" header) and if they contained content types I'm interested in ("content-type" header) such as json, html and text.
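
The header checks can be sketched roughly as follows. This assumes the response headers are available as a plain dict, which may not match how nogotofail exposes them, and the names classify_response and INTERESTING_TYPES are made up.

```python
# Content types worth scanning for PII, matched as substrings of the header value.
INTERESTING_TYPES = ("json", "html", "text")


def classify_response(headers):
    """Return (is_gzip, is_interesting) for a dict of HTTP response headers."""
    # Header names are case-insensitive, so normalise before looking them up.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    encoding = normalized.get("content-encoding", "")
    content_type = normalized.get("content-type", "")
    is_gzip = "gzip" in encoding
    is_interesting = any(kind in content_type for kind in INTERESTING_TYPES)
    return is_gzip, is_interesting


flags = classify_response(
    {"Content-Encoding": "gzip", "Content-Type": "application/json; charset=utf-8"}
)
print(flags)
```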

I still need to investigate other HTTP compression techniques like deflate.
