My GSoC Adventure

Noseyparker - 6 July

Today I worked more on the code to generate Application PII report data. I feel I'm close to finishing it and just need to iron out some logic bugs. It will be exciting to see the output from this report, as it will provide great insight into the accumulated data mobile apps send.

I spent some time addressing comments and making changes to the conference paper. Reviewing the results I collected a few months ago raised some questions about the PII mobile apps send to advertising providers - I'll be able to answer some of these questions with the functions I'm developing. I'm being careful not to let the scope creep; the more I look into it, the more features I find I could add :)

Noseyparker - 3 July

I began work on the 3rd JSON reporting structure today: Application PII data. This will display personal data leaked (unencrypted) or sent (encrypted) to online services, and flag privacy issues. The main purpose of this report is to show the accumulated data sent by a mobile app during a session. Earlier research I did showed that the accumulated personal data disclosed during a session could be significant.
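As an illustration only, here is a minimal sketch of what "accumulated" could mean for this report. It assumes each logged event is a dict with the hypothetical keys `app`, `pii_type` and `encrypted`; these names and the sample app names are made up for the example and are not the project's actual log schema.

```python
from collections import defaultdict


def accumulate_pii(events):
    """Collect the distinct PII types each app disclosed during a session.

    `events` is an iterable of dicts with hypothetical keys:
    'app' (str), 'pii_type' (str) and 'encrypted' (bool).
    """
    report = defaultdict(lambda: {"leaked": set(), "sent": set()})
    for event in events:
        # Unencrypted PII counts as "leaked", encrypted PII as "sent".
        bucket = "sent" if event["encrypted"] else "leaked"
        report[event["app"]][bucket].add(event["pii_type"])
    # Convert sets to sorted lists so the result can be serialised as JSON.
    return {
        app: {kind: sorted(values) for kind, values in buckets.items()}
        for app, buckets in report.items()
    }


session_events = [
    {"app": "com.example.weather", "pii_type": "location", "encrypted": False},
    {"app": "com.example.weather", "pii_type": "device_id", "encrypted": True},
    {"app": "com.example.weather", "pii_type": "location", "encrypted": False},
    {"app": "com.example.game", "pii_type": "email", "encrypted": False},
]
pii_report = accumulate_pii(session_events)
```

Using sets means a PII type sent many times in one session is reported once per app, which keeps the focus on what was disclosed rather than how often.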

Noseyparker - 2 July

Today I started by fixing bugs in my first report data structure, application messages, i.e. application log messages grouped by HTTP hostname. Once this class was stable I used it as the basis for my second data structure, application alerts, i.e. application log messages grouped by alert type - INFO, WARNING, ERROR, DEBUG. It was much easier to create this second class. I also spent some time researching Python object-oriented features, looking at private methods, abstract methods and properties.
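The relationship between the two structures can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: the class names `GroupedReport`, `ApplicationMessages` and `ApplicationAlerts`, and the entry keys `host`, `level` and `message`, are all hypothetical. It uses an abstract property so each report only has to declare which field it groups by.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class GroupedReport(ABC):
    """Hypothetical base class: groups log messages by one field."""

    def __init__(self, entries):
        self._entries = list(entries)  # leading underscore marks it as private

    @property
    @abstractmethod
    def group_key(self):
        """Name of the entry field to group by; subclasses must define it."""

    def build(self):
        grouped = defaultdict(list)
        for entry in self._entries:
            grouped[entry[self.group_key]].append(entry["message"])
        return dict(grouped)


class ApplicationMessages(GroupedReport):
    group_key = "host"  # group by HTTP hostname


class ApplicationAlerts(GroupedReport):
    group_key = "level"  # group by alert type


log_entries = [
    {"host": "api.example.com", "level": "WARNING", "message": "PII sent unencrypted"},
    {"host": "ads.example.net", "level": "INFO", "message": "Request logged"},
    {"host": "api.example.com", "level": "INFO", "message": "Request logged"},
]
by_host = ApplicationMessages(log_entries).build()
by_level = ApplicationAlerts(log_entries).build()
```

With the shared logic in the base class, the second report is only a couple of lines, which matches the experience of the second class being much easier to write.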

Noseyparker - 1 July

I was able to generate a JSON data structure for my first report today - application alerts. Mapping data from multiple log files into a single structure was challenging. To work out the correct logic I needed to write all the steps as pseudocode and verify the logic, before I added code. I left the pseudocode as comments. Tomorrow I will verify the structure is correct and fine tune the logic.

I had good news last week. A research paper I co-authored, which was the inspiration for this project, was accepted at an upcoming conference. I'll need to update the paper to address the reviewers' comments - so I will probably need to go offline for a day or so soon.

Noseyparker - 30 June 2015

I spent most of today working on the JSON data structure for the initial report - a list of mobile apps and the WARNING, ERROR and CRITICAL events detected for each. I added the draft data structure to the GitHub repo for reference: application_entries.json

It's taking longer to define the data structure than I first thought. I'm starting to better appreciate the effort and complexity of defining good data schemas!

I spent time working on the code to transform the log data into this format. However, I discovered a missing element in the existing log format that needs to be addressed first.

Recently I read about Firefox introducing features to prevent user tracking. I'm interested in exploring user tracking in mobile apps - something I don't believe has been researched yet. I looked at the EasyPrivacy tracking (URL/domains) list. This is a stretch objective for my project and I added it as a project issue to explore if I have time.

Noseyparker - 29 June

Over the weekend I created a code snapshot (release) in my GitHub repository for the GSoC midterm point. It was a good time to do this - most of the core functionality has been completed. This release can be found at:

I continued work on processing application logs. Problems with the regular expressions to parse application and event logs have been worked out - I am now able to read their contents into Python dictionaries. I investigated data structures to transform the log information into HTML reports - I plan to experiment with 2 reports based on JSON data structures:

  • A list of mobile apps and WARNING, ERROR and CRITICAL events detected for each
  • The accumulated PII disclosed for each mobile app

filter.js looks like a good choice to render the JSON data into HTML treeviews and provide basic client-side filtering. It also appears to have a nice HTML template feature:

    Noseyparker - 26 June

    I spent more time working on regular expressions to filter application logs. The existing application log format is challenging to filter as there are a few exceptions to the normal rules.

    JSON is the format I am considering for storing the log entries. I also had a look at the Python ObjectPath library, which is used to filter JSON. It appears to be fairly comprehensive.

    However, JSON may not be the best format - processing the logs may require a lot of querying and table joins, and semi-structured data like JSON doesn't seem to be ideal for this.
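To show the kind of query that is awkward with nested JSON but natural with tables, here is a small sketch using Python's built-in `sqlite3` module. This is a swapped-in illustration of the relational alternative, not something the project uses; the table names, columns and sample rows are all hypothetical.

```python
import sqlite3

# In-memory database with two hypothetical tables: apps and their log events.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (app_id INTEGER, level TEXT, message TEXT);
    """
)
conn.executemany(
    "INSERT INTO apps VALUES (?, ?)",
    [(1, "com.example.weather"), (2, "com.example.game")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "WARNING", "PII sent unencrypted"),
        (1, "INFO", "Request logged"),
        (1, "ERROR", "Certificate check failed"),
        (2, "CRITICAL", "Password sent unencrypted"),
    ],
)

# Join events to apps and count the serious events per app.
rows = conn.execute(
    """
    SELECT apps.name, COUNT(*)
    FROM events
    JOIN apps ON apps.id = events.app_id
    WHERE events.level IN ('WARNING', 'ERROR', 'CRITICAL')
    GROUP BY apps.name
    ORDER BY apps.name
    """
).fetchall()
conn.close()
```

The trade-off is that a relational store needs a schema up front, whereas JSON can be dumped as-is and rendered directly in the browser.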

    Noseyparker - 25 June

    Today I worked on a Python function to parse the application main log file. I'm finding it hard to specify regexes that match the log format ... mostly because I'm rusty with regex. I'll keep looking at it tomorrow.
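For readers unfamiliar with the approach, parsing a log with a regex into Python dictionaries might look like the sketch below. The line format (timestamp, level, hostname, message) is invented for the example; the real application log format is different and has exceptions this pattern does not handle.

```python
import re

# Hypothetical line format: "<date> <time> <LEVEL> <host> <message>".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL) "
    r"(?P<host>\S+) "
    r"(?P<message>.*)$"
)


def parse_log(lines):
    """Return one dict per line that matches LOG_LINE; skip the rest."""
    parsed = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            # Named groups become the dictionary keys.
            parsed.append(match.groupdict())
    return parsed


sample = [
    "2015-06-25 10:15:02 WARNING api.example.com PII sent unencrypted",
    "not a log line",
]
records = parse_log(sample)
```

Named groups keep the pattern readable and mean the parsed result is already a dictionary, which is handy when the next step is JSON output.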

    Noseyparker - 24 June

    Today I tidied up the Android (Java) client code so the updated code is as close to the original code base as possible. In yesterday's post I mentioned the Android Studio IDE had reformatted this code. After a few hours of trial and error I worked out what the editor settings needed to be so that no auto-reformat occurs. Then it was straightforward to reapply the changes I had made and verify the commit diffs were cleaner. I'm now more relaxed (I dislike code clutter)!

    The rest of the day I looked more into methods for reading and formatting the application logs for the next project phase. I will continue on this tomorrow.

    One more task left for today - now that the repo is public I will email the main author and check if he has any feedback.

    Noseyparker - 23 June

    Today I updated the Android client to improve the PII data format passed to the server - it was messy and inconsistent with the structure used on the server. I completed the change and merged it into the dev branch.

    The time I spent adding PII tests to the test harness was well spent. I found regression testing of the code was much quicker using it.

    I did notice that the Android Java IDE (Android Studio) appears to be modifying the file format, creating messy file "diff"s when I merge changes. I explored the IDE settings and believe I have turned off the auto-format settings. Fortunately I've only modified about 5 Java files so (fingers crossed) it shouldn't be too hard to apply the changes to the original files and merge again.

    Why do IDEs feel they have to try and second guess you?! :(
