The TextMessageFilter feature will be implemented. Since it is a stream-stream gRPC call, it will operate very similarly to the authenticators.
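As a rough sketch of what a stream-stream filter could look like: in gRPC Python, a bidirectional-streaming method receives an iterator of request messages and yields response messages. The class name, method name, and filter rule below are all hypothetical illustrations, not the project's actual API:

```python
# Illustrative sketch only: a stream-stream gRPC handler is a generator
# that consumes an iterator of incoming messages and yields outgoing ones.
# TextMessageFilterServicer / FilterMessages are assumed names.
class TextMessageFilterServicer:
    BLOCKED_WORDS = {"spam"}  # hypothetical filter rule

    def FilterMessages(self, request_iterator, context=None):
        for msg in request_iterator:
            # Drop messages containing a blocked word; pass the rest through.
            if not any(word in msg for word in self.BLOCKED_WORDS):
                yield msg
```

Because the handler is just a generator over an iterator, the pass-through logic can be tested without spinning up a gRPC server.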
Tomorrow: implement the above mentioned feature.
I tried grabbing the db download URL with Python's urlparse and opening the file locally with urlretrieve, but I was getting a big useless blob of data because db embeds the file into a page from which you can download it. But you can apparently hack the download link by changing the value in the query part of the URL. An example link looks like https://www.drpbx.com/x/xxxxxxxxxxxxxxx/LICENSE.txt?dl=0, and if you change the zero to a one it becomes a direct download link that I can open and read in Python. I wrote a function that parses the URL and rebuilds a new URL with the changed value. I'll be testing the new URL to open and retrieve a file locally. Also, I'll look into whether a similar method would work for a Google Drive link. I did a check-in with the main author, who helped me think through the issue. I really appreciate the immediate feedback of a mentorship relationship when facing an unforeseen problem. The assurance that my approach wasn't wrong, that I wasn't crazy, and the ability to think through a way out of the problem with someone is critical for building skills and confidence.
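The parse-and-rebuild step might look something like this with the standard library's urllib.parse (the function name is mine, not the actual code from the project):

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def to_direct_download(url: str) -> str:
    """Rewrite a share link's ?dl=0 query value to ?dl=1 so it can be
    fetched directly (e.g. with urllib.request.urlretrieve)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["dl"] = ["1"]  # flip the flag that switches preview page -> direct file
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

Using parse_qs/urlencode rather than a plain string replace means the rewrite still works if the query string gains extra parameters or a different parameter order.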
I collected a couple of application log samples from the server and processed them using the updated report code. There were a few bugs, which I fixed. The reporting code seems reasonably robust; however, more testing is still required.
While collecting the application logs I checked the performance of the MitM server. With just the PII handlers running, CPU usage peaked below 15%, which is pleasing given the large amount of processing occurring and the modest server specifications. Even with a low handler frequency (running handlers on every 1 in 5 requests) there are still occasional timeouts. This is most likely caused by the latency of my cloud-based setup.
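Running handlers on 1 in 5 requests amounts to a simple sampling predicate. A minimal sketch of the idea, assuming a counter-based sampler (this is a generic illustration, not the actual MitM server code):

```python
import itertools

def make_sampler(every_n=5):
    """Return a predicate that is True for exactly 1 in every_n calls.

    Deterministic counter-based sampling; a probabilistic variant
    (random.random() < 1.0 / every_n) would give the same average rate.
    """
    counter = itertools.count()
    return lambda: next(counter) % every_n == 0

run_handlers = make_sampler(5)  # True on calls 0, 5, 10, ...
```

The deterministic counter keeps the load perfectly even, while the probabilistic variant avoids any correlation between request position and sampling.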
Have been running tests with VTune this week to try to understand how my code is interacting with my CPU and my cache. I have a meeting with David set up tomorrow to discuss the results and talk about where I can head from here.
At this point, I believe that I'm just trying to figure out how to design one of these filters, practically. I've started looking around at tutorials on particle filter design to attempt to get a better grasp on that side of things. For the short amount of time that I've been studying them, I believe that I have a pretty decent grasp of the theory behind a particle filter. Bridging the gap to actually designing my own is an interesting challenge.
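As a self-check on the theory, a toy bootstrap particle filter for a 1D random-walk model fits in a few lines. This is a generic textbook sketch (predict, weight by observation likelihood, resample), not the filter being designed here, and all parameter names are mine:

```python
import math
import random

def particle_filter_step(particles, weights, observation,
                         process_noise=1.0, obs_noise=1.0):
    """One predict/update/resample cycle of a bootstrap particle filter
    for a 1D random-walk state observed with Gaussian noise."""
    n = len(particles)
    # Predict: propagate each particle through the motion model (random walk).
    particles = [p + random.gauss(0.0, process_noise) for p in particles]
    # Update: weight each particle by the Gaussian likelihood of the observation.
    weights = [w * math.exp(-0.5 * ((observation - p) / obs_noise) ** 2)
               for p, w in zip(particles, weights)]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / n] * n  # degenerate case: all likelihoods underflowed
    else:
        weights = [w / total for w in weights]
    # Resample: draw particles proportional to weight, reset weights to uniform.
    particles = random.choices(particles, weights=weights, k=n)
    weights = [1.0 / n] * n
    return particles, weights
```

Feeding the same observation for a few steps should pull the particle cloud toward it, which makes a convenient sanity check before tackling a real design.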
Tomorrow: implement more wishlist features.
It's a holiday here in Canada, so what I was planning for today has been moved to Tuesday.
I finished updating the PII data JSON report to handle recent application log changes. The report is looking good now, and for each online domain it shows:
- PII disclosed over unencrypted (non-HTTPS) connections
- PII disclosed over encrypted (HTTPS) connections
- Query-string key/value pairs occurring across multiple requests over unencrypted (non-HTTPS) connections
Note: although some pairs are anonymous, when they persist across multiple requests they could allow user tracking.
Here is a sample of the PII data report - https://github.com/mkenne11/nogotofail-pii/blob/57b24763c9f99f75a892093c778734f929db8333/nogotofail/mitm/report/samples/pii_data_report.json
I ironed out a few bugs in the report. Tomorrow I will process more application log samples and fix any issues I find.