FAQ: How to find vulnerabilities?

By Gynvael Coldwind | Mon, 14 Aug 2017 00:10:59 +0200 | @domain: gynvael.coldwind.pl
Obligatory FAQ note: Sometimes I get asked questions, e.g. on IRC, via e-mail or during my livestreams. And sometimes I get asked the same question repeatedly. To save myself some time (*cough* and be able to give the same answer instead of conflicting ones *cough*) I decided to write up selected question and answer pairs in separate blog posts. Please remember that these answers are by no means authoritative - they are limited by my experience, my knowledge and my opinions on things. Do look in the comment section as well - a lot of smart people read my blog and might have a different, and likely better, answer to the same question. If you disagree or just have something to add - by all means, please do comment.

Q: How does one find vulnerabilities?
A: I'll start by noting that this question is quite high-level - e.g. it doesn't reveal the technology of interest. More importantly, it's not clear whether we're discussing a system vulnerability (i.e. a configuration weakness or a known-but-unpatched bug in an installed service) that one usually looks for during a regular network-wide pentest, or if it's about discovering a previously unknown vulnerability in an application, service, driver / kernel module, operating system, firmware, etc. Given that I'm more into vulnerability research than penetration testing, I'll assume it's the latter. The answer will be as high-level as the question, but it should give one a general idea.

My personal pet theory is that there are three* main groups of methods (I'll go into more detail below):
* If I missed anything, please let me know in the comments; as said, it's just a pet theory (or actually a pet hypothesis).

1. Code review (this also includes code that had to be reverse-engineered).
2. Black box (this includes using automated tools like scanners, fuzzers, etc).
3. Documentation research.

All of the above methods have their own requirements and limitations, and each is good at different things. There is no "best method" that always works - it's target specific, I would say. Usually a combination of the above methods is used during a review of a target anyway.
1. Code review
Requirements:
• Knowledge of vulnerability classes** specific to the given technology, as well as universal bug classes.
• [In case of binary targets] Reverse engineering skills.
Benefits:
• Ability to find quite complicated bugs.
Limitations:
• Takes a lot of time and focus.


** - A vulnerability class is a type of vulnerability that is usually well known; there are known mitigations and sometimes even known patterns/ways/tools to discover it. Examples include a stack-based buffer overflow, a reflected XSS or an SQL injection. Sometimes security bugs occur due to a couple of problems mixed together (a common example would be an integer overflow leading to a buffer overflow). Please note that not all vulnerabilities being found have actually been classified (as in "assigned to a class", not "kept secret") - you might encounter application-specific or technology-specific bugs which don't fall into any common category. For comprehensive lists of classes please check out these three links: Common Weakness Enumeration (MITRE), Adversarial Tactics, Techniques & Common Knowledge (MITRE) and OWASP Periodic Table of Vulnerabilities (special thanks to these folks on twitter and cody for the links).

The general idea is to analyze the code and try to pinpoint both errors in logic and classic vulnerability classes (e.g. XSSes, buffer overflows, wrong ACLs, etc). This method is basically as good as the researcher is - i.e. if one loses focus (and just skims through the code instead of trying to truly understand it) or a given vulnerability class is unknown to them, then a bug will be missed.

This method doesn't really scale - the smaller the project, the easier it is to do a full code review. The more lines of code there are, the more necessary it becomes to limit the review to only the interesting sections of code (usually the ones where we expect the most bugs to be, e.g. protocol / file format parsers, places where user input is used, etc).

2. Black box
Requirements:
• Knowledge of vulnerability classes specific to the given technology, as well as universal bug classes.
• [Automated black box] Knowledge of how a given tool works and how to set it up.
Benefits:
• [Automated black box] Scales well.
• [Manual black box] If you trigger a bug, you found a bug. The rate of false positives will be limited (whereas e.g. during a code review you might think you've found a bug, but later discover that something is checked/sanitized/escaped in a lower/upper layer of the code).
Limitations:
• [Automated black box] Scanners/fuzzers are great tools, but they are pretty much limited to low-hanging fruit. Most complicated bugs won't be discovered by them.
• [Manual black box] Takes less time than a code review, but still doesn't scale as well as e.g. automated black box.
• [Manual black box] Limited by the imagination of the researcher (i.e. a lot of complicated bugs will probably not be discovered this way).


The idea here is to "poke" at the application without looking at the code itself, focusing on the interactive side instead. This method attacks the application from the surface and observes its responses, rather than going through its internals as one would during a code review.

Manual black box is usually what I would start with when working with a web application, as it gives one a general overview of the tested target. In case of other technologies setting up the application and poking around (checking ACLs, etc) doesn't hurt either.

Automated black box usually scales very well. From my experience, it's good to set up a fuzzer / scanner and let it run in the background while using another (manual) method at the same time. Also, don't forget to review the findings - depending on the technology and the tool there might be a lot of false positives or duplicates. And remember that these tools are usually limited to only a subset of classes and won't find anything beyond that.
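
To make the fuzzing part a little less abstract, here is a minimal sketch of a mutational fuzzer loop in Python (the seed file and target binary names are hypothetical, and real fuzzers like AFL are far more sophisticated than this):

import random
import subprocess

seed = open("sample.bin", "rb").read()  # hypothetical seed input

for i in range(100000):
    case = bytearray(seed)
    # Flip a few random bytes of the seed to create a new test case.
    for _ in range(random.randint(1, 8)):
        case[random.randrange(len(case))] = random.randrange(256)
    with open("case.bin", "wb") as f:
        f.write(bytes(case))
    # On POSIX a negative return code means the process died on a signal,
    # which is a decent first-order crash indicator.
    if subprocess.call(["./target", "case.bin"]) < 0:
        with open("crash_%05d.bin" % i, "wb") as f:
            f.write(bytes(case))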

3. Documentation research
Requirements:
• Experience as a programmer or knowledge of how a programmer thinks.
• Knowledge of vulnerability classes specific to the given technology, as well as universal bug classes.
Benefits:
• Possibility of discovering the same vulnerability in a set of similar targets.
Limitations:
• Limited by the imagination of the researcher.
• Focuses on a single protocol / format / part of the system.


The general idea is to go through the documentation / specification / reference manual and pinpoint places where the programmer implementing the target might have understood something incorrectly. In contrast to both code review and black box review, there is no interaction with the tested implementation until later phases. To give you an example, please consider the following:

Specification of a format says:
The output codes are of variable length, starting at <code size>+1 bits per code, up to 12 bits per code.

What some programmers might think (e.g. when having a bad day or due to lack of experience):
No need to check the code's length - it's guaranteed to be 12 bits or less.

The correct interpretation:
Verify that the length is 12 or less, otherwise bad things may happen.

And yes, this is a real example from the GIF specification and vulnerabilities related to this were discovered.

Having found an interesting bit, one usually creates a set of inputs that break the rule specified in the documentation and then uses them to test a given implementation (and perhaps other implementations as well) to check if any problem is triggered (alternatively, one can also do a selective code review to check if the programmer got this right).
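
For a simple case where the rule concerns a single field, such a test-case generator could look like the following hedged sketch (the sample file, field offset and parser binary are all hypothetical; the real GIF case is harder, as it requires crafting the compressed LZW stream itself):

import subprocess

sample = open("sample.bin", "rb").read()  # hypothetical valid input
FIELD_OFFSET = 0x0A  # hypothetical offset of the field under test

# Try every value the documentation says must never occur and watch
# how the tested implementation reacts to each of them.
for bad in range(13, 256):
    case = bytearray(sample)
    case[FIELD_OFFSET] = bad
    with open("case.bin", "wb") as f:
        f.write(bytes(case))
    if subprocess.call(["./parser", "case.bin"]) < 0:
        print("parser crashed for out-of-spec value %d" % bad)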

In some cases it's possible to automate browsing the documentation as was shown e.g. by Mateusz "j00ru" Jurczyk and (a few years later) by James Forshaw.

Final notes
Please remember that once you think you've discovered a vulnerability the work is not yet done. Usually the following steps need to be taken:

0. [Prerequisite] A potential vulnerability must be discovered.
1. [Usually not needed in black box review] An exploit triggering the vulnerability must be created to make sure the code path to the vulnerability is actually reachable (it happens surprisingly often that at this phase it turns out the vulnerability is not triggerable for various reasons).
2. A proof of concept exploit (commonly referred to as a PoC) must be created to prove the actual effect of the vulnerability (is it really code execution? maybe for some reason its effects are limited to a denial of service?).
3. [Optional] During penetration testing one usually would also create a fully weaponized exploit (spawning a shell, limiting the damage to the process, etc).

And also, please remember that not every bug is a security bug. The rule of thumb here is the following:
• A security bug (i.e. a vulnerability) breaks a context boundary (e.g. is triggerable from low-privileged context, but affects high-privileged context that normally is inaccessible to the attacker).
• A non-security bug stays within the same context (e.g. it influences only the attacker's domain, things that the attacker could do anyway or things that affect only the attacker).

One more note: even security bugs are not always the end of the world (hint: when reporting a bug, don't try to oversell it) - there are several more things that one needs to take into consideration, like the severity (in the worst case scenario, what can a skilled attacker do with such a bug?) and the risk of exploitation (i.e. will anyone even care to exploit this in the real world? or maybe one has to do a billion attempts before hitting the correct conditions, and with each attempt the server reboots?). An example of a high-severity, high-risk bug is a stable remote code execution in the Apache web server. On the other hand, an example of a low-severity, low-risk bug is changing the language of the UI in a web framework used by 10 people by exploiting an XSRF that requires guessing a 16-bit number (sure, an attacker could do it, but why bother?). A common mistake made by junior researchers (and this includes me in the past as well) is to claim high severity and high risk for every bug, when that's obviously not the case.

And here you go - a high-level answer to a high-level question.

In practice, start by learning vulnerability classes specific to the technology you're most interested in (and in time extend this to other technologies and universal vulnerability classes as well, of course), and then try all of the above methods or a mix of them. And don't be afraid to dedicate A LOT of time to this - even great researchers might go weeks without finding anything (but one still learns through this process), so don't expect to find a good bug in a few hours.

Windows Kernel Debugging - archived videos

By Gynvael Coldwind | Sat, 12 Aug 2017 00:10:58 +0200 | @domain: gynvael.coldwind.pl
As I mentioned in this post, the last four livestreams on my YouTube channel were done by Artem "honorary_bot" Shishkin (github) and were on the quite anticipated and demanding topic of Windows kernel debugging, with a healthy dose of both x86 from a system programming perspective and an unexpected but very welcome venture into the world of hypervisors. The series has come to an end, so I would like to thank Artem again, both for reaching out to me offering to do the streams and for actually doing them in such a spectacular fashion - speaking for myself, I've learnt a lot!

All the videos are already uploaded on YouTube (links below), so in case you've missed them, nothing is lost (well, maybe except for the ability to ask questions, but I guess one can always reach out to Artem on Twitter). Please note that the links that Artem visited during the livestream are available for your convenience in each of the video descriptions on YouTube (if I missed anything please let me know).

Windows Kernel Debugging - Part I, in which Artem shows how to configure your kernel debugging environment in several different ways, including both a virtual machine (with/without VirtualKD) and a second PC (controlled using Intel AMT and connected using various means, e.g. USB, FireWire or Ethernet).

Windows Kernel Debugging - Part II, during which Artem shows how to work with and configure WinDbg.

Windows Kernel Debugging - Part III, in which Artem goes through the meanders of virtual memory and navigating through it using WinDbg. He also goes into the details of what's in a process's and the kernel's virtual memory.

Windows Kernel Debugging - Part IV, in which Artem showcases physical memory and explains why a physical address is not always equal to a RAM address, as well as ventures into the land of ACPI tables (if you're thinking about OSDev, you should check out this part regardless of whether you're interested in Windows kernel debugging or not). Artem also demos a hypervisor-level system debugger of his own making.
On a more technical note, this was the first stream I've done with someone in a remote location (both the streams I've done with Carstein and lcamtuf were at my place), so at the initial concept phase it was a technical riddle that needed to be solved. There were two factors that came into play:

• First, I did not want to go the route of capturing skype's/hangout's/your-favorite-video-conference-tool's window / sounds and restreaming that. I do believe it may work rather well, but it seems to have many unnecessary steps in the middle, and the user usually has very little say about the quality, codecs, etc. Though it must be noted that the capture→transmit→display lag is pretty small, as these tend to be real-time communication tools.

• Second, Artem was already used to Open Broadcaster Software, so we wanted to take that into account.

In the end what worked (and what surprisingly was also the first idea) was to use an RTMP restreaming server, i.e. Artem would connect to the RTMP restreaming server as the broadcaster and upload his OBS-generated livestream there, and my OBS (using the Media source, aka the VLC plugin) would download it and embed it into one of my scenes (so, to answer one of the questions that appeared on chat, no, Artem did not have to stream from my basement). This worked surprisingly well, with about 2-5 seconds of lag between Artem's action and me seeing it in OBS (where my OBS+YouTube added another 3-10 seconds before viewers saw the effect of an action).

However, given the total lag mentioned above (going into 15-second territory in the worst case scenario), we also used a voice chat for synchronizing scene switches, so that Artem would know his stream was live the moment I switched the scene (and not 15 seconds later when he saw it on YouTube; it was a little easier for me as I saw the video signal he broadcasted in the OBS preview with a much smaller lag).

All in all, thanks for these streams! Apart from learning quite a lot of cool tricks / details about WinDbg/kernel debugging/x86, I've also learnt a couple of useful tricks for doing live video broadcasting - this should give me more options to do livestreams with guests in the future. I can also say that the 4 displays I'm currently using were barely enough to see all the important windows (video previews, chats, OBS, music, voice+chat with Artem, etc) at the same time - I guess I've learnt first-hand why real pro broadcasting crews have more monitors than NASA's space center ;).

So once again: thanks and kudos go to Artem! Having the opportunity I would also like to thank my livestream crew (especially foxtrot_charlie and KrzaQ) for their usual and irreplaceable help.

And that's it.

Gynvael's Summer GameDev Challenge 2017

By Gynvael Coldwind | Fri, 04 Aug 2017 00:10:57 +0200 | @domain: gynvael.coldwind.pl
While gamedev and coding challenges were something I've usually done on the Polish side of the mirror, I finally felt brave enough to try and invite all of my readers/viewers/visitors to participate. So, long story short, I would like to propose a month-long game development challenge - i.e. a kind of an informal (think: compo) competition where the participants (for example: you) try to create a game under a specific set of constraints written below.
EDIT: New category added 2017-08-05:
Since quite a lot of folks are asking about using game engines, game frameworks, high-level libraries for game development and other types of game makers, I have decided to add a second category to the competition: Games Created Using a Game Engine. All rules apart from "Games cannot be created in any "game maker" software or dedicated engines" still apply, and there will be a separate ranking and set of prizes (see Prizes section below) for this category. Have fun!
For the sake of this compo I will refer to the original category (the one disallowing game engines/etc) as Games From Scratch Category.
END OF EDIT

Constraints:
Game related:
• The game can be of any genre or be a mix of genres.
• The game can only use THIS (click to download) graphical art (created by Kenney; see License.txt file for licensing details, but it's CC0). Yes, this will make all the games look the same - please focus on gameplay.
• Modifying the above assets is not allowed. Exceptions: standard transformations like resizing, rotating, moving around, moving sprites to different files, converting to different file types, color transformations. When in doubt, please ask in the comment section.
• The game can use one additional font, provided that it's a monochrome font (e.g. a single color + transparency/single-background-color bitmap font, or a ttf/otf/woff vector font or similar); however, only characters present in the IBM 437 code page can be used. The "monochrome" requirement applies to the font file itself - you can color the font in-game as you like.
• The text in game, if any, must be in English.
• There are no rules regarding music or sound (but please make sure that you can use a given asset for commercial purposes and distribution).
EDIT: New category added 2017-08-05:
• In the Games From Scratch Category games cannot be created in any "game maker" software or dedicated engines (e.g. RPG Maker, Unity, Unreal Engine, etc). Libraries providing I/O, reading assets, rendering fonts, etc are most welcome. If in doubt, ask in the comments.
• In the Games Created Using a Game Engine Category the above rule does not apply, i.e. you can use any "game maker", game library or game engine. All other rules apply.
END OF EDIT
• A given game MUST run on one of the following operating systems: Ubuntu Desktop 17.04, Microsoft Windows 7 or Microsoft Windows 10. One exception: a game can also run in a standard web browser (Chrome, Firefox, Edge or IE).
EDIT: Rules added 2017-08-05 (after discussion in the comments):
• You are allowed to use and/or create additional UI assets, however the UI must be three colors at most (one background color and two foreground colors; antialiasing can be turned on and doesn't count towards the limit). I.e. please settle for a simple UI. If you are using UI assets created by someone else, please make sure the license allows you to use them for commercial purposes.
• You are allowed to use procedurally generated graphical effects (but please do not use this rule to create additional object/background assets).
END OF EDIT
Contest related:
• Games must be submitted by sending a link to a downloadable package (hosted e.g. on Google Drive or something similar) to the following e-mail address: gynvael+gamedev2017@coldwind.pl (please also add a comment under this post saying that you sent an e-mail - otherwise, if your e-mail gets lost, I won't know it never arrived and won't be able to debug it).
• Source code MUST be included (all required binaries should be too). By submitting the game you agree to the game being re-hosted by me for anyone to download and test once the competition is over. I won't be doing anything weird with the game after the competition, but I want participants to learn from each other after the compo is over.
• Submission deadline is 3rd of September 11:59pm (23:59) CEST (i.e. GMT+2).
• Games not following the above rules will not participate in the competition.
• When in doubt, ask in the comment section below.
• Gynvael's Livestream crew cannot participate in the contest (but can submit games to be showcased).

Prizes:
The prizes are somewhat symbolic, but I wanted to have some anyway:

Games From Scratch Category
Top 1: 150 USD (or equivalent) giftcard in amazon.com / .co.uk / .jp or .de (winner's choice).
Top 2: 100 USD (or equivalent) giftcard in amazon.com / .co.uk / .jp or .de (runner up's choice).
Top 3: 50 USD (or equivalent) giftcard in amazon.com / .co.uk / .jp or .de (third place's choice).

Games Created Using a Game Engine Category
Top 1: 150 USD (or equivalent) giftcard in amazon.com / .co.uk / .jp or .de (winner's choice).
Top 2: 100 USD (or equivalent) giftcard in amazon.com / .co.uk / .jp or .de (runner up's choice).
Top 3: 50 USD (or equivalent) giftcard in amazon.com / .co.uk / .jp or .de (third place's choice).

Unfortunately, due to local and international laws, participants from countries under embargoes by Switzerland/the European Union/Poland are not eligible for prizes (sorry). They can still participate though.

How the games will be judged:
• There are no written-in-stone rules about that, but...
• I'll generally focus on the overall game quality (gameplay, creative use of assets, fun to play).
• I will totally ignore source code quality - please focus on the gameplay/etc. If the game crashes, but is mostly playable, I'll ignore the crashes too.
• I will also ignore game size / requirements, but I do expect that one of my PCs will be able to run the game smoothly (I will test the game on one/both of the following specs: i7 8-core/16-thread 128GB RAM with GTX 980 Ti and/or i7 4-core/8-thread 32GB RAM with GTX 1080 Ti).
• I will generally try my best (including consulting with the participant if needed) to make sure that the game runs as intended by its creator, as long as any patches / resubmissions are within the deadline mentioned in the Constraints section.
• If I invite more judges (e.g. if I feel the need for a second opinion), they will follow the same guidelines.

FAQ:
Q: Who made the game art that we'll be using?
A: It was created by Kenney. Be sure to visit his site. If you would like to donate, he has both a donation page and a Patreon running.

Q: Should I write in a Readme.txt / game credits about Kenney?
A: Formally Kenney doesn't require it, but it's always a nice gesture to give credit where credit is due :)

Q: I found Kenney's assets on OpenGameArt/other site and there were additional assets in the "Space Kit" you posted. Can I use them?
A: No. For the sake of the competition please only use the ones linked in the rules above.

Q: Why can't I use other graphical assets? Why must I use only the one you posted (by Kenney)?
A: Think of it as a challenge! Try to be creative, focus on the gameplay. This one time you don't have to worry about making your game assets look super awesome - everyone will have the same assets.

Q: You said I can use resize on the assets... Can I resize each asset to 1px and then make a new asset out of that?
A: No.

Q: Can I use the assets which are available by default on Windows/Ubuntu?
A: For sound? Yes. For graphical assets? Only fonts (see above font rules though).

Q: So how about I put some new graphical assets in the bitmap font sprite? I can use one, right?
A: You can put the assets there, but you cannot use them (see the IBM 437 rule).

Q: Ah, right. So how about I convert my graphical assets to a sound file and render said sound file as a graphical asset?
A: No.

Q: But...
A: No. Please don't try to bypass the asset rule :) Instead e.g. focus on the gameplay.

Q: I've created my own game engine in the past - can I use it in this competition?
EDIT: Answer changed on 2017-08-05 (after adding the new category):
A: Yes, you can use any game engine in the "Games Created Using a Game Engine Category". In the "Games From Scratch Category" only very simple frameworks are allowed that don't do much more than SDL/SFML/etc.
END OF EDIT

Q: I'm using this AMD Radeon / Intel specific graphical option, but your specs don't cover it.
A: Yes, in such a case I won't be able to run your game. Maybe there is a way for you to work around the issue and get a similar effect using more common features?

Q: Can I write my game in JavaScript for a web browser?
A: Sure :)

Q: Hey, it's me again. About new assets - can I just put some new pixel bitmap data as a byte/float/string/whatever array in the code?
A: No. Please use only the allowed graphical assets. Seriously.

Q: So how about I convert the pixel bitmap data to a series of "PutPixel" calls?
A: No. Please don't try to bypass this rule. Pretty please. With cherry on top ;)

Q: When will the results be announced?
A: Middle/late September, depending on the number of submissions. I will do a blog post announcing the winners at some point.

Q: I never took part in any gamedev compo like this. Can I still participate?
A: Sure! It's a great learning experience!

Q: Can I blog / tweet / livestream the making of my game?
A: Sure. Keep in mind that you will be revealing your ideas to other participants this way (it's within the rules to do it, but not everyone likes doing it).

Q: So what if I create a new sprite by using top-left corner pixels of various sprites shifted by small offsets and then crop it a little?
A: ...

EDIT: Entries added 2017-08-05 (after adding the UI/procedurally generated graphics rule):
Q: Can my UI elements be a part of the game world (i.e. interact with in-game elements, e.g. collide with assets)?
A: No. This would bypass the "no new graphical assets" constraint. The UI should be the UI and not part of the world.

Q: Can my procedurally generated effects interact with the world elements?
A: It depends. E.g. it would be OK for debris falling after an explosion to collide / cause damage to the surroundings. However, please do not use the procedurally generated graphics to bypass the "no new graphical assets" constraint; these should be used for graphical effects only.
END OF EDIT

See also the comment section for possibly more questions and answers. Feel free to also ask there if you have any questions.

Final words
Feel free to let your friends know about this competition - the more, the merrier.

Have fun, good luck!

Windows Kernel Debugging livestreams

By Gynvael Coldwind | Sun, 30 Jul 2017 00:10:56 +0200 | @domain: gynvael.coldwind.pl
It's a real pleasure for me to announce that the next four livestreams will feature Artem "honorary_bot" Shishkin (github), who will give an introduction to the long-awaited topic of Windows Kernel Debugging. Artem, in his own words, is a fan of Windows RE, debugging and low-level stuff. He's been using WinDbg for kernel debugging for several years now for fun, customizing BSODs, building a Windows kernel source tree or a boot dependency graph. Sometimes he might also accidentally discover such things as an SMEP bypass on Windows 8 or how to disable PatchGuard at runtime. Being a great fan of Intel and specifically the VMX technology, he maintains his own home-grown debugger based on a bare metal hypervisor.

When:
• 2017-08-02 (Wednesday), 8pm CET
• 2017-08-03 (Thursday), 8pm CET
• 2017-08-09 (Wednesday), 8pm CET
• 2017-08-10 (Thursday), 8pm CET

Where:
My YouTube livestreaming channel: www.youtube.com/c/GynvaelEN/live (or
gaming.youtube.com/c/GynvaelEN/live if you prefer a darker theme).

How to not forget:
• Subscribe to the YouTube channel and allow notifications.
• Subscribe to Gynvael Hacking Livestreams calendar (also: ICS, calendar ID: pjta7kjkt1ssenq7fi9b6othfg@group.calendar.google.com).

Since I expect some technical problems (first time we'll be doing livestreaming with a guest in a remote location) I'll skip the usual news/announcements/mission solutions part of the streams to save some time (I'll probably do a dedicated stream for mission solutions later on). However DO expect new missions after each episode :)

See you Wednesday!

The mystery of two file descriptors

By Gynvael Coldwind | Sun, 30 Jul 2017 00:10:55 +0200 | @domain: gynvael.coldwind.pl
On my last livestream, around 1:02:15, I tried to show an old (as in: 2006) GDB detection trick relying on the fact that GDB "leaked" two file descriptors into the child process, i.e. the child process was spawned with 5 descriptors already allocated instead of the default 3 (stdin/stdout/stderr, or 0/1/2). So I created a small program that opened a file (i.e. allocated the next available file descriptor) and printed the descriptor, compiled it and executed it (without GDB), assuming that the number 3 would be printed. Instead 5 showed up, leaving me staring in amazement wondering what had just happened. Since investigating this wasn't really the topic of my livestream I stopped there, but today I found a few minutes to investigate the mysterious file descriptors. As expected, in the end it turned out to be a mix of my own mistake and unexpected behaviours of other programs. Furthermore, the descriptors could be used to escalate privileges under some very specific and weird conditions. To sum up - it turned out to be a fun bug.
My investigation started with trying to determine the nature of the file descriptors, so I began with the standard ls -l /proc/self/fd, followed by lsof:

00:27:30 gynvael:haven> ls -l /proc/self/fd
total 0
lrwx------ 1 gynvael gynvael 64 Jul 30 00:27 0 -> /dev/pts/1
lrwx------ 1 gynvael gynvael 64 Jul 30 00:27 1 -> /dev/pts/1
lrwx------ 1 gynvael gynvael 64 Jul 30 00:27 2 -> /dev/pts/1
lrwx------ 1 gynvael gynvael 64 Jul 30 00:27 3 -> socket:[36875]
lrwx------ 1 gynvael gynvael 64 Jul 30 00:27 4 -> socket:[36612]
lr-x------ 1 gynvael gynvael 64 Jul 30 00:27 5 -> /proc/3800/fd

00:28:31 gynvael:haven> cat wth.c
#include <stdio.h>

int main(void) {
  getchar();
  return 0;
}

00:28:42 gynvael:haven> gcc wth.c && ./a.out &
[1] 3831

00:29:21 gynvael:haven> lsof | grep `pgrep a.out`
...
a.out ... 3u IPv4 ... TCP localhost:33321 (LISTEN)
a.out ... 4u IPv4 ... TCP haven5:38666->192.168.56.1:33321 (ESTABLISHED)

Sockets listening on port 33321 - this actually rang a bell! These are sockets from my Windows↔Linux RPC interface.

The weird thing is that I could have sworn I had never noticed them before - and I'm sure I listed file descriptors more than once during the years I've been using this Windows↔Linux setup. There was, however, one thing that changed a few months ago - due to a bug in newer versions of GNOME Terminal on Ubuntu Server (i.e. if you don't have a full graphical environment installed, it runs only as root for some reason) I switched to xterm. Maybe one terminal emulator made sure child processes get only stdin/stdout/stderr, and the other just passed on the environment it inherited?



It turned out that that was (almost) exactly the case. I've done a quick test on three emulators (KDE Konsole, xterm and GNOME Terminal) and indeed the first two passed on all inherited handles, while GNOME Terminal didn't exhibit this behaviour. However, after further investigation it turned out that the latter is actually a side effect of GNOME Terminal being launched by dbus-daemon after an RPC call via dbus-launch.

Regardless of the above, of course the terminal emulators are not to blame for "leaking" the handles - my RPC interface is.

The solution for this is rather obvious - just close the handles in the forked process when launching the terminal emulator. Since I'm using Python's subprocess.call (with shell=True), I could achieve this in several ways:

1. Add closing the sockets (i.e. 3<&- 4<&-) to the issued commands:

command = "(cd '%s'; /usr/local/bin/xterm -fa 'Monospace' -fs 14 -bg 'rgb:00/00/00' -xrm '*metaSendEscape:true' -xrm '*eightBitInput:false' 3<&- 4<&- & )" % cwd

# Spawn.
return subprocess.call(command, shell=True)

2. Set the FD_CLOEXEC flag on the descriptors in question (e.g. like it's shown here) - this would close them when execve is invoked.

3. When creating the socket use the SOCK_CLOEXEC option.

I initially went for the first approach (enough for testing), but for the sake of the patch pushed to github I wanted a less "hacky" method, so I settled for SOCK_CLOEXEC. Sadly, it turned out that Python doesn't support this option until the Python 3 family, so I had to fall back to FD_CLOEXEC. The fix has been pushed to github.
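
For reference, a minimal sketch of the FD_CLOEXEC approach (the rpc_socket name is hypothetical - this is not the exact code from the repository):

import fcntl

def set_cloexec(fd):
    # Mark the descriptor close-on-exec, so it won't leak into execve()'d children.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

set_cloexec(rpc_socket.fileno())  # hypothetical socket object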

Since the problem was fixed, I started thinking about the actual severity of this mistake. I came to the conclusion that this might be a funny (almost horizontal) local privilege escalation vulnerability if the sockets were ever passed to a child process running as a different (less trusted / less privileged) user.

The above would be possible due to a somewhat embarrassing bug in the CMD_l_cmd RPC call. By design this call should only allow the terminal to be executed in the specified location, however it seems I messed up escaping the shell characters:

# Spawn the terminal.
cwd = cwd.replace("'", "\\'")
command = "(cd '%s'; /usr/local/bin/xterm -fa 'Monospace' -fs 14 -bg 'rgb:00/00/00' -xrm '*metaSendEscape:true' -xrm '*eightBitInput:false' 3<&- 4<&- & )" % cwd

That's not good at all - \' doesn't escape ' in Bash - you have to use '"'"' (seriously). In the current form one could just do a standard '; evil code; ' injection and get the command launched with the privileges of the user running the RPC script.




One thing left to do was to actually call the proper CMD_l_cmd implementation, though this isn't really straightforward (or maybe it is?) given that neither of the two sockets is connected to the "Linux VM" endpoint (see the drawing above) - to be more specific, only the Linux VM endpoint implements the CMD_l_cmd call in a way that executes the command. This would be the place where I would say "so the bug is not exploitable after all", but that turns out not to be the case, for two reasons:

1. While my RPC's description begins with words "Rather bad", it still requires knowing a secret key to be able to call the RPC. However, since the child process inherited the local RPC listening socket (i.e. socket "C" on the drawing above), it can "race" the actual RPC daemon to accept() an incoming local connection (e.g. one that happens if I click on a link in a Linux process and want it loaded in my Windows Chrome browser), and then happily receive the secret key from the connected RPC client (so it seems "Rather bad" holds true after all as this would not happen if there was a proper private/public-crypto mutual authentication scheme in play). The key can be used to then connect to the local RPC interface, authenticate, and issue the CMD_l_cmd call.

2. That said, the above is not even needed, as while the Windows endpoint doesn't actually execute the CMD_l_cmd call, it does by default forward it to the Linux VM. Well then.

Thankfully all of the above can be fixed by just doing a simple cwd = cwd.replace("'", "'\"'\"'") escape (unless... someone knows a bypass for this? if so, please let me know in the comments down below).
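
A less error-prone option is to let the standard library do the quoting - a minimal sketch, assuming the Python 2 pipes module (Python 3.3+ has the equivalent shlex.quote; the command is shortened for brevity):

import pipes  # in Python 3.3+ use shlex.quote instead

def build_command(cwd):
    # pipes.quote() wraps its argument in single quotes and rewrites any
    # embedded ' as '"'"', so no hand-rolled escaping is needed.
    return "(cd %s; /usr/local/bin/xterm & )" % pipes.quote(cwd)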

In the end the mystery of the two additional sockets was indeed fun and led to the discovery of another interesting bug. And while the severity was there, the risk was minimal (well, at least in my use case) as the exploitation scenario is really unlikely.

Debugging story: a crash while mapping a texture

By Gynvael Coldwind | Sun, 16 Jul 2017 00:10:51 +0200 | @domain: gynvael.coldwind.pl
Recently on my Polish livestreams I've been writing a somewhat simple raytracer (see the screenshot on the right; source code; test scene by ufukufuk), with the intention of talking a bit about optimization, multithreading, distributed rendering, etc. As expected, there were a multitude of bugs along the way, some more visual than others. My favorite one so far was a mysterious buffer overflow resulting in a C++ exception being thrown when rendering in 4K UHD (3840x2160) but not in 1080p (1920x1080). While trying to find the root cause I also ran into a C standard library bug in the sqrt function (though it turned out not to be related in the end), which made the run even more entertaining.
It all started with a thrown C++ exception:

.............terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 9223372036854775808) >= this->size() (which is 845824)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Well, I had to agree that 9223372036854775808 is larger-or-equal to 845824. I immediately knew where the exception was thrown, as there was only one place in the whole code base where I used the at() method of a std::vector which could throw this exception*.
* The difference between using some_vector[i] and some_vector.at(i) is that the former doesn't do any range checking (at least in popular implementations) while the latter does, and throws an exception when i points out of bounds.

V3D Texture::GetColorAt(double u, double v, double distance) const {
  (void)distance;  // TODO(gynvael): Add mipmaps.

  u = fmod(u, 1.0);
  v = fmod(v, 1.0);
  if (u < 0.0) u += 1.0;
  if (v < 0.0) v += 1.0;

  // Flip the vertical.
  v = 1.0 - v;

  double x = u * (double)(width - 1);
  double y = v * (double)(height - 1);

  size_t base_x = (size_t)x;
  size_t base_y = (size_t)y;

  size_t coords[4][2] = {
    { base_x,
      base_y },
    { base_x + 1 == width ? base_x : base_x + 1,
      base_y },
    { base_x,
      base_y + 1 == height ? base_y : base_y + 1 },
    { base_x + 1 == width ? base_x : base_x + 1,
      base_y + 1 == height ? base_y : base_y + 1 }
  };

  V3D c[4];
  for (int i = 0; i < 4; i++) {
    c[i] = colors.at(coords[i][0] + coords[i][1] * width);
  }
  ...

I started by running the raytracer with a debugger attached and waited for it to trigger again - it took about 20 minutes to reach the state where the bug was triggered (I felt like I was debugging something located on Mars). However, it seems that GDB doesn't break on C++ exceptions by default (TIL: you have to issue the catch throw command first):

...........terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 9223372036854775808) >= this->size() (which is 845824)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
[Thread 26080.0x7b58 exited with code 3]
[Thread 26080.0x7024 exited with code 3]
[Thread 26080.0x8648 exited with code 3]
[Thread 26080.0x84fc exited with code 3]
[Thread 26080.0x81e8 exited with code 3]
[Thread 26080.0x525c exited with code 3]
[Thread 26080.0x53a8 exited with code 3]
[Thread 26080.0x87d0 exited with code 3]
[Thread 26080.0x7de0 exited with code 3]
[Thread 26080.0x41b0 exited with code 3]
[Thread 26080.0x63e4 exited with code 3]
[Thread 26080.0x7e10 exited with code 3]
[Thread 26080.0x6c28 exited with code 3]
[Thread 26080.0x84dc exited with code 3]
[Thread 26080.0x7044 exited with code 3]
[Thread 26080.0x816c exited with code 3]
[Inferior 1 (process 26080) exited with code 03]
(gdb) where
No stack.

I decided to change the strategy and replace the at() method with operator[], so that instead of a C++ exception a Windows exception (ACCESS_VIOLATION) would be raised (these are caught by default by the debugger). A quiet voice in my head told me "that won't work if the index multiplied by the size of the element overflows and falls back within the valid range" - and sure enough, after 40 minutes of rendering I got a 4K image without any crashes. The reason was that the actual index - 9223372036854775808 (0x8000000000000000 in hexadecimal) - multiplied by the size of the vector element (12 bytes) overflows to the value 0, and index 0 is perfectly valid.
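
A quick sanity check of that wrap-around arithmetic (the index times the 12-byte element size, reduced modulo 2^64):

>>> (0x8000000000000000 * 12) % 2**64
0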

The next strategy change involved actually catching the exception in C++ (as in a try-catch block) and doing lots of debug prints on such an event. This (after another 20 minutes) yielded good results:

.....u v: nan nan
x y: nan nan
base x y: 9223372036854775808 9223372036854775808
coords:
0: 9223372036854775808 9223372036854775808
1: 9223372036854775809 9223372036854775808
2: 9223372036854775808 9223372036854775809
3: 9223372036854775809 9223372036854775809
w h: 826 1024

The problem seemed to be that u and v were NaN, which is a special IEEE-754 floating point value that has the habit of propagating all around once some expression yields it as a result, i.e. any operation on NaN will yield NaN, with the small exception of a sign change, which produces -NaN.

One more exception is casting it to an integer. The C++ standard is pretty clear here - it's undefined behavior. However, in low-level land there has to be some effect, and usually there is an explanation for that effect. I started looking into it by running a simple snippet like cout << (size_t)NAN, which produced the value 0 (and not the expected 0x8000000000000000). After doing some more experiments and reading the Intel manual (the "Indefinites" and "Results of Operations with NaN Operands or a NaN Result for SSE/SSE2/SSE3 Numeric Instructions" sections) I figured out that the x86 assembly instructions themselves (both fist/fistp from the FPU group and cvttsd2si/vcvttsd2si from the SSE/AVX group) return 0x8000000000000000 (called the indefinite value) in such a case (a more general rule: all bits apart from the most significant one are cleared); the #IA fault might be raised as well, but it's almost always masked out. That said, since technically this is UB, the compiler can do whatever it feels like and e.g. put a 0 as the result of a (size_t)NAN cast at compilation time. To quote our IRC bot (courtesy of KrzaQ):

<@Gynvael> cxx: { double x = NAN; cout << (size_t)x; }
<+cxx> 9223372036854775808
<Gynvael> cxx: { cout << (size_t)NAN; }
<+cxx> 0

Back to the original problem. Looking at the code, u and v actually come from outside of the function and are modified only in the first lines of the function:

V3D Texture::GetColorAt(double u, double v, double distance) const {
  ...
  u = fmod(u, 1.0);
  v = fmod(v, 1.0);
  if (u < 0.0) u += 1.0;
  if (v < 0.0) v += 1.0;
  ...

The fmod function doesn't seem to ever return NaN for non-NaN inputs, so u and v arrived at the function already containing NaN values. The function itself is called from the main shader, and the interesting values are the result of a call to the V3D Triangle::GetUVW(const V3D& point) function. That function uses the barycentric interpolation method to calculate the UVW texel coordinates given the three points making up a triangle, the UVW values for each of them, and an "intersection" point on the triangle for which the interpolated value is calculated. It looks like this:

V3D Triangle::GetUVW(const V3D& point) const {
  V3D::basetype a = vertex[0].Distance(vertex[1]);
  V3D::basetype b = vertex[1].Distance(vertex[2]);
  V3D::basetype c = vertex[2].Distance(vertex[0]);

  V3D::basetype p0 = point.Distance(vertex[0]);
  V3D::basetype p1 = point.Distance(vertex[1]);
  V3D::basetype p2 = point.Distance(vertex[2]);

  V3D::basetype n0 = AreaOfTriangle(b, p2, p1);
  V3D::basetype n1 = AreaOfTriangle(c, p0, p2);
  V3D::basetype n2 = AreaOfTriangle(a, p1, p0);

  V3D::basetype n = n0 + n1 + n2;

  return (uvw[0] * n0 + uvw[1] * n1 + uvw[2] * n2) / n;
}

The barycentric interpolation method itself is actually pretty simple - the final result is the weighted average of the UVWs of the three points making up the triangle, where the weight of each value is the area of the triangle formed by the provided "intersection" point and the two triangle vertices other than the point the weight is calculated for (see the slides linked above for a better explanation).

After adding some more debug prints and waiting another 20 minutes, it turned out that AreaOfTriangle returned NaN in some cases where the triangle in question was actually a line (i.e. one of the three points making up the triangle was located exactly on the edge between the other two). This led me to the AreaOfTriangle function itself:

static V3D::basetype AreaOfTriangle(
    V3D::basetype a, V3D::basetype b, V3D::basetype c) {
  V3D::basetype p = (a + b + c) / 2.0;
  V3D::basetype area_sqr = p * (p - a) * (p - b) * (p - c);

  return sqrt(area_sqr);
}

The function is rather simple - it uses Heron's formula to calculate the area given the lengths of all three edges of the triangle: it calculates half of the perimeter (the variable p in my code), multiplies p by its differences with each of the edge lengths, and takes the square root of the result. The last bit is what caught my attention - as far as I knew, the sqrt() function would return NaN if a negative value was fed to it.
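
For reference, Heron's formula in full (p being the half-perimeter):

p = (a + b + c) / 2
area = sqrt(p * (p - a) * (p - b) * (p - c))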

I started by verifying this both in the documentation and experimentally for the value -1 (no, these are not complex numbers, it's not i). cppreference.com said "a domain error occurs, an implementation-defined value is returned (NaN where supported)"; cplusplus.com stopped at "a domain error occurs"; MSDN mentioned "returns an indefinite NaN"; and the glibc man page agreed. The experimental part had much weirder results: on Ubuntu sqrt(-1) yielded -NaN (well, I guess -NaN is still technically a NaN), but on Windows I got -1 (wait, what?). I tried a couple of other values on Windows and it turned out that sqrt was just returning the negative values exactly as I passed them. Given that MSDN claimed a NaN would be returned, I launched IDA and found that the mingw-w64 GCC 7.0.1 compiler I'm using actually has its own implementation of the sqrt function, which happened to suffer from a regression in recent months. Oh well.

Wait. Why did I actually get NaN in my raytracer then, if there is such a bug? It turned out that I compile my app with -O3, and with -O3 sqrt actually works correctly (probably due to some compiler optimization) and returns NaN (a positive one, for a change).

This left the question of "why is the value passed to sqrt negative?". The answer is the one you would expect: floating point inaccuracies. I printed the exact (somewhat - i.e. in decimal form) lengths of the triangle edges that were passed to the function upon crash:

a=66.22148101937919761894590919837355613708496093750
b=18.93672089509319533817688352428376674652099609375
c=47.28476012428599517534166807308793067932128906250

Given that one of the points lies on the edge between the other two, a should be exactly equal to b+c; however, that wasn't the case:

a =66.2214810193791976189459091983735561370849609375
b+c =66.2214810193791834080911939963698387145996093750
a-(b+c)=0.00000000000001421085471520200371742248535156250

This also means that one of the bracketed terms in the area_sqr calculation will probably yield a negative value (even though 0 would be the correct result):

p =66.22148101937918340809119399636983871459960937500
p-a=-0.00000000000001421085471520200371742248535156250
p-b=47.28476012428598806991431047208607196807861328125
p-c=18.93672089509318823274952592328190803527832031250

After the multiplications and the sqrt call we arrive at the infamous NaN, which solves the whole riddle.

The fix was rather easy - since I know that there is no such thing as "negative area", I can just check if area_sqr is negative and correct it to 0.0 in such a case:

// It seems that due to floating point inaccuracies it's possible to get a
// negative result here when we are dealing with a triangle having all points
// on the same line (i.e. with a zero-size area).
if (area_sqr < 0.0) {
  return 0.0;
}

To sum up, the invalid index calculation was caused by floating point inaccuracies earlier in the calculations, which made the area of a triangle the square root of a negative number. I must admit I really had fun debugging this bug, especially since I encountered both the interesting NaN-to-integer cast scenario and the mingw-w64 sqrt bug along the way.

And that's about it.

Blind ROP livestream followup

By Gynvael Coldwind | Thu, 13 Jul 2017 00:10:50 +0200 | @domain: gynvael.coldwind.pl
Yesterday I did a livestream during which I tried to learn the Blind Return Oriented Programming technique. Well, that isn't fully accurate - I had already read the paper and had one attempt the evening before, so I knew that the ideas I wanted to try live should work. And they did, but only in part - I was able to find a signaling gadget (i.e. a gadget that sent some data to the socket) which could be used later on, as well as a "pop rsi; ret"-equivalent gadget (using a different method than described in the paper - it was partly accidental and heavily challenge-specific). But throughout the rest of the livestream I was able neither to find a puts function nor to use the gadget I already had to get a "pop rdi; ret"-equivalent gadget. The bug, as expected, was trivial.

Long story short (a more verbose version follows), my mistake was in the following code:

if err == "CRASH" and len(ret) > 0:
return (err, ret)

The correct form is:

if err == "DISCONNECTED" and len(ret) > 0:
return (err, ret)

And that's it. If you were on the YT chat or IRC you probably saw me "facepalming" (is that even a word? it should be) and pointing out the bug even before the stream went offline (i.e. during the mission screen). In the next 5 minutes I had both gadgets I needed, so that was the only bug there. Oh well.

Full version of the story:
I'll start by saying that if you don't know what ROP is (Return Oriented Programming, which is actually an exploitation technique), you might want to start there as BROP (or Blind ROP) is a technique that heavily builds on top of ROP. Here are some random links about ROP:

The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86) (Hovav Shacham)
Return-Oriented Programming: Exploits Without Code Injection
Doing ret2libc with a Buffer Overflow because of restricted return pointer - bin 0x0F (LiveOverflow)
ROP with a very small stack - 32C3CTF teufel (pwnable 200) (LiveOverflow)
And also:
Return-oriented exploiting
Hacking Livestream #20: Return-oriented Programming

There were some follow-up papers and techniques, e.g. Return-Oriented Programming without Returns, but the above should be enough to get you started.

One such follow-up was the Hacking Blind paper by Andrea Bittau, Adam Belay, Ali Mashtizadeh, David Mazieres and Dan Boneh from Stanford University - a really cool piece of research which showed that you don't really need the binary to find gadgets. I highly recommend reading the paper, but as a summary I'll note that it boils down to first locating the binary in memory (by basically reading the stack using e.g. Ben Hawkes' byte-by-byte brute force technique, also described by pi3 in Phrack), then finding a gadget which has an observable effect (e.g. it hangs the connection or sends some additional data through the socket), and finally finding gadgets using pretty clever heuristics that boil down to observing one of three signals (a crash, the normal behavior, or the aforementioned observable effect). So yeah, it's the blind SQLI of ROP ;).
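
To illustrate the stack reading part, here is a simplified sketch (the helper and the constants are hypothetical): fix the bytes recovered so far, try all 256 candidates for the next one, and keep the only candidate that does not crash the child - a forking server puts the same values on the stack in every child:

PADDING = b"A" * 40  # hypothetical: fill the buffer up to the value being read

known = b""
while len(known) < 8:  # recover e.g. an 8-byte saved value, byte by byte
    for guess in range(256):
        probe = PADDING + known + bytes([guess])
        # send_and_classify() is a hypothetical helper returning "CRASH"
        # when the connection died before the usual reply arrived.
        if send_and_classify(probe) != "CRASH":
            known += bytes([guess])
            break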

The original paper showcases several optimizations, but I decided to start with the basics (I'm a fan of reinventing the wheel). I prepared a small forking-server binary (basically an echo server) with PIE and stack cookies disabled (ASLR was also disabled system-wide), compiled it statically and ran it on a virtual machine.

Both the binary and the source code created before and after the stream are available on my github (the binary listens on port 1337; please note that when learning blind ROP you should pretend you don't have access to the binary and not look at it in IDA/etc).

The first step was to look for the bug (though that was trivial in this case, as there is basically only one length+data input pair) and the size of the overflowed buffer. This was done by using growing data lengths (from 1 byte upwards) and checking whether the "Done!" message was transmitted via the socket (denoted as "OK" in my code) or whether the connection stopped abruptly before that (denoted as "DISCONNECTED" or "ERROR" in my socket code, or just "CRASH" later on - this is what I mixed up on the stream, which is why I couldn't get it to work to the full extent in the end). It turned out that the buffer was 40 bytes long.
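
A hedged sketch of that first step (send_and_classify() being the same hypothetical helper as above):

# Grow the payload until the "Done!" message no longer arrives; the first
# length that breaks the reply marks the start of the overflow.
for length in range(1, 256):
    if send_and_classify(b"A" * length) != "OK":
        print("overflow starts at", length)  # 41 here, so the buffer is 40 bytes
        break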

The next step was to find a signaling gadget. A signaling gadget (similarly to a signaling behavior in Blind SQLI) can either be some data transmitted over the socket or a connection hang (i.e. the server application entering an infinite loop or hanging on some input which can be detected e.g. using the socket timeout mechanism). During the livestream I found a gadget which did both, i.e. it printed a 16-hexdigit number (so it was probably a printf("%.16x", arg2) function call) and then started reading from the socket (which basically stopped the application and caused a socket timeout on my side).

This gadget was actually really useful, since it being (probably) a printf("%.16x", arg2) call meant that jumping just after the instruction that puts the second argument in the RSI register (64-bit Linux calling convention) would output the current value of the RSI register. This also meant that I could easily find a "pop rsi; ret"-equivalent gadget (I keep writing "-equivalent" as the found gadgets might be a lot larger and have unpredictable side effects, but in the end they would e.g. also pop a value from the stack and place it in a given register) just by using the following ROP chain:

[unknown-scanned address]
[0x4141414141414141]
[address of the printf("%.16x", rsi) gadget]

The unknown-scanned address would be a sequential-or-random address within the assumed range of the executable section of the binary. Since one probably needs quite a lot of attempts to find such a gadget (tens of thousands), I decided (as the paper suggested) to use multiple threads/connections (20 threads worked fine in my case). Part of the output log looked like this (this is actually from the pre-stream version):

0x37b4 PRINT
................................................00000000006ee3c3
Data size:
0x37ef PRINT
......................................................................................................
......................................................................................................
.....................................00000000006ee3c3
Data size:
0x38e14141414141414141
Data size:
PRINT0x38e2
PRINT
.............................4141414141414141

Due to the multithreading this isn't really readable, but one can observe some non-4141414141414141 values (mainly 00000000006ee3c3) and then, at the end, the magical 4141414141414141 value right next to the "PRINT 0x38e2" information (which means the gadget is at the image_base+0x38e2 address).

The next step was to find the puts function (followed probably by a crash) that could be used to easily find a "pop rdi; ret"-equivalent gadget (the idea was to do puts(image_base+1), which would echo "ELF" when the correct gadget was found), but for some reason it didn't work (yeah, this is where the "CRASH"/"DISCONNECTED" mix-up started to be a problem). I then tried to use the print-gadget I already had, but jumping 7 or 8 bytes after it to skip setting the RDI register (and using that instead of puts, risking that an accidental format tag like %s at the beginning of the binary would result in a false negative). This didn't work either (though probably because I wasn't patient enough with the 8-byte variant - well, a stream is time-limited).
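
For reference, the chain behind that puts(image_base+1) idea - in the same notation as before, and assuming puts has already been located - would be:

[40 bytes of padding]
[unknown-scanned address of a candidate "pop rdi; ret"-equivalent gadget]
[image_base + 1]
[address of puts]

The "ELF" echo works because the image starts with the \x7fELF magic, so the string at image_base+1 begins with the printable bytes "ELF".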

After some more minutes of trying I decided to finish the stream at that point (it was already 1.5h long) and promised to do this blogpost about what went wrong. So yeah, I mixed up the name of the signal from my socket-helper implementation (serves me right for using strings instead of any sane type of enums/constants).

When I spotted the problem, I was able to find both the puts gadget I wanted and the "pop rdi; ret"-equivalent gadget within the next few minutes. Having these two, what was left was actually just dumping the binary, as puts(controlled_rdi) allows one to do exactly that by sequentially reading the memory (and assuming that there is a \0 at the end of every piece of data). It must be noted that it took several hours to dump the binary (as it was over 1MB in size).

Having the binary brings this down to standard ROP exploitation, which wouldn't take much more time (especially if the magic gadget could be found in the binary; I guess one can look for this gadget in BROP as well).

So there you have it :). Again I would like to recommend reading the paper, especially the sections about the "BROP-gadget" and PLT/GOT methods - I really liked them.

And that's it.

Announcing Bochspwn Reloaded and my REcon Montreal 2017 slides

By j00ru | Tue, 20 Jun 2017 16:14:58 +0000 | @domain: faviconj00ru.vexillium.org
A few days ago at the REcon conference in Montreal, I gave a talk titled Bochspwn Reloaded: Detecting Kernel Memory Disclosure with x86 Emulation and Taint Tracking. During the presentation, I introduced and thoroughly explained the core concept, inner workings and results of my latest research project: a custom full-system instrumentation based on the Bochs […]

Google Capture the Flag 2017 Quals start today

By Gynvael Coldwind | Fri, 16 Jun 2017 00:10:49 +0200 | @domain: favicongynvael.coldwind.pl
Google Capture the Flag 2017 (Quals) starts today at 2am CEST tonight! The format will be pretty standard - teams / jeopardy / 48h / dynamic scoring - with one interesting addition: the submitted write-ups might win rewards ($100 - $500) as well - see the rules for details. In any case, I hope it turns out to be a fun and challenging CTF - I personally have every reason to hope for the best, as the people behind the scenes are both amazing and experienced. And hey, there is even one task from me :)

I wish you all best of luck!

HF GL

Windows Kernel Local Denial-of-Service #5: win32k!NtGdiGetDIBitsInternal (Windows 7-10)

By j00ru | Mon, 24 Apr 2017 09:39:26 +0000 | @domain: faviconj00ru.vexillium.org
Today I’ll discuss yet another way to bring the Windows operating system down from the context of an unprivileged user, in a 5th and final post in the series. It hardly means that this is the last way to crash the kernel or even the last way that I’m aware of, but covering these bugs indefinitely […]

Windows Kernel Local Denial-of-Service #4: nt!NtAccessCheck and family (Windows 8-10)

By j00ru | Mon, 03 Apr 2017 10:59:46 +0000 | @domain: faviconj00ru.vexillium.org
After a short break, we’re back with another local Windows kernel DoS. As a quick reminder, this is the fourth post in the series, and links to the previous ones can be found below: Windows Kernel Local Denial-of-Service #3: nt!NtDuplicateToken (Windows 7-8) Windows Kernel Local Denial-of-Service #2: win32k!NtDCompositionBeginFrame (Windows 8-10) Windows Kernel Local Denial-of-Service #1: […]

Next livestream: creating Binary Ninja plugins (by carstein)

By Gynvael Coldwind | Tue, 21 Mar 2017 00:10:43 +0100 | @domain: favicongynvael.coldwind.pl
Tomorrow (Wednesday, 22nd of March) at 8pm CET on my weekly Hacking Livestream I'll be joined by Michal 'carstein' Melewski to talk about creating plugins for Binary Ninja. Well, actually Michal will talk and show how to do it, and I'll play the role of the show's host. The plan for the episode is the following (kudos to carstein for writing this down):

1. Little intro to Binary Ninja - 5 minutes
2. Working with Binary Ninja API - console and headless processing
- how to use console effectively
- documentation

3. Basics of API
- Binary view
- Functions
- Basic Blocks
- Instructions
- Low Level Intermediate Language (infinite tree)

4. Syscall problem
- first scan in console
- simple annotation (getting parameter value)
- going back the instruction list
- detecting same block?

5. Future of API and what can you do with it?
- links to presentations (ripr, type confusion)

See you tomorrow!

P.S. We'll have a single personal Binary Ninja license to give away during the livestream, courtesy of Vector 35 folks (thanks!). Details will be revealed on the stream.

My 0CTF 2017 Quals write-ups

By Gynvael Coldwind | Mon, 20 Mar 2017 00:10:42 +0100 | @domain: favicongynvael.coldwind.pl
During the weekend I played 0CTF 2017 Quals - we finished 15th and therefore sadly didn't qualify. The CTF itself was pretty fun, since the tasks always had a non-standard factor in them that forced you to explore new areas of a seemingly well-known domain. In the end I solved 4 tasks myself (EasiestPrintf, char, complicated xss and UploadCenter) and put down write-ups for them during breaks I took at the CTF.

*** EasiestPrintf (pwn)
http://blog.dragonsector.pl/2017/03/0ctf-2017-easiestprintf-pwn-150.html
You've got printf(buf) followed by an exit(0), an unknown stack location and a non-writable .got - this was mostly about finding a way to get EIP control (and there were multiple ways to do it).

*** char (shellcoding)
http://blog.dragonsector.pl/2017/03/0ctf-2017-char-shellcoding-132.html
ASCII ROP, i.e. only character codes from the 33-126 range were allowed.

*** Complicated XSS (web)
http://blog.dragonsector.pl/2017/03/0ctf-2017-complicated-xss-web-177.html
XSS on a subdomain, mini-JS sandbox and file upload.

*** UploadCenter (pwn)
http://blog.dragonsector.pl/2017/03/0ctf-2017-uploadcenter-pwn-523.html
A controlled mismatch of size passed to mmap and munmap.

I've added my exploits to the write-ups as well.

That's it.

Windows Kernel Local Denial-of-Service #3: nt!NtDuplicateToken (Windows 7-8)

By j00ru | Tue, 07 Mar 2017 15:34:35 +0000 | @domain: faviconj00ru.vexillium.org
This is the third post in a series about unpatched local Windows Kernel Denial-of-Service bugs. The list of previous posts published so far is as follows: Windows Kernel Local Denial-of-Service #2: win32k!NtDCompositionBeginFrame (Windows 8-10) Windows Kernel Local Denial-of-Service #1: win32k!NtUserThunkedMenuItemInfo (Windows 7-10) As opposed to the two issues discussed before, today’s bug is not in the graphical […]

Windows Kernel Local Denial-of-Service #2: win32k!NtDCompositionBeginFrame (Windows 8-10)

By j00ru | Mon, 27 Feb 2017 14:49:32 +0000 | @domain: faviconj00ru.vexillium.org
Another week, another way to locally crash the Windows kernel with an unhandled exception in ring-0 code (if you haven’t yet, see last week’s DoS in win32k!NtUserThunkedMenuItemInfo). Today, the bug is in the win32k!NtDCompositionBeginFrame system call handler, whose beginning can be translated into the following C-like pseudo-code: NTSTATUS STDCALL NtDCompositionBeginFrame(HANDLE hDirComp, PINPUT_STRUCTURE lpInput, POUTPUT_STRUCTURE lpOutput) […]

Windows Kernel Local Denial-of-Service #1: win32k!NtUserThunkedMenuItemInfo (Windows 7-10)

By j00ru | Wed, 22 Feb 2017 16:24:23 +0000 | @domain: faviconj00ru.vexillium.org
Back in 2013, Gynvael and I published the results of our research into discovering so-called double fetch vulnerabilities in operating system kernels, by running them in full software emulation mode inside of an IA-32 emulator called Bochs. The purpose of the emulation (and our custom embedded instrumentation) was to capture detailed information about accesses to user-mode memory […]

Finishing oxfoo1m3 crackme

By Gynvael Coldwind | Mon, 20 Feb 2017 00:10:39 +0100 | @domain: favicongynvael.coldwind.pl
On the last episode of Hacking Livestream (#10: Medium-hard RE challenge - see below) I've shown how to approach a medium-hard reverse-engineering challenge. The example I used was the oxfoo1m3 challenge found in the "Level5-professional_problem_to_solve" directory of the crackmes.de archive (this one), which I picked using such complex criteria as "something that runs on Ubuntu" and "something 32-bit so people with the free version of IDA can open it". As expected (and defensively mentioned several times during the stream), I was not able to complete this challenge during the livestream itself (which is only one hour, and that includes news and updates, and Q&A). However, I did finish the task two days ago. It turned out I was close to the goal - it took only around 30 minutes of additional work (which makes me wonder if Level5 is actually close to an RE300 challenge; probably it's closer to RE200). Anyway, here is the promised part 2 of the solution.

Note 1: While I'll write down a short recap of the initial steps and discoveries, please take a look at the recording of the episode #10 for details (crackme starts at 15m40s). If you've already seen it, just jump to part 2 in the second half of this post.



Note 2: Since this post is meant to have some educational value, I'll assume that the readers have only basic knowledge of RE techniques, and therefore I'll try to be verbose on some topics which are most likely well known amongst the more senior folks.
[[[SPLIT]]]

Part 1: Recap

Initial recon (i.e. reconnaissance phase) consisted of a couple of rather simple steps:
  • Obvious things first: just running the binary and seeing how it behaves. Surprisingly, we could already derive the possible password length just by looking at what was left in the standard input buffer after the crackme stopped (compare the input string with the "asjdhajsdasd" letters seen as the next commands).



  • Running the binary via strace - this actually didn't go too well due to ptrace(PTRACE_TRACEME) protection.



  • Viewing the file in a hex editor - showed quite a lot of "X" characters.



  • Checking the entropy level chart (though at this point it was already known it would be low, as per the hex editor inspection). You can read more about this entropy-based recon technique in this post from 2009.



  • And finally loading the binary into IDA / Binary Ninja.


By this point we had already learned that it's a rather small crackme with no non-trivial encryption (though the entropy test did not rule out trivial encryption - i.e. substitution ciphers like single-byte-key XORs, etc - and such was indeed encountered later on) that actively tries to make debugging and reverse-engineering harder. Not much, but it does give one a broad picture, and it usually takes a few minutes tops, which makes it worth it.
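
As a side note, the entropy check mentioned above doesn't require any special tooling - a minimal sliding-window Shannon entropy scan in Python 2.7 (the window size is an arbitrary choice) might look like this:

import math
import sys

def entropy(data):
    # Shannon entropy in bits per byte: 0.0 for constant data,
    # 8.0 for uniformly random data.
    counts = [0] * 256
    for b in bytearray(data):
        counts[b] += 1
    total = float(len(data))
    return -sum(c / total * math.log(c / total, 2) for c in counts if c)

d = open(sys.argv[1], "rb").read()
WINDOW = 256
for off in xrange(0, max(1, len(d) - WINDOW + 1), WINDOW):
    print "0x%08x: %.2f" % (off, entropy(d[off:off+WINDOW]))

A compressed or well-encrypted region will hover near 8.0 bits per byte, while plain code, text and single-byte-XORed data stay visibly lower.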

PTRACE_TRACEME trick

We've also found the first anti-debugging trick, which was the aforementioned ptrace(PTRACE_TRACEME) call.

The idea behind it is related to the fact that a process can be debugged (ptrace() is the debugging interface on Linux) by at most one debugging process at a time. By calling ptrace(PTRACE_TRACEME) the calling process tells the kernel that it wants to be debugged by its parent (i.e. that its parent process should be notified of all debugger-related events). This call will result in one of two outcomes:

  • The call succeeds and the parent process becomes the debugger for this process (whether it wants to or not). The consequence of this is that no other debugger can attach to this process after this call.
  • The call fails, ptrace(PTRACE_TRACEME) returns -1 and errno is set to EPERM (i.e. the "operation not permitted" error). This happens only* if another debugger is already tracing this process, which means this can be used to detect that someone/something is debugging this process and act accordingly (e.g. exit complaining that a debugger is present, or misbehave in weird ways).

* - This isn't entirely true; it can also fail if the calling process was created from a SUID binary owned by root and was executed by a non-root user.

So the anti-RE side wins regardless of what ptrace() returns.
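
To see the trick in action outside of the crackme, here is a minimal illustration using Python's ctypes (Linux with glibc assumed; the crackme itself performs the same call via a raw int 0x80):

import ctypes
import sys

PTRACE_TRACEME = 0
libc = ctypes.CDLL("libc.so.6", use_errno=True)

if libc.ptrace(PTRACE_TRACEME, 0, 0, 0) == -1:
    # EPERM here usually means someone is already tracing us.
    print "debugger detected, bailing out"
    sys.exit(1)

print "no debugger (and from now on only our parent can trace us)"

Running it normally prints the second message; running it under strace or gdb triggers the first branch.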

The usual way to get around this is to find the ptrace(PTRACE_TRACEME) call and NOP it out, though in the past (uhm, during a competition in 2006) I remember writing an ad-hoc kernel module which would fake this function for chosen processes.

Another way is to catch the syscall with a debugger and fake the return value; however, in our case this turned out not to be possible, since the ELF file was modified in a way that prevented loading it in GDB:



I'll get back to bypassing the anti-GDB trick in part 2 down below.

Simple assembly obfuscation scheme

Analyzing the binary in IDA quickly led to the discovery of a simple yet moderately annoying obfuscation scheme:



As one can see in the screenshot above, between each pair of "real" instructions there is a 30-byte block which basically jumps around and does quite a lot of nothing. Its sole purpose is to make the assembly dead-listing (i.e. the output of a disassembler) less readable.

There are several ways to mitigate this kind of protection:
  • One is to simply ignore it, as I did on the livestream. I figured that since it consists only of 30 byte blocks I can just manually make sure all the "real" instructions are disassembled correctly, and then I can skip the non-meaningful ones.
  • Another idea would be to write a simple script which just changes these instructions to NOPs (see the sketch right after this list) - this would be pretty easy, as the whole block seems to always consist of the same bytes (i.e. the call's argument is relative - therefore constant, the offset to _ret is constant, and everything else also uses the same immediate values, instructions and registers every time). This simplifies the dead-listing analysis, as only the meaningful instructions are left in the end.
  • Yet another way - tracing + simple filtering - is shown in the second half of this post.
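
A minimal sketch of the NOP-out script from the second bullet (Python 2.7). The exact 30 junk bytes are not reproduced here - the assumption is that you copy them once from the disassembly into junk_block.bin:

# Replace every occurrence of the constant junk block with NOPs.
d = bytearray(open("oxfoo1m3", "rb").read())
junk = open("junk_block.bin", "rb").read()  # The 30-byte block from IDA.

idx = d.find(junk)
while idx != -1:
    d[idx:idx+len(junk)] = "\x90" * len(junk)  # 0x90 == nop
    idx = d.find(junk, idx + 1)

open("nopped", "wb").write(str(d))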

Trivial encryption layer

After finding all of the "real" instructions (there was only a handful of them) it turned out that the code is doing in-memory decryption of another area of the code before jumping there:

LOAD:080480C2 mov esi, offset loc_8048196
LOAD:080480CE mov edi, esi
LOAD:080480EE cld
LOAD:0804810D mov ecx, 0A80h
LOAD:08048135 lodsb
LOAD:08048154 xor al, 0x58; 'X'
LOAD:08048174 stosb
LOAD:08048193 loop loc_8048135

The decryption is actually just XORing with a single-byte key - the letter 'X', or 0x58 - which explains both why the entropy was low (as mentioned before, certain types of ciphers - which I personally call "trivial ciphers", though it's not really a correct name - don't increase the entropy of the data; XOR with a single-byte key falls into this category) and why we saw so many X'es in the hex editor (it's common for a binary to have a lot of NUL bytes all around, and 0^0x58 gives just 0x58).

Again, there are several ways to go about it:

  • The first one, which I learned during the livestream from one of the viewers [uhm, I'm sorry, I don't remember your name; let me know and I'll put it here], is using the XOR decrypt function in Binary Ninja (if you're using this interactive disassembler): in hex view select the bytes, then right-click and select Transform → XOR → 0x58 as the key.
  • Second would be doing a similar thing in IDA - as far as I know this would require a short Python/IDA script though.
  • And yet another method, which I usually prefer, is writing a short Python script that takes the input file, deciphers the proper area and saves it as a new file.

The reason I personally prefer the last method is that it's easy to re-run the patching process with all kinds of changes (e.g. switching up manual software breakpoints, commenting out parts of the patches, etc).

In any case, the initial script looked like this (Python 2.7):

# Read original file.
d = bytearray(open("oxfoo1m3", "rb").read())

OFFSET = 0x196 # Offset in the file of encrypted code.
LENGTH = 0xA80 # Length of encrypted code.

# Decipher.
k = bytearray(map(lambda x: x ^ 0x58, d[OFFSET:OFFSET+LENGTH]))
d = d[:OFFSET] + k + d[OFFSET+LENGTH:]

# Change XOR key to 0 (i.e. a XOR no-op).
d[0x154 + 1] = 0

# Write new file.
open("deobfuscated", "wb").write(str(d))

Executing the script yielded a deciphered executable and the analysis could continue.

The end of the livestream

The last thing I did during the livestream was to look around the code and finally try to figure out where that PTRACE_TRACEME protection is executed from (so I could no-op it). I tried to do this by injecting the CC byte (which translates to int3, i.e. the software breakpoint; it's also one of the very few x86 opcodes worth remembering) at a few spots that looked interesting (one of them was an "int 0" instruction), then running the modified version and analyzing the core file it generates (i.e. the structured memory state which is dumped when a process is killed due to certain errors on Linux; courtesy of the ulimit -c unlimited setting in Bash, amongst other things). This resulted in a weird crash where EIP was set to 0xf001 (which happens to be the crackme author's nickname), which made me think there was some kind of a checksum mechanism in play (that turned out to be close, but not accurate).

And at this point the livestream ended.

Part 2: End game

A few days after the livestream I found some time to get back to the challenge. I decided to continue with a different approach - instruction-grained tracing. But before I could do that, I had to deal with the problem of not being able to run the challenge under a debugger (due to the ELF file being modified in a way which GDB disliked).

The method I would use in an environment that has a JIT debugger (i.e. Just-In-Time debugger - a registered application which starts debugging an application when it crashes; it's a Windows thing) would be inserting a CC byte at the program's entry point. This would make the crackme crash at the first instruction and automatically attach the debugger - pretty handy.

On Ubuntu, however, I had to insert an infinite loop instead (a method also suggested by one of my viewers), i.e. the bytes EB FE - another opcode worth writing down and remembering. This was done by adding the following line to my Python patching script:

d[0x80:0x82] = [0xeb, 0xfe] # jmp $
# original bytes: e8 01
# set *(short*)0x08048080=0x01e8

Thanks to this approach, after executing the crackme it basically "hangs" (it's technically running, but spinning on the same infinite-loop instruction), so there is time to calmly find the PID of the process, attach the debugger, replace the EB FE bytes with the original ones and start the trace.

As before, I like to do this with a script (this time a GDB script):

set *(short*)0x08048080=0x01e8
set disassembly-flavor intel
set height 0

break *0x8048195
c

# Most basic form of tracing possible.
set logging file log.txt
set logging on
while 1
x/1i $eip
si
end

The script above can be executed using the following command:
gdb --pid `pgrep deobfuscated` -x script.gdb

The result is a log.txt file containing a runtime trace of all executed instructions from 0x8048195 up until the application crashes.

0x08048195 in ?? ()
=> 0x8048195: ret
0x0804809e in ?? ()
=> 0x804809e: call 0x80480a4
0x080480a4 in ?? ()
=> 0x80480a4: pop edx
0x080480a5 in ?? ()
=> 0x80480a5: add edx,0xb
0x080480ab in ?? ()
=> 0x80480ab: push edx
0x080480ac in ?? ()
=> 0x80480ac: ret
0x080480ae in ?? ()
...

The first thing to do was to filter the log, i.e. remove all lines containing obfuscation instructions, as well as lines that don't contain any instruction at all (e.g. "0x080480ac in ?? ()"). This can be done with a simple set of regexes in your favorite text editor. In my case (gvim) it boiled down to:

%g/^0x/d
%g/push/d
%g/pop/d
%g/call/d
%g/add\tedx,0xb/d
%g/add\s\+edx,0xb/d
%g/add\s\+edx,0xe/d
%g/add\s\+ecx,0x9/d
%g/ret/d

Please note that the above approach is pretty aggressive, in the sense that it might remove actually meaningful instructions; but in the end that didn't matter in this case (in other cases it might though, so keep this in mind).

This resulted in a rather short assembly snippet (I've added a few comments):

=> 0x804869c: xor eax,eax
=> 0x80486c9: xor ebx,ebx ; EBX=0 (PTRACE_TRACEME)
=> 0x80486f6: mov ecx,ebx
=> 0x80486f8: inc ecx
=> 0x80486f9: mov edx,ebx
=> 0x8048726: mov esi,eax
=> 0x8048753: add eax,0xd
=> 0x8048783: shl eax,1 ; EAX=26 (ptrace)
=> 0x80488b2: mov edx,0x8048a4e ; Address of int →0← argument.
=> 0x80488e2: mov ch,BYTE PTR [edx] ; Grabbing the argument byte.
=> 0x804890f: mov cl,0x10
=> 0x804893c: shl cl,0x3 ; CL = 0x80
=> 0x804896a: mov BYTE PTR [edx],cl ; It's now int →0x80←.
=> 0x8048997: xor ch,cl ; Checking if the old argument
; was not 0x80.
=> 0x80489c4: je 0x8048653 ; If it was, go crash.
=> 0x8048a4d: int 0x80 ; Call to ptrace()
=> 0x8048aa7: mov edx,0x8048a4e
=> 0x8048ad7: mov BYTE PTR [edx],cl
=> 0x8048826: or al,al ; Check if ptrace() failed.
=> 0x8048853: jne 0x8048653 ; It did, go crash.
=> 0xf001: go.gdb:10: Error in sourced command file:
Cannot access memory at address 0xf001
Detaching from program: , process 30843

Analyzing the above code shows that the int 0 instruction discovered earlier has its argument replaced with 0x80 (the Linux system call interface), but there is also a check whether the argument wasn't already 0x80 (it's probably this that I mistakenly assumed to be a checksum mechanism; keyword: probably, as I have some doubts whether I actually triggered it). After that, at 0x8048a4d, ptrace(PTRACE_TRACEME) is called and the return value is checked; if it's non-zero, the 0x8048653 branch is taken and the crackme crashes (note that since this is the filtered trace, we don't see the actual instructions that cause the crash; but it's basically push 0xf001 + ret).

To mitigate this protection it's enough to NOP out (i.e. replace with the byte 0x90 - the no-operation instruction) the jump at 0x8048853. To do this I added the following lines to my Python patching script:

# nop-out ptrace check
d[0x853:0x859] = [0x90] * 6

Once this was done and the executable was regenerated, I ran the tracing again and filtered the log with the aforementioned regexes. This resulted in a slightly larger listing and another crash, but the content itself was really interesting and strongly hinted at the solution:

=> 0x804832d: mov edx,0xb
=> 0x804835d: int 0x80
=> 0x80483e2: mov esi,0x8048223 ; Input address.
=> 0x8048412: mov edi,0x8048223
=> 0x8048442: mov ecx,0xb
=> 0x8048472: lods al,BYTE PTR ds:[esi]
=> 0x804849e: xor al,dl
=> 0x80484cb: inc dl
=> 0x8048524: neg ecx
=> 0x8048551: add ecx,0x8048239 ; String "myne{xtvfw~".
=> 0x8048582: cmp al,BYTE PTR [ecx]
=> 0x8048584: je 0x80485a4

The code above basically fetches one byte of input, XORs it with 0xB (which is then increased to 0xC, 0xD, and so on), and compares with one byte of the weird "myne{xtvfw~" string. This means that to get the password one needs to XOR the weird string with 0xB+index:

>>> m = "myne{xtvfw~"
>>> ''.join([chr(ord(x) ^ (i+0xb)) for i, x in enumerate(m)])
'fucktheduck'

This resulted in another weird string which, guess what, indeed was the solution :)



To sum up, what I really liked about this crackme was that it used multiple protections, all of them pretty simple, without overdoing it - this makes it a great challenge for early-intermediate reverse-engineers to practice their skills.

On why my tbreak tracing trick did not work

By Gynvael Coldwind | Thu, 02 Feb 2017 00:10:38 +0100 | @domain: favicongynvael.coldwind.pl
On yesterday's livestream (you can find the video here - Gynvael's Hacking Livestream #8) one viewer asked a really good question: while analyzing a rather large application, how does one find the functions responsible for a certain functionality we are interested in? For an answer I chose to demonstrate the simple trace-comparison trick I saw years ago in paimei (a deprecated reverse-engineering framework), though my execution was done in GDB (any other tracing engine might have been used; also, if you understand some Polish, I've pointed at this presentation by Robert Swięcki from last year's Security PWNing Conference in Warsaw). As expected, the trick didn't yield the correct result when compared with another method I showed (backtracking from a related string), and I kept wondering why.

The general idea behind the trick goes like this: you set up tracing (and recording), start the application, and then do everything except the thing you are actually interested in. Then, you run the application again (with tracing and recording), but now you try to do only the thing you are after, touching other functions of the application as little as possible. And in the end you compare the traces - whatever is on the second list, but is not on the first one, is probably what we've been looking for.
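
The comparison itself is trivial - assuming each recorded trace is a text file with one hit address per line, a Python 2.7 sketch could be:

# Addresses hit while poking around everything *except* the feature.
baseline = set(open("trace_everything_else.txt").read().split())
# Addresses hit while using (mostly) just the interesting feature.
target = set(open("trace_the_feature.txt").read().split())

# Whatever was hit only in the second run is a good candidate for the
# functionality we are looking for.
for addr in sorted(target - baseline):
    print addr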

In case of GDB and temporary breakpoints (which are one-shot, i.e. they become disabled after the first hit) it's even easier, as you can do this in a single run: first explore all/some/most of the non-interesting functions, and then trigger the exact functionality you need, which will display temporary breakpoint hits for whatever breakpoints were still set.

So here's what I did (with pictures!):
[[[SPLIT]]]


1. I've generated an LST file of the target application (well, the target was actually GDB as well) from IDA (which is basically a direct dump of what you see in the main screen of IDA).


2. I've grepped for all the functions IDA found.


3. And converted the addresses to a GDB script that sets temporary breakpoints (tbreak) - a sketch of this conversion is shown right after this list.


4. Then I ran GDB in GDB, executed some commands and waited for all possible temporary breakpoints to hit (and become disabled). After that, I executed the command whose implementation I was looking for (info w32 selector) and only one tbreak fired - 0x005978bc.


5. Using the backtracking-from-a-related-string method I found the implementation at a totally different address - 0x0041751c. Oops.
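
For reference, a minimal sketch of the grep-and-convert part (steps 2 and 3, in Python 2.7), assuming IDA's usual LST lines where function starts look like ".text:0041751C sub_41751C proc near" (the file names and the address format are placeholders):

import re

out = open("script.gdb", "w")
for ln in open("target.lst"):
    m = re.match(r"\.text:([0-9A-F]{8})\s+\S+\s+proc\b", ln)
    if m:
        out.write("tbreak *0x%s\n" % m.group(1))
out.close()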

So what did I do wrong? The answer is actually on the last screenshot. Let's zoom in:



As one can observe, the addresses are dark red (or is it brown?), which means that IDA didn't recognize this part of the code as a function. That in turn means that a breakpoint for this function was never on the list of temporary breakpoints, so there was no chance for it to show up.

How to correct the problem? There are two ways:
1. If using this exact method, make sure that all the functions are actually marked as such in IDA before generating the tbreak list.
2. Or just use a different tracing method - e.g. branch tracing or another method offered by the CPU.

I re-tested this after the stream, adding the missing function breakpoint, and guess what - it worked:

(gdb) info w32 selector

Temporary breakpoint 1, 0x00417414 in ?? ()

Temporary breakpoint 4544, 0x005978bc in ?? ()
Impossible to display selectors now.
(gdb)

And that solves yesterday's mystery :)

Fixing Atari 800XL - part 1

By Gynvael Coldwind | Thu, 19 Jan 2017 00:10:37 +0100 | @domain: favicongynvael.coldwind.pl
So my old Atari 800XL broke and I decided to fix it. Well, that's not the whole story though, so I'll start from the beginning: a long time ago I was a proud owner of an Atari 800XL. Time passed and I eventually moved to the PC, and the Atari was lent to distant relatives and stayed with them for several years. About 15 years later I got to play with my wife's old CPC 464 (see these posts: 1 2 3 4 - the second one is probably the most crude way of dumping ROM/RAM you've ever seen) and thought that it would be pretty cool to check out the old 800XL as well. My parents picked it up from the relatives (I currently live in another country) and soon it reunited with me once again! Unfortunately (or actually, fortunately) technological development has moved quite rapidly over the last 20 years, so I found myself not having a TV I could connect the Atari to. And so I ordered an Atari Monitor → Composite Video cable somewhere and hid the Atari at the bottom of the wardrobe, only to get back to it last week.

After connecting the 800XL via Composite Video to my TV tuner card (WinFast PxTV1200 (XC2028)) it turned out that the Atari was alive (I actually thought it wouldn't start at all due to old age) - it boots correctly, but the video is "flickery":


So I decided to fix it. Now, the problem is I have absolutely no idea about electronic circuitry - my biggest achievement ever in this field was creating a joystick splitter for the CPC 464 (though I am actually proud of myself for having predicted the ghosting problem and fixed it before soldering anything). Which means that this whole "I will fix the Atari" statement actually means "I will learn something about electronic circuits and probably break the Atari and it will never ever work again and I will cry" (though I hope to avoid the latter).

This blog post is the first of an unknown number of posts containing my notes on the process of attempting to fix my old computer. To be more precise, in this post I'm actually describing some things I've already done to try to pinpoint the problem (this includes dumping frames directly from the GTIA - this was actually fun to do). Spoiler: I still have no idea what's wrong, but at least I know what actually works correctly.
[[[SPLIT]]]

Part 1 - Debugging, some soldering, more debugging


My guess is that fixing electronic equipment is like fixing a nasty bug in unknown code. You need to familiarize yourself with the code in general and then start the process of elimination to finally pinpoint the cause of the problem. After you have done that, you need to understand in depth the problem and the cause. And then you fix it.

So from a high level perspective I expect the problem to be in one of four places:

1. The Atari motherboard itself.
2. The Atari Monitor socket → Composite video cable I'm using.
3. My TV tuner card.
4. The application I'm using for the TV tuner.

Starting from the bottom, I first tried VLC to get the image from the tuner - this worked, but I still got the same artifacts, and I wasn't able to convince VLC to actually use the parameters I was passing to it (I suspected the tuner was selecting the wrong PAL/NTSC mode). So I switched to CodeTV, which actually did allow me to set various PAL and NTSC modes, but it turned out not to fix my issue - actually, after seeing the effect of decoding the signal as e.g. NTSC, I decided this has absolutely nothing to do with my problem. So that's one point off the list.

4. The application I'm using for the TV tuner.

Next was the TV tuner card. My card - WinFast PxTV1200 (XC2028) - is a few years old and I never had much luck with it, so I ordered a new one - WinTV-HVR-1975. It should arrive next week, so whether it fixes anything will be revealed in the next part. I'll also play with some other options while I'm at it.

The second point on the list is the cable, and I'm yet to order a replacement, so I'll update this debugging branch in the next part as well.

Which brings us to the fun part - the motherboard itself.


I started by pestering my Dragon Sector teammate and our top electronics specialist - q3k - about what oscilloscope I should buy (I have some limited oscilloscope training and I figured it would be useful) and he recommended a RIGOL DS1054 + a Saleae logic analyzer (I got the Logic Pro 16 one) + some other useful pieces of equipment for later.

The second thing I did was to desolder all of the old electrolytic capacitors (making quite detailed notes about where they were and what types they were), as it seems to be a pretty standard thing to do when it comes to old computers. Once that was done (and I was able to read all of the labels, as some were initially hidden, especially in case of the axial ones), I ordered new ones and patiently waited a few days for them to arrive (along with some IC sockets which I plan to use later).


Once they did, I cleaned the motherboard around where the old ones had been and soldered the new ones in (I'm obviously not going to show you a hi-res photo of this, as my soldering skills are that of a newbie) and connected the Atari to test it. It did boot, but the video problem was still there.

While chatting on IRC, a colleague (hi skrzyp!) mentioned that it might be a GTIA problem (Graphic Television Interface Adaptor - the last IC in the line that runs Atari's video), so I decided to dump the video frame right at the moment it leaves the GTIA. Researching the topic on the internet, I learnt that Luminance Output 3 (i.e. pin 24) actually should be the Composite Video output (or what later becomes Composite Video); I also found out that pin 25 is the Composite Sync Output, which meant I had all I needed to start.

Given that I didn't know much about Composite output I started by connecting my oscilloscope to the Composite Sync Output pin in hopes it will tell me something.


And indeed it did. It seemed that the sync signal is high for about 60μs and then low for about 5μs - this gives a cycle of about 65μs, i.e. roughly 15.4 kHz. Now, this is meaningful since my Atari was PAL, and PAL is actually 25 FPS of 625 scanlines - and 25 * 625 gives exactly 15625 scanlines per second (i.e. a nominal 64μs scanline), close enough to my rough measurement to confirm the correlation - i.e. I figured that the signal goes low at the end of a scanline and goes back up at the beginning of a new one. Furthermore, after inspecting some "random flickering" on my oscilloscope, I found that there is a series of 3 much shorter high states from time to time - I assumed this was the end-of-frame marker (or start-of-frame, it didn't really matter to me).
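
A quick sanity check of these numbers (Python):

lines_per_second = 25 * 625   # PAL: 25 frames * 625 scanlines = 15625
print 1e6 / lines_per_second  # 64.0 - nominal scanline period in microseconds

So the nominal PAL scanline is 64μs - close enough to the ~65μs I was eyeballing on the scope.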

After this I connected the second channel's probe to the Luminance Output 3, set the trigger to channel 2 and got this:


It took me a few moments to figure out that what I'm actually seeing is the top pixel row of the classic "READY" prompt, but yup, that was it. So it seemed that getting a frame dump shouldn't be a problem.

The next step was to connect the logic analyzer and acquire some samples. The one I have samples analog data at 6.25MHz - this sadly meant that my horizontal resolution wouldn't be good at all, as it leaves only about 400 samples per scanline (6.25M / 25 FPS / 625 lines == 400). Still, I figured it would be enough to see if there are any artifacts in the frame at this point.


A single frame in Saleae Logic looked like this (this was taken when Atari's Self Test was on):


I've exported the data to a nice 400MB CSV file and written the following Python script to convert it into a series of grayscale raw bitmaps (each CSV file contained about 50 frames):

import sys

# Stretch/shrink arr to exactly sz elements (nearest-neighbour resampling).
def resize(arr, sz):
    fsz = float(sz)
    arr_sz = len(arr)
    arr_fsz = float(arr_sz)
    k = arr_fsz / fsz
    out_arr = []
    for i in range(sz):
        idx = int(i * k)
        if idx >= arr_sz:
            idx = arr_sz - 1
        out_arr.append(arr[idx])
    return out_arr

# Convert a list of voltage samples to one 400-pixel grayscale scanline.
def scanline_to_grayscale(scanline):
    MAX_V = 7.0
    scanline = resize(scanline, 400)
    scanline = map(lambda x: chr(int((x / MAX_V) * 255)), scanline)
    return ''.join(scanline)

if len(sys.argv) != 2:
    print "usage: process.py <fname.csv>"
    sys.exit(1)

f = open(sys.argv[1], "r")

# Ignore headers
f.readline()

CLOCK_BOUNDARY_V = 4.7
FRAME_BOUNDARY_COUNT = 100 # Actually it's ~21 vs ~365. 100 is an OK value.

scanline = None
clock_high = None

ST_INIT = 0
ST_LOW = 1
ST_HIGH = 2

state = ST_INIT
frame_sync = 0
frame = -1

fout = None

for ln in f:
    ln = ln.split(', ')
    t, clock, composite = map(lambda x: float(x), ln[:3])

    clock_high = (clock > CLOCK_BOUNDARY_V)

    if state == ST_INIT:
        if clock_high:
            continue
        state = ST_LOW

    if state == ST_LOW and clock_high:
        state = ST_HIGH
        scanline = []

    if state == ST_HIGH and clock_high:
        scanline.append(composite)
        continue

    if state == ST_HIGH and not clock_high:
        state = ST_LOW

        # A short high period is one of the three end-of-frame sync markers;
        # a long one is a regular scanline.
        if len(scanline) < FRAME_BOUNDARY_COUNT:
            frame_sync += 1

            if frame_sync == 3:
                frame_sync = 0
                frame += 1
                print "Dumping frame %i..." % frame
                fout = open("frame_%i.raw" % frame, "wb")
            else:
                if fout is not None:
                    fout.close()
                    fout = None
        elif fout is not None:
            fout.write(scanline_to_grayscale(scanline))

The script generated about 50 frames for a dump of the "READY" screen and a similar amount for the "Self Test" one. All the frames looked more or less like this:


So, apart from the low resolution and lack of color (both expected) they actually did look correct.

Which means that the GTIA I have is probably OK. One thing to note is that my script actually had access to the composite sync clock, and the TV tuner doesn't - so if the timing is slightly off, my script would not catch it.

Nevertheless it was a pretty fun exercise. The next thing on my list is to read more about Composite Video and verify that all the timings at the GTIA output, and later on the Composite Video cable, are actually OK. Once I do that, I'll have to learn to analyze the circuitry that lies between the GTIA and the monitor output socket, check if all the paths look good, etc. Should be fun :)

And that's all in today's noob's notes on ECs and old computers. Stay tuned!

P.S. Please note that there are probably several errors in this post - as mentioned, I don't know too much about this field, so errors/misconceptions/misunderstandings are bound to occur. There might be some updates to this post later on.

Hacking Livestream #5 - solving picoCTF 2013 (part 1)

By Gynvael Coldwind | Wed, 23 Nov 2016 00:10:32 +0100 | @domain: favicongynvael.coldwind.pl
Tomorrow (sorry for the late notice) at 7pm CET (GMT+1) I'll do another livestream on CTFs - this time I'll try to show how to solve several picoCTF 2013 challenges in the time frame of the stream (2 hours). PicoCTF 2013 was an entry-level CTF created by the well-known team Plaid Parliament of Pwning - so expect the challenges to range from 10 points (or 30 seconds) to 100 points (several minutes). The stream will actually be a really good opportunity for folks wondering what CTFs are about and how to start with them to have some of their questions answered (at least I think so). Anyway, the details: as always, the stream will be recorded and will be available immediately after the stream on my channel.

See you tomorrow!

Slides about my Windows Metafile research (Ruxcon, PacSec) and fuzzing (Black Hat EU) now public

By j00ru | Tue, 15 Nov 2016 14:12:22 +0000 | @domain: faviconj00ru.vexillium.org
During the past few weeks, I travelled around the world to give talks at several great security conferences, such as Ruxcon (Melbourne, Australia), PacSec (Tokyo, Japan), Black Hat Europe (London, UK) and finally Security PWNing Conference (Warsaw, Poland). At a majority of the events, I presented the results of my Windows Metafile security research, which […]

Django. Restricting user login

By sil2100 | Wed, 05 Oct 2016 19:52:00 GMT | @domain: faviconsil2100.vexillium.org
For a Django-based sub-project I'm working on, I had the need to restrict user login to only one active, logged-in session at a time. As I am currently almost a complete newbie in this framework, I tried finding a ready solution around the web and failed, as nothing really fit my needs. After gathering some bits and pieces of information from around the internet, I wrote a quick and very simple piece of auth code to do the login restriction I wanted.

Windows system call tables updated, refreshed and reworked

By j00ru | Mon, 15 Aug 2016 13:07:11 +0000 | @domain: faviconj00ru.vexillium.org
Those of you interested in the Windows kernel-mode internals are probably familiar with the syscall tables I maintain on my blog: the 32-bit and 64-bit listings of Windows system calls with their respective IDs in all major versions of the OS, available here (and are also linked to in the left menu): Windows Core (NT) […]

Launchpad API. Confusing binary builds

By sil2100 | Thu, 11 Aug 2016 20:29:00 GMT | @domain: faviconsil2100.vexillium.org
Another post on the Launchpad API - I'll try to make it my last one, no worries. It's just that recently I've been dealing with it so much that I feel like sharing some of its caveats and my experiences with it. Today's post will be a short story about a certain edge case one would need to watch out for, titled: "accessing source packages through binary builds can be confusing".

Hacking Livestream #4 - DEF CON CTF (Friday)

By Gynvael Coldwind | Tue, 09 Aug 2016 00:10:22 +0200 | @domain: favicongynvael.coldwind.pl
I'm back from Black Hat / DEF CON, so it's time to do another live hacking session! The next one will be Friday, 12th of August, same time as usual (7pm UTC+2) at gynvael.coldwind.pl/live-en (aka YouTube). I'll talk about this year's DEF CON CTF (while it's still fresh in my memory), i.e. the formula, the tasks, the metagame, etc. I'll also show a couple of bugs and exploit one or two of them (i.e. whatever I can fit into 2h of the stream).

Where: gynvael.coldwind.pl/live-en
When: August 12, 19:00 UTC+2
What: DEF CON CTF 2016

See you there!

P.S. Feel free to let whoever might be interested know, the more the merrier :)
