Just a general question of computer security – if I download a .pdf or .docx from an untrusted source, can it seriously harm my machine/give a malicious user access to my information? Or is it sandboxed somehow?
(Sorry if there’s a different Stack Exchange I’m supposed to be posting this on.)
Solution:
PDF files have become increasingly dangerous recently. Turn off JavaScript in your reader and/or use a different viewer (another answer mentioned this, not sure why he was downmodded).
There actually are different ways that files can come after you.
Raw executable files. .exe, .scr, .com, .bat, etc. are plain executables that people run on their computer. You’d never do this, right? What about screensavers you download? (Yes, a .scr is just an .exe that Windows launches with special command-line switches so it hooks to the screen.) Old viruses took advantage of Explorer hiding extensions. What you thought was hotgirl.jpg was really hotgirl.jpg.exe. Explorer would hide the .exe since it’s a ‘known’ extension, so you’d just see hotgirl.jpg, click it, and get owned.
Executable data: macros and JavaScript. MS Office docs have a full macro language that has (or had) access to mail and the filesystem. Internet Explorer’s scripting does as well, plus access to ActiveX controls. Adobe Acrobat has had a particularly bad rep recently for JavaScript holes. This category covers a lot of those Mozilla/IE bugs you read about. Sometimes this code is ‘sandboxed’, meaning it is given a place to play that in theory shouldn’t hurt the rest of your computer, but only Chrome has a pretty solid sandbox.
Parsing errors. Bugs in programs can lead to code execution. You think of a program’s code and its incoming data as separate, but in reality incoming data sits in memory areas called the stack and the heap, right next to the bookkeeping that decides which code runs next. If I’m a smart and evil programmer, I can pass you code in the data I give you and trick your program into running it. A lot of image exploits happen this way. PDF readers have had these as well. The more complex the document format, the more likely its parser will have these errors (which is why PDF readers have a lot of exploits).
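The double-extension trick from the first item above can be caught with a simple filename check. This is only a rough sketch: the function name and the extension list are my own illustrative choices, and it will also flag harmless names like setup.v2.exe, so treat a hit as "look closer", not as proof.

```python
from pathlib import PurePath

# Extensions Windows runs directly. Illustrative only, not an exhaustive list.
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".com", ".bat"}


def looks_like_disguised_executable(filename: str) -> bool:
    """Return True for names like 'photo.jpg.exe'.

    Heuristic: the final extension is executable AND there is at least
    one other extension in front of it.
    """
    suffixes = [suffix.lower() for suffix in PurePath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in EXECUTABLE_EXTENSIONS


if __name__ == "__main__":
    for name in ["hotgirl.jpg", "hotgirl.jpg.exe", "setup.exe"]:
        print(name, looks_like_disguised_executable(name))
```

The real fix is still to make Explorer show file extensions, so you can see the .exe yourself.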
So, PDF files are both complex (so it’s easy to have bugs in the parsing code that malware can exploit) and have JavaScript, which can be used for nefarious purposes as well. Same with MS Word files.
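If you want a quick look at whether a PDF even declares JavaScript, you can search the raw bytes for the relevant PDF names. A minimal sketch, with assumptions: the helper name and the list of names are mine, and it only sees uncompressed, unobfuscated markers. A PDF can hide these inside compressed streams, so an empty result proves nothing, and this does nothing at all about parser bugs.

```python
import re
from typing import List

# PDF name objects that commonly signal active content (illustrative list).
SUSPICIOUS_PDF_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch")


def find_active_content_markers(pdf_bytes: bytes) -> List[str]:
    """Return the suspicious PDF names that appear in the raw bytes."""
    found = []
    for name in SUSPICIOUS_PDF_NAMES:
        # Require that the name is not followed by more letters or digits,
        # so a short name does not match the start of a longer one.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        if re.search(pattern, pdf_bytes):
            found.append(name.decode("ascii"))
    return found


if __name__ == "__main__":
    sample = b"%PDF-1.4\n<< /S /JavaScript /JS (app.alert(1)) >>\n"
    print(find_active_content_markers(sample))
```

To check a real file, read it in binary mode and pass the bytes in, e.g. `find_active_content_markers(open(path, "rb").read())` with `path` being whatever you downloaded.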
What’s safe? Not much. Almost everything has had holes. Plain text files are probably the safest, since few things ‘parse’ them in any way. But even a small amount of structure needs a parser, and parsers have bugs. XML is text, and that gets exploited. Docx is a zip file of XML, and that can be unsafe.
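Since a docx is just a zip archive, you can list what’s inside without opening it in Word. A rough sketch, assuming the usual Office layout where a macro project is stored in a part named vbaProject.bin; the function name is hypothetical, and note that the zip reader doing the listing is itself a parser.

```python
import io
import zipfile


def inspect_office_zip(path_or_file):
    """Return (member_names, has_vba_macros) for a zip-based Office file."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = archive.namelist()
    # Assumption: macro-enabled files carry a part ending in vbaProject.bin.
    has_vba = any(name.lower().endswith("vbaproject.bin") for name in names)
    return names, has_vba


if __name__ == "__main__":
    # Tiny in-memory stand-in for a macro-enabled document (not a valid Word file).
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("word/document.xml", "<w:document/>")
        demo.writestr("word/vbaProject.bin", b"\x00")
    print(inspect_office_zip(buffer))
```

No macro part doesn’t mean the file is safe. It only rules out one of the attack routes above.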
In short, it sucks. Almost everything out there is dangerous. Keep yourself patched (many exploits come from bugs that already have fixes), get antivirus, and don’t use IE; use Chrome or Mozilla. Hmm, no real good answers.
As far as the sandbox goes, as someone else said, try Google Docs: it’s a parser on their machines that gets played with, not one on your machine. As far as I know, neither Word nor Reader has real sandboxing. Both allow you to turn off code (macros or JavaScript) and you should do that, but that’s not full sandboxing.
(very late) EDIT: RE: ‘untrusted source’. All sources are to be untrusted. Email attachments haven’t been trustable since the first Outlook viruses that were smart enough to send attachments to people in your contact list. If I can hack somewebserver, somewebserver goes from trusted to an untrustable source. Beware of all.