Computer virus

A computer virus is a malware program that, when executed, replicates by inserting copies of itself (possibly modified) into other computer programs, data files, or the boot sector of the hard drive; when this replication succeeds, the affected areas are then said to be "infected". Viruses often perform some type of harmful activity on infected hosts, such as stealing hard disk space or CPU time, accessing private information, corrupting data, displaying political or humorous messages on the user's screen, spamming their contacts, logging their keystrokes, or even rendering the computer useless. However, not all viruses carry a destructive payload or attempt to hide themselves—the defining characteristic of viruses is that they are self-replicating computer programs which install themselves without user consent.

Virus writers use social engineering and exploit detailed knowledge of security vulnerabilities to gain access to their hosts' computing resources. The vast majority of viruses target systems running Microsoft Windows, employing a variety of mechanisms to infect new hosts, and often using complex anti-detection/stealth strategies to evade antivirus software. Motives for creating viruses can include seeking profit, desire to send a political message, personal amusement, to demonstrate that a vulnerability exists in software, for sabotage and denial of service, or simply because they wish to explore artificial life and evolutionary algorithms.

As stated above, the term "computer virus" is sometimes used as a catch-all phrase to include all types of malware, even those that do not have the reproductive ability. Malware includes computer viruses, computer worms, Trojan horses, most rootkits, spyware, dishonest adware and other malicious and unwanted software, including true viruses. Viruses are sometimes confused with worms and Trojan horses, which are technically different. A worm can exploit security vulnerabilities to spread itself automatically to other computers through networks, while a Trojan horse is a program that appears harmless but hides malicious functions. Worms and Trojan horses, like viruses, may harm a computer system's data or performance. Some viruses and other malware have symptoms noticeable to the computer user, but many are surreptitious or simply do nothing to call attention to themselves. Some viruses do nothing beyond reproducing themselves.

Academic work
The first academic work on the theory of computer viruses (although the term "computer virus" was not invented at that time) was done by John von Neumann in 1949 who held lectures at the University of Illinois about the "Theory and Organization of Complicated Automata". The work of von Neumann was later published as the "Theory of self-reproducing automata". In his essay von Neumann postulated that a computer program could reproduce.

In 1972 Veith Risak published his article "Selbstreproduzierende Automaten mit minimaler Informationsübertragung" (Self-reproducing automata with minimal information exchange). The article describes a fully functional virus written in assembler language for a SIEMENS 4004/35 computer system.

In 1980 Jürgen Kraus wrote his diplom thesis "Selbstreproduktion bei Programmen" (Self-reproduction of programs) at the University of Dortmund. In his work Kraus postulated that computer programs can behave in a way similar to biological viruses.

In 1984 Fred Cohen from the University of Southern California wrote his paper "Computer Viruses - Theory and Experiments". It was the first paper to explicitly call a self-reproducing program a "virus"; a term introduced by his mentor Leonard Adleman.

An article that describes "useful virus functionalities" was published by J.B. Gunn under the title "Use of virus functions to provide a virtual APL interpreter under user control" in 1984.

Science Fiction
The Terminal Man, a science fiction novel by Michael Crichton (1972), told (as a sideline story) of a computer with telephone modem capability, which had been programmed to randomly dial phone numbers until it hit a modem that is answered by another computer. It then attempted to program the answering computer with its own program, so that the second computer would also begin dialing random numbers, in search of yet another computer to program. The program is assumed to spread exponentially through susceptible computers.

The actual term 'virus' was first used in David Gerrold's 1972 novel, When HARLIE Was One. In that novel, a sentient computer named HARLIE writes viral software to retrieve damaging personal information from other computers to blackmail the man who wants to turn him off.

Virus programs
The Creeper virus was first detected on ARPANET, the forerunner of the Internet, in the early 1970s. Creeper was an experimental self-replicating program written by Bob Thomas at BBN Technologies in 1971. Creeper used the ARPANET to infect DEC PDP-10 computers running the TENEX operating system. Creeper gained access via the ARPANET and copied itself to the remote system where the message, "I'm the creeper, catch me if you can!" was displayed. The Reaper program was created to delete Creeper.

A program called "Elk Cloner" was the first computer virus to appear "in the wild" — that is, outside the single computer or lab where it was created. Written in 1981 by Richard Skrenta, it attached itself to the Apple DOS 3.3 operating system and spread via floppy disk. This virus, created as a practical joke when Skrenta was still in high school, was injected in a game on a floppy disk. On its 50th use the Elk Cloner virus would be activated, infecting the computer and displaying a short poem beginning "Elk Cloner: The program with a personality."

The first PC virus in the wild was a boot sector virus dubbed (c)Brain, created in 1986 by the Farooq Alvi Brothers in Lahore, Pakistan, reportedly to deter piracy of the software they had written. However, analysts have claimed that the Ashar virus, a variant of Brain, possibly predated it based on code within the virus.

Before computer networks became widespread, most viruses spread on removable media, particularly floppy disks. In the early days of the personal computer, many users regularly exchanged information and programs on floppies. Some viruses spread by infecting programs stored on these disks, while others installed themselves into the disk boot sector, ensuring that they would be run when the user booted the computer from the disk, usually inadvertently. PCs of the era would attempt to boot first from a floppy if one had been left in the drive. Until floppy disks fell out of use, this was the most successful infection strategy and boot sector viruses were the most common in the wild for many years.

Traditional computer viruses emerged in the 1980s, driven by the spread of personal computers and the resultant increase in BBS, modem use, and software sharing. Bulletin board-driven software sharing contributed directly to the spread of Trojan horse programs, and viruses were written to infect popularly traded software. Shareware and bootleg software were equally common vectors for viruses on BBS's.

Macro viruses have become common since the mid-1990s. Most of these viruses are written in the scripting languages for Microsoft programs such as Word and Excel and spread throughout Microsoft Office by infecting documents and spreadsheets. Since Word and Excel were also available for Mac OS, most could also spread to Macintosh computers. Although most of these viruses did not have the ability to send infected e-mail, those viruses which did take advantage of the Microsoft Outlook COM interface.

Some old versions of Microsoft Word allow macros to replicate themselves with additional blank lines. If two macro viruses simultaneously infect a document, the combination of the two, if also self-replicating, can appear as a "mating" of the two and would likely be detected as a virus unique from the "parents".

A virus may also send a web address link as an instant message to all the contacts on an infected machine. If the recipient, thinking the link is from a friend (a trusted source) follows the link to the website, the virus hosted at the site may be able to infect this new computer and continue propagating.

Viruses that spread using cross-site scripting were first reported in 2002, and were academically demonstrated in 2005. There have been multiple instances of the cross-site scripting viruses in the wild, exploiting websites such as MySpace and Yahoo.

Infection strategies
In order to replicate itself, a virus must be permitted to execute code and write to memory. For this reason, many viruses attach themselves to executable files that may be part of legitimate programs. If a user attempts to launch an infected program, the virus' code may be executed simultaneously. Viruses can be divided into two types based on their behavior when they are executed. Nonresident viruses immediately search for other hosts that can be infected, infect those targets, and finally transfer control to the application program they infected. Resident viruses do not search for hosts when they are started. Instead, a resident virus loads itself into memory on execution and transfers control to the host program. The virus stays active in the background and infects new hosts when those files are accessed by other programs or the operating system itself.

Nonresident viruses
Nonresident viruses can be thought of as consisting of a finder module and a replication module. The finder module is responsible for finding new files to infect. For each new executable file the finder module encounters, it calls the replication module to infect that file.

Resident viruses
Resident viruses contain a replication module that is similar to the one that is employed by nonresident viruses. This module, however, is not called by a finder module. The virus loads the replication module into memory when it is executed instead and ensures that this module is executed each time the operating system is called to perform a certain operation. The replication module can be called, for example, each time the operating system executes a file. In this case the virus infects every suitable program that is executed on the computer.

Resident viruses are sometimes subdivided into a category of fast infectors and a category of slow infectors. Fast infectors are designed to infect as many files as possible. A fast infector, for instance, can infect every potential host file that is accessed. This poses a special problem when using anti-virus software, since a virus scanner will access every potential host file on a computer when it performs a system-wide scan. If the virus scanner fails to notice that such a virus is present in memory the virus can "piggy-back" on the virus scanner and in this way infect all files that are scanned. Fast infectors rely on their fast infection rate to spread. The disadvantage of this method is that infecting many files may make detection more likely, because the virus may slow down a computer or perform many suspicious actions that can be noticed by anti-virus software. Slow infectors, on the other hand, are designed to infect hosts infrequently. Some slow infectors, for instance, only infect files when they are copied. Slow infectors are designed to avoid detection by limiting their actions: they are less likely to slow down a computer noticeably and will, at most, infrequently trigger anti-virus software that detects suspicious behavior by programs. The slow infector approach, however, does not seem very successful.

Vectors and hosts
Viruses have targeted various types of transmission media or hosts. This list is not exhaustive:
 * Binary executable files (such as COM files and EXE files in MS-DOS, Portable Executable files in Microsoft Windows, the Mach-O format in OSX, and ELF files in Linux)
 * Volume Boot Records of floppy disks and hard disk partitions
 * The master boot record (MBR) of a hard disk
 * General-purpose script files (such as batch files in MS-DOS and Microsoft Windows, VBScript files, and shell script files on Unix-like platforms).
 * Application-specific script files (such as Telix-scripts)
 * System specific autorun script files (such as Autorun.inf file needed by Windows to automatically run software stored on USB Memory Storage Devices).
 * Documents that can contain macros (such as Microsoft Word documents, Microsoft Excel spreadsheets, AmiPro documents, and Microsoft Access database files)
 * Cross-site scripting vulnerabilities in web applications (see XSS Worm)
 * Arbitrary computer files. An exploitable buffer overflow, format string, race condition or other exploitable bug in a program which reads the file could be used to trigger the execution of code hidden within it. Most bugs of this type can be made more difficult to exploit in computer architectures with protection features such as an execute disable bit and/or address space layout randomization.

PDFs, like HTML, may link to malicious code. PDFs can also be infected with malicious code.

In operating systems that use file extensions to determine program associations (such as Microsoft Windows), the extensions may be hidden from the user by default. This makes it possible to create a file that is of a different type than it appears to the user. For example, an executable may be created named "picture.png.exe", in which the user sees only "picture.png" and therefore assumes that this file is an image and most likely is safe.

An additional method is to generate the virus code from parts of existing operating system files by using the CRC16/CRC32 data. The initial code can be quite small (tens of bytes) and unpack a fairly large virus. This is analogous to a biological "prion" in the way it works but is vulnerable to signature based detection. This attack has not yet been seen "in the wild".

Methods to avoid detection
In order to avoid detection by users, some viruses employ different kinds of deception. Some old viruses, especially on the MS-DOS platform, make sure that the "last modified" date of a host file stays the same when the file is infected by the virus. This approach does not fool anti-virus software, however, especially those which maintain and date Cyclic redundancy checks on file changes.

Some viruses can infect files without increasing their sizes or damaging the files. They accomplish this by overwriting unused areas of executable files. These are called cavity viruses. For example, the CIH virus, or Chernobyl Virus, infects Portable Executable files. Because those files have many empty gaps, the virus, which was 1 KB in length, did not add to the size of the file.

Some viruses try to avoid detection by killing the tasks associated with antivirus software before it can detect them.

As computers and operating systems grow larger and more complex, old hiding techniques need to be updated or replaced. Defending a computer against viruses may demand that a file system migrate towards detailed and explicit permission for every kind of file access.

Avoiding bait files and other undesirable hosts
A virus needs to infect hosts in order to spread further. In some cases, it might be a bad idea to infect a host program. For example, many anti-virus programs perform an integrity check of their own code. Infecting such programs will therefore increase the likelihood that the virus is detected. For this reason, some viruses are programmed not to infect programs that are known to be part of anti-virus software. Another type of host that viruses sometimes avoid are bait files. Bait files (or goat files) are files that are specially created by anti-virus software, or by anti-virus professionals themselves, to be infected by a virus. These files can be created for various reasons, all of which are related to the detection of the virus:
 * Anti-virus professionals can use bait files to take a sample of a virus (i.e. a copy of a program file that is infected by the virus). It is more practical to store and exchange a small, infected bait file, than to exchange a large application program that has been infected by the virus.
 * Anti-virus professionals can use bait files to study the behavior of a virus and evaluate detection methods. This is especially useful when the virus is polymorphic. In this case, the virus can be made to infect a large number of bait files. The infected files can be used to test whether a virus scanner detects all versions of the virus.
 * Some anti-virus software employs bait files that are accessed regularly. When these files are modified, the anti-virus software warns the user that a virus is probably active on the system.

Since bait files are used to detect the virus, or to make detection possible, a virus can benefit from not infecting them. Viruses typically do this by avoiding suspicious programs, such as small program files or programs that contain certain patterns of 'garbage instructions'.

A related strategy to make baiting difficult is sparse infection. Sometimes, sparse infectors do not infect a host file that would be a suitable candidate for infection in other circumstances. For example, a virus can decide on a random basis whether to infect a file or not, or a virus can only infect host files on particular days of the week.

Stealth
Some viruses try to trick antivirus software by intercepting its requests to the operating system. A virus can hide itself by intercepting the antivirus software’s request to read the file and passing the request to the virus, instead of the OS. The virus can then return an uninfected version of the file to the antivirus software, so that it seems that the file is "clean". Modern antivirus software employs various techniques to counter stealth mechanisms of viruses. The only completely reliable method to avoid stealth is to boot from a medium that is known to be clean.

Self-modification
Most modern antivirus programs try to find virus-patterns inside ordinary programs by scanning them for so-called virus signatures. A signature is a characteristic byte-pattern that is part of a certain virus or family of viruses. If a virus scanner finds such a pattern in a file, it notifies the user that the file is infected. The user can then delete, or (in some cases) "clean" or "heal" the infected file. Some viruses employ techniques that make detection by means of signatures difficult but probably not impossible. These viruses modify their code on each infection. That is, each infected file contains a different variant of the virus.

Encryption with a variable key
A more advanced method is the use of simple encryption to encipher the virus. In this case, the virus consists of a small decrypting module and an encrypted copy of the virus code. If the virus is encrypted with a different key for each infected file, the only part of the virus that remains constant is the decrypting module, which would (for example) be appended to the end. In this case, a virus scanner cannot directly detect the virus using signatures, but it can still detect the decrypting module, which still makes indirect detection of the virus possible. Since these would be symmetric keys, stored on the infected host, it is in fact entirely possible to decrypt the final virus, but this is probably not required, since self-modifying code is such a rarity that it may be reason for virus scanners to at least flag the file as suspicious.

An old, but compact, encryption involves XORing each byte in a virus with a constant, so that the exclusive-or operation had only to be repeated for decryption. It is suspicious for a code to modify itself, so the code to do the encryption/decryption may be part of the signature in many virus definitions.

Polymorphic code
Polymorphic code was the first technique that posed a serious threat to virus scanners. Just like regular encrypted viruses, a polymorphic virus infects files with an encrypted copy of itself, which is decoded by a decryption module. In the case of polymorphic viruses, however, this decryption module is also modified on each infection. A well-written polymorphic virus therefore has no parts which remain identical between infections, making it very difficult to detect directly using signatures. Antivirus software can detect it by decrypting the viruses using an emulator, or by statistical pattern analysis of the encrypted virus body. To enable polymorphic code, the virus has to have a polymorphic engine (also called mutating engine or mutation engine) somewhere in its encrypted body. See Polymorphic code for technical detail on how such engines operate.

Some viruses employ polymorphic code in a way that constrains the mutation rate of the virus significantly. For example, a virus can be programmed to mutate only slightly over time, or it can be programmed to refrain from mutating when it infects a file on a computer that already contains copies of the virus. The advantage of using such slow polymorphic code is that it makes it more difficult for antivirus professionals to obtain representative samples of the virus, because bait files that are infected in one run will typically contain identical or similar samples of the virus. This will make it more likely that the detection by the virus scanner will be unreliable, and that some instances of the virus may be able to avoid detection.

Metamorphic code
To avoid being detected by emulation, some viruses rewrite themselves completely each time they are to infect new executables. Viruses that utilize this technique are said to be metamorphic. To enable metamorphism, a metamorphic engine is needed. A metamorphic virus is usually very large and complex. For example, W32/Simile consisted of over 14000 lines of Assembly language code, 90% of which is part of the metamorphic engine.

The vulnerability of operating systems to viruses
Just as genetic diversity in a population decreases the chance of a single disease wiping out a population, the diversity of software systems on a network similarly limits the destructive potential of viruses. This became a particular concern in the 1990s, when Microsoft gained market dominance in desktop operating systems and office suites. The users of Microsoft software (especially networking software such as Microsoft Outlook and Internet Explorer) are especially vulnerable to the spread of viruses. Microsoft software is targeted by virus writers due to their desktop dominance, and is often criticized for including many errors and holes for virus writers to exploit. Integrated and non-integrated Microsoft applications (such as Microsoft Office) and applications with scripting languages with access to the file system (for example Visual Basic Script (VBS), and applications with networking features) are also particularly vulnerable.

Although Windows is by far the most popular target operating system for virus writers, viruses also exist on other platforms. Any operating system that allows third-party programs to run can theoretically run viruses. Some operating systems are less secure than others. Unix-based operating systems (and NTFS-aware applications on Windows NT based platforms) only allow their users to run executables within their own protected memory space.

An Internet based experiment revealed that there were cases when people willingly pressed a particular button to download a virus. Security analyst Didier Stevens ran a half year advertising campaign on Google AdWords which said "Is your PC virus-free? Get it infected here!". The result was 409 clicks.

, there are relatively few security exploits targeting Mac OS X (with a Unix-based file system and kernel). The number of viruses for the older Apple operating systems, known as Mac OS Classic, varies greatly from source to source, with Apple stating that there are only four known viruses, and independent sources stating there are as many as 63 viruses. Many Mac OS Classic viruses targeted the HyperCard authoring environment. The difference in virus vulnerability between Macs and Windows is a chief selling point, one that Apple uses in their Get a Mac advertising. In January 2009, Symantec announced the discovery of a trojan that targets Macs. This discovery did not gain much coverage until April 2009.

While Linux, and Unix in general, has always natively blocked normal users from having access to make changes to the operating system environment, Windows users are generally not. This difference has continued partly due to the widespread use of administrator accounts in contemporary versions like XP. In 1997, when a virus for Linux was released – known as "Bliss" – leading antivirus vendors issued warnings that Unix-like systems could fall prey to viruses just like Windows. The Bliss virus may be considered characteristic of viruses – as opposed to worms – on Unix systems. Bliss requires that the user run it explicitly, and it can only infect programs that the user has the access to modify. Unlike Windows users, most Unix users do not log in as an administrator user except to install or configure software; as a result, even if a user ran the virus, it could not harm their operating system. The Bliss virus never became widespread, and remains chiefly a research curiosity. Its creator later posted the source code to Usenet, allowing researchers to see how it worked.

The role of software development
Because software is often designed with security features to prevent unauthorized use of system resources, many viruses must exploit software bugs in a system or application to spread. Software development strategies that produce large numbers of bugs will generally also produce potential exploits.

Anti-virus software and other preventive measures
Many users install anti-virus software that can detect and eliminate known viruses after the computer downloads or runs the executable. There are two common methods that an anti-virus software application uses to detect viruses. The first, and by far the most common method of virus detection is using a list of virus signature definitions. This works by examining the content of the computer's memory (its RAM, and boot sectors) and the files stored on fixed or removable drives (hard drives, floppy drives), and comparing those files against a database of known virus "signatures". The disadvantage of this detection method is that users are only protected from viruses that pre-date their last virus definition update. The second method is to use a heuristic algorithm to find viruses based on common behaviors. This method has the ability to detect novel viruses that anti-virus security firms have yet to create a signature for.

Some anti-virus programs are able to scan opened files in addition to sent and received e-mails "on the fly" in a similar manner. This practice is known as "on-access scanning". Anti-virus software does not change the underlying capability of host software to transmit viruses. Users must update their software regularly to patch security holes. Anti-virus software also needs to be regularly updated in order to recognize the latest threats.

One may also minimize the damage done by viruses by making regular backups of data (and the operating systems) on different media, that are either kept unconnected to the system (most of the time), read-only or not accessible for other reasons, such as using different file systems. This way, if data is lost through a virus, one can start again using the backup (which should preferably be recent).

If a backup session on optical media like CD and DVD is closed, it becomes read-only and can no longer be affected by a virus (so long as a virus or infected file was not copied onto the CD/DVD). Likewise, an operating system on a bootable CD can be used to start the computer if the installed operating systems become unusable. Backups on removable media must be carefully inspected before restoration. The Gammima virus, for example, propagates via removable flash drives.

Recovery methods
Once a computer has been compromised by a virus, it is usually unsafe to continue using the same computer without completely reinstalling the operating system. However, there are a number of recovery options that exist after a computer has a virus. These actions depend on severity of the type of virus.

Virus removal
One possibility on Windows Me, Windows XP, Windows Vista and Windows 7 is a tool known as System Restore, which restores the registry and critical system files to a previous checkpoint. Often a virus will cause a system to hang, and a subsequent hard reboot will render a system restore point from the same day corrupt. Restore points from previous days should work provided the virus is not designed to corrupt the restore files or also exists in previous restore points. Some viruses, however, disable System Restore and other important tools such as Task Manager and Command Prompt. An example of a virus that does this is CiaDoor. However, many such viruses can be removed by rebooting the computer, entering Windows safe mode, and then using system tools.

Administrators have the option to disable such tools from limited users for various reasons (for example, to reduce potential damage from and the spread of viruses). A virus can modify the registry to do the same even if the Administrator is controlling the computer; it blocks all users including the administrator from accessing the tools. The message "Task Manager has been disabled by your administrator" may be displayed, even to the administrator.

Users running a Microsoft operating system can access Microsoft's website to run a free scan, provided they have their 20-digit registration number. Many websites run by anti-virus software companies provide free online virus scanning, with limited cleaning facilities (the purpose of the sites is to sell anti-virus products). Some websites allow a single suspicious file to be checked by many antivirus programs in one operation.

Operating system reinstallation
Reinstalling the operating system is another approach to virus removal. It involves either reformatting the computer's hard drive and installing the OS and all programs from original media, or restoring the entire partition with a clean backup image. User data can be restored by booting from a Live CD, or putting the hard drive into another computer and booting from its operating system with great care not to infect the second computer by executing any infected programs on the original drive; and once the system has been restored precautions must be taken to avoid reinfection from a restored executable file.

These methods are simple to do, may be faster than disinfecting a computer, and are guaranteed to remove any malware. If the operating system and programs must be reinstalled from scratch, the time and effort to reinstall, reconfigure, and restore user preferences must be taken into account. Restoring from an image is much faster, totally safe, and restores the exact configuration to the state it was in when the image was made, with no further trouble.