Monday, November 2, 2009

Query Processing & Deductive Database

Case Study: Query Processing & Deductive Database

Query processing:



The retrieval of information from a database according to a set of retrieval criteria, the database itself remaining unchanged.

In the context of a specific query language, the technique of translating the retrieval criteria specified using the language into more primitive database-access software, including a selection among different methods to choose the most efficient in the particular circumstances.

We now give an overview of how a DDBMS processes and optimizes a query. We first discuss the communication costs of processing a distributed query; we then discuss a special operation, called a semijoin, that is used in optimizing some types of queries in a DDBMS.

1. Data Transfer Costs of Distributed Query Processing:

In a distributed system, several additional factors further complicate query processing. The first is the cost of transferring data over the network. This data includes intermediate files that are transferred to other sites for further processing, as well as the final result files that may have to be transferred to the site where the query result is needed. Although these costs may not be very high if the sites are connected via a high-performance local area network, they become quite significant in other types of networks. Hence, DDBMS query optimization algorithms consider the goal of reducing the amount of data transfer as an optimization criterion in choosing a distributed query execution strategy.

We illustrate this with two simple example queries. Consider the relations EMPLOYEE and DEPARTMENT.

There are three simple strategies for executing this distributed query:

1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site, and perform the join at site 3. In this case a total of 1,000,000 + 3500 = 1,003,500 bytes must be transferred.

2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to site 3. The size of the query result is 40 * 10,000 = 400,000 bytes, so 400,000 + 1,000,000 = 1,400,000 bytes must be transferred.

3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result to site 3. In this case 400,000 + 3500 = 403,500 bytes must be transferred. If minimizing the amount of data transfer is our optimization criterion, we should choose strategy 3.

2. Distributed Query Processing Using Semijoin:

The idea behind distributed query processing using the semijoin operation is to reduce the number of tuples in a relation before transferring it to another site. Intuitively, the idea is to send the joining column of one relation R to the site where the other relation S is located; this column is then joined with S. Following that, the join attributes, along with the attributes required in the result, are projected out and shipped back to the original site and joined with R. Hence, only the joining column of R is transferred in one direction, and a subset of S with no extraneous tuples or attributes is transferred in the other direction. If only a small fraction of the tuples in S participate in the join, this can be quite an efficient solution to minimizing data transfer.

3. Query and Update Decomposition:

In a DDBMS with no distribution transparency, the user phrases a query directly in terms of specific fragments. For example, consider another query Q: "Retrieve the names and hours per week for each employee who works on some project controlled by department 5," which is specified on the distributed database where the relations at sites 2 and 3 are shown in the figure and those at site 1 are shown in the figure as in the earlier example. A user who submits such a query must specify whether it references the PROJS5 and WORKS_ON5 relations at site 2 or the PROJECT and WORKS_ON relations at site 1. The user must also maintain consistency of replicated data items when updating a DDBMS with no replication transparency.

On the other hand, a DDBMS that supports full distribution, fragmentation, and replication transparency allows the user to specify a query or update request on the schema of the figure just as though the DBMS were centralized. For updates, the DDBMS is responsible for maintaining consistency among replicated items by using one of the distributed concurrency control algorithms. For queries, a query decomposition module must break up or decompose a query into subqueries that can be executed at the individual sites. In addition, a strategy for combining the results of the subqueries to form the query result must be generated. Whenever the DDBMS determines that an item referenced in the query is replicated, it must choose or materialize a particular replica during query execution.

To determine which replicas include the data items referenced in a query, the DDBMS refers to the fragmentation, replication, and distribution information stored in the DDBMS catalog. For vertical fragmentation, the attribute list for each fragment is kept in the catalog. For horizontal fragmentation, a condition, sometimes called a guard, is kept for each fragment. This is basically a selection condition that specifies which tuples exist in the fragment; it is called a guard because only tuples that satisfy this condition are permitted to be stored in the fragment. For mixed fragments, both the attribute list and the guard condition are kept in the catalog.

Deductive Database:

A deductive database system is a database system which can make deductions (ie: conclude additional facts) based on rules and facts stored in the (deductive) database. Datalog is the language typically used to specify facts, rules and queries in deductive databases. Deductive databases have grown out of the desire to combine logic programming with relational databases to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets. Deductive databases are more expressive than relational databases but less expressive than logic programming systems. Deductive databases have not found widespread adoptions outside academia, but some of their concepts are used in today's relational databases to support the advanced features of more recent SQL standards.

Deductive databases and logic programming:

Deductive databases reuse a large number of concepts from logic programming; rules and facts specified in the deductive database language Datalog look very similar to those in Prolog. However, there are a number of important differences between deductive databases and logic programming:

· Order sensitivity and procedurality: In Prolog program execution depends on the order of rules in the program and on the order of parts of rules; these properties are used by programmers to build effective programs. In database languages (like SQL or Datalog), however, program execution is independent of the order or rules and facts.

· Special predicates: In Prolog programmers can directly influence the procedural evaluation of the program with special predicates such as the cut, this has no correspondence in deductive databases.

· Function symbols: Logic Programming languages allow function symbols to build up complex symbols. This is not allowed in deductive databases.

· Tuple oriented processing: Deductive databases use set oriented processing while logic programming languages concentrate on one tuple at a time.

Sunday, November 1, 2009

OODBMS (Object Oriented Database Management System) Basics

OODBMS System

An object database (also object-oriented database) is a database model in which information is represented in the form of objects as used in object-oriented programming.

Object databases are a niche field within the broader DBMS market dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s but they have made little impact on mainstream commercial data processing, though there is some usage in specialized areas.

When database capabilities are combined with object-oriented (OO) programming language capabilities, the result is an object database management system (ODBMS).

Today’s trend in programming languages is to utilize objects, thereby making OODBMS ideal for OO programmers because they can develop the product, store them as objects, and can replicate or modify existing objects to make new objects within the OODBMS. Information today includes not only data but video, audio, graphs, and photos which are considered complex data types. Relational DBMS aren’t natively capable of supporting these complex data types. By being integrated with the programming language, the programmer can maintain consistency within one environment because both the OODBMS and the programming language will use the same model of representation. Relational DBMS projects using complex data types would have to be divided into two separate tasks: the database model and the application.

As the usage of web-based technology increases with the implementation of Intranets and extranets, companies have a vested interest in OODBMS to display their complex data. Using a DBMS that has been specifically designed to store data as objects gives an advantage to those companies that are geared towards multimedia presentation or organizations that utilize computer-aided design (CAD).

Some object-oriented databases are designed to work well with object-oriented programming languages such as Python, Perl, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk; others have their own programming languages. ODBMSs use exactly the same model as object-oriented programming languages.

Advantages:

  • The main benefit of creating a database with objects as data is speed.
  • OODBMS are faster than relational DBMS because data isn’t stored in relational rows and columns but as objects.
  • Objects have a many to many relationship and are accessed by the use of pointers.
  • Pointers are linked to objects to establish relationships.
  • Another benefit of OODBMS is that it can be programmed with small procedural differences without affecting the entire system.
  • This is most helpful for those organizations that have data relationships that aren’t entirely clear or need to change these relations to satisfy the new business requirements.
  • This ability to change relationships leads to another benefit which is that relational DBMS can’t handle complex data models while OODBMS can.

Disadvantages:

  • Slower and more difficult to formulate than relational.
  • Lack of interoperability with a great number of tools/features that are taken for granted in the SQL world, including but not limited to industry standard connectivity, reporting tools, OLAP tools, and backup and recovery standards.
  • Lack a formal mathematical foundation, unlike the relational model, and this in turn leads to weaknesses in their query support.

Applications:

Object databases based on persistent programming acquired a niche in application areas such as engineering and spatial databases, telecommunications, and scientific areas such as high energy physics and molecular biology. They have made little impact on mainstream commercial data processing, though there is some usage in specialized areas of financial services. It is also worth noting that object databases held the record for the World's largest database (being the first to hold over 1000 terabytes at Stanford Linear Accelerator Center) and the highest ingest rate ever recorded for a commercial database at over one Terabyte per hour.

Another group of object databases focuses on embedded use in devices, packaged software, and real-time systems.

Wednesday, September 23, 2009

Hackers the true Heros

Jonathan James:


ames gained notoriety when he became the first juvenile to be sent to prison for hacking. He was sentenced at 16 years old. In an anonymous PBS interview, he professes, "I was just looking around, playing around. What was fun for me was a challenge to see what I could pull off."

James's major intrusions targeted high-profile organizations. He installed a backdoor into a Defense Threat Reduction Agency server. The DTRA is an agency of the Department of Defense charged with reducing the threat to the U.S. and its allies from nuclear, biological, chemical, conventional and special weapons. The backdoor he created enabled him to view sensitive emails and capture employee usernames and passwords.

James also cracked into NASA computers, stealing software worth approximately $1.7 million. According to the Department of Justice, "The software supported the International Space Station's physical environment, including control of the temperature and humidity within the living space." NASA was forced to shut down its computer systems, ultimately racking up a $41,000 cost. James explained that he downloaded the code to supplement his studies on C programming, but contended, "The code itself was crappy . . . certainly not worth $1.7 million like they claimed."

Adrian Lamo:



Lamo's claim to fame is his break-ins at major organizations like The New York Times and Microsoft. Dubbed the "homeless hacker," he used Internet connections at Kinko's, coffee shops and libraries to do his intrusions. In a profile article, "He Hacks by Day, Squats by Night," Lamo reflects, "I have a laptop in Pittsburgh, a change of clothes in D.C. It kind of redefines the term multi-jurisdictional."

Lamo's intrusions consisted mainly of penetration testing, in which he found flaws in security, exploited them and then informed companies of their shortcomings. His hits include Yahoo!, Bank of America, Citigroup and Cingular. When white hat hackers are hired by companies to do penetration testing, it's legal. What Lamo did is not.

When he broke into The New York Times' intranet, things got serious. He added himself to a list of experts and viewed personal information on contributors, including Social Security numbers. Lamo also hacked into The Times' LexisNexis account to research high-profile subject matter.

For his intrusion at The New York Times, Lamo was ordered to pay approximately $65,000 in restitution. He was also sentenced to six months of home confinement and two years of probation, which expired January 16, 2007. Lamo is currently working as an award-winning journalist and public speaker.


Kevin Mitnick:


A self-proclaimed "hacker poster boy," Mitnick went through a highly publicized pursuit by authorities. His mischief was hyped by the media but his actual offenses may be less notable than his notoriety suggests. The Department of Justice describes him as "the most wanted computer criminal in United States history." His exploits were detailed in two movies: Freedom Downtime and Takedown.

Mitnick had a bit of hacking experience before committing the offenses that made him famous. He started out exploiting the Los Angeles bus punch card system to get free rides. Then, like Apple co-founder Steve Wozniak, dabbled in phone phreaking. Although there were numerous offenses, Mitnick was ultimately convicted for breaking into the Digital Equipment Corporation's computer network and stealing software.

Mitnick's mischief got serious when he went on a two and a half year "coast-to-coast hacking spree." The CNN article, "Legendary computer hacker released from prison," explains that "he hacked into computers, stole corporate secrets, scrambled phone networks and broke into the national defense warning system." He then hacked into computer expert and fellow hacker Tsutomu Shimomura's home computer, which led to his undoing.

Today, Mitnick has been able to move past his role as a black hat hacker and become a productive member of society. He served five years, about 8 months of it in solitary confinement, and is now a computer security consultant, author and speaker.



Kevin Poulsen:




Also known as Dark Dante, Poulsen gained recognition for his hack of LA radio's KIIS-FM phone lines, which earned him a brand new Porsche, among other items. Law enforcement dubbed him "the Hannibal Lecter of computer crime."

Authorities began to pursue Poulsen after he hacked into a federal investigation database. During this pursuit, he further drew the ire of the FBI by hacking into federal computers for wiretap information.

His hacking specialty, however, revolved around telephones. Poulsen's most famous hack, KIIS-FM, was accomplished by taking over all of the station's phone lines. In a related feat, Poulsen also "reactivated old Yellow Page escort telephone numbers for an acquaintance who then ran a virtual escort agency." Later, when his photo came up on the show Unsolved Mysteries, 1-800 phone lines for the program crashed. Ultimately, Poulsen was captured in a supermarket and served a sentence of five years.

Since serving time, Poulsen has worked as a journalist. He is now a senior editor for Wired News. His most prominent article details his work on identifying 744 sex offenders with MySpace profiles.

Robert Tappan:


Morris: Morris, son of former National Security Agency scientist Robert Morris, is known as the creator of the Morris Worm, the first computer worm to be unleashed on the Internet. As a result of this crime, he was the first person prosecuted under the 1986 Computer Fraud and Abuse Act.

Morris wrote the code for the worm while he was a student at Cornell. He asserts that he intended to use it to see how large the Internet was. The worm, however, replicated itself excessively, slowing computers down so that they were no longer usable. It is not possible to know exactly how many computers were affected, but experts estimate an impact of 6,000 machines. He was sentenced to three years' probation, 400 hours of community service and a fined $10,500.

Morris is currently working as a tenured professor at the MIT Computer Science and Artificial Intelligence Laboratory. He principally researches computer network architectures including distributed hash tables such as Chord and wireless mesh networks such as Roofnet.


Vladimir Levin:



Mass media claimed at the time he was a mathematician and had a degree in biochemistry from Saint Petersburg State Institute of Technology.According to the coverage, in 1994 Levin accessed the accounts of several large corporate customers of Citibank via their dial-up wire transfer service (Financial Institutions Citibank Cash Manager) and transferred funds to accounts set up by accomplices in Finland, the United States, the Netherlands, Germany and Israel.In 2005 an alleged member of the former St. Petersburg hacker group, claiming to be one of the original Citibank penetrators, published under the name ArkanoiD a memorandum on popular Provider.net.ru website dedicated to telecom market. According to him, Levin was not actually a scientist (mathematician, biologist or the like) but a kind of ordinary system administrator who managed to get hands on the ready data about how to penetrate in Citibank machines and then exploit them.ArkanoiD emphasized all the communications were carried over X.25 network and the Internet was not involved. ArkanoiD’s group in 1994 found out Citibank systems were unprotected and it spent several weeks examining the structure of the bank’s USA-based networks remotely. Members of the group played around with systems’ tools (e.g. were installing and running games) and were unnoticed by the bank’s staff. Penetrators did not plan to conduct a robbery for their personal safety and stopped their activities at some time. Someone of them later handed over the crucial access data to Levin (reportedly for the stated $100).

David Smith:


David Smith, the author of the Melissa virus, was facing nearly 40 years in jail when he decided to cooperate with the FBI. Facing jail time, public wrath and a fortune in potential fines, the 30-year-old sender of the fast-spreading Melissa computer virus did what hundreds of criminals have done before. He agreed to go undercover. Federal court documents unsealed at the request of the Associated Press show that for almost two years, Smith – then out on bail – worked mostly full time cruising the dark recesses of the Internet while the FBI paid his tab.
What did the FBI get? A windfall of information about malicious code senders, leading directly to two major international arrests and pre-empting other attacks, according to federal prosecutors.
What did Smith get? Just 20 months in federal prison, which was about two years less than the minimum sentencing requirement, and about 38 years less than he faced when initially charged.
Sometimes it takes a thief to catch a thief, said former federal prosecutor Elliot Turrini, who handled Smith’s case and agreed to the reduced sentence.
About 63,000 viruses have rolled through the Internet, causing an estimated $65 billion in damage, but Smith is the only person to go to federal prison in the United States for sending one.

Michael Calce:


The computer hacker known as “Mafiaboy,” who crippled several major Internet sites including CNN, arrives in court Thursday, Jan. 18, 2001 in Montreal, Canada. He pleaded guilty on Thursday to 55 charges of mischief. The trial of the 16-year-old Montrealer, who can not be identified under Canadian law, was set to begin Thursday on 66 charges relating to attacks last year on several major Web sites, as well as security breaches of other sites at institutions such as Yale and Harvard.


Mark Abene:



Mark Abene (born 1972), better known by his pseudonym Phiber Optik, is a computer security hacker from New York City. Phiber Optik was once a member of the Hacker Groups Legion of Doom and Masters of Deception. In 1994, he served a one-year prison sentence for conspiracy and unauthorized access to computer and telephone systems.
Phiber Optik was a high-profile hacker in the early 1990s, appearing in The New York Times, Harper’s, Esquire, in debates and on television. Phiber Optik is an important figure in the 1995 non-fiction book Masters of Deception — The Gang that Ruled Cyberspace.


Stephen Wozniak:


"Woz" is famous for being the "other Steve" of Apple. Wozniak, along with current Apple CEO Steve Jobs, co-founded Apple Computer. He has been awarded with the National Medal of Technology as well as honorary doctorates from Kettering University and Nova Southeastern University. Additionally, Woz was inducted into the National Inventors Hall of Fame in September 2000.
Woz got his start in hacking making blue boxes, devices that bypass telephone-switching mechanisms to make free long-distance calls. After reading an article about phone phreaking in Esquire, Wozniak called up his buddy Jobs. The pair did research on frequencies, then built and sold blue boxes to their classmates in college. Wozniak even used a blue box to call the Pope while pretending to be Henry Kissinger.
Wozniak dropped out of college and came up with the computer that eventually made him famous. Jobs had the bright idea to sell the computer as a fully assembled PC board. The Steves sold Wozniak's cherished scientific calculator and Jobs' VW van for capital and got to work assembling prototypes in Jobs' garage. Wozniak designed the hardware and most of the software. In the Letters section of http://woz.org/

Wednesday, July 8, 2009

BSNL Dataone in Fedora (specially Fedora 11)

In Fedora, you may use either the Network GUI or the rp-pppoe-gui package.

Network GUI Method:
- In Fedora, go to: Desktop --> Administration --> Network,
OR

just type the command "system-config-network" the shell prompt to get the same GUI (both of these methods will ask for root password). The window you will get will have title of "Network Configuration".

- From the toolbar of this new window, click on "New" button. You will get a window with title "Add new Device Type".

- In this new windoe, choose xDSL connection, click "Forward" and follow the prompt. It will ask you for the provider name (type anything. e.g. BSNL Dataone), your connection username (bridewin@dataone in our example in case if you are using Fedora 11 just type bridewin i.e your username there is no need for @dataone) and password.

- Once setup is complete, to test the connection, go back to the "Network Configuration" window (or open it again as specified in the first point above), and select the new connection and click on Activate button in the toolbar (If the active and Deactive button does not appear to be available then edit the BSNLDataone connection using edit option and deselect the controlled by network manager option and save it..!!) . Wait a few seconds and give the following commands in a terminal:

$> ping 4.2.2.2
$> ping google.com

The first one if successfuly, will prove that the internet connection is working, the second one, if successful, will prove that your DNS nameserves are also working.

Then make sure that the device will become automatically active when the computer boots up by selecting this connection in the network configuration window and by editing its profile and by checking the option that tells the device to activate on boot ups. Also, give normal users the option of controlling this device.