2000 Mar 26
In most states, sales tax is charged when purchasing items at stores and sometimes for catalog or Internet sales. Currently, there are over seven thousand different tax jurisdictions, each with its own tax rate and regulations. The growth of Internet-based shopping and ecommerce promises to further complicate these already convoluted codes.
Recently, the National Governors' Association proposed a simplified sales tax system [Nat99]. Each merchandising company would contract with an independent third-party organization to calculate and collect sales taxes, remitting the funds to the appropriate state governments.
We propose to construct an Internet-based sales tax system capable of calculating the appropriate sales tax on a group of taxable and tax-exempt items for any particular tax jurisdiction. When sent an address and a list of items, the server program will compute the correct amount of sales tax for the appropriate tax jurisdiction. To do so, the program must determine the jurisdiction and its rules for which products are taxable, match these rules against the submitted items, and then compute the sales tax.
The design and implementation of the program will reveal which aspects of proposed sales tax "schemes" can be automated. We will also develop and justify recommendations on how to simplify aspects of current sales tax systems that cannot easily be automated. If we succeed and our recommendations are implemented by state legislatures, we hope to reduce the cost of complying with sales tax regulations by an order of magnitude. We may also be able to spin off our work to form a company specializing in sales tax computation and collection.
The current sales tax system is characterized by
Sales tax rates can apply to entire states, individual cities, individual counties, and even special tax districts. Cline and Neubig [CN99] estimate there are currently about 7500 different tax rates in the United States. All but five states impose a statewide sales tax. Most states permit local governments, e.g., cities, counties, and transit districts, to impose their own taxes. (See http://www.salestax.com/usmap.htm for a map showing which states charge sales tax and their different tax rates.) For example, consider California's sales tax structure. A statewide rate of 6.00% is supplemented by a 1.25% county and city tax rate. Other tax districts can further increase and complicate the tax rate. For example, all of Nevada County, California, participates in the 0.125% Nevada County Public Library District while only Truckee is in the Town of Truckee Road Maintenance District. Thus, the tax rate in the county is either 7.375% or 7.875%, depending on one's address. Washington State's similar tax system yields 331 different tax rates [Was99] with twelve different tax rates. Texas's five-level system of a base statewide tax, city taxes, county taxes, transit districts, and special district taxes (capped at a maximum of 8.25%) is even more complicated because the districts need not obey county or city boundaries. For example,
The De Leon Hospital District is in Comanche County, which has a county sales and use tax, but the district does not include all of Comanche County. The city of De Leon is totally within the De Leon Hospital District. The unincorporated areas known as Comyn, Downing, and Rucker are located within the De Leon Hospital District. The unincorporated areas of Comanche County in ZIP code 76444 is totally within the De Leon Hospital District. The unincorporated areas of Comanche County in ZIP codes 76442, 76445, and 76454 are partially within the De Leon Hospital District. Contact the Comanche County Tax Office at 254/893-2193 for additional boundary information [Tex99b].
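The stacking of rates in the Nevada County example can be checked with a few lines of Python. This is a minimal sketch: the rate constants come from the text, while the 0.5% Truckee Road Maintenance District rate is not stated directly and is inferred here from the difference between 7.875% and 7.375%. Decimal arithmetic is used so that percentages such as 0.125 are represented exactly.

```python
from decimal import Decimal

# Rates (in percent) quoted for Nevada County, California.
STATE_RATE = Decimal("6.00")
COUNTY_CITY_RATE = Decimal("1.25")
LIBRARY_DISTRICT = Decimal("0.125")  # covers all of Nevada County
# Inferred, not quoted: 7.875% - 7.375% = 0.5%.
TRUCKEE_ROAD_DISTRICT = Decimal("0.5")


def combined_rate(district_rates):
    """Return the statewide rate plus the county/city rate plus every special-district rate."""
    return STATE_RATE + COUNTY_CITY_RATE + sum(district_rates, Decimal("0"))


outside_truckee = combined_rate([LIBRARY_DISTRICT])
inside_truckee = combined_rate([LIBRARY_DISTRICT, TRUCKEE_ROAD_DISTRICT])
print(outside_truckee, inside_truckee)
```

The hard part is not the addition but knowing which districts apply: as the Texas example shows, membership in the De Leon Hospital District depends on location within a county and even within a single ZIP code.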
Unlike many other taxes, sales tax collection costs are mainly borne by retailers, not government agencies. When selling to the public, companies calculate and collect sales taxes. The U.S. Supreme Court decisions of National Bellas Hess v. Department of Revenue of the State of Illinois (386 U.S. 753, 1967) and Quill v. North Dakota (504 U.S. 298, 1992) [U.S92] limit companies' responsibilities to collect a state's sales tax to those having a significant physical presence in that state. Thus, catalog and Internet-based companies need not collect sales taxes for most states. Even companies having a physical presence in only one state face significant costs dealing with sales taxes. A Washington State Department of Revenue study showed the total cost of collecting and remitting sales tax for small retailers is 6.47% of collected taxes, for medium retailers 3.35%, and for large retailers 0.97% [Wel98]. Cline and Neubig estimate the cost of compliance for a large seller having a physical presence in fifteen states is 8.3% of collected taxes because of the complexity of the numerous tax regulations [CN99]. In addition to their obligations to collect sales taxes from their customers, many large business taxpayers purchase items without paying sales taxes but must file tax returns directly for these taxable purchases. Thus, states have legislated that businesses must bear the burden of the sales tax system. To compensate businesses for these costs, twenty-six states permit businesses to keep a portion of sales tax collections [Wel98, Chapter 8]. Cline and Neubig note these discounts frequently cover less than five percent of compliance costs.
The increasing prevalence of ecommerce has raised awareness of the complexity of the sales tax system. The taxation of electronic goods, electronic delivery of goods, and anonymous transactions further complicates sales tax computations. States are also afraid of losing sales tax revenue to a quickly growing Internet-based retail industry. The Internet Tax Freedom Act (Title XI of Public Law 105-277) forbade new taxes on Internet access, forbade multiple or discriminatory taxes on electronic commerce, and established the Advisory Commission on Electronic Commerce to investigate electronic commerce taxation issues. These investigations highlighted the current complexity of the sales tax system.
The Advisory Commission solicited proposals, including proposals regarding the taxation of transactions conducted through the Internet, asking to what extent the existing sales and use tax administration should be simplified. Numerous proposals have been submitted. Many of these recommend the creation of third-party software to calculate and collect sales taxes. We illustrate many of the issues by reviewing the relevant policy of the National Governors' Association.
The association recently adopted a policy calling for "a streamlined sales tax with simplified compliance requirements [that] will ensure that states are prepared to support the global electronic marketplace of the next century" [Nat99]. Among the provisions are
to encourage the establishment of a system of independent third-party organizations that would be responsible for remitting taxes to states. Remote sellers would use a software package preapproved by the states that would calculate the tax due on the purchase based on the state rate where the item is sent, and electronically remit that tax to the collection organization [Nat99, Section 12.2].
Summaries of the key points of various other proposals appear in Table 1. For each proposal, we list
We will develop a prototype Internet-based software package that calculates, but does not collect or remit, the sales tax due. Numerous companies already specialize in collecting moneys for Internet- and non-Internet-based businesses, e.g., CyberCash and Charge.com. The existence of off-the-shelf income tax programs and services, e.g., Quicken TurboTax and Kiplinger TaxCut, and of electronic filing, e.g., the Internal Revenue Service's Electronic Federal Tax Payment System and the California Franchise Tax Board's electronic services, demonstrates that electronic filing is technologically possible. For individual companies, Taxware International and Vertex provide software to compute sales taxes, rates, and jurisdictions, but these require extensive customizing and maintenance. Use of our prototype will be as simple as sending an address and a list of items to a WWW address and reading the response.
We predict our prototype will demonstrate which aspects of the National Governors' Association's policy are technologically possible to implement on today's engineering workstations. We predict that
Our prototype's purpose is to compute the sales tax to charge for a given list of items at the correct tax rate in a given tax jurisdiction. Our prototype, by definition, will not have data on all possible products and all possible tax rates and jurisdictions. This is because
The prototype's purpose is to compute the sales tax to charge for any given list of items at the correct tax rate for any tax jurisdiction. Thus, the user, e.g., a retailer, must provide enough information to determine the tax rate and jurisdiction and also a list of possibly taxable items. The prototype server will return one of four possible answers:
Most input must be separated into clauses surrounded by matching left and right parentheses. Whitespace is not important except to separate terms. That is, input can be freely formatted using whatever newlines, tabs, and extra spaces the user desires.
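Because clauses nest and whitespace only separates terms, the input can be read with a small recursive-descent parser. The Python sketch below turns one clause into nested lists of strings; the keywords in the usage line (input, address, zip, items) are hypothetical placeholders, since this section does not fix the exact address syntax.

```python
def tokenize(text: str) -> list:
    """Split text into '(' and ')' tokens and whitespace-separated terms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text: str) -> list:
    """Parse exactly one parenthesized clause into nested Python lists."""
    tokens = tokenize(text)

    def clause_at(pos: int):
        # Returns (parsed clause, index just past its closing parenthesis).
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        clause = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                sub, pos = clause_at(pos)
                clause.append(sub)
            else:
                clause.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unmatched '('")
        return clause, pos + 1

    clause, end = clause_at(0)
    if end != len(tokens):
        raise ValueError("unexpected text after the clause")
    return clause


print(parse("(input (address (zip 94301-2301))\n\t(items))"))
```

Newlines, tabs, and runs of spaces all disappear in `str.split()`, which matches the free-format rule above.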
An address can be specified as
Items are specified in an items clause, which consists of zero or more item clauses. Each of the latter consists of enough subclauses to uniquely specify a particular item, the number of units purchased, and the price. Among permissible keywords for subclauses of item clauses are
Input to the server consists of an input clause. In any order, it must have one address subclause and one items subclause and may have any number of comment subclauses.
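Once a clause has been parsed into nested lists, the input rule above reduces to counting subclauses. This Python sketch assumes, as an illustration, that each subclause's first term is its keyword (address, items, or comment); it checks only the clause structure, not the contents of each subclause.

```python
from collections import Counter

REQUIRED_ONCE = ("address", "items")
ALLOWED = {"address", "items", "comment"}


def check_input_clause(clause: list) -> None:
    """Raise ValueError unless clause is (input ...) with one address subclause,
    one items subclause, and any number of comment subclauses, in any order."""
    if not clause or clause[0] != "input":
        raise ValueError("expected an (input ...) clause")
    counts = Counter()
    for sub in clause[1:]:
        # Assumption: every subclause is a list whose first term is its keyword.
        if not isinstance(sub, list) or not sub or not isinstance(sub[0], str):
            raise ValueError(f"malformed subclause: {sub!r}")
        if sub[0] not in ALLOWED:
            raise ValueError(f"unexpected subclause: {sub[0]}")
        counts[sub[0]] += 1
    for keyword in REQUIRED_ONCE:
        if counts[keyword] != 1:
            raise ValueError(f"need exactly one {keyword} subclause, found {counts[keyword]}")


check_input_clause(["input", ["comment", "test"], ["items"], ["address", ["zip", "94301"]]])
```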
The server's output consists of an output clause. It must have either one tax subclause or one error subclause (but not both) and may have any number of comment subclauses, in any order. Example output clauses include
The server's two major tasks are to determine the tax rate and jurisdiction and to determine the taxability of each item in the tax jurisdiction. We propose using ZIP+4 postal codes to determine tax rates and jurisdictions. Thus, the server consists of three major components:
As noted in Section 2.1, there are currently about 7500 different tax rates. The current sales tax system in some states, e.g., Texas, is too complicated for a tax rate to be determined easily. A simplification is necessary before the determination of tax rates and jurisdictions can be automated easily.
There are several practical difficulties with using ZIP+4 postal codes. Postal code regions regularly change.⁴ A mailing address does not necessarily determine the correct tax rate. A California tax board publication states
It is not always possible to determine the correct rate based solely on the mailing address. For example, a customer may live near a county line and have a zip code and city name that routes mail to a post office in a neighboring county which may have a different tax rate. If you relied solely on the post office mailing address to determine the tax rate, you would assume the customer lived in a county other than the one in which he or she actually resides. As a result, you could apply an incorrect tax rate [Cal99, p. 1].

These "edge effect" difficulties will occur in any sales tax system with differing tax rates; consumers will seek to minimize taxes by shifting purchases to locations with lower tax rates. Thus, we do not seek to resolve this political difficulty.
We also assume only one address is needed.
Transit tax is collected differently from other local taxes. The key element is where you deliver your service or product. Transit sales tax is due on all taxable items delivered from a place of business in a transit authority to a location in the same authority. Transit sales tax is not due on goods or services delivered to a location outside the transit authority. But you may need to collect transit use tax if that delivery is made to a location inside another transit authority. [Tex99a]

These tax rules could be incorporated into a production-quality server without significant additional space, time, or programming requirements.
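The quoted Texas rule depends on two locations, the seller's place of business and the delivery point, which is exactly what the one-address assumption sets aside. A minimal Python sketch of the rule follows; transit authorities are represented by hypothetical string identifiers, with None meaning a location outside every authority. The quote does not address a seller located outside any authority, so that case is conservatively reported as "use tax may apply" rather than decided.

```python
from typing import Optional

SALES_TAX = "transit sales tax due"
NO_TAX = "no transit tax due"
USE_TAX_MAY_APPLY = "transit use tax may apply"


def transit_tax_status(seller_authority: Optional[str],
                       delivery_authority: Optional[str]) -> str:
    """Classify one delivery under the quoted rule [Tex99a]."""
    if delivery_authority is None:
        # Delivered outside every transit authority: no transit tax.
        return NO_TAX
    if seller_authority is not None and seller_authority == delivery_authority:
        # Shipped from and delivered within the same authority.
        return SALES_TAX
    # Delivered into a different authority (or, a case the quote does not
    # cover, from a seller outside any authority): use tax may be owed.
    return USE_TAX_MAY_APPLY
```

Supporting the rule costs one extra address and one comparison per order, consistent with the claim that it adds little space or time.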
Given an address, the server will determine the corresponding tax rate and jurisdiction using the address's ZIP+4 postal code. For the prototype, we will use the U.S. Postal Service's (USPS) ZIP+4 Code Look-up Engine server if the ZIP+4 postal code is not already specified. USPS use guidelines permit only occasional use of the server, referring heavy users to companies that provide similar services. The production-quality server could use one of these services.
Addresses could be stored in address objects. The operations that address objects must support include:
Given a ZIP+4 postal code, the server determines the corresponding tax rate and jurisdiction. If this information were stored in a table with one row per postal code, this would just involve finding the code's row and returning the tax rate and jurisdiction. Unfortunately, such a table would require too much memory to be practical.
The naïve implementation with one row per postal code could require up to one billion rows to store information about ten thousand tax rates, a redundancy factor of one hundred thousand. Using similarities between the geographic layout of postal codes and tax rates, we can reduce the table's size:
We will use three separate hash tables to convert a ZIP+4 postal code to a tax rate and a jurisdiction. A hash table is a table permitting fast lookup. More precisely, each row consists of a key and an item; given the key, the hash table yields the associated item. The hash table data structure uses randomness to find the key and its associated item quickly (in expected constant time).
The first-tier hash table contains those nine-digit ZIP codes whose tax rate region differs from that of the corresponding five-digit ZIP code. For example, suppose most of the 94301 postal code region is taxed at 8.25% but the 94301-2301 postal code region is taxed at 4.56%. The nine-digit hash table will have a 943012301 entry of 4.56% while the five-digit hash table will have a 94301 entry of 8.25%. Looking up a nine-digit ZIP code yields either its tax rate and jurisdiction or an indication of failure. If the lookup fails, we should use the second-tier hash table.
The second-tier hash table contains those five-digit ZIP codes whose tax rates differ from those of their parent three-digit ZIP codes. Associated with each five-digit ZIP code is a tax rate and jurisdiction. If the lookup fails, we should use the third-tier hash table.
The third-tier hash table contains three-digit ZIP codes. Associated with each three-digit ZIP code is a tax rate and jurisdiction. If the lookup fails, the server should indicate it could not compute the sales tax.
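The three-tier lookup can be sketched with Python dictionaries standing in for the hash tables (Python's `dict` is itself a hash table). The 94301 and 94301-2301 entries follow the example above; the jurisdiction names and the 943 entry are hypothetical, chosen only so that 94301 differs from its three-digit parent as the second-tier rule requires.

```python
from typing import Dict, Optional, Tuple

# (tax rate, jurisdiction); both are treated as opaque values here.
Entry = Tuple[str, str]

TIER9: Dict[str, Entry] = {"943012301": ("4.56%", "hypothetical district X")}
TIER5: Dict[str, Entry] = {"94301": ("8.25%", "hypothetical district Y")}
TIER3: Dict[str, Entry] = {"943": ("7.25%", "hypothetical district Z")}


def lookup_rate(zip_plus4: str) -> Optional[Entry]:
    """Try the nine-, five-, then three-digit tables; None signals failure."""
    digits = zip_plus4.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"not a ZIP+4 code: {zip_plus4!r}")
    for table, key in ((TIER9, digits), (TIER5, digits[:5]), (TIER3, digits[:3])):
        entry = table.get(key)
        if entry is not None:
            return entry
    # The server should report that it could not compute the sales tax.
    return None
```

The finer tables hold only exceptions, which is where the space saving comes from: a ZIP+4 code taxed like its five-digit parent needs no entry at all.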
Three required data types are ZIP, taxRate, and taxJurisdiction. A ZIP is an opaque type stored in an address object. The address object should support
The hash tables need only treat taxRate and taxJurisdiction as opaque types, i.e., the tables perform no operations on them. Operations on these types will be described below.
We omit discussion of storage of the ZIP code, tax rate, and tax jurisdiction information and creation of the hash tables.
Given a tax rate, a tax jurisdiction, and a list of items, the most difficult step in computing sales tax is determining the taxability of each item. A less important step is rounding the sales tax according to the jurisdiction's rules. We will first concentrate on determining taxability.
Conceptually, item taxability can be stored in a table having one row per item and as many columns as tax jurisdictions. Unfortunately, the table's size is prohibitively large. We conjecture that there are about one hundred million different items sold in the United States each year⁵ and at least several hundred different tax jurisdictions, so the table's size would be at least several gigabytes, larger than the largest possible virtual memory on many of today's 32-bit computers. A realistic upper bound is one billion items and ten thousand tax jurisdictions, requiring on the order of ten trillion bits of information. This estimate is based on a comment by Dennis Epley of the UC Council [UCC99]. He said most UCC members have fewer than one thousand items. Since the original UPC scheme had a six-digit company field, permitting up to 10⁶ companies, the upper bound of 10⁶ × 10³ = 10⁹ items follows.⁶ This much information would not fit even in the storage devices of today's large engineering workstations.
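The size estimates in the paragraph above are simple products. The Python sketch below reproduces them, assuming one bit per (item, jurisdiction) cell and, for the lower estimate, taking "several hundred" jurisdictions to be 300; that figure is an assumption for illustration only.

```python
BITS_PER_BYTE = 8


def dense_table_bytes(items: int, jurisdictions: int) -> int:
    """Bytes needed for one taxable/not-taxable bit per item per jurisdiction."""
    return items * jurisdictions // BITS_PER_BYTE


# Upper bound on items: a six-digit company field allows 10**6 companies,
# each with (per the UCC comment) fewer than about 10**3 items.
UPC_ITEM_BOUND = 10**6 * 10**3

conjectured = dense_table_bytes(10**8, 300)
upper_bound = dense_table_bytes(UPC_ITEM_BOUND, 10**4)
print(f"{conjectured:,} bytes; {upper_bound:,} bytes")
```

At 300 jurisdictions the dense table needs 3.75 billion bytes, close to the 2³²-byte address space of a 32-bit machine; the upper bound of ten trillion bits is 1.25 trillion bytes.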
We reduce the table's size by assuming that most tax jurisdictions have the same taxation rule for a given item. That is, for any particular item, we store a default value of either taxed or not taxed and a list of tax jurisdictions with different values. Even so, the table will occupy more than one hundred million bytes because it must store at least ten million items, each identified by at least a twelve-byte number and having a one-byte taxability status. In practice, we expect the table to be one to two orders of magnitude larger.
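The default-plus-exceptions representation can be sketched as a small Python record. The class and field names are hypothetical; the point is that a jurisdiction listed among the exceptions simply inverts the item's default status, so an item taxed the same way everywhere costs one flag and an empty set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemTaxability:
    """Default taxability of one item plus the jurisdictions that deviate from it."""
    taxed_by_default: bool
    exceptions: frozenset = frozenset()

    def is_taxed(self, jurisdiction: str) -> bool:
        # Membership in the exception set flips the default status.
        return self.taxed_by_default != (jurisdiction in self.exceptions)


# Hypothetical item: untaxed by default but taxed in two jurisdictions.
snack = ItemTaxability(False, frozenset({"jurisdiction A", "jurisdiction B"}))
```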
Because of the table's size, we will store the information as a set of files, rather than in memory or in one huge file. Each file, identified by a six-digit number, will contain the taxability information for all items whose numbers begin with those six digits. This corresponds nicely with the UPC encoding of a company as its first six digits. Using a set of files permits easier addition and removal of entries and relatively quick search for individual items. A production-quality system might try to store the contents of the most frequently used files in main memory and might also explore tradeoffs between the number of files and file sizes.
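Mapping an item to its file amounts to taking the first six digits of its number. In this Python sketch the directory layout and the `.tax` suffix are hypothetical, and `functools.lru_cache` stands in for keeping frequently used files in main memory; note that it retains the most recently used files, only an approximation of "most frequently used".

```python
from functools import lru_cache
from pathlib import Path


def taxability_file(root: Path, upc: str) -> Path:
    """Return the file holding entries for every item sharing this UPC's six-digit prefix."""
    if len(upc) != 12 or not upc.isdigit():
        raise ValueError(f"expected a twelve-digit UPC, got {upc!r}")
    return root / f"{upc[:6]}.tax"


@lru_cache(maxsize=1024)
def load_taxability_file(path: Path) -> str:
    """Read a prefix file once; later calls for the same path are served from memory."""
    return path.read_text()
```

The `maxsize` parameter is where the tradeoff between memory use and file reads would be tuned.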
As with the address format, each item's taxability entry is surrounded by matching left and right parentheses. Whitespace is unimportant except to separate terms. For example, an item's taxability can be specified as
We will implement a function that, given a tax jurisdiction and an item, returns whether the item is taxed, not taxed, is unknown whether taxed, or is an unknown item.
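The four possible answers map naturally onto an enumeration. This Python sketch assumes, purely for illustration, a nested dictionary from item number to jurisdiction to a taxed flag: a missing item means an unknown item, and a known item with no entry for the jurisdiction means its taxability is unknown.

```python
from enum import Enum
from typing import Dict


class Taxability(Enum):
    TAXED = "taxed"
    NOT_TAXED = "not taxed"
    UNKNOWN_STATUS = "unknown whether taxed"
    UNKNOWN_ITEM = "unknown item"


# Hypothetical layout: item number -> {jurisdiction: taxed?}
TaxTable = Dict[str, Dict[str, bool]]


def taxability(jurisdiction: str, item: str, table: TaxTable) -> Taxability:
    """Classify one item in one jurisdiction into one of the four answers."""
    rules = table.get(item)
    if rules is None:
        return Taxability.UNKNOWN_ITEM
    status = rules.get(jurisdiction)
    if status is None:
        return Taxability.UNKNOWN_STATUS
    return Taxability.TAXED if status else Taxability.NOT_TAXED


TABLE: TaxTable = {"012345678905": {"jurisdiction A": True, "jurisdiction B": False}}
```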
Given a function determining the taxability of any particular item, we compute sales tax on a list of items by summing together the total cost of all taxable items, multiplying by the sales tax rate, and rounding to the nearest cent.
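The summation algorithm is short enough to show in full. This Python sketch assumes each item arrives as a (unit price, quantity, taxable) triple, with prices in dollars as Decimal values and the rate as a fraction (8.25% as 0.0825); "rounding to the nearest cent" is taken to mean rounding half a cent up, the rule conjectured below.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, Tuple

CENT = Decimal("0.01")


def sales_tax(items: Iterable[Tuple[Decimal, int, bool]], rate: Decimal) -> Decimal:
    """Sum taxable line totals, multiply by the rate, and round to the nearest cent."""
    taxable_total = sum(
        (price * quantity for price, quantity, taxable in items if taxable),
        Decimal("0"),
    )
    return (taxable_total * rate).quantize(CENT, rounding=ROUND_HALF_UP)


order = [
    (Decimal("19.99"), 2, True),   # taxable
    (Decimal("5.00"), 1, False),   # tax-exempt
    (Decimal("3.50"), 1, True),    # taxable
]
print(sales_tax(order, Decimal("0.0825")))
```

Here the taxable total is $43.48, and 43.48 × 0.0825 = 3.5871, which rounds to $3.59. Rounding once on the total, as the algorithm specifies, can differ by a cent from rounding each line separately.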
Although other algorithms are possible, we assume that all tax jurisdictions explicitly or implicitly require this algorithm. Even if this assumption is not true, we conjecture that the other algorithms have similar complexity and can easily be implemented in a production-quality server.
Few tax jurisdictions publish clear guidelines for rounding sales tax amounts. We conjecture that most jurisdictions require any amount greater than or equal to half a cent to be rounded up to one cent and any amount strictly less than half a cent to be rounded down. Some other plausible rounding schemes include
To accommodate these rounding concerns, the taxJurisdiction object could have a private rounding function. Taking an amount representing cents, the function would return the rounded number of cents as a Cost. It could also have a member function returning the sales tax amount given a taxRate object and the cost of taxable items.
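The jurisdiction object described above might look like this in Python. Python has no truly private members, so a leading underscore marks the rounding function as internal by convention; Cost is modeled, as an assumption, as an integer number of cents, and the taxRate object is simplified to a Decimal fraction. Truncation appears only as an example of an alternative scheme.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

Cost = int  # hypothetical: an amount in whole cents


class TaxJurisdiction:
    """A jurisdiction that carries its own rule for rounding fractional cents."""

    def __init__(self, name: str, rounding: str = ROUND_HALF_UP) -> None:
        self.name = name
        self._rounding = rounding  # any decimal-module rounding constant

    def _round_cents(self, cents: Decimal) -> Cost:
        """Internal by convention: round a fractional number of cents to a Cost."""
        return int(cents.quantize(Decimal("1"), rounding=self._rounding))

    def sales_tax(self, rate: Decimal, taxable_cost: Cost) -> Cost:
        """Tax on taxable_cost cents at rate (a fraction, e.g. 0.0825 for 8.25%)."""
        return self._round_cents(Decimal(taxable_cost) * rate)


half_up = TaxJurisdiction("hypothetical half-up jurisdiction")
truncating = TaxJurisdiction("hypothetical truncating jurisdiction", ROUND_DOWN)
# 4348 cents at 8.25% is 358.71 cents before rounding.
print(half_up.sales_tax(Decimal("0.0825"), 4348), truncating.sales_tax(Decimal("0.0825"), 4348))
```

Decimal is used rather than binary floating point because floats can misplace the half-cent boundary: Python's `round(2.675, 2)` yields 2.67, since 2.675 has no exact binary representation.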
Even if all of our findings are implemented by legislatures and trusted third parties (TTPs), retailers would still bear costs to deal with a sales tax system.
First, retailers' records would need to be audited to ensure they are submitting all taxable transactions to a trusted third party for sales tax collection. This auditing burden is likely to be comparable to other auditing burdens retailers already face.
Second, for trusted third parties to be as efficient as possible, these parties should be independent companies, not government agencies. These independent companies would compete to provide good service at reasonable prices, with retailers paying for the services. Thus, the cost of complying with sales taxes would still fall on retailers, not the government. Even though governments would not have a direct incentive to reduce the cost of complying with sales tax laws, the service costs of trusted third parties, set by market forces, would clearly indicate these compliance costs. Thus, governments would be able to measure the tradeoff between increasing sales tax complexity to raise revenue and the resulting increase in compliance costs.
Third, a trusted third party must be trusted by its customers, the contracting retailers. As part of its business, a trusted third party would have access to some or all of a retailer's sales information. Dishonest trusted third parties could sell this sales information to competitors or to suppliers. Concerned retailers could either contract with several trusted third parties, so that no one TTP has access to all of their sales information, or operate their own sales tax divisions. If sales tax computation and collection are performed by independent companies rather than government agencies, a market in tax computation and collection software could develop, so that concerned companies could operate their own divisions.
Fourth, using trusted third parties raises legal liability issues. If a TTP computes an incorrect sales tax amount or fails to remit all collected sales tax, is the contracting retailer or the TTP responsible? Thus, legislatures must not only simplify the sales tax system but also establish a legal framework for TTPs.