Techniques — Database basics
Based on Ubuntu 10.04 (a Debian-derived Linux distribution).
sudo apt-get install mysql-server mysql-client mysql-navigator mysql-admin
It'll ask for a root password during installation.
Configuration is in /etc/mysql/my.cnf (and /etc/mysql/conf.d/). Logs are in /var/log/mysql/. Data is by default in /var/lib/mysql/ (so a database called "xxx" becomes a directory of that name there, probably with data elsewhere in /var/lib/mysql/ too).
To administer the server remotely, tunnel the MySQL port over SSH: run ssh server -L 3306:localhost:3306, and leave the SSH client running once logged in. Then point your MySQL client at hostname=localhost, port=3306, username=databaseuser, password=databasepassword. Alternatively, a client that can make its own SSH tunnel needs the SSH details instead (hostname=yourserver, port=22, username=unixuser, method=password, password=unixpassword).

Logging in and saying hello:
# Log in, prompting for a password. You don't need the --host or --port options
# for a local login with default settings. Also: -u and -p are shorthand for
# --user and --password.
mysql --host=127.0.0.1 --port=3306 --user=root --password
# Now, let's enter some SQL commands:
SHOW DATABASES;
USE database;
SHOW TABLES;
SHOW CREATE TABLE table;
SELECT * FROM table;
# etc.
help
help contents
status
quit
Creating a database and changing the root password:
CREATE DATABASE newdb;
GRANT ALL ON newdb.* TO root@'myhostname' IDENTIFIED BY 'newpassword';
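If you're generating such statements from a script, a minimal Python sketch might look like this. The build_grant helper is my own illustration, not any library's API; note also that it does plain string interpolation with no escaping, so use it only with trusted input, and that this combined GRANT ... IDENTIFIED BY form was removed in MySQL 8.0 (use CREATE USER there).

```python
def build_grant(db, user, host, password):
    # Assemble the GRANT statement shown above (older MySQL syntax, in which
    # GRANT ... IDENTIFIED BY also sets the user's password).
    # WARNING: naive string interpolation; no quoting/escaping of inputs.
    return "GRANT ALL ON %s.* TO %s@'%s' IDENTIFIED BY '%s';" % (
        db, user, host, password)

print(build_grant("newdb", "root", "myhostname", "newpassword"))
```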
To dump all databases:
mysqldump -u username -ppassword --all-databases > backupfile.sql
To dump one database:
mysqldump -u username -ppassword databasename > backupfile.sql
As above, you can use just -p for a live prompt for the password.
To import from a multi-database dump (which will contain CREATE DATABASE commands):
mysql -u username -p[password] < backupfile.sql
To import from a single-database dump (which will not):
mysql -u username -p[password] databasename < backupfile.sql
To copy one database to another (here, db1name to a new database db2name):
mysqldump -u username -ppassword db1name > db.sql
echo "CREATE DATABASE db2name;" | mysql -u username -ppassword
mysql -u username -ppassword db2name < db.sql
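The same dump/create/restore sequence can be driven from Python with subprocess. The sketch below shows the pipe step; echo and cat stand in for mysqldump and mysql so the example runs without a server, and the commented-out argument lists show what you'd substitute in real use.

```python
import subprocess

def run_pipeline(producer, consumer):
    # Run "producer | consumer" and return consumer's stdout.
    # In real use, for the "create database" step above, you might use:
    #   producer = ["echo", "CREATE DATABASE db2name;"]
    #   consumer = ["mysql", "-u", "username", "-ppassword"]
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # let producer receive SIGPIPE if consumer exits early
    out, _ = p2.communicate()
    return out

# Stand-in commands (echo | cat) so this example is runnable anywhere:
result = run_pipeline(["echo", "CREATE DATABASE db2name;"], ["cat"])
print(result.decode().strip())
```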
See SQLZoo.net, an excellent interactive SQL tutorial.
In MySQL, you can add user-defined functions, including user-defined aggregate functions (aggregate UDFs), such as MEDIAN(). See http://mysql-udf.sourceforge.net/; http://sourceforge.net/projects/mysql-udf/forums/forum/390115/topic/3698697; http://www.mooreds.com/wordpress/archives/376.
Specifically, for that MEDIAN() one: download udf_median.cc, then do this:
# This will need the MySQL header files; on Ubuntu, probably from package "libmysqlclient-dev".
gcc -I /usr/include/mysql -I /usr/include -o udf_median.o -c udf_median.cc
ld -shared -o udf_median.so udf_median.o
sudo cp udf_median.so /usr/lib/mysql/plugin/
mysql [options]
then in MySQL:
CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';
If you need to drop it:
DROP FUNCTION median;
Now you can use it:
SELECT MEDIAN(field) FROM table ... ;
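To sanity-check the aggregate's output, here's the conventional median (middle value, or the mean of the two middle values for an even count) in plain Python. Whether udf_median.cc uses exactly this convention for even-sized sets is worth verifying against its source.

```python
def median(values):
    # Conventional median: sort, then take the middle value (odd count)
    # or the mean of the two middle values (even count).
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return float(s[mid])
    return (s[mid - 1] + s[mid]) / 2.0

print(median([3, 1, 2]))     # odd count: middle value
print(median([4, 1, 3, 2]))  # even count: mean of the two middle values
```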
unixODBC seems to be the best-supported ODBC implementation on Linux.
# install unixODBC
sudo apt-get install unixodbc
# print config information
odbcinst -j
# query the drivers installed
odbcinst -q -d
# query the data sources available
odbcinst -q -s
# http://www.unixodbc.org/unixODBCsetup.html
sudo apt-get install libmyodbc
sudo ODBCConfig &
Drivers > Add:
    Name = myodbc
    Description = MySQL ODBC (myodbc) driver
    Driver = /usr/lib/odbc/libmyodbc.so
    Driver64 =
    Setup = /usr/lib/odbc/libodbcmyS.so
    Setup64 =
    UsageCount = [leave as is; it's 1]
    CPTimeout =
    CPReuse =
... which writes this stuff to /etc/odbcinst.ini
Now a specific data source:
ODBCConfig &
User DSN > Add > pick the "myodbc" driver
Specify server (e.g. 127.0.0.1), port (e.g. 3306), and database, plus a short name and description.
ODBCConfig writes to ~/.odbc.ini; or sudo it for system-wide DSNs, which go in /etc/odbc.ini.
Specifying a source manually in these files:
[tim1]
Description = Tim Composite Exp1
Driver = myodbc
Server = 127.0.0.1
Port = 3306
Database = junk1
User = XXX_MYSQLUSER
Password = XXX_MYSQLPASSWORD
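These DSN files are plain INI format, so they can be inspected programmatically; a sketch using Python 3's standard configparser module (called ConfigParser in Python 2):

```python
import configparser

# A DSN entry in the same shape as the one above (credentials omitted):
ODBC_INI = """\
[tim1]
Description = Tim Composite Exp1
Driver = myodbc
Server = 127.0.0.1
Port = 3306
Database = junk1
"""

cp = configparser.ConfigParser()
cp.read_string(ODBC_INI)  # with a real file: cp.read("/home/user/.odbc.ini")
print(cp.sections())         # list of DSN names
print(cp["tim1"]["Driver"])  # which driver this DSN uses
```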
This is not a terribly advanced connection. Don't use it for anything serious; convert the .MDB to another database instead (see below).
First, add the driver:
# http://mdbtools.sourceforge.net/
sudo apt-get install mdbtools libmdbtools libmdbodbc
sudo ODBCConfig &
Drivers > Add:
    Name = mdb
    Description = MDB Tools (Microsoft Access format) ODBC driver
    Driver = /usr/lib/libmdbodbc.so.0
    Driver64 =
    Setup =
    Setup64 =
    UsageCount = [leave as is; it's 1]
    CPTimeout =
    CPReuse =
Now a specific data source. ODBCConfig doesn't like it (as there's no Setup library; it says "cannot construct a property list"). So edit ~/.odbc.ini (or /etc/odbc.ini for a system-wide data source), e.g.
[tim1]
Description = Tim Composite Exp1
Driver = mdb
Database = /home/rudolf/tmp/Tim_Composite_Exp1.mdb
To install Perl database modules from CPAN, either (1) run sudo perl -MCPAN -e shell; press Enter for all the defaults and pick a nearby CPAN mirror; the first time round (only) use o conf commit to write the config file to disk, as suggested; then e.g. install DBI to install the DBI module; then q to quit; or (2) use sudo perl -MCPAN -e "install DBI" to do it all in one step.

For Python:
sudo apt-get install python-mysqldb
# How would we find the name of the package that supplies PyODBC?
aptitude search pyodbc
# Now install it:
sudo apt-get install python-pyodbc
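Both python-mysqldb (MySQLdb) and pyodbc follow Python's DB-API 2.0 interface, so the usage pattern is the same across drivers. The sketch below shows that pattern using the stdlib sqlite3 module (also DB-API 2.0) so it runs without a MySQL server; for MySQLdb you'd instead call MySQLdb.connect(host=..., user=..., passwd=..., db=...), and for pyodbc something like pyodbc.connect("DSN=tim1;UID=...;PWD=...").

```python
import sqlite3

# The DB-API 2.0 pattern shared by MySQLdb, pyodbc, and sqlite3:
# connect -> cursor -> execute -> fetch -> commit/close.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE monkeys (name TEXT, weight_kg REAL)")
# Parametrized insert; never build SQL by string concatenation with user data:
cur.execute("INSERT INTO monkeys VALUES (?, ?)", ("alpha", 5.2))
conn.commit()
cur.execute("SELECT name, weight_kg FROM monkeys")
rows = cur.fetchall()
print(rows)  # [('alpha', 5.2)]
conn.close()
```

Note that the placeholder style differs between drivers: sqlite3 and pyodbc use "?", whereas MySQLdb uses "%s".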
# 1. Connect
library(RODBC)
channel <- odbcConnect("my_DSN")  # specify your DSN here
# If you need to specify a username/password, use:
# channel <- odbcConnect("my_DSN", uid="username", pwd="password")
# 2. List all tables
sqlTables(channel)
# 3. Fetch a whole table into a data frame
mydataframe <- sqlFetch(channel, "my_table_name")  # fetch a table from the database in its entirety
# 4. Fetch the results of a query into a data frame. Example:
mydf2 <- sqlQuery(channel, "SELECT * FROM MonkeyCantab_LOOKUP_TaskTypes WHERE TaskType < 6")
# 5. Disconnect when done
close(channel)

Or talk to MySQL directly: http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf. The native connection is likely to be faster: http://mailman.unixodbc.org/pipermail/unixodbc-support/2005-March/000507.html.
#!/usr/bin/python2.7
# Converts a .MDB file (Microsoft Access) database to MySQL, copying structure and data.
# Uses the mdb-tools package.
# Under Ubuntu, the following packages are required:
#   sudo apt-get install python2.7 mdbtools mysql-server mysql-client mysql-admin
# though you may also want these, if you're planning to use Python/ODBC with MySQL:
#   sudo apt-get install mysql-navigator libmdbtools libmdbodbc unixodbc python-mysqldb python-pyodbc
# What works:
#   - schema copied
#   - data copied
# What doesn't work:
#   - indexes are not described by mdb-schema, so these must be recreated manually
#   - relationships are not supported by mdb-schema, so these must be recreated manually
# We'll do this with calls to other command-line tools.
# See http://nialldonegan.me/2007/03/10/converting-microsoft-access-mdb-into-csv-or-mysql-in-linux/
# REVISED 1 Jan 2013: mdb-schema syntax has changed (-S option gone)
# SEE https://github.com/brianb/mdbtools

import sys, getpass, shlex, subprocess, re, os


def raw_default(prompt, dflt=None):
    prompt = "%s [%s]: " % (prompt, dflt)
    res = raw_input(prompt)
    if not res and dflt:
        return dflt
    return res


def get_external_command_output(command):
    args = shlex.split(command)
    ret = subprocess.check_output(args)  # this needs Python 2.7 or higher
    return ret


def get_pipe_series_output(commands, stdinput=None):
    # Python array indexes are zero-based, i.e. an array is indexed from 0 to
    # len(array)-1. The range/xrange commands, by default, start at 0 and go to
    # one less than the maximum specified.
    # print commands
    processes = []
    for i in xrange(len(commands)):
        if i == 0:  # first process
            processes.append(subprocess.Popen(shlex.split(commands[i]),
                                              stdin=subprocess.PIPE,
                                              stdout=subprocess.PIPE))
        else:  # subsequent ones
            processes.append(subprocess.Popen(shlex.split(commands[i]),
                                              stdin=processes[i - 1].stdout,
                                              stdout=subprocess.PIPE))
    # communicate() returns a tuple; 0=stdout, 1=stderr; so this returns stdout:
    return processes[len(processes) - 1].communicate(stdinput)[0]


def replace_type_in_sql(sql, fromstr, tostr):
    whitespaceregroup = "([\ \t\n]+)"
    whitespaceorcommaregroup = "([\ \t\),\n]+)"
    rg1 = "\g<1>"
    rg2 = "\g<2>"
    return re.sub(whitespaceregroup + fromstr + whitespaceorcommaregroup,
                  rg1 + tostr + rg2, sql, 0, re.MULTILINE | re.IGNORECASE)


if len(sys.argv) != 2:  # the program name is one of these
    sys.exit("Syntax: convert_mdb_to_mysql.py mdbfile")
mdbfile = sys.argv[1]

tempfile = raw_default("Temporary filename", "TEMP.txt")
host = raw_default("MySQL hostname", "127.0.0.1")  # not "localhost"
port = raw_default("MySQL port number", 3306)
user = raw_default("MySQL username", "root")
password = getpass.getpass("MySQL password: ")
mysqldb = raw_input("MySQL database to create: ")

print "Getting list of tables"
# -1: one table name per line (or table names with spaces will cause confusion)
tablecmd = "mdb-tables -1 " + mdbfile
tables = get_external_command_output(tablecmd).splitlines()
print tables

print "Creating new database"
# We could omit the actual password and the user would be prompted, but we need
# to send it this way later (see below), so this is not a huge additional
# security weakness! Linux/MySQL helpfully obscures the password in the "ps" list.
createmysqldbcmd = "mysqladmin create %s --host=%s --port=%s --user=%s --password=%s" % (
    mysqldb, host, port, user, password)
print get_external_command_output(createmysqldbcmd)

print "Shipping table definitions (sanitized), converted to MySQL types, through some syntax filters, to MySQL"
schemacmd = "mdb-schema " + mdbfile + " mysql"
# JAN 2013: Since my previous script, mdb-schema's mysql dialect has got much better.
# Now convert the oddities that emerge:
schemasyntax = get_external_command_output(schemacmd)

# The following presupposes that no fields actually have these names (which are
# reserved, so they shouldn't!).
# Access data types: http://www.databasedev.co.uk/fields_datatypes.html
# An Access "Long Integer" is 4 bytes. There's no Access 8-byte integer (which
# is a BIGINT under MySQL: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html ).
#schemasyntax = replace_type_in_sql(schemasyntax, "Text", "VARCHAR")
#schemasyntax = replace_type_in_sql(schemasyntax, "Byte", "INT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Long Integer", "INT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Integer", "INT")  # put this after "Long Integer"
#schemasyntax = replace_type_in_sql(schemasyntax, "Single", "FLOAT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Double", "FLOAT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Replication ID", "NUMERIC (16)")
#schemasyntax = replace_type_in_sql(schemasyntax, "DateTime \(Short\)", "DATETIME")
#schemasyntax = replace_type_in_sql(schemasyntax, "Currency", "FLOAT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Boolean", "BOOLEAN")  # MySQL: BOOLEAN is a synonym for TINYINT(1)
#schemasyntax = replace_type_in_sql(schemasyntax, "OLE", "VARCHAR")
#schemasyntax = replace_type_in_sql(schemasyntax, "Memo/Hyperlink", "TEXT")

# mdb-schema uses "---------" for some of its comment lines; MySQL only permits
# "-- " (with a space) as the start of a comment, or "#":
# http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-comments.html .
# One of many alternatives as a pipe filter would be: "perl -pe 's/^--/#/'"
#schemasyntax = re.sub("^--", "#", schemasyntax, 0, re.MULTILINE)

# We're creating a new database, so we don't need DROP statements, and they add
# to danger if the user specifies an existing database.
# An alternative as a pipe filter would be: "grep -v '^DROP'"
#schemasyntax = re.sub("^DROP.*$", "", schemasyntax, 0, re.MULTILINE)

# Access allows "Group" as a table/field name; MySQL doesn't...
# ... no, correctly quoted as `group`, it's fine.
# This filter would replace "group" with "_group" in all relevant output from
# mdb-schema and mdb-export.
# sed regular expressions are mostly documented in "man grep".
# perl regular expressions: see http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
# ... or "sudo apt-get install perl-doc" then "man perl", "perldoc perlretut", and "perldoc perlrequick"
# ... obviously, doing these replacements in Python would also be an option!
# In Python, the best syntax for backreferences in replacement text is \g<1>, \g<2> and so on.
# (In Perl, \1 and \2 are backreferences within the same regexp, while $1, $2
# are used in replace operations.)
# An alternative as a pipe filter would be:
# "perl -pe 's/([\ \t]+)group([\ \t\),]+)/$1_group$2/gi'"
#schemasyntax = re.sub("([\ \t]+)group([\ \t\),]+)", "\g<1>_group\g<2>", schemasyntax, 0, re.MULTILINE | re.IGNORECASE)

# "COMMENT ON COLUMN" is produced by mdb-schema and rejected by MySQL:
schemasyntax = re.sub("^COMMENT ON COLUMN.*$", "", schemasyntax, 0, re.MULTILINE)

print "-----------------"
print schemasyntax
print "-----------------"

# Regrettably we need the password here, as stdin will be coming from a pipe:
mysqlcmd = "mysql --host=%s --port=%s --database=%s --user=%s --password=%s" % (
    host, port, mysqldb, user, password)
print get_pipe_series_output([mysqlcmd], schemasyntax)

# For the data, we won't store the intermediate stuff in Python's memory, 'cos
# it's vast; I had one odd single-character mutation from "TimeInSession_ms" to
# "TimeInSession_mc" at row 326444 (perhaps therefore 37Mb or so into a long
# string), and I was trying to export ~1m records in that table alone.
# We'll use pipes instead and let the OS deal with the memory management.
# ... BUT (Jan 2013): now mdb-tools is better, text-processing isn't necessary,
# so we can use a temporary disk file.
# Turns out the bottleneck is the import to MySQL, not the export from MDB; see
# http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-bulk-data-loading.html
# The massive improvement is by disabling autocommit. (Example source database
# is 208M; the largest table here is 554M as a textfile; it has 1,686,075 rows.)
# This improvement was from 20 Hz to the whole database in a couple of minutes (~13 kHz).
# Subsequent export from MySQL: takes a second or two to write the whole DB (177M textfile).

print "Copying data to MySQL"
#semicolonfilter = "sed -e 's/)$/)\;/'"
#groupfilter = "perl -pe 's/([\ \t]+)group([\ \t\),]+)/$1_group$2/gi'"
for t in tables:
    print "Processing table", t
    # -I backend: INSERT statements, not CSV
    # -D: date format; MySQL's DATETIME field has the format "YYYY-MM-DD HH:mm:SS",
    #     so we want this from the export
    #exportcmd = "mdb-export -I mysql -D \"%Y-%m-%d %H:%M:%S\" " + mdbfile + " " + t
    #print get_pipe_series_output([exportcmd, semicolonfilter, groupfilter, mysqlcmd])
    #print get_pipe_series_output([exportcmd, mysqlcmd])
    os.system('echo "SET autocommit=0;" > ' + tempfile)
    exportcmd = ('mdb-export -I mysql -D "%Y-%m-%d %H:%M:%S" ' + mdbfile
                 + ' "' + t + '" >> ' + tempfile)
    os.system(exportcmd)
    os.system('echo "COMMIT;" >> ' + tempfile)
    importcmd = mysqlcmd + " < " + tempfile
    os.system(importcmd)

print "Finished."
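The script's replace_type_in_sql helper can be exercised standalone. This check (restated here in Python 3 syntax, with the same regex) shows that the delimiter groups around the type name stop the filter from rewriting embedded matches like "LongText" while still rewriting a bare "Text":

```python
import re

def replace_type_in_sql(sql, fromstr, tostr):
    # As in the script above: replace fromstr only when delimited by
    # whitespace before and whitespace/comma/close-paren after, preserving
    # the delimiters via backreferences.
    return re.sub(r"([ \t\n]+)" + fromstr + r"([ \t\),\n]+)",
                  r"\g<1>" + tostr + r"\g<2>",
                  sql, count=0, flags=re.MULTILINE | re.IGNORECASE)

sql = "mycol Text,\nother LongText,"
print(replace_type_in_sql(sql, "Text", "VARCHAR"))
# "Text" becomes "VARCHAR"; "LongText" is untouched (no leading whitespace
# immediately before its "Text" substring).
```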